* [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
@ 2025-05-09 0:46 NeilBrown
2025-05-09 0:46 ` [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio NeilBrown
` (6 more replies)
0 siblings, 7 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
This is a revised version a the earlier series. I've actually tested
this time and fixed a few issues including the one that Mike found.
Original description:
Following the reports of older compilers complaining about the rcu
annotations in nfs-localio I reviewed the relevant code and found some
races and opportunities for simplification.
Thanks,
NeilBrown
[PATCH 1/6] nfs_localio: use cmpxchg() to install new
[PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref
[PATCH 3/6] nfs_localio: simplify interface to nfsd for getting
[PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh()
[PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and
[PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 0:46 ` [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref NeilBrown
` (5 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
Rather than using nfs_uuid.lock to protect installing
a new ro_file or rw_file, change to use cmpxchg().
Removing the file already uses xchg() so this improves symmetry
and also makes the code a little simpler.
Also remove the optimisation of not taking the lock, and not removing
the nfs_file_localio from the linked list, when both ->ro_file and
->rw_file are already NULL. Given that ->nfs_uuid was not NULL, it is
extremely unlikely that neither ->ro_file or ->rw_file is NULL so
this optimisation can be of little value and it complicates
understanding of the code - why can the list_del_init() be skipped?
Finally, move the assignment of NULL to ->nfs_uuid until after
the last action on the nfs_file_localio (the list_del_init). As soon as
this is NULL a racing nfs_close_local_fh() can bypass all the locking
and go on to free the nfs_file_localio, so we must be certain to be
finished with it first.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 11 +++--------
fs/nfs_common/nfslocalio.c | 39 +++++++++++++++++---------------------
2 files changed, 20 insertions(+), 30 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 5c21caeae075..3674dd86f095 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -279,14 +279,9 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
if (IS_ERR(new))
return NULL;
/* try to swap in the pointer */
- spin_lock(&clp->cl_uuid.lock);
- nf = rcu_dereference_protected(*pnf, 1);
- if (!nf) {
- nf = new;
- new = NULL;
- rcu_assign_pointer(*pnf, nf);
- }
- spin_unlock(&clp->cl_uuid.lock);
+ nf = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(new)));
+ if (!nf)
+ swap(nf, new);
rcu_read_lock();
}
nf = nfs_local_file_get(nf);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 6a0bdea6d644..bdf251332b6b 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -273,8 +273,8 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
void nfs_close_local_fh(struct nfs_file_localio *nfl)
{
- struct nfsd_file *ro_nf = NULL;
- struct nfsd_file *rw_nf = NULL;
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
nfs_uuid_t *nfs_uuid;
rcu_read_lock();
@@ -285,28 +285,23 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
return;
}
- ro_nf = rcu_access_pointer(nfl->ro_file);
- rw_nf = rcu_access_pointer(nfl->rw_file);
- if (ro_nf || rw_nf) {
- spin_lock(&nfs_uuid->lock);
- if (ro_nf)
- ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
- if (rw_nf)
- rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
-
- /* Remove nfl from nfs_uuid->files list */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
- list_del_init(&nfl->list);
- spin_unlock(&nfs_uuid->lock);
- rcu_read_unlock();
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
- if (ro_nf)
- nfs_to_nfsd_file_put_local(ro_nf);
- if (rw_nf)
- nfs_to_nfsd_file_put_local(rw_nf);
- return;
- }
+ spin_lock(&nfs_uuid->lock);
+ /* Remove nfl from nfs_uuid->files list */
+ list_del_init(&nfl->list);
+ spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
+ /* Now we can allow racing nfs_close_local_fh() to
+ * skip the locking.
+ */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
2025-05-09 0:46 ` [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 0:46 ` [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting nfsd_file NeilBrown
` (4 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
Having separate nfsd_file_put and nfsd_file_put_local in struct
nfsd_localio_operations doesn't make much sense. The difference is that
nfsd_file_put doesn't drop a reference to the nfs_net which is what
keeps nfsd from shutting down.
Currently, if nfsd tries to shutdown it will invalidate the files stored
in the list from the nfs_uuid and this will drop all references to the
nfsd net that the client holds. But the client could still hold some
references to nfsd_files for active IO. So nfsd might think is has
completely shut down local IO, but hasn't and has no way to wait for
those active IO requests to complete.
So this patch changes nfsd_file_get to nfsd_file_get_local and has it
increase the ref count on the nfsd net and it replaces all calls to
->nfsd_put_file to ->nfsd_put_file_local.
It also changes ->nfsd_open_local_fh to return with the refcount on the
net elevated precisely when a valid nfsd_file is returned.
This means that whenever the client holds a valid nfsd_file, there will
be an associated count on the nfsd net, and so the count can only reach
zero when all nfsd_files have been returned.
nfs_local_file_put() is changed to call nfs_to_nfsd_file_put_local()
instead of replacing calls to one with calls to the other because this
will help a later patch which changes nfs_to_nfsd_file_put_local() to
take an __rcu pointer while nfs_local_file_put() doesn't.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 4 ++--
fs/nfs_common/nfslocalio.c | 5 ++---
fs/nfsd/filecache.c | 21 +++++++++++++++++++++
fs/nfsd/filecache.h | 1 +
fs/nfsd/localio.c | 9 +++++++--
include/linux/nfslocalio.h | 3 +--
6 files changed, 34 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 3674dd86f095..27d5fff9747b 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -209,12 +209,12 @@ EXPORT_SYMBOL_GPL(nfs_local_probe_async);
static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
{
- return nfs_to->nfsd_file_get(nf);
+ return nfs_to->nfsd_file_get_local(nf);
}
static inline void nfs_local_file_put(struct nfsd_file *nf)
{
- nfs_to->nfsd_file_put(nf);
+ nfs_to_nfsd_file_put_local(nf);
}
/*
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index bdf251332b6b..f6821b2c87a2 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -262,9 +262,8 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
/* We have an implied reference to net thanks to nfsd_net_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
cred, nfs_fh, fmode);
- if (IS_ERR(localio))
- nfs_to_nfsd_net_put(net);
- else
+ nfs_to_nfsd_net_put(net);
+ if (!IS_ERR(localio))
nfs_uuid_add_file(uuid, nfl);
return localio;
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ab85e6a2454f..eedf2af8ee6e 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -386,6 +386,27 @@ nfsd_file_put_local(struct nfsd_file *nf)
return net;
}
+/**
+ * nfsd_file_get_local - get nfsd_file reference and reference to net
+ * @nf: nfsd_file of which to put the reference
+ *
+ * Get reference to both the nfsd_file and nf->nf_net.
+ */
+struct nfsd_file *
+nfsd_file_get_local(struct nfsd_file *nf)
+{
+ struct net *net = nf->nf_net;
+
+ if (nfsd_net_try_get(net)) {
+ nf = nfsd_file_get(nf);
+ if (!nf)
+ nfsd_net_put(net);
+ } else {
+ nf = NULL;
+ }
+ return nf;
+}
+
/**
* nfsd_file_file - get the backing file of an nfsd_file
* @nf: nfsd_file of which to access the backing file.
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 5865f9c72712..cd02f91aaef1 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -63,6 +63,7 @@ int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
struct net *nfsd_file_put_local(struct nfsd_file *nf);
+struct nfsd_file *nfsd_file_get_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 238647fa379e..2c0afd1067ea 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -29,8 +29,7 @@ static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_net_put = nfsd_net_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
- .nfsd_file_get = nfsd_file_get,
- .nfsd_file_put = nfsd_file_put,
+ .nfsd_file_get_local = nfsd_file_get_local,
.nfsd_file_file = nfsd_file_file,
};
@@ -71,6 +70,9 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (nfs_fh->size > NFS4_FHSIZE)
return ERR_PTR(-EINVAL);
+ if (!nfsd_net_try_get(net))
+ return ERR_PTR(-ENXIO);
+
/* nfs_fh -> svc_fh */
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
@@ -92,6 +94,9 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (rq_cred.cr_group_info)
put_group_info(rq_cred.cr_group_info);
+ if (IS_ERR(localio))
+ nfsd_net_put(net);
+
return localio;
}
EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9aa8a43843d7..c3f34bae60e1 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -66,8 +66,7 @@ struct nfsd_localio_operations {
const struct nfs_fh *,
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file *);
- struct nfsd_file *(*nfsd_file_get)(struct nfsd_file *);
- void (*nfsd_file_put)(struct nfsd_file *);
+ struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting nfsd_file
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
2025-05-09 0:46 ` [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio NeilBrown
2025-05-09 0:46 ` [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 0:46 ` [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh() NeilBrown
` (3 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
The nfsd_localio_operations structure contains nfsd_file_get() to get a
reference to an nfsd_file. This is only used in one place, where
nfsd_open_local_fh() is also used.
This patch combines the two, calling nfsd_open_local_fh() passing a
pointer to where the nfsd_file pointer might be stored. If there is a
pointer there an nfsd_file_get() can get a reference, that reference is
returned. If not a new nfsd_file is acquired, stored at the pointer,
and returned. When we store a reference we also increase the refcount
on the net, as that refcount is decrements when we clear the stored
pointer.
We now get an extra reference *before* storing the new nfsd_file at the
given location. This avoids possible races with the nfsd_file being
freed before the final reference can be taken.
This patch moves the rcu_dereference() needed after fetching from
ro_file or rw_file into the nfsd code where the 'struct nfs_file' is
fully defined. This avoids an error reported by older versions of gcc
such as gcc-8 which complain about rcu_dereference() use in contexts
where the structure (which will supposedly be accessed) is not fully
defined.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 31 ++++--------------
fs/nfs_common/nfslocalio.c | 3 +-
fs/nfsd/localio.c | 65 +++++++++++++++++++++++++++-----------
include/linux/nfslocalio.h | 6 ++--
4 files changed, 57 insertions(+), 48 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 27d5fff9747b..030a54c8c9d8 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -207,11 +207,6 @@ void nfs_local_probe_async(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
-static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
-{
- return nfs_to->nfsd_file_get_local(nf);
-}
-
static inline void nfs_local_file_put(struct nfsd_file *nf)
{
nfs_to_nfsd_file_put_local(nf);
@@ -226,12 +221,13 @@ static inline void nfs_local_file_put(struct nfsd_file *nf)
static struct nfsd_file *
__nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, struct nfs_file_localio *nfl,
+ struct nfsd_file __rcu **pnf,
const fmode_t mode)
{
struct nfsd_file *localio;
localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
- cred, fh, nfl, mode);
+ cred, fh, nfl, pnf, mode);
if (IS_ERR(localio)) {
int status = PTR_ERR(localio);
trace_nfs_local_open_fh(fh, mode, status);
@@ -258,7 +254,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, struct nfs_file_localio *nfl,
const fmode_t mode)
{
- struct nfsd_file *nf, *new, __rcu **pnf;
+ struct nfsd_file *nf, __rcu **pnf;
if (!nfs_server_is_local(clp))
return NULL;
@@ -270,24 +266,9 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
else
pnf = &nfl->ro_file;
- new = NULL;
- rcu_read_lock();
- nf = rcu_dereference(*pnf);
- if (!nf) {
- rcu_read_unlock();
- new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
- if (IS_ERR(new))
- return NULL;
- /* try to swap in the pointer */
- nf = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(new)));
- if (!nf)
- swap(nf, new);
- rcu_read_lock();
- }
- nf = nfs_local_file_get(nf);
- rcu_read_unlock();
- if (new)
- nfs_to_nfsd_file_put_local(new);
+ nf = __nfs_local_open_fh(clp, cred, fh, nfl, pnf, mode);
+ if (IS_ERR(nf))
+ return NULL;
return nf;
}
EXPORT_SYMBOL_GPL(nfs_local_open_fh);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f6821b2c87a2..503f85f64b76 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -237,6 +237,7 @@ static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
const struct nfs_fh *nfs_fh, struct nfs_file_localio *nfl,
+ struct nfsd_file __rcu **pnf,
const fmode_t fmode)
{
struct net *net;
@@ -261,7 +262,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
rcu_read_unlock();
/* We have an implied reference to net thanks to nfsd_net_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
- cred, nfs_fh, fmode);
+ cred, nfs_fh, pnf, fmode);
nfs_to_nfsd_net_put(net);
if (!IS_ERR(localio))
nfs_uuid_add_file(uuid, nfl);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 2c0afd1067ea..80d9ff6608a7 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -24,20 +24,6 @@
#include "filecache.h"
#include "cache.h"
-static const struct nfsd_localio_operations nfsd_localio_ops = {
- .nfsd_net_try_get = nfsd_net_try_get,
- .nfsd_net_put = nfsd_net_put,
- .nfsd_open_local_fh = nfsd_open_local_fh,
- .nfsd_file_put_local = nfsd_file_put_local,
- .nfsd_file_get_local = nfsd_file_get_local,
- .nfsd_file_file = nfsd_file_file,
-};
-
-void nfsd_localio_ops_init(void)
-{
- nfs_to = &nfsd_localio_ops;
-}
-
/**
* nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
*
@@ -46,6 +32,7 @@ void nfsd_localio_ops_init(void)
* @rpc_clnt: rpc_clnt that the client established
* @cred: cred that the client established
* @nfs_fh: filehandle to lookup
+ * @nfp: place to find the nfsd_file, or store it if it was non-NULL
* @fmode: fmode_t to use for open
*
* This function maps a local fh to a path on a local filesystem.
@@ -56,10 +43,11 @@ void nfsd_localio_ops_init(void)
* set. Caller (NFS client) is responsible for calling nfsd_net_put and
* nfsd_file_put (via nfs_to_nfsd_file_put_local).
*/
-struct nfsd_file *
+static struct nfsd_file *
nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode)
+ const struct nfs_fh *nfs_fh, struct nfsd_file __rcu **pnf,
+ const fmode_t fmode)
{
int mayflags = NFSD_MAY_LOCALIO;
struct svc_cred rq_cred;
@@ -73,6 +61,12 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (!nfsd_net_try_get(net))
return ERR_PTR(-ENXIO);
+ rcu_read_lock();
+ localio = nfsd_file_get(rcu_dereference(*pnf));
+ rcu_read_unlock();
+ if (localio)
+ return localio;
+
/* nfs_fh -> svc_fh */
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
@@ -94,12 +88,47 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (rq_cred.cr_group_info)
put_group_info(rq_cred.cr_group_info);
- if (IS_ERR(localio))
+ if (!IS_ERR(localio)) {
+ struct nfsd_file *new;
+ if (!nfsd_net_try_get(net)) {
+ nfsd_file_put(localio);
+ nfsd_net_put(net);
+ return ERR_PTR(-ENXIO);
+ }
+ nfsd_file_get(localio);
+ again:
+ new = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(localio)));
+ if (new) {
+ /* Some other thread installed an nfsd_file */
+ if (nfsd_file_get(new) == NULL)
+ goto again;
+ /*
+ * Drop the ref we were going to install and the
+ * one we were going to return.
+ */
+ nfsd_file_put(localio);
+ nfsd_file_put(localio);
+ localio = new;
+ }
+ } else
nfsd_net_put(net);
return localio;
}
-EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+static const struct nfsd_localio_operations nfsd_localio_ops = {
+ .nfsd_net_try_get = nfsd_net_try_get,
+ .nfsd_net_put = nfsd_net_put,
+ .nfsd_open_local_fh = nfsd_open_local_fh,
+ .nfsd_file_put_local = nfsd_file_put_local,
+ .nfsd_file_get_local = nfsd_file_get_local,
+ .nfsd_file_file = nfsd_file_file,
+};
+
+void nfsd_localio_ops_init(void)
+{
+ nfs_to = &nfsd_localio_ops;
+}
/*
* UUID_IS_LOCAL XDR functions
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c3f34bae60e1..e6cd6ec447f5 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -50,10 +50,6 @@ void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
spinlock_t *nn_local_clients_lock);
/* localio needs to map filehandle -> struct nfsd_file */
-extern struct nfsd_file *
-nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
- const struct cred *, const struct nfs_fh *,
- const fmode_t) __must_hold(rcu);
void nfs_close_local_fh(struct nfs_file_localio *);
struct nfsd_localio_operations {
@@ -64,6 +60,7 @@ struct nfsd_localio_operations {
struct rpc_clnt *,
const struct cred *,
const struct nfs_fh *,
+ struct nfsd_file __rcu **pnf,
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file *);
struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
@@ -76,6 +73,7 @@ extern const struct nfsd_localio_operations *nfs_to;
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, struct nfs_file_localio *,
+ struct nfsd_file __rcu **pnf,
const fmode_t);
static inline void nfs_to_nfsd_net_put(struct net *net)
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh()
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
` (2 preceding siblings ...)
2025-05-09 0:46 ` [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting nfsd_file NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 0:46 ` [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh() NeilBrown
` (2 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
nfs_close_local_fh() is called from two different places for quite
different use case.
It is called from nfs_uuid_put() when the nfs_uuid is being detached -
possibly because the nfs server is not longer serving that filesystem.
In this case there will always be an nfs_uuid and so rcu_read_lock() is
not needed.
It is also called when the nfs_file_localio is no longer needed. In
this case there may not be an active nfs_uuid.
These two can race, and handling the race properly while avoiding
excessive locking will require different handling on each side.
This patch prepares the way by opencoding nfs_close_local_fh() into
nfs_uuid_put(), then simplifying the code there as befits the context.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs_common/nfslocalio.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 503f85f64b76..49c59f0c78c6 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -171,7 +171,26 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* Walk list of files and ensure their last references dropped */
list_for_each_entry_safe(nfl, tmp, &local_files, list) {
- nfs_close_local_fh(nfl);
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
+
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+
+ spin_lock(&nfs_uuid->lock);
+ /* Remove nfl from nfs_uuid->files list */
+ list_del_init(&nfl->list);
+ spin_unlock(&nfs_uuid->lock);
+ /* Now we can allow racing nfs_close_local_fh() to
+ * skip the locking.
+ */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
+
cond_resched();
}
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
` (3 preceding siblings ...)
2025-05-09 0:46 ` [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh() NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
nfs_uuid_put() and nfs_close_local_fh() can race if a "struct
nfs_file_localio" is released at the same time that nfsd calls
nfs_localio_invalidate_clients().
It is important that neither of these functions completes after the
other has started looking at a given nfs_file_localio and before it
finishes.
If nfs_uuid_put() exits while nfs_close_local_fh() is closing ro_file
and rw_file it could return to __nfd_file_cache_purge() while some files
are still referenced so the purge may not succeed.
If nfs_close_local_fh() exits while nfsd_uuid_put() is still closing the
files then the "struct nfs_file_localio" could be freed while
nfsd_uuid_put() is still looking at it. This side is currently handled
by copying the pointers out of ro_file and rw_file before deleting from
the list in nfsd_uuid. We need to preserve this while ensuring that
nfsd_uuid_put() does wait for nfs_close_local_fh().
This patch use nfl->uuid and nfl->list to provide the required
interlock.
nfs_uuid_put() removes the nfs_file_localio from the list, then drops
locks and puts the two files, then reclaims the spinlock and sets
->nfs_uuid to NULL.
nfs_close_local_fh() operates in the reverse order, setting ->nfs_uuid
to NULL, then closing the files, then unlinking from the list.
If nfs_uuid_put() finds that ->nfs_uuid is already NULL, it waits for
the nfs_file_localio to be removed from the list. If
nfs_close_local_fh() find that it has already been unlinked it waits for
->nfs_uuid to become NULL. This ensure that one of the two tries to
close the files, but they each waits for the other.
As nfs_uuid_put() is making the list empty, change from a
list_for_each_safe loop to a while that always takes the first entry.
This makes the intent more clear.
Also don't move the list to a temporary local list as this would defeat
the guarantees required for the interlock.
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs_common/nfslocalio.c | 81 ++++++++++++++++++++++++++------------
1 file changed, 56 insertions(+), 25 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 49c59f0c78c6..1dd5a8cca064 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -151,8 +151,7 @@ EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
*/
static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
{
- LIST_HEAD(local_files);
- struct nfs_file_localio *nfl, *tmp;
+ struct nfs_file_localio *nfl;
spin_lock(&nfs_uuid->lock);
if (unlikely(!rcu_access_pointer(nfs_uuid->net))) {
@@ -166,37 +165,49 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
nfs_uuid->dom = NULL;
}
- list_splice_init(&nfs_uuid->files, &local_files);
- spin_unlock(&nfs_uuid->lock);
-
/* Walk list of files and ensure their last references dropped */
- list_for_each_entry_safe(nfl, tmp, &local_files, list) {
+
+ while ((nfl = list_first_entry_or_null(&nfs_uuid->files,
+ struct nfs_file_localio,
+ list)) != NULL) {
struct nfsd_file *ro_nf;
struct nfsd_file *rw_nf;
+ /* If nfs_uuid is already NULL, nfs_close_local_fh is
+ * closing and we must wait, else we unlink and close.
+ */
+ if (rcu_access_pointer(nfl->nfs_uuid) == NULL) {
+ /* nfs_close_local_fh() is doing the
+ * close and we must wait. until it unlinks
+ */
+ wait_var_event_spinlock(nfl,
+ list_first_entry_or_null(
+ &nfs_uuid->files,
+ struct nfs_file_localio,
+ list) != nfl,
+ &nfs_uuid->lock);
+ continue;
+ }
+
ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
- spin_lock(&nfs_uuid->lock);
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
- /* Now we can allow racing nfs_close_local_fh() to
- * skip the locking.
- */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
-
if (ro_nf)
nfs_to_nfsd_file_put_local(ro_nf);
if (rw_nf)
nfs_to_nfsd_file_put_local(rw_nf);
-
cond_resched();
+ spin_lock(&nfs_uuid->lock);
+ /* Now we can allow racing nfs_close_local_fh() to
+ * skip the locking.
+ */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+ wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
}
- spin_lock(&nfs_uuid->lock);
- BUG_ON(!list_empty(&nfs_uuid->files));
-
/* Remove client from nn->local_clients */
if (nfs_uuid->list_lock) {
spin_lock(nfs_uuid->list_lock);
@@ -304,23 +315,43 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
return;
}
- ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
- rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
-
spin_lock(&nfs_uuid->lock);
- /* Remove nfl from nfs_uuid->files list */
- list_del_init(&nfl->list);
+ if (!rcu_access_pointer(nfl->nfs_uuid)) {
+ /* nfs_uuid_put has finished here */
+ spin_unlock(&nfs_uuid->lock);
+ rcu_read_unlock();
+ return;
+ }
+ if (list_empty(&nfs_uuid->files)) {
+ /* nfs_uuid_put() has started closing files, wait for it
+ * to finished
+ */
+ spin_unlock(&nfs_uuid->lock);
+ rcu_read_unlock();
+ wait_var_event(&nfl->nfs_uuid,
+ rcu_access_pointer(nfl->nfs_uuid) == NULL);
+ return;
+ }
+ /* tell nfs_uuid_put() to wait for us */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
- /* Now we can allow racing nfs_close_local_fh() to
- * skip the locking.
- */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
if (ro_nf)
nfs_to_nfsd_file_put_local(ro_nf);
if (rw_nf)
nfs_to_nfsd_file_put_local(rw_nf);
+
+ /* Remove nfl from nfs_uuid->files list and signal nfs_uuid_put()
+ * that we are done. The moment we drop the spinlock the
+ * nfs_uuid could be freed.
+ */
+ spin_lock(&nfs_uuid->lock);
+ list_del_init(&nfl->list);
+ wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
+ spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
` (4 preceding siblings ...)
2025-05-09 0:46 ` [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh() NeilBrown
@ 2025-05-09 0:46 ` NeilBrown
2025-05-09 11:03 ` kernel test robot
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
6 siblings, 2 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-09 0:46 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol
Cc: Chuck Lever, Jeff Layton, linux-nfs
Instead of calling xchg() and unrcu_pointer() before
nfsd_file_put_local(), we now pass pointer to the __rcu pointer and call
xchg() and unrcu_pointer() inside that function.
Where unrcu_pointer() is currently called the internals of "struct
nfsd_file" are not known and that causes older compilers such as gcc-8
to complain.
In some cases we have a __kernel (aka normal) pointer not an __rcu
pointer so we need to cast it to __rcu first. This is strictly a
weakening so no information is lost. Somewhat surprisingly, this cast
is accepted by gcc-8.
This has the pleasing result that the cmpxchg() which sets ro_file and
rw_file, and also the xchg() which clears them, are both now in the nfsd
code.
Reported-by: Pali Rohár <pali@kernel.org>
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Fixes: 86e00412254a ("nfs: cache all open LOCALIO nfsd_file(s) in client")
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 11 +++++++++--
fs/nfs_common/nfslocalio.c | 24 ++++++------------------
fs/nfsd/filecache.c | 11 ++++++++---
fs/nfsd/filecache.h | 2 +-
include/linux/nfslocalio.h | 17 ++++++++++-------
5 files changed, 34 insertions(+), 31 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 030a54c8c9d8..e6d36b3d3fc0 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -207,9 +207,16 @@ void nfs_local_probe_async(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
-static inline void nfs_local_file_put(struct nfsd_file *nf)
+static inline void nfs_local_file_put(struct nfsd_file *localio)
{
- nfs_to_nfsd_file_put_local(nf);
+ /* nfs_to_nfsd_file_put_local() expects an __rcu pointer
+ * but we have a __kernel pointer. It is always safe
+ * to cast a __kernel pointer to an __rcu pointer
+ * because the cast only weakens what is known about the pointer.
+ */
+ struct nfsd_file __rcu *nf = (struct nfsd_file __rcu*) localio;
+
+ nfs_to_nfsd_file_put_local(&nf);
}
/*
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 1dd5a8cca064..05c7c16e37ab 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -170,9 +170,6 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
while ((nfl = list_first_entry_or_null(&nfs_uuid->files,
struct nfs_file_localio,
list)) != NULL) {
- struct nfsd_file *ro_nf;
- struct nfsd_file *rw_nf;
-
/* If nfs_uuid is already NULL, nfs_close_local_fh is
* closing and we must wait, else we unlink and close.
*/
@@ -189,17 +186,14 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
continue;
}
- ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
- rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
-
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
- if (ro_nf)
- nfs_to_nfsd_file_put_local(ro_nf);
- if (rw_nf)
- nfs_to_nfsd_file_put_local(rw_nf);
+
+ nfs_to_nfsd_file_put_local(&nfl->ro_file);
+ nfs_to_nfsd_file_put_local(&nfl->rw_file);
cond_resched();
+
spin_lock(&nfs_uuid->lock);
/* Now we can allow racing nfs_close_local_fh() to
* skip the locking.
@@ -303,8 +297,6 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
void nfs_close_local_fh(struct nfs_file_localio *nfl)
{
- struct nfsd_file *ro_nf;
- struct nfsd_file *rw_nf;
nfs_uuid_t *nfs_uuid;
rcu_read_lock();
@@ -337,12 +329,8 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
- ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
- rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
- if (ro_nf)
- nfs_to_nfsd_file_put_local(ro_nf);
- if (rw_nf)
- nfs_to_nfsd_file_put_local(rw_nf);
+ nfs_to_nfsd_file_put_local(&nfl->ro_file);
+ nfs_to_nfsd_file_put_local(&nfl->rw_file);
/* Remove nfl from nfs_uuid->files list and signal nfs_uuid_put()
* that we are done. The moment we drop the spinlock the
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index eedf2af8ee6e..e108b6c705b4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -378,11 +378,16 @@ nfsd_file_put(struct nfsd_file *nf)
* the reference of the nfsd_file.
*/
struct net *
-nfsd_file_put_local(struct nfsd_file *nf)
+nfsd_file_put_local(struct nfsd_file __rcu **pnf)
{
- struct net *net = nf->nf_net;
+ struct nfsd_file *nf;
+ struct net *net = NULL;
- nfsd_file_put(nf);
+ nf = unrcu_pointer(xchg(pnf, NULL));
+ if (nf) {
+ net = nf->nf_net;
+ nfsd_file_put(nf);
+ }
return net;
}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index cd02f91aaef1..722b26c71e45 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -62,7 +62,7 @@ void nfsd_file_cache_shutdown(void);
int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
-struct net *nfsd_file_put_local(struct nfsd_file *nf);
+struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
struct nfsd_file *nfsd_file_get_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index e6cd6ec447f5..5c7c92659e73 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -62,7 +62,7 @@ struct nfsd_localio_operations {
const struct nfs_fh *,
struct nfsd_file __rcu **pnf,
const fmode_t);
- struct net *(*nfsd_file_put_local)(struct nfsd_file *);
+ struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
@@ -88,16 +88,19 @@ static inline void nfs_to_nfsd_net_put(struct net *net)
rcu_read_unlock();
}
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
+static inline void nfs_to_nfsd_file_put_local(struct nfsd_file __rcu **localio)
{
/*
- * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
- * "Voluntary context switch within RCU read-side critical section!"
- * by scheduling deep in underlying filesystem (e.g. XFS).
+ * Either *localio must be guaranteed to be non-NULL, or caller
+ * must prevent nfsd shutdown from completing as nfs_close_local_fh()
+ * does by blocking the nfs_uuid from being finally put.
*/
- struct net *net = nfs_to->nfsd_file_put_local(localio);
+ struct net *net;
- nfs_to_nfsd_net_put(net);
+ net = nfs_to->nfsd_file_put_local(localio);
+
+ if (net)
+ nfs_to_nfsd_net_put(net);
}
#else /* CONFIG_NFS_LOCALIO */
--
2.49.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer
2025-05-09 0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
@ 2025-05-09 11:03 ` kernel test robot
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
1 sibling, 0 replies; 56+ messages in thread
From: kernel test robot @ 2025-05-09 11:03 UTC (permalink / raw)
To: NeilBrown, Trond Myklebust, Anna Schumaker, Mike Snitzer,
Pali Rohár, Vincent Mailhol
Cc: oe-kbuild-all, Chuck Lever, Jeff Layton, linux-nfs
Hi NeilBrown,
kernel test robot noticed the following build warnings:
[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on linus/master v6.15-rc5]
[cannot apply to trondmy-nfs/linux-next next-20250508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/nfs_localio-use-cmpxchg-to-install-new-nfs_file_localio/20250509-085046
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250509004852.3272120-7-neil%40brown.name
patch subject: [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer
config: i386-buildonly-randconfig-002-20250509 (https://download.01.org/0day-ci/archive/20250509/202505091849.42eZCL5e-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250509/202505091849.42eZCL5e-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505091849.42eZCL5e-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> fs/nfsd/filecache.c:382: warning: Function parameter or struct member 'pnf' not described in 'nfsd_file_put_local'
>> fs/nfsd/filecache.c:382: warning: Excess function parameter 'nf' description in 'nfsd_file_put_local'
vim +382 fs/nfsd/filecache.c
65294c1f2c5e72 Jeff Layton 2019-08-18 372
a61e147e6be6e7 Mike Snitzer 2024-09-05 373 /**
b33f7dec3a6721 Mike Snitzer 2024-11-15 374 * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
c840b8e1f039e9 Mike Snitzer 2024-11-13 375 * @nf: nfsd_file of which to put the reference
a61e147e6be6e7 Mike Snitzer 2024-09-05 376 *
c840b8e1f039e9 Mike Snitzer 2024-11-13 377 * First save the associated net to return to caller, then put
c840b8e1f039e9 Mike Snitzer 2024-11-13 378 * the reference of the nfsd_file.
a61e147e6be6e7 Mike Snitzer 2024-09-05 379 */
c840b8e1f039e9 Mike Snitzer 2024-11-13 380 struct net *
100c0943040501 NeilBrown 2025-05-09 381 nfsd_file_put_local(struct nfsd_file __rcu **pnf)
a61e147e6be6e7 Mike Snitzer 2024-09-05 @382 {
100c0943040501 NeilBrown 2025-05-09 383 struct nfsd_file *nf;
100c0943040501 NeilBrown 2025-05-09 384 struct net *net = NULL;
a61e147e6be6e7 Mike Snitzer 2024-09-05 385
100c0943040501 NeilBrown 2025-05-09 386 nf = unrcu_pointer(xchg(pnf, NULL));
100c0943040501 NeilBrown 2025-05-09 387 if (nf) {
100c0943040501 NeilBrown 2025-05-09 388 net = nf->nf_net;
a61e147e6be6e7 Mike Snitzer 2024-09-05 389 nfsd_file_put(nf);
100c0943040501 NeilBrown 2025-05-09 390 }
c840b8e1f039e9 Mike Snitzer 2024-11-13 391 return net;
a61e147e6be6e7 Mike Snitzer 2024-09-05 392 }
a61e147e6be6e7 Mike Snitzer 2024-09-05 393
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
` (5 preceding siblings ...)
2025-05-09 0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
@ 2025-05-09 16:01 ` Chuck Lever
2025-05-09 21:02 ` Mike Snitzer
2025-05-10 3:01 ` NeilBrown
6 siblings, 2 replies; 56+ messages in thread
From: Chuck Lever @ 2025-05-09 16:01 UTC (permalink / raw)
To: NeilBrown, Trond Myklebust, Anna Schumaker, Mike Snitzer,
Pali Rohár, Vincent Mailhol
Cc: Jeff Layton, linux-nfs, paulmck
[ adding Paul McK ]
On 5/8/25 8:46 PM, NeilBrown wrote:
> This is a revised version a the earlier series. I've actually tested
> this time and fixed a few issues including the one that Mike found.
As Mike mentioned in a previous thread, at this point, any fix for this
issue will need to be applied to recent stable kernels as well. This
series looks a bit too complicated for that.
I expect that other subsystems will encounter this issue eventually,
so it would be beneficial to address the root cause. For that purpose, I
think I like Vincent's proposal the best:
https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
None of this is to say that Neil's patches shouldn't be applied. But
perhaps these are not a broad solution to the RCU compilation issue.
> Original description:
> Following the reports of older compilers complaining about the rcu
> annotations in nfs-localio I reviewed the relevant code and found some
> races and opportunities for simplification.
>
> Thanks,
> NeilBrown
>
> [PATCH 1/6] nfs_localio: use cmpxchg() to install new
> [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref
> [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting
> [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh()
> [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and
> [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a
--
Chuck Lever
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
@ 2025-05-09 21:02 ` Mike Snitzer
2025-05-10 0:16 ` Paul E. McKenney
2025-05-10 3:01 ` NeilBrown
1 sibling, 1 reply; 56+ messages in thread
From: Mike Snitzer @ 2025-05-09 21:02 UTC (permalink / raw)
To: Chuck Lever
Cc: NeilBrown, Trond Myklebust, Anna Schumaker, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On Fri, May 09, 2025 at 12:01:19PM -0400, Chuck Lever wrote:
> [ adding Paul McK ]
>
> On 5/8/25 8:46 PM, NeilBrown wrote:
> > This is a revised version a the earlier series. I've actually tested
> > this time and fixed a few issues including the one that Mike found.
>
> As Mike mentioned in a previous thread, at this point, any fix for this
> issue will need to be applied to recent stable kernels as well. This
> series looks a bit too complicated for that.
>
> I expect that other subsystems will encounter this issue eventually,
> so it would be beneficial to address the root cause. For that purpose, I
> think I like Vincent's proposal the best:
>
> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
>
> None of this is to say that Neil's patches shouldn't be applied. But
> perhaps these are not a broad solution to the RCU compilation issue.
I agree with your suggested approach. Hopefully Paul agrees and
Vincent can polish a patch for near-term inclusion.
It'll be at least a week before I can put adequate time to reviewing
and testing Neil's patchset.
Thanks,
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-09 21:02 ` Mike Snitzer
@ 2025-05-10 0:16 ` Paul E. McKenney
2025-05-10 2:44 ` NeilBrown
0 siblings, 1 reply; 56+ messages in thread
From: Paul E. McKenney @ 2025-05-10 0:16 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever, NeilBrown, Trond Myklebust, Anna Schumaker,
Pali Rohár, Vincent Mailhol, Jeff Layton, linux-nfs
On Fri, May 09, 2025 at 05:02:29PM -0400, Mike Snitzer wrote:
> On Fri, May 09, 2025 at 12:01:19PM -0400, Chuck Lever wrote:
> > [ adding Paul McK ]
> >
> > On 5/8/25 8:46 PM, NeilBrown wrote:
> > > This is a revised version a the earlier series. I've actually tested
> > > this time and fixed a few issues including the one that Mike found.
> >
> > As Mike mentioned in a previous thread, at this point, any fix for this
> > issue will need to be applied to recent stable kernels as well. This
> > series looks a bit too complicated for that.
> >
> > I expect that other subsystems will encounter this issue eventually,
> > so it would be beneficial to address the root cause. For that purpose, I
> > think I like Vincent's proposal the best:
> >
> > https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
> >
> > None of this is to say that Neil's patches shouldn't be applied. But
> > perhaps these are not a broad solution to the RCU compilation issue.
>
> I agree with your suggested approach. Hopefully Paul agrees and
> Vincent can polish a patch for near-term inclusion.
The main issue that I see with Vincent's patch is that we need the
asterisks ("*") to remain in order to mesh with the definition of __rcu
and to work correctly with the "sparse" tool. The __rcu address space
applies to the pointed-to data, not the pointer. So removing the
asterisks from __rcu_access_pointer() and friends would require the
various uses of __rcu to move to the other side of the asterisk,
right? For example, this:
struct pci_dev __rcu *pdev;
would need to become this:
struct pci_dev *__rcu pdev
And the same for the more than 1,000 other uses of __rcu.
Or am I missing something here? In particular, did it turn out that
sparse and the other tools were all happy with this change?
Thanx, Paul
> It'll be at least a week before I can put adequate time to reviewing
> and testing Neil's patchset.
>
> Thanks,
> Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-10 0:16 ` Paul E. McKenney
@ 2025-05-10 2:44 ` NeilBrown
0 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-10 2:44 UTC (permalink / raw)
To: paulmck
Cc: Mike Snitzer, Chuck Lever, Trond Myklebust, Anna Schumaker,
Pali Rohár, Vincent Mailhol, Jeff Layton, linux-nfs
On Sat, 10 May 2025, paulmck@kernel.org wrote:
> On Fri, May 09, 2025 at 05:02:29PM -0400, Mike Snitzer wrote:
> > On Fri, May 09, 2025 at 12:01:19PM -0400, Chuck Lever wrote:
> > > [ adding Paul McK ]
> > >
> > > On 5/8/25 8:46 PM, NeilBrown wrote:
> > > > This is a revised version a the earlier series. I've actually tested
> > > > this time and fixed a few issues including the one that Mike found.
> > >
> > > As Mike mentioned in a previous thread, at this point, any fix for this
> > > issue will need to be applied to recent stable kernels as well. This
> > > series looks a bit too complicated for that.
> > >
> > > I expect that other subsystems will encounter this issue eventually,
> > > so it would be beneficial to address the root cause. For that purpose, I
> > > think I like Vincent's proposal the best:
> > >
> > > https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
> > >
> > > None of this is to say that Neil's patches shouldn't be applied. But
> > > perhaps these are not a broad solution to the RCU compilation issue.
> >
> > I agree with your suggested approach. Hopefully Paul agrees and
> > Vincent can polish a patch for near-term inclusion.
>
> The main issue that I see with Vincent's patch is that we need the
> asterisks ("*") to remain in order to mesh with the definition of __rcu
> and to work correctly with the "sparse" tool. The __rcu address space
> applies to the pointed-to data, not the pointer. So removing the
> asterisks from __rcu_access_pointer() and friends would require the
> various uses of __rcu to move to the other side of the asterisk,
> right? For example, this:
>
> struct pci_dev __rcu *pdev;
>
> would need to become this:
>
> struct pci_dev *__rcu pdev
>
> And the same for the more than 1,000 other uses of __rcu.
>
> Or am I missing something here? In particular, did it turn out that
> sparse and the other tools were all happy with this change?
Sparse isn't happy with the change. I think it appears to sparse as:
__force __kernel struct foo __rcu *
rather than
struct foo __rcu __force __kernel *
and the last one wins - so sparse sees __kernel last in the upstream
code but sees __rcu last with the patch applied.
Certainly I see messages like:
../include/net/netns/generic.h:46:12: warning: incorrect type in assignment (different address spaces)
../include/net/netns/generic.h:46:12: expected struct net_generic *ng
../include/net/netns/generic.h:46:12: got struct net_generic [noderef] __rcu *
so the __rcu it still seen.
NeilBrown
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
2025-05-09 21:02 ` Mike Snitzer
@ 2025-05-10 3:01 ` NeilBrown
2025-05-10 16:02 ` Chuck Lever
1 sibling, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-05-10 3:01 UTC (permalink / raw)
To: Chuck Lever
Cc: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On Sat, 10 May 2025, Chuck Lever wrote:
> [ adding Paul McK ]
>
> On 5/8/25 8:46 PM, NeilBrown wrote:
> > This is a revised version a the earlier series. I've actually tested
> > this time and fixed a few issues including the one that Mike found.
>
> As Mike mentioned in a previous thread, at this point, any fix for this
> issue will need to be applied to recent stable kernels as well. This
> series looks a bit too complicated for that.
>
> I expect that other subsystems will encounter this issue eventually,
> so it would be beneficial to address the root cause. For that purpose, I
> think I like Vincent's proposal the best:
>
> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
>
> None of this is to say that Neil's patches shouldn't be applied. But
> perhaps these are not a broad solution to the RCU compilation issue.
Do we need a "broad solution to the RCU compilation issue"?
Does it ever make sense to "dereference" a pointer to a structure that is
not fully specified? What does that even mean?
I find it harder to argue against use of rcu_access_pointer() in that
context, at least for test-against-NULL, but sparse doesn't complain
about a bare test of an __rcu pointer against NULL, so maybe there is no
need for rcu_access_pointer() for simple tests - in which case the
documentation should be updated.
(of course rcu_dereference() doesn't actually dereference the pointer,
despite its name. It just declared that there is an imminent intention
to dereference the pointer.....)
NeilBrown
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-10 3:01 ` NeilBrown
@ 2025-05-10 16:02 ` Chuck Lever
2025-05-10 19:57 ` Mike Snitzer
0 siblings, 1 reply; 56+ messages in thread
From: Chuck Lever @ 2025-05-10 16:02 UTC (permalink / raw)
To: NeilBrown
Cc: Trond Myklebust, Anna Schumaker, Mike Snitzer, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On 5/9/25 11:01 PM, NeilBrown wrote:
> On Sat, 10 May 2025, Chuck Lever wrote:
>> [ adding Paul McK ]
>>
>> On 5/8/25 8:46 PM, NeilBrown wrote:
>>> This is a revised version a the earlier series. I've actually tested
>>> this time and fixed a few issues including the one that Mike found.
>>
>> As Mike mentioned in a previous thread, at this point, any fix for this
>> issue will need to be applied to recent stable kernels as well. This
>> series looks a bit too complicated for that.
>>
>> I expect that other subsystems will encounter this issue eventually,
>> so it would be beneficial to address the root cause. For that purpose, I
>> think I like Vincent's proposal the best:
>>
>> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
>>
>> None of this is to say that Neil's patches shouldn't be applied. But
>> perhaps these are not a broad solution to the RCU compilation issue.
>
> Do we need a "broad solution to the RCU compilation issue"?
Fair question. If the current localio code is simply incorrect as it
stands, then I suppose the answer is no. Because gcc is happy to compile
it in most cases, I thought the problem was with older versions of gcc,
not with localio (even though, I agree, the use of an incomplete
structure definition is somewhat brittle when used with RCU).
> Does it ever make sense to "dereference" a pointer to a structure that is
> not fully specified? What does that even mean?
>
> I find it harder to argue against use of rcu_access_pointer() in that
> context, at least for test-against-NULL, but sparse doesn't complain
> about a bare test of an __rcu pointer against NULL, so maybe there is no
> need for rcu_access_pointer() for simple tests - in which case the
> documentation should be updated.
For backporting purposes, inventing our own local RCU helper to handle
the situation might be best. Then going forward, apply your patches to
rectify the use of the incomplete structure definition, and the local
helper can eventually be removed.
My interest is getting to a narrow set of changes that can be applied
now and backported as needed. The broader clean-ups can then be applied
to future kernels (or as subsequent patches in the same merge window).
My 2 cents, worth every penny.
> (of course rcu_dereference() doesn't actually dereference the pointer,
> despite its name. It just declared that there is an imminent intention
> to dereference the pointer.....)
>
> NeilBrown
--
Chuck Lever
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-10 16:02 ` Chuck Lever
@ 2025-05-10 19:57 ` Mike Snitzer
2025-05-16 15:33 ` Chuck Lever
2025-05-19 3:49 ` NeilBrown
0 siblings, 2 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-05-10 19:57 UTC (permalink / raw)
To: Chuck Lever
Cc: NeilBrown, Trond Myklebust, Anna Schumaker, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On Sat, May 10, 2025 at 12:02:27PM -0400, Chuck Lever wrote:
> On 5/9/25 11:01 PM, NeilBrown wrote:
> > On Sat, 10 May 2025, Chuck Lever wrote:
> >> [ adding Paul McK ]
> >>
> >> On 5/8/25 8:46 PM, NeilBrown wrote:
> >>> This is a revised version a the earlier series. I've actually tested
> >>> this time and fixed a few issues including the one that Mike found.
> >>
> >> As Mike mentioned in a previous thread, at this point, any fix for this
> >> issue will need to be applied to recent stable kernels as well. This
> >> series looks a bit too complicated for that.
> >>
> >> I expect that other subsystems will encounter this issue eventually,
> >> so it would be beneficial to address the root cause. For that purpose, I
> >> think I like Vincent's proposal the best:
> >>
> >> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
> >>
> >> None of this is to say that Neil's patches shouldn't be applied. But
> >> perhaps these are not a broad solution to the RCU compilation issue.
> >
> > Do we need a "broad solution to the RCU compilation issue"?
>
> Fair question. If the current localio code is simply incorrect as it
> stands, then I suppose the answer is no. Because gcc is happy to compile
> it in most cases, I thought the problem was with older versions of gcc,
> not with localio (even though, I agree, the use of an incomplete
> structure definition is somewhat brittle when used with RCU).
>
>
> > Does it ever make sense to "dereference" a pointer to a structure that is
> > not fully specified? What does that even mean?
> >
> > I find it harder to argue against use of rcu_access_pointer() in that
> > context, at least for test-against-NULL, but sparse doesn't complain
> > about a bare test of an __rcu pointer against NULL, so maybe there is no
> > need for rcu_access_pointer() for simple tests - in which case the
> > documentation should be updated.
>
> For backporting purposes, inventing our own local RCU helper to handle
> the situation might be best. Then going forward, apply your patches to
> rectify the use of the incomplete structure definition, and the local
> helper can eventually be removed.
>
> My interest is getting to a narrow set of changes that can be applied
> now and backported as needed. The broader clean-ups can then be applied
> to future kernels (or as subsequent patches in the same merge window).
>
> My 2 cents, worth every penny.
I really would prefer we just use this patch as the stop-gap for 6.14
and 6.15 (which I have been carrying for nearly a year now because I
need to support an EL8 platform that uses gcc 8.5):
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=kernel-6.12.24/nfs-testing&id=f9add5e4c9b4629b102876dce9484e1da3e35d1f
Then we work through getting Neil's patchset to land for 6.16 and
revert the stop-gap (dummy nfsd_file) patch.
> > (of course rcu_dereference() doesn't actually dereference the pointer,
> > despite its name. It just declared that there is an imminent intention
> > to dereference the pointer.....)
> >
> > NeilBrown
Rather than do a way more crazy stop-gap like this (which actually works):
fs/nfs/localio.c | 6 +++---
fs/nfs_common/nfslocalio.c | 8 +++----
include/linux/nfslocalio.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 73dd07495440..fedc07254c00 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -272,7 +272,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
new = NULL;
rcu_read_lock();
- nf = rcu_dereference(*pnf);
+ nf = rcu_dereference_opaque(*pnf);
if (!nf) {
rcu_read_unlock();
new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
@@ -281,11 +281,11 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
rcu_read_lock();
/* try to swap in the pointer */
spin_lock(&clp->cl_uuid.lock);
- nf = rcu_dereference_protected(*pnf, 1);
+ nf = rcu_dereference_opaque_protected(*pnf, 1);
if (!nf) {
nf = new;
new = NULL;
- rcu_assign_pointer(*pnf, nf);
+ rcu_assign_opaque_pointer(*pnf, nf);
}
spin_unlock(&clp->cl_uuid.lock);
}
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 6a0bdea6d644..213862ceb8bb 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -285,14 +285,14 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
return;
}
- ro_nf = rcu_access_pointer(nfl->ro_file);
- rw_nf = rcu_access_pointer(nfl->rw_file);
+ ro_nf = rcu_access_opaque(nfl->ro_file);
+ rw_nf = rcu_access_opaque(nfl->rw_file);
if (ro_nf || rw_nf) {
spin_lock(&nfs_uuid->lock);
if (ro_nf)
- ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
+ ro_nf = rcu_dereference_opaque_protected(xchg(&nfl->ro_file, NULL), 1);
if (rw_nf)
- rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
+ rw_nf = rcu_dereference_opaque_protected(xchg(&nfl->rw_file, NULL), 1);
/* Remove nfl from nfs_uuid->files list */
RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9aa8a43843d7..c6e86891d4b5 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -15,6 +15,58 @@
#include <linux/sunrpc/svcauth.h>
#include <linux/nfs.h>
#include <net/net_namespace.h>
+#include <linux/rcupdate.h>
+
+/*
+ * RCU methods to allow fs/nfs_common and fs/nfs LOCALIO code to avoid
+ * dereferencing pointer to 'struct nfs_file' which is opaque outside fs/nfsd
+*/
+#define __rcu_access_opaque_pointer(p, local, space) \
+({ \
+ typeof(p) local = (__force typeof(p))READ_ONCE(p); \
+ rcu_check_sparse(p, space); \
+ (__force __kernel typeof(p))(local); \
+})
+
+#define rcu_access_opaque(p) __rcu_access_opaque_pointer((p), __UNIQUE_ID(rcu), __rcu)
+
+#define __rcu_dereference_opaque_protected(p, local, c, space) \
+({ \
+ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_protected() usage"); \
+ rcu_check_sparse(p, space); \
+ (__force __kernel typeof(p))(p); \
+})
+
+#define rcu_dereference_opaque_protected(p, c) \
+ __rcu_dereference_opaque_protected((p), __UNIQUE_ID(rcu), (c), __rcu)
+
+#define __rcu_dereference_opaque_check(p, local, c, space) \
+({ \
+ /* Dependency order vs. p above. */ \
+ typeof(p) local = (__force typeof(p))READ_ONCE(p); \
+ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_check() usage"); \
+ rcu_check_sparse(p, space); \
+ (__force __kernel typeof(p))(local); \
+})
+
+#define rcu_dereference_opaque_check(p, c) \
+ __rcu_dereference_opaque_check((p), __UNIQUE_ID(rcu), \
+ (c) || rcu_read_lock_held(), __rcu)
+
+#define rcu_dereference_opaque(p) rcu_dereference_opaque_check(p, 0)
+
+#define RCU_INITIALIZER_OPAQUE(v) (typeof((v)) __force __rcu)(v)
+
+#define rcu_assign_opaque_pointer(p, v) \
+do { \
+ uintptr_t _r_a_p__v = (uintptr_t)(v); \
+ rcu_check_sparse(p, __rcu); \
+ \
+ if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
+ WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
+ else \
+ smp_store_release(&p, RCU_INITIALIZER_OPAQUE((typeof(p))_r_a_p__v)); \
+} while (0)
struct nfs_client;
struct nfs_file_localio;
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-10 19:57 ` Mike Snitzer
@ 2025-05-16 15:33 ` Chuck Lever
2025-05-18 10:46 ` Pali Rohár
2025-05-19 3:49 ` NeilBrown
1 sibling, 1 reply; 56+ messages in thread
From: Chuck Lever @ 2025-05-16 15:33 UTC (permalink / raw)
To: Mike Snitzer
Cc: NeilBrown, Trond Myklebust, Anna Schumaker, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On 5/10/25 3:57 PM, Mike Snitzer wrote:
> On Sat, May 10, 2025 at 12:02:27PM -0400, Chuck Lever wrote:
>> On 5/9/25 11:01 PM, NeilBrown wrote:
>>> On Sat, 10 May 2025, Chuck Lever wrote:
>>>> [ adding Paul McK ]
>>>>
>>>> On 5/8/25 8:46 PM, NeilBrown wrote:
>>>>> This is a revised version a the earlier series. I've actually tested
>>>>> this time and fixed a few issues including the one that Mike found.
>>>>
>>>> As Mike mentioned in a previous thread, at this point, any fix for this
>>>> issue will need to be applied to recent stable kernels as well. This
>>>> series looks a bit too complicated for that.
>>>>
>>>> I expect that other subsystems will encounter this issue eventually,
>>>> so it would be beneficial to address the root cause. For that purpose, I
>>>> think I like Vincent's proposal the best:
>>>>
>>>> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
>>>>
>>>> None of this is to say that Neil's patches shouldn't be applied. But
>>>> perhaps these are not a broad solution to the RCU compilation issue.
>>>
>>> Do we need a "broad solution to the RCU compilation issue"?
>>
>> Fair question. If the current localio code is simply incorrect as it
>> stands, then I suppose the answer is no. Because gcc is happy to compile
>> it in most cases, I thought the problem was with older versions of gcc,
>> not with localio (even though, I agree, the use of an incomplete
>> structure definition is somewhat brittle when used with RCU).
>>
>>
>>> Does it ever make sense to "dereference" a pointer to a structure that is
>>> not fully specified? What does that even mean?
>>>
>>> I find it harder to argue against use of rcu_access_pointer() in that
>>> context, at least for test-against-NULL, but sparse doesn't complain
>>> about a bare test of an __rcu pointer against NULL, so maybe there is no
>>> need for rcu_access_pointer() for simple tests - in which case the
>>> documentation should be updated.
>>
>> For backporting purposes, inventing our own local RCU helper to handle
>> the situation might be best. Then going forward, apply your patches to
>> rectify the use of the incomplete structure definition, and the local
>> helper can eventually be removed.
>>
>> My interest is getting to a narrow set of changes that can be applied
>> now and backported as needed. The broader clean-ups can then be applied
>> to future kernels (or as subsequent patches in the same merge window).
>>
>> My 2 cents, worth every penny.
>
> I really would prefer we just use this patch as the stop-gap for 6.14
> and 6.15 (which I have been carrying for nearly a year now because I
> need to support an EL8 platform that uses gcc 8.5):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=kernel-6.12.24/nfs-testing&id=f9add5e4c9b4629b102876dce9484e1da3e35d1f
>
> Then we work through getting Neil's patchset to land for 6.16 and
> revert the stop-gap (dummy nfsd_file) patch.
>
>>> (of course rcu_dereference() doesn't actually dereference the pointer,
>>> despite its name. It just declared that there is an imminent intention
>>> to dereference the pointer.....)
>>>
>>> NeilBrown
>
> Rather than do a way more crazy stop-gap like this (which actually works):
The below looks about like I expected it to. Thanks for coding it up and
trying it!
Agreed that it is more verbose than the original patch, but this seems
more self-documenting to me and looks narrow enough to backport to LTS
kernels or even move these helpers to common headers if they are found
to be more broadly useful. I have only a mild preference for this
solution over the initial one.
One more comment below.
> fs/nfs/localio.c | 6 +++---
> fs/nfs_common/nfslocalio.c | 8 +++----
> include/linux/nfslocalio.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 59 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 73dd07495440..fedc07254c00 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -272,7 +272,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
>
> new = NULL;
> rcu_read_lock();
> - nf = rcu_dereference(*pnf);
> + nf = rcu_dereference_opaque(*pnf);
> if (!nf) {
> rcu_read_unlock();
> new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
> @@ -281,11 +281,11 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> rcu_read_lock();
> /* try to swap in the pointer */
> spin_lock(&clp->cl_uuid.lock);
> - nf = rcu_dereference_protected(*pnf, 1);
> + nf = rcu_dereference_opaque_protected(*pnf, 1);
> if (!nf) {
> nf = new;
> new = NULL;
> - rcu_assign_pointer(*pnf, nf);
> + rcu_assign_opaque_pointer(*pnf, nf);
> }
> spin_unlock(&clp->cl_uuid.lock);
> }
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 6a0bdea6d644..213862ceb8bb 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -285,14 +285,14 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> return;
> }
>
> - ro_nf = rcu_access_pointer(nfl->ro_file);
> - rw_nf = rcu_access_pointer(nfl->rw_file);
> + ro_nf = rcu_access_opaque(nfl->ro_file);
> + rw_nf = rcu_access_opaque(nfl->rw_file);
> if (ro_nf || rw_nf) {
> spin_lock(&nfs_uuid->lock);
> if (ro_nf)
> - ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
> + ro_nf = rcu_dereference_opaque_protected(xchg(&nfl->ro_file, NULL), 1);
> if (rw_nf)
> - rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
> + rw_nf = rcu_dereference_opaque_protected(xchg(&nfl->rw_file, NULL), 1);
I might combine rcu_dereference_opaque_protected and xchg into another
helper, since this rather verbose idiom appears twice.
> /* Remove nfl from nfs_uuid->files list */
> RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 9aa8a43843d7..c6e86891d4b5 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -15,6 +15,58 @@
> #include <linux/sunrpc/svcauth.h>
> #include <linux/nfs.h>
> #include <net/net_namespace.h>
> +#include <linux/rcupdate.h>
> +
> +/*
> + * RCU methods to allow fs/nfs_common and fs/nfs LOCALIO code to avoid
> + * dereferencing pointer to 'struct nfs_file' which is opaque outside fs/nfsd
> +*/
> +#define __rcu_access_opaque_pointer(p, local, space) \
> +({ \
> + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(local); \
> +})
> +
> +#define rcu_access_opaque(p) __rcu_access_opaque_pointer((p), __UNIQUE_ID(rcu), __rcu)
> +
> +#define __rcu_dereference_opaque_protected(p, local, c, space) \
> +({ \
> + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_protected() usage"); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(p); \
> +})
> +
> +#define rcu_dereference_opaque_protected(p, c) \
> + __rcu_dereference_opaque_protected((p), __UNIQUE_ID(rcu), (c), __rcu)
> +
> +#define __rcu_dereference_opaque_check(p, local, c, space) \
> +({ \
> + /* Dependency order vs. p above. */ \
> + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_check() usage"); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(local); \
> +})
> +
> +#define rcu_dereference_opaque_check(p, c) \
> + __rcu_dereference_opaque_check((p), __UNIQUE_ID(rcu), \
> + (c) || rcu_read_lock_held(), __rcu)
> +
> +#define rcu_dereference_opaque(p) rcu_dereference_opaque_check(p, 0)
> +
> +#define RCU_INITIALIZER_OPAQUE(v) (typeof((v)) __force __rcu)(v)
> +
> +#define rcu_assign_opaque_pointer(p, v) \
> +do { \
> + uintptr_t _r_a_p__v = (uintptr_t)(v); \
> + rcu_check_sparse(p, __rcu); \
> + \
> + if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> + WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> + else \
> + smp_store_release(&p, RCU_INITIALIZER_OPAQUE((typeof(p))_r_a_p__v)); \
> +} while (0)
>
> struct nfs_client;
> struct nfs_file_localio;
--
Chuck Lever
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-16 15:33 ` Chuck Lever
@ 2025-05-18 10:46 ` Pali Rohár
0 siblings, 0 replies; 56+ messages in thread
From: Pali Rohár @ 2025-05-18 10:46 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever, NeilBrown, Trond Myklebust, Anna Schumaker,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On Friday 16 May 2025 11:33:20 Chuck Lever wrote:
> On 5/10/25 3:57 PM, Mike Snitzer wrote:
> > On Sat, May 10, 2025 at 12:02:27PM -0400, Chuck Lever wrote:
> >> On 5/9/25 11:01 PM, NeilBrown wrote:
> >>> On Sat, 10 May 2025, Chuck Lever wrote:
> >>>> [ adding Paul McK ]
> >>>>
> >>>> On 5/8/25 8:46 PM, NeilBrown wrote:
> >>>>> This is a revised version a the earlier series. I've actually tested
> >>>>> this time and fixed a few issues including the one that Mike found.
> >>>>
> >>>> As Mike mentioned in a previous thread, at this point, any fix for this
> >>>> issue will need to be applied to recent stable kernels as well. This
> >>>> series looks a bit too complicated for that.
> >>>>
> >>>> I expect that other subsystems will encounter this issue eventually,
> >>>> so it would be beneficial to address the root cause. For that purpose, I
> >>>> think I like Vincent's proposal the best:
> >>>>
> >>>> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
> >>>>
> >>>> None of this is to say that Neil's patches shouldn't be applied. But
> >>>> perhaps these are not a broad solution to the RCU compilation issue.
> >>>
> >>> Do we need a "broad solution to the RCU compilation issue"?
> >>
> >> Fair question. If the current localio code is simply incorrect as it
> >> stands, then I suppose the answer is no. Because gcc is happy to compile
> >> it in most cases, I thought the problem was with older versions of gcc,
> >> not with localio (even though, I agree, the use of an incomplete
> >> structure definition is somewhat brittle when used with RCU).
> >>
> >>
> >>> Does it ever make sense to "dereference" a pointer to a structure that is
> >>> not fully specified? What does that even mean?
> >>>
> >>> I find it harder to argue against use of rcu_access_pointer() in that
> >>> context, at least for test-against-NULL, but sparse doesn't complain
> >>> about a bare test of an __rcu pointer against NULL, so maybe there is no
> >>> need for rcu_access_pointer() for simple tests - in which case the
> >>> documentation should be updated.
> >>
> >> For backporting purposes, inventing our own local RCU helper to handle
> >> the situation might be best. Then going forward, apply your patches to
> >> rectify the use of the incomplete structure definition, and the local
> >> helper can eventually be removed.
> >>
> >> My interest is getting to a narrow set of changes that can be applied
> >> now and backported as needed. The broader clean-ups can then be applied
> >> to future kernels (or as subsequent patches in the same merge window).
> >>
> >> My 2 cents, worth every penny.
> >
> > I really would prefer we just use this patch as the stop-gap for 6.14
> > and 6.15 (which I have been carrying for nearly a year now because I
> > need to support an EL8 platform that uses gcc 8.5):
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=kernel-6.12.24/nfs-testing&id=f9add5e4c9b4629b102876dce9484e1da3e35d1f
> >
> > Then we work through getting Neil's patchset to land for 6.16 and
> > revert the stop-gap (dummy nfsd_file) patch.
> >
> >>> (of course rcu_dereference() doesn't actually dereference the pointer,
> >>> despite its name. It just declared that there is an imminent intention
> >>> to dereference the pointer.....)
> >>>
> >>> NeilBrown
> >
> > Rather than do a way more crazy stop-gap like this (which actually works):
>
> The below looks about like I expected it to. Thanks for coding it up and
> trying it!
This really looks nice. Thanks.
Would you send it as a regular patch to the list?
> Agreed that it is more verbose than the original patch, but this seems
> more self-documenting to me and looks narrow enough to backport to LTS
> kernels or even move these helpers to common headers if they are found
> to be more broadly useful. I have only a mild preference for this
> solution over the initial one.
>
> One more comment below.
>
>
> > fs/nfs/localio.c | 6 +++---
> > fs/nfs_common/nfslocalio.c | 8 +++----
> > include/linux/nfslocalio.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 59 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index 73dd07495440..fedc07254c00 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -272,7 +272,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> >
> > new = NULL;
> > rcu_read_lock();
> > - nf = rcu_dereference(*pnf);
> > + nf = rcu_dereference_opaque(*pnf);
> > if (!nf) {
> > rcu_read_unlock();
> > new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
> > @@ -281,11 +281,11 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> > rcu_read_lock();
> > /* try to swap in the pointer */
> > spin_lock(&clp->cl_uuid.lock);
> > - nf = rcu_dereference_protected(*pnf, 1);
> > + nf = rcu_dereference_opaque_protected(*pnf, 1);
> > if (!nf) {
> > nf = new;
> > new = NULL;
> > - rcu_assign_pointer(*pnf, nf);
> > + rcu_assign_opaque_pointer(*pnf, nf);
> > }
> > spin_unlock(&clp->cl_uuid.lock);
> > }
> > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > index 6a0bdea6d644..213862ceb8bb 100644
> > --- a/fs/nfs_common/nfslocalio.c
> > +++ b/fs/nfs_common/nfslocalio.c
> > @@ -285,14 +285,14 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> > return;
> > }
> >
> > - ro_nf = rcu_access_pointer(nfl->ro_file);
> > - rw_nf = rcu_access_pointer(nfl->rw_file);
> > + ro_nf = rcu_access_opaque(nfl->ro_file);
> > + rw_nf = rcu_access_opaque(nfl->rw_file);
> > if (ro_nf || rw_nf) {
> > spin_lock(&nfs_uuid->lock);
> > if (ro_nf)
> > - ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
> > + ro_nf = rcu_dereference_opaque_protected(xchg(&nfl->ro_file, NULL), 1);
> > if (rw_nf)
> > - rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
> > + rw_nf = rcu_dereference_opaque_protected(xchg(&nfl->rw_file, NULL), 1);
>
> I might combine rcu_dereference_opaque_protected and xchg into another
> helper, since this rather verbose idiom appears twice.
>
>
> > /* Remove nfl from nfs_uuid->files list */
> > RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > index 9aa8a43843d7..c6e86891d4b5 100644
> > --- a/include/linux/nfslocalio.h
> > +++ b/include/linux/nfslocalio.h
> > @@ -15,6 +15,58 @@
> > #include <linux/sunrpc/svcauth.h>
> > #include <linux/nfs.h>
> > #include <net/net_namespace.h>
> > +#include <linux/rcupdate.h>
> > +
> > +/*
> > + * RCU methods to allow fs/nfs_common and fs/nfs LOCALIO code to avoid
> > + * dereferencing pointer to 'struct nfs_file' which is opaque outside fs/nfsd
> > +*/
> > +#define __rcu_access_opaque_pointer(p, local, space) \
> > +({ \
> > + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> > + rcu_check_sparse(p, space); \
> > + (__force __kernel typeof(p))(local); \
> > +})
> > +
> > +#define rcu_access_opaque(p) __rcu_access_opaque_pointer((p), __UNIQUE_ID(rcu), __rcu)
> > +
> > +#define __rcu_dereference_opaque_protected(p, local, c, space) \
> > +({ \
> > + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_protected() usage"); \
> > + rcu_check_sparse(p, space); \
> > + (__force __kernel typeof(p))(p); \
> > +})
> > +
> > +#define rcu_dereference_opaque_protected(p, c) \
> > + __rcu_dereference_opaque_protected((p), __UNIQUE_ID(rcu), (c), __rcu)
> > +
> > +#define __rcu_dereference_opaque_check(p, local, c, space) \
> > +({ \
> > + /* Dependency order vs. p above. */ \
> > + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> > + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_check() usage"); \
> > + rcu_check_sparse(p, space); \
> > + (__force __kernel typeof(p))(local); \
> > +})
> > +
> > +#define rcu_dereference_opaque_check(p, c) \
> > + __rcu_dereference_opaque_check((p), __UNIQUE_ID(rcu), \
> > + (c) || rcu_read_lock_held(), __rcu)
> > +
> > +#define rcu_dereference_opaque(p) rcu_dereference_opaque_check(p, 0)
> > +
> > +#define RCU_INITIALIZER_OPAQUE(v) (typeof((v)) __force __rcu)(v)
> > +
> > +#define rcu_assign_opaque_pointer(p, v) \
> > +do { \
> > + uintptr_t _r_a_p__v = (uintptr_t)(v); \
> > + rcu_check_sparse(p, __rcu); \
> > + \
> > + if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> > + WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> > + else \
> > + smp_store_release(&p, RCU_INITIALIZER_OPAQUE((typeof(p))_r_a_p__v)); \
> > +} while (0)
> >
> > struct nfs_client;
> > struct nfs_file_localio;
>
>
> --
> Chuck Lever
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers
2025-05-10 19:57 ` Mike Snitzer
2025-05-16 15:33 ` Chuck Lever
@ 2025-05-19 3:49 ` NeilBrown
1 sibling, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-05-19 3:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever, Trond Myklebust, Anna Schumaker, Pali Rohár,
Vincent Mailhol, Jeff Layton, linux-nfs, paulmck
On Sun, 11 May 2025, Mike Snitzer wrote:
> On Sat, May 10, 2025 at 12:02:27PM -0400, Chuck Lever wrote:
> > On 5/9/25 11:01 PM, NeilBrown wrote:
> > > On Sat, 10 May 2025, Chuck Lever wrote:
> > >> [ adding Paul McK ]
> > >>
> > >> On 5/8/25 8:46 PM, NeilBrown wrote:
> > >>> This is a revised version a the earlier series. I've actually tested
> > >>> this time and fixed a few issues including the one that Mike found.
> > >>
> > >> As Mike mentioned in a previous thread, at this point, any fix for this
> > >> issue will need to be applied to recent stable kernels as well. This
> > >> series looks a bit too complicated for that.
> > >>
> > >> I expect that other subsystems will encounter this issue eventually,
> > >> so it would be beneficial to address the root cause. For that purpose, I
> > >> think I like Vincent's proposal the best:
> > >>
> > >> https://lore.kernel.org/linux-nfs/8c67a295-8caa-4e53-a764-f691657bbe62@wanadoo.fr/raw
> > >>
> > >> None of this is to say that Neil's patches shouldn't be applied. But
> > >> perhaps these are not a broad solution to the RCU compilation issue.
> > >
> > > Do we need a "broad solution to the RCU compilation issue"?
> >
> > Fair question. If the current localio code is simply incorrect as it
> > stands, then I suppose the answer is no. Because gcc is happy to compile
> > it in most cases, I thought the problem was with older versions of gcc,
> > not with localio (even though, I agree, the use of an incomplete
> > structure definition is somewhat brittle when used with RCU).
> >
> >
> > > Does it ever make sense to "dereference" a pointer to a structure that is
> > > not fully specified? What does that even mean?
> > >
> > > I find it harder to argue against use of rcu_access_pointer() in that
> > > context, at least for test-against-NULL, but sparse doesn't complain
> > > about a bare test of an __rcu pointer against NULL, so maybe there is no
> > > need for rcu_access_pointer() for simple tests - in which case the
> > > documentation should be updated.
> >
> > For backporting purposes, inventing our own local RCU helper to handle
> > the situation might be best. Then going forward, apply your patches to
> > rectify the use of the incomplete structure definition, and the local
> > helper can eventually be removed.
> >
> > My interest is getting to a narrow set of changes that can be applied
> > now and backported as needed. The broader clean-ups can then be applied
> > to future kernels (or as subsequent patches in the same merge window).
> >
> > My 2 cents, worth every penny.
>
> I really would prefer we just use this patch as the stop-gap for 6.14
> and 6.15 (which I have been carrying for nearly a year now because I
> need to support an EL8 platform that uses gcc 8.5):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=kernel-6.12.24/nfs-testing&id=f9add5e4c9b4629b102876dce9484e1da3e35d1f
>
> Then we work through getting Neil's patchset to land for 6.16 and
> revert the stop-gap (dummy nfsd_file) patch.
>
> > > (of course rcu_dereference() doesn't actually dereference the pointer,
> > > despite its name. It just declared that there is an imminent intention
> > > to dereference the pointer.....)
> > >
> > > NeilBrown
>
> Rather than do a way more crazy stop-gap like this (which actually works):
"works" in what sense? Presumably that gcc-8 doesn't complain.
sparse doesn't like it at all though.
If you don't care about sparse not being happy, then it would be easier
to just use READ_ONCE and WRITE_ONCE rather than creating the new
_opaque interfaces.
Thanks,
NeilBrown
>
> fs/nfs/localio.c | 6 +++---
> fs/nfs_common/nfslocalio.c | 8 +++----
> include/linux/nfslocalio.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 59 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 73dd07495440..fedc07254c00 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -272,7 +272,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
>
> new = NULL;
> rcu_read_lock();
> - nf = rcu_dereference(*pnf);
> + nf = rcu_dereference_opaque(*pnf);
> if (!nf) {
> rcu_read_unlock();
> new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
> @@ -281,11 +281,11 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> rcu_read_lock();
> /* try to swap in the pointer */
> spin_lock(&clp->cl_uuid.lock);
> - nf = rcu_dereference_protected(*pnf, 1);
> + nf = rcu_dereference_opaque_protected(*pnf, 1);
> if (!nf) {
> nf = new;
> new = NULL;
> - rcu_assign_pointer(*pnf, nf);
> + rcu_assign_opaque_pointer(*pnf, nf);
> }
> spin_unlock(&clp->cl_uuid.lock);
> }
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 6a0bdea6d644..213862ceb8bb 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -285,14 +285,14 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> return;
> }
>
> - ro_nf = rcu_access_pointer(nfl->ro_file);
> - rw_nf = rcu_access_pointer(nfl->rw_file);
> + ro_nf = rcu_access_opaque(nfl->ro_file);
> + rw_nf = rcu_access_opaque(nfl->rw_file);
> if (ro_nf || rw_nf) {
> spin_lock(&nfs_uuid->lock);
> if (ro_nf)
> - ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
> + ro_nf = rcu_dereference_opaque_protected(xchg(&nfl->ro_file, NULL), 1);
> if (rw_nf)
> - rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
> + rw_nf = rcu_dereference_opaque_protected(xchg(&nfl->rw_file, NULL), 1);
>
> /* Remove nfl from nfs_uuid->files list */
> RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 9aa8a43843d7..c6e86891d4b5 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -15,6 +15,58 @@
> #include <linux/sunrpc/svcauth.h>
> #include <linux/nfs.h>
> #include <net/net_namespace.h>
> +#include <linux/rcupdate.h>
> +
> +/*
> + * RCU methods to allow fs/nfs_common and fs/nfs LOCALIO code to avoid
> + * dereferencing pointer to 'struct nfs_file' which is opaque outside fs/nfsd
> +*/
> +#define __rcu_access_opaque_pointer(p, local, space) \
> +({ \
> + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(local); \
> +})
> +
> +#define rcu_access_opaque(p) __rcu_access_opaque_pointer((p), __UNIQUE_ID(rcu), __rcu)
> +
> +#define __rcu_dereference_opaque_protected(p, local, c, space) \
> +({ \
> + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_protected() usage"); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(p); \
> +})
> +
> +#define rcu_dereference_opaque_protected(p, c) \
> + __rcu_dereference_opaque_protected((p), __UNIQUE_ID(rcu), (c), __rcu)
> +
> +#define __rcu_dereference_opaque_check(p, local, c, space) \
> +({ \
> + /* Dependency order vs. p above. */ \
> + typeof(p) local = (__force typeof(p))READ_ONCE(p); \
> + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_opaque_check() usage"); \
> + rcu_check_sparse(p, space); \
> + (__force __kernel typeof(p))(local); \
> +})
> +
> +#define rcu_dereference_opaque_check(p, c) \
> + __rcu_dereference_opaque_check((p), __UNIQUE_ID(rcu), \
> + (c) || rcu_read_lock_held(), __rcu)
> +
> +#define rcu_dereference_opaque(p) rcu_dereference_opaque_check(p, 0)
> +
> +#define RCU_INITIALIZER_OPAQUE(v) (typeof((v)) __force __rcu)(v)
> +
> +#define rcu_assign_opaque_pointer(p, v) \
> +do { \
> + uintptr_t _r_a_p__v = (uintptr_t)(v); \
> + rcu_check_sparse(p, __rcu); \
> + \
> + if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> + WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> + else \
> + smp_store_release(&p, RCU_INITIALIZER_OPAQUE((typeof(p))_r_a_p__v)); \
> +} while (0)
>
> struct nfs_client;
> struct nfs_file_localio;
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
2025-05-09 0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
2025-05-09 11:03 ` kernel test robot
@ 2025-07-08 14:20 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (2 more replies)
1 sibling, 3 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-08 14:20 UTC (permalink / raw)
To: NeilBrown
Cc: Trond Myklebust, Anna Schumaker, Pali Rohár, Vincent Mailhol,
Chuck Lever, Jeff Layton, linux-nfs
[Preface: this revert makes it much less likely to "lose the race",
whereby causing nfsd_shutdown_net() to hang, so we'd do well to take
the time/care to properly fix whatever is lacking in Neil's commit
c25a89770d1f]
From: Mike Snitzer <snitzer@hammerspace.com>
Subject: Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
Date: Sat, 5 Jul 2025 01:47:55 +0000
This reverts commit c25a89770d1f ("nfs_localio: change
nfsd_file_put_local() to take a pointer to __rcu pointer") because it
has been determined as the cause for nfsd_shutdown_net() hanging
waiting for percpu_ref_exit(&nn->nfsd_net_ref) during shutdown _after_
running a simple LOCALIO test with fio, e.g.:
fio --nrfiles=3 --filesize=1000m --cpus_allowed_policy=split
--group_reporting --rw=read --bs=47008 --numjobs=3 --iodepth=16
--runtime=20 --time_based --loops=1 --ioengine=libaio --direct=1
--verify_dump=1 --invalidate=1 --randrepeat=1 --exitall --name
task_share1 --filename=/mnt/share1/a.test
NOTE: commit c25a89770d1f contained hunks that should've been included
in commit e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for
getting nfsd_file"). So this revert fixes the associated breakage,
but in general it means that LOCALIO is not bisect-safe.
This commit also reverts commit 1c14d71928ef ("NFSD: Clean up kdoc for
nfsd_file_put_local()") which was a fix for commit c25a89770d1f.
Fixes: 1c14d71928ef ("NFSD: Clean up kdoc for nfsd_file_put_local()")
Fixes: c25a89770d1f ("nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer")
Fixes: e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 11 ++---------
fs/nfs_common/nfslocalio.c | 24 ++++++++++++++++++------
fs/nfsd/filecache.c | 13 ++++---------
fs/nfsd/filecache.h | 2 +-
include/linux/nfslocalio.h | 19 ++++++++-----------
5 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 510d0a16cfe9..ef12dd279539 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -209,16 +209,9 @@ void nfs_local_probe_async(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
-static inline void nfs_local_file_put(struct nfsd_file *localio)
+static inline void nfs_local_file_put(struct nfsd_file *nf)
{
- /* nfs_to_nfsd_file_put_local() expects an __rcu pointer
- * but we have a __kernel pointer. It is always safe
- * to cast a __kernel pointer to an __rcu pointer
- * because the cast only weakens what is known about the pointer.
- */
- struct nfsd_file __rcu *nf = (struct nfsd_file __rcu*) localio;
-
- nfs_to_nfsd_file_put_local(&nf);
+ nfs_to_nfsd_file_put_local(nf);
}
/*
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 05c7c16e37ab..1dd5a8cca064 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -170,6 +170,9 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
while ((nfl = list_first_entry_or_null(&nfs_uuid->files,
struct nfs_file_localio,
list)) != NULL) {
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
+
/* If nfs_uuid is already NULL, nfs_close_local_fh is
* closing and we must wait, else we unlink and close.
*/
@@ -186,14 +189,17 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
continue;
}
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
-
- nfs_to_nfsd_file_put_local(&nfl->ro_file);
- nfs_to_nfsd_file_put_local(&nfl->rw_file);
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
cond_resched();
-
spin_lock(&nfs_uuid->lock);
/* Now we can allow racing nfs_close_local_fh() to
* skip the locking.
@@ -297,6 +303,8 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
void nfs_close_local_fh(struct nfs_file_localio *nfl)
{
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
nfs_uuid_t *nfs_uuid;
rcu_read_lock();
@@ -329,8 +337,12 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
- nfs_to_nfsd_file_put_local(&nfl->ro_file);
- nfs_to_nfsd_file_put_local(&nfl->rw_file);
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
/* Remove nfl from nfs_uuid->files list and signal nfs_uuid_put()
* that we are done. The moment we drop the spinlock the
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 68b8d0c6414e..905034fd8c34 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -372,22 +372,17 @@ nfsd_file_put(struct nfsd_file *nf)
/**
* nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
- * @pnf: nfsd_file of which to put the reference
+ * @nf: nfsd_file of which to put the reference
*
* First save the associated net to return to caller, then put
* the reference of the nfsd_file.
*/
struct net *
-nfsd_file_put_local(struct nfsd_file __rcu **pnf)
+nfsd_file_put_local(struct nfsd_file *nf)
{
- struct nfsd_file *nf;
- struct net *net = NULL;
+ struct net *net = nf->nf_net;
- nf = unrcu_pointer(xchg(pnf, NULL));
- if (nf) {
- net = nf->nf_net;
- nfsd_file_put(nf);
- }
+ nfsd_file_put(nf);
return net;
}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 722b26c71e45..cd02f91aaef1 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -62,7 +62,7 @@ void nfsd_file_cache_shutdown(void);
int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
-struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
+struct net *nfsd_file_put_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 5c7c92659e73..453d9de3d70b 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -60,9 +60,9 @@ struct nfsd_localio_operations {
struct rpc_clnt *,
const struct cred *,
const struct nfs_fh *,
- struct nfsd_file __rcu **pnf,
+ struct nfsd_file __rcu **,
const fmode_t);
- struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
+ struct net *(*nfsd_file_put_local)(struct nfsd_file *);
struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
@@ -88,19 +88,16 @@ static inline void nfs_to_nfsd_net_put(struct net *net)
rcu_read_unlock();
}
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file __rcu **localio)
+static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
{
/*
- * Either *localio must be guaranteed to be non-NULL, or caller
- * must prevent nfsd shutdown from completing as nfs_close_local_fh()
- * does by blocking the nfs_uuid from being finally put.
+ * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
+ * "Voluntary context switch within RCU read-side critical section!"
+ * by scheduling deep in underlying filesystem (e.g. XFS).
*/
- struct net *net;
+ struct net *net = nfs_to->nfsd_file_put_local(localio);
- net = nfs_to->nfsd_file_put_local(localio);
-
- if (net)
- nfs_to_nfsd_net_put(net);
+ nfs_to_nfsd_net_put(net);
}
#else /* CONFIG_NFS_LOCALIO */
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()" Mike Snitzer
` (8 more replies)
2025-07-14 3:50 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" NeilBrown
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2 siblings, 9 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
[Apologies for so many words...]
Hi,
I wanted to get this on all the NFS and NFSD maintainers' radar ASAP.
I realize the timing of this is not great due to how late we are in
the 6.16 release cycle (v6.16-rc7). But I feel it prudent to make it
clear that the LOCALIO changes that went upstream during the 6.16 merge
window are unstable under load. So this week we'll need to make a
call on how to handle this for v6.16 final.
And just FYI: I unfortunately don't have time this week to assist with
developing/testing a smaller fix to solve this situation. The window
for extensive testing (by myself and others at Hammerspace) was late
last week. At this point, given we are short on time, reverting is
the sane thing to do.
Also, the 6.16-rc7 release's LOCALIO changes put it on something of an
island relative to more enterprise production kernels I am helping to
maintain (both the RHEL10 kernel and Oracle's OCI kernel, which is
actually an Ubuntu kernel, both have NFS LOCALIO that is 6.14 based).
All that said:
The past few weeks I had to assist with an HPC benchmarking effort
that generates heavy load using the "MLperf" benchmark suite. Testing
was done on 10 enterprise grade NVMe storage systems (each with 48
CPUs, and 8 NVMe devices) that depend on LOCALIO to "just work
_well_" to achieve a favorable score. Unfortunately LOCALIO didn't,
so I got to reverting. I started with this partial revert patch but it
wasn't enough (it just made the problem harder to hit), labeling this
previous revert proposal as "RFC" rather than "URGENT" was a mistake:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
(which is very similar to patch 2 in this series)
It wasn't until I did a full revert of 6.16's LOCALIO changes that
LOCALIO stopped having resource leaks (nfsd_file in particular) that
prevented proper NFSD shutdown and the inability to unload nfsd.ko.ko
(which I had to do a lot of while developing other NFS and NFSD
changes that were unrelated to LOCALIO).
Neil, I value the work you did to try to address the lingering
complaints about RCU related compiler errors in LOCALIO (but when you
posted your changes months ago I didn't have time to review, and then
they went upstream; so I assumed they were ready and made sure to
include them in Hammerspace's more recent kernels so that I could gain
"production" confidence in the changes even though I still hadn't had
time to review them properly.. ugh). Glad "we" did this heavy load
testing because otherwise we'd be oblivious about LOCALIO changes
merged for 6.16 causing regression. (I'm sending this later on my
Sunday evening in the hopes that you being in Australia enables us to
not lose a day of communication on this situation).
Patch 2 gets into how simple it is to trigger the nfsd_file leaks
resulting from running fio followed by NFSD shutdown and nfsd.ko
module removal.
Regards,
Mike
Mike Snitzer (9):
Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()"
Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()"
Revert "nfs_localio: duplicate nfs_close_local_fh()"
Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file"
Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref"
Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio"
nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
nfs/localio: add localio_async_probe modparm
fs/nfs/localio.c | 64 ++++++++++++++++--------
fs/nfs_common/nfslocalio.c | 99 +++++++++++++-------------------------
fs/nfsd/filecache.c | 34 ++-----------
fs/nfsd/filecache.h | 3 +-
fs/nfsd/localio.c | 44 ++---------------
include/linux/nfslocalio.h | 26 +++++-----
6 files changed, 100 insertions(+), 170 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 2/9] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
` (7 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit cae55a0ab53b2f7c92633a85f1c448746c4ae689.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 4f6468eb2adf..80d9ff6608a7 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -32,7 +32,7 @@
* @rpc_clnt: rpc_clnt that the client established
* @cred: cred that the client established
* @nfs_fh: filehandle to lookup
- * @pnf: place to find the nfsd_file, or store it if it was non-NULL
+ * @nfp: place to find the nfsd_file, or store it if it was non-NULL
* @fmode: fmode_t to use for open
*
* This function maps a local fh to a path on a local filesystem.
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 2/9] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 3/9] Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()" Mike Snitzer
` (6 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
This reverts commit c25a89770d1f ("nfs_localio: change
nfsd_file_put_local() to take a pointer to __rcu pointer") because it
has been determined as the cause for nfsd_shutdown_net() hanging
waiting for percpu_ref_exit(&nn->nfsd_net_ref) during shutdown _after_
running a simple LOCALIO test with fio, e.g.:
fio --nrfiles=3 --filesize=1000m --cpus_allowed_policy=split
--group_reporting --rw=read --bs=47008 --numjobs=3 --iodepth=16
--runtime=20 --time_based --loops=1 --ioengine=libaio --direct=1
--verify_dump=1 --invalidate=1 --randrepeat=1 --exitall --name
task_share1 --filename=/mnt/share1/a.test
NOTE: commit c25a89770d1f contained hunks that should've been included
in commit e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for
getting nfsd_file"). So this revert fixes the associated breakage,
but in general it means that LOCALIO is not bisect-safe.
This commit also reverts commit 1c14d71928ef ("NFSD: Clean up kdoc for
nfsd_file_put_local()") which was a fix for commit c25a89770d1f.
Fixes: 1c14d71928ef ("NFSD: Clean up kdoc for nfsd_file_put_local()")
Fixes: c25a89770d1f ("nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer")
Fixes: e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 11 ++---------
fs/nfs_common/nfslocalio.c | 24 ++++++++++++++++++------
fs/nfsd/filecache.c | 13 ++++---------
fs/nfsd/filecache.h | 2 +-
include/linux/nfslocalio.h | 19 ++++++++-----------
5 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 510d0a16cfe9..ef12dd279539 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -209,16 +209,9 @@ void nfs_local_probe_async(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
-static inline void nfs_local_file_put(struct nfsd_file *localio)
+static inline void nfs_local_file_put(struct nfsd_file *nf)
{
- /* nfs_to_nfsd_file_put_local() expects an __rcu pointer
- * but we have a __kernel pointer. It is always safe
- * to cast a __kernel pointer to an __rcu pointer
- * because the cast only weakens what is known about the pointer.
- */
- struct nfsd_file __rcu *nf = (struct nfsd_file __rcu*) localio;
-
- nfs_to_nfsd_file_put_local(&nf);
+ nfs_to_nfsd_file_put_local(nf);
}
/*
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 05c7c16e37ab..1dd5a8cca064 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -170,6 +170,9 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
while ((nfl = list_first_entry_or_null(&nfs_uuid->files,
struct nfs_file_localio,
list)) != NULL) {
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
+
/* If nfs_uuid is already NULL, nfs_close_local_fh is
* closing and we must wait, else we unlink and close.
*/
@@ -186,14 +189,17 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
continue;
}
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
-
- nfs_to_nfsd_file_put_local(&nfl->ro_file);
- nfs_to_nfsd_file_put_local(&nfl->rw_file);
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
cond_resched();
-
spin_lock(&nfs_uuid->lock);
/* Now we can allow racing nfs_close_local_fh() to
* skip the locking.
@@ -297,6 +303,8 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
void nfs_close_local_fh(struct nfs_file_localio *nfl)
{
+ struct nfsd_file *ro_nf;
+ struct nfsd_file *rw_nf;
nfs_uuid_t *nfs_uuid;
rcu_read_lock();
@@ -329,8 +337,12 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
- nfs_to_nfsd_file_put_local(&nfl->ro_file);
- nfs_to_nfsd_file_put_local(&nfl->rw_file);
+ ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
+ rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
/* Remove nfl from nfs_uuid->files list and signal nfs_uuid_put()
* that we are done. The moment we drop the spinlock the
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 91a7ae7db9cc..06150dd171be 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -375,22 +375,17 @@ nfsd_file_put(struct nfsd_file *nf)
/**
* nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
- * @pnf: nfsd_file of which to put the reference
+ * @nf: nfsd_file of which to put the reference
*
* First save the associated net to return to caller, then put
* the reference of the nfsd_file.
*/
struct net *
-nfsd_file_put_local(struct nfsd_file __rcu **pnf)
+nfsd_file_put_local(struct nfsd_file *nf)
{
- struct nfsd_file *nf;
- struct net *net = NULL;
+ struct net *net = nf->nf_net;
- nf = unrcu_pointer(xchg(pnf, NULL));
- if (nf) {
- net = nf->nf_net;
- nfsd_file_put(nf);
- }
+ nfsd_file_put(nf);
return net;
}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 237a05c74211..d41428ce8a11 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -66,7 +66,7 @@ void nfsd_file_cache_shutdown(void);
int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
-struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
+struct net *nfsd_file_put_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 5c7c92659e73..453d9de3d70b 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -60,9 +60,9 @@ struct nfsd_localio_operations {
struct rpc_clnt *,
const struct cred *,
const struct nfs_fh *,
- struct nfsd_file __rcu **pnf,
+ struct nfsd_file __rcu **,
const fmode_t);
- struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
+ struct net *(*nfsd_file_put_local)(struct nfsd_file *);
struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
@@ -88,19 +88,16 @@ static inline void nfs_to_nfsd_net_put(struct net *net)
rcu_read_unlock();
}
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file __rcu **localio)
+static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
{
/*
- * Either *localio must be guaranteed to be non-NULL, or caller
- * must prevent nfsd shutdown from completing as nfs_close_local_fh()
- * does by blocking the nfs_uuid from being finally put.
+ * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
+ * "Voluntary context switch within RCU read-side critical section!"
+ * by scheduling deep in underlying filesystem (e.g. XFS).
*/
- struct net *net;
+ struct net *net = nfs_to->nfsd_file_put_local(localio);
- net = nfs_to->nfsd_file_put_local(localio);
-
- if (net)
- nfs_to_nfsd_net_put(net);
+ nfs_to_nfsd_net_put(net);
}
#else /* CONFIG_NFS_LOCALIO */
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 3/9] Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 2/9] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 4/9] Revert "nfs_localio: duplicate nfs_close_local_fh()" Mike Snitzer
` (5 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit c17f0e92b492d51a339442d5e5626a4c0a1dd060.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 85 ++++++++++++--------------------------
1 file changed, 27 insertions(+), 58 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 1dd5a8cca064..49c59f0c78c6 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -151,7 +151,8 @@ EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
*/
static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
{
- struct nfs_file_localio *nfl;
+ LIST_HEAD(local_files);
+ struct nfs_file_localio *nfl, *tmp;
spin_lock(&nfs_uuid->lock);
if (unlikely(!rcu_access_pointer(nfs_uuid->net))) {
@@ -165,49 +166,37 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
nfs_uuid->dom = NULL;
}
+ list_splice_init(&nfs_uuid->files, &local_files);
+ spin_unlock(&nfs_uuid->lock);
+
/* Walk list of files and ensure their last references dropped */
-
- while ((nfl = list_first_entry_or_null(&nfs_uuid->files,
- struct nfs_file_localio,
- list)) != NULL) {
+ list_for_each_entry_safe(nfl, tmp, &local_files, list) {
struct nfsd_file *ro_nf;
struct nfsd_file *rw_nf;
- /* If nfs_uuid is already NULL, nfs_close_local_fh is
- * closing and we must wait, else we unlink and close.
- */
- if (rcu_access_pointer(nfl->nfs_uuid) == NULL) {
- /* nfs_close_local_fh() is doing the
- * close and we must wait. until it unlinks
- */
- wait_var_event_spinlock(nfl,
- list_first_entry_or_null(
- &nfs_uuid->files,
- struct nfs_file_localio,
- list) != nfl,
- &nfs_uuid->lock);
- continue;
- }
-
ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+ spin_lock(&nfs_uuid->lock);
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
+ /* Now we can allow racing nfs_close_local_fh() to
+ * skip the locking.
+ */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+
if (ro_nf)
nfs_to_nfsd_file_put_local(ro_nf);
if (rw_nf)
nfs_to_nfsd_file_put_local(rw_nf);
+
cond_resched();
- spin_lock(&nfs_uuid->lock);
- /* Now we can allow racing nfs_close_local_fh() to
- * skip the locking.
- */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
}
+ spin_lock(&nfs_uuid->lock);
+ BUG_ON(!list_empty(&nfs_uuid->files));
+
/* Remove client from nn->local_clients */
if (nfs_uuid->list_lock) {
spin_lock(nfs_uuid->list_lock);
@@ -315,43 +304,23 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
return;
}
- spin_lock(&nfs_uuid->lock);
- if (!rcu_access_pointer(nfl->nfs_uuid)) {
- /* nfs_uuid_put has finished here */
- spin_unlock(&nfs_uuid->lock);
- rcu_read_unlock();
- return;
- }
- if (list_empty(&nfs_uuid->files)) {
- /* nfs_uuid_put() has started closing files, wait for it
- * to finished
- */
- spin_unlock(&nfs_uuid->lock);
- rcu_read_unlock();
- wait_var_event(&nfl->nfs_uuid,
- rcu_access_pointer(nfl->nfs_uuid) == NULL);
- return;
- }
- /* tell nfs_uuid_put() to wait for us */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
- spin_unlock(&nfs_uuid->lock);
- rcu_read_unlock();
-
ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+
+ spin_lock(&nfs_uuid->lock);
+ /* Remove nfl from nfs_uuid->files list */
+ list_del_init(&nfl->list);
+ spin_unlock(&nfs_uuid->lock);
+ rcu_read_unlock();
+ /* Now we can allow racing nfs_close_local_fh() to
+ * skip the locking.
+ */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+
if (ro_nf)
nfs_to_nfsd_file_put_local(ro_nf);
if (rw_nf)
nfs_to_nfsd_file_put_local(rw_nf);
-
- /* Remove nfl from nfs_uuid->files list and signal nfs_uuid_put()
- * that we are done. The moment we drop the spinlock the
- * nfs_uuid could be freed.
- */
- spin_lock(&nfs_uuid->lock);
- list_del_init(&nfl->list);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
- spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 4/9] Revert "nfs_localio: duplicate nfs_close_local_fh()"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (2 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 3/9] Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 5/9] Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file" Mike Snitzer
` (4 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit 23adb30505afe36b3e587a9074c3288de3c7e583.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 21 +--------------------
1 file changed, 1 insertion(+), 20 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 49c59f0c78c6..503f85f64b76 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -171,26 +171,7 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* Walk list of files and ensure their last references dropped */
list_for_each_entry_safe(nfl, tmp, &local_files, list) {
- struct nfsd_file *ro_nf;
- struct nfsd_file *rw_nf;
-
- ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
- rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
-
- spin_lock(&nfs_uuid->lock);
- /* Remove nfl from nfs_uuid->files list */
- list_del_init(&nfl->list);
- spin_unlock(&nfs_uuid->lock);
- /* Now we can allow racing nfs_close_local_fh() to
- * skip the locking.
- */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
-
- if (ro_nf)
- nfs_to_nfsd_file_put_local(ro_nf);
- if (rw_nf)
- nfs_to_nfsd_file_put_local(rw_nf);
-
+ nfs_close_local_fh(nfl);
cond_resched();
}
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 5/9] Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (3 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 4/9] Revert "nfs_localio: duplicate nfs_close_local_fh()" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 6/9] Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref" Mike Snitzer
` (3 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit aa79c438fda263372c63df2170bc4e4030bb68de.
Really nasty revert due to original series not being bisect safe.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 31 +++++++++++++++++++++++++------
fs/nfs_common/nfslocalio.c | 3 +--
fs/nfsd/localio.c | 37 ++++---------------------------------
include/linux/nfslocalio.h | 6 ++++--
4 files changed, 34 insertions(+), 43 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ef12dd279539..86df8d2cd22e 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -209,6 +209,11 @@ void nfs_local_probe_async(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
+static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
+{
+ return nfs_to->nfsd_file_get_local(nf);
+}
+
static inline void nfs_local_file_put(struct nfsd_file *nf)
{
nfs_to_nfsd_file_put_local(nf);
@@ -223,13 +228,12 @@ static inline void nfs_local_file_put(struct nfsd_file *nf)
static struct nfsd_file *
__nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, struct nfs_file_localio *nfl,
- struct nfsd_file __rcu **pnf,
const fmode_t mode)
{
struct nfsd_file *localio;
localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
- cred, fh, nfl, pnf, mode);
+ cred, fh, nfl, mode);
if (IS_ERR(localio)) {
int status = PTR_ERR(localio);
trace_nfs_local_open_fh(fh, mode, status);
@@ -256,7 +260,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, struct nfs_file_localio *nfl,
const fmode_t mode)
{
- struct nfsd_file *nf, __rcu **pnf;
+ struct nfsd_file *nf, *new, __rcu **pnf;
if (!nfs_server_is_local(clp))
return NULL;
@@ -268,9 +272,24 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
else
pnf = &nfl->ro_file;
- nf = __nfs_local_open_fh(clp, cred, fh, nfl, pnf, mode);
- if (IS_ERR(nf))
- return NULL;
+ new = NULL;
+ rcu_read_lock();
+ nf = rcu_dereference(*pnf);
+ if (!nf) {
+ rcu_read_unlock();
+ new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
+ if (IS_ERR(new))
+ return NULL;
+ rcu_read_lock();
+ /* try to swap in the pointer */
+ nf = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(new)));
+ if (!nf)
+ swap(nf, new);
+ }
+ nf = nfs_local_file_get(nf);
+ rcu_read_unlock();
+ if (new)
+ nfs_to_nfsd_file_put_local(new);
return nf;
}
EXPORT_SYMBOL_GPL(nfs_local_open_fh);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 503f85f64b76..f6821b2c87a2 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -237,7 +237,6 @@ static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
const struct nfs_fh *nfs_fh, struct nfs_file_localio *nfl,
- struct nfsd_file __rcu **pnf,
const fmode_t fmode)
{
struct net *net;
@@ -262,7 +261,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
rcu_read_unlock();
/* We have an implied reference to net thanks to nfsd_net_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
- cred, nfs_fh, pnf, fmode);
+ cred, nfs_fh, fmode);
nfs_to_nfsd_net_put(net);
if (!IS_ERR(localio))
nfs_uuid_add_file(uuid, nfl);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 80d9ff6608a7..40998283b858 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -32,7 +32,6 @@
* @rpc_clnt: rpc_clnt that the client established
* @cred: cred that the client established
* @nfs_fh: filehandle to lookup
- * @nfp: place to find the nfsd_file, or store it if it was non-NULL
* @fmode: fmode_t to use for open
*
* This function maps a local fh to a path on a local filesystem.
@@ -43,11 +42,10 @@
* set. Caller (NFS client) is responsible for calling nfsd_net_put and
* nfsd_file_put (via nfs_to_nfsd_file_put_local).
*/
-static struct nfsd_file *
+struct nfsd_file *
nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
- const struct nfs_fh *nfs_fh, struct nfsd_file __rcu **pnf,
- const fmode_t fmode)
+ const struct nfs_fh *nfs_fh, const fmode_t fmode)
{
int mayflags = NFSD_MAY_LOCALIO;
struct svc_cred rq_cred;
@@ -61,12 +59,6 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (!nfsd_net_try_get(net))
return ERR_PTR(-ENXIO);
- rcu_read_lock();
- localio = nfsd_file_get(rcu_dereference(*pnf));
- rcu_read_unlock();
- if (localio)
- return localio;
-
/* nfs_fh -> svc_fh */
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
@@ -88,33 +80,12 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (rq_cred.cr_group_info)
put_group_info(rq_cred.cr_group_info);
- if (!IS_ERR(localio)) {
- struct nfsd_file *new;
- if (!nfsd_net_try_get(net)) {
- nfsd_file_put(localio);
- nfsd_net_put(net);
- return ERR_PTR(-ENXIO);
- }
- nfsd_file_get(localio);
- again:
- new = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(localio)));
- if (new) {
- /* Some other thread installed an nfsd_file */
- if (nfsd_file_get(new) == NULL)
- goto again;
- /*
- * Drop the ref we were going to install and the
- * one we were going to return.
- */
- nfsd_file_put(localio);
- nfsd_file_put(localio);
- localio = new;
- }
- } else
+ if (IS_ERR(localio))
nfsd_net_put(net);
return localio;
}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_net_try_get = nfsd_net_try_get,
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 453d9de3d70b..c3f34bae60e1 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -50,6 +50,10 @@ void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
spinlock_t *nn_local_clients_lock);
/* localio needs to map filehandle -> struct nfsd_file */
+extern struct nfsd_file *
+nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *,
+ const fmode_t) __must_hold(rcu);
void nfs_close_local_fh(struct nfs_file_localio *);
struct nfsd_localio_operations {
@@ -60,7 +64,6 @@ struct nfsd_localio_operations {
struct rpc_clnt *,
const struct cred *,
const struct nfs_fh *,
- struct nfsd_file __rcu **,
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file *);
struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
@@ -73,7 +76,6 @@ extern const struct nfsd_localio_operations *nfs_to;
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, struct nfs_file_localio *,
- struct nfsd_file __rcu **pnf,
const fmode_t);
static inline void nfs_to_nfsd_net_put(struct net *net)
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 6/9] Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (4 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 5/9] Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 7/9] Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio" Mike Snitzer
` (2 subsequent siblings)
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit 14c7e3db9a8663de3bda1d4efe6a98c7d84f5b57.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 4 ++--
fs/nfs_common/nfslocalio.c | 5 +++--
fs/nfsd/filecache.c | 21 ---------------------
fs/nfsd/filecache.h | 1 -
fs/nfsd/localio.c | 9 ++-------
include/linux/nfslocalio.h | 3 ++-
6 files changed, 9 insertions(+), 34 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 86df8d2cd22e..7a33da477da3 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -211,12 +211,12 @@ EXPORT_SYMBOL_GPL(nfs_local_probe_async);
static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
{
- return nfs_to->nfsd_file_get_local(nf);
+ return nfs_to->nfsd_file_get(nf);
}
static inline void nfs_local_file_put(struct nfsd_file *nf)
{
- nfs_to_nfsd_file_put_local(nf);
+ nfs_to->nfsd_file_put(nf);
}
/*
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f6821b2c87a2..bdf251332b6b 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -262,8 +262,9 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
/* We have an implied reference to net thanks to nfsd_net_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
cred, nfs_fh, fmode);
- nfs_to_nfsd_net_put(net);
- if (!IS_ERR(localio))
+ if (IS_ERR(localio))
+ nfs_to_nfsd_net_put(net);
+ else
nfs_uuid_add_file(uuid, nfl);
return localio;
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 06150dd171be..6d9d7c2430ba 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -389,27 +389,6 @@ nfsd_file_put_local(struct nfsd_file *nf)
return net;
}
-/**
- * nfsd_file_get_local - get nfsd_file reference and reference to net
- * @nf: nfsd_file of which to put the reference
- *
- * Get reference to both the nfsd_file and nf->nf_net.
- */
-struct nfsd_file *
-nfsd_file_get_local(struct nfsd_file *nf)
-{
- struct net *net = nf->nf_net;
-
- if (nfsd_net_try_get(net)) {
- nf = nfsd_file_get(nf);
- if (!nf)
- nfsd_net_put(net);
- } else {
- nf = NULL;
- }
- return nf;
-}
-
/**
* nfsd_file_file - get the backing file of an nfsd_file
* @nf: nfsd_file of which to access the backing file.
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index d41428ce8a11..fa7638007fbd 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -67,7 +67,6 @@ int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
struct net *nfsd_file_put_local(struct nfsd_file *nf);
-struct nfsd_file *nfsd_file_get_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 40998283b858..842366707eb1 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -56,9 +56,6 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (nfs_fh->size > NFS4_FHSIZE)
return ERR_PTR(-EINVAL);
- if (!nfsd_net_try_get(net))
- return ERR_PTR(-ENXIO);
-
/* nfs_fh -> svc_fh */
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
@@ -80,9 +77,6 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
if (rq_cred.cr_group_info)
put_group_info(rq_cred.cr_group_info);
- if (IS_ERR(localio))
- nfsd_net_put(net);
-
return localio;
}
EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
@@ -92,7 +86,8 @@ static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_net_put = nfsd_net_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
- .nfsd_file_get_local = nfsd_file_get_local,
+ .nfsd_file_get = nfsd_file_get,
+ .nfsd_file_put = nfsd_file_put,
.nfsd_file_file = nfsd_file_file,
};
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c3f34bae60e1..9aa8a43843d7 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -66,7 +66,8 @@ struct nfsd_localio_operations {
const struct nfs_fh *,
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file *);
- struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
+ struct nfsd_file *(*nfsd_file_get)(struct nfsd_file *);
+ void (*nfsd_file_put)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 7/9] Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio"
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (5 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 6/9] Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm Mike Snitzer
8 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
This reverts commit 9fccbbec13a73ae0e75448a00e20dadc8c511c8c.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 11 ++++++++---
fs/nfs_common/nfslocalio.c | 39 +++++++++++++++++++++-----------------
2 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 7a33da477da3..a4bacd9a5052 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -282,9 +282,14 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
return NULL;
rcu_read_lock();
/* try to swap in the pointer */
- nf = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(new)));
- if (!nf)
- swap(nf, new);
+ spin_lock(&clp->cl_uuid.lock);
+ nf = rcu_dereference_protected(*pnf, 1);
+ if (!nf) {
+ nf = new;
+ new = NULL;
+ rcu_assign_pointer(*pnf, nf);
+ }
+ spin_unlock(&clp->cl_uuid.lock);
}
nf = nfs_local_file_get(nf);
rcu_read_unlock();
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index bdf251332b6b..6a0bdea6d644 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -273,8 +273,8 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
void nfs_close_local_fh(struct nfs_file_localio *nfl)
{
- struct nfsd_file *ro_nf;
- struct nfsd_file *rw_nf;
+ struct nfsd_file *ro_nf = NULL;
+ struct nfsd_file *rw_nf = NULL;
nfs_uuid_t *nfs_uuid;
rcu_read_lock();
@@ -285,23 +285,28 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
return;
}
- ro_nf = unrcu_pointer(xchg(&nfl->ro_file, NULL));
- rw_nf = unrcu_pointer(xchg(&nfl->rw_file, NULL));
+ ro_nf = rcu_access_pointer(nfl->ro_file);
+ rw_nf = rcu_access_pointer(nfl->rw_file);
+ if (ro_nf || rw_nf) {
+ spin_lock(&nfs_uuid->lock);
+ if (ro_nf)
+ ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
+ if (rw_nf)
+ rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
- spin_lock(&nfs_uuid->lock);
- /* Remove nfl from nfs_uuid->files list */
- list_del_init(&nfl->list);
- spin_unlock(&nfs_uuid->lock);
+ /* Remove nfl from nfs_uuid->files list */
+ RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+ list_del_init(&nfl->list);
+ spin_unlock(&nfs_uuid->lock);
+ rcu_read_unlock();
+
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
+ return;
+ }
rcu_read_unlock();
- /* Now we can allow racing nfs_close_local_fh() to
- * skip the locking.
- */
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
-
- if (ro_nf)
- nfs_to_nfsd_file_put_local(ro_nf);
- if (rw_nf)
- nfs_to_nfsd_file_put_local(rw_nf);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (6 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 7/9] Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio" Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 4:19 ` NeilBrown
2025-07-14 12:23 ` Jeff Layton
2025-07-14 3:13 ` [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm Mike Snitzer
8 siblings, 2 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
Previously nfs_local_probe() was made to disable and then attempt to
re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
called and LOCALIO already enabled.
Vague memory for _why_ this was the case is that this was useful
if/when a local NFS server were to be restarted with a local NFS
client connected to it.
But as it happens this causes an absurd amount of LOCALIO flapping
which has a side-effect of too much IO being needlessly sent to NFSD
(using RPC over the loopback network interface). This is the
definition of "serious performance loss" (that negates the point of
having LOCALIO).
So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
server is restarted (which is an extremely rare thing to do). Will
revisit testing that scenario again but in the meantime this patch
restores the full benefit of LOCALIO.
Fixes: 56bcd0f07fdb ("nfs: implement client support for NFS_LOCALIO_PROGRAM")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index a4bacd9a5052..e3ae003118cb 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
return;
}
- if (nfs_client_is_local(clp)) {
- /* If already enabled, disable and re-enable */
- nfs_localio_disable_client(clp);
- }
+ if (nfs_client_is_local(clp))
+ return;
if (!nfs_uuid_begin(&clp->cl_uuid))
return;
@@ -241,7 +239,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
case -ENOMEM:
case -ENXIO:
case -ENOENT:
- /* Revalidate localio, will disable if unsupported */
+ /* Revalidate localio */
+ nfs_localio_disable_client(clp);
nfs_local_probe(clp);
}
}
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
` (7 preceding siblings ...)
2025-07-14 3:13 ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
@ 2025-07-14 3:13 ` Mike Snitzer
2025-07-14 4:23 ` NeilBrown
8 siblings, 1 reply; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 3:13 UTC (permalink / raw)
To: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
This knob influences the LOCALIO handshake so that it happens
synchronously upon NFS client creation.. which reduces the window of
opportunity for a bunch of IO to flood page cache and send out over to
NFSD before LOCALIO handshake negotiates that the client and server
are local. The knob is:
echo N > /sys/module/nfs/parameters/localio_async_probe
Fixes: 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index e3ae003118cb..76ca9bd21d2e 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -49,6 +49,11 @@ struct nfs_local_fsync_ctx {
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static bool localio_async_probe __read_mostly = true;
+module_param(localio_async_probe, bool, 0644);
+MODULE_PARM_DESC(localio_async_probe,
+ "Probe for LOCALIO support asynchronously.");
+
static bool localio_O_DIRECT_semantics __read_mostly = false;
module_param(localio_O_DIRECT_semantics, bool, 0644);
MODULE_PARM_DESC(localio_O_DIRECT_semantics,
@@ -203,7 +208,10 @@ void nfs_local_probe_async_work(struct work_struct *work)
void nfs_local_probe_async(struct nfs_client *clp)
{
- queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+ if (likely(localio_async_probe))
+ queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+ else
+ nfs_local_probe(clp);
}
EXPORT_SYMBOL_GPL(nfs_local_probe_async);
--
2.44.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
@ 2025-07-14 3:50 ` NeilBrown
2025-07-14 14:45 ` Mike Snitzer
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-14 3:50 UTC (permalink / raw)
To: Mike Snitzer
Cc: Trond Myklebust, Anna Schumaker, Pali Rohár, Vincent Mailhol,
Chuck Lever, Jeff Layton, linux-nfs
On Wed, 09 Jul 2025, Mike Snitzer wrote:
> [Preface: this revert makes it much less likely to "lose the race",
> whereby causing nfsd_shutdown_net() to hang, so we'd do well to take
> the time/care to properly fix whatever is lacking in Neil's commit
> c25a89770d1f]
Was this the first time you posted on this issue? If so it seem strange
to start a discussion with a revert with out a clear undertstanding of
the problem...
Maybe
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -177,7 +177,7 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* nfs_close_local_fh() is doing the
* close and we must wait. until it unlinks
*/
- wait_var_event_spinlock(nfl,
+ wait_var_event_spinlock(nfl->nfs_uuid,
list_first_entry_or_null(
&nfs_uuid->files,
struct nfs_file_localio,
will fix the problem - I'm waiting on the wrong address, which could
cause various things to hang.
NeilBrown
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-07-14 3:13 ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
@ 2025-07-14 4:19 ` NeilBrown
2025-07-14 14:37 ` Mike Snitzer
2025-07-14 12:23 ` Jeff Layton
1 sibling, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-14 4:19 UTC (permalink / raw)
To: Mike Snitzer
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
linux-nfs
On Mon, 14 Jul 2025, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
>
> Previously nfs_local_probe() was made to disable and then attempt to
> re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
> called and LOCALIO already enabled.
>
> Vague memory for _why_ this was the case is that this was useful
> if/when a local NFS server were to be restarted with a local NFS
> client connected to it.
>
> But as it happens this causes an absurd amount of LOCALIO flapping
> which has a side-effect of too much IO being needlessly sent to NFSD
> (using RPC over the loopback network interface). This is the
> definition of "serious performance loss" (that negates the point of
> having LOCALIO).
>
> So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
> server is restarted (which is an extremely rare thing to do). Will
> revisit testing that scenario again but in the meantime this patch
> restores the full benefit of LOCALIO.
>
> Fixes: 56bcd0f07fdb ("nfs: implement client support for NFS_LOCALIO_PROGRAM")
> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
Reviewed-by: NeilBrown <neil@brown.name>
I cannot see any justification for probing if localio is currently
thought to be working. If for some reason it doesn't work, then when we
notice that is a good time to disable - which it what this patch does.
However...
> ---
> fs/nfs/localio.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index a4bacd9a5052..e3ae003118cb 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
> return;
> }
>
> - if (nfs_client_is_local(clp)) {
> - /* If already enabled, disable and re-enable */
> - nfs_localio_disable_client(clp);
> - }
> + if (nfs_client_is_local(clp))
> + return;
>
> if (!nfs_uuid_begin(&clp->cl_uuid))
> return;
> @@ -241,7 +239,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> case -ENOMEM:
> case -ENXIO:
> case -ENOENT:
> - /* Revalidate localio, will disable if unsupported */
> + /* Revalidate localio */
> + nfs_localio_disable_client(clp);
> nfs_local_probe(clp);
Shouldn't that be nfs_local_probe_async() ???? I wonder why that wasn't
changed in
Commit 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm
2025-07-14 3:13 ` [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm Mike Snitzer
@ 2025-07-14 4:23 ` NeilBrown
2025-07-14 12:28 ` Jeff Layton
0 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-14 4:23 UTC (permalink / raw)
To: Mike Snitzer
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
linux-nfs
On Mon, 14 Jul 2025, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
>
> This knob influences the LOCALIO handshake so that it happens
> synchronously upon NFS client creation.. which reduces the window of
> opportunity for a bunch of IO to flood page cache and send out over to
> NFSD before LOCALIO handshake negotiates that the client and server
> are local. The knob is:
> echo N > /sys/module/nfs/parameters/localio_async_probe
I understand why you are adding this but .... yuck. Tuning knobs are
best avoided.
Maybe we could still do the probe async, but in mount wait for 200ms,
or for the probe to get a reply. That should make everyone happy.
NeilBrown
>
> Fixes: 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> ---
> fs/nfs/localio.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index e3ae003118cb..76ca9bd21d2e 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -49,6 +49,11 @@ struct nfs_local_fsync_ctx {
> static bool localio_enabled __read_mostly = true;
> module_param(localio_enabled, bool, 0644);
>
> +static bool localio_async_probe __read_mostly = true;
> +module_param(localio_async_probe, bool, 0644);
> +MODULE_PARM_DESC(localio_async_probe,
> + "Probe for LOCALIO support asynchronously.");
> +
> static bool localio_O_DIRECT_semantics __read_mostly = false;
> module_param(localio_O_DIRECT_semantics, bool, 0644);
> MODULE_PARM_DESC(localio_O_DIRECT_semantics,
> @@ -203,7 +208,10 @@ void nfs_local_probe_async_work(struct work_struct *work)
>
> void nfs_local_probe_async(struct nfs_client *clp)
> {
> - queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
> + if (likely(localio_async_probe))
> + queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
> + else
> + nfs_local_probe(clp);
> }
> EXPORT_SYMBOL_GPL(nfs_local_probe_async);
>
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-07-14 3:13 ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-14 4:19 ` NeilBrown
@ 2025-07-14 12:23 ` Jeff Layton
1 sibling, 0 replies; 56+ messages in thread
From: Jeff Layton @ 2025-07-14 12:23 UTC (permalink / raw)
To: Mike Snitzer, Anna Schumaker, Trond Myklebust, Chuck Lever,
NeilBrown
Cc: linux-nfs
On Sun, 2025-07-13 at 23:13 -0400, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
>
> Previously nfs_local_probe() was made to disable and then attempt to
> re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
> called and LOCALIO already enabled.
>
> Vague memory for _why_ this was the case is that this was useful
> if/when a local NFS server were to be restarted with a local NFS
> client connected to it.
>
> But as it happens this causes an absurd amount of LOCALIO flapping
> which has a side-effect of too much IO being needlessly sent to NFSD
> (using RPC over the loopback network interface). This is the
> definition of "serious performance loss" (that negates the point of
> having LOCALIO).
>
> So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
> server is restarted (which is an extremely rare thing to do). Will
> revisit testing that scenario again but in the meantime this patch
> restores the full benefit of LOCALIO.
>
> Fixes: 56bcd0f07fdb ("nfs: implement client support for NFS_LOCALIO_PROGRAM")
> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> ---
> fs/nfs/localio.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index a4bacd9a5052..e3ae003118cb 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
> return;
> }
>
> - if (nfs_client_is_local(clp)) {
> - /* If already enabled, disable and re-enable */
> - nfs_localio_disable_client(clp);
> - }
> + if (nfs_client_is_local(clp))
> + return;
>
> if (!nfs_uuid_begin(&clp->cl_uuid))
> return;
> @@ -241,7 +239,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> case -ENOMEM:
> case -ENXIO:
> case -ENOENT:
> - /* Revalidate localio, will disable if unsupported */
> + /* Revalidate localio */
> + nfs_localio_disable_client(clp);
> nfs_local_probe(clp);
> }
> }
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm
2025-07-14 4:23 ` NeilBrown
@ 2025-07-14 12:28 ` Jeff Layton
2025-07-14 14:08 ` Mike Snitzer
0 siblings, 1 reply; 56+ messages in thread
From: Jeff Layton @ 2025-07-14 12:28 UTC (permalink / raw)
To: NeilBrown, Mike Snitzer
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, linux-nfs
On Mon, 2025-07-14 at 14:23 +1000, NeilBrown wrote:
> On Mon, 14 Jul 2025, Mike Snitzer wrote:
> > From: Mike Snitzer <snitzer@hammerspace.com>
> >
> > This knob influences the LOCALIO handshake so that it happens
> > synchronously upon NFS client creation.. which reduces the window of
> > opportunity for a bunch of IO to flood page cache and send out over to
> > NFSD before LOCALIO handshake negotiates that the client and server
> > are local. The knob is:
> > echo N > /sys/module/nfs/parameters/localio_async_probe
>
> I understand why you are adding this but .... yuck. Tuning knobs are
> best avoided.
>
> Maybe we could still do the probe async, but in mount wait for 200ms,
> or for the probe to get a reply. That should make everyone happy.
>
> NeilBrown
>
Agreed. I'd prefer to not need a tuning knob for this. Doing a short
wait in the mount for localio negotiation would be a lot more
reasonable.
That said, why is the LOCALIO negotiation taking so long? Shouldn't
that be basically instantaneous during the mount? Do we really have
applications that are hammering reads and writes just after the mount
returns but before the localio negotiation completes?
>
> >
> > Fixes: 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
> > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> > ---
> > fs/nfs/localio.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index e3ae003118cb..76ca9bd21d2e 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -49,6 +49,11 @@ struct nfs_local_fsync_ctx {
> > static bool localio_enabled __read_mostly = true;
> > module_param(localio_enabled, bool, 0644);
> >
> > +static bool localio_async_probe __read_mostly = true;
> > +module_param(localio_async_probe, bool, 0644);
> > +MODULE_PARM_DESC(localio_async_probe,
> > + "Probe for LOCALIO support asynchronously.");
> > +
> > static bool localio_O_DIRECT_semantics __read_mostly = false;
> > module_param(localio_O_DIRECT_semantics, bool, 0644);
> > MODULE_PARM_DESC(localio_O_DIRECT_semantics,
> > @@ -203,7 +208,10 @@ void nfs_local_probe_async_work(struct work_struct *work)
> >
> > void nfs_local_probe_async(struct nfs_client *clp)
> > {
> > - queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
> > + if (likely(localio_async_probe))
> > + queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
> > + else
> > + nfs_local_probe(clp);
> > }
> > EXPORT_SYMBOL_GPL(nfs_local_probe_async);
> >
> > --
> > 2.44.0
> >
> >
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm
2025-07-14 12:28 ` Jeff Layton
@ 2025-07-14 14:08 ` Mike Snitzer
0 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 14:08 UTC (permalink / raw)
To: Jeff Layton
Cc: NeilBrown, Anna Schumaker, Trond Myklebust, Chuck Lever,
linux-nfs
On Mon, Jul 14, 2025 at 08:28:22AM -0400, Jeff Layton wrote:
> On Mon, 2025-07-14 at 14:23 +1000, NeilBrown wrote:
> > On Mon, 14 Jul 2025, Mike Snitzer wrote:
> > > From: Mike Snitzer <snitzer@hammerspace.com>
> > >
> > > This knob influences the LOCALIO handshake so that it happens
> > > synchronously upon NFS client creation.. which reduces the window of
> > > opportunity for a bunch of IO to flood page cache and send out over to
> > > NFSD before LOCALIO handshake negotiates that the client and server
> > > are local. The knob is:
> > > echo N > /sys/module/nfs/parameters/localio_async_probe
> >
> > I understand why you are adding this but .... yuck. Tuning knobs are
> > best avoided.
> >
> > Maybe we could still do the probe async, but in mount wait for 200ms,
> > or for the probe to get a reply. That should make everyone happy.
> >
> > NeilBrown
> >
>
> Agreed. I'd prefer to not need a tuning knob for this. Doing a short
> wait in the mount for localio negotiation would be a lot more
> reasonable.
OK.
> That said, why is the LOCALIO negotiation taking so long? Shouldn't
> that be basically instantaneous during the mount? Do we really have
> applications that are hammering reads and writes just after the mount
> returns but before the localio negotiation completes?
Yes, buffered IO in particular brings the hammer via a storm of page
cache IO that takes a while to be complete before LOCALIO ever gets an
opportunity to service NFS client IO.
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-07-14 4:19 ` NeilBrown
@ 2025-07-14 14:37 ` Mike Snitzer
0 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 14:37 UTC (permalink / raw)
To: NeilBrown
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
linux-nfs
On Mon, Jul 14, 2025 at 02:19:33PM +1000, NeilBrown wrote:
> On Mon, 14 Jul 2025, Mike Snitzer wrote:
> > From: Mike Snitzer <snitzer@hammerspace.com>
> >
> > Previously nfs_local_probe() was made to disable and then attempt to
> > re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
> > called and LOCALIO already enabled.
> >
> > Vague memory for _why_ this was the case is that this was useful
> > if/when a local NFS server were to be restarted with a local NFS
> > client connected to it.
> >
> > But as it happens this causes an absurd amount of LOCALIO flapping
> > which has a side-effect of too much IO being needlessly sent to NFSD
> > (using RPC over the loopback network interface). This is the
> > definition of "serious performance loss" (that negates the point of
> > having LOCALIO).
> >
> > So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
> > server is restarted (which is an extremely rare thing to do). Will
> > revisit testing that scenario again but in the meantime this patch
> > restores the full benefit of LOCALIO.
> >
> > Fixes: 56bcd0f07fdb ("nfs: implement client support for NFS_LOCALIO_PROGRAM")
> > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
>
> Reviewed-by: NeilBrown <neil@brown.name>
>
> I cannot see any justification for probing if localio is currently
> thought to be working. If for some reason it doesn't work, then when we
> notice that is a good time to disable - which it what this patch does.
Thanks for the review!
> However...
>
> > ---
> > fs/nfs/localio.c | 9 ++++-----
> > 1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index a4bacd9a5052..e3ae003118cb 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
> > return;
> > }
> >
> > - if (nfs_client_is_local(clp)) {
> > - /* If already enabled, disable and re-enable */
> > - nfs_localio_disable_client(clp);
> > - }
> > + if (nfs_client_is_local(clp))
> > + return;
> >
> > if (!nfs_uuid_begin(&clp->cl_uuid))
> > return;
> > @@ -241,7 +239,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> > case -ENOMEM:
> > case -ENXIO:
> > case -ENOENT:
> > - /* Revalidate localio, will disable if unsupported */
> > + /* Revalidate localio */
> > + nfs_localio_disable_client(clp);
> > nfs_local_probe(clp);
>
> Shouldn't that be nfs_local_probe_async() ???? I wonder why that wasn't
> changed in
> Commit 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
Creates the potential for many files being opened and each triggering
their own async probe. I reasoned it best to make this error path
wait. In practice that has worked reasonably...
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
2025-07-14 3:50 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" NeilBrown
@ 2025-07-14 14:45 ` Mike Snitzer
0 siblings, 0 replies; 56+ messages in thread
From: Mike Snitzer @ 2025-07-14 14:45 UTC (permalink / raw)
To: NeilBrown
Cc: Trond Myklebust, Anna Schumaker, Pali Rohár, Vincent Mailhol,
Chuck Lever, Jeff Layton, linux-nfs
On Mon, Jul 14, 2025 at 01:50:48PM +1000, NeilBrown wrote:
> On Wed, 09 Jul 2025, Mike Snitzer wrote:
> > [Preface: this revert makes it much less likely to "lose the race",
> > whereby causing nfsd_shutdown_net() to hang, so we'd do well to take
> > the time/care to properly fix whatever is lacking in Neil's commit
> > c25a89770d1f]
>
> Was this the first time you posted on this issue? If so it seem strange
> to start a discussion with a revert with out a clear undertstanding of
> the problem...
Might seem strange, but it seems strange for code to have gotten
upstream without having been properly tested to reveal such basic
issues. And when I embarked on what should've been a quick revert,
only to find that the series of changes weren't even bisect safe, that
only gave me more justification to rip all the code out in the hopes
of restoring known solid LOCALIO functionality (from v6.14).
>
> Maybe
>
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -177,7 +177,7 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
> /* nfs_close_local_fh() is doing the
> * close and we must wait. until it unlinks
> */
> - wait_var_event_spinlock(nfl,
> + wait_var_event_spinlock(nfl->nfs_uuid,
> list_first_entry_or_null(
> &nfs_uuid->files,
> struct nfs_file_localio,
>
>
> will fix the problem - I'm waiting on the wrong address, which could
> cause various things to hang.
Yes, as I just replied to your official patch posting for this, I will
test. But the "maybe" nature lacks confidence and that needs to
improve. ;) Are you able to test LOCALIO?
I'll work to get better at making time to treat your LOCALIO changes
as if they are my own and to fully embrace associated review/test
work. But that work never quite reaches the same level of investment
for me to do so as when I develop the change.. maybe "normal" but I
need to get past that. We're in this together!
Thanks for your time on this Neil!
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 0/3] Fix localio hangs
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
2025-07-14 3:50 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" NeilBrown
@ 2025-07-15 22:52 ` Trond Myklebust
2025-07-15 22:52 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
` (6 more replies)
2 siblings, 7 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-15 22:52 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
The following patch series fixes a series of issues with the current
localio code, as reported in the link
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
Trond Myklebust (3):
NFS/localio: nfs_close_local_fh() fix check for file closed
NFS/localio: nfs_uuid_put() fix the wait for file unlink events
NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
fs/nfs_common/nfslocalio.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
@ 2025-07-15 22:52 ` Trond Myklebust
2025-07-15 22:52 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events Trond Myklebust
` (5 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-15 22:52 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the struct nfs_file_localio is closed, its list entry will be empty,
but the nfs_uuid->files list might still contain other entries.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 05c7c16e37ab..64949c46c174 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -314,7 +314,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
rcu_read_unlock();
return;
}
- if (list_empty(&nfs_uuid->files)) {
+ if (list_empty(&nfl->list)) {
/* nfs_uuid_put() has started closing files, wait for it
* to finished
*/
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2025-07-15 22:52 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
@ 2025-07-15 22:52 ` Trond Myklebust
2025-07-15 22:52 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
` (4 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-15 22:52 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
No reference to nfl is held when waiting in nfs_uuid_put(), so not only
must the event condition check if the first entry in the list has
changed, it must also check if the nfl->nfs_uuid field is still NULL,
in case the old entry was replaced.
Also change the event variable to be nfs_uuid, for the same reason that
no reference is held to nfl.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 64949c46c174..d157fdc068d7 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -177,12 +177,13 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* nfs_close_local_fh() is doing the
* close and we must wait. until it unlinks
*/
- wait_var_event_spinlock(nfl,
- list_first_entry_or_null(
- &nfs_uuid->files,
- struct nfs_file_localio,
- list) != nfl,
- &nfs_uuid->lock);
+ wait_var_event_spinlock(
+ nfs_uuid,
+ list_first_entry_or_null(
+ &nfs_uuid->files,
+ struct nfs_file_localio, list) != nfl ||
+ rcu_access_pointer(nfl->nfs_uuid),
+ &nfs_uuid->lock);
continue;
}
@@ -338,7 +339,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
*/
spin_lock(&nfs_uuid->lock);
list_del_init(&nfl->list);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
+ wake_up_var_locked(nfs_uuid, &nfs_uuid->lock);
spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2025-07-15 22:52 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-15 22:52 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events Trond Myklebust
@ 2025-07-15 22:52 ` Trond Myklebust
2025-07-16 1:09 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed NeilBrown
` (3 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-15 22:52 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
After setting nfl->nfs_uuid to NULL, dereferences of nfl should be
avoided, since there are no further guarantees that the memory is still
allocated.
Also change the wake_up_var_locked() to be a regular wake_up_var(),
since nfs_close_local_fh() cannot retake the nfs_uuid->lock after
dropping it.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index d157fdc068d7..27ad5263e400 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -199,8 +199,8 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* Now we can allow racing nfs_close_local_fh() to
* skip the locking.
*/
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
+ rcu_assign_pointer(nfl->nfs_uuid, NULL);
+ wake_up_var(nfl);
}
/* Remove client from nn->local_clients */
@@ -321,8 +321,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
*/
spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
- wait_var_event(&nfl->nfs_uuid,
- rcu_access_pointer(nfl->nfs_uuid) == NULL);
+ wait_var_event(nfl, rcu_access_pointer(nfl->nfs_uuid) == NULL);
return;
}
/* tell nfs_uuid_put() to wait for us */
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
` (2 preceding siblings ...)
2025-07-15 22:52 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
@ 2025-07-16 1:09 ` NeilBrown
2025-07-16 1:22 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events NeilBrown
` (2 subsequent siblings)
6 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-07-16 1:09 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 16 Jul 2025, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> If the struct nfs_file_localio is closed, its list entry will be empty,
> but the nfs_uuid->files list might still contain other entries.
>
> Acked-by: Mike Snitzer <snitzer@kernel.org>
> Tested-by: Mike Snitzer <snitzer@kernel.org>
> Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfs_common/nfslocalio.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 05c7c16e37ab..64949c46c174 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -314,7 +314,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> rcu_read_unlock();
> return;
> }
> - if (list_empty(&nfs_uuid->files)) {
> + if (list_empty(&nfl->list)) {
Yes of course... This must match:
/* Remove nfl from nfs_uuid->files list */
list_del_init(&nfl->list);
spin_unlock(&nfs_uuid->lock);
in nfs_uuid_put(). If nfs_uuid_put() disconnects nfl from the list
first, nfs_close_local_fh() must skip the closing and wait for
->nfs_uuid to become NULL. So it really must be testing the same
list_head.
Reviewed-by: NeilBrown <neil@brown.name>
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
` (3 preceding siblings ...)
2025-07-16 1:09 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed NeilBrown
@ 2025-07-16 1:22 ` NeilBrown
2025-07-16 2:29 ` Trond Myklebust
2025-07-16 1:31 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NeilBrown
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
6 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-16 1:22 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 16 Jul 2025, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> No reference to nfl is held when waiting in nfs_uuid_put(), so not only
> must the event condition check if the first entry in the list has
> changed, it must also check if the nfl->nfs_uuid field is still NULL,
> in case the old entry was replaced.
As no reference is held to nfl, it cannot be safe to check if
nfl->nfs_uuid is still NULL. It could be freed and the memory reused
for anything.
At this point, with nfs_uuid->net set to NULL, nothing can be added to
nfs_uuid->files. Things can only be removed.
So if list_first_entry_or_null stops being nfl, then we know that nfl
has been removed from the list and cannot possibly be added again.
This must have been done by nfs_close_local_fh() which set ->nfs_uuid to
NULL so that nfs_uuid_put() decided to wait.
>
> Also change the event variable to be nfs_uuid, for the same reason that
> no reference is held to nfl.
The event variable is never dereferenced so we don't need to hold a
reference to it. The address is hashed to choose a wait queue - that is
all it is used for.
So it is perfectly safe to wait on &nfl->nfs_uuid or to wake it up
without a reference to nfl. All that matters is that the waker and the
waiter use the same address.
So I believe this patch is wrong. The extra test on nfl->nfs_uuid is
incorrect and not needed. The change to the event variable is not needed
except that they must both be the same (which is what my earlier patch
did).
Thanks,
NeilBrown
>
> Acked-by: Mike Snitzer <snitzer@kernel.org>
> Tested-by: Mike Snitzer <snitzer@kernel.org>
> Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfs_common/nfslocalio.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 64949c46c174..d157fdc068d7 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -177,12 +177,13 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
> /* nfs_close_local_fh() is doing the
> * close and we must wait. until it unlinks
> */
> - wait_var_event_spinlock(nfl,
> - list_first_entry_or_null(
> - &nfs_uuid->files,
> - struct nfs_file_localio,
> - list) != nfl,
> - &nfs_uuid->lock);
> + wait_var_event_spinlock(
> + nfs_uuid,
> + list_first_entry_or_null(
> + &nfs_uuid->files,
> + struct nfs_file_localio, list) != nfl ||
> + rcu_access_pointer(nfl->nfs_uuid),
> + &nfs_uuid->lock);
> continue;
> }
>
> @@ -338,7 +339,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> */
> spin_lock(&nfs_uuid->lock);
> list_del_init(&nfl->list);
> - wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
> + wake_up_var_locked(nfs_uuid, &nfs_uuid->lock);
> spin_unlock(&nfs_uuid->lock);
> }
> EXPORT_SYMBOL_GPL(nfs_close_local_fh);
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
` (4 preceding siblings ...)
2025-07-16 1:22 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events NeilBrown
@ 2025-07-16 1:31 ` NeilBrown
2025-07-16 4:17 ` Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
6 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-16 1:31 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 16 Jul 2025, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> After setting nfl->nfs_uuid to NULL, dereferences of nfl should be
> avoided, since there are no further guarantees that the memory is still
> allocated.
nfl is not being dereferenced here. The difference between using "nfl"
and "&nfl->nfs_uuid" as the event variable is simply some address
arithmetic. As long as both are the same it doesn't matter which is
used.
> Also change the wake_up_var_locked() to be a regular wake_up_var(),
> since nfs_close_local_fh() cannot retake the nfs_uuid->lock after
> dropping it.
The point of wake_up_var_locked() is to document why wake_up_var() is
safe. In general you need a barrier between an assignment and a
wake_up_var(). I would like to eventually remove all wake_up_var()
calls, replacing them with other calls which document why the wakeup is
safe (e.g. store_release_wake_up(), atomic_dec_and_wake_up()). In this
case it is safe because both the waker and the waiter hold the same
spinlock. I would like that documentation to remain.
Also the change from RCU_INIT_POINTER() to rcu_assign_pointer() is not
documented and not needed. The primary purpose of rcu_assign_pointer()
is to include a barer so that anything that the new value points to will
be visible to a concurrent reader that uses rcu_dereference_pointer().
As NULL doesn't point to anything, no possible barrier is needed and
RCU_INIT_POINTER() is the correct interface to use.
So I think this patch is unnecessary and while it doesn't change
behaviour in any important way, it does make the intention of the code a
little less clear.
Thanks,
NeilBrown
>
> Acked-by: Mike Snitzer <snitzer@kernel.org>
> Tested-by: Mike Snitzer <snitzer@kernel.org>
> Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfs_common/nfslocalio.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index d157fdc068d7..27ad5263e400 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -199,8 +199,8 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
> /* Now we can allow racing nfs_close_local_fh() to
> * skip the locking.
> */
> - RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
> - wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
> + rcu_assign_pointer(nfl->nfs_uuid, NULL);
> + wake_up_var(nfl);
> }
>
> /* Remove client from nn->local_clients */
> @@ -321,8 +321,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
> */
> spin_unlock(&nfs_uuid->lock);
> rcu_read_unlock();
> - wait_var_event(&nfl->nfs_uuid,
> - rcu_access_pointer(nfl->nfs_uuid) == NULL);
> + wait_var_event(nfl, rcu_access_pointer(nfl->nfs_uuid) == NULL);
> return;
> }
> /* tell nfs_uuid_put() to wait for us */
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events
2025-07-16 1:22 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events NeilBrown
@ 2025-07-16 2:29 ` Trond Myklebust
2025-07-16 3:51 ` NeilBrown
0 siblings, 1 reply; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 2:29 UTC (permalink / raw)
To: NeilBrown; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 2025-07-16 at 11:22 +1000, NeilBrown wrote:
> On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > No reference to nfl is held when waiting in nfs_uuid_put(), so not
> > only
> > must the event condition check if the first entry in the list has
> > changed, it must also check if the nfl->nfs_uuid field is still
> > NULL,
> > in case the old entry was replaced.
>
> As no reference is held to nfl, it cannot be safe to check if
> nfl->nfs_uuid is still NULL. It could be freed and the memory reused
> for anything.
It is quite safe.
The test first checks if the nfs_uuid->files list first entry still
points to 'nfl'. Then it checks the value of nfl->nfs_uuid.
All this happens while the nfs_uuid->lock is held.
>
> At this point, with nfs_uuid->net set to NULL, nothing can be added
> to
> nfs_uuid->files. Things can only be removed.
There is nothing in either nfs_open_local_fh() or nfs_uuid_add_file()
that stops anyone from adding a new entry to nfs_uuid->files in the
case where nfs_uuid->net is NULL.
If that was the intention, then I agree that this patch is wrong, but
not for the reasons you listed.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events
2025-07-16 2:29 ` Trond Myklebust
@ 2025-07-16 3:51 ` NeilBrown
0 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-07-16 3:51 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 16 Jul 2025, Trond Myklebust wrote:
> On Wed, 2025-07-16 at 11:22 +1000, NeilBrown wrote:
> > On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > No reference to nfl is held when waiting in nfs_uuid_put(), so not
> > > only
> > > must the event condition check if the first entry in the list has
> > > changed, it must also check if the nfl->nfs_uuid field is still
> > > NULL,
> > > in case the old entry was replaced.
> >
> > As no reference is held to nfl, it cannot be safe to check if
> > nfl->nfs_uuid is still NULL. It could be freed and the memory reused
> > for anything.
>
> It is quite safe.
>
> The test first checks if the nfs_uuid->files list first entry still
> points to 'nfl'. Then it checks the value of nfl->nfs_uuid.
>
> All this happens while the nfs_uuid->lock is held.
Ok, agreed. It is safe.
>
> >
> > At this point, with nfs_uuid->net set to NULL, nothing can be added
> > to
> > nfs_uuid->files. Things can only be removed.
>
> There is nothing in either nfs_open_local_fh() or nfs_uuid_add_file()
> that stops anyone from adding a new entry to nfs_uuid->files in the
> case where nfs_uuid->net is NULL.
nfs_open_local_fh() starts:
rcu_read_lock();
net = rcu_dereference(uuid->net);
if (!net || !nfs_to->nfsd_net_try_get(net)) {
rcu_read_unlock();
return ERR_PTR(-ENXIO);
}
rcu_read_unlock();
so if uuid->net is NULL, it will return and not add to the list.
Of course there could be races with nfs_put_uuid() called after that
check. The nfsd_net_try_get(net) check is supposed to handle that case.
But I cannot convince myself that it does. In particular expkey_flush()
calls nfsd_file_cache_purge() which calls
nfs_localio_invalidate_clients(). This can happen when
nfsd_net_try_get() will succeed.
So I agree that things could get added to the nfs_uuid->list after ->net
has been set to NULL, but that is a bug. nfs_uuid_put() shouldn't be
checking for that.
I don't think nfsd_file_cache_purge() should be invalidating the clients
in this case - just the files that the clients have open so they are
forced to re-open.
I'm not sure how best to resolve this at present.
Thanks,
NeilBrown
>
> If that was the intention, then I agree that this patch is wrong, but
> not for the reasons you listed.
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-16 1:31 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NeilBrown
@ 2025-07-16 4:17 ` Trond Myklebust
2025-07-16 5:07 ` NeilBrown
0 siblings, 1 reply; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 4:17 UTC (permalink / raw)
To: NeilBrown; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 2025-07-16 at 11:31 +1000, NeilBrown wrote:
> On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > After setting nfl->nfs_uuid to NULL, dereferences of nfl should be
> > avoided, since there are no further guarantees that the memory is
> > still
> > allocated.
>
> nfl is not being dereferenced here. The difference between using
> "nfl"
> and "&nfl->nfs_uuid" as the event variable is simply some address
> arithmetic. As long as both are the same it doesn't matter which is
> used.
>
>
> > Also change the wake_up_var_locked() to be a regular wake_up_var(),
> > since nfs_close_local_fh() cannot retake the nfs_uuid->lock after
> > dropping it.
>
> The point of wake_up_var_locked() is to document why wake_up_var() is
> safe. In general you need a barrier between an assignment and a
> wake_up_var(). I would like to eventually remove all wake_up_var()
> calls, replacing them with other calls which document why the wakeup
> is
> safe (e.g. store_release_wake_up(), atomic_dec_and_wake_up()). In
> this
> case it is safe because both the waker and the waiter hold the same
> spinlock. I would like that documentation to remain.
The documentation is wrong. The waiter and waker do not both hold the
spin lock.
nfs_close_local_fh() calls wait_var_event() after it has dropped the
nfs_uuid->lock, and it has no guarantee that nfs_uuid still exists
after that is the case.
In order to guarantee that, it would have to go through the whole
rcu_dereference(nfl->nfs_uuid) rhumba from beginning of the call again.
The point of the rcu_assign_pointer() is therefore to add the barrier
that is missing before the call to wake_up_var().
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-16 4:17 ` Trond Myklebust
@ 2025-07-16 5:07 ` NeilBrown
2025-07-16 15:19 ` Trond Myklebust
0 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-16 5:07 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 16 Jul 2025, Trond Myklebust wrote:
> On Wed, 2025-07-16 at 11:31 +1000, NeilBrown wrote:
> > On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > After setting nfl->nfs_uuid to NULL, dereferences of nfl should be
> > > avoided, since there are no further guarantees that the memory is
> > > still
> > > allocated.
> >
> > nfl is not being dereferenced here. The difference between using
> > "nfl"
> > and "&nfl->nfs_uuid" as the event variable is simply some address
> > arithmetic. As long as both are the same it doesn't matter which is
> > used.
> >
> >
> > > Also change the wake_up_var_locked() to be a regular wake_up_var(),
> > > since nfs_close_local_fh() cannot retake the nfs_uuid->lock after
> > > dropping it.
> >
> > The point of wake_up_var_locked() is to document why wake_up_var() is
> > safe. In general you need a barrier between an assignment and a
> > wake_up_var(). I would like to eventually remove all wake_up_var()
> > calls, replacing them with other calls which document why the wakeup
> > is
> > safe (e.g. store_release_wake_up(), atomic_dec_and_wake_up()). In
> > this
> > case it is safe because both the waker and the waiter hold the same
> > spinlock. I would like that documentation to remain.
>
>
> The documentation is wrong. The waiter and waker do not both hold the
> spin lock.
True. In that case it would make sense to use store_release_wake_up()
in nfs_uuid_put(). Though that doesn't have the right rcu
annotations....
I think
store_release_wake_up(&nfl->nfs_uuid, RCU_INITIALIZER(NULL));
would be correct.
>
> nfs_close_local_fh() calls wait_var_event() after it has dropped the
> nfs_uuid->lock, and it has no guarantee that nfs_uuid still exists
> after that is the case.
> In order to guarantee that, it would have to go through the whole
> rcu_dereference(nfl->nfs_uuid) rhumba from beginning of the call again.
>
> The point of the rcu_assign_pointer() is therefore to add the barrier
> that is missing before the call to wake_up_var().
rcu_assign_pointer()s add a barrier before the assignment. wake_up_var()
requires a barrier after the assignment.
In fact, when the val is NULL, rcu_assign_pointer() doesn't even include
that barrier - it acts exactly like RCU_INIT_POINTER() - interesting.
Thanks,
NeilBrown
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-16 5:07 ` NeilBrown
@ 2025-07-16 15:19 ` Trond Myklebust
0 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 15:19 UTC (permalink / raw)
To: NeilBrown; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Wed, 2025-07-16 at 15:07 +1000, NeilBrown wrote:
> On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > On Wed, 2025-07-16 at 11:31 +1000, NeilBrown wrote:
> > > On Wed, 16 Jul 2025, Trond Myklebust wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > >
> > > > After setting nfl->nfs_uuid to NULL, dereferences of nfl should
> > > > be
> > > > avoided, since there are no further guarantees that the memory
> > > > is
> > > > still
> > > > allocated.
> > >
> > > nfl is not being dereferenced here. The difference between using
> > > "nfl"
> > > and "&nfl->nfs_uuid" as the event variable is simply some address
> > > arithmetic. As long as both are the same it doesn't matter which
> > > is
> > > used.
> > >
> > >
> > > > Also change the wake_up_var_locked() to be a regular
> > > > wake_up_var(),
> > > > since nfs_close_local_fh() cannot retake the nfs_uuid->lock
> > > > after
> > > > dropping it.
> > >
> > > The point of wake_up_var_locked() is to document why
> > > wake_up_var() is
> > > safe. In general you need a barrier between an assignment and a
> > > wake_up_var(). I would like to eventually remove all
> > > wake_up_var()
> > > calls, replacing them with other calls which document why the
> > > wakeup
> > > is
> > > safe (e.g. store_release_wake_up(), atomic_dec_and_wake_up()).
> > > In
> > > this
> > > case it is safe because both the waker and the waiter hold the
> > > same
> > > spinlock. I would like that documentation to remain.
> >
> >
> > The documentation is wrong. The waiter and waker do not both hold
> > the
> > spin lock.
>
> True. In that case it would make sense to use
> store_release_wake_up()
> in nfs_uuid_put(). Though that doesn't have the right rcu
> annotations....
> I think
> store_release_wake_up(&nfl->nfs_uuid, RCU_INITIALIZER(NULL));
> would be correct.
>
> >
> > nfs_close_local_fh() calls wait_var_event() after it has dropped
> > the
> > nfs_uuid->lock, and it has no guarantee that nfs_uuid still exists
> > after that is the case.
> > In order to guarantee that, it would have to go through the whole
> > rcu_dereference(nfl->nfs_uuid) rhumba from beginning of the call
> > again.
> >
> > The point of the rcu_assign_pointer() is therefore to add the
> > barrier
> > that is missing before the call to wake_up_var().
>
> rcu_assign_pointer()s add a barrier before the assignment.
> wake_up_var()
> requires a barrier after the assignment.
> In fact, when the val is NULL, rcu_assign_pointer() doesn't even
> include
> that barrier - it acts exactly like RCU_INIT_POINTER() - interesting.
>
Fair enough. I have a v2 of this patchset that Mike is testing out, and
that should fix this issue as well as the ones raised by the second
patch. I'll post once he is satisfied.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH v2 0/3] Fix localio hangs
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
` (5 preceding siblings ...)
2025-07-16 1:31 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NeilBrown
@ 2025-07-16 15:59 ` Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
` (3 more replies)
6 siblings, 4 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 15:59 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
The following patch series fixes a series of issues with the current
localio code, as reported in the link
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
Trond Myklebust (3):
NFS/localio: nfs_close_local_fh() fix check for file closed
NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
fs/nfs_common/nfslocalio.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
@ 2025-07-16 15:59 ` Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 2/3] NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() Trond Myklebust
` (2 subsequent siblings)
3 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 15:59 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the struct nfs_file_localio is closed, its list entry will be empty,
but the nfs_uuid->files list might still contain other entries.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 05c7c16e37ab..64949c46c174 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -314,7 +314,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
rcu_read_unlock();
return;
}
- if (list_empty(&nfs_uuid->files)) {
+ if (list_empty(&nfl->list)) {
/* nfs_uuid_put() has started closing files, wait for it
* to finished
*/
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v2 2/3] NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
@ 2025-07-16 15:59 ` Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
2025-07-16 22:09 ` [PATCH v2 0/3] Fix localio hangs NeilBrown
3 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 15:59 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
In order for the wait in nfs_uuid_put() to be safe, it is necessary to
ensure that nfs_uuid_add_file() doesn't add a new entry once the
nfs_uuid->net has been NULLed out.
Also fix up the wake_up_var_locked() / wait_var_event_spinlock() to both
use the nfs_uuid address, since nfl, and &nfl->uuid could be used elsewhere.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/all/175262893035.2234665.1735173020338594784@noble.neil.brown.name/
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 64949c46c174..f1f1592ac134 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -177,7 +177,7 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* nfs_close_local_fh() is doing the
* close and we must wait. until it unlinks
*/
- wait_var_event_spinlock(nfl,
+ wait_var_event_spinlock(nfs_uuid,
list_first_entry_or_null(
&nfs_uuid->files,
struct nfs_file_localio,
@@ -243,15 +243,20 @@ void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
}
EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
-static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
+static int nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
{
+ int ret = 0;
+
/* Add nfl to nfs_uuid->files if it isn't already */
spin_lock(&nfs_uuid->lock);
- if (list_empty(&nfl->list)) {
+ if (rcu_access_pointer(nfs_uuid->net) == NULL) {
+ ret = -ENXIO;
+ } else if (list_empty(&nfl->list)) {
rcu_assign_pointer(nfl->nfs_uuid, nfs_uuid);
list_add_tail(&nfl->list, &nfs_uuid->files);
}
spin_unlock(&nfs_uuid->lock);
+ return ret;
}
/*
@@ -285,11 +290,13 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
}
rcu_read_unlock();
/* We have an implied reference to net thanks to nfsd_net_try_get */
- localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
- cred, nfs_fh, pnf, fmode);
+ localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt, cred,
+ nfs_fh, pnf, fmode);
+ if (!IS_ERR(localio) && nfs_uuid_add_file(uuid, nfl) < 0) {
+ /* Delete the cached file when racing with nfs_uuid_put() */
+ nfs_to_nfsd_file_put_local(pnf);
+ }
nfs_to_nfsd_net_put(net);
- if (!IS_ERR(localio))
- nfs_uuid_add_file(uuid, nfl);
return localio;
}
@@ -338,7 +345,7 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
*/
spin_lock(&nfs_uuid->lock);
list_del_init(&nfl->list);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
+ wake_up_var_locked(nfs_uuid, &nfs_uuid->lock);
spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_close_local_fh);
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v2 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 2/3] NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() Trond Myklebust
@ 2025-07-16 15:59 ` Trond Myklebust
2025-07-16 22:09 ` [PATCH v2 0/3] Fix localio hangs NeilBrown
3 siblings, 0 replies; 56+ messages in thread
From: Trond Myklebust @ 2025-07-16 15:59 UTC (permalink / raw)
To: Anna Schumaker; +Cc: NeilBrown, Mike Snitzer, linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Use store_release_wake_up() instead of wake_up_var_locked(), because the
waiter cannot retake the nfs_uuid->lock.
Acked-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Suggested-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/all/175262948827.2234665.1891349021754495573@noble.neil.brown.name/
Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs_common/nfslocalio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f1f1592ac134..dd715cdb6c04 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -198,8 +198,7 @@ static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
/* Now we can allow racing nfs_close_local_fh() to
* skip the locking.
*/
- RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
- wake_up_var_locked(&nfl->nfs_uuid, &nfs_uuid->lock);
+ store_release_wake_up(&nfl->nfs_uuid, RCU_INITIALIZER(NULL));
}
/* Remove client from nn->local_clients */
--
2.50.1
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH v2 0/3] Fix localio hangs
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
` (2 preceding siblings ...)
2025-07-16 15:59 ` [PATCH v2 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
@ 2025-07-16 22:09 ` NeilBrown
2025-07-16 23:27 ` Mike Snitzer
3 siblings, 1 reply; 56+ messages in thread
From: NeilBrown @ 2025-07-16 22:09 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Anna Schumaker, Mike Snitzer, linux-nfs
On Thu, 17 Jul 2025, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The following patch series fixes a series of issues with the current
> localio code, as reported in the link
> https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
>
>
> Trond Myklebust (3):
> NFS/localio: nfs_close_local_fh() fix check for file closed
> NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
That all looks good to me - thanks a lot for finding and fixing my bugs.
Reviewed-by: NeilBrown <neil@brown.name>
I'd still like to fix the nfsd_file_cache_purge() issue but that is
quite separate especially now that you've prevented it causing problems
for nfs_uuid_put().
thanks,
NeilBrown
>
> fs/nfs_common/nfslocalio.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> --
> 2.50.1
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v2 0/3] Fix localio hangs
2025-07-16 22:09 ` [PATCH v2 0/3] Fix localio hangs NeilBrown
@ 2025-07-16 23:27 ` Mike Snitzer
2025-07-18 0:18 ` NeilBrown
0 siblings, 1 reply; 56+ messages in thread
From: Mike Snitzer @ 2025-07-16 23:27 UTC (permalink / raw)
To: NeilBrown; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs
On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > The following patch series fixes a series of issues with the current
> > localio code, as reported in the link
> > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
> >
> >
> > Trond Myklebust (3):
> > NFS/localio: nfs_close_local_fh() fix check for file closed
> > NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> > NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
>
> That all looks good to me - thanks a lot for finding and fixing my bugs.
>
> Reviewed-by: NeilBrown <neil@brown.name>
>
> I'd still like to fix the nfsd_file_cache_purge() issue but that is
> quite separate especially now that you've prevented it causing problems
> for nfs_uuid_put().
>
> thanks,
> NeilBrown
Unfortunately even with these 3 v2 fixes I was just able to hit the
same hang on NFSD shutdown. It took 5 iterations of the fio test,
reported here:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
So it is harder to hit with these v2 fixes, nevertheless:
[ 369.528839] task:rpc.nfsd state:D stack:0 pid:10569 tgid:10569 ppid:1 flags:0x00004006
[ 369.528985] Call Trace:
[ 369.529127] <TASK>
[ 369.529295] __schedule+0x26d/0x530
[ 369.529435] schedule+0x27/0xa0
[ 369.529566] schedule_timeout+0x14e/0x160
[ 369.529700] ? svc_destroy+0xce/0x160 [sunrpc]
[ 369.529882] ? lockd_put+0x5f/0x90 [lockd]
[ 369.530022] __wait_for_common+0x8f/0x1d0
[ 369.530154] ? __pfx_schedule_timeout+0x10/0x10
[ 369.530329] nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
[ 369.530516] nfsd_svc+0xe0/0x170 [nfsd]
[ 369.530684] write_threads+0xc3/0x190 [nfsd]
[ 369.530845] ? simple_transaction_get+0xc2/0xe0
[ 369.530973] ? __pfx_write_threads+0x10/0x10 [nfsd]
[ 369.531133] nfsctl_transaction_write+0x47/0x80 [nfsd]
[ 369.531324] vfs_write+0xfa/0x420
[ 369.531448] ? do_filp_open+0xae/0x150
[ 369.531574] ksys_write+0x63/0xe0
[ 369.531693] do_syscall_64+0x7d/0x160
[ 369.531816] ? do_sys_openat2+0x81/0xd0
[ 369.531937] ? syscall_exit_work+0xf3/0x120
[ 369.532058] ? syscall_exit_to_user_mode+0x32/0x1b0
[ 369.532178] ? do_syscall_64+0x89/0x160
[ 369.532344] ? __mod_memcg_lruvec_state+0x95/0x150
[ 369.532465] ? __lruvec_stat_mod_folio+0x84/0xd0
[ 369.532584] ? syscall_exit_work+0xf3/0x120
[ 369.532705] ? syscall_exit_to_user_mode+0x32/0x1b0
[ 369.532827] ? do_syscall_64+0x89/0x160
[ 369.532947] ? __handle_mm_fault+0x326/0x730
[ 369.533066] ? __mod_memcg_lruvec_state+0x95/0x150
[ 369.533187] ? __count_memcg_events+0x53/0xf0
[ 369.533306] ? handle_mm_fault+0x245/0x340
[ 369.533427] ? do_user_addr_fault+0x341/0x6b0
[ 369.533547] ? exc_page_fault+0x70/0x160
[ 369.533666] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 369.533787] RIP: 0033:0x7f1db10fd617
crash> dis -l nfsd_destroy_serv+0x13f
/root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
0xffffffffc172e36f <nfsd_destroy_serv+319>: mov %r12,%rdi
which is the percpu_ref_exit() in nfsd_shutdown_net():
static void nfsd_shutdown_net(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
if (!nn->nfsd_net_up)
return;
percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
wait_for_completion(&nn->nfsd_net_confirm_done);
nfsd_export_flush(net);
nfs4_state_shutdown_net(net);
nfsd_reply_cache_shutdown(nn);
nfsd_file_cache_shutdown_net(net);
if (nn->lockd_up) {
lockd_down(net);
nn->lockd_up = false;
}
wait_for_completion(&nn->nfsd_net_free_done);
---> percpu_ref_exit(&nn->nfsd_net_ref);
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
}
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v2 0/3] Fix localio hangs
2025-07-16 23:27 ` Mike Snitzer
@ 2025-07-18 0:18 ` NeilBrown
0 siblings, 0 replies; 56+ messages in thread
From: NeilBrown @ 2025-07-18 0:18 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs
On Thu, 17 Jul 2025, Mike Snitzer wrote:
> On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> > On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > The following patch series fixes a series of issues with the current
> > > localio code, as reported in the link
> > > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
> > >
> > >
> > > Trond Myklebust (3):
> > > NFS/localio: nfs_close_local_fh() fix check for file closed
> > > NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> > > NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
> >
> > That all looks good to me - thanks a lot for finding and fixing my bugs.
> >
> > Reviewed-by: NeilBrown <neil@brown.name>
> >
> > I'd still like to fix the nfsd_file_cache_purge() issue but that is
> > quite separate especially now that you've prevented it causing problems
> > for nfs_uuid_put().
> >
> > thanks,
> > NeilBrown
>
> Unfortunately even with these 3 v2 fixes I was just able to hit the
> same hang on NFSD shutdown. It took 5 iterations of the fio test,
> reported here:
> https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
> So it is harder to hit with these v2 fixes, nevertheless:
>
> [ 369.528839] task:rpc.nfsd state:D stack:0 pid:10569 tgid:10569 ppid:1 flags:0x00004006
Are there any other tasks which are in "state:D", or any nfsd or nfs
processes that are waiting in any state?
I'll see if I can work out any way that an nfsd_net_ref reference might leak.
Thanks,
NeilBrown
> [ 369.528985] Call Trace:
> [ 369.529127] <TASK>
> [ 369.529295] __schedule+0x26d/0x530
> [ 369.529435] schedule+0x27/0xa0
> [ 369.529566] schedule_timeout+0x14e/0x160
> [ 369.529700] ? svc_destroy+0xce/0x160 [sunrpc]
> [ 369.529882] ? lockd_put+0x5f/0x90 [lockd]
> [ 369.530022] __wait_for_common+0x8f/0x1d0
> [ 369.530154] ? __pfx_schedule_timeout+0x10/0x10
> [ 369.530329] nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
> [ 369.530516] nfsd_svc+0xe0/0x170 [nfsd]
> [ 369.530684] write_threads+0xc3/0x190 [nfsd]
> [ 369.530845] ? simple_transaction_get+0xc2/0xe0
> [ 369.530973] ? __pfx_write_threads+0x10/0x10 [nfsd]
> [ 369.531133] nfsctl_transaction_write+0x47/0x80 [nfsd]
> [ 369.531324] vfs_write+0xfa/0x420
> [ 369.531448] ? do_filp_open+0xae/0x150
> [ 369.531574] ksys_write+0x63/0xe0
> [ 369.531693] do_syscall_64+0x7d/0x160
> [ 369.531816] ? do_sys_openat2+0x81/0xd0
> [ 369.531937] ? syscall_exit_work+0xf3/0x120
> [ 369.532058] ? syscall_exit_to_user_mode+0x32/0x1b0
> [ 369.532178] ? do_syscall_64+0x89/0x160
> [ 369.532344] ? __mod_memcg_lruvec_state+0x95/0x150
> [ 369.532465] ? __lruvec_stat_mod_folio+0x84/0xd0
> [ 369.532584] ? syscall_exit_work+0xf3/0x120
> [ 369.532705] ? syscall_exit_to_user_mode+0x32/0x1b0
> [ 369.532827] ? do_syscall_64+0x89/0x160
> [ 369.532947] ? __handle_mm_fault+0x326/0x730
> [ 369.533066] ? __mod_memcg_lruvec_state+0x95/0x150
> [ 369.533187] ? __count_memcg_events+0x53/0xf0
> [ 369.533306] ? handle_mm_fault+0x245/0x340
> [ 369.533427] ? do_user_addr_fault+0x341/0x6b0
> [ 369.533547] ? exc_page_fault+0x70/0x160
> [ 369.533666] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 369.533787] RIP: 0033:0x7f1db10fd617
>
> crash> dis -l nfsd_destroy_serv+0x13f
> /root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
> 0xffffffffc172e36f <nfsd_destroy_serv+319>: mov %r12,%rdi
>
> which is the percpu_ref_exit() in nfsd_shutdown_net():
>
> static void nfsd_shutdown_net(struct net *net)
> {
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>
> if (!nn->nfsd_net_up)
> return;
>
> percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
> wait_for_completion(&nn->nfsd_net_confirm_done);
>
> nfsd_export_flush(net);
> nfs4_state_shutdown_net(net);
> nfsd_reply_cache_shutdown(nn);
> nfsd_file_cache_shutdown_net(net);
> if (nn->lockd_up) {
> lockd_down(net);
> nn->lockd_up = false;
> }
>
> wait_for_completion(&nn->nfsd_net_free_done);
> ---> percpu_ref_exit(&nn->nfsd_net_ref);
>
> nn->nfsd_net_up = false;
> nfsd_shutdown_generic();
> }
>
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2025-07-18 0:18 UTC | newest]
Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09 0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
2025-05-09 0:46 ` [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio NeilBrown
2025-05-09 0:46 ` [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref NeilBrown
2025-05-09 0:46 ` [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting nfsd_file NeilBrown
2025-05-09 0:46 ` [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh() NeilBrown
2025-05-09 0:46 ` [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh() NeilBrown
2025-05-09 0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
2025-05-09 11:03 ` kernel test robot
2025-07-08 14:20 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 2/9] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 3/9] Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 4/9] Revert "nfs_localio: duplicate nfs_close_local_fh()" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 5/9] Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 6/9] Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 7/9] Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio" Mike Snitzer
2025-07-14 3:13 ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-14 4:19 ` NeilBrown
2025-07-14 14:37 ` Mike Snitzer
2025-07-14 12:23 ` Jeff Layton
2025-07-14 3:13 ` [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm Mike Snitzer
2025-07-14 4:23 ` NeilBrown
2025-07-14 12:28 ` Jeff Layton
2025-07-14 14:08 ` Mike Snitzer
2025-07-14 3:50 ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" NeilBrown
2025-07-14 14:45 ` Mike Snitzer
2025-07-15 22:52 ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2025-07-15 22:52 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-15 22:52 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events Trond Myklebust
2025-07-15 22:52 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
2025-07-16 1:09 ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed NeilBrown
2025-07-16 1:22 ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events NeilBrown
2025-07-16 2:29 ` Trond Myklebust
2025-07-16 3:51 ` NeilBrown
2025-07-16 1:31 ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NeilBrown
2025-07-16 4:17 ` Trond Myklebust
2025-07-16 5:07 ` NeilBrown
2025-07-16 15:19 ` Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 2/3] NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() Trond Myklebust
2025-07-16 15:59 ` [PATCH v2 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
2025-07-16 22:09 ` [PATCH v2 0/3] Fix localio hangs NeilBrown
2025-07-16 23:27 ` Mike Snitzer
2025-07-18 0:18 ` NeilBrown
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
2025-05-09 21:02 ` Mike Snitzer
2025-05-10 0:16 ` Paul E. McKenney
2025-05-10 2:44 ` NeilBrown
2025-05-10 3:01 ` NeilBrown
2025-05-10 16:02 ` Chuck Lever
2025-05-10 19:57 ` Mike Snitzer
2025-05-16 15:33 ` Chuck Lever
2025-05-18 10:46 ` Pali Rohár
2025-05-19 3:49 ` NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox