From: Mike Snitzer <snitzer@kernel.org>
To: linux-nfs@vger.kernel.org
Cc: Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>, NeilBrown <neilb@suse.de>
Subject: [PATCH] nfsd/filecache: add nfsd_file_acquire_gc_cached
Date: Thu, 24 Oct 2024 14:55:26 -0400 [thread overview]
Message-ID: <20241024185526.76146-1-snitzer@kernel.org> (raw)
Rather than make nfsd_file_do_acquire() more complex (by training
it to conditionally skip both fh_verify() and nfsd_file allocation
and contruction) introduce nfsd_file_acquire_gc_cached() -- which
duplicates the minimalist subset of nfsd_file_do_acquire() needed to
achieve nfsd_file lookup using an opaque @inode_key.
nfsd_file_acquire_gc_cached() only returns a cached and GC'd nfsd_file
obtained using the opaque @inode_key, established from a previous call
to nfsd_file_do_acquire_local() that originally added the GC'd
nfsd_file to the filecache.
Update nfsd_open_local_fh to store @inode_key in @nfs_fh so later
calls can check if it maps to an open GC'd nfsd_file in the filecache
using nfsd_file_acquire_gc_cached(). Its nfsd_file_lookup_locked()
call will only find a match if @cred matches the nfsd_file's nf_cred.
And care is taken to clear the inode_key in nfsd_file_free() if the
nfsd_file has a non-NULL nf_inodep (which is a pointer to the address
of the opaque inode_key stored in the nfs_fh). This avoids any risk
of re-using the old inode_key for a different nfsd_file.
This commit's cached nfsd_file lookup dramatically speeds up LOCALIO
performance, especially for small 4K O_DIRECT IO, e.g.:
before: read: IOPS=376k, BW=1469MiB/s (1541MB/s)(28.7GiB/20001msec)
after: read: IOPS=1037k, BW=4050MiB/s (4247MB/s)(79.1GiB/20002msec)
Note that LOCALIO calls nfsd_open_local_fh() for every IO it issues to
the underlying filesystem using the returned nfsd_file. This is why
caching the opaque 'inode_key' in 'struct nfs_fh' is so helpful for
LOCALIO to quickly return the cached nfsd_file. Whereas regular NFS
avoids fh_verify()'s costly duplicate lookups of the underlying
filesystem's dentry by caching it in 'fh_dentry' of 'struct svc_fh'.
LOCALIO cannot take the same approach, of storing the dentry, without
creating object lifetime issues associated with dentry references
holding server mounts open and preventing unmounts.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/inode.c | 3 ++
fs/nfs_common/nfslocalio.c | 2 +-
fs/nfsd/filecache.c | 78 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 7 ++++
fs/nfsd/localio.c | 46 +++++++++++++++++++---
include/linux/nfs.h | 4 ++
include/linux/nfslocalio.h | 6 +--
7 files changed, 136 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index cc7a32b32676..3051d65e3a89 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2413,6 +2413,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
#endif /* CONFIG_NFS_V4 */
#ifdef CONFIG_NFS_V4_2
nfsi->xattr_cache = NULL;
+#endif
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ nfsi->fh.inode_key = NULL;
#endif
nfs_netfs_inode_init(nfsi);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 09404d142d1a..bacebaa1e15c 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -130,7 +130,7 @@ EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode)
+ struct nfs_fh *nfs_fh, const fmode_t fmode)
{
struct net *net;
struct nfsd_file *localio;
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1408166222c5..5ab978ac3555 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -221,6 +221,9 @@ nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need,
INIT_LIST_HEAD(&nf->nf_gc);
nf->nf_birthtime = ktime_get();
nf->nf_file = NULL;
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ nf->nf_inodep = NULL;
+#endif
nf->nf_cred = get_current_cred();
nf->nf_net = net;
nf->nf_flags = want_gc ?
@@ -285,6 +288,12 @@ nfsd_file_free(struct nfsd_file *nf)
nfsd_file_check_write_error(nf);
nfsd_filp_close(nf->nf_file);
}
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ if (nf->nf_inodep) {
+ *(nf->nf_inodep) = NULL;
+ nf->nf_inodep = NULL;
+ }
+#endif
/*
* If this item is still linked via nf_lru, that's a bug.
@@ -1255,6 +1264,75 @@ nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
return beres;
}
+/**
+ * nfsd_file_acquire_cached - Get cached GC'd open file using inode
+ * @net: The network namespace in which to perform a lookup
+ * @cred: the user credential with which to validate access
+ * @inode_key: inode to use as opaque lookup key
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: found cached GC'd "struct nfsd_file" object
+ *
+ * Rather than make nfsd_file_do_acquire more complex (by training
+ * it to conditionally skip fh_verify(), nfsd_file allocation and
+ * contruction) duplicate the minimalist subset of it that is
+ * needed to achieve nfsd_file lookup using the opaque @inode_key.
+ *
+ * The nfsd_file object returned by this API is reference-counted
+ * and garbage-collected. The object is retained for a few
+ * seconds after the final nfsd_file_put() in case the caller
+ * wants to re-use it.
+ *
+ * Return values:
+ * %nfs_ok - @pnf points to an nfsd_file with its reference
+ * count boosted.
+ *
+ * On error, an nfsstat value in network byte order is returned.
+ */
+__be32
+nfsd_file_acquire_cached(struct net *net, const struct cred *cred,
+ void *inode_key, unsigned int may_flags,
+ struct nfsd_file **pnf)
+{
+ unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+ struct nfsd_file *nf;
+ __be32 status;
+
+ rcu_read_lock();
+ nf = nfsd_file_lookup_locked(net, cred, inode_key, need, true);
+ rcu_read_unlock();
+
+ if (unlikely(!nf))
+ return nfserr_noent;
+
+ /*
+ * If the nf is on the LRU then it holds an extra reference
+ * that must be put if it's removed. It had better not be
+ * the last one however, since we should hold another.
+ */
+ if (nfsd_file_lru_remove(nf))
+ WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref));
+
+ if (WARN_ON_ONCE(test_bit(NFSD_FILE_PENDING, &nf->nf_flags) ||
+ !test_bit(NFSD_FILE_HASHED, &nf->nf_flags))) {
+ status = nfserr_inval;
+ goto error;
+ }
+ this_cpu_inc(nfsd_file_cache_hits);
+
+ status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags));
+ if (status != nfs_ok) {
+error:
+ nfsd_file_put(nf);
+ nf = NULL;
+ } else {
+ this_cpu_inc(nfsd_file_acquisitions);
+ nfsd_file_check_write_error(nf);
+ *pnf = nf;
+ }
+ trace_nfsd_file_acquire(NULL, inode_key, may_flags, nf, status);
+ return status;
+}
+
/**
* nfsd_file_acquire_opened - Get a struct nfsd_file using existing open file
* @rqstp: the RPC transaction being executed
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index cadf3c2689c4..e000f6988dc8 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -47,6 +47,10 @@ struct nfsd_file {
struct list_head nf_gc;
struct rcu_head nf_rcu;
ktime_t nf_birthtime;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ void ** nf_inodep;
+#endif
};
int nfsd_file_cache_init(void);
@@ -71,5 +75,8 @@ __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
struct auth_domain *client, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **pnf);
+__be32 nfsd_file_acquire_cached(struct net *net, const struct cred *cred,
+ void *inode_key, unsigned int may_flags,
+ struct nfsd_file **pnf);
int nfsd_file_cache_stats_show(struct seq_file *m, void *v);
#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index f441cb9f74d5..34a229409117 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -58,33 +58,67 @@ void nfsd_localio_ops_init(void)
struct nfsd_file *
nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode)
+ struct nfs_fh *nfs_fh, const fmode_t fmode)
{
int mayflags = NFSD_MAY_LOCALIO;
struct svc_cred rq_cred;
struct svc_fh fh;
struct nfsd_file *localio;
+ void *nf_inode_key;
__be32 beres;
if (nfs_fh->size > NFS4_FHSIZE)
return ERR_PTR(-EINVAL);
- /* nfs_fh -> svc_fh */
- fh_init(&fh, NFS4_FHSIZE);
- fh.fh_handle.fh_size = nfs_fh->size;
- memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
-
if (fmode & FMODE_READ)
mayflags |= NFSD_MAY_READ;
if (fmode & FMODE_WRITE)
mayflags |= NFSD_MAY_WRITE;
+ /*
+ * Check if 'inode_key' stored in @nfs_fh maps to an
+ * open nfsd_file in the filecache (from a previous
+ * nfsd_open_local_fh).
+ *
+ * The 'inode_key' may have become stale (due to nfsd_file
+ * being free'd by filecache GC) so the lookup will fail
+ * gracefully by falling back to using nfsd_file_acquire_local().
+ */
+ nf_inode_key = READ_ONCE(nfs_fh->inode_key);
+ if (nf_inode_key) {
+ beres = nfsd_file_acquire_cached(net, cred,
+ nf_inode_key,
+ mayflags, &localio);
+ if (beres == nfs_ok)
+ return localio;
+ /*
+ * reset stale nfs_fh->inode_key and fallthru
+ * to use normal nfsd_file_acquire_local().
+ */
+ nfs_fh->inode_key = NULL;
+ }
+
+ /* nfs_fh -> svc_fh */
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
beres = nfsd_file_acquire_local(net, &rq_cred, dom,
&fh, mayflags, &localio);
if (beres)
localio = ERR_PTR(nfs_stat_to_errno(be32_to_cpu(beres)));
+ else {
+ /*
+ * opaque 'inode_key' has a 1:1 mapping to both an
+ * nfsd_file and nfs_fh struct (And the nfs_fh is shared
+ * by all NFS client threads. So there is no risk of
+ * storing competing addresses in nfsd_file->nf_inodep
+ */
+ localio->nf_inodep = (void **) &nfs_fh->inode_key;
+ nfs_fh->inode_key = localio->nf_inode;
+ }
fh_put(&fh);
if (rq_cred.cr_group_info)
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 9ad727ddfedb..56c894575c70 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -29,6 +29,10 @@
struct nfs_fh {
unsigned short size;
unsigned char data[NFS_MAXFHSIZE];
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /* 'inode_key' is an opaque key used for nfsd_file hash lookups */
+ void * inode_key;
+#endif
};
/*
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 3982fea79919..619eb1961ed6 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -43,7 +43,7 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
/* localio needs to map filehandle -> struct nfsd_file */
extern struct nfsd_file *
nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
- const struct cred *, const struct nfs_fh *,
+ const struct cred *, struct nfs_fh *,
const fmode_t) __must_hold(rcu);
struct nfsd_localio_operations {
@@ -53,7 +53,7 @@ struct nfsd_localio_operations {
struct auth_domain *,
struct rpc_clnt *,
const struct cred *,
- const struct nfs_fh *,
+ struct nfs_fh *,
const fmode_t);
void (*nfsd_file_put_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
@@ -64,7 +64,7 @@ extern const struct nfsd_localio_operations *nfs_to;
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t);
+ struct nfs_fh *, const fmode_t);
static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
{
--
2.44.0
next reply other threads:[~2024-10-24 18:55 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-24 18:55 Mike Snitzer [this message]
2024-10-25 3:00 ` [PATCH] nfsd/filecache: add nfsd_file_acquire_gc_cached NeilBrown
2024-10-25 12:52 ` Chuck Lever III
2024-10-25 13:31 ` Trond Myklebust
2024-10-25 13:51 ` Chuck Lever III
2024-10-25 14:00 ` Jeff Layton
2024-10-27 21:49 ` NeilBrown
2024-10-25 14:21 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241024185526.76146-1-snitzer@kernel.org \
--to=snitzer@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.