* [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO
@ 2024-11-08 23:39 Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done Mike Snitzer
` (19 more replies)
0 siblings, 20 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Hi,
I really wanted to post these patches at the beginning of the week (or
sooner) but I had quite a few issues to work through. The biggest
challenge came from trying to develop the final patch, only to hit the
wall of needing to find and fix the memory corruption that the first
patch addresses.
HUGE special thanks to NeilBrown for helping me identify the source of
the NFSv3 LOCALIO memory corruption fixed by the first patch. Anna,
we'd do well to get that patch upstream for 6.12 final (but, Trond, if
it slips to the 6.13 merge window pull that should be fine, as the
Fixes: tag should get it into 6.12-stable).
The 2nd patch is also a fundamental fix, but whether you'll experience
the RCU splat it fixes depends on your kernel config.
Patches 3 - 6 are cleanups I've been carrying since just after the
6.12 merge window.
Patch 7 adds a 'localio_O_DIRECT_semantics' nfs module parameter that
when set will allow the use of O_DIRECT from the LOCALIO client
through to the underlying filesystem.
Patches 8 and beyond are dealing with the leftover bake-a-thon
business of switching from caching LOCALIO's open nfsd_file in the
server to doing so in the client. Definitely took some effort but the
end result is working really well.
This is quite a bit of change at the end of the 6.13 development
window, but I _think_ it is worthy of consideration for 6.13: the bulk
of the changes are confined to fs/nfs/localio.c and
fs/nfs_common/nfslocalio.c, which are only built if the LOCALIO Kconfig
options are enabled (even the changes to general NFS code paths are all
wrapped with CONFIG_NFS_LOCALIO).
I'm happy to work through any issues found in review with urgency next
week (or this weekend if others are interested in looking and happen to
find something).
I'm happy to take it as it comes; I'm in no way _pushing_ for these
changes to land in 6.13. I'm just now comfortable posting them for
serious consideration.
Thanks,
Mike
Mike Snitzer (18):
nfs_common: must not hold RCU while calling nfsd_file_put_local
nfs/localio: remove redundant suid/sgid handling
nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
nfs/localio: add direct IO enablement with sync and async IO support
nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operations
nfs_common: rename functions that invalidate LOCALIO nfs_clients
nfs_common: move localio_lock to new lock member of nfs_uuid_t
nfs: cache all open LOCALIO nfsd_file(s) in client
nfsd: update percpu_ref to manage references on nfsd_net
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
nfs_common: track all open nfsd_files per LOCALIO nfs_client
nfs_common: add nfs_localio trace events
nfs: probe for LOCALIO when v4 client reconnects to server
nfs: probe for LOCALIO when v3 client reconnects to server
NeilBrown (1):
nfs/localio: must clear res.replen in nfs_local_read_done
fs/nfs/client.c | 1 -
fs/nfs/direct.c | 1 +
fs/nfs/flexfilelayout/flexfilelayout.c | 29 ++-
fs/nfs/flexfilelayout/flexfilelayout.h | 1 +
fs/nfs/inode.c | 3 +
fs/nfs/internal.h | 11 +-
fs/nfs/localio.c | 319 ++++++++++++++++++-------
fs/nfs/nfs3proc.c | 34 ++-
fs/nfs/nfs4state.c | 1 +
fs/nfs/pagelist.c | 5 +-
fs/nfs/write.c | 3 +-
fs/nfs_common/Makefile | 3 +-
fs/nfs_common/localio_trace.c | 10 +
fs/nfs_common/localio_trace.h | 56 +++++
fs/nfs_common/nfslocalio.c | 269 ++++++++++++++++-----
fs/nfsd/filecache.c | 32 ++-
fs/nfsd/filecache.h | 2 +-
fs/nfsd/localio.c | 9 +-
fs/nfsd/netns.h | 12 +-
fs/nfsd/nfsctl.c | 6 +-
fs/nfsd/nfssvc.c | 40 ++--
include/linux/nfs_fs.h | 22 +-
include/linux/nfs_fs_sb.h | 3 +-
include/linux/nfs_xdr.h | 1 +
include/linux/nfslocalio.h | 65 +++--
25 files changed, 712 insertions(+), 226 deletions(-)
create mode 100644 fs/nfs_common/localio_trace.c
create mode 100644 fs/nfs_common/localio_trace.h
--
2.44.0
* [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 0:36 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local Mike Snitzer
` (18 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
From: NeilBrown <neilb@suse.de>
Otherwise memory corruption can occur due to NFSv3 LOCALIO reads
leaving garbage in res.replen:
- nfs3_read_done() copies that into server->read_hdrsize; from there
nfs3_proc_read_setup() copies it to args.replen in new requests.
- nfs3_xdr_enc_read3args() passes that to rpc_prepare_reply_pages()
which includes it in hdrsize for xdr_init_pages, so that rq_rcv_buf
contains a ridiculous len.
- This is copied to rq_private_buf and xs_read_stream_request()
eventually passes the kvec to sock_recvmsg() which receives incoming
data into entirely the wrong place.
This is easily reproduced with NFSv3 LOCALIO that is servicing reads
when it is made to pivot back to using normal RPC. This switch back
to using normal NFSv3 with RPC can occur for a few reasons, but this
issue was exposed with a test that stops and then restarts the NFSv3
server while LOCALIO is performing heavy read IO.
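To make the propagation chain above concrete, here is a heavily
simplified, single-threaded C model. The model_* names and structs are
hypothetical and only mirror the fields named above (res.replen,
server->read_hdrsize, args.replen); the real nfs3_read_done() and
nfs3_proc_read_setup() do more than this, so treat it as a sketch of
the data flow, not of the kernel code.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's structures. */
struct model_hdr {
	unsigned int res_replen;   /* plays the role of hdr->res.replen */
	unsigned int args_replen;  /* plays the role of hdr->args.replen */
};

struct model_server {
	unsigned int read_hdrsize; /* plays the role of server->read_hdrsize */
};

/* LOCALIO read completion: never fills res_replen, optionally clears it. */
static void model_local_read_done(struct model_hdr *hdr, int apply_fix)
{
	if (apply_fix)
		hdr->res_replen = 0;
}

/* Models nfs3_read_done() copying replen into the per-server value. */
static void model_read_done(struct model_server *server,
			    const struct model_hdr *hdr)
{
	server->read_hdrsize = hdr->res_replen;
}

/* Models nfs3_proc_read_setup() seeding the next RPC read request. */
static void model_read_setup(const struct model_server *server,
			     struct model_hdr *next)
{
	next->args_replen = server->read_hdrsize;
}

/* Returns the replen that the next RPC read would be built with. */
unsigned int model_next_replen(unsigned int stale_replen, int apply_fix)
{
	struct model_server server = { 0 };
	struct model_hdr local = { .res_replen = stale_replen };
	struct model_hdr next = { 0 };

	model_local_read_done(&local, apply_fix);
	model_read_done(&server, &local);
	model_read_setup(&server, &next);
	return next.args_replen;
}
```

Without the fix, whatever happened to be left in res_replen reaches the
next RPC request; with it, the next request starts from zero.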
Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
Reported-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 8f0ce82a677e..637528e6368e 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -354,6 +354,12 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
nfs_local_pgio_done(hdr, status);
+ /*
+ * Must clear replen otherwise NFSv3 data corruption will occur
+ * if/when switching from LOCALIO back to using normal RPC.
+ */
+ hdr->res.replen = 0;
+
if (hdr->res.count != hdr->args.count ||
hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
hdr->res.eof = true;
--
2.44.0
* [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:01 ` NeilBrown
2024-11-13 14:58 ` Jeff Layton
2024-11-08 23:39 ` [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling Mike Snitzer
` (17 subsequent siblings)
19 siblings, 2 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Move holding the RCU read lock from nfs_to_nfsd_file_put_local to
nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that
requires RCU anyway (the puts for nfsd_file and netns were combined
to avoid an extra indirect function call, but that micro-optimization
isn't possible now).
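A toy C model of the ordering change, assuming a simple depth counter
in place of real RCU (the toy_* names are hypothetical, not kernel
APIs). The put that may block runs before the read-side section is
entered, and only the non-blocking net put stays inside it:

```c
#include <assert.h>

/* Toy stand-ins; these are NOT the kernel's RCU primitives. */
static int rcu_depth;          /* > 0 while inside a read-side section */
static int sleep_violations;   /* times a blocking op ran under "RCU" */

static void toy_rcu_read_lock(void)   { rcu_depth++; }
static void toy_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Anything that may schedule must not run inside a read-side section. */
static void toy_might_sleep(void)
{
	if (rcu_depth > 0)
		sleep_violations++;
}

struct toy_net  { int serv_refs; };
struct toy_file { struct toy_net *net; };

/* Final file put may block deep in the filesystem (e.g. XFS log commit). */
static struct toy_net *toy_file_put_local(struct toy_file *nf)
{
	struct toy_net *net = nf->net;   /* save before the file goes away */

	toy_might_sleep();
	return net;
}

/* Non-blocking; the only step that needs RCU protection. */
static void toy_serv_put(struct toy_net *net)
{
	net->serv_refs--;
}

/* Old ordering: both puts inside one read-side section. */
void toy_put_old(struct toy_file *nf)
{
	toy_rcu_read_lock();
	toy_serv_put(toy_file_put_local(nf));
	toy_rcu_read_unlock();
}

/* New ordering: file put outside, only the net put inside. */
void toy_put_new(struct toy_file *nf)
{
	struct toy_net *net = toy_file_put_local(nf);

	toy_rcu_read_lock();
	toy_serv_put(net);
	toy_rcu_read_unlock();
}

int toy_violations(void) { return sleep_violations; }
```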
This fixes xfstests generic/013 triggering:
"Voluntary context switch within RCU read-side critical section!"
[ 143.545738] Call Trace:
[ 143.546206] <TASK>
[ 143.546625] ? show_regs+0x6d/0x80
[ 143.547267] ? __warn+0x91/0x140
[ 143.547951] ? rcu_note_context_switch+0x496/0x5d0
[ 143.548856] ? report_bug+0x193/0x1a0
[ 143.549557] ? handle_bug+0x63/0xa0
[ 143.550214] ? exc_invalid_op+0x1d/0x80
[ 143.550938] ? asm_exc_invalid_op+0x1f/0x30
[ 143.551736] ? rcu_note_context_switch+0x496/0x5d0
[ 143.552634] ? wakeup_preempt+0x62/0x70
[ 143.553358] __schedule+0xaa/0x1380
[ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40
[ 143.554958] ? try_to_wake_up+0x1fe/0x6b0
[ 143.555715] ? wake_up_process+0x19/0x20
[ 143.556452] schedule+0x2e/0x120
[ 143.557066] schedule_preempt_disabled+0x19/0x30
[ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0
[ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs]
[ 143.559894] down_read+0x4e/0xb0
[ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs]
[ 143.561460] ? _raw_spin_unlock+0x12/0x30
[ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs]
[ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs]
[ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs]
[ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs]
[ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs]
[ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs]
[ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs]
[ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs]
[ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs]
[ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs]
[ 143.572359] xfs_file_release+0x12d/0x140 [xfs]
[ 143.573324] __fput+0xe8/0x2d0
[ 143.573922] __fput_sync+0x1d/0x30
[ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd]
[ 143.575430] nfsd_file_free+0x96/0x150 [nfsd]
[ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd]
[ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd]
[ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio]
[ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs]
[ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs]
[ 143.581038] nfs_file_release+0x3e/0x60 [nfs]
[ 143.581879] __fput+0xe8/0x2d0
[ 143.582464] __fput_sync+0x1d/0x30
[ 143.583108] __x64_sys_close+0x41/0x80
[ 143.583823] x64_sys_call+0x189a/0x20d0
[ 143.584552] do_syscall_64+0x64/0x170
[ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 143.586185] RIP: 0033:0x7f3c5153efd7
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 8 +++-----
fs/nfsd/filecache.c | 14 +++++++-------
fs/nfsd/filecache.h | 2 +-
include/linux/nfslocalio.h | 18 +++++++++++++++---
4 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 09404d142d1a..a74ec08f6c96 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -155,11 +155,9 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
/* We have an implied reference to net thanks to nfsd_serv_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
cred, nfs_fh, fmode);
- if (IS_ERR(localio)) {
- rcu_read_lock();
- nfs_to->nfsd_serv_put(net);
- rcu_read_unlock();
- }
+ if (IS_ERR(localio))
+ nfs_to_nfsd_net_put(net);
+
return localio;
}
EXPORT_SYMBOL_GPL(nfs_open_local_fh);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index c16671135d17..9a62b4da89bb 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -391,19 +391,19 @@ nfsd_file_put(struct nfsd_file *nf)
}
/**
- * nfsd_file_put_local - put the reference to nfsd_file and local nfsd_serv
- * @nf: nfsd_file of which to put the references
+ * nfsd_file_put_local - put nfsd_file reference and arm nfsd_serv_put in caller
+ * @nf: nfsd_file of which to put the reference
*
- * First put the reference of the nfsd_file and then put the
- * reference to the associated nn->nfsd_serv.
+ * First save the associated net to return to caller, then put
+ * the reference of the nfsd_file.
*/
-void
-nfsd_file_put_local(struct nfsd_file *nf) __must_hold(rcu)
+struct net *
+nfsd_file_put_local(struct nfsd_file *nf)
{
struct net *net = nf->nf_net;
nfsd_file_put(nf);
- nfsd_serv_put(net);
+ return net;
}
/**
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index cadf3c2689c4..d5db6b34ba30 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -55,7 +55,7 @@ void nfsd_file_cache_shutdown(void);
int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
-void nfsd_file_put_local(struct nfsd_file *nf);
+struct net *nfsd_file_put_local(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
struct file *nfsd_file_file(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 3982fea79919..9202f4b24343 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -55,7 +55,7 @@ struct nfsd_localio_operations {
const struct cred *,
const struct nfs_fh *,
const fmode_t);
- void (*nfsd_file_put_local)(struct nfsd_file *);
+ struct net *(*nfsd_file_put_local)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
@@ -66,7 +66,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, const fmode_t);
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
+static inline void nfs_to_nfsd_net_put(struct net *net)
{
/*
* Once reference to nfsd_serv is dropped, NFSD could be
@@ -74,10 +74,22 @@ static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
* by always taking RCU.
*/
rcu_read_lock();
- nfs_to->nfsd_file_put_local(localio);
+ nfs_to->nfsd_serv_put(net);
rcu_read_unlock();
}
+static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
+{
+ /*
+ * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
+ * "Voluntary context switch within RCU read-side critical section!"
+ * by scheduling deep in underlying filesystem (e.g. XFS).
+ */
+ struct net *net = nfs_to->nfsd_file_put_local(localio);
+
+ nfs_to_nfsd_net_put(net);
+}
+
#else /* CONFIG_NFS_LOCALIO */
static inline void nfsd_localio_ops_init(void)
{
--
2.44.0
* [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:09 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx Mike Snitzer
` (16 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
From: Mike Snitzer <snitzer@hammerspace.com>
nfs_writeback_done() will take care of the suid/sgid corner case.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 637528e6368e..4b24933093b6 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -527,12 +527,7 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
}
if (status < 0)
nfs_reset_boot_verifier(inode);
- else if (nfs_should_remove_suid(inode)) {
- /* Deal with the suid/sgid bit corner case */
- spin_lock(&inode->i_lock);
- nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE);
- spin_unlock(&inode->i_lock);
- }
+
nfs_local_pgio_done(hdr, status);
}
--
2.44.0
* [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (2 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:15 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter Mike Snitzer
` (15 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
nfs_local_commit() doesn't need async cleanup of nfs_local_fsync_ctx,
so there is no need to use a kref.
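The ownership rule that makes the kref unnecessary can be sketched as a
single-threaded C model: the work function is the only place the
context is freed, and the synchronous caller waits for completion and
never touches the context afterwards. The toy_* names and the one-slot
"workqueue" are hypothetical stand-ins for nfs_local_fsync_work() and
nfsiod_workqueue:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of nfs_local_fsync_ctx; not kernel code. */
struct toy_fsync_ctx {
	bool *done;          /* non-NULL only on the synchronous path */
};

static int frees;                       /* counts ctx frees */
static struct toy_fsync_ctx *pending;   /* one-slot stand-in for a workqueue */

/* The work function is the sole owner: signal completion, then free. */
static void toy_fsync_work(struct toy_fsync_ctx *ctx)
{
	/* ... the commit itself would run here ... */
	if (ctx->done)
		*ctx->done = true;   /* like complete(ctx->done) */
	free(ctx);
	frees++;
}

/* Runs queued work, as a worker thread eventually would. */
void toy_drain_workqueue(void)
{
	struct toy_fsync_ctx *ctx = pending;

	pending = NULL;
	if (ctx)
		toy_fsync_work(ctx);
}

int toy_commit(bool sync)
{
	struct toy_fsync_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->done = NULL;
	assert(pending == NULL);          /* the model has a single slot */
	if (sync) {
		bool done = false;            /* lives on the caller's stack */

		ctx->done = &done;
		pending = ctx;
		toy_drain_workqueue();        /* stands in for queue + wait */
		assert(done);
		/* ctx must not be touched here: the work already freed it */
	} else {
		pending = ctx;                /* ownership handed to the "queue" */
	}
	return 0;
}

int toy_frees(void) { return frees; }
```

Because exactly one party frees the context in either path, no
reference count is needed to decide who frees it.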
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 20 +++-----------------
1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 4b24933093b6..a7eb83a604d0 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -42,7 +42,6 @@ struct nfs_local_fsync_ctx {
struct nfsd_file *localio;
struct nfs_commit_data *data;
struct work_struct work;
- struct kref kref;
struct completion *done;
};
static void nfs_local_fsync_work(struct work_struct *work);
@@ -689,30 +688,17 @@ nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
ctx->localio = localio;
ctx->data = data;
INIT_WORK(&ctx->work, nfs_local_fsync_work);
- kref_init(&ctx->kref);
ctx->done = NULL;
}
return ctx;
}
-static void
-nfs_local_fsync_ctx_kref_free(struct kref *kref)
-{
- kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
-}
-
-static void
-nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
-{
- kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
-}
-
static void
nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
{
nfs_local_release_commit_data(ctx->localio, ctx->data,
ctx->data->task.tk_ops);
- nfs_local_fsync_ctx_put(ctx);
+ kfree(ctx);
}
static void
@@ -745,7 +731,7 @@ int nfs_local_commit(struct nfsd_file *localio,
}
nfs_local_init_commit(data, call_ops);
- kref_get(&ctx->kref);
+
if (how & FLUSH_SYNC) {
DECLARE_COMPLETION_ONSTACK(done);
ctx->done = &done;
@@ -753,6 +739,6 @@ int nfs_local_commit(struct nfsd_file *localio,
wait_for_completion(&done);
} else
queue_work(nfsiod_workqueue, &ctx->work);
- nfs_local_fsync_ctx_put(ctx);
+
return 0;
}
--
2.44.0
* [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (3 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:20 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration Mike Snitzer
` (14 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Push the read_iter and write_iter availability checks down to
nfs_do_local_read and nfs_do_local_write respectively.
This eliminates a redundant nfs_to->nfsd_file_file() call.
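A minimal C sketch of the per-direction check, assuming a hypothetical
two-hook struct in place of struct file_operations (the toy_* names are
illustrative only). Before this change a missing write_iter also made
reads bail out with -EAGAIN; after it, each path checks only the hook
it uses:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations: only the two
 * hooks this patch checks. Not the kernel's definition. */
struct toy_fops {
	long (*read_iter)(void);
	long (*write_iter)(void);
};

static long toy_read(void)  { return 4096; }  /* pretend bytes read */
static long toy_write(void) { return 4096; }  /* pretend bytes written */

const struct toy_fops toy_read_only_fops = { .read_iter = toy_read };
const struct toy_fops toy_full_fops = {
	.read_iter = toy_read,
	.write_iter = toy_write,
};

/* After the patch: each direction checks only the hook it needs.
 * -EAGAIN still makes the caller disable LOCALIO and fall back to RPC. */
long toy_do_local_read(const struct toy_fops *fops)
{
	assert(fops != NULL);
	if (!fops->read_iter)
		return -EAGAIN;
	return fops->read_iter();
}

long toy_do_local_write(const struct toy_fops *fops)
{
	assert(fops != NULL);
	if (!fops->write_iter)
		return -EAGAIN;
	return fops->write_iter();
}

/* Before the patch: both hooks were required regardless of direction. */
long toy_old_doio(const struct toy_fops *fops, int is_read)
{
	if (!fops->read_iter || !fops->write_iter)
		return -EAGAIN;
	return is_read ? fops->read_iter() : fops->write_iter();
}
```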
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index a7eb83a604d0..a77ac7e8a05c 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -273,7 +273,7 @@ nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
static struct nfs_local_kiocb *
nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
- struct nfsd_file *localio, gfp_t flags)
+ struct file *file, gfp_t flags)
{
struct nfs_local_kiocb *iocb;
@@ -286,9 +286,8 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
kfree(iocb);
return NULL;
}
- init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio));
+ init_sync_kiocb(&iocb->kiocb, file);
iocb->kiocb.ki_pos = hdr->args.offset;
- iocb->localio = localio;
iocb->hdr = hdr;
iocb->kiocb.ki_flags &= ~IOCB_APPEND;
return iocb;
@@ -395,13 +394,19 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
struct nfs_local_kiocb *iocb;
+ struct file *file = nfs_to->nfsd_file_file(localio);
+
+ /* Don't support filesystems without read_iter */
+ if (!file->f_op->read_iter)
+ return -EAGAIN;
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
- iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
+ iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
+ iocb->localio = localio;
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
@@ -564,14 +569,20 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
struct nfs_local_kiocb *iocb;
+ struct file *file = nfs_to->nfsd_file_file(localio);
+
+ /* Don't support filesystems without write_iter */
+ if (!file->f_op->write_iter)
+ return -EAGAIN;
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
- iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
+ iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
+ iocb->localio = localio;
switch (hdr->args.stable) {
default:
@@ -597,16 +608,9 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
const struct rpc_call_ops *call_ops)
{
int status = 0;
- struct file *filp = nfs_to->nfsd_file_file(localio);
if (!hdr->args.count)
return 0;
- /* Don't support filesystems without read_iter/write_iter */
- if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
- nfs_local_disable(clp);
- status = -EAGAIN;
- goto out;
- }
switch (hdr->rw_mode) {
case FMODE_READ:
@@ -620,8 +624,10 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
hdr->rw_mode);
status = -EINVAL;
}
-out:
+
if (status != 0) {
+ if (status == -EAGAIN)
+ nfs_local_disable(clp);
nfs_to_nfsd_file_put_local(localio);
hdr->task.tk_status = status;
nfs_local_hdr_release(hdr, call_ops);
--
2.44.0
* [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (4 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:21 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support Mike Snitzer
` (13 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Move nfs_local_fsync_ctx_alloc() after nfs_local_fsync_work().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index a77ac7e8a05c..4b8618cf114c 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -44,7 +44,6 @@ struct nfs_local_fsync_ctx {
struct work_struct work;
struct completion *done;
};
-static void nfs_local_fsync_work(struct work_struct *work);
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
@@ -684,21 +683,6 @@ nfs_local_release_commit_data(struct nfsd_file *localio,
call_ops->rpc_release(data);
}
-static struct nfs_local_fsync_ctx *
-nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
- struct nfsd_file *localio, gfp_t flags)
-{
- struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
-
- if (ctx != NULL) {
- ctx->localio = localio;
- ctx->data = data;
- INIT_WORK(&ctx->work, nfs_local_fsync_work);
- ctx->done = NULL;
- }
- return ctx;
-}
-
static void
nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
{
@@ -723,6 +707,21 @@ nfs_local_fsync_work(struct work_struct *work)
nfs_local_fsync_ctx_free(ctx);
}
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
+ struct nfsd_file *localio, gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->localio = localio;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ ctx->done = NULL;
+ }
+ return ctx;
+}
+
int nfs_local_commit(struct nfsd_file *localio,
struct nfs_commit_data *data,
const struct rpc_call_ops *call_ops, int how)
--
2.44.0
* [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (5 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:31 ` NeilBrown
2024-11-12 14:31 ` Chuck Lever
2024-11-08 23:39 ` [for-6.13 PATCH 08/19] nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operations Mike Snitzer
` (12 subsequent siblings)
19 siblings, 2 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
This commit simply adds the required O_DIRECT plumbing. It doesn't
address the fact that NFS doesn't ensure all writes are page aligned
(nor device logical block size aligned as required by O_DIRECT).
Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
will not use O_DIRECT semantics by default if/when an application
requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by:
1: Adding a flag to the nfs_pgio_header struct to allow the NFS
O_DIRECT layer to signal that O_DIRECT was used by the application
2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
when enabled will cause LOCALIO to use O_DIRECT semantics (this may
cause IO to fail if applications do not properly align their IO).
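The gating condition added to nfs_local_iocb_alloc() can be expressed
as a small C sketch, alongside a hypothetical alignment check
(toy_dio_aligned() is not part of this patch, which does not validate
alignment). The rule assumed here, that offset, length and buffer
address are all multiples of the logical block size, is a conservative
simplification; actual requirements vary by filesystem and device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the condition in nfs_local_iocb_alloc(): use O_DIRECT
 * semantics only when the module parameter is set AND the NFS
 * O_DIRECT layer flagged the request (NFS_IOHDR_ODIRECT). */
bool toy_use_direct(bool localio_O_DIRECT_semantics, bool hdr_odirect)
{
	return localio_O_DIRECT_semantics && hdr_odirect;
}

/* Hypothetical helper, NOT in the patch: the kind of alignment an
 * application must provide for O_DIRECT to succeed. block_size is the
 * device's logical block size and must be a power of two. */
bool toy_dio_aligned(uint64_t offset, size_t len, uintptr_t buf,
		     size_t block_size)
{
	size_t mask = block_size - 1;

	assert(block_size != 0 && (block_size & mask) == 0);
	return (offset & mask) == 0 && (len & mask) == 0 &&
	       (buf & mask) == 0;
}
```

With the parameter left at its default (false), toy_use_direct()
returns false for every request, matching the buffered-IO default
described above.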
Adding Direct IO support helps side-step the problem that LOCALIO
currently double buffers buffered IO (by using page cache in both NFS
and the underlying filesystem). More care is needed to craft a proper
solution for LOCALIO's redundant use of page cache for buffered IO,
e.g.: https://marc.info/?l=linux-nfs&m=171996211625151&w=2
This commit is derived from code developed by Weston Andros Adamson.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 1 +
fs/nfs/localio.c | 92 ++++++++++++++++++++++++++++++++++++-----
include/linux/nfs_xdr.h | 1 +
3 files changed, 84 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 90079ca134dd..4b92493d6ff0 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -303,6 +303,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
{
get_dreq(hdr->dreq);
+ set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
}
static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 4b8618cf114c..de0dcd76d84d 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -35,6 +35,7 @@ struct nfs_local_kiocb {
struct bio_vec *bvec;
struct nfs_pgio_header *hdr;
struct work_struct work;
+ void (*aio_complete_work)(struct work_struct *);
struct nfsd_file *localio;
};
@@ -48,6 +49,10 @@ struct nfs_local_fsync_ctx {
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static bool localio_O_DIRECT_semantics __read_mostly = false;
+module_param(localio_O_DIRECT_semantics, bool, 0644);
+MODULE_PARM_DESC(localio_O_DIRECT_semantics, "Use O_DIRECT semantics");
+
static inline bool nfs_client_is_local(const struct nfs_client *clp)
{
return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
@@ -285,10 +290,19 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
kfree(iocb);
return NULL;
}
- init_sync_kiocb(&iocb->kiocb, file);
+
+ if (localio_O_DIRECT_semantics &&
+ test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
+ iocb->kiocb.ki_filp = file;
+ iocb->kiocb.ki_flags = IOCB_DIRECT;
+ } else
+ init_sync_kiocb(&iocb->kiocb, file);
+
iocb->kiocb.ki_pos = hdr->args.offset;
iocb->hdr = hdr;
iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ iocb->aio_complete_work = NULL;
+
return iocb;
}
@@ -343,6 +357,18 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
nfs_local_hdr_release(hdr, hdr->task.tk_ops);
}
+/*
+ * Complete the I/O from iocb->kiocb.ki_complete()
+ *
+ * Note that this function can be called from a bottom half context,
+ * hence we need to queue the rpc_call_done() etc to a workqueue
+ */
+static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb)
+{
+ INIT_WORK(&iocb->work, iocb->aio_complete_work);
+ queue_work(nfsiod_workqueue, &iocb->work);
+}
+
static void
nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
{
@@ -365,6 +391,23 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
status > 0 ? status : 0, hdr->res.eof);
}
+static void nfs_local_read_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb =
+ container_of(work, struct nfs_local_kiocb, work);
+
+ nfs_local_pgio_release(iocb);
+}
+
+static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb =
+ container_of(kiocb, struct nfs_local_kiocb, kiocb);
+
+ nfs_local_read_done(iocb, ret);
+ nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
+}
+
static void nfs_local_call_read(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
@@ -379,10 +422,10 @@ static void nfs_local_call_read(struct work_struct *work)
nfs_local_iter_init(&iter, iocb, READ);
status = filp->f_op->read_iter(&iocb->kiocb, &iter);
- WARN_ON_ONCE(status == -EIOCBQUEUED);
-
- nfs_local_read_done(iocb, status);
- nfs_local_pgio_release(iocb);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+ }
revert_creds(save_cred);
}
@@ -410,6 +453,11 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+ }
+
INIT_WORK(&iocb->work, nfs_local_call_read);
queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -534,6 +582,24 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
nfs_local_pgio_done(hdr, status);
}
+static void nfs_local_write_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb =
+ container_of(work, struct nfs_local_kiocb, work);
+
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+}
+
+static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb =
+ container_of(kiocb, struct nfs_local_kiocb, kiocb);
+
+ nfs_local_write_done(iocb, ret);
+ nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
+}
+
static void nfs_local_call_write(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
@@ -552,11 +618,11 @@ static void nfs_local_call_write(struct work_struct *work)
file_start_write(filp);
status = filp->f_op->write_iter(&iocb->kiocb, &iter);
file_end_write(filp);
- WARN_ON_ONCE(status == -EIOCBQUEUED);
-
- nfs_local_write_done(iocb, status);
- nfs_local_vfs_getattr(iocb);
- nfs_local_pgio_release(iocb);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_write_done(iocb, status);
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+ }
revert_creds(save_cred);
current->flags = old_flags;
@@ -592,10 +658,16 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
case NFS_FILE_SYNC:
iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
}
+
nfs_local_pgio_init(hdr, call_ops);
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+ }
+
INIT_WORK(&iocb->work, nfs_local_call_write);
queue_work(nfslocaliod_workqueue, &iocb->work);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index e0ae0a14257f..f30e94d105b7 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1632,6 +1632,7 @@ enum {
NFS_IOHDR_RESEND_PNFS,
NFS_IOHDR_RESEND_MDS,
NFS_IOHDR_UNSTABLE_WRITES,
+ NFS_IOHDR_ODIRECT,
};
struct nfs_io_completion;
--
2.44.0
* [for-6.13 PATCH 08/19] nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operations
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (6 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients Mike Snitzer
` (11 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
A later commit requires LOCALIO to call both nfsd_file_get and
nfsd_file_put to manage extra nfsd_file references.
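LOCALIO reaches these helpers through the 'nfs_to' operations table instead of linking against nfsd directly. The following is a minimal userspace sketch of a get/put pair exposed through such a table. Every demo_* name is hypothetical. The increment-unless-zero get is a common way to write a safe get, used here as an assumption; it is not a copy of nfsd_file_get().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque struct nfsd_file. */
struct demo_file {
	atomic_int ref;
};

static struct demo_file *demo_file_alloc(void)
{
	struct demo_file *nf = malloc(sizeof(*nf));

	if (nf)
		atomic_init(&nf->ref, 1);	/* creator owns one reference */
	return nf;
}

/* Increment unless zero: never resurrect an object whose last ref is gone. */
static struct demo_file *demo_file_get(struct demo_file *nf)
{
	int old = atomic_load(&nf->ref);

	do {
		if (old == 0)
			return NULL;
	} while (!atomic_compare_exchange_weak(&nf->ref, &old, old + 1));
	return nf;
}

static void demo_file_put(struct demo_file *nf)
{
	int old = atomic_fetch_sub(&nf->ref, 1);

	assert(old > 0);	/* an unpaired put would underflow */
	if (old == 1)
		free(nf);	/* last reference dropped */
}

/* Operations table the client calls through instead of linking to the server. */
struct demo_localio_ops {
	struct demo_file *(*file_get)(struct demo_file *);
	void (*file_put)(struct demo_file *);
};

static const struct demo_localio_ops demo_ops = {
	.file_get = demo_file_get,
	.file_put = demo_file_put,
};

/* Published pointer, analogous in spirit to 'nfs_to'. */
static const struct demo_localio_ops *demo_to = &demo_ops;
```

Each successful get must be balanced by exactly one put. The final put frees the object.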
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 2 ++
include/linux/nfslocalio.h | 2 ++
2 files changed, 4 insertions(+)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index f441cb9f74d5..8beda4c85111 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -29,6 +29,8 @@ static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_serv_put = nfsd_serv_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
+ .nfsd_file_get = nfsd_file_get,
+ .nfsd_file_put = nfsd_file_put,
.nfsd_file_file = nfsd_file_file,
};
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9202f4b24343..ab6a2a53f505 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -56,6 +56,8 @@ struct nfsd_localio_operations {
const struct nfs_fh *,
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file *);
+ struct nfsd_file *(*nfsd_file_get)(struct nfsd_file *);
+ void (*nfsd_file_put)(struct nfsd_file *);
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
--
2.44.0
* [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (7 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 08/19] nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operations Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:32 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t Mike Snitzer
` (10 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Rename nfs_uuid_invalidate_one_client to nfs_localio_disable_client.
Rename nfs_uuid_invalidate_clients to nfs_localio_invalidate_clients.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 2 +-
fs/nfs_common/nfslocalio.c | 8 ++++----
fs/nfsd/nfsctl.c | 4 ++--
include/linux/nfslocalio.h | 5 +++--
4 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index de0dcd76d84d..cab2a8819259 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -139,7 +139,7 @@ void nfs_local_disable(struct nfs_client *clp)
spin_lock(&clp->cl_localio_lock);
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
- nfs_uuid_invalidate_one_client(&clp->cl_uuid);
+ nfs_localio_disable_client(&clp->cl_uuid);
}
spin_unlock(&clp->cl_localio_lock);
}
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index a74ec08f6c96..904439e4bb85 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -107,7 +107,7 @@ static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
list_del_init(&nfs_uuid->list);
}
-void nfs_uuid_invalidate_clients(struct list_head *list)
+void nfs_localio_invalidate_clients(struct list_head *list)
{
nfs_uuid_t *nfs_uuid, *tmp;
@@ -116,9 +116,9 @@ void nfs_uuid_invalidate_clients(struct list_head *list)
nfs_uuid_put_locked(nfs_uuid);
spin_unlock(&nfs_uuid_lock);
}
-EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients);
+EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
-void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
+void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid)
{
if (nfs_uuid->net) {
spin_lock(&nfs_uuid_lock);
@@ -126,7 +126,7 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
spin_unlock(&nfs_uuid_lock);
}
}
-EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
+EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3adbc05ebaac..727904d8a4d0 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2276,14 +2276,14 @@ static __net_init int nfsd_net_init(struct net *net)
* nfsd_net_pre_exit - Disconnect localio clients from net namespace
* @net: a network namespace that is about to be destroyed
*
- * This invalidated ->net pointers held by localio clients
+ * This invalidates ->net pointers held by localio clients
* while they can still safely access nn->counter.
*/
static __net_exit void nfsd_net_pre_exit(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- nfs_uuid_invalidate_clients(&nn->local_clients);
+ nfs_localio_invalidate_clients(&nn->local_clients);
}
#endif
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index ab6a2a53f505..a05d1043f2b0 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -37,8 +37,9 @@ bool nfs_uuid_begin(nfs_uuid_t *);
void nfs_uuid_end(nfs_uuid_t *);
void nfs_uuid_is_local(const uuid_t *, struct list_head *,
struct net *, struct auth_domain *, struct module *);
-void nfs_uuid_invalidate_clients(struct list_head *list);
-void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
+
+void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid);
+void nfs_localio_invalidate_clients(struct list_head *list);
/* localio needs to map filehandle -> struct nfsd_file */
extern struct nfsd_file *
--
2.44.0
* [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (8 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-11 1:55 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 11/19] nfs: cache all open LOCALIO nfsd_file(s) in client Mike Snitzer
` (9 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Remove cl_localio_lock from 'struct nfs_client' in favor of adding a
lock to the nfs_uuid_t struct (which is embedded in each nfs_client).
Push the nfs_local_{enable,disable} implementation down to nfs_common.
Those methods now call nfs_localio_{enable,disable}_client.
This allows implementing nfs_localio_invalidate_clients in terms of
nfs_localio_disable_client.
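The resulting locking can be sketched as a userspace analogue. It rests on a few assumptions. demo_lock_t is a hypothetical atomic_flag spinlock standing in for spinlock_t. demo_list_lock plays the role of nfs_uuid_lock, and the per-client lock plays the role of nfs_uuid_t.lock. Disable is idempotent because only the caller that observes the flag set releases resources. Like nfs_localio_disable_client() above, it takes the client lock before the list lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal spinlock (stand-in for spinlock_t). */
typedef struct { atomic_flag f; } demo_lock_t;
#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

static void demo_lock(demo_lock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;	/* spin */
}

static void demo_unlock(demo_lock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct demo_client {
	demo_lock_t lock;		/* per-client lock */
	bool local_io;			/* stand-in for NFS_CS_LOCAL_IO */
	bool on_list;
	struct demo_client *next;	/* link on the global list */
	int puts;			/* counts resource releases for the demo */
};

static demo_lock_t demo_list_lock = DEMO_LOCK_INIT;
static struct demo_client *demo_list;

static void demo_client_init(struct demo_client *clp)
{
	atomic_flag_clear(&clp->lock.f);
	clp->local_io = false;
	clp->on_list = false;
	clp->next = NULL;
	clp->puts = 0;
}

static void demo_client_add(struct demo_client *clp)
{
	demo_lock(&demo_list_lock);
	clp->next = demo_list;
	demo_list = clp;
	clp->on_list = true;
	demo_unlock(&demo_list_lock);
}

static void demo_enable_client(struct demo_client *clp)
{
	demo_lock(&clp->lock);
	clp->local_io = true;
	demo_unlock(&clp->lock);
}

/* Release resources and unlink if still listed; caller holds demo_list_lock. */
static void demo_put_locked(struct demo_client *clp)
{
	struct demo_client **pp;

	clp->puts++;
	if (!clp->on_list)
		return;
	for (pp = &demo_list; *pp; pp = &(*pp)->next) {
		if (*pp == clp) {
			*pp = clp->next;
			break;
		}
	}
	clp->next = NULL;
	clp->on_list = false;
}

/* Lock order: client lock, then list lock. */
static void demo_disable_client(struct demo_client *clp)
{
	demo_lock(&clp->lock);
	if (clp->local_io) {
		clp->local_io = false;
		demo_lock(&demo_list_lock);
		demo_put_locked(clp);
		demo_unlock(&demo_list_lock);
	}
	demo_unlock(&clp->lock);
}

static void demo_invalidate_clients(void)
{
	struct demo_client *clp, *next;

	demo_lock(&demo_list_lock);	/* detach the whole list */
	clp = demo_list;
	demo_list = NULL;
	for (next = clp; next; next = next->next)
		next->on_list = false;
	demo_unlock(&demo_list_lock);

	for (; clp; clp = next) {	/* list lock is not held here */
		next = clp->next;
		clp->next = NULL;
		demo_disable_client(clp);
	}
}
```

demo_invalidate_clients() detaches the list under demo_list_lock and drops that lock before calling demo_disable_client(). The disable path acquires demo_list_lock itself, so holding it here would make a non-recursive spinlock deadlock on itself.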
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/client.c | 1 -
fs/nfs/localio.c | 18 ++++++------
fs/nfs_common/nfslocalio.c | 57 ++++++++++++++++++++++++++------------
include/linux/nfs_fs_sb.h | 1 -
include/linux/nfslocalio.h | 8 +++++-
5 files changed, 55 insertions(+), 30 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 03ecc7765615..124232054807 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -182,7 +182,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
seqlock_init(&clp->cl_boot_lock);
ktime_get_real_ts64(&clp->cl_nfssvc_boot);
nfs_uuid_init(&clp->cl_uuid);
- spin_lock_init(&clp->cl_localio_lock);
#endif /* CONFIG_NFS_LOCALIO */
clp->cl_principal = "*";
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index cab2a8819259..4c75ffc5efa2 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -125,10 +125,8 @@ const struct rpc_program nfslocalio_program = {
*/
static void nfs_local_enable(struct nfs_client *clp)
{
- spin_lock(&clp->cl_localio_lock);
- set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
trace_nfs_local_enable(clp);
- spin_unlock(&clp->cl_localio_lock);
+ nfs_localio_enable_client(clp);
}
/*
@@ -136,12 +134,8 @@ static void nfs_local_enable(struct nfs_client *clp)
*/
void nfs_local_disable(struct nfs_client *clp)
{
- spin_lock(&clp->cl_localio_lock);
- if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
- trace_nfs_local_disable(clp);
- nfs_localio_disable_client(&clp->cl_uuid);
- }
- spin_unlock(&clp->cl_localio_lock);
+ trace_nfs_local_disable(clp);
+ nfs_localio_disable_client(clp);
}
/*
@@ -183,8 +177,12 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp)
rpc_shutdown_client(rpcclient_localio);
/* Server is only local if it initialized required struct members */
- if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
+ rcu_read_lock();
+ if (status || !rcu_access_pointer(clp->cl_uuid.net) || !clp->cl_uuid.dom) {
+ rcu_read_unlock();
return false;
+ }
+ rcu_read_unlock();
return true;
}
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 904439e4bb85..cf2f47ea4f8d 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -7,6 +7,9 @@
#include <linux/module.h>
#include <linux/list.h>
#include <linux/nfslocalio.h>
+#include <linux/nfs3.h>
+#include <linux/nfs4.h>
+#include <linux/nfs_fs_sb.h>
#include <net/netns/generic.h>
MODULE_LICENSE("GPL");
@@ -25,6 +28,7 @@ void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
nfs_uuid->net = NULL;
nfs_uuid->dom = NULL;
INIT_LIST_HEAD(&nfs_uuid->list);
+ spin_lock_init(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_init);
@@ -94,12 +98,23 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
}
EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+void nfs_localio_enable_client(struct nfs_client *clp)
+{
+ nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
+
+ spin_lock(&nfs_uuid->lock);
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ spin_unlock(&nfs_uuid->lock);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
+
static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
{
- if (nfs_uuid->net) {
- module_put(nfsd_mod);
- nfs_uuid->net = NULL;
- }
+ if (!nfs_uuid->net)
+ return;
+ module_put(nfsd_mod);
+ rcu_assign_pointer(nfs_uuid->net, NULL);
+
if (nfs_uuid->dom) {
auth_domain_put(nfs_uuid->dom);
nfs_uuid->dom = NULL;
@@ -107,27 +122,35 @@ static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
list_del_init(&nfs_uuid->list);
}
-void nfs_localio_invalidate_clients(struct list_head *list)
+void nfs_localio_disable_client(struct nfs_client *clp)
+{
+ nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
+
+ spin_lock(&nfs_uuid->lock);
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ spin_lock(&nfs_uuid_lock);
+ nfs_uuid_put_locked(nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+ }
+ spin_unlock(&nfs_uuid->lock);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
+
+void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
{
nfs_uuid_t *nfs_uuid, *tmp;
spin_lock(&nfs_uuid_lock);
- list_for_each_entry_safe(nfs_uuid, tmp, list, list)
- nfs_uuid_put_locked(nfs_uuid);
+ list_for_each_entry_safe(nfs_uuid, tmp, cl_uuid_list, list) {
+ struct nfs_client *clp =
+ container_of(nfs_uuid, struct nfs_client, cl_uuid);
+
+ nfs_localio_disable_client(clp);
+ }
spin_unlock(&nfs_uuid_lock);
}
EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
-void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid)
-{
- if (nfs_uuid->net) {
- spin_lock(&nfs_uuid_lock);
- nfs_uuid_put_locked(nfs_uuid);
- spin_unlock(&nfs_uuid_lock);
- }
-}
-EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
-
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
const struct nfs_fh *nfs_fh, const fmode_t fmode)
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b804346a9741..239d86ef166c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -132,7 +132,6 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
nfs_uuid_t cl_uuid;
- spinlock_t cl_localio_lock;
#endif /* CONFIG_NFS_LOCALIO */
};
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index a05d1043f2b0..4d5583873f41 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -6,6 +6,7 @@
#ifndef __LINUX_NFSLOCALIO_H
#define __LINUX_NFSLOCALIO_H
+
/* nfsd_file structure is purposely kept opaque to NFS client */
struct nfsd_file;
@@ -19,6 +20,8 @@ struct nfsd_file;
#include <linux/nfs.h>
#include <net/net_namespace.h>
+struct nfs_client;
+
/*
* Useful to allow a client to negotiate if localio
* possible with its server.
@@ -27,6 +30,8 @@ struct nfsd_file;
*/
typedef struct {
uuid_t uuid;
+ /* sadly this struct is just over a cacheline, avoid bouncing */
+ spinlock_t ____cacheline_aligned lock;
struct list_head list;
struct net __rcu *net; /* nfsd's network namespace */
struct auth_domain *dom; /* auth_domain for localio */
@@ -38,7 +43,8 @@ void nfs_uuid_end(nfs_uuid_t *);
void nfs_uuid_is_local(const uuid_t *, struct list_head *,
struct net *, struct auth_domain *, struct module *);
-void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid);
+void nfs_localio_enable_client(struct nfs_client *clp);
+void nfs_localio_disable_client(struct nfs_client *clp);
void nfs_localio_invalidate_clients(struct list_head *list);
/* localio needs to map filehandle -> struct nfsd_file */
--
2.44.0
* [for-6.13 PATCH 11/19] nfs: cache all open LOCALIO nfsd_file(s) in client
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (9 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 12/19] nfsd: update percpu_ref to manage references on nfsd_net Mike Snitzer
` (8 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
This commit switches from leaning heavily on NFSD's filecache (in
terms of GC'd nfsd_files) back to caching nfsd_files in the
client. A later commit will add the callback mechanism needed to
allow NFSD to force the NFS client to clean up all cached files.
Add nfs_localio_file_init() and 'struct nfs_file_localio' to cache
opened nfsd_file(s) (both an RO and an RW nfsd_file can be opened and
cached for a given nfs_fh).
Update nfs_local_open_fh() to cache the nfsd_file once it is opened
using __nfs_local_open_fh().
Introduce nfs_close_local_fh() to clear the cached open nfsd_files and
call nfs_to_nfsd_file_put_local().
Refcounting is such that:
- nfs_local_open_fh() is paired with nfs_close_local_fh().
- __nfs_local_open_fh() is paired with nfs_to_nfsd_file_put_local().
- nfs_local_file_get() is paired with nfs_local_file_put().
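The open-once, install-if-empty caching above can be sketched in userspace C. This is a loose analogue, not the patch's code, and it makes several assumptions. All demo_* names are hypothetical. A small atomic_flag spinlock guards both the lookup and the install, whereas the patch uses RCU for the lookup. The caller's reference is taken while the lock is held, so a concurrent close cannot drop the cache-owned reference first. demo_open_cached() hands out a per-I/O reference that the caller drops with demo_put(). demo_close_cached() drops the references owned by the cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opened server-side file. */
struct demo_nf {
	atomic_int ref;
};

static int demo_opens;	/* counts slow-path opens for the demo */

static struct demo_nf *demo_open_slow(void)
{
	struct demo_nf *nf = malloc(sizeof(*nf));

	if (nf) {
		atomic_init(&nf->ref, 1);	/* becomes the cache's reference */
		demo_opens++;
	}
	return nf;
}

static void demo_get(struct demo_nf *nf)
{
	atomic_fetch_add(&nf->ref, 1);
}

static void demo_put(struct demo_nf *nf)
{
	if (atomic_fetch_sub(&nf->ref, 1) == 1)
		free(nf);
}

/* Per-open-context cache, loosely modeled on struct nfs_file_localio. */
struct demo_cache {
	atomic_flag lock;
	struct demo_nf *ro_file;
	struct demo_nf *rw_file;
};

static void demo_cache_init(struct demo_cache *c)
{
	atomic_flag_clear(&c->lock);
	c->ro_file = NULL;
	c->rw_file = NULL;
}

static void demo_lock(struct demo_cache *c)
{
	while (atomic_flag_test_and_set(&c->lock))
		;	/* spin */
}

static void demo_unlock(struct demo_cache *c)
{
	atomic_flag_clear(&c->lock);
}

/* Return a referenced file; the caller pairs it with demo_put(). */
static struct demo_nf *demo_open_cached(struct demo_cache *c, bool write)
{
	struct demo_nf **slot = write ? &c->rw_file : &c->ro_file;
	struct demo_nf *nf, *new;

	demo_lock(c);
	nf = *slot;
	if (nf)
		demo_get(nf);	/* fast path: already cached */
	demo_unlock(c);
	if (nf)
		return nf;

	new = demo_open_slow();	/* open without holding the lock */
	if (!new)
		return NULL;
	demo_lock(c);
	nf = *slot;
	if (!nf) {		/* won the race: the cache keeps ours */
		*slot = new;
		nf = new;
		new = NULL;
	}
	demo_get(nf);		/* caller's reference */
	demo_unlock(c);
	if (new)
		demo_put(new);	/* lost the race: drop the duplicate open */
	return nf;
}

static void demo_close_cached(struct demo_cache *c)
{
	struct demo_nf *ro, *rw;

	demo_lock(c);
	ro = c->ro_file;
	rw = c->rw_file;
	c->ro_file = NULL;
	c->rw_file = NULL;
	demo_unlock(c);
	if (ro)
		demo_put(ro);	/* drop cache-owned references outside the lock */
	if (rw)
		demo_put(rw);
}
```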
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 29 +++++----
fs/nfs/flexfilelayout/flexfilelayout.h | 1 +
fs/nfs/inode.c | 3 +
fs/nfs/internal.h | 4 +-
fs/nfs/localio.c | 89 +++++++++++++++++++++-----
fs/nfs/pagelist.c | 5 +-
fs/nfs/write.c | 3 +-
fs/nfs_common/nfslocalio.c | 52 ++++++++++++++-
include/linux/nfs_fs.h | 22 ++++++-
include/linux/nfslocalio.h | 18 +++---
10 files changed, 181 insertions(+), 45 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index f78115c6c2c1..451f168d882b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -164,18 +164,17 @@ decode_name(struct xdr_stream *xdr, u32 *id)
}
static struct nfsd_file *
-ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, fmode_t mode)
{
- if (mode & FMODE_WRITE) {
- /*
- * Always request read and write access since this corresponds
- * to a rw layout.
- */
- mode |= FMODE_READ;
- }
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
- return nfs_local_open_fh(clp, cred, fh, mode);
+ return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
+#else
+ return NULL;
+#endif
}
static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
@@ -247,6 +246,9 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ nfs_localio_file_init(&mirror->nfl);
+#endif
}
return mirror;
}
@@ -257,6 +259,9 @@ static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ nfs_close_local_fh(&mirror->nfl);
+#endif
cred = rcu_access_pointer(mirror->ro_cred);
put_cred(cred);
cred = rcu_access_pointer(mirror->rw_cred);
@@ -1820,7 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Start IO accounting for local read */
- localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
+ localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
@@ -1896,7 +1901,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Start IO accounting for local write */
- localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
@@ -1981,7 +1986,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
/* Start IO accounting for local commit */
- localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..095df09017a5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -83,6 +83,7 @@ struct nfs4_ff_layout_mirror {
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
const struct cred __rcu *rw_cred;
+ struct nfs_file_localio nfl;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 596f35170137..1aa67fca69b2 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1137,6 +1137,8 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
+ nfs_localio_file_init(&ctx->nfl);
+
return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1168,6 +1170,7 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
+ nfs_close_local_fh(&ctx->nfl);
kfree_rcu(ctx, rcu_head);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 430733e3eff2..57af3ab3adbe 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -459,6 +459,7 @@ extern void nfs_local_probe(struct nfs_client *);
extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
const struct cred *,
struct nfs_fh *,
+ struct nfs_file_localio *,
const fmode_t);
extern int nfs_local_doio(struct nfs_client *,
struct nfsd_file *,
@@ -474,7 +475,8 @@ static inline void nfs_local_disable(struct nfs_client *clp) {}
static inline void nfs_local_probe(struct nfs_client *clp) {}
static inline struct nfsd_file *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
- struct nfs_fh *fh, const fmode_t mode)
+ struct nfs_fh *fh, struct nfs_file_localio *nfl,
+ const fmode_t mode)
{
return NULL;
}
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 4c75ffc5efa2..d10d863aaf23 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -214,27 +214,33 @@ void nfs_local_probe(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
+static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
+{
+ return nfs_to->nfsd_file_get(nf);
+}
+
+static inline void nfs_local_file_put(struct nfsd_file *nf)
+{
+ nfs_to->nfsd_file_put(nf);
+}
+
/*
- * nfs_local_open_fh - open a local filehandle in terms of nfsd_file
+ * __nfs_local_open_fh - open a local filehandle in terms of nfsd_file.
*
- * Returns a pointer to a struct nfsd_file or NULL
+ * Returns a pointer to a struct nfsd_file or ERR_PTR.
+ * Caller must release returned nfsd_file with nfs_to_nfsd_file_put_local().
*/
-struct nfsd_file *
-nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
- struct nfs_fh *fh, const fmode_t mode)
+static struct nfsd_file *
+__nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_file_localio *nfl,
+ const fmode_t mode)
{
struct nfsd_file *localio;
- int status;
-
- if (!nfs_server_is_local(clp))
- return NULL;
- if (mode & ~(FMODE_READ | FMODE_WRITE))
- return NULL;
localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
- cred, fh, mode);
+ cred, fh, nfl, mode);
if (IS_ERR(localio)) {
- status = PTR_ERR(localio);
+ int status = PTR_ERR(localio);
trace_nfs_local_open_fh(fh, mode, status);
switch (status) {
case -ENOMEM:
@@ -243,10 +249,59 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
/* Revalidate localio, will disable if unsupported */
nfs_local_probe(clp);
}
- return NULL;
}
return localio;
}
+
+/*
+ * nfs_local_open_fh - open a local filehandle in terms of nfsd_file.
+ * First checking if the open nfsd_file is already cached, otherwise
+ * must __nfs_local_open_fh and insert the nfsd_file in nfs_file_localio.
+ *
+ * Returns a pointer to a struct nfsd_file or NULL.
+ */
+struct nfsd_file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_file_localio *nfl,
+ const fmode_t mode)
+{
+ struct nfsd_file *nf, *new, __rcu **pnf;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return NULL;
+
+ if (mode & FMODE_WRITE)
+ pnf = &nfl->rw_file;
+ else
+ pnf = &nfl->ro_file;
+
+ new = NULL;
+ rcu_read_lock();
+ nf = rcu_dereference(*pnf);
+ if (!nf) {
+ rcu_read_unlock();
+ new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
+ if (IS_ERR(new))
+ return NULL;
+ /* try to swap in the pointer */
+ spin_lock(&clp->cl_uuid.lock);
+ nf = rcu_dereference_protected(*pnf, 1);
+ if (!nf) {
+ nf = new;
+ new = NULL;
+ rcu_assign_pointer(*pnf, nf);
+ }
+ spin_unlock(&clp->cl_uuid.lock);
+ rcu_read_lock();
+ }
+ nf = nfs_local_file_get(nf);
+ rcu_read_unlock();
+ if (new)
+ nfs_to_nfsd_file_put_local(new);
+ return nf;
+}
EXPORT_SYMBOL_GPL(nfs_local_open_fh);
static struct bio_vec *
@@ -350,7 +405,7 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
{
struct nfs_pgio_header *hdr = iocb->hdr;
- nfs_to_nfsd_file_put_local(iocb->localio);
+ nfs_local_file_put(iocb->localio);
nfs_local_iocb_free(iocb);
nfs_local_hdr_release(hdr, hdr->task.tk_ops);
}
@@ -697,7 +752,7 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
if (status != 0) {
if (status == -EAGAIN)
nfs_local_disable(clp);
- nfs_to_nfsd_file_put_local(localio);
+ nfs_local_file_put(localio);
hdr->task.tk_status = status;
nfs_local_hdr_release(hdr, call_ops);
}
@@ -748,7 +803,7 @@ nfs_local_release_commit_data(struct nfsd_file *localio,
struct nfs_commit_data *data,
const struct rpc_call_ops *call_ops)
{
- nfs_to_nfsd_file_put_local(localio);
+ nfs_local_file_put(localio);
call_ops->rpc_call_done(&data->task, data);
call_ops->rpc_release(data);
}
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e27c07bd8929..11968dcb7243 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -961,8 +961,9 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
struct nfsd_file *localio =
- nfs_local_open_fh(clp, hdr->cred,
- hdr->args.fh, hdr->args.context->mode);
+ nfs_local_open_fh(clp, hdr->cred, hdr->args.fh,
+ &hdr->args.context->nfl,
+ hdr->args.context->mode);
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ead2dc55952d..8d4dbb69b7c0 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1815,7 +1815,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
task_flags = RPC_TASK_MOVEABLE;
localio = nfs_local_open_fh(NFS_SERVER(inode)->nfs_client, data->cred,
- data->args.fh, data->context->mode);
+ data->args.fh, &data->context->nfl,
+ data->context->mode);
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
RPC_TASK_CRED_NOREF | task_flags, localio);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index cf2f47ea4f8d..345f3c55aa9c 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -9,7 +9,7 @@
#include <linux/nfslocalio.h>
#include <linux/nfs3.h>
#include <linux/nfs4.h>
-#include <linux/nfs_fs_sb.h>
+#include <linux/nfs_fs.h>
#include <net/netns/generic.h>
MODULE_LICENSE("GPL");
@@ -151,9 +151,18 @@ void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
}
EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
+static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
+{
+ spin_lock(&nfs_uuid_lock);
+ if (!nfl->nfs_uuid)
+ rcu_assign_pointer(nfl->nfs_uuid, nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+}
+
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode)
+ const struct nfs_fh *nfs_fh, struct nfs_file_localio *nfl,
+ const fmode_t fmode)
{
struct net *net;
struct nfsd_file *localio;
@@ -180,11 +189,50 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
cred, nfs_fh, fmode);
if (IS_ERR(localio))
nfs_to_nfsd_net_put(net);
+ else
+ nfs_uuid_add_file(uuid, nfl);
return localio;
}
EXPORT_SYMBOL_GPL(nfs_open_local_fh);
+void nfs_close_local_fh(struct nfs_file_localio *nfl)
+{
+ struct nfsd_file *ro_nf = NULL;
+ struct nfsd_file *rw_nf = NULL;
+ nfs_uuid_t *nfs_uuid;
+
+ rcu_read_lock();
+ nfs_uuid = rcu_dereference(nfl->nfs_uuid);
+ if (!nfs_uuid) {
+ /* regular (non-LOCALIO) NFS will hammer this */
+ rcu_read_unlock();
+ return;
+ }
+
+ ro_nf = rcu_access_pointer(nfl->ro_file);
+ rw_nf = rcu_access_pointer(nfl->rw_file);
+ if (ro_nf || rw_nf) {
+ spin_lock(&nfs_uuid_lock);
+ if (ro_nf)
+ ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
+ if (rw_nf)
+ rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
+
+ rcu_assign_pointer(nfl->nfs_uuid, NULL);
+ spin_unlock(&nfs_uuid_lock);
+ rcu_read_unlock();
+
+ if (ro_nf)
+ nfs_to_nfsd_file_put_local(ro_nf);
+ if (rw_nf)
+ nfs_to_nfsd_file_put_local(rw_nf);
+ return;
+ }
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(nfs_close_local_fh);
+
/*
* The NFS LOCALIO code needs to call into NFSD using various symbols,
* but cannot be statically linked, because that will make the NFS
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 039898d70954..67ae2c3f41d2 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -77,6 +77,23 @@ struct nfs_lock_context {
struct rcu_head rcu_head;
};
+struct nfs_file_localio {
+ struct nfsd_file __rcu *ro_file;
+ struct nfsd_file __rcu *rw_file;
+ struct list_head list;
+ void __rcu *nfs_uuid; /* opaque pointer to 'nfs_uuid_t' */
+};
+
+static inline void nfs_localio_file_init(struct nfs_file_localio *nfl)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ nfl->ro_file = NULL;
+ nfl->rw_file = NULL;
+ INIT_LIST_HEAD(&nfl->list);
+ nfl->nfs_uuid = NULL;
+#endif
+}
+
struct nfs4_state;
struct nfs_open_context {
struct nfs_lock_context lock_context;
@@ -87,15 +104,16 @@ struct nfs_open_context {
struct nfs4_state *state;
fmode_t mode;
+ int error;
unsigned long flags;
#define NFS_CONTEXT_BAD (2)
#define NFS_CONTEXT_UNLOCK (3)
#define NFS_CONTEXT_FILE_OPEN (4)
- int error;
- struct list_head list;
struct nfs4_threshold *mdsthreshold;
+ struct list_head list;
struct rcu_head rcu_head;
+ struct nfs_file_localio nfl;
};
struct nfs_open_dir_context {
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 4d5583873f41..7cfc6720ed26 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -6,10 +6,6 @@
#ifndef __LINUX_NFSLOCALIO_H
#define __LINUX_NFSLOCALIO_H
-
-/* nfsd_file structure is purposely kept opaque to NFS client */
-struct nfsd_file;
-
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
#include <linux/module.h>
@@ -21,6 +17,7 @@ struct nfsd_file;
#include <net/net_namespace.h>
struct nfs_client;
+struct nfs_file_localio;
/*
* Useful to allow a client to negotiate if localio
@@ -52,6 +49,7 @@ extern struct nfsd_file *
nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
const struct cred *, const struct nfs_fh *,
const fmode_t) __must_hold(rcu);
+void nfs_close_local_fh(struct nfs_file_localio *);
struct nfsd_localio_operations {
bool (*nfsd_serv_try_get)(struct net *);
@@ -73,7 +71,8 @@ extern const struct nfsd_localio_operations *nfs_to;
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t);
+ const struct nfs_fh *, struct nfs_file_localio *,
+ const fmode_t);
static inline void nfs_to_nfsd_net_put(struct net *net)
{
@@ -100,12 +99,15 @@ static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
}
#else /* CONFIG_NFS_LOCALIO */
+
+struct nfs_file_localio;
+static inline void nfs_close_local_fh(struct nfs_file_localio *nfl)
+{
+}
static inline void nfsd_localio_ops_init(void)
{
}
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
-{
-}
+
#endif /* CONFIG_NFS_LOCALIO */
#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
* [for-6.13 PATCH 12/19] nfsd: update percpu_ref to manage references on nfsd_net
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (10 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 11/19] nfs: cache all open LOCALIO nfsd_file(s) in client Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 13/19] nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ Mike Snitzer
` (7 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Holding a reference on nfsd_net is what is required; it was never
actually about ensuring that nn->nfsd_serv is available.
Move waiting for outstanding percpu references from
nfsd_destroy_serv() to nfsd_shutdown_net().
Moving it later makes it possible to invalidate localio clients
during nfsd_file_cache_shutdown_net() via __nfsd_file_cache_purge().
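The ordering this patch relies on (kill the reference and wait for confirmation first, let the rest of the per-net teardown run, then wait for the final put before percpu_ref_exit()) can be illustrated with a small single-threaded userspace model. This is a hypothetical sketch, not kernel code: the net_ref type and ref_* helpers are invented for illustration, a plain counter stands in for the per-CPU counter, and two flags stand in for the nfsd_serv_confirm_done and nfsd_serv_free_done completions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the nfsd_net reference
 * lifecycle. A plain counter stands in for the kernel's percpu_ref;
 * the two flags stand in for the nfsd_serv_confirm_done and
 * nfsd_serv_free_done completions.
 */
struct net_ref {
	long count;      /* starts at 1: the initial reference */
	bool dead;       /* set by kill: no new references allowed */
	bool confirmed;  /* "confirm_done": the kill has taken effect */
	bool freed;      /* "free_done": the last reference was dropped */
};

static void ref_init(struct net_ref *r)
{
	r->count = 1;
	r->dead = false;
	r->confirmed = false;
	r->freed = false;
}

/* Like percpu_ref_tryget_live(): fails once the ref has been killed. */
static bool ref_tryget_live(struct net_ref *r)
{
	if (r->dead)
		return false;
	r->count++;
	return true;
}

/* Like percpu_ref_put(): the final put runs the release step. */
static void ref_put(struct net_ref *r)
{
	if (--r->count == 0)
		r->freed = true;
}

/* Like percpu_ref_kill_and_confirm(): block new gets, drop initial ref. */
static void ref_kill_and_confirm(struct net_ref *r)
{
	r->dead = true;
	r->confirmed = true;  /* the confirm callback */
	ref_put(r);
}
```

Mapped onto nfsd_shutdown_net(), ref_kill_and_confirm() plays the role of percpu_ref_kill_and_confirm() plus the wait on nfsd_serv_confirm_done, and the freed flag models the later wait on nfsd_serv_free_done, which only completes once every LOCALIO holder has dropped its reference.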
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/nfssvc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 49e2f32102ab..6ca554042426 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -436,6 +436,10 @@ static void nfsd_shutdown_net(struct net *net)
if (!nn->nfsd_net_up)
return;
+
+ percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
+ wait_for_completion(&nn->nfsd_serv_confirm_done);
+
nfsd_export_flush(net);
nfs4_state_shutdown_net(net);
nfsd_reply_cache_shutdown(nn);
@@ -444,7 +448,10 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+
+ wait_for_completion(&nn->nfsd_serv_free_done);
percpu_ref_exit(&nn->nfsd_serv_ref);
+
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
}
@@ -526,11 +533,6 @@ void nfsd_destroy_serv(struct net *net)
lockdep_assert_held(&nfsd_mutex);
- percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
- wait_for_completion(&nn->nfsd_serv_confirm_done);
- wait_for_completion(&nn->nfsd_serv_free_done);
- /* percpu_ref_exit is called in nfsd_shutdown_net */
-
spin_lock(&nfsd_notifier_lock);
nn->nfsd_serv = NULL;
spin_unlock(&nfsd_notifier_lock);
--
2.44.0
* [for-6.13 PATCH 13/19] nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (11 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 12/19] nfsd: update percpu_ref to manage references on nfsd_net Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 14/19] nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file Mike Snitzer
` (6 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 10 +++++++---
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 4 ++--
fs/nfsd/netns.h | 11 ++++++-----
fs/nfsd/nfssvc.c | 34 +++++++++++++++++-----------------
include/linux/nfslocalio.h | 12 ++++++------
6 files changed, 39 insertions(+), 34 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 345f3c55aa9c..0935bdcaa940 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -159,6 +159,10 @@ static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl
spin_unlock(&nfs_uuid_lock);
}
+/*
+ * Caller is responsible for calling nfsd_net_put and
+ * nfsd_file_put (via nfs_to_nfsd_file_put_local).
+ */
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
const struct nfs_fh *nfs_fh, struct nfs_file_localio *nfl,
@@ -171,7 +175,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
* Not running in nfsd context, so must safely get reference on nfsd_serv.
* But the server may already be shutting down, if so disallow new localio.
* uuid->net is NOT a counted reference, but rcu_read_lock() ensures that
- * if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe
+ * if uuid->net is not NULL, then calling nfsd_net_try_get() is safe
* and if it succeeds we will have an implied reference to the net.
*
* Otherwise NFS may not have ref on NFSD and therefore cannot safely
@@ -179,12 +183,12 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
*/
rcu_read_lock();
net = rcu_dereference(uuid->net);
- if (!net || !nfs_to->nfsd_serv_try_get(net)) {
+ if (!net || !nfs_to->nfsd_net_try_get(net)) {
rcu_read_unlock();
return ERR_PTR(-ENXIO);
}
rcu_read_unlock();
- /* We have an implied reference to net thanks to nfsd_serv_try_get */
+ /* We have an implied reference to net thanks to nfsd_net_try_get */
localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
cred, nfs_fh, fmode);
if (IS_ERR(localio))
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9a62b4da89bb..fac98b2cb463 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -391,7 +391,7 @@ nfsd_file_put(struct nfsd_file *nf)
}
/**
- * nfsd_file_put_local - put nfsd_file reference and arm nfsd_serv_put in caller
+ * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
* @nf: nfsd_file of which to put the reference
*
* First save the associated net to return to caller, then put
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 8beda4c85111..f9a91cd3b5ec 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -25,8 +25,8 @@
#include "cache.h"
static const struct nfsd_localio_operations nfsd_localio_ops = {
- .nfsd_serv_try_get = nfsd_serv_try_get,
- .nfsd_serv_put = nfsd_serv_put,
+ .nfsd_net_try_get = nfsd_net_try_get,
+ .nfsd_net_put = nfsd_net_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
.nfsd_file_get = nfsd_file_get,
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 26f7b34d1a03..8faef59d7122 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -140,9 +140,10 @@ struct nfsd_net {
struct svc_info nfsd_info;
#define nfsd_serv nfsd_info.serv
- struct percpu_ref nfsd_serv_ref;
- struct completion nfsd_serv_confirm_done;
- struct completion nfsd_serv_free_done;
+
+ struct percpu_ref nfsd_net_ref;
+ struct completion nfsd_net_confirm_done;
+ struct completion nfsd_net_free_done;
/*
* clientid and stateid data for construction of net unique COPY
@@ -229,8 +230,8 @@ struct nfsd_net {
extern bool nfsd_support_version(int vers);
extern unsigned int nfsd_net_id;
-bool nfsd_serv_try_get(struct net *net);
-void nfsd_serv_put(struct net *net);
+bool nfsd_net_try_get(struct net *net);
+void nfsd_net_put(struct net *net);
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
void nfsd_reset_write_verifier(struct nfsd_net *nn);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 6ca554042426..e937e2d0ce62 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -214,32 +214,32 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
return 0;
}
-bool nfsd_serv_try_get(struct net *net) __must_hold(rcu)
+bool nfsd_net_try_get(struct net *net) __must_hold(rcu)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- return (nn && percpu_ref_tryget_live(&nn->nfsd_serv_ref));
+ return (nn && percpu_ref_tryget_live(&nn->nfsd_net_ref));
}
-void nfsd_serv_put(struct net *net) __must_hold(rcu)
+void nfsd_net_put(struct net *net) __must_hold(rcu)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- percpu_ref_put(&nn->nfsd_serv_ref);
+ percpu_ref_put(&nn->nfsd_net_ref);
}
-static void nfsd_serv_done(struct percpu_ref *ref)
+static void nfsd_net_done(struct percpu_ref *ref)
{
- struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);
- complete(&nn->nfsd_serv_confirm_done);
+ complete(&nn->nfsd_net_confirm_done);
}
-static void nfsd_serv_free(struct percpu_ref *ref)
+static void nfsd_net_free(struct percpu_ref *ref)
{
- struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);
- complete(&nn->nfsd_serv_free_done);
+ complete(&nn->nfsd_net_free_done);
}
/*
@@ -437,8 +437,8 @@ static void nfsd_shutdown_net(struct net *net)
if (!nn->nfsd_net_up)
return;
- percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
- wait_for_completion(&nn->nfsd_serv_confirm_done);
+ percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
+ wait_for_completion(&nn->nfsd_net_confirm_done);
nfsd_export_flush(net);
nfs4_state_shutdown_net(net);
@@ -449,8 +449,8 @@ static void nfsd_shutdown_net(struct net *net)
nn->lockd_up = false;
}
- wait_for_completion(&nn->nfsd_serv_free_done);
- percpu_ref_exit(&nn->nfsd_serv_ref);
+ wait_for_completion(&nn->nfsd_net_free_done);
+ percpu_ref_exit(&nn->nfsd_net_ref);
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
@@ -654,12 +654,12 @@ int nfsd_create_serv(struct net *net)
if (nn->nfsd_serv)
return 0;
- error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
+ error = percpu_ref_init(&nn->nfsd_net_ref, nfsd_net_free,
0, GFP_KERNEL);
if (error)
return error;
- init_completion(&nn->nfsd_serv_free_done);
- init_completion(&nn->nfsd_serv_confirm_done);
+ init_completion(&nn->nfsd_net_free_done);
+ init_completion(&nn->nfsd_net_confirm_done);
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 7cfc6720ed26..aa2b5c6561ab 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -52,8 +52,8 @@ nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
void nfs_close_local_fh(struct nfs_file_localio *);
struct nfsd_localio_operations {
- bool (*nfsd_serv_try_get)(struct net *);
- void (*nfsd_serv_put)(struct net *);
+ bool (*nfsd_net_try_get)(struct net *);
+ void (*nfsd_net_put)(struct net *);
struct nfsd_file *(*nfsd_open_local_fh)(struct net *,
struct auth_domain *,
struct rpc_clnt *,
@@ -77,12 +77,12 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
static inline void nfs_to_nfsd_net_put(struct net *net)
{
/*
- * Once reference to nfsd_serv is dropped, NFSD could be
- * unloaded, so ensure safe return from nfsd_file_put_local()
- * by always taking RCU.
+ * Once reference to net (and associated nfsd_serv) is dropped, NFSD
+ * could be unloaded, so ensure safe return from nfsd_net_put() by
+ * always taking RCU.
*/
rcu_read_lock();
- nfs_to->nfsd_serv_put(net);
+ nfs_to->nfsd_net_put(net);
rcu_read_unlock();
}
--
2.44.0
* [for-6.13 PATCH 14/19] nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (12 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 13/19] nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 15/19] nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock Mike Snitzer
` (5 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Now that LOCALIO no longer leans on NFSD's filecache for caching open
files (and instead uses NFS client-side open nfsd_file caching), there
is no need to use NFSD filecache's GC feature. Avoiding GC will speed
up initial nfsd_file opens.
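The behavioural difference described by the updated kernel-doc comment can be modelled in a few lines. This is a hypothetical sketch that follows only the comment's wording; the cache_entry type and entry_put() are illustrative and are not the nfsd_file layout or nfsd_file_put().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a filecache entry; not the nfsd_file layout. */
struct cache_entry {
	int  ref;     /* outstanding references */
	bool gc;      /* garbage-collected (lingers on an LRU) or not */
	bool hashed;  /* still findable by later lookups */
	bool on_lru;  /* waiting for the GC to reap it */
};

/* Drop one reference; the final put behaves differently per mode. */
static void entry_put(struct cache_entry *e)
{
	if (--e->ref > 0)
		return;
	if (e->gc)
		e->on_lru = true;   /* retained a few seconds for re-use */
	else
		e->hashed = false;  /* unhashed right after the final put */
}
```

With LOCALIO now caching open files on the client side, the server-side "linger for re-use" step no longer buys anything for these opens.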
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/filecache.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fac98b2cb463..ab9942e42054 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1225,10 +1225,9 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
* a file. The security implications of this should be carefully
* considered before use.
*
- * The nfsd_file object returned by this API is reference-counted
- * and garbage-collected. The object is retained for a few
- * seconds after the final nfsd_file_put() in case the caller
- * wants to re-use it.
+ * The nfsd_file object returned by this API is reference-counted
+ * but not garbage-collected. The object is unhashed after the
+ * final nfsd_file_put().
*
* Return values:
* %nfs_ok - @pnf points to an nfsd_file with its reference
@@ -1250,7 +1249,7 @@ nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
__be32 beres;
beres = nfsd_file_do_acquire(NULL, net, cred, client,
- fhp, may_flags, NULL, pnf, true);
+ fhp, may_flags, NULL, pnf, false);
revert_creds(save_cred);
return beres;
}
--
2.44.0
* [for-6.13 PATCH 15/19] nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (13 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 14/19] nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 16/19] nfs_common: track all open nfsd_files per LOCALIO nfs_client Mike Snitzer
` (4 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
This global spinlock protects all nfs_uuid_t instances on the global
nfs_uuids list. A later commit will split this global spinlock, so
prepare for that by renaming the lock to reflect its intended narrow
scope.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
fs/nfs_common/nfslocalio.c | 34 +++++++++++++++++-----------------
fs/nfsd/localio.c | 2 +-
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 0935bdcaa940..e58b5b4b4c3a 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -15,11 +15,11 @@
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("NFS localio protocol bypass support");
-static DEFINE_SPINLOCK(nfs_uuid_lock);
+static DEFINE_SPINLOCK(nfs_uuids_lock);
/*
* Global list of nfs_uuid_t instances
- * that is protected by nfs_uuid_lock.
+ * that is protected by nfs_uuids_lock.
*/
static LIST_HEAD(nfs_uuids);
@@ -34,15 +34,15 @@ EXPORT_SYMBOL_GPL(nfs_uuid_init);
bool nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
{
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
/* Is this nfs_uuid already in use? */
if (!list_empty(&nfs_uuid->list)) {
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
return false;
}
uuid_gen(&nfs_uuid->uuid);
list_add_tail(&nfs_uuid->list, &nfs_uuids);
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
return true;
}
@@ -51,10 +51,10 @@ EXPORT_SYMBOL_GPL(nfs_uuid_begin);
void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
{
if (nfs_uuid->net == NULL) {
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
if (nfs_uuid->net == NULL)
list_del_init(&nfs_uuid->list);
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
}
}
EXPORT_SYMBOL_GPL(nfs_uuid_end);
@@ -78,7 +78,7 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
{
nfs_uuid_t *nfs_uuid;
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
nfs_uuid = nfs_uuid_lookup_locked(uuid);
if (nfs_uuid) {
kref_get(&dom->ref);
@@ -94,7 +94,7 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
__module_get(mod);
nfsd_mod = mod;
}
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
@@ -128,9 +128,9 @@ void nfs_localio_disable_client(struct nfs_client *clp)
spin_lock(&nfs_uuid->lock);
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
nfs_uuid_put_locked(nfs_uuid);
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
}
spin_unlock(&nfs_uuid->lock);
}
@@ -140,23 +140,23 @@ void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
{
nfs_uuid_t *nfs_uuid, *tmp;
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
list_for_each_entry_safe(nfs_uuid, tmp, cl_uuid_list, list) {
struct nfs_client *clp =
container_of(nfs_uuid, struct nfs_client, cl_uuid);
nfs_localio_disable_client(clp);
}
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
}
EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
{
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
if (!nfl->nfs_uuid)
rcu_assign_pointer(nfl->nfs_uuid, nfs_uuid);
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
}
/*
@@ -217,14 +217,14 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
ro_nf = rcu_access_pointer(nfl->ro_file);
rw_nf = rcu_access_pointer(nfl->rw_file);
if (ro_nf || rw_nf) {
- spin_lock(&nfs_uuid_lock);
+ spin_lock(&nfs_uuids_lock);
if (ro_nf)
ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
if (rw_nf)
rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
rcu_assign_pointer(nfl->nfs_uuid, NULL);
- spin_unlock(&nfs_uuid_lock);
+ spin_unlock(&nfs_uuids_lock);
rcu_read_unlock();
if (ro_nf)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index f9a91cd3b5ec..2ae07161b919 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -54,7 +54,7 @@ void nfsd_localio_ops_init(void)
* avoid all the NFS overhead with reads, writes and commits.
*
* On successful return, returned nfsd_file will have its nf_net member
- * set. Caller (NFS client) is responsible for calling nfsd_serv_put and
+ * set. Caller (NFS client) is responsible for calling nfsd_net_put and
* nfsd_file_put (via nfs_to_nfsd_file_put_local).
*/
struct nfsd_file *
--
2.44.0
* [for-6.13 PATCH 16/19] nfs_common: track all open nfsd_files per LOCALIO nfs_client
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (14 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 15/19] nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock Mike Snitzer
@ 2024-11-08 23:39 ` Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 17/19] nfs_common: add nfs_localio trace events Mike Snitzer
` (3 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:39 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
This tracking enables __nfsd_file_cache_purge() to call
nfs_localio_invalidate_clients(), upon shutdown or export change, to
nfs_close_local_fh() all open nfsd_files that are still cached by the
LOCALIO nfs clients associated with the nfsd_net being shut down.
Now that the client must track all open nfsd_files, too much work was
being done while holding the contended global nfs_uuids_lock.
This manifested in various RCU issues, e.g.:
hrtimer: interrupt took 47969440 ns
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Use nfs_uuid->lock to protect all nfs_uuid_t members, instead of
nfs_uuids_lock, once nfs_uuid_is_local() adds the client to
nn->local_clients.
Also add 'local_clients_lock' to 'struct nfsd_net' to protect
nn->local_clients, and store a pointer to this spinlock in the
'list_lock' member of nfs_uuid_t so nfs_localio_disable_client() can
use it to avoid taking the global nfs_uuids_lock.
In combination, these split-out locks keep the single global
nfs_uuids_lock in nfslocalio.c out of the IO paths (open and close).
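The lock ordering documented in the comment this patch adds to nfslocalio.c can be checked mechanically. The following hypothetical sketch (not kernel code, and far simpler than lockdep) records which ranks a single caller holds and refuses any acquisition whose rank is not higher than every rank already held; skipping a rank is allowed, as the comment permits. No real locking is done here; only the ordering rule is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks from the "Lock ordering" comment added to nfslocalio.c. */
enum lock_rank {
	RANK_UUID_LOCK  = 1,  /* nfs_uuid->lock */
	RANK_UUIDS_LOCK = 2,  /* global nfs_uuids_lock */
	RANK_LIST_LOCK  = 3,  /* nfs_uuid->list_lock (nn->local_clients_lock) */
};

/* Bit N set => a lock of rank N is currently held by the caller. */
static unsigned held;

static int highest_held(void)
{
	for (int r = 31; r > 0; r--)
		if (held & (1u << r))
			return r;
	return 0;
}

/* Record an acquisition; returns false if it would invert the order. */
static bool order_acquire(enum lock_rank rank)
{
	if ((int)rank <= highest_held())
		return false;
	held |= 1u << rank;
	return true;
}

static void order_release(enum lock_rank rank)
{
	held &= ~(1u << rank);
}
```

For example, nfs_uuid_begin() takes nfs_uuid->lock and then nfs_uuids_lock, while nfs_uuid_is_local() drops nfs_uuids_lock and the list lock before taking nfs_uuid->lock; the checker accepts both, and it would reject taking nfs_uuid->lock while nfs_uuids_lock is still held.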
Also refactor the locking in the associated fs/nfs_common/nfslocalio.c
methods to generally reduce the work performed with spinlocks held.
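The pattern used by nfs_localio_invalidate_clients() and nfs_uuid_put() (detach the whole list with list_splice_init() while the spinlock is held, then do the slow per-entry work after dropping it) can be sketched in userspace. This is a hypothetical illustration: a pthread mutex stands in for the spinlock, a singly linked list for struct list_head, and close_file() for nfs_close_local_fh().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an nfs_file_localio on nfs_uuid->files. */
struct open_file {
	struct open_file *next;
	int closed;
};

struct client {
	pthread_mutex_t lock;     /* models nfs_uuid->lock */
	struct open_file *files;  /* models nfs_uuid->files */
};

/* Models the slow per-file work done by nfs_close_local_fh(). */
static void close_file(struct open_file *f)
{
	f->closed = 1;
}

/*
 * Detach the whole list while holding the lock (like list_splice_init()),
 * then do the per-file work with the lock dropped. Returns the number
 * of files closed.
 */
static int close_all_files(struct client *c)
{
	struct open_file *batch, *next;
	int n = 0;

	pthread_mutex_lock(&c->lock);
	batch = c->files;
	c->files = NULL;
	pthread_mutex_unlock(&c->lock);

	for (; batch; batch = next) {
		next = batch->next;  /* save before the entry is torn down */
		batch->next = NULL;
		close_file(batch);
		n++;
	}
	return n;
}
```

Because the detached batch is private to the caller, the per-entry work (which in the kernel may also call cond_resched()) no longer extends the time the lock is held.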
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 166 +++++++++++++++++++++++++++----------
fs/nfsd/filecache.c | 9 ++
fs/nfsd/localio.c | 1 +
fs/nfsd/netns.h | 1 +
fs/nfsd/nfsctl.c | 4 +-
include/linux/nfslocalio.h | 8 +-
6 files changed, 143 insertions(+), 46 deletions(-)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index e58b5b4b4c3a..c8d18f671bcb 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -23,27 +23,49 @@ static DEFINE_SPINLOCK(nfs_uuids_lock);
*/
static LIST_HEAD(nfs_uuids);
+/*
+ * Lock ordering:
+ * 1: nfs_uuid->lock
+ * 2: nfs_uuids_lock
+ * 3: nfs_uuid->list_lock (aka nn->local_clients_lock)
+ *
+ * May skip locks in select cases, but never hold multiple
+ * locks out of order.
+ */
+
void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
{
nfs_uuid->net = NULL;
nfs_uuid->dom = NULL;
+ nfs_uuid->list_lock = NULL;
INIT_LIST_HEAD(&nfs_uuid->list);
+ INIT_LIST_HEAD(&nfs_uuid->files);
spin_lock_init(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_init);
bool nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
{
+ spin_lock(&nfs_uuid->lock);
+ if (nfs_uuid->net) {
+ /* This nfs_uuid is already in use */
+ spin_unlock(&nfs_uuid->lock);
+ return false;
+ }
+
spin_lock(&nfs_uuids_lock);
- /* Is this nfs_uuid already in use? */
if (!list_empty(&nfs_uuid->list)) {
+ /* This nfs_uuid is already in use */
spin_unlock(&nfs_uuids_lock);
+ spin_unlock(&nfs_uuid->lock);
return false;
}
- uuid_gen(&nfs_uuid->uuid);
list_add_tail(&nfs_uuid->list, &nfs_uuids);
spin_unlock(&nfs_uuids_lock);
+ uuid_gen(&nfs_uuid->uuid);
+ spin_unlock(&nfs_uuid->lock);
+
return true;
}
EXPORT_SYMBOL_GPL(nfs_uuid_begin);
@@ -51,11 +73,15 @@ EXPORT_SYMBOL_GPL(nfs_uuid_begin);
void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
{
if (nfs_uuid->net == NULL) {
- spin_lock(&nfs_uuids_lock);
- if (nfs_uuid->net == NULL)
+ spin_lock(&nfs_uuid->lock);
+ if (nfs_uuid->net == NULL) {
+ /* Not local, remove from nfs_uuids */
+ spin_lock(&nfs_uuids_lock);
list_del_init(&nfs_uuid->list);
- spin_unlock(&nfs_uuids_lock);
- }
+ spin_unlock(&nfs_uuids_lock);
+ }
+ spin_unlock(&nfs_uuid->lock);
+ }
}
EXPORT_SYMBOL_GPL(nfs_uuid_end);
@@ -73,28 +99,39 @@ static nfs_uuid_t * nfs_uuid_lookup_locked(const uuid_t *uuid)
static struct module *nfsd_mod;
void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
- struct net *net, struct auth_domain *dom,
- struct module *mod)
+ spinlock_t *list_lock, struct net *net,
+ struct auth_domain *dom, struct module *mod)
{
nfs_uuid_t *nfs_uuid;
spin_lock(&nfs_uuids_lock);
nfs_uuid = nfs_uuid_lookup_locked(uuid);
- if (nfs_uuid) {
- kref_get(&dom->ref);
- nfs_uuid->dom = dom;
- /*
- * We don't hold a ref on the net, but instead put
- * ourselves on a list so the net pointer can be
- * invalidated.
- */
- list_move(&nfs_uuid->list, list);
- rcu_assign_pointer(nfs_uuid->net, net);
-
- __module_get(mod);
- nfsd_mod = mod;
+ if (!nfs_uuid) {
+ spin_unlock(&nfs_uuids_lock);
+ return;
}
+
+ /*
+ * We don't hold a ref on the net, but instead put
+ * ourselves on @list (nn->local_clients) so the net
+ * pointer can be invalidated.
+ */
+ spin_lock(list_lock); /* list_lock is nn->local_clients_lock */
+ list_move(&nfs_uuid->list, list);
+ spin_unlock(list_lock);
+
spin_unlock(&nfs_uuids_lock);
+ /* Once nfs_uuid is parented to @list, avoid global nfs_uuids_lock */
+ spin_lock(&nfs_uuid->lock);
+
+ __module_get(mod);
+ nfsd_mod = mod;
+
+ nfs_uuid->list_lock = list_lock;
+ kref_get(&dom->ref);
+ nfs_uuid->dom = dom;
+ rcu_assign_pointer(nfs_uuid->net, net);
+ spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
@@ -108,55 +145,96 @@ void nfs_localio_enable_client(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
-static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
+/*
+ * Cleanup the nfs_uuid_t embedded in an nfs_client.
+ * This is the long-form of nfs_uuid_init().
+ */
+static void nfs_uuid_put(nfs_uuid_t *nfs_uuid)
{
- if (!nfs_uuid->net)
+ LIST_HEAD(local_files);
+ struct nfs_file_localio *nfl, *tmp;
+
+ spin_lock(&nfs_uuid->lock);
+ if (unlikely(!nfs_uuid->net)) {
+ spin_unlock(&nfs_uuid->lock);
return;
- module_put(nfsd_mod);
+ }
rcu_assign_pointer(nfs_uuid->net, NULL);
if (nfs_uuid->dom) {
auth_domain_put(nfs_uuid->dom);
nfs_uuid->dom = NULL;
}
- list_del_init(&nfs_uuid->list);
+
+ list_splice_init(&nfs_uuid->files, &local_files);
+ spin_unlock(&nfs_uuid->lock);
+
+ /* Walk list of files and ensure their last references dropped */
+ list_for_each_entry_safe(nfl, tmp, &local_files, list) {
+ nfs_close_local_fh(nfl);
+ cond_resched();
+ }
+
+ spin_lock(&nfs_uuid->lock);
+ BUG_ON(!list_empty(&nfs_uuid->files));
+
+ /* Remove client from nn->local_clients */
+ if (nfs_uuid->list_lock) {
+ spin_lock(nfs_uuid->list_lock);
+ BUG_ON(list_empty(&nfs_uuid->list));
+ list_del_init(&nfs_uuid->list);
+ spin_unlock(nfs_uuid->list_lock);
+ nfs_uuid->list_lock = NULL;
+ }
+
+ module_put(nfsd_mod);
+ spin_unlock(&nfs_uuid->lock);
}
void nfs_localio_disable_client(struct nfs_client *clp)
{
- nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
+ nfs_uuid_t *nfs_uuid = NULL;
- spin_lock(&nfs_uuid->lock);
+ spin_lock(&clp->cl_uuid.lock); /* aka &nfs_uuid->lock */
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
- spin_lock(&nfs_uuids_lock);
- nfs_uuid_put_locked(nfs_uuid);
- spin_unlock(&nfs_uuids_lock);
+ /* &clp->cl_uuid is always not NULL, using as bool here */
+ nfs_uuid = &clp->cl_uuid;
}
- spin_unlock(&nfs_uuid->lock);
+ spin_unlock(&clp->cl_uuid.lock);
+
+ if (nfs_uuid)
+ nfs_uuid_put(nfs_uuid);
}
EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
-void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
+void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
+ spinlock_t *nn_local_clients_lock)
{
+ LIST_HEAD(local_clients);
nfs_uuid_t *nfs_uuid, *tmp;
+ struct nfs_client *clp;
- spin_lock(&nfs_uuids_lock);
- list_for_each_entry_safe(nfs_uuid, tmp, cl_uuid_list, list) {
- struct nfs_client *clp =
- container_of(nfs_uuid, struct nfs_client, cl_uuid);
-
+ spin_lock(nn_local_clients_lock);
+ list_splice_init(nn_local_clients, &local_clients);
+ spin_unlock(nn_local_clients_lock);
+ list_for_each_entry_safe(nfs_uuid, tmp, &local_clients, list) {
+ if (WARN_ON(nfs_uuid->list_lock != nn_local_clients_lock))
+ break;
+ clp = container_of(nfs_uuid, struct nfs_client, cl_uuid);
nfs_localio_disable_client(clp);
}
- spin_unlock(&nfs_uuids_lock);
}
EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
{
- spin_lock(&nfs_uuids_lock);
- if (!nfl->nfs_uuid)
+ /* Add nfl to nfs_uuid->files if it isn't already */
+ spin_lock(&nfs_uuid->lock);
+ if (list_empty(&nfl->list)) {
rcu_assign_pointer(nfl->nfs_uuid, nfs_uuid);
- spin_unlock(&nfs_uuids_lock);
+ list_add_tail(&nfl->list, &nfs_uuid->files);
+ }
+ spin_unlock(&nfs_uuid->lock);
}
/*
@@ -217,14 +295,16 @@ void nfs_close_local_fh(struct nfs_file_localio *nfl)
ro_nf = rcu_access_pointer(nfl->ro_file);
rw_nf = rcu_access_pointer(nfl->rw_file);
if (ro_nf || rw_nf) {
- spin_lock(&nfs_uuids_lock);
+ spin_lock(&nfs_uuid->lock);
if (ro_nf)
ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
if (rw_nf)
rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
+ /* Remove nfl from nfs_uuid->files list */
rcu_assign_pointer(nfl->nfs_uuid, NULL);
- spin_unlock(&nfs_uuids_lock);
+ list_del_init(&nfl->list);
+ spin_unlock(&nfs_uuid->lock);
rcu_read_unlock();
if (ro_nf)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ab9942e42054..c9ab64e3732c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -39,6 +39,7 @@
#include <linux/fsnotify.h>
#include <linux/seq_file.h>
#include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
#include "vfs.h"
#include "nfsd.h"
@@ -836,6 +837,14 @@ __nfsd_file_cache_purge(struct net *net)
struct nfsd_file *nf;
LIST_HEAD(dispose);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ if (net) {
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ nfs_localio_invalidate_clients(&nn->local_clients,
+ &nn->local_clients_lock);
+ }
+#endif
+
rhltable_walk_enter(&nfsd_file_rhltable, &iter);
do {
rhashtable_walk_start(&iter);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 2ae07161b919..238647fa379e 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -116,6 +116,7 @@ static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
+ &nn->local_clients_lock,
net, rqstp->rq_client, THIS_MODULE);
return rpc_success;
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 8faef59d7122..187c4140b191 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -220,6 +220,7 @@ struct nfsd_net {
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
/* Local clients to be invalidated when net is shut down */
+ spinlock_t local_clients_lock;
struct list_head local_clients;
#endif
};
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 727904d8a4d0..70347b0ecdc4 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2259,6 +2259,7 @@ static __net_init int nfsd_net_init(struct net *net)
seqlock_init(&nn->writeverf_lock);
nfsd_proc_stat_init(net);
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ spin_lock_init(&nn->local_clients_lock);
INIT_LIST_HEAD(&nn->local_clients);
#endif
return 0;
@@ -2283,7 +2284,8 @@ static __net_exit void nfsd_net_pre_exit(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- nfs_localio_invalidate_clients(&nn->local_clients);
+ nfs_localio_invalidate_clients(&nn->local_clients,
+ &nn->local_clients_lock);
}
#endif
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index aa2b5c6561ab..c68a529230c1 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -30,19 +30,23 @@ typedef struct {
/* sadly this struct is just over a cacheline, avoid bouncing */
spinlock_t ____cacheline_aligned lock;
struct list_head list;
+ spinlock_t *list_lock; /* nn->local_clients_lock */
struct net __rcu *net; /* nfsd's network namespace */
struct auth_domain *dom; /* auth_domain for localio */
+ /* Local files to close when net is shut down or exports change */
+ struct list_head files;
} nfs_uuid_t;
void nfs_uuid_init(nfs_uuid_t *);
bool nfs_uuid_begin(nfs_uuid_t *);
void nfs_uuid_end(nfs_uuid_t *);
-void nfs_uuid_is_local(const uuid_t *, struct list_head *,
+void nfs_uuid_is_local(const uuid_t *, struct list_head *, spinlock_t *,
struct net *, struct auth_domain *, struct module *);
void nfs_localio_enable_client(struct nfs_client *clp);
void nfs_localio_disable_client(struct nfs_client *clp);
-void nfs_localio_invalidate_clients(struct list_head *list);
+void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
+ spinlock_t *nn_local_clients_lock);
/* localio needs to map filehandle -> struct nfsd_file */
extern struct nfsd_file *
--
2.44.0
* [for-6.13 PATCH 17/19] nfs_common: add nfs_localio trace events
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (15 preceding siblings ...)
2024-11-08 23:39 ` [for-6.13 PATCH 16/19] nfs_common: track all open nfsd_files per LOCALIO nfs_client Mike Snitzer
@ 2024-11-08 23:40 ` Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 18/19] nfs: probe for LOCALIO when v4 client reconnects to server Mike Snitzer
` (2 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:40 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
nfs_localio.ko now exposes /sys/kernel/tracing/events/nfs_localio
with nfs_localio_enable_client and nfs_localio_disable_client events.
These complement the existing nfs/nfs_local_{enable,disable} events to
show, for example, if and when nfs_localio_invalidate_clients calls
nfs_localio_disable_client, followed by nfs_local_probe_async calling
nfs_local_enable, which in turn calls nfs_localio_enable_client.
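Enabling the new events only requires a write to tracefs. A minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing and the caller may write there; set_trace_system() is a hypothetical helper, while the events/<system>/enable file is the standard tracefs interface for toggling every event in a trace system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Enable (on != 0) or disable every event in a trace system by writing
 * to <tracefs>/events/<system>/enable, e.g. system "nfs_localio".
 * Returns 0 on success, -1 on error (tracefs not mounted, no
 * permission, or the module providing the events is not loaded).
 */
static int set_trace_system(const char *tracefs, const char *system, int on)
{
	char path[512];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/events/%s/enable", tracefs, system);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;  /* path would be truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(on ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, set_trace_system("/sys/kernel/tracing", "nfs_localio", 1) enables both nfs_localio events, whose output can then be read from /sys/kernel/tracing/trace_pipe alongside the nfs/nfs_local_{enable,disable} events.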
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/Makefile | 3 +-
fs/nfs_common/localio_trace.c | 10 +++++++
fs/nfs_common/localio_trace.h | 56 +++++++++++++++++++++++++++++++++++
fs/nfs_common/nfslocalio.c | 4 +++
4 files changed, 72 insertions(+), 1 deletion(-)
create mode 100644 fs/nfs_common/localio_trace.c
create mode 100644 fs/nfs_common/localio_trace.h
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index a5e54809701e..c10ead273ff2 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,8 +6,9 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o
+CFLAGS_localio_trace.o += -I$(src)
obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
-nfs_localio-objs := nfslocalio.o
+nfs_localio-objs := nfslocalio.o localio_trace.o
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/localio_trace.c b/fs/nfs_common/localio_trace.c
new file mode 100644
index 000000000000..7decfe57abeb
--- /dev/null
+++ b/fs/nfs_common/localio_trace.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#include <linux/nfs_fs.h>
+#include <linux/namei.h>
+
+#define CREATE_TRACE_POINTS
+#include "localio_trace.h"
diff --git a/fs/nfs_common/localio_trace.h b/fs/nfs_common/localio_trace.h
new file mode 100644
index 000000000000..4055aec9ff8d
--- /dev/null
+++ b/fs/nfs_common/localio_trace.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2024 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM nfs_localio
+
+#if !defined(_TRACE_NFS_COMMON_LOCALIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NFS_COMMON_LOCALIO_H
+
+#include <linux/tracepoint.h>
+
+#include <trace/misc/fs.h>
+#include <trace/misc/nfs.h>
+#include <trace/misc/sunrpc.h>
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_localio_enable_client);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_localio_disable_client);
+
+#endif /* _TRACE_NFS_COMMON_LOCALIO_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE localio_trace
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index c8d18f671bcb..fb376d38ac9a 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -12,6 +12,8 @@
#include <linux/nfs_fs.h>
#include <net/netns/generic.h>
+#include "localio_trace.h"
+
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("NFS localio protocol bypass support");
@@ -141,6 +143,7 @@ void nfs_localio_enable_client(struct nfs_client *clp)
spin_lock(&nfs_uuid->lock);
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ trace_nfs_localio_enable_client(clp);
spin_unlock(&nfs_uuid->lock);
}
EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
@@ -199,6 +202,7 @@ void nfs_localio_disable_client(struct nfs_client *clp)
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
/* &clp->cl_uuid is always not NULL, using as bool here */
nfs_uuid = &clp->cl_uuid;
+ trace_nfs_localio_disable_client(clp);
}
spin_unlock(&clp->cl_uuid.lock);
--
2.44.0
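The DECLARE_EVENT_CLASS()/DEFINE_EVENT() pair above lets both events share one record layout and one TP_printk() format. As a loose userspace analogy only (this is not the tracepoint machinery, and every name below is hypothetical), a shared "class" formatter plus a small macro that stamps out one wrapper per event name looks like this. The body mirrors the "server=%s NFSv%u" format from localio_trace.h; the event-name prefix is added here only so the two wrappers are distinguishable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fields the event class records */
struct client_sketch {
	const char *cl_hostname;   /* recorded like __string(server, ...) */
	unsigned int version;      /* recorded like clp->rpc_ops->version */
};

/* Shared "class" formatter: one layout, one format string */
static int local_client_event_fmt(char *buf, size_t len, const char *event,
				  const struct client_sketch *clp)
{
	return snprintf(buf, len, "%s: server=%s NFSv%u",
			event, clp->cl_hostname, clp->version);
}

/* Stamp out one wrapper per event name, in the spirit of DEFINE_EVENT() */
#define DEFINE_LOCAL_CLIENT_EVENT_SKETCH(name)                            \
	static int trace_##name##_sketch(char *buf, size_t len,           \
					 const struct client_sketch *clp) \
	{                                                                 \
		return local_client_event_fmt(buf, len, #name, clp);      \
	}

DEFINE_LOCAL_CLIENT_EVENT_SKETCH(nfs_localio_enable_client)
DEFINE_LOCAL_CLIENT_EVENT_SKETCH(nfs_localio_disable_client)
```

Adding a third event of the same shape is then one more macro line, which is the same economy DEFINE_NFS_LOCAL_CLIENT_EVENT() buys in the patch.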
* [for-6.13 PATCH 18/19] nfs: probe for LOCALIO when v4 client reconnects to server
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (16 preceding siblings ...)
2024-11-08 23:40 ` [for-6.13 PATCH 17/19] nfs_common: add nfs_localio trace events Mike Snitzer
@ 2024-11-08 23:40 ` Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 " Mike Snitzer
2024-11-10 15:49 ` [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Chuck Lever III
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:40 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Introduce nfs_local_probe_async() so the NFS client can initiate a
LOCALIO probe if/when it reconnects to the server. For NFSv4 it is
simply a matter of calling nfs_local_probe_async() from nfs4_do_reclaim
(during NFSv4 grace).
[NFSv3 also needs to reestablish LOCALIO if/when a client reconnects
to the server, but the stateless nature of v3 makes the implementation
trickier, so it has been factored out to the following commit.]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/internal.h | 2 ++
fs/nfs/localio.c | 15 +++++++++++++++
fs/nfs/nfs4state.c | 1 +
include/linux/nfs_fs_sb.h | 1 +
4 files changed, 19 insertions(+)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 57af3ab3adbe..efd42efd9405 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -456,6 +456,7 @@ extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
/* localio.c */
extern void nfs_local_disable(struct nfs_client *);
extern void nfs_local_probe(struct nfs_client *);
+extern void nfs_local_probe_async(struct nfs_client *);
extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
const struct cred *,
struct nfs_fh *,
@@ -473,6 +474,7 @@ extern bool nfs_server_is_local(const struct nfs_client *clp);
#else /* CONFIG_NFS_LOCALIO */
static inline void nfs_local_disable(struct nfs_client *clp) {}
static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline void nfs_local_probe_async(struct nfs_client *clp) {}
static inline struct nfsd_file *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, struct nfs_file_localio *nfl,
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d10d863aaf23..710e537b3402 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -214,6 +214,21 @@ void nfs_local_probe(struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
+static void nfs_local_probe_async_work(struct work_struct *work)
+{
+ struct nfs_client *clp =
+ container_of(work, struct nfs_client, cl_local_probe_work);
+
+ nfs_local_probe(clp);
+}
+
+void nfs_local_probe_async(struct nfs_client *clp)
+{
+ INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);
+ queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe_async);
+
static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
{
return nfs_to->nfsd_file_get(nf);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index dafd61186557..2ebb9ac56b7b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1957,6 +1957,7 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
}
rcu_read_unlock();
nfs4_free_state_owners(&freeme);
+ nfs_local_probe_async(clp);
if (lost_locks)
pr_warn("NFS: %s: lost %d locks\n",
clp->cl_hostname, lost_locks);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 239d86ef166c..63d7e0f478d8 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -132,6 +132,7 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
nfs_uuid_t cl_uuid;
+ struct work_struct cl_local_probe_work;
#endif /* CONFIG_NFS_LOCALIO */
};
--
2.44.0
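nfs_local_probe_async_work() recovers its nfs_client from the embedded cl_local_probe_work via container_of(). A minimal userspace sketch of that pattern, assuming simplified hypothetical types (struct work_sketch, struct client_sketch) and a synchronous run_work() helper standing in for queue_work() on nfsiod_workqueue:

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel macro: step back from a member to its container */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_sketch {
	void (*func)(struct work_sketch *work);
};

struct client_sketch {
	int probe_count;                      /* records that a probe ran */
	struct work_sketch local_probe_work;  /* like cl_local_probe_work */
};

static void local_probe_sketch(struct client_sketch *clp)
{
	clp->probe_count++;  /* the real code calls nfs_local_probe(clp) */
}

static void local_probe_async_work_sketch(struct work_sketch *work)
{
	struct client_sketch *clp =
		container_of(work, struct client_sketch, local_probe_work);

	local_probe_sketch(clp);
}

/* Hypothetical synchronous stand-in for queue_work() */
static void run_work(struct work_sketch *work)
{
	work->func(work);
}

static void local_probe_async_sketch(struct client_sketch *clp)
{
	clp->local_probe_work.func = local_probe_async_work_sketch;
	run_work(&clp->local_probe_work);
}
```

Embedding the work item in the client avoids a separate allocation, at the cost of requiring the client to stay alive until any queued work has run.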
* [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 client reconnects to server
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (17 preceding siblings ...)
2024-11-08 23:40 ` [for-6.13 PATCH 18/19] nfs: probe for LOCALIO when v4 client reconnects to server Mike Snitzer
@ 2024-11-08 23:40 ` Mike Snitzer
2024-11-11 3:06 ` NeilBrown
2024-11-10 15:49 ` [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Chuck Lever III
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-08 23:40 UTC (permalink / raw)
To: linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, Jeff Layton,
NeilBrown
Re-enabling NFSv3 LOCALIO is more complex than for NFSv4 because v3
is stateless. As such, the heuristic used to identify a LOCALIO probe
point is more ad hoc by nature: if/when NFSv3 client IO begins to
complete again in terms of normal RPC-based NFSv3 server IO, attempt
nfs_local_probe_async().
Care is taken to throttle the frequency of nfs_local_probe_async()
(at most one attempt per 256 successful completions), otherwise there
could be a flood of repeat calls to nfs_local_probe_async().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/internal.h | 5 +++++
fs/nfs/localio.c | 11 +++++++++++
fs/nfs/nfs3proc.c | 34 +++++++++++++++++++++++++++++++---
fs/nfs_common/nfslocalio.c | 4 ++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfslocalio.h | 4 +++-
6 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index efd42efd9405..fb1ab7cee6b9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -470,6 +470,7 @@ extern int nfs_local_commit(struct nfsd_file *,
struct nfs_commit_data *,
const struct rpc_call_ops *, int);
extern bool nfs_server_is_local(const struct nfs_client *clp);
+extern bool nfs_server_was_local(const struct nfs_client *clp);
#else /* CONFIG_NFS_LOCALIO */
static inline void nfs_local_disable(struct nfs_client *clp) {}
@@ -499,6 +500,10 @@ static inline bool nfs_server_is_local(const struct nfs_client *clp)
{
return false;
}
+static inline bool nfs_server_was_local(const struct nfs_client *clp)
+{
+ return false;
+}
#endif /* CONFIG_NFS_LOCALIO */
/* super.c */
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 710e537b3402..1559dc2f1850 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -64,6 +64,17 @@ bool nfs_server_is_local(const struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+static inline bool nfs_client_was_local(const struct nfs_client *clp)
+{
+ return !!test_bit(NFS_CS_LOCAL_IO_CAPABLE, &clp->cl_flags);
+}
+
+bool nfs_server_was_local(const struct nfs_client *clp)
+{
+ return nfs_client_was_local(clp) && localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_was_local);
+
/*
* UUID_IS_LOCAL XDR functions
*/
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 1566163c6d85..4d2018760e9b 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -844,6 +844,29 @@ nfs3_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
return status;
}
+static void nfs3_local_probe(struct nfs_server *server)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ struct nfs_client *clp = server->nfs_client;
+ nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
+
+ if (likely(!nfs_server_was_local(clp)))
+ return;
+ /*
+ * Try re-enabling LOCALIO if it was previously enabled, but
+ * was disabled due to server restart, and IO has successfully
+ * completed in terms of normal RPC.
+ */
+ mutex_lock(&nfs_uuid->local_probe_mutex);
+ /* Arbitrary throttle to reduce nfs_local_probe_async() frequency */
+ if ((nfs_uuid->local_probe_count++ & 255) == 0) {
+ if (unlikely(!nfs_server_is_local(clp) && nfs_server_was_local(clp)))
+ nfs_local_probe_async(clp);
+ }
+ mutex_unlock(&nfs_uuid->local_probe_mutex);
+#endif
+}
+
static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
struct inode *inode = hdr->inode;
@@ -855,8 +878,11 @@ static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
- if (task->tk_status >= 0 && !server->read_hdrsize)
- cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
+ if (task->tk_status >= 0) {
+ if (!server->read_hdrsize)
+ cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
+ nfs3_local_probe(server);
+ }
nfs_invalidate_atime(inode);
nfs_refresh_inode(inode, &hdr->fattr);
@@ -886,8 +912,10 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
if (nfs3_async_handle_jukebox(task, inode))
return -EAGAIN;
- if (task->tk_status >= 0)
+ if (task->tk_status >= 0) {
nfs_writeback_update_inode(hdr);
+ nfs3_local_probe(NFS_SERVER(inode));
+ }
return 0;
}
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index fb376d38ac9a..852ba8fd73f3 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -43,6 +43,8 @@ void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
INIT_LIST_HEAD(&nfs_uuid->list);
INIT_LIST_HEAD(&nfs_uuid->files);
spin_lock_init(&nfs_uuid->lock);
+ mutex_init(&nfs_uuid->local_probe_mutex);
+ nfs_uuid->local_probe_count = 0;
}
EXPORT_SYMBOL_GPL(nfs_uuid_init);
@@ -143,6 +145,8 @@ void nfs_localio_enable_client(struct nfs_client *clp)
spin_lock(&nfs_uuid->lock);
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ /* Also set hint that client and server are LOCALIO capable */
+ set_bit(NFS_CS_LOCAL_IO_CAPABLE, &clp->cl_flags);
trace_nfs_localio_enable_client(clp);
spin_unlock(&nfs_uuid->lock);
}
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 63d7e0f478d8..45906c402c98 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -51,6 +51,7 @@ struct nfs_client {
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
#define NFS_CS_LOCAL_IO 10 /* - client is local */
+#define NFS_CS_LOCAL_IO_CAPABLE 11 /* - client was previously local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c68a529230c1..3dfef0bb18fe 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -27,7 +27,9 @@ struct nfs_file_localio;
*/
typedef struct {
uuid_t uuid;
- /* sadly this struct is just over a cacheline, avoid bouncing */
+ struct mutex local_probe_mutex;
+ unsigned local_probe_count;
+ /* sadly this struct is over a cacheline, avoid bouncing */
spinlock_t ____cacheline_aligned lock;
struct list_head list;
spinlock_t *list_lock; /* nn->local_clients_lock */
--
2.44.0
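The probe decision in nfs3_local_probe() combines a sticky "was local" hint (NFS_CS_LOCAL_IO_CAPABLE, set together with NFS_CS_LOCAL_IO, while the disable path shown only clears NFS_CS_LOCAL_IO) with a counter masked by 255, so a probe is considered on completions 0, 256, 512, and so on. A single-threaded userspace sketch of that logic, assuming hypothetical flag helpers in place of set_bit()/test_bit(), and omitting local_probe_mutex and the localio_enabled module parameter:

```c
#include <assert.h>
#include <stdbool.h>

#define CS_LOCAL_IO          (1u << 10)  /* mirrors NFS_CS_LOCAL_IO (bit 10) */
#define CS_LOCAL_IO_CAPABLE  (1u << 11)  /* mirrors NFS_CS_LOCAL_IO_CAPABLE (bit 11) */

struct client_sketch {
	unsigned int flags;
	unsigned int local_probe_count;
	unsigned int probes_queued;  /* would-be nfs_local_probe_async() calls */
};

static void enable_sketch(struct client_sketch *clp)
{
	clp->flags |= CS_LOCAL_IO | CS_LOCAL_IO_CAPABLE;
}

static void disable_sketch(struct client_sketch *clp)
{
	clp->flags &= ~CS_LOCAL_IO;  /* the CAPABLE hint stays set */
}

static bool is_local(const struct client_sketch *clp)
{
	return clp->flags & CS_LOCAL_IO;
}

static bool was_local(const struct client_sketch *clp)
{
	return clp->flags & CS_LOCAL_IO_CAPABLE;
}

/* Called on each successful RPC read/write completion */
static void v3_local_probe_sketch(struct client_sketch *clp)
{
	if (!was_local(clp))
		return;  /* never local: counter does not advance */
	/* the kernel holds local_probe_mutex around this block */
	if ((clp->local_probe_count++ & 255) == 0) {
		if (!is_local(clp) && was_local(clp))
			clp->probes_queued++;
	}
}
```

Whatever the counter's value when LOCALIO gets disabled, a re-probe is attempted within at most 256 further successful completions, and a client that was never local pays only the single flag test.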
* Re: [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
` (18 preceding siblings ...)
2024-11-08 23:40 ` [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 " Mike Snitzer
@ 2024-11-10 15:49 ` Chuck Lever III
19 siblings, 0 replies; 45+ messages in thread
From: Chuck Lever III @ 2024-11-10 15:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Jeff Layton, Neil Brown
> On Nov 8, 2024, at 6:39 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> Hi,
>
> I really wanted to post these patches at the beginning of the week (or
> sooner) but I had quite a few issues to work through. The biggest
> challenge came from trying to develop the final patch only to hit the
> wall of needing to find and fix memory corruption with the first
> patch.
>
> HUGE special thanks to NeilBrown for helping me identify the source of
> the NFSv3 LOCALIO memory corruption fixed by the first patch. Anna,
> we'd do well for that patch to land upstream for 6.12 final (but Trond
> if it slips to the 6.13 merge window pull that should be fine, as the
> Fixes: tag should get it to land in 6.12-stable).
>
> The 2nd patch is also a fundamental fix but it is kernel config
> dependant on whether you'll experience the RCU splat it fixes.
>
> Patches 3 - 6 are cleanups I've been carrying since just after the
> 6.12 merge window.
>
> Patch 7 adds a 'localio_O_DIRECT_semantics' nfs module parameter that
> when set will allow the use of O_DIRECT from the LOCALIO client
> through to the underlying filesystem.
>
> Patches 8 and beyond are dealing with the leftover bake-a-thon
> business of switching from caching LOCALIO's open nfsd_file in the
> server to doing so in the client. Definitely took some effort but the
> end result is working really well.
>
> This is quite a bit of change at the end of the 6.13 development
> window, but I _think_ it worthy of considersation for 6.13 (the bulk
> of the changes are confined to fs/nfs/localio.c and
> fs/nfs_common/nfslocalio.c which are only built if LOCALIO Kconfig
> options enabled (even general NFS code paths are all wrapped with
> CONFIG_NFS_LOCALIO).
>
> I'm happy to work through any issues found in review with urgency next
> week (or this weekend if others are interested to look and happen to
> find something).
>
> Happy to take it as it comes, I'm in no way _pushing_ for these
> changes to land for 6.13. I'm just now comfortable posting them for
> serious consideration.
Hey Mike -
I'd like to see patches 7ff get an unhurried review and then
spend a few weeks in fs-next and/or linux-next. I don't have
any objection to moving forward quickly with 1 - 6.
--
Chuck Lever
* Re: [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done
2024-11-08 23:39 ` [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done Mike Snitzer
@ 2024-11-11 0:36 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 0:36 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
>
> Otherwise memory corruption can occur due to NFSv3 LOCALIO reads
> leaving garbage in res.replen:
I'm not comfortable with this patch. It doesn't tell us *why* there is
garbage in res.replen.
This is part of nfs_pgio_header and whenever that is allocated it
initialised to all zeros. So where does the garbage come from?
Answer: it comes from
hdr->res.verf = &hdr->verf;
in nfs_pgio_rpcsetup().
struct nfs_pgio_res contains a union. 'replen' is present for read.
'verf' is present for write (and there is other stuff).
So I think that init of res.verf should only happen for write.
I cannot see an easy way to do that. The best I can come up with is
to add a new pg_ioflags flag which says "this is a write", and only
initialise res.verf if that is set.
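A minimal userspace illustration of that aliasing, assuming a heavily simplified stand-in for struct nfs_pgio_res (the real structure has more members) and a hypothetical is_write flag in place of the proposed pg_ioflags bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct writeverf_sketch { uint32_t committed; };

/* Simplified: read-only and write-only fields share storage */
struct pgio_res_sketch {
	uint64_t count;
	union {
		struct { unsigned int replen; int eof; } rd;   /* used by read */
		struct { struct writeverf_sketch *verf; } wr;  /* used by write */
	} u;
};

struct pgio_hdr_sketch {
	struct writeverf_sketch verf;
	struct pgio_res_sketch res;
};

/* Unconditional init: on a read, this overwrites the bytes of rd.replen */
static void rpcsetup_unconditional(struct pgio_hdr_sketch *hdr)
{
	hdr->res.u.wr.verf = &hdr->verf;
}

/* Suggested shape: only touch write-only members for writes */
static void rpcsetup_write_only(struct pgio_hdr_sketch *hdr, bool is_write)
{
	if (is_write)
		hdr->res.u.wr.verf = &hdr->verf;
}
```

After rpcsetup_unconditional(), reading u.rd.replen yields some of the pointer's bytes, which is the "garbage" that patch 01 clears; rpcsetup_write_only() never disturbs a zeroed read header.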
If we do stick with the current patch, I'd like a comment where we set
res.replen saying that it was corrupted when res.verf was initialised in
nfs_pgio_rpcsetup().
Or maybe move res.replen out of the union. There is a 4-byte hole before
the union (on x86_64). It would be cleaner to move verf out, but that
is bigger....
Thanks,
NeilBrown
* Re: [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local
2024-11-08 23:39 ` [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local Mike Snitzer
@ 2024-11-11 1:01 ` NeilBrown
2024-11-13 14:58 ` Jeff Layton
1 sibling, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:01 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Move holding the RCU from nfs_to_nfsd_file_put_local to
> nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that
> requires the RCU anyway (the puts for nfsd_file and netns were
> combined to avoid an extra indirect reference but that
> micro-optimization isn't possible now).
>
> This fixes xfstests generic/013 and it triggering:
>
> "Voluntary context switch within RCU read-side critical section!"
I'm surprised it got that far. For me, the might_sleep() at the top of
nfsd_file_put() always warns that there is a problem here.
The fix is good though.
Reviewed-by: NeilBrown <neilb@suse.de>
NeilBrown
>
> [ 143.545738] Call Trace:
> [ 143.546206] <TASK>
> [ 143.546625] ? show_regs+0x6d/0x80
> [ 143.547267] ? __warn+0x91/0x140
> [ 143.547951] ? rcu_note_context_switch+0x496/0x5d0
> [ 143.548856] ? report_bug+0x193/0x1a0
> [ 143.549557] ? handle_bug+0x63/0xa0
> [ 143.550214] ? exc_invalid_op+0x1d/0x80
> [ 143.550938] ? asm_exc_invalid_op+0x1f/0x30
> [ 143.551736] ? rcu_note_context_switch+0x496/0x5d0
> [ 143.552634] ? wakeup_preempt+0x62/0x70
> [ 143.553358] __schedule+0xaa/0x1380
> [ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40
> [ 143.554958] ? try_to_wake_up+0x1fe/0x6b0
> [ 143.555715] ? wake_up_process+0x19/0x20
> [ 143.556452] schedule+0x2e/0x120
> [ 143.557066] schedule_preempt_disabled+0x19/0x30
> [ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0
> [ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs]
> [ 143.559894] down_read+0x4e/0xb0
> [ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs]
> [ 143.561460] ? _raw_spin_unlock+0x12/0x30
> [ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs]
> [ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs]
> [ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs]
> [ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs]
> [ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs]
> [ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs]
> [ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs]
> [ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs]
> [ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs]
> [ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs]
> [ 143.572359] xfs_file_release+0x12d/0x140 [xfs]
> [ 143.573324] __fput+0xe8/0x2d0
> [ 143.573922] __fput_sync+0x1d/0x30
> [ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd]
> [ 143.575430] nfsd_file_free+0x96/0x150 [nfsd]
> [ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd]
> [ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd]
> [ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio]
> [ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs]
> [ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs]
> [ 143.581038] nfs_file_release+0x3e/0x60 [nfs]
> [ 143.581879] __fput+0xe8/0x2d0
> [ 143.582464] __fput_sync+0x1d/0x30
> [ 143.583108] __x64_sys_close+0x41/0x80
> [ 143.583823] x64_sys_call+0x189a/0x20d0
> [ 143.584552] do_syscall_64+0x64/0x170
> [ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 143.586185] RIP: 0033:0x7f3c5153efd7
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs_common/nfslocalio.c | 8 +++-----
> fs/nfsd/filecache.c | 14 +++++++-------
> fs/nfsd/filecache.h | 2 +-
> include/linux/nfslocalio.h | 18 +++++++++++++++---
> 4 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 09404d142d1a..a74ec08f6c96 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -155,11 +155,9 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> /* We have an implied reference to net thanks to nfsd_serv_try_get */
> localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
> cred, nfs_fh, fmode);
> - if (IS_ERR(localio)) {
> - rcu_read_lock();
> - nfs_to->nfsd_serv_put(net);
> - rcu_read_unlock();
> - }
> + if (IS_ERR(localio))
> + nfs_to_nfsd_net_put(net);
> +
> return localio;
> }
> EXPORT_SYMBOL_GPL(nfs_open_local_fh);
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index c16671135d17..9a62b4da89bb 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -391,19 +391,19 @@ nfsd_file_put(struct nfsd_file *nf)
> }
>
> /**
> - * nfsd_file_put_local - put the reference to nfsd_file and local nfsd_serv
> - * @nf: nfsd_file of which to put the references
> + * nfsd_file_put_local - put nfsd_file reference and arm nfsd_serv_put in caller
> + * @nf: nfsd_file of which to put the reference
> *
> - * First put the reference of the nfsd_file and then put the
> - * reference to the associated nn->nfsd_serv.
> + * First save the associated net to return to caller, then put
> + * the reference of the nfsd_file.
> */
> -void
> -nfsd_file_put_local(struct nfsd_file *nf) __must_hold(rcu)
> +struct net *
> +nfsd_file_put_local(struct nfsd_file *nf)
> {
> struct net *net = nf->nf_net;
>
> nfsd_file_put(nf);
> - nfsd_serv_put(net);
> + return net;
> }
>
> /**
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index cadf3c2689c4..d5db6b34ba30 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -55,7 +55,7 @@ void nfsd_file_cache_shutdown(void);
> int nfsd_file_cache_start_net(struct net *net);
> void nfsd_file_cache_shutdown_net(struct net *net);
> void nfsd_file_put(struct nfsd_file *nf);
> -void nfsd_file_put_local(struct nfsd_file *nf);
> +struct net *nfsd_file_put_local(struct nfsd_file *nf);
> struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> struct file *nfsd_file_file(struct nfsd_file *nf);
> void nfsd_file_close_inode_sync(struct inode *inode);
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 3982fea79919..9202f4b24343 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -55,7 +55,7 @@ struct nfsd_localio_operations {
> const struct cred *,
> const struct nfs_fh *,
> const fmode_t);
> - void (*nfsd_file_put_local)(struct nfsd_file *);
> + struct net *(*nfsd_file_put_local)(struct nfsd_file *);
> struct file *(*nfsd_file_file)(struct nfsd_file *);
> } ____cacheline_aligned;
>
> @@ -66,7 +66,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
> struct rpc_clnt *, const struct cred *,
> const struct nfs_fh *, const fmode_t);
>
> -static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> +static inline void nfs_to_nfsd_net_put(struct net *net)
> {
> /*
> * Once reference to nfsd_serv is dropped, NFSD could be
> @@ -74,10 +74,22 @@ static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> * by always taking RCU.
> */
> rcu_read_lock();
> - nfs_to->nfsd_file_put_local(localio);
> + nfs_to->nfsd_serv_put(net);
> rcu_read_unlock();
> }
>
> +static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> +{
> + /*
> + * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
> + * "Voluntary context switch within RCU read-side critical section!"
> + * by scheduling deep in underlying filesystem (e.g. XFS).
> + */
> + struct net *net = nfs_to->nfsd_file_put_local(localio);
> +
> + nfs_to_nfsd_net_put(net);
> +}
> +
> #else /* CONFIG_NFS_LOCALIO */
> static inline void nfsd_localio_ops_init(void)
> {
> --
> 2.44.0
>
>
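The fix reorders the puts so that the call which may sleep (NeilBrown notes nfsd_file_put() starts with might_sleep()) runs outside the RCU read-side section, and only the nfsd_serv put is wrapped in rcu_read_lock()/rcu_read_unlock(). A userspace sketch of that ordering, assuming a hypothetical rcu_depth counter and might_sleep_sketch() check in place of the kernel's RCU and might_sleep() machinery:

```c
#include <assert.h>

struct net_sketch { int serv_refs; };
struct file_sketch { struct net_sketch *net; int refs; };

static int rcu_depth;         /* > 0 means "inside rcu_read_lock()" */
static int sleep_violations;  /* what might_sleep() would warn about */

static void rcu_read_lock_sketch(void)   { rcu_depth++; }
static void rcu_read_unlock_sketch(void) { rcu_depth--; }

static void might_sleep_sketch(void)
{
	if (rcu_depth > 0)
		sleep_violations++;
}

/* Like nfsd_file_put(): the final put may close the file and block */
static void file_put_sketch(struct file_sketch *nf)
{
	might_sleep_sketch();
	nf->refs--;
}

/* Like the new nfsd_file_put_local(): save the net, put, return the net */
static struct net_sketch *file_put_local_sketch(struct file_sketch *nf)
{
	struct net_sketch *net = nf->net;

	file_put_sketch(nf);
	return net;
}

/* Like nfs_to_nfsd_net_put(): only the serv put runs under RCU */
static void net_put_sketch(struct net_sketch *net)
{
	rcu_read_lock_sketch();
	net->serv_refs--;
	rcu_read_unlock_sketch();
}

static void put_local_fixed(struct file_sketch *nf)
{
	net_put_sketch(file_put_local_sketch(nf));
}

/* Old shape: both puts inside one RCU section */
static void put_local_old(struct file_sketch *nf)
{
	rcu_read_lock_sketch();
	struct net_sketch *net = file_put_local_sketch(nf);
	net->serv_refs--;
	rcu_read_unlock_sketch();
}
```

Returning the net pointer from the file put is what lets the caller defer the RCU section until after the potentially sleeping work is done.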
* Re: [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling
2024-11-08 23:39 ` [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling Mike Snitzer
@ 2024-11-11 1:09 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:09 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
>
> nfs_writeback_done() will take care of suid/sgid corner case.
The code removed is from nfs_local_write_done().
That is called only in nfs_local_call_write(), and nfs_local_pgio_release()
is called shortly afterwards. That calls nfs_local_hdr_release(), which
calls ->rpc_call_done, which will be nfs_pgio_result (or something that
eventually calls nfs_pgio_result via some other ->rpc_call_done).
nfs_pgio_result calls ->rw_done, which will be nfs_writeback_done, which,
as you say, already contains that code.
So it looks good.
Reviewed-by: NeilBrown <neilb@suse.de>
Thanks,
NeilBrown
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/localio.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 637528e6368e..4b24933093b6 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -527,12 +527,7 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> }
> if (status < 0)
> nfs_reset_boot_verifier(inode);
> - else if (nfs_should_remove_suid(inode)) {
> - /* Deal with the suid/sgid bit corner case */
> - spin_lock(&inode->i_lock);
> - nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE);
> - spin_unlock(&inode->i_lock);
> - }
> +
> nfs_local_pgio_done(hdr, status);
> }
>
> --
> 2.44.0
>
>
* Re: [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
2024-11-08 23:39 ` [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx Mike Snitzer
@ 2024-11-11 1:15 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:15 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> nfs_local_commit() doesn't need async cleanup of nfs_local_fsync_ctx,
> so there is no need to use a kref.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
thanks,
NeilBrown
> ---
> fs/nfs/localio.c | 20 +++-----------------
> 1 file changed, 3 insertions(+), 17 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 4b24933093b6..a7eb83a604d0 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -42,7 +42,6 @@ struct nfs_local_fsync_ctx {
> struct nfsd_file *localio;
> struct nfs_commit_data *data;
> struct work_struct work;
> - struct kref kref;
> struct completion *done;
> };
> static void nfs_local_fsync_work(struct work_struct *work);
> @@ -689,30 +688,17 @@ nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
> ctx->localio = localio;
> ctx->data = data;
> INIT_WORK(&ctx->work, nfs_local_fsync_work);
> - kref_init(&ctx->kref);
> ctx->done = NULL;
> }
> return ctx;
> }
>
> -static void
> -nfs_local_fsync_ctx_kref_free(struct kref *kref)
> -{
> - kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
> -}
> -
> -static void
> -nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
> -{
> - kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
> -}
> -
> static void
> nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
> {
> nfs_local_release_commit_data(ctx->localio, ctx->data,
> ctx->data->task.tk_ops);
> - nfs_local_fsync_ctx_put(ctx);
> + kfree(ctx);
> }
>
> static void
> @@ -745,7 +731,7 @@ int nfs_local_commit(struct nfsd_file *localio,
> }
>
> nfs_local_init_commit(data, call_ops);
> - kref_get(&ctx->kref);
> +
> if (how & FLUSH_SYNC) {
> DECLARE_COMPLETION_ONSTACK(done);
> ctx->done = &done;
> @@ -753,6 +739,6 @@ int nfs_local_commit(struct nfsd_file *localio,
> wait_for_completion(&done);
> } else
> queue_work(nfsiod_workqueue, &ctx->work);
> - nfs_local_fsync_ctx_put(ctx);
> +
> return 0;
> }
> --
> 2.44.0
>
>
* Re: [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
2024-11-08 23:39 ` [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter Mike Snitzer
@ 2024-11-11 1:20 ` NeilBrown
2024-11-11 15:09 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:20 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Push the read_iter and write_iter availability checks down to
> nfs_do_local_read and nfs_do_local_write respectively.
>
> This eliminates a redundant nfs_to->nfsd_file_file() call.
Does it?
The patch removes 2 of these calls and adds 2 of these calls. So it
isn't clear what is being eliminated.
Maybe it is a good thing to do, but it isn't obvious to me why.
Thanks,
NeilBrown
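The reworked nfs_do_local_read() looks up the struct file once, returns -EAGAIN when the filesystem lacks ->read_iter, and passes that same file into nfs_local_iocb_alloc() (nfs_do_local_write() does the same with ->write_iter). A userspace sketch of that shape, assuming hypothetical stand-in types, a lookup counter in place of nfs_to->nfsd_file_file(), and leaving out whatever the caller does with -EAGAIN:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct file_ops_sketch {
	long (*read_iter)(void *kiocb);  /* NULL if unsupported */
};
struct file_sketch { const struct file_ops_sketch *f_op; };
struct nfsd_file_sketch { struct file_sketch *file; };
struct iocb_sketch {
	struct file_sketch *file;
	struct nfsd_file_sketch *localio;
};

static int lookups;  /* counts stand-in nfsd_file_file() calls */

/* Trivial ->read_iter so a "capable" file can be built */
static long read_iter_stub(void *kiocb)
{
	(void)kiocb;
	return 0;
}

static struct file_sketch *nfsd_file_file_sketch(struct nfsd_file_sketch *nf)
{
	lookups++;
	return nf->file;
}

static struct iocb_sketch *iocb_alloc_sketch(struct file_sketch *file)
{
	struct iocb_sketch *iocb = calloc(1, sizeof(*iocb));

	if (iocb)
		iocb->file = file;  /* reuse the caller's lookup */
	return iocb;
}

static int do_local_read_sketch(struct nfsd_file_sketch *localio,
				struct iocb_sketch **out)
{
	struct file_sketch *file = nfsd_file_file_sketch(localio);

	if (!file->f_op->read_iter)  /* no ->read_iter: decline */
		return -EAGAIN;
	*out = iocb_alloc_sketch(file);
	if (!*out)
		return -ENOMEM;
	(*out)->localio = localio;
	return 0;
}
```

In this shape each read performs exactly one lookup, and the file it returns is both checked for ->read_iter and used to initialise the iocb.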
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/localio.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index a7eb83a604d0..a77ac7e8a05c 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -273,7 +273,7 @@ nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
>
> static struct nfs_local_kiocb *
> nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
> - struct nfsd_file *localio, gfp_t flags)
> + struct file *file, gfp_t flags)
> {
> struct nfs_local_kiocb *iocb;
>
> @@ -286,9 +286,8 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
> kfree(iocb);
> return NULL;
> }
> - init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio));
> + init_sync_kiocb(&iocb->kiocb, file);
> iocb->kiocb.ki_pos = hdr->args.offset;
> - iocb->localio = localio;
> iocb->hdr = hdr;
> iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> return iocb;
> @@ -395,13 +394,19 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
> const struct rpc_call_ops *call_ops)
> {
> struct nfs_local_kiocb *iocb;
> + struct file *file = nfs_to->nfsd_file_file(localio);
> +
> + /* Don't support filesystems without read_iter */
> + if (!file->f_op->read_iter)
> + return -EAGAIN;
>
> dprintk("%s: vfs_read count=%u pos=%llu\n",
> __func__, hdr->args.count, hdr->args.offset);
>
> - iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
> + iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
> if (iocb == NULL)
> return -ENOMEM;
> + iocb->localio = localio;
>
> nfs_local_pgio_init(hdr, call_ops);
> hdr->res.eof = false;
> @@ -564,14 +569,20 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
> const struct rpc_call_ops *call_ops)
> {
> struct nfs_local_kiocb *iocb;
> + struct file *file = nfs_to->nfsd_file_file(localio);
> +
> + /* Don't support filesystems without write_iter */
> + if (!file->f_op->write_iter)
> + return -EAGAIN;
>
> dprintk("%s: vfs_write count=%u pos=%llu %s\n",
> __func__, hdr->args.count, hdr->args.offset,
> (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
>
> - iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
> + iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
> if (iocb == NULL)
> return -ENOMEM;
> + iocb->localio = localio;
>
> switch (hdr->args.stable) {
> default:
> @@ -597,16 +608,9 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
> const struct rpc_call_ops *call_ops)
> {
> int status = 0;
> - struct file *filp = nfs_to->nfsd_file_file(localio);
>
> if (!hdr->args.count)
> return 0;
> - /* Don't support filesystems without read_iter/write_iter */
> - if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
> - nfs_local_disable(clp);
> - status = -EAGAIN;
> - goto out;
> - }
>
> switch (hdr->rw_mode) {
> case FMODE_READ:
> @@ -620,8 +624,10 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
> hdr->rw_mode);
> status = -EINVAL;
> }
> -out:
> +
> if (status != 0) {
> + if (status == -EAGAIN)
> + nfs_local_disable(clp);
> nfs_to_nfsd_file_put_local(localio);
> hdr->task.tk_status = status;
> nfs_local_hdr_release(hdr, call_ops);
> --
> 2.44.0
>
>
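One behavioural difference is visible in the diff, separate from the question of how many nfs_to->nfsd_file_file() calls remain. The old nfs_local_doio() check returned -EAGAIN unless the file had both ->read_iter and ->write_iter. The pushed-down checks only require the hook for the direction actually being issued. Below is a minimal userspace model of that difference, not kernel code; struct fake_fops, check_before(), check_after() and dummy_iter() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ->read_iter/->write_iter hooks. */
typedef long (*iter_fn)(void);

struct fake_fops {
	iter_fn read_iter;
	iter_fn write_iter;
};

enum io_dir { IO_READ, IO_WRITE };

/* Before the patch: reject unless BOTH hooks exist, whatever the direction. */
int check_before(const struct fake_fops *fops, enum io_dir dir)
{
	(void)dir; /* direction was not considered */
	if (!fops->read_iter || !fops->write_iter)
		return -EAGAIN;
	return 0;
}

/* After the patch: only the hook for this direction must exist. */
int check_after(const struct fake_fops *fops, enum io_dir dir)
{
	assert(dir == IO_READ || dir == IO_WRITE);
	if (dir == IO_READ)
		return fops->read_iter ? 0 : -EAGAIN;
	return fops->write_iter ? 0 : -EAGAIN;
}

/* Placeholder hook implementation used by callers of the model. */
long dummy_iter(void)
{
	return 0;
}
```

With read_iter set and write_iter NULL, check_before() rejects a read while check_after() allows it. A write is still rejected with -EAGAIN, which in the patched nfs_local_doio() also leads to nfs_local_disable().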
* Re: [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
2024-11-08 23:39 ` [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration Mike Snitzer
@ 2024-11-11 1:21 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:21 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Move nfs_local_fsync_ctx_alloc() after nfs_local_fsync_work().
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Nice.
Reviewed-by: NeilBrown <neilb@suse.de>
Thanks,
NeilBrown
> ---
> fs/nfs/localio.c | 31 +++++++++++++++----------------
> 1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index a77ac7e8a05c..4b8618cf114c 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -44,7 +44,6 @@ struct nfs_local_fsync_ctx {
> struct work_struct work;
> struct completion *done;
> };
> -static void nfs_local_fsync_work(struct work_struct *work);
>
> static bool localio_enabled __read_mostly = true;
> module_param(localio_enabled, bool, 0644);
> @@ -684,21 +683,6 @@ nfs_local_release_commit_data(struct nfsd_file *localio,
> call_ops->rpc_release(data);
> }
>
> -static struct nfs_local_fsync_ctx *
> -nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
> - struct nfsd_file *localio, gfp_t flags)
> -{
> - struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
> -
> - if (ctx != NULL) {
> - ctx->localio = localio;
> - ctx->data = data;
> - INIT_WORK(&ctx->work, nfs_local_fsync_work);
> - ctx->done = NULL;
> - }
> - return ctx;
> -}
> -
> static void
> nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
> {
> @@ -723,6 +707,21 @@ nfs_local_fsync_work(struct work_struct *work)
> nfs_local_fsync_ctx_free(ctx);
> }
>
> +static struct nfs_local_fsync_ctx *
> +nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
> + struct nfsd_file *localio, gfp_t flags)
> +{
> + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
> +
> + if (ctx != NULL) {
> + ctx->localio = localio;
> + ctx->data = data;
> + INIT_WORK(&ctx->work, nfs_local_fsync_work);
> + ctx->done = NULL;
> + }
> + return ctx;
> +}
> +
> int nfs_local_commit(struct nfsd_file *localio,
> struct nfs_commit_data *data,
> const struct rpc_call_ops *call_ops, int how)
> --
> 2.44.0
>
>
* Re: [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support
2024-11-08 23:39 ` [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support Mike Snitzer
@ 2024-11-11 1:31 ` NeilBrown
2024-11-12 14:31 ` Chuck Lever
1 sibling, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:31 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> This commit simply adds the required O_DIRECT plumbing. It doesn't
> address the fact that NFS doesn't ensure all writes are page aligned
> (nor device logical block size aligned as required by O_DIRECT).
>
> Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
> will not use O_DIRECT semantics by default if/when an application
> requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by:
> 1: Adding a flag to the nfs_pgio_header struct to allow the NFS
> O_DIRECT layer to signal that O_DIRECT was used by the application
> 2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
> when enabled will cause LOCALIO to use O_DIRECT semantics (this may
> cause IO to fail if applications do not properly align their IO).
>
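The alignment problem described above can be made concrete with a small check. This is a hypothetical userspace helper, not part of the patch: dio_is_aligned() assumes the logical block size is already known and conservatively requires the file offset, the length and the buffer address to all be multiples of it, which is the traditional O_DIRECT rule (some filesystems and devices are more permissive about memory alignment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: would an I/O of @len bytes at file offset @offset
 * from buffer @buf satisfy a logical block size of @lbs?  Conservative:
 * offset, length and buffer address must all be multiples of @lbs.
 */
bool dio_is_aligned(uint64_t offset, uint64_t len, uintptr_t buf, uint32_t lbs)
{
	/* Logical block sizes are powers of two (512, 4096, ...). */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	uint64_t mask = (uint64_t)lbs - 1;

	/* Any low bit set in any of the three values means misalignment. */
	return ((offset | len | (uint64_t)buf) & mask) == 0;
}
```

A 100-byte offset or a 1000-byte length fails this check for a 512-byte block size, which is the kind of request that may fail once localio_O_DIRECT_semantics is enabled.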
> Adding Direct IO support helps side-step the problem that LOCALIO
> currently double buffers buffered IO (by using page cache in both NFS
> and the underlying filesystem). More care is needed to craft a proper
> solution for LOCALIO's redundant use of page cache for buffered IO,
> e.g.: https://marc.info/?l=linux-nfs&m=171996211625151&w=2
>
> This commit is derived from code developed by Weston Andros Adamson.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/direct.c | 1 +
> fs/nfs/localio.c | 92 ++++++++++++++++++++++++++++++++++++-----
> include/linux/nfs_xdr.h | 1 +
> 3 files changed, 84 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 90079ca134dd..4b92493d6ff0 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -303,6 +303,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
> static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
> {
> get_dreq(hdr->dreq);
> + set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
> }
>
> static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 4b8618cf114c..de0dcd76d84d 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -35,6 +35,7 @@ struct nfs_local_kiocb {
> struct bio_vec *bvec;
> struct nfs_pgio_header *hdr;
> struct work_struct work;
> + void (*aio_complete_work)(struct work_struct *);
> struct nfsd_file *localio;
> };
>
> @@ -48,6 +49,10 @@ struct nfs_local_fsync_ctx {
> static bool localio_enabled __read_mostly = true;
> module_param(localio_enabled, bool, 0644);
>
> +static bool localio_O_DIRECT_semantics __read_mostly = false;
> +module_param(localio_O_DIRECT_semantics, bool, 0644);
> +MODULE_PARM_DESC(localio_O_DIRECT_semantics, "Use O_DIRECT semantics");
Should the text mention localio??
> +
> static inline bool nfs_client_is_local(const struct nfs_client *clp)
> {
> return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> @@ -285,10 +290,19 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
> kfree(iocb);
> return NULL;
> }
> - init_sync_kiocb(&iocb->kiocb, file);
> +
> + if (localio_O_DIRECT_semantics &&
> + test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
> + iocb->kiocb.ki_filp = file;
> + iocb->kiocb.ki_flags = IOCB_DIRECT;
why isn't ki_ioprio initialised??
The rest I am not able to review as I'm not that familiar with iocb
code.
Thanks,
NeilBrown
> + } else
> + init_sync_kiocb(&iocb->kiocb, file);
> +
> iocb->kiocb.ki_pos = hdr->args.offset;
> iocb->hdr = hdr;
> iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> + iocb->aio_complete_work = NULL;
> +
> return iocb;
> }
>
> @@ -343,6 +357,18 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> }
>
> +/*
> + * Complete the I/O from iocb->kiocb.ki_complete()
> + *
> + * Note that this function can be called from a bottom half context,
> + * hence we need to queue the rpc_call_done() etc to a workqueue
> + */
> +static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb)
> +{
> + INIT_WORK(&iocb->work, iocb->aio_complete_work);
> + queue_work(nfsiod_workqueue, &iocb->work);
> +}
> +
> static void
> nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> {
> @@ -365,6 +391,23 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> status > 0 ? status : 0, hdr->res.eof);
> }
>
> +static void nfs_local_read_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(work, struct nfs_local_kiocb, work);
> +
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(kiocb, struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_read_done(iocb, ret);
> + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
> +}
> +
> static void nfs_local_call_read(struct work_struct *work)
> {
> struct nfs_local_kiocb *iocb =
> @@ -379,10 +422,10 @@ static void nfs_local_call_read(struct work_struct *work)
> nfs_local_iter_init(&iter, iocb, READ);
>
> status = filp->f_op->read_iter(&iocb->kiocb, &iter);
> - WARN_ON_ONCE(status == -EIOCBQUEUED);
> -
> - nfs_local_read_done(iocb, status);
> - nfs_local_pgio_release(iocb);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_read_done(iocb, status);
> + nfs_local_pgio_release(iocb);
> + }
>
> revert_creds(save_cred);
> }
> @@ -410,6 +453,11 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
> nfs_local_pgio_init(hdr, call_ops);
> hdr->res.eof = false;
>
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
> + iocb->aio_complete_work = nfs_local_read_aio_complete_work;
> + }
> +
> INIT_WORK(&iocb->work, nfs_local_call_read);
> queue_work(nfslocaliod_workqueue, &iocb->work);
>
> @@ -534,6 +582,24 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> nfs_local_pgio_done(hdr, status);
> }
>
> +static void nfs_local_write_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(work, struct nfs_local_kiocb, work);
> +
> + nfs_local_vfs_getattr(iocb);
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(kiocb, struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_write_done(iocb, ret);
> + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
> +}
> +
> static void nfs_local_call_write(struct work_struct *work)
> {
> struct nfs_local_kiocb *iocb =
> @@ -552,11 +618,11 @@ static void nfs_local_call_write(struct work_struct *work)
> file_start_write(filp);
> status = filp->f_op->write_iter(&iocb->kiocb, &iter);
> file_end_write(filp);
> - WARN_ON_ONCE(status == -EIOCBQUEUED);
> -
> - nfs_local_write_done(iocb, status);
> - nfs_local_vfs_getattr(iocb);
> - nfs_local_pgio_release(iocb);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_write_done(iocb, status);
> + nfs_local_vfs_getattr(iocb);
> + nfs_local_pgio_release(iocb);
> + }
>
> revert_creds(save_cred);
> current->flags = old_flags;
> @@ -592,10 +658,16 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
> case NFS_FILE_SYNC:
> iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
> }
> +
> nfs_local_pgio_init(hdr, call_ops);
>
> nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
>
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
> + iocb->aio_complete_work = nfs_local_write_aio_complete_work;
> + }
> +
> INIT_WORK(&iocb->work, nfs_local_call_write);
> queue_work(nfslocaliod_workqueue, &iocb->work);
>
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index e0ae0a14257f..f30e94d105b7 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1632,6 +1632,7 @@ enum {
> NFS_IOHDR_RESEND_PNFS,
> NFS_IOHDR_RESEND_MDS,
> NFS_IOHDR_UNSTABLE_WRITES,
> + NFS_IOHDR_ODIRECT,
> };
>
> struct nfs_io_completion;
> --
> 2.44.0
>
>
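The core control-flow change in this patch is that ->read_iter()/->write_iter() may now return -EIOCBQUEUED for IOCB_DIRECT requests, in which case completion is deferred to ->ki_complete(). The submitting path must then not complete the request itself, so that the done and release steps run exactly once. Below is a single-threaded userspace model of that rule; struct model_iocb, MODEL_EIOCBQUEUED and the model_* functions are hypothetical, and the workqueue hop performed by nfs_local_pgio_aio_complete() is replaced by a direct call.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary sentinel standing in for the kernel-internal EIOCBQUEUED. */
#define MODEL_EIOCBQUEUED 1000

struct model_iocb {
	bool direct;                                    /* models IOCB_DIRECT */
	void (*ki_complete)(struct model_iocb *, long); /* models ->ki_complete */
	int done_calls;                                 /* times "done" ran */
	int release_calls;                              /* times "release" ran */
	long result;                                    /* last status passed to "done" */
};

static void model_done(struct model_iocb *iocb, long status)
{
	iocb->done_calls++;
	iocb->result = status;
}

static void model_release(struct model_iocb *iocb)
{
	iocb->release_calls++;
}

/* Models nfs_local_{read,write}_aio_complete(); the kernel defers the
 * release to nfsiod_workqueue since this may run in a bottom half. */
void model_aio_complete(struct model_iocb *iocb, long ret)
{
	assert(iocb->direct); /* only direct iocbs complete asynchronously */
	model_done(iocb, ret);
	model_release(iocb);
}

/* Models nfs_do_local_{read,write}(): set ki_complete only for direct I/O. */
void model_init(struct model_iocb *iocb, bool direct)
{
	*iocb = (struct model_iocb){ .direct = direct };
	if (direct)
		iocb->ki_complete = model_aio_complete;
}

/* Models nfs_local_call_{read,write}() after the patch: complete inline
 * only when the filesystem did not queue the I/O. */
void model_submit(struct model_iocb *iocb, long fs_status)
{
	if (fs_status != -MODEL_EIOCBQUEUED) {
		model_done(iocb, fs_status);
		model_release(iocb);
	}
	/* otherwise ->ki_complete() is responsible for completion later */
}
```

Replacing the old WARN_ON_ONCE(status == -EIOCBQUEUED) with the "if (status != -EIOCBQUEUED)" guard is what prevents a double completion when the filesystem takes the asynchronous path.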
* Re: [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients
2024-11-08 23:39 ` [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients Mike Snitzer
@ 2024-11-11 1:32 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:32 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Rename nfs_uuid_invalidate_one_client to nfs_localio_disable_client.
> Rename nfs_uuid_invalidate_clients to nfs_localio_invalidate_clients.
I agree that is an improvement.
Reviewed-by: NeilBrown <neilb@suse.de>
Thanks,
NeilBrown
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/localio.c | 2 +-
> fs/nfs_common/nfslocalio.c | 8 ++++----
> fs/nfsd/nfsctl.c | 4 ++--
> include/linux/nfslocalio.h | 5 +++--
> 4 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index de0dcd76d84d..cab2a8819259 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -139,7 +139,7 @@ void nfs_local_disable(struct nfs_client *clp)
> spin_lock(&clp->cl_localio_lock);
> if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> trace_nfs_local_disable(clp);
> - nfs_uuid_invalidate_one_client(&clp->cl_uuid);
> + nfs_localio_disable_client(&clp->cl_uuid);
> }
> spin_unlock(&clp->cl_localio_lock);
> }
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index a74ec08f6c96..904439e4bb85 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -107,7 +107,7 @@ static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
> list_del_init(&nfs_uuid->list);
> }
>
> -void nfs_uuid_invalidate_clients(struct list_head *list)
> +void nfs_localio_invalidate_clients(struct list_head *list)
> {
> nfs_uuid_t *nfs_uuid, *tmp;
>
> @@ -116,9 +116,9 @@ void nfs_uuid_invalidate_clients(struct list_head *list)
> nfs_uuid_put_locked(nfs_uuid);
> spin_unlock(&nfs_uuid_lock);
> }
> -EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients);
> +EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
>
> -void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
> +void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid)
> {
> if (nfs_uuid->net) {
> spin_lock(&nfs_uuid_lock);
> @@ -126,7 +126,7 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
> spin_unlock(&nfs_uuid_lock);
> }
> }
> -EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
> +EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
>
> struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> struct rpc_clnt *rpc_clnt, const struct cred *cred,
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 3adbc05ebaac..727904d8a4d0 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2276,14 +2276,14 @@ static __net_init int nfsd_net_init(struct net *net)
> * nfsd_net_pre_exit - Disconnect localio clients from net namespace
> * @net: a network namespace that is about to be destroyed
> *
> - * This invalidated ->net pointers held by localio clients
> + * This invalidates ->net pointers held by localio clients
> * while they can still safely access nn->counter.
> */
> static __net_exit void nfsd_net_pre_exit(struct net *net)
> {
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>
> - nfs_uuid_invalidate_clients(&nn->local_clients);
> + nfs_localio_invalidate_clients(&nn->local_clients);
> }
> #endif
>
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index ab6a2a53f505..a05d1043f2b0 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -37,8 +37,9 @@ bool nfs_uuid_begin(nfs_uuid_t *);
> void nfs_uuid_end(nfs_uuid_t *);
> void nfs_uuid_is_local(const uuid_t *, struct list_head *,
> struct net *, struct auth_domain *, struct module *);
> -void nfs_uuid_invalidate_clients(struct list_head *list);
> -void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
> +
> +void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid);
> +void nfs_localio_invalidate_clients(struct list_head *list);
>
> /* localio needs to map filehandle -> struct nfsd_file */
> extern struct nfsd_file *
> --
> 2.44.0
>
>
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-08 23:39 ` [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t Mike Snitzer
@ 2024-11-11 1:55 ` NeilBrown
2024-11-11 15:33 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-11 1:55 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Remove cl_localio_lock from 'struct nfs_client' in favor of adding a
> lock to the nfs_uuid_t struct (which is embedded in each nfs_client).
>
> Push nfs_local_{enable,disable} implementation down to nfs_common.
> Those methods now call nfs_localio_{enable,disable}_client.
>
> This allows implementing nfs_localio_invalidate_clients in terms of
> nfs_localio_disable_client.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/client.c | 1 -
> fs/nfs/localio.c | 18 ++++++------
> fs/nfs_common/nfslocalio.c | 57 ++++++++++++++++++++++++++------------
> include/linux/nfs_fs_sb.h | 1 -
> include/linux/nfslocalio.h | 8 +++++-
> 5 files changed, 55 insertions(+), 30 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 03ecc7765615..124232054807 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -182,7 +182,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
> seqlock_init(&clp->cl_boot_lock);
> ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> nfs_uuid_init(&clp->cl_uuid);
> - spin_lock_init(&clp->cl_localio_lock);
> #endif /* CONFIG_NFS_LOCALIO */
>
> clp->cl_principal = "*";
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index cab2a8819259..4c75ffc5efa2 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -125,10 +125,8 @@ const struct rpc_program nfslocalio_program = {
> */
> static void nfs_local_enable(struct nfs_client *clp)
> {
> - spin_lock(&clp->cl_localio_lock);
> - set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> trace_nfs_local_enable(clp);
> - spin_unlock(&clp->cl_localio_lock);
> + nfs_localio_enable_client(clp);
> }
Why do we need this function? The one caller could call
nfs_localio_enable_client() directly instead. The tracepoint could be
placed in that one caller.
>
> /*
> @@ -136,12 +134,8 @@ static void nfs_local_enable(struct nfs_client *clp)
> */
> void nfs_local_disable(struct nfs_client *clp)
> {
> - spin_lock(&clp->cl_localio_lock);
> - if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> - trace_nfs_local_disable(clp);
> - nfs_localio_disable_client(&clp->cl_uuid);
> - }
> - spin_unlock(&clp->cl_localio_lock);
> + trace_nfs_local_disable(clp);
> + nfs_localio_disable_client(clp);
> }
Ditto. Though there are more callers so the tracepoint solution isn't
quite so obvious.
>
> /*
> @@ -183,8 +177,12 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp)
> rpc_shutdown_client(rpcclient_localio);
>
> /* Server is only local if it initialized required struct members */
> - if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
> + rcu_read_lock();
> + if (status || !rcu_access_pointer(clp->cl_uuid.net) || !clp->cl_uuid.dom) {
> + rcu_read_unlock();
> return false;
> + }
> + rcu_read_unlock();
What value does RCU provide here? I don't think this change is needed.
rcu_access_pointer does not require rcu_read_lock().
>
> return true;
> }
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 904439e4bb85..cf2f47ea4f8d 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -7,6 +7,9 @@
> #include <linux/module.h>
> #include <linux/list.h>
> #include <linux/nfslocalio.h>
> +#include <linux/nfs3.h>
> +#include <linux/nfs4.h>
> +#include <linux/nfs_fs_sb.h>
I don't feel good about adding this nfs client knowledge in to nfs_common.
> #include <net/netns/generic.h>
>
> MODULE_LICENSE("GPL");
> @@ -25,6 +28,7 @@ void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
> nfs_uuid->net = NULL;
> nfs_uuid->dom = NULL;
> INIT_LIST_HEAD(&nfs_uuid->list);
> + spin_lock_init(&nfs_uuid->lock);
> }
> EXPORT_SYMBOL_GPL(nfs_uuid_init);
>
> @@ -94,12 +98,23 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
> }
> EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
>
> +void nfs_localio_enable_client(struct nfs_client *clp)
> +{
> + nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> +
> + spin_lock(&nfs_uuid->lock);
> + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> + spin_unlock(&nfs_uuid->lock);
> +}
> +EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
And I don't feel good about nfs_local accessing nfs_client directly.
It only uses it for NFS_CS_LOCAL_IO. Can we ditch that flag and instead
do something like testing nfs_uuid.net ??
> +
> static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
> {
> - if (nfs_uuid->net) {
> - module_put(nfsd_mod);
> - nfs_uuid->net = NULL;
> - }
> + if (!nfs_uuid->net)
> + return;
> + module_put(nfsd_mod);
> + rcu_assign_pointer(nfs_uuid->net, NULL);
> +
I much prefer RCU_INIT_POINTER for assigning NULL as there is no need
for ordering here.
> if (nfs_uuid->dom) {
> auth_domain_put(nfs_uuid->dom);
> nfs_uuid->dom = NULL;
> @@ -107,27 +122,35 @@ static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
> list_del_init(&nfs_uuid->list);
> }
>
> -void nfs_localio_invalidate_clients(struct list_head *list)
> +void nfs_localio_disable_client(struct nfs_client *clp)
> +{
> + nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> +
> + spin_lock(&nfs_uuid->lock);
> + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> + spin_lock(&nfs_uuid_lock);
> + nfs_uuid_put_locked(nfs_uuid);
> + spin_unlock(&nfs_uuid_lock);
> + }
> + spin_unlock(&nfs_uuid->lock);
> +}
> +EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
> +
> +void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
> {
> nfs_uuid_t *nfs_uuid, *tmp;
>
> spin_lock(&nfs_uuid_lock);
> - list_for_each_entry_safe(nfs_uuid, tmp, list, list)
> - nfs_uuid_put_locked(nfs_uuid);
> + list_for_each_entry_safe(nfs_uuid, tmp, cl_uuid_list, list) {
> + struct nfs_client *clp =
> + container_of(nfs_uuid, struct nfs_client, cl_uuid);
> +
> + nfs_localio_disable_client(clp);
> + }
> spin_unlock(&nfs_uuid_lock);
> }
> EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
>
> -void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid)
> -{
> - if (nfs_uuid->net) {
> - spin_lock(&nfs_uuid_lock);
> - nfs_uuid_put_locked(nfs_uuid);
> - spin_unlock(&nfs_uuid_lock);
> - }
> -}
> -EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
> -
> struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> struct rpc_clnt *rpc_clnt, const struct cred *cred,
> const struct nfs_fh *nfs_fh, const fmode_t fmode)
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index b804346a9741..239d86ef166c 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -132,7 +132,6 @@ struct nfs_client {
> struct timespec64 cl_nfssvc_boot;
> seqlock_t cl_boot_lock;
> nfs_uuid_t cl_uuid;
> - spinlock_t cl_localio_lock;
> #endif /* CONFIG_NFS_LOCALIO */
> };
>
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index a05d1043f2b0..4d5583873f41 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -6,6 +6,7 @@
> #ifndef __LINUX_NFSLOCALIO_H
> #define __LINUX_NFSLOCALIO_H
>
> +
> /* nfsd_file structure is purposely kept opaque to NFS client */
> struct nfsd_file;
>
> @@ -19,6 +20,8 @@ struct nfsd_file;
> #include <linux/nfs.h>
> #include <net/net_namespace.h>
>
> +struct nfs_client;
> +
> /*
> * Useful to allow a client to negotiate if localio
> * possible with its server.
> @@ -27,6 +30,8 @@ struct nfsd_file;
> */
> typedef struct {
> uuid_t uuid;
> + /* sadly this struct is just over a cacheline, avoid bouncing */
> + spinlock_t ____cacheline_aligned lock;
> struct list_head list;
> struct net __rcu *net; /* nfsd's network namespace */
> struct auth_domain *dom; /* auth_domain for localio */
> @@ -38,7 +43,8 @@ void nfs_uuid_end(nfs_uuid_t *);
> void nfs_uuid_is_local(const uuid_t *, struct list_head *,
> struct net *, struct auth_domain *, struct module *);
>
> -void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid);
> +void nfs_localio_enable_client(struct nfs_client *clp);
> +void nfs_localio_disable_client(struct nfs_client *clp);
> void nfs_localio_invalidate_clients(struct list_head *list);
>
> /* localio needs to map filehandle -> struct nfsd_file */
> --
> 2.44.0
>
>
I think this is a good refactoring to do, but I don't like some of the
details, or some of the RCU code.
NeilBrown
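Two of the suggestions above (derive "is local" from nfs_uuid.net instead of keeping NFS_CS_LOCAL_IO, and use RCU_INIT_POINTER rather than rcu_assign_pointer when storing NULL) can be illustrated with a userspace analogy using C11 atomics. This is not kernel code: relaxed loads/stores only approximate rcu_access_pointer()/RCU_INIT_POINTER(), a release store approximates rcu_assign_pointer(), and struct model_uuid, struct model_net and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_net {
	int id;
};

/* Hypothetical stand-in for nfs_uuid_t with no separate LOCAL_IO flag:
 * "is local" is derived purely from whether ->net is set. */
struct model_uuid {
	_Atomic(struct model_net *) net;
};

bool model_is_local(struct model_uuid *u)
{
	/* Like rcu_access_pointer(): the pointer is only compared with
	 * NULL and never dereferenced, so no read-side lock is needed. */
	return atomic_load_explicit(&u->net, memory_order_relaxed) != NULL;
}

void model_enable(struct model_uuid *u, struct model_net *net)
{
	assert(net != NULL);
	/* Publishing a real object: release ordering, analogous to
	 * rcu_assign_pointer(), so a reader that dereferences it sees
	 * the object fully initialised. */
	atomic_store_explicit(&u->net, net, memory_order_release);
}

void model_disable(struct model_uuid *u)
{
	/* Storing NULL publishes nothing, so no ordering is required;
	 * this is the case RCU_INIT_POINTER() is meant for. */
	atomic_store_explicit(&u->net, NULL, memory_order_relaxed);
}
```

The design point is that a single pointer then carries both the state ("local or not") and the data, so the flag and the pointer cannot disagree.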
* Re: [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 client reconnects to server
2024-11-08 23:40 ` [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 " Mike Snitzer
@ 2024-11-11 3:06 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-11 3:06 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Sat, 09 Nov 2024, Mike Snitzer wrote:
> Re-enabling NFSv3 LOCALIO is made more complex (than NFSv4) because v3
> is stateless. As such, the heuristic used to identify a LOCALIO probe
> point is more ad hoc by nature: if/when NFSv3 client IO begins to
> complete again in terms of normal RPC-based NFSv3 server IO, attempt
> nfs_local_probe_async().
>
> Care is taken to throttle the frequency of nfs_local_probe_async(),
> otherwise there could be a flood of repeat calls to
> nfs_local_probe_async().
I think it would be good to limit this to only probing when the network
connection is reestablished - assuming we can ignore connectionless
protocols like UDP.
I think you can stash rpc_clnt->cl_xprt->connect_cookie and check if
that has changed or not. If not, then there is no point probing again.
Thanks,
NeilBrown
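A sketch of the reconnect-gated probe suggested above, as a single-threaded userspace model. It assumes only that the transport's connect cookie changes on each reconnect; struct model_probe_state and model_should_probe() are hypothetical names. Real code would still need the compare-and-update to be serialized (the patch already has local_probe_mutex for that).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client state: the cookie value at the last probe. */
struct model_probe_state {
	unsigned int last_probed_cookie;
	bool have_probed;
};

/* Returns true if a LOCALIO probe should be issued for @cur_cookie. */
bool model_should_probe(struct model_probe_state *st, unsigned int cur_cookie)
{
	assert(st != NULL);
	/* Same connection as at the last probe: nothing has changed. */
	if (st->have_probed && st->last_probed_cookie == cur_cookie)
		return false;
	st->last_probed_cookie = cur_cookie;
	st->have_probed = true;
	return true;
}
```

Unlike a call-count throttle, this probes at most once per connection, and promptly after each reconnect.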
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/internal.h | 5 +++++
> fs/nfs/localio.c | 11 +++++++++++
> fs/nfs/nfs3proc.c | 34 +++++++++++++++++++++++++++++++---
> fs/nfs_common/nfslocalio.c | 4 ++++
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/nfslocalio.h | 4 +++-
> 6 files changed, 55 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index efd42efd9405..fb1ab7cee6b9 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -470,6 +470,7 @@ extern int nfs_local_commit(struct nfsd_file *,
> struct nfs_commit_data *,
> const struct rpc_call_ops *, int);
> extern bool nfs_server_is_local(const struct nfs_client *clp);
> +extern bool nfs_server_was_local(const struct nfs_client *clp);
>
> #else /* CONFIG_NFS_LOCALIO */
> static inline void nfs_local_disable(struct nfs_client *clp) {}
> @@ -499,6 +500,10 @@ static inline bool nfs_server_is_local(const struct nfs_client *clp)
> {
> return false;
> }
> +static inline bool nfs_server_was_local(const struct nfs_client *clp)
> +{
> + return false;
> +}
> #endif /* CONFIG_NFS_LOCALIO */
>
> /* super.c */
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 710e537b3402..1559dc2f1850 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -64,6 +64,17 @@ bool nfs_server_is_local(const struct nfs_client *clp)
> }
> EXPORT_SYMBOL_GPL(nfs_server_is_local);
>
> +static inline bool nfs_client_was_local(const struct nfs_client *clp)
> +{
> + return !!test_bit(NFS_CS_LOCAL_IO_CAPABLE, &clp->cl_flags);
> +}
> +
> +bool nfs_server_was_local(const struct nfs_client *clp)
> +{
> + return nfs_client_was_local(clp) && localio_enabled;
> +}
> +EXPORT_SYMBOL_GPL(nfs_server_was_local);
> +
> /*
> * UUID_IS_LOCAL XDR functions
> */
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index 1566163c6d85..4d2018760e9b 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -844,6 +844,29 @@ nfs3_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
> return status;
> }
>
> +static void nfs3_local_probe(struct nfs_server *server)
> +{
> +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> + struct nfs_client *clp = server->nfs_client;
> + nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> +
> + if (likely(!nfs_server_was_local(clp)))
> + return;
> + /*
> + * Try re-enabling LOCALIO if it was previously enabled, but
> + * was disabled due to server restart, and IO has successfully
> + * completed in terms of normal RPC.
> + */
> + mutex_lock(&nfs_uuid->local_probe_mutex);
> + /* Arbitrary throttle to reduce nfs_local_probe_async() frequency */
> + if ((nfs_uuid->local_probe_count++ & 255) == 0) {
> + if (unlikely(!nfs_server_is_local(clp) && nfs_server_was_local(clp)))
> + nfs_local_probe_async(clp);
> + }
> + mutex_unlock(&nfs_uuid->local_probe_mutex);
> +#endif
> +}
> +
> static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> {
> struct inode *inode = hdr->inode;
> @@ -855,8 +878,11 @@ static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> if (nfs3_async_handle_jukebox(task, inode))
> return -EAGAIN;
>
> - if (task->tk_status >= 0 && !server->read_hdrsize)
> - cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
> + if (task->tk_status >= 0) {
> + if (!server->read_hdrsize)
> + cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
> + nfs3_local_probe(server);
> + }
>
> nfs_invalidate_atime(inode);
> nfs_refresh_inode(inode, &hdr->fattr);
> @@ -886,8 +912,10 @@ static int nfs3_write_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
>
> if (nfs3_async_handle_jukebox(task, inode))
> return -EAGAIN;
> - if (task->tk_status >= 0)
> + if (task->tk_status >= 0) {
> nfs_writeback_update_inode(hdr);
> + nfs3_local_probe(NFS_SERVER(inode));
> + }
> return 0;
> }
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index fb376d38ac9a..852ba8fd73f3 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -43,6 +43,8 @@ void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
> INIT_LIST_HEAD(&nfs_uuid->list);
> INIT_LIST_HEAD(&nfs_uuid->files);
> spin_lock_init(&nfs_uuid->lock);
> + mutex_init(&nfs_uuid->local_probe_mutex);
> + nfs_uuid->local_probe_count = 0;
> }
> EXPORT_SYMBOL_GPL(nfs_uuid_init);
>
> @@ -143,6 +145,8 @@ void nfs_localio_enable_client(struct nfs_client *clp)
>
> spin_lock(&nfs_uuid->lock);
> set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> + /* Also set hint that client and server are LOCALIO capable */
> + set_bit(NFS_CS_LOCAL_IO_CAPABLE, &clp->cl_flags);
> trace_nfs_localio_enable_client(clp);
> spin_unlock(&nfs_uuid->lock);
> }
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 63d7e0f478d8..45906c402c98 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -51,6 +51,7 @@ struct nfs_client {
> #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
> #define NFS_CS_PNFS 9 /* - Server used for pnfs */
> #define NFS_CS_LOCAL_IO 10 /* - client is local */
> +#define NFS_CS_LOCAL_IO_CAPABLE 11 /* - client was previously local */
> struct sockaddr_storage cl_addr; /* server identifier */
> size_t cl_addrlen;
> char * cl_hostname; /* hostname of server */
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index c68a529230c1..3dfef0bb18fe 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -27,7 +27,9 @@ struct nfs_file_localio;
> */
> typedef struct {
> uuid_t uuid;
> - /* sadly this struct is just over a cacheline, avoid bouncing */
> + struct mutex local_probe_mutex;
> + unsigned local_probe_count;
> + /* sadly this struct is over a cacheline, avoid bouncing */
> spinlock_t ____cacheline_aligned lock;
> struct list_head list;
> spinlock_t *list_lock; /* nn->local_clients_lock */
> --
> 2.44.0
>
>
>
* Re: [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
2024-11-11 1:20 ` NeilBrown
@ 2024-11-11 15:09 ` Mike Snitzer
0 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-11 15:09 UTC
To: NeilBrown
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Mon, Nov 11, 2024 at 12:20:12PM +1100, NeilBrown wrote:
> On Sat, 09 Nov 2024, Mike Snitzer wrote:
> > Push the read_iter and write_iter availability checks down to
> > nfs_do_local_read and nfs_do_local_write respectively.
> >
> > This eliminates a redundant nfs_to->nfsd_file_file() call.
>
> Does it?
It does. It is harder to see just from looking at the patch.
> The patch removes 2 of these calls and adds 2 of these calls. So it
> isn't clear what is being eliminated.
>
> Maybe it is a good thing to do, but it isn't obvious to me why.
nfs_local_doio() is common to both read and write, and both
nfs_do_local_read() and nfs_do_local_write() already call
nfs_to->nfsd_file_file(). This patch simply pushes nfs_local_doio()'s
nfs_to->nfsd_file_file() call down to nfs_do_local_{read,write} (or
put differently: moves their respective calls earlier) to kill 2 birds
with 1 stone. Hence it eliminates an extra call to
nfs_to->nfsd_file_file() in both read and write paths.
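A minimal userspace sketch of the before/after call structure may make the saving easier to see. It is not the real fs/nfs/localio.c code: every demo_* name is hypothetical, demo_nfsd_file_file() merely stands in for the indirect nfs_to->nfsd_file_file() call and counts how often it runs, and only the read path is modeled (the write path is symmetric).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the nfsd_file's underlying struct file. */
struct demo_file {
	bool has_read_iter;	/* models file->f_op->read_iter != NULL */
};

static struct demo_file demo_the_file = { .has_read_iter = true };
static int demo_lookups;	/* how often the lookup below ran */

/* Models the indirect nfs_to->nfsd_file_file() call. */
static struct demo_file *demo_nfsd_file_file(void)
{
	demo_lookups++;
	return &demo_the_file;
}

/* Before: the read path looks the file up again after doio did. */
static int demo_do_local_read_before(void)
{
	struct demo_file *filp = demo_nfsd_file_file();

	(void)filp;		/* ... issue the read using filp ... */
	return 0;
}

/* Before: the common doio helper does its own lookup for the check. */
static int demo_local_doio_before(void)
{
	if (!demo_nfsd_file_file()->has_read_iter)
		return -1;
	return demo_do_local_read_before();
}

/* After: one lookup in the read path serves both purposes. */
static int demo_do_local_read_after(void)
{
	struct demo_file *filp = demo_nfsd_file_file();

	if (!filp->has_read_iter)
		return -1;
	/* ... issue the read using filp ... */
	return 0;
}

static int demo_local_doio_after(void)
{
	return demo_do_local_read_after();
}

/* Number of lookups one read costs under each scheme. */
int demo_lookups_per_io(bool pushed_down)
{
	demo_lookups = 0;
	if (pushed_down)
		demo_local_doio_after();
	else
		demo_local_doio_before();
	return demo_lookups;
}
```

Under these assumptions demo_lookups_per_io(false) returns 2 and demo_lookups_per_io(true) returns 1, i.e. one indirect call saved per I/O.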
Mike
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-11 1:55 ` NeilBrown
@ 2024-11-11 15:33 ` Mike Snitzer
2024-11-11 20:35 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-11 15:33 UTC
To: NeilBrown
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Mon, Nov 11, 2024 at 12:55:24PM +1100, NeilBrown wrote:
> On Sat, 09 Nov 2024, Mike Snitzer wrote:
> > Remove cl_localio_lock from 'struct nfs_client' in favor of adding a
> > lock to the nfs_uuid_t struct (which is embedded in each nfs_client).
> >
> > Push nfs_local_{enable,disable} implementation down to nfs_common.
> > Those methods now call nfs_localio_{enable,disable}_client.
> >
> > This allows implementing nfs_localio_invalidate_clients in terms of
> > nfs_localio_disable_client.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfs/client.c | 1 -
> > fs/nfs/localio.c | 18 ++++++------
> > fs/nfs_common/nfslocalio.c | 57 ++++++++++++++++++++++++++------------
> > include/linux/nfs_fs_sb.h | 1 -
> > include/linux/nfslocalio.h | 8 +++++-
> > 5 files changed, 55 insertions(+), 30 deletions(-)
> >
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 03ecc7765615..124232054807 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -182,7 +182,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
> > seqlock_init(&clp->cl_boot_lock);
> > ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> > nfs_uuid_init(&clp->cl_uuid);
> > - spin_lock_init(&clp->cl_localio_lock);
> > #endif /* CONFIG_NFS_LOCALIO */
> >
> > clp->cl_principal = "*";
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index cab2a8819259..4c75ffc5efa2 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -125,10 +125,8 @@ const struct rpc_program nfslocalio_program = {
> > */
> > static void nfs_local_enable(struct nfs_client *clp)
> > {
> > - spin_lock(&clp->cl_localio_lock);
> > - set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> > trace_nfs_local_enable(clp);
> > - spin_unlock(&clp->cl_localio_lock);
> > + nfs_localio_enable_client(clp);
> > }
>
> Why do we need this function? The one caller could call
> nfs_localio_enable_client() directly instead. The tracepoint could be
> placed in that one caller.
Yeah, I saw that too but felt it useful to differentiate between calls
that occur during NFS client initialization and those that happen as a
side-effect of callers from other contexts (in a later patch this
manifests as nfs_localio_disable_client vs nfs_local_disable).
Hence my adding secondary tracepoints for nfs_common (see "[PATCH
17/19] nfs_common: add nfs_localio trace events").
But sure, we can just eliminate nfs_local_{enable,disable} and the
corresponding tracepoints (which will have moved down to nfs_common).
> > /*
> > @@ -136,12 +134,8 @@ static void nfs_local_enable(struct nfs_client *clp)
> > */
> > void nfs_local_disable(struct nfs_client *clp)
> > {
> > - spin_lock(&clp->cl_localio_lock);
> > - if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> > - trace_nfs_local_disable(clp);
> > - nfs_localio_disable_client(&clp->cl_uuid);
> > - }
> > - spin_unlock(&clp->cl_localio_lock);
> > + trace_nfs_local_disable(clp);
> > + nfs_localio_disable_client(clp);
> > }
>
> Ditto. Though there are more callers so the tracepoint solution isn't
> quite so obvious.
Right... as I just explained: that's why I preserved nfs_local_disable
(and the tracepoint).
> > /*
> > @@ -183,8 +177,12 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp)
> > rpc_shutdown_client(rpcclient_localio);
> >
> > /* Server is only local if it initialized required struct members */
> > - if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
> > + rcu_read_lock();
> > + if (status || !rcu_access_pointer(clp->cl_uuid.net) || !clp->cl_uuid.dom) {
> > + rcu_read_unlock();
> > return false;
> > + }
> > + rcu_read_unlock();
>
> What value does RCU provide here? I don't think this change is needed.
> rcu_access_pointer does not require rcu_read_lock().
OK, not sure why I thought the RCU read-side lock was needed for rcu_access_pointer()...
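To illustrate why no read-side critical section is needed here, a userspace analogy using C11 atomics (not the kernel RCU API): when the pointer is only compared against NULL and never dereferenced, a single relaxed load, roughly what rcu_access_pointer() provides, is sufficient. The demo_* types are hypothetical stand-ins for nfs_uuid_t, struct net and struct auth_domain.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_net { int id; };		/* stands in for struct net */

/* Hypothetical stand-in for the relevant nfs_uuid_t members. */
struct demo_uuid {
	_Atomic(struct demo_net *) net;	/* like `struct net __rcu *net` */
	void *dom;			/* like `struct auth_domain *dom` */
};

/*
 * Mirrors the check in nfs_server_uuid_is_local(): ->net is only
 * compared against NULL, never dereferenced, so one relaxed load
 * (the analogue of rcu_access_pointer()/READ_ONCE()) is enough and
 * no read-side critical section is required.
 */
bool demo_server_is_local(struct demo_uuid *u, int status)
{
	if (status || !atomic_load_explicit(&u->net, memory_order_relaxed) ||
	    !u->dom)
		return false;
	return true;
}
```

Dereferencing ->net would be a different matter: that would need rcu_dereference() inside rcu_read_lock()/rcu_read_unlock().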
> > return true;
> > }
> > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > index 904439e4bb85..cf2f47ea4f8d 100644
> > --- a/fs/nfs_common/nfslocalio.c
> > +++ b/fs/nfs_common/nfslocalio.c
> > @@ -7,6 +7,9 @@
> > #include <linux/module.h>
> > #include <linux/list.h>
> > #include <linux/nfslocalio.h>
> > +#include <linux/nfs3.h>
> > +#include <linux/nfs4.h>
> > +#include <linux/nfs_fs_sb.h>
>
> I don't feel good about adding this nfs client knowledge in to nfs_common.
I hear you. I was "OK with it".
> > #include <net/netns/generic.h>
> >
> > MODULE_LICENSE("GPL");
> > @@ -25,6 +28,7 @@ void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
> > nfs_uuid->net = NULL;
> > nfs_uuid->dom = NULL;
> > INIT_LIST_HEAD(&nfs_uuid->list);
> > + spin_lock_init(&nfs_uuid->lock);
> > }
> > EXPORT_SYMBOL_GPL(nfs_uuid_init);
> >
> > @@ -94,12 +98,23 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
> > }
> > EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> >
> > +void nfs_localio_enable_client(struct nfs_client *clp)
> > +{
> > + nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> > +
> > + spin_lock(&nfs_uuid->lock);
> > + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> > + spin_unlock(&nfs_uuid->lock);
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
>
> And I don't feel good about nfs_local accessing nfs_client directly.
> It only uses it for NFS_CS_LOCAL_IO. Can we ditch that flag and instead
> do something like testing nfs_uuid.net?
That'd probably be OK for the equivalent of NFS_CS_LOCAL_IO, but the last
patch in this series ("nfs: probe for LOCALIO when v3 client
reconnects to server") adds NFS_CS_LOCAL_IO_CAPABLE to provide a hint
that the client and server successfully established themselves as local
via the LOCALIO protocol. This is needed so that NFSv3 (stateless) has a
hint that re-establishing LOCALIO is needed if/when the client loses
connectivity to the server (because it was shut down and restarted).
We could introduce flags local to the nfs_uuid_t structure so that
nfs_common/nfslocalio.c doesn't need to know the internals of nfs_client
at all. Conversely, it is probably best not to bloat nfs_uuid_t (and
nfs_client) further, so should we just ensure nfs_client is treated as an
opaque pointer in nfs_common by introducing accessors?
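The accessor idea could look roughly like the following minimal sketch: the common layer only sees a forward-declared client type and goes through a small ops table registered by the client module. All demo_* names are hypothetical, nothing here is existing kernel API, and everything sits in one file, so the opaque boundary holds only by convention.

```c
#include <stdbool.h>

/* Shared header: the common layer only sees an opaque type. */
struct demo_client;

/* Hypothetical accessor table supplied by the client module. */
struct demo_client_ops {
	bool (*is_local)(struct demo_client *clp);
	void (*set_local)(struct demo_client *clp, bool on);
};

/* Common layer (stands in for fs/nfs_common/nfslocalio.c). */
static const struct demo_client_ops *demo_ops;

void demo_register_client_ops(const struct demo_client_ops *ops)
{
	demo_ops = ops;
}

/* Touches the client only through the accessors, never its layout. */
void demo_localio_disable_client(struct demo_client *clp)
{
	if (demo_ops && demo_ops->is_local(clp))
		demo_ops->set_local(clp, false);
}

/* Client module (stands in for fs/nfs): owns the real layout. */
#define DEMO_CS_LOCAL_IO 0x1UL

struct demo_client {
	unsigned long flags;
};

static bool demo_is_local(struct demo_client *clp)
{
	return clp->flags & DEMO_CS_LOCAL_IO;
}

static void demo_set_local(struct demo_client *clp, bool on)
{
	if (on)
		clp->flags |= DEMO_CS_LOCAL_IO;
	else
		clp->flags &= ~DEMO_CS_LOCAL_IO;
}

const struct demo_client_ops demo_nfs_client_ops = {
	.is_local = demo_is_local,
	.set_local = demo_set_local,
};
```

The trade-off is one indirect call per state change in exchange for nfs_common no longer including nfs_fs_sb.h.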
> > +
> > static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
> > {
> > - if (nfs_uuid->net) {
> > - module_put(nfsd_mod);
> > - nfs_uuid->net = NULL;
> > - }
> > + if (!nfs_uuid->net)
> > + return;
> > + module_put(nfsd_mod);
> > + rcu_assign_pointer(nfs_uuid->net, NULL);
> > +
>
> I much prefer RCU_INIT_POINTER for assigning NULL as there is no need
> for ordering here.
OK.
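The distinction can be sketched in userspace with C11 atomics as an analogy (this is not how the kernel implements RCU): publishing a non-NULL pointer needs release ordering so that readers see the pointed-to fields initialized, while storing NULL publishes no data, so a plain store, as with RCU_INIT_POINTER(), is enough. The demo_* names are hypothetical, and the reader uses an acquire load as a conservative stand-in for rcu_dereference() under rcu_read_lock().

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_net { int id; };		/* stands in for struct net */

static _Atomic(struct demo_net *) demo_net_ptr;

/* Like rcu_assign_pointer(): a release store, so a reader that sees the
 * new pointer also sees the fields initialized before it. */
void demo_publish(struct demo_net *n, int id)
{
	n->id = id;
	atomic_store_explicit(&demo_net_ptr, n, memory_order_release);
}

/* Like RCU_INIT_POINTER(p, NULL): NULL points at nothing a reader could
 * see half-initialized, so no ordering is needed. */
void demo_clear(void)
{
	atomic_store_explicit(&demo_net_ptr, NULL, memory_order_relaxed);
}

/* Reader: acquire load pairs with the release in demo_publish(). */
int demo_read_id(void)
{
	struct demo_net *n =
		atomic_load_explicit(&demo_net_ptr, memory_order_acquire);

	return n ? n->id : -1;
}
```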
> > if (nfs_uuid->dom) {
> > auth_domain_put(nfs_uuid->dom);
> > nfs_uuid->dom = NULL;
> > @@ -107,27 +122,35 @@ static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
> > list_del_init(&nfs_uuid->list);
> > }
> >
> > -void nfs_localio_invalidate_clients(struct list_head *list)
> > +void nfs_localio_disable_client(struct nfs_client *clp)
> > +{
> > + nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> > +
> > + spin_lock(&nfs_uuid->lock);
> > + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> > + spin_lock(&nfs_uuid_lock);
> > + nfs_uuid_put_locked(nfs_uuid);
> > + spin_unlock(&nfs_uuid_lock);
> > + }
> > + spin_unlock(&nfs_uuid->lock);
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
> > +
> > +void nfs_localio_invalidate_clients(struct list_head *cl_uuid_list)
> > {
> > nfs_uuid_t *nfs_uuid, *tmp;
> >
> > spin_lock(&nfs_uuid_lock);
> > - list_for_each_entry_safe(nfs_uuid, tmp, list, list)
> > - nfs_uuid_put_locked(nfs_uuid);
> > + list_for_each_entry_safe(nfs_uuid, tmp, cl_uuid_list, list) {
> > + struct nfs_client *clp =
> > + container_of(nfs_uuid, struct nfs_client, cl_uuid);
> > +
> > + nfs_localio_disable_client(clp);
> > + }
> > spin_unlock(&nfs_uuid_lock);
> > }
> > EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
> >
> > -void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid)
> > -{
> > - if (nfs_uuid->net) {
> > - spin_lock(&nfs_uuid_lock);
> > - nfs_uuid_put_locked(nfs_uuid);
> > - spin_unlock(&nfs_uuid_lock);
> > - }
> > -}
> > -EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
> > -
> > struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> > struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index b804346a9741..239d86ef166c 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -132,7 +132,6 @@ struct nfs_client {
> > struct timespec64 cl_nfssvc_boot;
> > seqlock_t cl_boot_lock;
> > nfs_uuid_t cl_uuid;
> > - spinlock_t cl_localio_lock;
> > #endif /* CONFIG_NFS_LOCALIO */
> > };
> >
> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > index a05d1043f2b0..4d5583873f41 100644
> > --- a/include/linux/nfslocalio.h
> > +++ b/include/linux/nfslocalio.h
> > @@ -6,6 +6,7 @@
> > #ifndef __LINUX_NFSLOCALIO_H
> > #define __LINUX_NFSLOCALIO_H
> >
> > +
> > /* nfsd_file structure is purposely kept opaque to NFS client */
> > struct nfsd_file;
> >
> > @@ -19,6 +20,8 @@ struct nfsd_file;
> > #include <linux/nfs.h>
> > #include <net/net_namespace.h>
> >
> > +struct nfs_client;
> > +
> > /*
> > * Useful to allow a client to negotiate if localio
> > * possible with its server.
> > @@ -27,6 +30,8 @@ struct nfsd_file;
> > */
> > typedef struct {
> > uuid_t uuid;
> > + /* sadly this struct is just over a cacheline, avoid bouncing */
> > + spinlock_t ____cacheline_aligned lock;
> > struct list_head list;
> > struct net __rcu *net; /* nfsd's network namespace */
> > struct auth_domain *dom; /* auth_domain for localio */
> > @@ -38,7 +43,8 @@ void nfs_uuid_end(nfs_uuid_t *);
> > void nfs_uuid_is_local(const uuid_t *, struct list_head *,
> > struct net *, struct auth_domain *, struct module *);
> >
> > -void nfs_localio_disable_client(nfs_uuid_t *nfs_uuid);
> > +void nfs_localio_enable_client(struct nfs_client *clp);
> > +void nfs_localio_disable_client(struct nfs_client *clp);
> > void nfs_localio_invalidate_clients(struct list_head *list);
> >
> > /* localio needs to map filehandle -> struct nfsd_file */
> > --
> > 2.44.0
> >
> >
>
> I think this is a good refactoring to do, but I don't like some of the
> details, or some of the RCU code.
Sure, I'll clean it up further.
Thanks for your review.
Mike
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-11 15:33 ` Mike Snitzer
@ 2024-11-11 20:35 ` NeilBrown
2024-11-11 22:27 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-11 20:35 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Tue, 12 Nov 2024, Mike Snitzer wrote:
> On Mon, Nov 11, 2024 at 12:55:24PM +1100, NeilBrown wrote:
> > On Sat, 09 Nov 2024, Mike Snitzer wrote:
> > > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > > index cab2a8819259..4c75ffc5efa2 100644
> > > --- a/fs/nfs/localio.c
> > > +++ b/fs/nfs/localio.c
> > > @@ -125,10 +125,8 @@ const struct rpc_program nfslocalio_program = {
> > > */
> > > static void nfs_local_enable(struct nfs_client *clp)
> > > {
> > > - spin_lock(&clp->cl_localio_lock);
> > > - set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> > > trace_nfs_local_enable(clp);
> > > - spin_unlock(&clp->cl_localio_lock);
> > > + nfs_localio_enable_client(clp);
> > > }
> >
> > Why do we need this function? The one caller could call
> > nfs_localio_enable_client() directly instead. The tracepoint could be
> > placed in that one caller.
>
> Yeah, I saw that too but felt it useful to differentiate between calls
> that occur during NFS client initialization and those that happen as a
> side-effect of callers from other contexts (in a later patch this
> manifests as nfs_localio_disable_client vs nfs_local_disable).
>
> Hence my adding secondary tracepoints for nfs_common (see "[PATCH
> 17/19] nfs_common: add nfs_localio trace events").
>
> But sure, we can just eliminate nfs_local_{enable,disable} and the
> corresponding tracepoints (which will have moved down to nfs_common).
I don't feel strongly about this. If you think there is value in these
wrapper functions then I won't argue. As a general rule I don't like
multiple interfaces that do (much) the same thing as keeping track of
them increases the mental load.
>
> > And I don't feel good about nfs_local accessing nfs_client directly.
> > It only uses it for NFS_CS_LOCAL_IO. Can we ditch that flag and instead
> > do something like testing nfs_uuid.net?
>
> That'd probably be OK for the equivalent of NFS_CS_LOCAL_IO, but the last
> patch in this series ("nfs: probe for LOCALIO when v3 client
> reconnects to server") adds NFS_CS_LOCAL_IO_CAPABLE to provide a hint
> that the client and server successfully established themselves as local
> via the LOCALIO protocol. This is needed so that NFSv3 (stateless) has a
> hint that re-establishing LOCALIO is needed if/when the client loses
> connectivity to the server (because it was shut down and restarted).
I don't like NFS_CS_LOCAL_IO_CAPABLE.
A use case that I imagine (and a customer does something like this) is an
HA cluster where the NFS server can move from one node to another. All
the nodes access the filesystem, most of them over NFS. If a server migration
happens (e.g. the current server node failed) then the new server node
would suddenly become LOCALIO-capable even though it wasn't at
mount-time. I would like it to be able to detect this and start doing
localio.
So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
network connection is re-established is sufficient.
Thanks,
NeilBrown
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-11 20:35 ` NeilBrown
@ 2024-11-11 22:27 ` Mike Snitzer
2024-11-11 23:23 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-11 22:27 UTC
To: NeilBrown
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > On Mon, Nov 11, 2024 at 12:55:24PM +1100, NeilBrown wrote:
> > > And I don't feel good about nfs_local accessing nfs_client directly.
> > > It only uses it for NFS_CS_LOCAL_IO. Can we ditch that flag and instead
> > > do something like testing nfs_uuid.net?
> >
> > That'd probably be OK for the equivalent of NFS_CS_LOCAL_IO, but the last
> > patch in this series ("nfs: probe for LOCALIO when v3 client
> > reconnects to server") adds NFS_CS_LOCAL_IO_CAPABLE to provide a hint
> > that the client and server successfully established themselves as local
> > via the LOCALIO protocol. This is needed so that NFSv3 (stateless) has a
> > hint that re-establishing LOCALIO is needed if/when the client loses
> > connectivity to the server (because it was shut down and restarted).
>
> I don't like NFS_CS_LOCAL_IO_CAPABLE.
> A use case that I imagine (and a customer does something like this) is an
> HA cluster where the NFS server can move from one node to another. All
> the nodes access the filesystem, most of them over NFS. If a server migration
> happens (e.g. the current server node failed) then the new server node
> would suddenly become LOCALIO-capable even though it wasn't at
> mount-time. I would like it to be able to detect this and start doing
> localio.
Server migration that leaves the client local to the new
server? So the client migrates too?
If the client migrates then it will negotiate with the server using the
LOCALIO protocol.
Anyway, this HA hypothetical feels contrived. It is fine that you
dislike NFS_CS_LOCAL_IO_CAPABLE, but I don't understand what you'd like
as an alternative, or why the simplicity of my approach is lacking.
> So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
> network connection is re-established is sufficient.
Eh, that type of tracking doesn't really buy me anything if I've lost
context (that LOCALIO was previously established and should be
re-established).
NFS v3 is stateless, hence my hooking off the read and write paths to
trigger nfs_local_probe_async(). Unlike with NFS v4 and its grace period,
more care is needed to avoid needless calls to nfs_local_probe_async().
Your previous email about just tracking network connection changes was
an optimization for avoiding repeated (pointless) probes. We still
need to know to do the probe to begin with. Are you saying you want
to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
My approach works. I'm not following why what you are saying would be better.
Thanks,
Mike
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-11 22:27 ` Mike Snitzer
@ 2024-11-11 23:23 ` NeilBrown
2024-11-12 0:16 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-11 23:23 UTC
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Tue, 12 Nov 2024, Mike Snitzer wrote:
> On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> >
> > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > A use case that I imagine (and a customer does something like this) is an
> > HA cluster where the NFS server can move from one node to another. All
> > the nodes access the filesystem, most of them over NFS. If a server migration
> > happens (e.g. the current server node failed) then the new server node
> > would suddenly become LOCALIO-capable even though it wasn't at
> > mount-time. I would like it to be able to detect this and start doing
> > localio.
>
> Server migration that leaves the client local to the new
> server? So the client migrates too?
No. Client doesn't migrate. Server migrates and appears on the same
host as the client. The client can suddenly get better performance. It
should benefit from that.
>
> If the client migrates then it will negotiate with the server using the
> LOCALIO protocol.
>
> Anyway, this HA hypothetical feels contrived. It is fine that you
> dislike NFS_CS_LOCAL_IO_CAPABLE, but I don't understand what you'd like
> as an alternative, or why the simplicity of my approach is lacking.
We have customers with exactly this HA config. This is why I put work
into making sure loop-back NFS (client and server on the same node) works
cleanly without memory allocation deadlocks.
https://lwn.net/Articles/595652/
Getting localio in that config would be even better.
Your approach assumes that if LOCALIO isn't detected at mount time, it
will never be available. I think that is a flawed assumption.
>
> > So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
> > network connection is re-established is sufficient.
>
> Eh, that type of tracking doesn't really buy me anything if I've lost
> context (that LOCALIO was previously established and should be
> re-established).
>
> NFS v3 is stateless, hence my hooking off read and write paths to
> trigger nfs_local_probe_async(). Unlike NFS v4, with its grace, more
> care is needed to avoid needless calls to nfs_local_probe_async().
I think it makes perfect sense to trigger the probe on a successful
read/write with some rate limiting to avoid sending a LOCALIO probe on
EVERY read/write. Your rate-limiting for NFSv3 is:
- never probe if the mount-time probe was not successful
- otherwise probe once every 256 IO requests.
I think the first is too restrictive, and the second is unnecessarily
frequent.
I propose:
- probe once each time the client reconnects with the server
This will result in many fewer probes in practice, but any successful
probe will happen at nearly the earliest possible moment.
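To make the trade-off concrete, here is a small userspace model (not kernel code) that counts LOCALIO probes for an NFSv3 client under the two policies above. The `io_event` type and both function names are hypothetical, and a reconnect is modeled simply as a change of a connection id, standing in for the transport's connect cookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per READ/WRITE; conn_id changes whenever the client reconnects. */
struct io_event {
	unsigned int conn_id;
};

/* Current v3 policy (modeled): no probes at all if the mount-time probe
 * failed, otherwise one probe per 256 I/Os. */
static size_t probes_every_256(size_t n_ios, bool mount_probe_ok)
{
	return mount_probe_ok ? n_ios / 256 : 0;
}

/* Proposed policy (modeled): one probe on the first I/O after each
 * reconnect. mount_conn_id is the connection the mount-time probe used. */
static size_t probes_per_reconnect(const struct io_event *trace, size_t n,
				   unsigned int mount_conn_id)
{
	unsigned int last_probed = mount_conn_id;
	size_t probes = 0;

	assert(trace != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (trace[i].conn_id != last_probed) {
			last_probed = trace[i].conn_id;
			probes++;
		}
	}
	return probes;
}
```

With 1000 I/Os and one reconnect halfway through, the first policy sends 3 probes (or none at all if the mount-time probe failed, so a server that migrates onto the client's host is never noticed), while the second sends exactly one, right after the reconnect.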
>
> Your previous email about just tracking network connection change was
> an optimization for avoiding repeat (pointless) probes. We still
> need to know to do the probe to begin with. Are you saying you want
> to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
You don't "know to do the probe" at mount time. You simply always do
it. Similarly whenever localio isn't active it is always appropriate to
probe - with rate limiting.
And NFSv3 already has a grace period - in the NLM/STAT protocols. We
could use STAT to detect when the server has restarted and so it is worth
probing again. But STAT is not as reliable as we might like and it
would be more complexity with no real gain.
I would be happy to use exactly the same mechanism for both v3 and v4:
send a probe after IO on a new connection. But your solution for v4 is
simple and elegant so I'm not at all against it.
>
> My approach works. Not following what you are saying will be better.
- server-migration can benefit from localio on the new host
- many fewer probes
- probes are much more timely.
NeilBrown
>
> Thanks,
> Mike
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-11 23:23 ` NeilBrown
@ 2024-11-12 0:16 ` Mike Snitzer
2024-11-12 0:49 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-11-12 0:16 UTC (permalink / raw)
To: NeilBrown
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > >
> > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > A use case that I imagine (and a customer does something like this) is an
> > > HA cluster where the NFS server can move from one node to another. All
> > > the node access the filesystem, most over NFS. If a server-migration
> > > happens (e.g. the current server node failed) then the new server node
> > > would suddenly become LOCALIO-capable even though it wasn't at
> > > mount-time. I would like it to be able to detect this and start doing
> > > localio.
> >
> > Server migration while retaining the client being local to the new
> > server? So client migrates too?
>
> No. Client doesn't migrate. Server migrates and appears on the same
> host as the client. The client can suddenly get better performance. It
> should benefit from that.
>
> >
> > If the client migrates then it will negotiate with server using
> > LOCALIO protocol.
> >
> > Anyway, this HA hypothetical feels contrived. It is fine that you
> > dislike NFS_CS_LOCAL_IO_CAPABLE but I don't understand what you'd like
> > as an alternative. Or why the simplicity in my approach lacking.
>
> We have customers with exactly this HA config. This is why I put work
> into make sure loop-back NFS (client and server on same node) works
> cleanly without memory allocation deadlocks.
> https://lwn.net/Articles/595652/
> Getting localio in that config would be even better.
>
> Your approach assumes that if LOCALIO isn't detected at mount time, it
> will never been available. I think that is a flawed assumption.
That's fair, I agree your HA scenario is valid. It was terse as
initially presented but I understand now, thanks.
> > > So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
> > > network connection is re-established is sufficient.
> >
> > Eh, that type of tracking doesn't really buy me anything if I've lost
> > context (that LOCALIO was previously established and should be
> > re-established).
> >
> > NFS v3 is stateless, hence my hooking off read and write paths to
> > trigger nfs_local_probe_async(). Unlike NFS v4, with its grace, more
> > care is needed to avoid needless calls to nfs_local_probe_async().
>
> I think it makes perfect sense to trigger the probe on a successful
> read/write with some rate limiting to avoid sending a LOCALIO probe on
> EVERY read/write. Your rate-limiting for NFSv3 is:
> - never probe if the mount-time probe was not successful
> - otherwise probe once every 256 IO requests.
>
> I think the first is too restrictive, and the second is unnecessarily
> frequent.
> I propose:
> - probe once each time the client reconnects with the server
>
> This will result in many fewer probes in practice, but any successful
> probe will happen at nearly the earliest possible moment.
I'm all for what you're proposing (it's what I wanted from the start).
In practice, I just don't quite grok the client-reconnect-awareness
implementation you're saying is at our fingertips.
> > Your previous email about just tracking network connection change was
> > an optimization for avoiding repeat (pointless) probes. We still
> > need to know to do the probe to begin with. Are you saying you want
> > to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
>
> You don't "know to do the probe" at mount time. You simply always do
> it. Similarly whenever localio isn't active it is always appropriate to
> probe - with rate limiting.
>
> And NFSv3 already has a grace period - in the NLM/STAT protocols. We
> could use STAT to detect when the server has restarted and so it is worth
> probing again. But STAT is not as reliable as we might like and it
> would be more complexity with no real gain.
If you have a specific idea for the mechanism we need to create to
detect when the v3 client reconnects to the server, please let me know.
Reusing or augmenting an existing thing is fine by me.
> I would be happy to use exactly the same mechanism for both v3 and v4:
> send a probe after IO on a new connection. But your solution for v4 is
> simple and elegant so I'm not at all against it.
>
> >
> > My approach works. Not following what you are saying will be better.
>
> - server-migration can benefit from localio on the new host
> - many fewer probes
> - probes are much more timely.
Ha, you misunderstood me: I know the benefits. What eludes me is the
construction of the point to probe (reliable v3 client reconnect
awareness). ;)
Mike
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-12 0:16 ` Mike Snitzer
@ 2024-11-12 0:49 ` NeilBrown
2024-11-12 14:36 ` Chuck Lever
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-12 0:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
Jeff Layton
On Tue, 12 Nov 2024, Mike Snitzer wrote:
> On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> > On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > > >
> > > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > > A use case that I imagine (and a customer does something like this) is an
> > > > HA cluster where the NFS server can move from one node to another. All
> > > > the node access the filesystem, most over NFS. If a server-migration
> > > > happens (e.g. the current server node failed) then the new server node
> > > > would suddenly become LOCALIO-capable even though it wasn't at
> > > > mount-time. I would like it to be able to detect this and start doing
> > > > localio.
> > >
> > > Server migration while retaining the client being local to the new
> > > server? So client migrates too?
> >
> > No. Client doesn't migrate. Server migrates and appears on the same
> > host as the client. The client can suddenly get better performance. It
> > should benefit from that.
> >
> > >
> > > If the client migrates then it will negotiate with server using
> > > LOCALIO protocol.
> > >
> > > Anyway, this HA hypothetical feels contrived. It is fine that you
> > > dislike NFS_CS_LOCAL_IO_CAPABLE but I don't understand what you'd like
> > > as an alternative. Or why the simplicity in my approach lacking.
> >
> > We have customers with exactly this HA config. This is why I put work
> > into make sure loop-back NFS (client and server on same node) works
> > cleanly without memory allocation deadlocks.
> > https://lwn.net/Articles/595652/
> > Getting localio in that config would be even better.
> >
> > Your approach assumes that if LOCALIO isn't detected at mount time, it
> > will never been available. I think that is a flawed assumption.
>
> That's fair, I agree your HA scenario is valid. It was terse as
> initially presented but I understand now, thanks.
>
> > > > So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
> > > > network connection is re-established is sufficient.
> > >
> > > Eh, that type of tracking doesn't really buy me anything if I've lost
> > > context (that LOCALIO was previously established and should be
> > > re-established).
> > >
> > > NFS v3 is stateless, hence my hooking off read and write paths to
> > > trigger nfs_local_probe_async(). Unlike NFS v4, with its grace, more
> > > care is needed to avoid needless calls to nfs_local_probe_async().
> >
> > I think it makes perfect sense to trigger the probe on a successful
> > read/write with some rate limiting to avoid sending a LOCALIO probe on
> > EVERY read/write. Your rate-limiting for NFSv3 is:
> > - never probe if the mount-time probe was not successful
> > - otherwise probe once every 256 IO requests.
> >
> > I think the first is too restrictive, and the second is unnecessarily
> > frequent.
> > I propose:
> > - probe once each time the client reconnects with the server
> >
> > This will result in many fewer probes in practice, but any successful
> > probe will happen at nearly the earliest possible moment.
>
> I'm all for what you're proposing (its what I wanted from the start).
> In practice I just don't quite grok the client reconnect awareness
> implementation you're saying is at our finger tips.
>
> > > Your previous email about just tracking network connection change was
> > > an optimization for avoiding repeat (pointless) probes. We still
> > > need to know to do the probe to begin with. Are you saying you want
> > > to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
> >
> > You don't "know to do the probe" at mount time. You simply always do
> > it. Similarly whenever localio isn't active it is always appropriate to
> > probe - with rate limiting.
> >
> > And NFSv3 already has a grace period - in the NLM/STAT protocols. We
> > could use STAT to detect when the server has restarted and so it is worth
> > probing again. But STAT is not as reliable as we might like and it
> > would be more complexity with no real gain.
>
> If you have a specific idea for the mechanism we need to create to
> detect the v3 client reconnects to the server please let me know.
> Reusing or augmenting an existing thing is fine by me.
static void nfs3_local_probe(struct nfs_server *server)
{
	struct nfs_client *clp = server->nfs_client;
	nfs_uuid_t *nfs_uuid = &clp->cl_uuid;

	/* Only probe again if the transport reconnected since the last probe. */
	if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
		nfs_local_probe_async(clp);
}

static void nfs_local_probe_async_work(struct work_struct *work)
{
	struct nfs_client *clp = container_of(work, struct nfs_client,
					      cl_local_probe_work);

	/* Remember which connection this probe was issued on. */
	clp->cl_uuid.connect_cookie =
		clp->cl_rpcclient->cl_xprt->connect_cookie;
	nfs_local_probe(clp);
}
Or maybe assign connect_cookie (which we would have to add to nfs_uuid_t)
inside nfs_local_probe().
Note, though, that you need rcu_dereference() to access cl_xprt, with
rcu_read_lock() protection around that.
(cl_xprt can change when the NFS client follows a "location" reported by
the server to handle migration or mirroring. Conceivably we should
check for either cl_xprt changing or cl_xprt->connect_cookie changing,
but if that were an issue it would be easier to initialise
->connect_cookie to a random number.)
Note that you don't need local_probe_mutex. A given work_struct
(cl_local_probe_work) never runs concurrently with itself, and if you
queue_work() it while it is already pending, queue_work() does nothing.
You'll want to call INIT_WORK() only once, not on every
nfs_local_probe_async() call.
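The point about local_probe_mutex rests on how the workqueue deduplicates queueing: a work item carries a PENDING bit that queue_work() sets atomically, and a second queue_work() while that bit is set returns false without queueing anything; the worker clears the bit just before running the function. The userspace model below mimics only that bit with C11 atomics. `fake_work`, `fake_init_work()`, `fake_queue_work()` and `fake_run_work()` are hypothetical names, and the model does not attempt to show the workqueue's separate guarantee that one work item never executes on two workers at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct work_struct: only the PENDING bit. */
struct fake_work {
	atomic_bool pending;
	int runs;		/* how many times the work function ran */
};

/* Like INIT_WORK(): call once, when the owning object is set up. */
static void fake_init_work(struct fake_work *w)
{
	atomic_init(&w->pending, false);
	w->runs = 0;
}

/* Like queue_work(): returns false if the item was already pending. */
static bool fake_queue_work(struct fake_work *w)
{
	return !atomic_exchange(&w->pending, true);
}

/* Like a worker picking the item up: PENDING is cleared before the body
 * runs, so a queue attempt made during execution schedules one more run. */
static void fake_run_work(struct fake_work *w)
{
	bool was_pending = atomic_exchange(&w->pending, false);

	assert(was_pending);
	w->runs++;		/* the LOCALIO probe would happen here */
}
```

Two back-to-back queue attempts produce a single run, which is why a mutex around the probe is redundant when the probe is only reachable through its work item. Re-running the init step on an item that may already be pending would reset that state behind the workqueue's back, hence initialising it only once.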
NeilBrown
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support
2024-11-08 23:39 ` [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support Mike Snitzer
2024-11-11 1:31 ` NeilBrown
@ 2024-11-12 14:31 ` Chuck Lever
1 sibling, 0 replies; 45+ messages in thread
From: Chuck Lever @ 2024-11-12 14:31 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Jeff Layton,
NeilBrown
On Fri, Nov 08, 2024 at 06:39:50PM -0500, Mike Snitzer wrote:
> This commit simply adds the required O_DIRECT plumbing. It doesn't
> address the fact that NFS doesn't ensure all writes are page aligned
> (nor device logical block size aligned as required by O_DIRECT).
>
> Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
> will not use O_DIRECT semantics by default if/when an application
> requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by:
> 1: Adding a flag to the nfs_pgio_header struct to allow the NFS
> O_DIRECT layer to signal that O_DIRECT was used by the application
> 2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
> when enabled will cause LOCALIO to use O_DIRECT semantics (this may
> cause IO to fail if applications do not properly align their IO).
I'm not clear why the module parameter is necessary. Applications
that use O_DIRECT, I think, assume there /are/ alignment
restrictions. Generally they are constructed to operate agnostically
on both NFS and non-NFS file systems, so I'm not sure any such
applications expect or depend on NFS's looser I/O alignment.
If it turns out to be necessary, the module parameter should be
documented somewhere (maybe under Documentation/).
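For reference, the alignment restriction in question is typically that the user buffer address, the file offset and the transfer length all be suitably aligned; recent Linux kernels report the required values through statx() with STATX_DIOALIGN (stx_dio_mem_align and stx_dio_offset_align). The helper below is a hypothetical userspace sketch of that check: `dio_aligned()` is not an existing API, and the 512-byte alignment used in the example is an assumption, not something the patch specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: would an O_DIRECT request with this buffer, file
 * offset and length satisfy the given alignments? mem_align applies to
 * the buffer address; offset_align applies to both the offset and the
 * length (mirroring statx()'s stx_dio_mem_align / stx_dio_offset_align).
 */
static bool dio_aligned(const void *buf, uint64_t offset, size_t len,
			size_t mem_align, size_t offset_align)
{
	assert(is_pow2(mem_align) && is_pow2(offset_align));

	return ((uintptr_t)buf & (mem_align - 1)) == 0 &&
	       (offset & (offset_align - 1)) == 0 &&
	       (len & (offset_align - 1)) == 0;
}
```

A 1000-byte request, or one starting at offset 100, fails this check; that is the kind of I/O the patch description says may fail once LOCALIO uses O_DIRECT semantics.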
> Adding Direct IO support helps side-step the problem that LOCALIO
> currently double buffers buffered IO (by using page cache in both NFS
> and the underlying filesystem). More care is needed to craft a proper
> solution for LOCALIO's redundant use of page cache for buffered IO,
> e.g.: https://marc.info/?l=linux-nfs&m=171996211625151&w=2
This last paragraph confused me initially. Above, the description
states that this change is to address support for applications using
O_DIRECT. But here, you mention a problem that appears to affect all
users of LOCALIO. I guess this paragraph is aspirational? I'm not
sure I would use direct I/O to address generic double-buffering --
not only does direct I/O have alignment constraints but it also
makes some assumptions about how the application is managing its I/O
buffer.
It would help me if this paragraph was dropped, since (IIUC) it
isn't directly related to the use of O_DIRECT by applications; and
perhaps add an initial paragraph that provides a problem statement.
> This commit is derived from code developed by Weston Andros Adamson.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/direct.c | 1 +
> fs/nfs/localio.c | 92 ++++++++++++++++++++++++++++++++++++-----
> include/linux/nfs_xdr.h | 1 +
> 3 files changed, 84 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 90079ca134dd..4b92493d6ff0 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -303,6 +303,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
> static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
> {
> get_dreq(hdr->dreq);
> + set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
> }
>
> static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 4b8618cf114c..de0dcd76d84d 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -35,6 +35,7 @@ struct nfs_local_kiocb {
> struct bio_vec *bvec;
> struct nfs_pgio_header *hdr;
> struct work_struct work;
> + void (*aio_complete_work)(struct work_struct *);
> struct nfsd_file *localio;
> };
>
> @@ -48,6 +49,10 @@ struct nfs_local_fsync_ctx {
> static bool localio_enabled __read_mostly = true;
> module_param(localio_enabled, bool, 0644);
>
> +static bool localio_O_DIRECT_semantics __read_mostly = false;
> +module_param(localio_O_DIRECT_semantics, bool, 0644);
> +MODULE_PARM_DESC(localio_O_DIRECT_semantics, "Use O_DIRECT semantics");
> +
> static inline bool nfs_client_is_local(const struct nfs_client *clp)
> {
> return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> @@ -285,10 +290,19 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
> kfree(iocb);
> return NULL;
> }
> - init_sync_kiocb(&iocb->kiocb, file);
> +
> + if (localio_O_DIRECT_semantics &&
> + test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
> + iocb->kiocb.ki_filp = file;
> + iocb->kiocb.ki_flags = IOCB_DIRECT;
> + } else
> + init_sync_kiocb(&iocb->kiocb, file);
> +
> iocb->kiocb.ki_pos = hdr->args.offset;
> iocb->hdr = hdr;
> iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> + iocb->aio_complete_work = NULL;
> +
> return iocb;
> }
>
> @@ -343,6 +357,18 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> }
>
> +/*
> + * Complete the I/O from iocb->kiocb.ki_complete()
> + *
> + * Note that this function can be called from a bottom half context,
> + * hence we need to queue the rpc_call_done() etc to a workqueue
> + */
> +static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb)
> +{
> + INIT_WORK(&iocb->work, iocb->aio_complete_work);
> + queue_work(nfsiod_workqueue, &iocb->work);
> +}
> +
> static void
> nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> {
> @@ -365,6 +391,23 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> status > 0 ? status : 0, hdr->res.eof);
> }
>
> +static void nfs_local_read_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(work, struct nfs_local_kiocb, work);
> +
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(kiocb, struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_read_done(iocb, ret);
> + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
> +}
> +
> static void nfs_local_call_read(struct work_struct *work)
> {
> struct nfs_local_kiocb *iocb =
> @@ -379,10 +422,10 @@ static void nfs_local_call_read(struct work_struct *work)
> nfs_local_iter_init(&iter, iocb, READ);
>
> status = filp->f_op->read_iter(&iocb->kiocb, &iter);
> - WARN_ON_ONCE(status == -EIOCBQUEUED);
> -
> - nfs_local_read_done(iocb, status);
> - nfs_local_pgio_release(iocb);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_read_done(iocb, status);
> + nfs_local_pgio_release(iocb);
> + }
>
> revert_creds(save_cred);
> }
> @@ -410,6 +453,11 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
> nfs_local_pgio_init(hdr, call_ops);
> hdr->res.eof = false;
>
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
> + iocb->aio_complete_work = nfs_local_read_aio_complete_work;
> + }
> +
> INIT_WORK(&iocb->work, nfs_local_call_read);
> queue_work(nfslocaliod_workqueue, &iocb->work);
>
> @@ -534,6 +582,24 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> nfs_local_pgio_done(hdr, status);
> }
>
> +static void nfs_local_write_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(work, struct nfs_local_kiocb, work);
> +
> + nfs_local_vfs_getattr(iocb);
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb =
> + container_of(kiocb, struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_write_done(iocb, ret);
> + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
> +}
> +
> static void nfs_local_call_write(struct work_struct *work)
> {
> struct nfs_local_kiocb *iocb =
> @@ -552,11 +618,11 @@ static void nfs_local_call_write(struct work_struct *work)
> file_start_write(filp);
> status = filp->f_op->write_iter(&iocb->kiocb, &iter);
> file_end_write(filp);
> - WARN_ON_ONCE(status == -EIOCBQUEUED);
> -
> - nfs_local_write_done(iocb, status);
> - nfs_local_vfs_getattr(iocb);
> - nfs_local_pgio_release(iocb);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_write_done(iocb, status);
> + nfs_local_vfs_getattr(iocb);
> + nfs_local_pgio_release(iocb);
> + }
>
> revert_creds(save_cred);
> current->flags = old_flags;
> @@ -592,10 +658,16 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
> case NFS_FILE_SYNC:
> iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
> }
> +
> nfs_local_pgio_init(hdr, call_ops);
>
> nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
>
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
> + iocb->aio_complete_work = nfs_local_write_aio_complete_work;
> + }
> +
> INIT_WORK(&iocb->work, nfs_local_call_write);
> queue_work(nfslocaliod_workqueue, &iocb->work);
>
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index e0ae0a14257f..f30e94d105b7 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1632,6 +1632,7 @@ enum {
> NFS_IOHDR_RESEND_PNFS,
> NFS_IOHDR_RESEND_MDS,
> NFS_IOHDR_UNSTABLE_WRITES,
> + NFS_IOHDR_ODIRECT,
> };
>
> struct nfs_io_completion;
> --
> 2.44.0
>
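The behavioural core of the nfs_local_call_read()/nfs_local_call_write() changes is the completion contract of ->read_iter()/->write_iter() for an async kiocb: a return of -EIOCBQUEUED means the I/O was accepted and ->ki_complete() will be called later, so the submitter must neither complete nor release the request; any other value means it finished (or failed) synchronously. The userspace model below illustrates only that ownership hand-off. Every type and function in it is hypothetical, and `MODEL_EIOCBQUEUED` is an arbitrary stand-in for the kernel-internal -EIOCBQUEUED.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EIOCBQUEUED (-1000L)	/* arbitrary stand-in value */

struct model_req {
	long result;
	int completions;	/* must end up exactly 1 */
	bool released;
};

/* Called exactly once per request, by whichever path finished it. */
static void model_complete(struct model_req *req, long status)
{
	assert(req->completions == 0 && !req->released);
	req->result = status;
	req->completions++;
	req->released = true;
}

/* Stand-in for ->ki_complete(): the async path finishes the request. */
static void model_ki_complete(struct model_req *req, long ret)
{
	model_complete(req, ret);
}

/* Submission path: complete here only if the I/O did not go async. */
static void model_submit(struct model_req *req, long iter_status)
{
	if (iter_status != MODEL_EIOCBQUEUED)
		model_complete(req, iter_status);
	/* else: ownership has passed to the async completion callback */
}
```

In the patch, the async callback additionally punts the final release to nfsiod_workqueue because ->ki_complete() can run in bottom-half context; the model ignores that detail.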
--
Chuck Lever
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-12 0:49 ` NeilBrown
@ 2024-11-12 14:36 ` Chuck Lever
2024-11-12 23:13 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Chuck Lever @ 2024-11-12 14:36 UTC (permalink / raw)
To: NeilBrown
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
Jeff Layton
On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> > > On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > > > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > > > >
> > > > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > > > A use case that I imagine (and a customer does something like this) is an
> > > > > HA cluster where the NFS server can move from one node to another. All
> > > > > the node access the filesystem, most over NFS. If a server-migration
> > > > > happens (e.g. the current server node failed) then the new server node
> > > > > would suddenly become LOCALIO-capable even though it wasn't at
> > > > > mount-time. I would like it to be able to detect this and start doing
> > > > > localio.
> > > >
> > > > Server migration while retaining the client being local to the new
> > > > server? So client migrates too?
> > >
> > > No. Client doesn't migrate. Server migrates and appears on the same
> > > host as the client. The client can suddenly get better performance. It
> > > should benefit from that.
> > >
> > > >
> > > > If the client migrates then it will negotiate with server using
> > > > LOCALIO protocol.
> > > >
> > > > Anyway, this HA hypothetical feels contrived. It is fine that you
> > > > dislike NFS_CS_LOCAL_IO_CAPABLE but I don't understand what you'd like
> > > > as an alternative. Or why the simplicity in my approach lacking.
> > >
> > > We have customers with exactly this HA config. This is why I put work
> > > into make sure loop-back NFS (client and server on same node) works
> > > cleanly without memory allocation deadlocks.
> > > https://lwn.net/Articles/595652/
> > > Getting localio in that config would be even better.
> > >
> > > Your approach assumes that if LOCALIO isn't detected at mount time, it
> > > will never been available. I think that is a flawed assumption.
> >
> > That's fair, I agree your HA scenario is valid. It was terse as
> > initially presented but I understand now, thanks.
> >
> > > > > So I don't want NFS_CS_LOCAL_IO_CAPABLE. I think tracking when the
> > > > > network connection is re-established is sufficient.
> > > >
> > > > Eh, that type of tracking doesn't really buy me anything if I've lost
> > > > context (that LOCALIO was previously established and should be
> > > > re-established).
> > > >
> > > > NFS v3 is stateless, hence my hooking off read and write paths to
> > > > trigger nfs_local_probe_async(). Unlike NFS v4, with its grace, more
> > > > care is needed to avoid needless calls to nfs_local_probe_async().
> > >
> > > I think it makes perfect sense to trigger the probe on a successful
> > > read/write with some rate limiting to avoid sending a LOCALIO probe on
> > > EVERY read/write. Your rate-limiting for NFSv3 is:
> > > - never probe if the mount-time probe was not successful
> > > - otherwise probe once every 256 IO requests.
> > >
> > > I think the first is too restrictive, and the second is unnecessarily
> > > frequent.
> > > I propose:
> > > - probe once each time the client reconnects with the server
> > >
> > > This will result in many fewer probes in practice, but any successful
> > > probe will happen at nearly the earliest possible moment.
> >
> > I'm all for what you're proposing (its what I wanted from the start).
> > In practice I just don't quite grok the client reconnect awareness
> > implementation you're saying is at our finger tips.
> >
> > > > Your previous email about just tracking network connection change was
> > > > an optimization for avoiding repeat (pointless) probes. We still
> > > > need to know to do the probe to begin with. Are you saying you want
> > > > to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
> > >
> > > You don't "know to do the probe" at mount time. You simply always do
> > > it. Similarly whenever localio isn't active it is always appropriate to
> > > probe - with rate limiting.
> > >
> > > And NFSv3 already has a grace period - in the NLM/STAT protocols. We
> > > could use STAT to detect when the server has restarted and so it is worth
> > > probing again. But STAT is not as reliable as we might like and it
> > > would be more complexity with no real gain.
> >
> > If you have a specific idea for the mechanism we need to create to
> > detect the v3 client reconnects to the server please let me know.
> > Reusing or augmenting an existing thing is fine by me.
>
> nfs3_local_probe(struct nfs_server *server)
> {
> struct nfs_client *clp = server->nfs_client;
> nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
>
> if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
> nfs_local_probe_async()
> }
>
> static void nfs_local_probe_async_work(struct work_struct *work)
> {
> struct nfs_client *clp = container_of(work, struct nfs_client,
> cl_local_probe_work);
> clp->cl_uuid.connect_cookie =
> clp->cl_rpcclient->cl_xprt->connect_cookie;
> nfs_local_probe(clp);
> }
>
> Or maybe assign connect_cookie (which we have to add to uuid) inside
> nfs_local_probe().
The problem with per-connection checks is that a change in export
security policy could disable LOCALIO rather persistently. The only
way to recover, if checking is done only when a connection is
established, is to remount or force a disconnect.
> Though you need rcu_dereference_pointer() to access cl_xprt and
> rcu_read_lock() protection around that.
> (cl_xprt can change when the NFS client follows a "location" reported by
> the server to handle migration or mirroring. Conceivably we should
> check for either cl_xprt changing or cl_xprt->connect_cookie changing,
> but if that were an issue it would be easier to initialise
> ->connect_cookie to a random number)
>
> Note that you don't need local_probe_mutex. A given work_struct
> (cl_local_probe_work) can only be running once. If you try to
> queue_work() it while it is running, queue_work() will do nothing.
>
> You'll want to only INIT_WORK() once - not on every
> nfs_local_probe_async() call.
>
> NeilBrown
--
Chuck Lever
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-12 14:36 ` Chuck Lever
@ 2024-11-12 23:13 ` NeilBrown
2024-11-13 0:07 ` Chuck Lever III
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-11-12 23:13 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
Jeff Layton
On Wed, 13 Nov 2024, Chuck Lever wrote:
> On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
> > >
> > > If you have a specific idea for the mechanism we need to create to
> > > detect the v3 client reconnects to the server please let me know.
> > > Reusing or augmenting an existing thing is fine by me.
> >
> > nfs3_local_probe(struct nfs_server *server)
> > {
> > struct nfs_client *clp = server->nfs_client;
> > nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> >
> > if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
> > nfs_local_probe_async()
> > }
> >
> > static void nfs_local_probe_async_work(struct work_struct *work)
> > {
> > struct nfs_client *clp = container_of(work, struct nfs_client,
> > cl_local_probe_work);
> > clp->cl_uuid.connect_cookie =
> > clp->cl_rpcclient->cl_xprt->connect_cookie;
> > nfs_local_probe(clp);
> > }
> >
> > Or maybe assign connect_cookie (which we have to add to uuid) inside
> > nfs_local_probe().
>
> The problem with per-connection checks is that a change in export
> security policy could disable LOCALIO rather persistently. The only
> way to recover, if checking is done only when a connection is
> established, is to remount or force a disconnect.
>
What export security policy, specifically?
Do you mean changing from sec=sys to sec=krb5i, for example? This
would (hopefully) disable localio. Then changing the export back to
sec=sys would mean that localio would be possible again. I wonder how
the client copes with this. Does it work on a live mount without
remount? If so, it would certainly make sense for the current security
setting to be cached in nfs_uuid and for a probe to be attempted
whenever that changed to sec=sys.
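This suggestion composes with the connect-cookie check from earlier in the thread: remember both the connect cookie and the security flavor in effect at the last probe, and probe again when either changes while the flavor is sec=sys. The sketch below is a userspace model of that decision only. The struct, its fields and the flavor enum are hypothetical (nfs_uuid_t has no such fields today).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_flavor { MODEL_SEC_SYS, MODEL_SEC_KRB5I };

/* Hypothetical state cached when the LOCALIO probe decision was last made. */
struct probe_state {
	uint32_t connect_cookie;
	enum model_flavor flavor;
	bool valid;		/* false until the first call */
};

/*
 * Decide whether a probe is worthwhile for the current connection and
 * flavor. Always records the values it was called with, so a later
 * switch back to sec=sys is seen as a change.
 */
static bool should_probe(struct probe_state *st, uint32_t cookie,
			 enum model_flavor flavor)
{
	bool changed = !st->valid || st->connect_cookie != cookie ||
		       st->flavor != flavor;

	st->connect_cookie = cookie;
	st->flavor = flavor;
	st->valid = true;

	/* Only sec=sys is considered LOCALIO-eligible in this model. */
	return changed && flavor == MODEL_SEC_SYS;
}
```

A change that leaves both the connection and the flavor untouched is invisible to this check by construction.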
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-12 23:13 ` NeilBrown
@ 2024-11-13 0:07 ` Chuck Lever III
2024-11-13 0:32 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Chuck Lever III @ 2024-11-13 0:07 UTC (permalink / raw)
To: Neil Brown
Cc: Mike Snitzer, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Jeff Layton
> On Nov 12, 2024, at 6:13 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 13 Nov 2024, Chuck Lever wrote:
>> On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
>>>>
>>>> If you have a specific idea for the mechanism we need to create to
>>>> detect the v3 client reconnects to the server please let me know.
>>>> Reusing or augmenting an existing thing is fine by me.
>>>
>>> static void nfs3_local_probe(struct nfs_server *server)
>>> {
>>> struct nfs_client *clp = server->nfs_client;
>>> nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
>>>
>>> if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
>>> nfs_local_probe_async(clp);
>>> }
>>>
>>> static void nfs_local_probe_async_work(struct work_struct *work)
>>> {
>>> struct nfs_client *clp = container_of(work, struct nfs_client,
>>> cl_local_probe_work);
>>> clp->cl_uuid.connect_cookie =
>>> clp->cl_rpcclient->cl_xprt->connect_cookie;
>>> nfs_local_probe(clp);
>>> }
>>>
>>> Or maybe assign connect_cookie (which we have to add to uuid) inside
>>> nfs_local_probe().
>>
>> The problem with per-connection checks is that a change in export
>> security policy could disable LOCALIO rather persistently. The only
>> way to recover, if checking is done only when a connection is
>> established, is to remount or force a disconnect.
>>
> What export security policy specifically?
> Do you mean changing from sec=sys to sec=krb5i for example?
Another example might be altering the IP address list on
the export. Suppose the client is accidentally blocked
by this policy, the administrator realizes it, and changes
it again to restore access.
The client does not disconnect in this case, AFAIK.
> This
> would (hopefully) disable localio. Then changing the export back to
> sec=sys would mean that localio would be possible again. I wonder how
> the client copes with this. Does it work on a live mount without
> remount? If so it would certainly make sense for the current security
> setting to be cached in nfs_uuid and for a probe to be attempted
> whenever that changed to sec=sys.
>
> Thanks,
> NeilBrown
--
Chuck Lever
* Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t
2024-11-13 0:07 ` Chuck Lever III
@ 2024-11-13 0:32 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-11-13 0:32 UTC (permalink / raw)
To: Chuck Lever III
Cc: Mike Snitzer, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Jeff Layton
On Wed, 13 Nov 2024, Chuck Lever III wrote:
>
>
> > On Nov 12, 2024, at 6:13 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Wed, 13 Nov 2024, Chuck Lever wrote:
> >> On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
> >>>>
> >>>> If you have a specific idea for the mechanism we need to create to
> >>>> detect the v3 client reconnects to the server please let me know.
> >>>> Reusing or augmenting an existing thing is fine by me.
> >>>
> >>> static void nfs3_local_probe(struct nfs_server *server)
> >>> {
> >>> struct nfs_client *clp = server->nfs_client;
> >>> nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> >>>
> >>> if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
> >>> nfs_local_probe_async(clp);
> >>> }
> >>>
> >>> static void nfs_local_probe_async_work(struct work_struct *work)
> >>> {
> >>> struct nfs_client *clp = container_of(work, struct nfs_client,
> >>> cl_local_probe_work);
> >>> clp->cl_uuid.connect_cookie =
> >>> clp->cl_rpcclient->cl_xprt->connect_cookie;
> >>> nfs_local_probe(clp);
> >>> }
> >>>
> >>> Or maybe assign connect_cookie (which we have to add to uuid) inside
> >>> nfs_local_probe().
> >>
> >> The problem with per-connection checks is that a change in export
> >> security policy could disable LOCALIO rather persistently. The only
> >> way to recover, if checking is done only when a connection is
> >> established, is to remount or force a disconnect.
> >>
> > What export security policy specifically?
> > Do you mean changing from sec=sys to sec=krb5i for example?
>
> Another example might be altering the IP address list on
> the export. Suppose the client is accidentally blocked
> by this policy, the administrator realizes it, and changes
> it again to restore access.
>
> The client does not disconnect in this case, AFAIK.
Yes, that is a simpler case...
How would the localio path get disabled when this happens?
I suspect ->nfsd_open_local_fh would (should?) fail.
It, or nfs_open_local_fh() which calls it, could reset
uuid->connect_cookie to an impossible value so as to force a
probe after the next successful IO. That would be an important part of
the protocol.
Thanks,
NeilBrown
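The invalidation step described above might look like the following userspace model. A failed local open poisons the cached cookie, so the next successful I/O over the network triggers a reprobe. `COOKIE_INVALID`, `open_local()` and `maybe_probe_after_io()` are hypothetical names. The cookie is assumed here to be an unsigned counter. Because such a counter could in principle wrap, treating `UINT_MAX` as "impossible" is also an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical sentinel. Assumes the transport never reports UINT_MAX as
 * a connect cookie; a real counter could wrap, so this is only a sketch.
 */
#define COOKIE_INVALID UINT_MAX

struct uuid_state {
	unsigned int connect_cookie;	/* cookie seen at the last probe */
	bool local;			/* LOCALIO currently in use */
};

/* Model of a local open; fails when export policy rejects the client. */
static int open_local(struct uuid_state *u, bool server_allows)
{
	if (!server_allows) {
		u->local = false;
		/* Poison the cache so the next cookie comparison mismatches. */
		u->connect_cookie = COOKIE_INVALID;
		return -EACCES;
	}
	return 0;
}

/* Called after a successful network I/O: reprobe if the cookie is stale. */
static bool maybe_probe_after_io(struct uuid_state *u, unsigned int xprt_cookie,
				 bool server_allows)
{
	if (u->connect_cookie == xprt_cookie)
		return false;
	u->connect_cookie = xprt_cookie;
	u->local = server_allows;	/* outcome of the (modeled) probe */
	return true;
}
```

This covers Chuck Lever's example of an export's IP address list being tightened and then restored without a disconnect. The transport cookie never changes, but the sentinel guarantees the stored value no longer matches it. The first successful I/O after access is restored therefore reprobes.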
* Re: [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local
2024-11-08 23:39 ` [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local Mike Snitzer
2024-11-11 1:01 ` NeilBrown
@ 2024-11-13 14:58 ` Jeff Layton
2024-11-13 16:51 ` Mike Snitzer
1 sibling, 1 reply; 45+ messages in thread
From: Jeff Layton @ 2024-11-13 14:58 UTC (permalink / raw)
To: Mike Snitzer, linux-nfs
Cc: Anna Schumaker, Trond Myklebust, Chuck Lever, NeilBrown
On Fri, 2024-11-08 at 18:39 -0500, Mike Snitzer wrote:
> Move holding the RCU from nfs_to_nfsd_file_put_local to
> nfs_to_nfsd_net_put. It is the call to nfs_to->nfsd_serv_put that
> requires the RCU anyway (the puts for nfsd_file and netns were
> combined to avoid an extra indirect reference but that
> micro-optimization isn't possible now).
>
> This fixes xfstests generic/013 triggering:
>
> "Voluntary context switch within RCU read-side critical section!"
>
> [ 143.545738] Call Trace:
> [ 143.546206] <TASK>
> [ 143.546625] ? show_regs+0x6d/0x80
> [ 143.547267] ? __warn+0x91/0x140
> [ 143.547951] ? rcu_note_context_switch+0x496/0x5d0
> [ 143.548856] ? report_bug+0x193/0x1a0
> [ 143.549557] ? handle_bug+0x63/0xa0
> [ 143.550214] ? exc_invalid_op+0x1d/0x80
> [ 143.550938] ? asm_exc_invalid_op+0x1f/0x30
> [ 143.551736] ? rcu_note_context_switch+0x496/0x5d0
> [ 143.552634] ? wakeup_preempt+0x62/0x70
> [ 143.553358] __schedule+0xaa/0x1380
> [ 143.554025] ? _raw_spin_unlock_irqrestore+0x12/0x40
> [ 143.554958] ? try_to_wake_up+0x1fe/0x6b0
> [ 143.555715] ? wake_up_process+0x19/0x20
> [ 143.556452] schedule+0x2e/0x120
> [ 143.557066] schedule_preempt_disabled+0x19/0x30
> [ 143.557933] rwsem_down_read_slowpath+0x24d/0x4a0
> [ 143.558818] ? xfs_efi_item_format+0x50/0xc0 [xfs]
> [ 143.559894] down_read+0x4e/0xb0
> [ 143.560519] xlog_cil_commit+0x1b2/0xbc0 [xfs]
> [ 143.561460] ? _raw_spin_unlock+0x12/0x30
> [ 143.562212] ? xfs_inode_item_precommit+0xc7/0x220 [xfs]
> [ 143.563309] ? xfs_trans_run_precommits+0x69/0xd0 [xfs]
> [ 143.564394] __xfs_trans_commit+0xb5/0x330 [xfs]
> [ 143.565367] xfs_trans_roll+0x48/0xc0 [xfs]
> [ 143.566262] xfs_defer_trans_roll+0x57/0x100 [xfs]
> [ 143.567278] xfs_defer_finish_noroll+0x27a/0x490 [xfs]
> [ 143.568342] xfs_defer_finish+0x1a/0x80 [xfs]
> [ 143.569267] xfs_bunmapi_range+0x4d/0xb0 [xfs]
> [ 143.570208] xfs_itruncate_extents_flags+0x13d/0x230 [xfs]
> [ 143.571353] xfs_free_eofblocks+0x12e/0x190 [xfs]
> [ 143.572359] xfs_file_release+0x12d/0x140 [xfs]
> [ 143.573324] __fput+0xe8/0x2d0
> [ 143.573922] __fput_sync+0x1d/0x30
> [ 143.574574] nfsd_filp_close+0x33/0x60 [nfsd]
> [ 143.575430] nfsd_file_free+0x96/0x150 [nfsd]
> [ 143.576274] nfsd_file_put+0xf7/0x1a0 [nfsd]
> [ 143.577104] nfsd_file_put_local+0x18/0x30 [nfsd]
> [ 143.578070] nfs_close_local_fh+0x101/0x110 [nfs_localio]
> [ 143.579079] __put_nfs_open_context+0xc9/0x180 [nfs]
> [ 143.580031] nfs_file_clear_open_context+0x4a/0x60 [nfs]
> [ 143.581038] nfs_file_release+0x3e/0x60 [nfs]
> [ 143.581879] __fput+0xe8/0x2d0
> [ 143.582464] __fput_sync+0x1d/0x30
> [ 143.583108] __x64_sys_close+0x41/0x80
> [ 143.583823] x64_sys_call+0x189a/0x20d0
> [ 143.584552] do_syscall_64+0x64/0x170
> [ 143.585240] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 143.586185] RIP: 0033:0x7f3c5153efd7
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs_common/nfslocalio.c | 8 +++-----
> fs/nfsd/filecache.c | 14 +++++++-------
> fs/nfsd/filecache.h | 2 +-
> include/linux/nfslocalio.h | 18 +++++++++++++++---
> 4 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 09404d142d1a..a74ec08f6c96 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -155,11 +155,9 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
> /* We have an implied reference to net thanks to nfsd_serv_try_get */
> localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
> cred, nfs_fh, fmode);
> - if (IS_ERR(localio)) {
> - rcu_read_lock();
> - nfs_to->nfsd_serv_put(net);
> - rcu_read_unlock();
> - }
> + if (IS_ERR(localio))
> + nfs_to_nfsd_net_put(net);
> +
> return localio;
> }
> EXPORT_SYMBOL_GPL(nfs_open_local_fh);
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index c16671135d17..9a62b4da89bb 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -391,19 +391,19 @@ nfsd_file_put(struct nfsd_file *nf)
> }
>
> /**
> - * nfsd_file_put_local - put the reference to nfsd_file and local nfsd_serv
> - * @nf: nfsd_file of which to put the references
> + * nfsd_file_put_local - put nfsd_file reference and arm nfsd_serv_put in caller
> + * @nf: nfsd_file of which to put the reference
> *
> - * First put the reference of the nfsd_file and then put the
> - * reference to the associated nn->nfsd_serv.
> + * First save the associated net to return to caller, then put
> + * the reference of the nfsd_file.
> */
> -void
> -nfsd_file_put_local(struct nfsd_file *nf) __must_hold(rcu)
> +struct net *
> +nfsd_file_put_local(struct nfsd_file *nf)
> {
> struct net *net = nf->nf_net;
>
> nfsd_file_put(nf);
> - nfsd_serv_put(net);
> + return net;
> }
>
> /**
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index cadf3c2689c4..d5db6b34ba30 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -55,7 +55,7 @@ void nfsd_file_cache_shutdown(void);
> int nfsd_file_cache_start_net(struct net *net);
> void nfsd_file_cache_shutdown_net(struct net *net);
> void nfsd_file_put(struct nfsd_file *nf);
> -void nfsd_file_put_local(struct nfsd_file *nf);
> +struct net *nfsd_file_put_local(struct nfsd_file *nf);
> struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> struct file *nfsd_file_file(struct nfsd_file *nf);
> void nfsd_file_close_inode_sync(struct inode *inode);
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 3982fea79919..9202f4b24343 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -55,7 +55,7 @@ struct nfsd_localio_operations {
> const struct cred *,
> const struct nfs_fh *,
> const fmode_t);
> - void (*nfsd_file_put_local)(struct nfsd_file *);
> + struct net *(*nfsd_file_put_local)(struct nfsd_file *);
> struct file *(*nfsd_file_file)(struct nfsd_file *);
> } ____cacheline_aligned;
>
> @@ -66,7 +66,7 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
> struct rpc_clnt *, const struct cred *,
> const struct nfs_fh *, const fmode_t);
>
> -static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> +static inline void nfs_to_nfsd_net_put(struct net *net)
> {
> /*
> * Once reference to nfsd_serv is dropped, NFSD could be
> @@ -74,10 +74,22 @@ static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> * by always taking RCU.
> */
> rcu_read_lock();
> - nfs_to->nfsd_file_put_local(localio);
> + nfs_to->nfsd_serv_put(net);
> rcu_read_unlock();
> }
>
> +static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
> +{
> + /*
> + * Must not hold RCU otherwise nfsd_file_put() can easily trigger:
> + * "Voluntary context switch within RCU read-side critical section!"
> + * by scheduling deep in underlying filesystem (e.g. XFS).
> + */
> + struct net *net = nfs_to->nfsd_file_put_local(localio);
> +
> + nfs_to_nfsd_net_put(net);
> +}
> +
> #else /* CONFIG_NFS_LOCALIO */
> static inline void nfsd_localio_ops_init(void)
> {
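The ordering problem this patch fixes can be illustrated with a small userspace model. A counter stands in for rcu_read_lock() nesting. A "might sleep" hook records a violation if the final file put runs inside the read-side section, since that put may block in the filesystem (as in the XFS trace above). All `model_*` functions and the reference counters are hypothetical stand-ins, not the kernel's RCU or nfsd APIs.

```c
#include <stdio.h>

/* Userspace model only; none of these are real kernel or nfsd APIs. */
static int rcu_depth;			/* models rcu_read_lock() nesting */
static int rcu_violations;		/* sleeps attempted inside RCU */
static int unprotected_serv_puts;	/* serv puts done without RCU held */
static int file_refs = 1;		/* models the nfsd_file reference */
static int serv_refs = 1;		/* models the nfsd_serv/net reference */

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Stands in for a filesystem that may block, e.g. XFS taking an rwsem. */
static void model_might_sleep(void)
{
	if (rcu_depth > 0) {
		rcu_violations++;
		fprintf(stderr, "Voluntary context switch within RCU read-side critical section!\n");
	}
}

/* The final put closes the file, which may sleep in the filesystem. */
static void model_file_put(void)
{
	if (--file_refs == 0)
		model_might_sleep();
}

/* The patch keeps RCU held around this put; the model only counts misses. */
static void model_serv_put(void)
{
	if (rcu_depth == 0)
		unprotected_serv_puts++;
	serv_refs--;
}

/* Old shape: one RCU section wrapped both puts. */
static void put_local_old(void)
{
	model_rcu_read_lock();
	model_file_put();
	model_serv_put();
	model_rcu_read_unlock();
}

/* Patched shape: drop the file unlocked, take RCU only for the serv put. */
static void put_local_new(void)
{
	model_file_put();
	model_rcu_read_lock();
	model_serv_put();
	model_rcu_read_unlock();
}
```

`put_local_old()` mirrors the previous nfs_to_nfsd_file_put_local(), which held RCU across both puts. `put_local_new()` mirrors the patched split. There, nfsd_file_put_local() drops the file reference without RCU and returns the net, and nfs_to_nfsd_net_put() then releases that net under RCU.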
I think this probably needs to go into v6.12 (or very early into v6.13
and backported). It should also probably get:
Fixes: 65f2a5c36635 ("nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()")
You can also add:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
* Re: [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local
2024-11-13 14:58 ` Jeff Layton
@ 2024-11-13 16:51 ` Mike Snitzer
0 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-11-13 16:51 UTC (permalink / raw)
To: Jeff Layton
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, Chuck Lever,
NeilBrown
On Wed, Nov 13, 2024 at 09:58:28AM -0500, Jeff Layton wrote:
>
> I think this probably needs to go into v6.12 (or very early into v6.13
> and backported). It should also probably get:
>
> Fixes: 65f2a5c36635 ("nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()")
>
> You can also add:
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
OK, I've added these. I'm about to send out v2 of this series that
includes the other tags and suggested changes others have provided.
Thanks,
Mike
Thread overview: 45+ messages
2024-11-08 23:39 [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 01/19] nfs/localio: must clear res.replen in nfs_local_read_done Mike Snitzer
2024-11-11 0:36 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 02/19] nfs_common: must not hold RCU while calling nfsd_file_put_local Mike Snitzer
2024-11-11 1:01 ` NeilBrown
2024-11-13 14:58 ` Jeff Layton
2024-11-13 16:51 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 03/19] nfs/localio: remove redundant suid/sgid handling Mike Snitzer
2024-11-11 1:09 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 04/19] nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx Mike Snitzer
2024-11-11 1:15 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 05/19] nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter Mike Snitzer
2024-11-11 1:20 ` NeilBrown
2024-11-11 15:09 ` Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 06/19] nfs/localio: eliminate need for nfs_local_fsync_work forward declaration Mike Snitzer
2024-11-11 1:21 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 07/19] nfs/localio: add direct IO enablement with sync and async IO support Mike Snitzer
2024-11-11 1:31 ` NeilBrown
2024-11-12 14:31 ` Chuck Lever
2024-11-08 23:39 ` [for-6.13 PATCH 08/19] nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operations Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 09/19] nfs_common: rename functions that invalidate LOCALIO nfs_clients Mike Snitzer
2024-11-11 1:32 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t Mike Snitzer
2024-11-11 1:55 ` NeilBrown
2024-11-11 15:33 ` Mike Snitzer
2024-11-11 20:35 ` NeilBrown
2024-11-11 22:27 ` Mike Snitzer
2024-11-11 23:23 ` NeilBrown
2024-11-12 0:16 ` Mike Snitzer
2024-11-12 0:49 ` NeilBrown
2024-11-12 14:36 ` Chuck Lever
2024-11-12 23:13 ` NeilBrown
2024-11-13 0:07 ` Chuck Lever III
2024-11-13 0:32 ` NeilBrown
2024-11-08 23:39 ` [for-6.13 PATCH 11/19] nfs: cache all open LOCALIO nfsd_file(s) in client Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 12/19] nfsd: update percpu_ref to manage references on nfsd_net Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 13/19] nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 14/19] nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 15/19] nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock Mike Snitzer
2024-11-08 23:39 ` [for-6.13 PATCH 16/19] nfs_common: track all open nfsd_files per LOCALIO nfs_client Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 17/19] nfs_common: add nfs_localio trace events Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 18/19] nfs: probe for LOCALIO when v4 client reconnects to server Mike Snitzer
2024-11-08 23:40 ` [for-6.13 PATCH 19/19] nfs: probe for LOCALIO when v3 " Mike Snitzer
2024-11-11 3:06 ` NeilBrown
2024-11-10 15:49 ` [for-6.13 PATCH 00/19] nfs/nfsd: fixes and improvements for LOCALIO Chuck Lever III