public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount
@ 2026-02-24 16:39 Chuck Lever
  2026-02-24 16:39 ` [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification Chuck Lever
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Chuck Lever @ 2026-02-24 16:39 UTC
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

When an NFS server exports a filesystem and clients hold NFSv4 state
(opens, locks, delegations), unmounting the underlying filesystem
fails with EBUSY. The /proc/fs/nfsd/unlock_fs interface exists for
administrators to manually revoke state before retrying the unmount,
but this approach has significant operational drawbacks.

Manual intervention breaks automation workflows. Containerized NFS
servers, orchestration systems, and unattended maintenance scripts
cannot reliably unmount exported filesystems without implementing
custom logic to detect the failure and invoke unlock_fs. System
administrators managing many exports face tedious, error-prone
procedures when decommissioning storage.

This series enables the NFS server to detect filesystem unmount
events and automatically revoke associated state. The mechanism
registers with a new SRCU notifier chain in VFS that fires during
mount teardown, after processing stuck children but before
fsnotify_vfsmount_delete(), while SB_ACTIVE is still set. When a
filesystem is unmounted, all NFSv4 opens, locks, and delegations
referencing it are revoked, async COPY operations are cancelled
with NFS4ERR_ADMIN_REVOKED sent to clients, NLM locks are released,
and cached file handles are closed.

With automatic revocation, unmount operations complete without
administrator intervention once the brief state cleanup finishes.
Clients receive immediate notification of state loss through
standard NFSv4 error codes, allowing applications to handle the
situation appropriately rather than encountering silent failures.

Based on v7.0-rc1

---

Changes since v2:
- Replace fs_pin with an SRCU umount notifier chain in VFS
- Merge the pending COPY cancellation patch
- Replace xa_cmpxchg() with xa_insert()
- Use cancel_work_sync() instead of flush_workqueue()
- Remove rcu_barrier()
- Correct misleading claims in kdoc comments and commit messages

Changes since v1:
- Explain why drop_client() is being renamed
- Finish implementing revocation on umount
- Rename pin_insert_group
- Clarified log output and code comments
- Hold nfsd_mutex while closing nfsd_files

Chuck Lever (3):
  fs: add umount notifier chain for filesystem unmount notification
  nfsd: revoke NFSv4 state when filesystem is unmounted
  nfsd: close cached files on filesystem unmount

 fs/namespace.c        |  69 ++++++++++
 fs/nfsd/Makefile      |   2 +-
 fs/nfsd/filecache.c   |  45 +++++++
 fs/nfsd/filecache.h   |   1 +
 fs/nfsd/netns.h       |   5 +
 fs/nfsd/nfs4state.c   |  29 +++++
 fs/nfsd/nfsctl.c      |  10 +-
 fs/nfsd/sb_watch.c    | 283 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/state.h       |   7 ++
 include/linux/mount.h |   4 +
 10 files changed, 452 insertions(+), 3 deletions(-)
 create mode 100644 fs/nfsd/sb_watch.c

-- 
2.53.0



* [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-24 16:39 [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
@ 2026-02-24 16:39 ` Chuck Lever
  2026-02-26  8:48   ` Christian Brauner
  2026-02-24 16:39 ` [PATCH v3 2/3] nfsd: revoke NFSv4 state when filesystem is unmounted Chuck Lever
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-02-24 16:39 UTC
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Kernel subsystems occasionally need notification when a filesystem
is unmounted. Until now, the only mechanism available has been the
fs_pin infrastructure, which has limited adoption (only BSD process
accounting uses it) and which VFS maintainers consider deprecated.

Add an SRCU notifier chain that fires during mount teardown,
following the pattern established by lease_notifier_chain in
fs/locks.c. The notifier fires after processing stuck children but
before fsnotify_vfsmount_delete(), at which point SB_ACTIVE is
still set and the superblock remains fully accessible.

The SRCU notifier type is chosen because:
 - Unmount is relatively infrequent, so the overhead of SRCU
   registration and unregistration is acceptable
 - Callbacks run in process context and may sleep
 - No cache bounces occur during chain traversal

NFSD requires this mechanism to revoke NFSv4 state (opens, locks,
delegations) and release cached file handles when a filesystem is
unmounted, avoiding EBUSY errors that occur when client state pins
the mount.
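
For readers unfamiliar with notifier chains, the callback contract
can be modeled in userspace. The sketch below is not kernel code:
toy_nb, toy_chain, toy_register(), toy_call_chain() and struct
watcher are hypothetical stand-ins for the SRCU notifier API, and
all locking is omitted. It only illustrates a callback with the
(nb, val, data) signature that tolerates a second notification for
the same superblock, as can happen with bind mounts.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0

/* Userspace stand-in for struct notifier_block (hypothetical). */
struct toy_nb {
	int (*notifier_call)(struct toy_nb *nb, unsigned long val, void *data);
	struct toy_nb *next;
};

/* Stand-in for the chain head; the real chain is SRCU-protected. */
struct toy_chain {
	struct toy_nb *head;
};

void toy_register(struct toy_chain *chain, struct toy_nb *nb)
{
	assert(nb && nb->notifier_call);
	nb->next = chain->head;
	chain->head = nb;
}

/* Invoke every callback. This toy ignores the return values. */
void toy_call_chain(struct toy_chain *chain, unsigned long val, void *data)
{
	struct toy_nb *nb;

	for (nb = chain->head; nb; nb = nb->next)
		(void)nb->notifier_call(nb, val, data);
}

/* A subscriber that embeds its notifier block, as nfsd_net does. */
struct watcher {
	struct toy_nb nb;
	void *watched_sb;	/* superblock with state, or NULL */
	int revocations;	/* how many times cleanup actually ran */
};

int watcher_call(struct toy_nb *nb, unsigned long val, void *data)
{
	/* Equivalent of container_of(): nb is the first member. */
	struct watcher *w = (struct watcher *)nb;

	(void)val;
	if (w->watched_sb != data)
		return NOTIFY_DONE;	/* not ours, or already handled */
	w->watched_sb = NULL;		/* makes a repeat call a no-op */
	w->revocations++;
	return NOTIFY_DONE;
}

/* Two notifications for one superblock, e.g. two bind mounts. */
int demo_duplicate_notifications(void)
{
	static int sb;	/* any unique address serves as the "superblock" */
	struct toy_chain chain = { .head = NULL };
	struct watcher w = { .nb = { .notifier_call = watcher_call },
			     .watched_sb = &sb };

	toy_register(&chain, &w.nb);
	toy_call_chain(&chain, 0, &sb);
	toy_call_chain(&chain, 0, &sb);
	return w.revocations;
}
```

Calling demo_duplicate_notifications() delivers the same pointer
twice; only the first delivery performs cleanup.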

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/namespace.c        | 69 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mount.h |  4 +++
 2 files changed, 73 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index ebe19ded293a..269e007e9312 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -34,6 +34,7 @@
 #include <linux/mnt_idmapping.h>
 #include <linux/pidfs.h>
 #include <linux/nstree.h>
+#include <linux/notifier.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -73,6 +74,70 @@ static u64 event;
 static DEFINE_XARRAY_FLAGS(mnt_id_xa, XA_FLAGS_ALLOC);
 static DEFINE_IDA(mnt_group_ida);
 
+/*
+ * Kernel subsystems can register to be notified when a filesystem is
+ * unmounted. This is used by (e.g.) nfsd to revoke state associated
+ * with files on the filesystem being unmounted.
+ */
+static struct srcu_notifier_head umount_notifier_chain;
+
+/**
+ * umount_register_notifier - register for unmount notifications
+ * @nb: notifier_block to register
+ *
+ * Registers a notifier to be called when any filesystem is
+ * unmounted. The callback is invoked after stuck children are
+ * processed but before fsnotify_vfsmount_delete(), while SB_ACTIVE
+ * is still set and the superblock remains fully accessible.
+ *
+ * Callback signature:
+ *   int (*callback)(struct notifier_block *nb,
+ *                   unsigned long val, void *data)
+ *
+ *   @val:  always 0 (reserved for future extension)
+ *   @data: struct super_block * for the unmounting filesystem
+ *
+ * Callbacks run in process context and may sleep. Return
+ * NOTIFY_DONE from the callback; a callback cannot prevent
+ * unmount, but a NOTIFY_STOP_MASK return would skip later
+ * callbacks. Callbacks must handle their own error recovery.
+ *
+ * The notification fires once per mount instance. Bind mounts of
+ * the same filesystem trigger multiple callbacks with the same
+ * super_block pointer; callbacks must handle duplicate
+ * notifications idempotently.
+ *
+ * The super_block pointer is valid only for the duration of the
+ * callback. Callbacks must not retain this pointer for
+ * asynchronous use; to access the filesystem after the callback
+ * returns, acquire a separate reference (e.g., via an open file)
+ * during callback execution.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int umount_register_notifier(struct notifier_block *nb)
+{
+	return srcu_notifier_chain_register(&umount_notifier_chain, nb);
+}
+EXPORT_SYMBOL_GPL(umount_register_notifier);
+
+/**
+ * umount_unregister_notifier - unregister an unmount notifier
+ * @nb: notifier_block to unregister
+ *
+ * Unregisters a previously registered notifier. This function may
+ * block due to SRCU synchronization.
+ *
+ * Must not be called from within a notifier callback; doing so
+ * causes deadlock. Must be called before module unload if the
+ * notifier_block resides in module memory.
+ */
+void umount_unregister_notifier(struct notifier_block *nb)
+{
+	srcu_notifier_chain_unregister(&umount_notifier_chain, nb);
+}
+EXPORT_SYMBOL_GPL(umount_unregister_notifier);
+
 /* Don't allow confusion with old 32bit mount ID */
 #define MNT_UNIQUE_ID_OFFSET (1ULL << 31)
 static u64 mnt_id_ctr = MNT_UNIQUE_ID_OFFSET;
@@ -1307,6 +1372,8 @@ static void cleanup_mnt(struct mount *mnt)
 		hlist_del(&m->mnt_umount);
 		mntput(&m->mnt);
 	}
+	/* Notify registrants before superblock deactivation */
+	srcu_notifier_call_chain(&umount_notifier_chain, 0, mnt->mnt.mnt_sb);
 	fsnotify_vfsmount_delete(&mnt->mnt);
 	dput(mnt->mnt.mnt_root);
 	deactivate_super(mnt->mnt.mnt_sb);
@@ -6189,6 +6256,8 @@ void __init mnt_init(void)
 {
 	int err;
 
+	srcu_init_notifier_head(&umount_notifier_chain);
+
 	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index acfe7ef86a1b..9a46ab40dffd 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -21,6 +21,7 @@ struct file_system_type;
 struct fs_context;
 struct file;
 struct path;
+struct notifier_block;
 
 enum mount_flags {
 	MNT_NOSUID	= 0x01,
@@ -109,4 +110,7 @@ extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
 
+int umount_register_notifier(struct notifier_block *nb);
+void umount_unregister_notifier(struct notifier_block *nb);
+
 #endif /* _LINUX_MOUNT_H */
-- 
2.53.0



* [PATCH v3 2/3] nfsd: revoke NFSv4 state when filesystem is unmounted
  2026-02-24 16:39 [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
  2026-02-24 16:39 ` [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification Chuck Lever
@ 2026-02-24 16:39 ` Chuck Lever
  2026-02-24 16:39 ` [PATCH v3 3/3] nfsd: close cached files on filesystem unmount Chuck Lever
  2026-02-24 17:14 ` [PATCH v3 0/3] Automatic NFSv4 state revocation " Al Viro
  3 siblings, 0 replies; 30+ messages in thread
From: Chuck Lever @ 2026-02-24 16:39 UTC
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

When an NFS server's local filesystem is unmounted while NFS clients
are still accessing it, NFSv4 state holds files open which pins the
filesystem, preventing unmount.

Previously, administrators had to manually revoke state via
/proc/fs/nfsd/unlock_fs before a formerly exported filesystem could
be unmounted.

Register with the VFS umount notifier chain to detect filesystem
unmounts and revoke NFSv4 state and NLM locks associated with that
filesystem. An xarray in nfsd_net tracks per-superblock entries.
When NFS state is created for a file on a given superblock, an entry
is registered (idempotently) for that superblock. When the filesystem
is unmounted, VFS invokes the notifier callback which queues work to:

 - Cancel ongoing async COPY operations (nfsd4_cancel_copy_by_sb)
 - Release NLM locks (nlmsvc_unlock_all_by_sb)
 - Revoke NFSv4 state (nfsd4_revoke_states)

Each network namespace registers its own notifier_block, allowing the
callback to directly access the correct nfsd_net via container_of().

The revocation work runs on a dedicated workqueue (nfsd_sb_watch_wq) to
avoid deadlocks since the VFS notifier callback should not block for
extended periods. Synchronization between VFS unmount and NFSD shutdown
uses xa_erase() atomicity: the path that successfully erases the xarray
entry triggers work.
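
The single-owner rule can be illustrated with a userspace model.
This is a sketch under stated assumptions, not the patch's code:
toy_watch, slot, toy_erase() and toy_claim_and_clean() are
hypothetical names, and a C11 atomic exchange stands in for
xa_erase(), which in the kernel is serialized by the xarray's
internal lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_watch {
	int cleanups;	/* incremented by whichever path owns the record */
};

/* One slot standing in for an xarray entry keyed by superblock. */
static _Atomic(struct toy_watch *) slot;

/*
 * Model of xa_erase(): atomically take the entry and leave NULL
 * behind. Only one caller can ever see the non-NULL value.
 */
struct toy_watch *toy_erase(void)
{
	return atomic_exchange(&slot, (struct toy_watch *)NULL);
}

/* Both the unmount path and the shutdown path run this claim step. */
int toy_claim_and_clean(void)
{
	struct toy_watch *w = toy_erase();

	if (!w)
		return 0;	/* the other path already owns cleanup */
	w->cleanups++;
	return 1;
}

int demo_single_owner(void)
{
	struct toy_watch w = { .cleanups = 0 };
	int unmount_won, shutdown_won;

	atomic_store(&slot, &w);
	unmount_won = toy_claim_and_clean();	/* umount notifier path */
	shutdown_won = toy_claim_and_clean();	/* NFSD shutdown path */

	assert(unmount_won + shutdown_won == 1);
	return w.cleanups;
}
```

Whichever path calls toy_erase() first receives the record; the
other sees NULL and does nothing, so cleanup runs exactly once.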

If state revocation takes an unexpectedly long time (e.g., when
re-exporting an NFS mount whose backend server is unresponsive),
periodic warnings are emitted every 30 seconds. The wait is
interruptible: if interrupted before work starts, cancel_work()
removes the queued work and revocation runs directly in the unmount
context; if work is already running, the callback returns and
revocation continues in the background. Open files keep the superblock
alive until revocation closes them.
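
The decision taken on interruption can be written as a small state
model. The names below (toy_work_state, toy_cancel_work(),
toy_on_interrupt()) are hypothetical and exist only for this
sketch; the model assumes that cancel_work() succeeds only for an
item that is still queued and does not wait for a running item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_work_state { TOY_QUEUED, TOY_RUNNING, TOY_IDLE };

enum toy_outcome {
	TOY_REVOKE_IN_CALLER,	  /* cancel won: caller runs the work */
	TOY_REVOKE_IN_BACKGROUND, /* worker already has it: just return */
};

/*
 * Model of cancel_work(): succeeds only while the item is still
 * queued, and never waits for a running item.
 */
bool toy_cancel_work(enum toy_work_state *state)
{
	assert(state != NULL);
	if (*state != TOY_QUEUED)
		return false;
	*state = TOY_IDLE;
	return true;
}

/* Decision taken when the interruptible wait is cut short by a signal. */
enum toy_outcome toy_on_interrupt(enum toy_work_state *state)
{
	if (toy_cancel_work(state)) {
		/* No worker will execute the item now, so do it here. */
		*state = TOY_RUNNING;
		return TOY_REVOKE_IN_CALLER;
	}
	return TOY_REVOKE_IN_BACKGROUND;
}
```

With the state set to TOY_QUEUED, toy_on_interrupt() reports that
the caller must run revocation itself; with TOY_RUNNING it reports
that revocation continues in the worker and leaves the state alone.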

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/Makefile    |   2 +-
 fs/nfsd/netns.h     |   5 +
 fs/nfsd/nfs4state.c |  29 +++++
 fs/nfsd/nfsctl.c    |  10 +-
 fs/nfsd/sb_watch.c  | 273 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/state.h     |   7 ++
 6 files changed, 323 insertions(+), 3 deletions(-)
 create mode 100644 fs/nfsd/sb_watch.c

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index f0da4d69dc74..bf6146283165 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -13,7 +13,7 @@ nfsd-y			+= trace.o
 nfsd-y 			+= nfssvc.o nfsctl.o nfsfh.o vfs.o \
 			   export.o auth.o lockd.o nfscache.o \
 			   stats.o filecache.o nfs3proc.o nfs3xdr.o \
-			   netlink.o
+			   netlink.o sb_watch.o
 nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
 nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
 nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 9fa600602658..bc6004c85a4d 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -13,6 +13,7 @@
 #include <linux/filelock.h>
 #include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
+#include <linux/xarray.h>
 #include <linux/percpu-refcount.h>
 #include <linux/siphash.h>
 #include <linux/sunrpc/stats.h>
@@ -219,6 +220,10 @@ struct nfsd_net {
 	/* last time an admin-revoke happened for NFSv4.0 */
 	time64_t		nfs40_last_revoke;
 
+	/* Superblock watch for automatic state revocation on unmount */
+	struct xarray		nfsd_sb_watches;
+	struct notifier_block	nfsd_umount_notifier;
+
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 	/* Local clients to be invalidated when net is shut down */
 	spinlock_t              local_clients_lock;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b9c399b89df..d35a578db1c0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6463,6 +6463,16 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 		status = nfserr_bad_stateid;
 		if (nfsd4_is_deleg_cur(open))
 			goto out;
+		/*
+		 * Watch the superblock so unmount can trigger revocation
+		 * of NFSv4 state (opens, locks, delegations) held by
+		 * clients on this filesystem. nfsd_sb_watch() returns
+		 * immediately if a watch already exists for this sb.
+		 */
+		status = nfsd_sb_watch(SVC_NET(rqstp),
+				       current_fh->fh_export->ex_path.mnt);
+		if (status)
+			goto out;
 	}
 
 	if (!stp) {
@@ -9010,8 +9020,13 @@ static int nfs4_state_create_net(struct net *net)
 
 	shrinker_register(nn->nfsd_client_shrinker);
 
+	if (nfsd_sb_watch_setup(nn))
+		goto err_sb_entries;
+
 	return 0;
 
+err_sb_entries:
+	shrinker_free(nn->nfsd_client_shrinker);
 err_shrinker:
 	put_net(net);
 	kfree(nn->sessionid_hashtbl);
@@ -9111,6 +9126,8 @@ nfs4_state_shutdown_net(struct net *net)
 	struct list_head *pos, *next, reaplist;
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
+	nfsd_sb_watch_shutdown(nn);
+
 	shrinker_free(nn->nfsd_client_shrinker);
 	cancel_work_sync(&nn->nfsd_shrinker_work);
 	disable_delayed_work_sync(&nn->laundromat_work);
@@ -9465,6 +9482,18 @@ nfsd_get_dir_deleg(struct nfsd4_compound_state *cstate,
 	if (rfp != fp) {
 		put_nfs4_file(fp);
 		fp = rfp;
+	} else {
+		/*
+		 * Watch the superblock so unmount can trigger revocation
+		 * of directory delegations held by clients on this
+		 * filesystem. nfsd_sb_watch() returns immediately if a
+		 * watch already exists for this sb.
+		 */
+		if (nfsd_sb_watch(clp->net,
+				  cstate->current_fh.fh_export->ex_path.mnt)) {
+			put_nfs4_file(fp);
+			return ERR_PTR(-EAGAIN);
+		}
 	}
 
 	/* if this client already has one, return that it's unavailable */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e9acd2cd602c..5d8a95a48ff9 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2268,9 +2268,12 @@ static int __init init_nfsd(void)
 	retval = nfsd4_create_laundry_wq();
 	if (retval)
 		goto out_free_cld;
+	retval = nfsd_sb_watch_init();
+	if (retval)
+		goto out_free_laundry;
 	retval = register_filesystem(&nfsd_fs_type);
 	if (retval)
-		goto out_free_nfsd4;
+		goto out_free_sb;
 	retval = genl_register_family(&nfsd_nl_family);
 	if (retval)
 		goto out_free_filesystem;
@@ -2284,7 +2287,9 @@ static int __init init_nfsd(void)
 	genl_unregister_family(&nfsd_nl_family);
 out_free_filesystem:
 	unregister_filesystem(&nfsd_fs_type);
-out_free_nfsd4:
+out_free_sb:
+	nfsd_sb_watch_exit();
+out_free_laundry:
 	nfsd4_destroy_laundry_wq();
 out_free_cld:
 	unregister_cld_notifier();
@@ -2307,6 +2312,7 @@ static void __exit exit_nfsd(void)
 	remove_proc_entry("fs/nfs", NULL);
 	genl_unregister_family(&nfsd_nl_family);
 	unregister_filesystem(&nfsd_fs_type);
+	nfsd_sb_watch_exit();
 	nfsd4_destroy_laundry_wq();
 	unregister_cld_notifier();
 	unregister_pernet_subsys(&nfsd_net_ops);
diff --git a/fs/nfsd/sb_watch.c b/fs/nfsd/sb_watch.c
new file mode 100644
index 000000000000..8f711956a12e
--- /dev/null
+++ b/fs/nfsd/sb_watch.c
@@ -0,0 +1,273 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Superblock watch for automatic NFSv4 state revocation on unmount.
+ *
+ * When a local filesystem is unmounted while NFS clients hold state,
+ * this code automatically revokes that state so the unmount can proceed.
+ *
+ * Copyright (C) 2025 Oracle. All rights reserved.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/sunrpc/svc.h>
+#include <linux/lockd/lockd.h>
+
+#include "nfsd.h"
+#include "netns.h"
+#include "state.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_PROC
+
+static struct workqueue_struct *nfsd_sb_watch_wq;
+
+/* Interval for progress warnings during unmount (in seconds) */
+#define NFSD_STATE_REVOKE_INTERVAL	30
+
+/*
+ * Watch record for a superblock with NFS state. When the filesystem
+ * is unmounted, the notifier callback finds this record and triggers
+ * state revocation.
+ */
+struct nfsd_sb_watch {
+	struct super_block	*sb;
+	struct net		*net;
+	struct work_struct	work;
+	struct completion	done;
+	struct rcu_head		rcu;
+};
+
+static void nfsd_sb_watch_free_rcu(struct rcu_head *rcu)
+{
+	struct nfsd_sb_watch *watch = container_of(rcu, struct nfsd_sb_watch, rcu);
+
+	put_net(watch->net);
+	kfree(watch);
+}
+
+/*
+ * Work function for nfsd_sb_watch - runs in process context.
+ * Cancels async COPYs, releases NLM locks, revokes NFSv4 state, and closes
+ * cached NFSv2/3 files for the superblock.
+ */
+static void nfsd_sb_revoke_work(struct work_struct *work)
+{
+	struct nfsd_sb_watch *watch = container_of(work, struct nfsd_sb_watch, work);
+	struct nfsd_net *nn = net_generic(watch->net, nfsd_net_id);
+
+	pr_info("nfsd: unmount of %s, revoking NFS state\n", watch->sb->s_id);
+
+	nfsd4_cancel_copy_by_sb(watch->net, watch->sb);
+	/* Errors are logged by lockd; no recovery is possible. */
+	(void)nlmsvc_unlock_all_by_sb(watch->sb);
+	nfsd4_revoke_states(nn, watch->sb);
+
+	pr_info("nfsd: state revocation for %s complete\n", watch->sb->s_id);
+
+	complete(&watch->done);
+	call_rcu(&watch->rcu, nfsd_sb_watch_free_rcu);
+}
+
+/*
+ * Trigger state revocation for a superblock and wait for completion.
+ *
+ * The xa_erase() ensures exactly one path (either this notification or
+ * NFSD shutdown) handles cleanup for a given watch record.
+ */
+static void nfsd_sb_trigger_revoke(struct nfsd_net *nn, struct super_block *sb)
+{
+	struct nfsd_sb_watch *watch;
+	unsigned int elapsed = 0;
+	long ret;
+
+	watch = xa_erase(&nn->nfsd_sb_watches, (unsigned long)sb);
+	if (!watch)
+		return;
+
+	queue_work(nfsd_sb_watch_wq, &watch->work);
+
+	/*
+	 * Block until state revocation completes. Periodic warnings help
+	 * diagnose stuck operations (e.g., re-exports of an NFS mount
+	 * whose backend server is unresponsive).
+	 *
+	 * The work function handles freeing, so this function can return
+	 * early on interrupt. Open files keep the superblock alive until
+	 * revocation closes them.
+	 */
+	for (;;) {
+		ret = wait_for_completion_interruptible_timeout(&watch->done,
+						NFSD_STATE_REVOKE_INTERVAL * HZ);
+		if (ret > 0)
+			return;
+
+		if (ret == -ERESTARTSYS) {
+			/*
+			 * Interrupted by signal. If the work has not yet
+			 * started, cancel it and run in this context: a
+			 * successful cancel_work() means no other context
+			 * will execute the work function, so it must run
+			 * here to ensure state revocation occurs.
+			 *
+			 * If already running, cancel_work() returns false
+			 * without waiting; revocation continues in the worker.
+			 */
+			if (cancel_work(&watch->work)) {
+				pr_warn("nfsd: unmount of %s interrupted, revoking state in unmount context\n",
+					sb->s_id);
+				nfsd_sb_revoke_work(&watch->work);
+				return;
+			}
+			pr_warn("nfsd: unmount of %s interrupted; revocation continues in background\n",
+				sb->s_id);
+			return;
+		}
+
+		/* Timed out - print warning and continue waiting */
+		elapsed += NFSD_STATE_REVOKE_INTERVAL;
+		pr_warn("nfsd: unmount of %s blocked for %u seconds waiting for NFS state revocation\n",
+			sb->s_id, elapsed);
+	}
+}
+
+/*
+ * Notifier callback invoked when any filesystem is unmounted.
+ * Check if this superblock is being watched and trigger revocation.
+ */
+static int nfsd_umount_notifier_call(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	struct nfsd_net *nn = container_of(nb, struct nfsd_net, nfsd_umount_notifier);
+	struct super_block *sb = data;
+
+	nfsd_sb_trigger_revoke(nn, sb);
+	return NOTIFY_DONE;
+}
+
+/**
+ * nfsd_sb_watch - watch a superblock for unmount to trigger state revocation
+ * @net: network namespace
+ * @mnt: vfsmount for the filesystem
+ *
+ * When NFS state is created for a file on this filesystem, register a
+ * watch so the umount notifier can revoke that state on unmount.
+ * Returns nfs_ok on success, or an NFS error on failure.
+ *
+ * This function is idempotent - if a watch already exists for the
+ * superblock, no new watch is created.
+ */
+__be32 nfsd_sb_watch(struct net *net, struct vfsmount *mnt)
+{
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	struct super_block *sb = mnt->mnt_sb;
+	struct nfsd_sb_watch *new;
+	int ret;
+
+	if (xa_load(&nn->nfsd_sb_watches, (unsigned long)sb))
+		return nfs_ok;
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return nfserr_jukebox;
+
+	new->sb = sb;
+	new->net = get_net(net);
+	INIT_WORK(&new->work, nfsd_sb_revoke_work);
+	init_completion(&new->done);
+
+	ret = xa_insert(&nn->nfsd_sb_watches, (unsigned long)sb, new, GFP_KERNEL);
+	if (ret) {
+		/*
+		 * Another task beat us to it. Even if the winner has not
+		 * yet completed insertion, returning here is safe: the
+		 * caller holds an open file reference that prevents
+		 * unmount from completing until state creation finishes.
+		 */
+		put_net(new->net);
+		kfree(new);
+		return nfs_ok;
+	}
+
+	/*
+	 * Callers hold an open file reference, so unmount cannot clear
+	 * SB_ACTIVE while this function executes. Warn if this assumption
+	 * is violated, but handle it gracefully by cleaning up and
+	 * returning an error.
+	 */
+	if (WARN_ON_ONCE(!(READ_ONCE(sb->s_flags) & SB_ACTIVE))) {
+		new = xa_erase(&nn->nfsd_sb_watches, (unsigned long)sb);
+		if (new) {
+			put_net(new->net);
+			kfree(new);
+		}
+		return nfserr_stale;
+	}
+
+	return nfs_ok;
+}
+
+/**
+ * nfsd_sb_watch_setup - initialize umount watch for a network namespace
+ * @nn: nfsd_net for this network namespace
+ *
+ * Called during nfs4_state_create_net(). Registers with the VFS umount
+ * notifier chain to receive callbacks when filesystems are unmounted.
+ */
+int nfsd_sb_watch_setup(struct nfsd_net *nn)
+{
+	xa_init(&nn->nfsd_sb_watches);
+	nn->nfsd_umount_notifier.notifier_call = nfsd_umount_notifier_call;
+	return umount_register_notifier(&nn->nfsd_umount_notifier);
+}
+
+/*
+ * Clean up all watch records during NFSD shutdown.
+ *
+ * xa_erase() synchronizes with nfsd_sb_trigger_revoke(): the path that
+ * successfully erases an xarray entry performs cleanup for that entry.
+ * A NULL return indicates the umount notification path is handling cleanup.
+ */
+static void nfsd_sb_watches_destroy(struct nfsd_net *nn)
+{
+	struct nfsd_sb_watch *watch;
+	unsigned long index;
+
+	xa_for_each(&nn->nfsd_sb_watches, index, watch) {
+		watch = xa_erase(&nn->nfsd_sb_watches, index);
+		if (!watch)
+			continue; /* Umount notification path handling this */
+		cancel_work_sync(&watch->work);
+		put_net(watch->net);
+		kfree(watch);
+	}
+	xa_destroy(&nn->nfsd_sb_watches);
+}
+
+/**
+ * nfsd_sb_watch_shutdown - shutdown umount watch for a network namespace
+ * @nn: nfsd_net for this network namespace
+ *
+ * Must be called during nfsd shutdown before tearing down client state.
+ */
+void nfsd_sb_watch_shutdown(struct nfsd_net *nn)
+{
+	umount_unregister_notifier(&nn->nfsd_umount_notifier);
+	nfsd_sb_watches_destroy(nn);
+}
+
+int nfsd_sb_watch_init(void)
+{
+	nfsd_sb_watch_wq = alloc_workqueue("nfsd_sb_watch", WQ_UNBOUND, 0);
+	if (!nfsd_sb_watch_wq)
+		return -ENOMEM;
+	return 0;
+}
+
+void nfsd_sb_watch_exit(void)
+{
+	destroy_workqueue(nfsd_sb_watch_wq);
+}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 6fcbf1e427d4..4b57d04f868a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -853,6 +853,13 @@ static inline void nfsd4_cancel_copy_by_sb(struct net *net, struct super_block *
 }
 #endif
 
+/* superblock watch for unmount notification (sb_watch.c) */
+int nfsd_sb_watch_init(void);
+void nfsd_sb_watch_exit(void);
+__be32 nfsd_sb_watch(struct net *net, struct vfsmount *mnt);
+int nfsd_sb_watch_setup(struct nfsd_net *nn);
+void nfsd_sb_watch_shutdown(struct nfsd_net *nn);
+
 /* grace period management */
 bool nfsd4_force_end_grace(struct nfsd_net *nn);
 
-- 
2.53.0



* [PATCH v3 3/3] nfsd: close cached files on filesystem unmount
  2026-02-24 16:39 [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
  2026-02-24 16:39 ` [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification Chuck Lever
  2026-02-24 16:39 ` [PATCH v3 2/3] nfsd: revoke NFSv4 state when filesystem is unmounted Chuck Lever
@ 2026-02-24 16:39 ` Chuck Lever
  2026-02-24 17:14 ` [PATCH v3 0/3] Automatic NFSv4 state revocation " Al Viro
  3 siblings, 0 replies; 30+ messages in thread
From: Chuck Lever @ 2026-02-24 16:39 UTC
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

When a filesystem is unmounted while NFS is exporting it, the
unmount can fail with EBUSY even after NFSv4 state has been revoked.
This occurs because the nfsd_file cache holds open NFSv2/3 file
handles that pin the filesystem.

Extend the mechanism that revokes NFSv4 state on unmount to also
close cached file handles. nfsd_file_close_sb() walks the nfsd_file
cache and disposes of entries belonging to the target superblock.
It runs after NFSv4 state revocation, handling NFSv2/3 file handles
that remain in the cache.

Entries under construction (nf_file not yet set) are skipped; these
have no open file to close.
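
The selection rule can be shown with a simplified userspace model.
struct toy_file and toy_close_sb() are hypothetical stand-ins, not
the nfsd_file API: the model stores the superblock directly in each
entry, whereas the patch derives it from the open file's inode, and
a plain array replaces the rhashtable walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nfsd_file. */
struct toy_file {
	bool gc_managed;	/* models the NFSD_FILE_GC flag */
	const void *open_file;	/* NULL while under construction */
	const void *sb;		/* superblock of the open file */
	bool queued;		/* set once queued for disposal */
};

/*
 * Queue every GC-managed entry with an open file on @sb and return
 * how many were queued. Entries under construction are skipped.
 */
unsigned int toy_close_sb(struct toy_file *cache, size_t n, const void *sb)
{
	unsigned int closed = 0;
	size_t i;

	assert(cache != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct toy_file *f = &cache[i];

		if (f->gc_managed && f->open_file && f->sb == sb) {
			f->queued = true;
			closed++;
		}
	}
	return closed;
}

int demo_close_sb(void)
{
	static int sb_a, sb_b, file;
	struct toy_file cache[] = {
		{ .gc_managed = true,  .open_file = &file, .sb = &sb_a },
		{ .gc_managed = true,  .open_file = NULL,  .sb = &sb_a },
		{ .gc_managed = false, .open_file = &file, .sb = &sb_a },
		{ .gc_managed = true,  .open_file = &file, .sb = &sb_b },
	};

	return (int)toy_close_sb(cache, 4, &sb_a);
}
```

Of the four entries in demo_close_sb(), only the first is queued:
the second has no open file, the third is not GC-managed, and the
fourth belongs to a different superblock.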

The hashtable walk uses the rhashtable walker API and continues
when rhashtable_walk_next() returns -EAGAIN. Matching entries are
queued on a local dispose list, which nfsd_file_dispose_list()
processes after the walk completes.

A log message is emitted when cached file handles are closed during
unmount, informing administrators that NFS clients may receive stale
file handle errors.

A flush_workqueue() call is added to nfsd_sb_watch_shutdown() to
ensure that any work items still executing complete before shutdown
proceeds. Without this, if an unmount notification returns early
due to signal interruption while the work function is still running,
nfsd_file_cache_shutdown() could destroy the file cache slab while
nfsd_file_close_sb() is still disposing entries.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h |  1 +
 fs/nfsd/sb_watch.c  | 10 ++++++++++
 3 files changed, 56 insertions(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1e2b38ed1d35..d1a6f7cf40b2 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -894,6 +894,51 @@ __nfsd_file_cache_purge(struct net *net)
 	nfsd_file_dispose_list(&dispose);
 }
 
+/**
+ * nfsd_file_close_sb - close GC-managed cached files for a superblock
+ * @sb: target superblock
+ *
+ * Walk the nfsd_file cache and close out GC-managed entries (those
+ * acquired via nfsd_file_acquire_gc) that belong to @sb. Called during
+ * filesystem unmount after NFSv4 state revocation to release remaining
+ * cached file handles that may be pinning the filesystem.
+ */
+void nfsd_file_close_sb(struct super_block *sb)
+{
+	struct rhashtable_iter iter;
+	struct nfsd_file *nf;
+	unsigned int closed = 0;
+	LIST_HEAD(dispose);
+
+	if (!test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags))
+		return;
+
+	rhltable_walk_enter(&nfsd_file_rhltable, &iter);
+	do {
+		rhashtable_walk_start(&iter);
+
+		nf = rhashtable_walk_next(&iter);
+		while (!IS_ERR_OR_NULL(nf)) {
+			if (test_bit(NFSD_FILE_GC, &nf->nf_flags) &&
+			    nf->nf_file &&
+			    file_inode(nf->nf_file)->i_sb == sb) {
+				nfsd_file_cond_queue(nf, &dispose);
+				closed++;
+			}
+			nf = rhashtable_walk_next(&iter);
+		}
+
+		rhashtable_walk_stop(&iter);
+	} while (nf == ERR_PTR(-EAGAIN));
+	rhashtable_walk_exit(&iter);
+
+	nfsd_file_dispose_list(&dispose);
+
+	if (closed)
+		pr_info("nfsd: closed %u cached file handle%s on %s\n",
+			closed, closed == 1 ? "" : "s", sb->s_id);
+}
+
 static struct nfsd_fcache_disposal *
 nfsd_alloc_fcache_disposal(void)
 {
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index b383dbc5b921..66ca7fc6189b 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
+void nfsd_file_close_sb(struct super_block *sb);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
 __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/nfsd/sb_watch.c b/fs/nfsd/sb_watch.c
index 8f711956a12e..34e50afe566c 100644
--- a/fs/nfsd/sb_watch.c
+++ b/fs/nfsd/sb_watch.c
@@ -65,6 +65,7 @@ static void nfsd_sb_revoke_work(struct work_struct *work)
 	/* Errors are logged by lockd; no recovery is possible. */
 	(void)nlmsvc_unlock_all_by_sb(watch->sb);
 	nfsd4_revoke_states(nn, watch->sb);
+	nfsd_file_close_sb(watch->sb);
 
 	pr_info("nfsd: state revocation for %s complete\n", watch->sb->s_id);
 
@@ -257,6 +258,15 @@ void nfsd_sb_watch_shutdown(struct nfsd_net *nn)
 {
 	umount_unregister_notifier(&nn->nfsd_umount_notifier);
 	nfsd_sb_watches_destroy(nn);
+	/*
+	 * Ensure any work items still running complete before shutdown
+	 * proceeds. This handles the case where an unmount notification
+	 * returned early due to signal interruption but the work function
+	 * is still executing nfsd_file_close_sb(). Without this flush,
+	 * nfsd_file_cache_shutdown() could destroy the slab while the
+	 * work function is still disposing file cache entries.
+	 */
+	flush_workqueue(nfsd_sb_watch_wq);
 }
 
 int nfsd_sb_watch_init(void)
-- 
2.53.0



* Re: [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount
  2026-02-24 16:39 [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
                   ` (2 preceding siblings ...)
  2026-02-24 16:39 ` [PATCH v3 3/3] nfsd: close cached files on filesystem unmount Chuck Lever
@ 2026-02-24 17:14 ` Al Viro
  3 siblings, 0 replies; 30+ messages in thread
From: Al Viro @ 2026-02-24 17:14 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-fsdevel, Chuck Lever

On Tue, Feb 24, 2026 at 11:39:05AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> When an NFS server exports a filesystem and clients hold NFSv4 state
> (opens, locks, delegations), unmounting the underlying filesystem
> fails with EBUSY. The /proc/fs/nfsd/unlock_fs interface exists for
> administrators to manually revoke state before retrying the unmount,
> but this approach has significant operational drawbacks.
> 
> Manual intervention breaks automation workflows. Containerized NFS
> servers, orchestration systems, and unattended maintenance scripts
> cannot reliably unmount exported filesystems without implementing
> custom logic to detect the failure and invoke unlock_fs. System
> administrators managing many exports face tedious, error-prone
> procedures when decommissioning storage.
> 
> This series enables the NFS server to detect filesystem unmount
> events and automatically revoke associated state. The mechanism
> registers with a new SRCU notifier chain in VFS that fires during
> mount teardown, after processing stuck children but before
> fsnotify_vfsmount_delete(), while SB_ACTIVE is still set. When a
> filesystem is unmounted, all NFSv4 opens, locks, and delegations
> referencing it are revoked, async COPY operations are cancelled
> with NFS4ERR_ADMIN_REVOKED sent to clients, NLM locks are released,
> and cached file handles are closed.
> 
> With automatic revocation, unmount operations complete without
> administrator intervention once the brief state cleanup finishes.
> Clients receive immediate notification of state loss through
> standard NFSv4 error codes, allowing applications to handle the
> situation appropriately rather than encountering silent failures.

So anyone can force that just with unshare -U -m date?  Creates
a new namespace, populated by clones of all mounts you see, runs
date(1) in that, then exits, with namespace dissolved.  At that
point all cloned mounts are released, each triggering your notifier
chain...
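
A toy userspace model (not kernel code) of why that matters: every
clone of a mount shares the exported superblock, so a per-mount hook
fires once per clone even though the original mount never goes away.
All names here are made up for illustration, and the counter merely
stands in for the notifier call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's. */
struct toy_sb {
	int s_active;       /* mounts currently referencing this superblock */
	int notifications;  /* how many times the per-mount hook fired */
};

struct toy_mount {
	struct toy_sb *sb;
};

/* Model of mounting, or of cloning a mount into a new namespace. */
static void toy_mount_attach(struct toy_mount *m, struct toy_sb *sb)
{
	m->sb = sb;
	sb->s_active++;
}

/* Model of per-mount teardown: the hook fires for every mount released. */
static void toy_mount_release(struct toy_mount *m)
{
	assert(m->sb != NULL && m->sb->s_active > 0);
	m->sb->notifications++;  /* stand-in for the notifier chain call */
	m->sb->s_active--;
	m->sb = NULL;
}

/*
 * Simulate nr_runs namespace create/dissolve cycles against one
 * superblock whose original mount stays in place throughout.
 * Returns the number of notifications observed.
 */
static int toy_unshare_runs(struct toy_sb *sb, int nr_runs)
{
	struct toy_mount original, clone;
	int i;

	toy_mount_attach(&original, sb);
	for (i = 0; i < nr_runs; i++) {
		toy_mount_attach(&clone, sb);  /* namespace created */
		toy_mount_release(&clone);     /* namespace dissolved */
	}
	return sb->notifications;
}
```

In this model, five create/dissolve cycles produce five notifications
while s_active never drops below one, i.e. the filesystem was never
actually unmounted from the exporter's point of view.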


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-24 16:39 ` [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification Chuck Lever
@ 2026-02-26  8:48   ` Christian Brauner
  2026-02-26 10:52     ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Brauner @ 2026-02-26  8:48 UTC (permalink / raw)
  To: Chuck Lever, Jan Kara, Amir Goldstein
  Cc: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-fsdevel, Chuck Lever

On Tue, Feb 24, 2026 at 11:39:06AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Kernel subsystems occasionally need notification when a filesystem
> is unmounted. Until now, the only mechanism available is the fs_pin
> infrastructure, which has limited adoption (only BSD process
> accounting uses it) and VFS maintainers consider it deprecated.
> 
> Add an SRCU notifier chain that fires during mount teardown,
> following the pattern established by lease_notifier_chain in
> fs/locks.c. The notifier fires after processing stuck children but
> before fsnotify_vfsmount_delete(), at which point SB_ACTIVE is
> still set and the superblock remains fully accessible.

What I don't understand is why you need this per-mount, especially
because you say above "when a filesystem is unmounted". Could you explain
this in some more detail, please?

Also this should take namespaces into account somehow, right? As Al
correctly observed, anything that does CLONE_NEWNS and inherits your
mount table will generate notifications. Like, if systemd spawns services,
if a container runtime starts, if someone uses unshare, you'll get
absolutely flooded with events. I'm pretty sure that is not what you
want and that is defo not what the VFS should do...

Another thing: These ad-hoc notifiers are horrific. So I'm pitching
another idea and I hope that Jan and Amir can tell me that this is
doable...

Can we extend fsnotify so that it's possible for a filesystem to
register "internal watches" on relevant objects such as mounts and
superblocks and get notified and execute blocking stuff if needed?

Then we don't have to add another set of custom notification mechanisms
but have it uniformly available in a single subsystem.

> The SRCU notifier type is chosen because:
>  - Unmount is relatively infrequent, so the overhead of SRCU
>    registration and unregistration is acceptable
>  - Callbacks run in process context and may sleep
>  - No cache bounces occur during chain traversal
> 
> NFSD requires this mechanism to revoke NFSv4 state (opens, locks,
> delegations) and release cached file handles when a filesystem is
> unmounted, avoiding EBUSY errors that occur when client state pins
> the mount.
> 
> Suggested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/namespace.c        | 69 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mount.h |  4 +++
>  2 files changed, 73 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ebe19ded293a..269e007e9312 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -34,6 +34,7 @@
>  #include <linux/mnt_idmapping.h>
>  #include <linux/pidfs.h>
>  #include <linux/nstree.h>
> +#include <linux/notifier.h>
>  
>  #include "pnode.h"
>  #include "internal.h"
> @@ -73,6 +74,70 @@ static u64 event;
>  static DEFINE_XARRAY_FLAGS(mnt_id_xa, XA_FLAGS_ALLOC);
>  static DEFINE_IDA(mnt_group_ida);
>  
> +/*
> + * Kernel subsystems can register to be notified when a filesystem is
> + * unmounted. This is used by (e.g.) nfsd to revoke state associated
> + * with files on the filesystem being unmounted.
> + */
> +static struct srcu_notifier_head umount_notifier_chain;
> +
> +/**
> + * umount_register_notifier - register for unmount notifications
> + * @nb: notifier_block to register
> + *
> + * Registers a notifier to be called when any filesystem is
> + * unmounted. The callback is invoked after stuck children are
> + * processed but before fsnotify_vfsmount_delete(), while SB_ACTIVE
> + * is still set and the superblock remains fully accessible.
> + *
> + * Callback signature:
> + *   int (*callback)(struct notifier_block *nb,
> + *                   unsigned long val, void *data)
> + *
> + *   @val:  always 0 (reserved for future extension)
> + *   @data: struct super_block * for the unmounting filesystem
> + *
> + * Callbacks run in process context and may sleep. Return
> + * NOTIFY_DONE from the callback; return values are ignored and
> + * cannot prevent unmount. Callbacks must handle their own error
> + * recovery internally.
> + *
> + * The notification fires once per mount instance. Bind mounts of
> + * the same filesystem trigger multiple callbacks with the same
> + * super_block pointer; callbacks must handle duplicate
> + * notifications idempotently.
> + *
> + * The super_block pointer is valid only for the duration of the
> + * callback. Callbacks must not retain this pointer for
> + * asynchronous use; to access the filesystem after the callback
> + * returns, acquire a separate reference (e.g., via an open file)
> + * during callback execution.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int umount_register_notifier(struct notifier_block *nb)
> +{
> +	return srcu_notifier_chain_register(&umount_notifier_chain, nb);
> +}
> +EXPORT_SYMBOL_GPL(umount_register_notifier);
> +
> +/**
> + * umount_unregister_notifier - unregister an unmount notifier
> + * @nb: notifier_block to unregister
> + *
> + * Unregisters a previously registered notifier. This function may
> + * block due to SRCU synchronization.
> + *
> + * Must not be called from within a notifier callback; doing so
> + * causes deadlock. Must be called before module unload if the
> + * notifier_block resides in module memory.
> + */
> +void umount_unregister_notifier(struct notifier_block *nb)
> +{
> +	srcu_notifier_chain_unregister(&umount_notifier_chain, nb);
> +}
> +EXPORT_SYMBOL_GPL(umount_unregister_notifier);
> +
>  /* Don't allow confusion with old 32bit mount ID */
>  #define MNT_UNIQUE_ID_OFFSET (1ULL << 31)
>  static u64 mnt_id_ctr = MNT_UNIQUE_ID_OFFSET;
> @@ -1307,6 +1372,8 @@ static void cleanup_mnt(struct mount *mnt)
>  		hlist_del(&m->mnt_umount);
>  		mntput(&m->mnt);
>  	}
> +	/* Notify registrants before superblock deactivation */
> +	srcu_notifier_call_chain(&umount_notifier_chain, 0, mnt->mnt.mnt_sb);
>  	fsnotify_vfsmount_delete(&mnt->mnt);
>  	dput(mnt->mnt.mnt_root);
>  	deactivate_super(mnt->mnt.mnt_sb);
> @@ -6189,6 +6256,8 @@ void __init mnt_init(void)
>  {
>  	int err;
>  
> +	srcu_init_notifier_head(&umount_notifier_chain);
> +
>  	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
>  			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>  
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index acfe7ef86a1b..9a46ab40dffd 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -21,6 +21,7 @@ struct file_system_type;
>  struct fs_context;
>  struct file;
>  struct path;
> +struct notifier_block;
>  
>  enum mount_flags {
>  	MNT_NOSUID	= 0x01,
> @@ -109,4 +110,7 @@ extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
>  
>  extern int cifs_root_data(char **dev, char **opts);
>  
> +int umount_register_notifier(struct notifier_block *nb);
> +void umount_unregister_notifier(struct notifier_block *nb);
> +
>  #endif /* _LINUX_MOUNT_H */
> -- 
> 2.53.0
> 


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-26  8:48   ` Christian Brauner
@ 2026-02-26 10:52     ` Amir Goldstein
  2026-02-26 13:27       ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2026-02-26 10:52 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jan Kara, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever,
	Christian Brauner

On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Feb 24, 2026 at 11:39:06AM -0500, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > Kernel subsystems occasionally need notification when a filesystem
> > is unmounted. Until now, the only mechanism available is the fs_pin
> > infrastructure, which has limited adoption (only BSD process
> > accounting uses it) and VFS maintainers consider it deprecated.
> >
> > Add an SRCU notifier chain that fires during mount teardown,
> > following the pattern established by lease_notifier_chain in
> > fs/locks.c. The notifier fires after processing stuck children but
> > before fsnotify_vfsmount_delete(), at which point SB_ACTIVE is
> > still set and the superblock remains fully accessible.

Did you see commit 74bd284537b34 ("fsnotify: Shutdown fsnotify
before destroying sb's dcache")?

Does it make the fsnotify_sb_delete() hook an appropriate place
for this cleanup?

We could send an FS_UNMOUNT event on sb, the same way as we send
it on inode in fsnotify_unmount_inodes().

>
> What I don't understand is why you need this per-mount especially
> because you say above "when a filesystem is mounted. Could you explain
> this in some more details, please?
>

The confusing thing is that FS_UNMOUNT/IN_UNMOUNT are sent
for inotify when the sb is destroyed, not when the mount is unmounted.

If we wanted we could also send FS_UNMOUNT in fsnotify_vfsmount_delete(),
but that would be too confusing.

I think the only reason that we did not add fanotify support for FAN_UNMOUNT
is this name confusion, but there could be other reasons which I don't
remember.

> Also this should take namespaces into account somehow, right? As Al
> correctly observed anything that does CLONE_NEWNS and inherits your
> mountable will generate notifications. Like, if systemd spawns services,
> if a container runtime start, if someone uses unshare you'll get
> absolutely flooded with events. I'm pretty sure that is not what you
> want and that is defo not what the VFS should do...
>
> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
> another idea and I hope that Jan and Amir can tell me that this is
> doable...
>
> Can we extend fsnotify so that it's possible for a filesystem to
> register "internal watches" on relevant objects such as mounts and
> superblocks and get notified and execute blocking stuff if needed.
>

You mean like nfsd_file_fsnotify_group? ;)

> Then we don't have to add another set of custom notification mechanisms
> but have it available in a single subsystem and uniformely available.
>

I don't see a problem with nfsd registering for FS_UNMOUNT
event on sb (once we add it).

As a matter of fact, I think that nfsd can already add an inode
mark on the export root path for FS_UNMOUNT event.

Thanks,
Amir.


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-26 10:52     ` Amir Goldstein
@ 2026-02-26 13:27       ` Chuck Lever
  2026-02-26 13:32         ` Jan Kara
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-02-26 13:27 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever,
	Christian Brauner

On 2/26/26 5:52 AM, Amir Goldstein wrote:
> On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
>>
>> On Tue, Feb 24, 2026 at 11:39:06AM -0500, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> Kernel subsystems occasionally need notification when a filesystem
>>> is unmounted. Until now, the only mechanism available is the fs_pin
>>> infrastructure, which has limited adoption (only BSD process
>>> accounting uses it) and VFS maintainers consider it deprecated.
>>>
>>> Add an SRCU notifier chain that fires during mount teardown,
>>> following the pattern established by lease_notifier_chain in
>>> fs/locks.c. The notifier fires after processing stuck children but
>>> before fsnotify_vfsmount_delete(), at which point SB_ACTIVE is
>>> still set and the superblock remains fully accessible.
> 
> Did you see commit 74bd284537b34 ("fsnotify: Shutdown fsnotify
> before destroying sb's dcache")?
> 
> Does it make the fsnotify_sb_delete() hook an appropriate place
> for this cleanup?
> 
> We could send an FS_UNMOUNT event on sb, the same way as we send
> it on inode in fsnotify_unmount_inodes().
> 
>>
>> What I don't understand is why you need this per-mount especially
>> because you say above "when a filesystem is mounted. Could you explain
>> this in some more details, please?
>>
> 
> The confusing thing is that FS_UNMOUNT/IN_UNMOUNT are sent
> for inotify when the sb is destroyed, not when the mount is unmounted.
> 
> If we wanted we could also send FS_UNMOUNT in fsnotify_vfsmount_delete(),
> but that would be too confusing.
> 
> I think the only reason that we did not add fanotify support for FAN_UNMOUNT
> is this name confusion, but there could be other reasons which I don't
> remember.
> 
>> Also this should take namespaces into account somehow, right? As Al
>> correctly observed anything that does CLONE_NEWNS and inherits your
>> mountable will generate notifications. Like, if systemd spawns services,
>> if a container runtime start, if someone uses unshare you'll get
>> absolutely flooded with events. I'm pretty sure that is not what you
>> want and that is defo not what the VFS should do...

I agree with Al's earlier comment and have added some protection there
for the next revision of the series.


>> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
>> another idea and I hope that Jan and Amir can tell me that this is
>> doable...
>>
>> Can we extend fsnotify so that it's possible for a filesystem to
>> register "internal watches" on relevant objects such as mounts and
>> superblocks and get notified and execute blocking stuff if needed.
>>
> 
> You mean like nfsd_file_fsnotify_group? ;)
> 
>> Then we don't have to add another set of custom notification mechanisms
>> but have it available in a single subsystem and uniformely available.
>>
> 
> I don't see a problem with nfsd registering for FS_UNMOUNT
> event on sb (once we add it).
> 
> As a matter of fact, I think that nfsd can already add an inode
> mark on the export root path for FS_UNMOUNT event.

There isn't much required here aside from getting a synchronous notice
that the final file system unmount is going on. I'm happy to try
whatever mechanism VFS maintainers are most comfortable with.


-- 
Chuck Lever


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-26 13:27       ` Chuck Lever
@ 2026-02-26 13:32         ` Jan Kara
  2026-02-27 15:10           ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2026-02-26 13:32 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Amir Goldstein, Jan Kara, NeilBrown, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever, Christian Brauner

On Thu 26-02-26 08:27:00, Chuck Lever wrote:
> On 2/26/26 5:52 AM, Amir Goldstein wrote:
> > On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
> >> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
> >> another idea and I hope that Jan and Amir can tell me that this is
> >> doable...
> >>
> >> Can we extend fsnotify so that it's possible for a filesystem to
> >> register "internal watches" on relevant objects such as mounts and
> >> superblocks and get notified and execute blocking stuff if needed.
> >>
> > 
> > You mean like nfsd_file_fsnotify_group? ;)
> > 
> >> Then we don't have to add another set of custom notification mechanisms
> >> but have it available in a single subsystem and uniformely available.
> >>
> > 
> > I don't see a problem with nfsd registering for FS_UNMOUNT
> > event on sb (once we add it).
> > 
> > As a matter of fact, I think that nfsd can already add an inode
> > mark on the export root path for FS_UNMOUNT event.
> 
> There isn't much required here aside from getting a synchronous notice
> that the final file system unmount is going on. I'm happy to try
> whatever mechanism VFS maintainers are most comfortable with.

Yeah, then as Amir writes placing a mark with FS_UNMOUNT event on the
export root path and handling the event in
nfsd_file_fsnotify_handle_event() should do what you need?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-26 13:32         ` Jan Kara
@ 2026-02-27 15:10           ` Chuck Lever
  2026-03-01 14:37             ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-02-27 15:10 UTC (permalink / raw)
  To: Jan Kara, Amir Goldstein, Christian Brauner
  Cc: Jan Kara, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever

On 2/26/26 8:32 AM, Jan Kara wrote:
> On Thu 26-02-26 08:27:00, Chuck Lever wrote:
>> On 2/26/26 5:52 AM, Amir Goldstein wrote:
>>> On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
>>>> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
>>>> another idea and I hope that Jan and Amir can tell me that this is
>>>> doable...
>>>>
>>>> Can we extend fsnotify so that it's possible for a filesystem to
>>>> register "internal watches" on relevant objects such as mounts and
>>>> superblocks and get notified and execute blocking stuff if needed.
>>>>
>>>
>>> You mean like nfsd_file_fsnotify_group? ;)
>>>
>>>> Then we don't have to add another set of custom notification mechanisms
>>>> but have it available in a single subsystem and uniformely available.
>>>>
>>>
>>> I don't see a problem with nfsd registering for FS_UNMOUNT
>>> event on sb (once we add it).
>>>
>>> As a matter of fact, I think that nfsd can already add an inode
>>> mark on the export root path for FS_UNMOUNT event.
>>
>> There isn't much required here aside from getting a synchronous notice
>> that the final file system unmount is going on. I'm happy to try
>> whatever mechanism VFS maintainers are most comfortable with.
> 
> Yeah, then as Amir writes placing a mark with FS_UNMOUNT event on the
> export root path and handling the event in
> nfsd_file_fsnotify_handle_event() should do what you need?

Turns out FS_UNMOUNT doesn't do what I need.

1/3 here has a fatal flaw: the SRCU notifier does not fire until all
files on the mount are closed. The problem is that NFSD holds files
open when there is outstanding NFSv4 state. So on umount, the SRCU
notifier never fires to release that state.

FS_UNMOUNT notifiers have the same issue.

They fire from fsnotify_sb_delete() inside generic_shutdown_super(),
which runs inside deactivate_locked_super(), which runs when s_active
drops to 0. That requires all mounts to be freed, which requires all
NFSD files to be closed: the same problem.

For any notification approach to actually do what is needed, it needs to
fire during do_umount(), before propagate_mount_busy(). Something like:

do_umount(mnt):
    <- NEW: notify subsystems, allow them to release file refs
    retval = propagate_mount_busy(mnt, 2)   // now passes
    umount_tree(mnt, ...)

This is what Christian's "internal watches... execute blocking stuff"
would need to enable. The existing fsnotify plumbing (groups, marks,
event dispatch) provides the infrastructure, but a new notification hook
in do_umount() is required — neither fsnotify_vfsmount_delete() nor
fsnotify_sb_delete() fires early enough.
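
To make the ordering problem concrete, here is a minimal userspace
sketch (a toy model, not kernel code). It assumes a mount is "busy"
whenever its reference count exceeds a baseline of 2, mirroring the 2
in the pseudocode above, and that each file held open for NFSv4 state
adds one reference. Every name in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_BASE_REFS 2  /* mirrors the "2" in the pseudocode above */

/* Toy model of a mount pinned by files a subscriber holds open. */
struct model_mnt {
	int refs;        /* baseline references plus one per open file */
	int held_files;  /* files the subscriber holds open */
};

/* Hypothetical subscriber action: revoke state, close the held files. */
static void model_release(struct model_mnt *m)
{
	assert(m->refs >= m->held_files);
	m->refs -= m->held_files;
	m->held_files = 0;
}

static int model_mount_busy(const struct model_mnt *m)
{
	return m->refs > MODEL_BASE_REFS;
}

/* Hook placed after the busy check: never reached while files are open. */
static int model_umount_late_hook(struct model_mnt *m)
{
	if (model_mount_busy(m))
		return -EBUSY;
	model_release(m);
	return 0;
}

/* Hook placed before the busy check: the subscriber can drop its refs. */
static int model_umount_early_hook(struct model_mnt *m)
{
	model_release(m);
	if (model_mount_busy(m))
		return -EBUSY;
	return 0;
}
```

With three files held open, the late-hook variant returns -EBUSY and
leaves the files open, while the early-hook variant succeeds. That is
the whole argument for placing the notification ahead of the busy
check.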

But a hook in do_umount() fires for every mount namespace teardown, not
just admin-initiated unmounts. NFSD's callback would need to filter
(e.g., only act when it's the last mount of a superblock that NFSD is
exporting).
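
A rough sketch of such a filter, again as a userspace toy rather than
kernel code. The struct, its fields, and the "last mount" test are
assumptions made for illustration; how NFSD would actually learn the
mount count is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-superblock facts the filter would need. */
struct model_sb {
	int nr_mounts;  /* mounts currently attached for this superblock */
	bool exported;  /* hypothetical flag: NFSD exports this superblock */
};

/* Decide whether a pre-unmount notification should trigger revocation. */
static bool model_should_revoke(const struct model_sb *sb)
{
	assert(sb != NULL);
	if (!sb->exported)
		return false;           /* not a filesystem NFSD cares about */
	return sb->nr_mounts == 1;  /* the mount going away is the last one */
}
```

Under these assumptions, a namespace clone being torn down while other
mounts remain is ignored, and only the unmount of the final mount of an
exported superblock triggers revocation.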

This is why I originally went with fs_pin. Not saying the series should
go back to that, but this is the basic requirement: NFSD needs
notification of a umount request while files are still open on that
mount, so that it can revoke the NFSv4 state and close those files.


-- 
Chuck Lever


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-02-27 15:10           ` Chuck Lever
@ 2026-03-01 14:37             ` Amir Goldstein
  2026-03-01 17:20               ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2026-03-01 14:37 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jan Kara, Christian Brauner, Jan Kara, NeilBrown, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever

On Fri, Feb 27, 2026 at 4:10 PM Chuck Lever <cel@kernel.org> wrote:
>
> On 2/26/26 8:32 AM, Jan Kara wrote:
> > On Thu 26-02-26 08:27:00, Chuck Lever wrote:
> >> On 2/26/26 5:52 AM, Amir Goldstein wrote:
> >>> On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
> >>>> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
> >>>> another idea and I hope that Jan and Amir can tell me that this is
> >>>> doable...
> >>>>
> >>>> Can we extend fsnotify so that it's possible for a filesystem to
> >>>> register "internal watches" on relevant objects such as mounts and
> >>>> superblocks and get notified and execute blocking stuff if needed.
> >>>>
> >>>
> >>> You mean like nfsd_file_fsnotify_group? ;)
> >>>
> >>>> Then we don't have to add another set of custom notification mechanisms
> >>>> but have it available in a single subsystem and uniformely available.
> >>>>
> >>>
> >>> I don't see a problem with nfsd registering for FS_UNMOUNT
> >>> event on sb (once we add it).
> >>>
> >>> As a matter of fact, I think that nfsd can already add an inode
> >>> mark on the export root path for FS_UNMOUNT event.
> >>
> >> There isn't much required here aside from getting a synchronous notice
> >> that the final file system unmount is going on. I'm happy to try
> >> whatever mechanism VFS maintainers are most comfortable with.
> >
> > Yeah, then as Amir writes placing a mark with FS_UNMOUNT event on the
> > export root path and handling the event in
> > nfsd_file_fsnotify_handle_event() should do what you need?
>
> Turns out FS_UNMOUNT doesn't do what I need.
>
> 1/3 here has a fatal flaw: the SRCU notifier does not fire until all
> files on the mount are closed. The problem is that NFSD holds files
> open when there is outstanding NFSv4 state. So the SRCU notifier will
> never fire, on umount, to release that state.
>
> FS_UNMOUNT notifiers have the same issue.
>
> They fire from fsnotify_sb_delete() inside generic_shutdown_super(),
> which runs inside deactivate_locked_super(), which runs when s_active
> drops to 0. That requires all mounts to be freed, which requires all
> NFSD files to be closed: the same problem.
>
> For any notification approach to actually do what is needed, it needs to
> fire during do_umount(), before propagate_mount_busy(). Something like:
>
> do_umount(mnt):
>     <- NEW: notify subsystems, allow them to release file refs
>     retval = propagate_mount_busy(mnt, 2)   // now passes
>     umount_tree(mnt, ...)
>
> This is what Christian's "internal watches... execute blocking stuff"
> would need to enable. The existing fsnotify plumbing (groups, marks,
> event dispatch) provides the infrastructure, but a new notification hook
> in do_umount() is required — neither fsnotify_vfsmount_delete() nor
> fsnotify_sb_delete() fires early enough.
>
> But a hook in do_umount() fires for every mount namespace teardown, not
> just admin-initiated unmounts. NFSD's callback would need to filter
> (e.g., only act when it's the last mount of a superblock that NFSD is
> exporting).
>
> This is why I originally went with fs_pin. Not saying the series should
> go back to that, but this is the basic requirement: NFSD needs
> notification of a umount request while files are still open on that
> mount, so that it can revoke the NFSv4 state and close those files.
>

I understand the problem with FS_UNMOUNT, but I fail to understand
the desired semantics, specifically "the last mount of a superblock
that NFSD is exporting".

One option is that nfsd will use a private mount clone for accessing
files as overlayfs does and wait until all the other mounts are gone,
but FWIW that has some user visible implications.

Then we can enable subscribing to FS_MNT_DETACH events on a
super_block.
The fanotify UAPI currently only allows subscribing to them on a mntns,
but allowing internal users to subscribe to all the unmounts of a sb
should be as simple as below (famous last words).

Thanks,
Amir.


diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 9995de1710e59..0abe16db3636c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -695,6 +695,7 @@ void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
 {
        struct fsnotify_mnt data = {
                .ns = ns,
+               .sb = mnt->mnt_sb,
                .mnt_id = real_mount(mnt)->mnt_id_unique,
        };

diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 95985400d3d8e..c21fae333f0dc 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -332,6 +332,7 @@ static inline const struct path *file_range_path(const struct file_range *range)

 struct fsnotify_mnt {
        const struct mnt_namespace *ns;
+       struct super_block *sb;
        u64 mnt_id;
 };

@@ -395,6 +396,8 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
                return file_range_path(data)->dentry->d_sb;
        case FSNOTIFY_EVENT_ERROR:
                return ((struct fs_error_report *) data)->sb;
+       case FSNOTIFY_EVENT_MNT:
+               return ((struct fsnotify_mnt *)data)->sb;
        default:
                return NULL;
        }


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-01 14:37             ` Amir Goldstein
@ 2026-03-01 17:20               ` Chuck Lever
  2026-03-01 18:09                 ` Amir Goldstein
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-03-01 17:20 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Jan Kara, NeilBrown, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever



On Sun, Mar 1, 2026, at 9:37 AM, Amir Goldstein wrote:
> On Fri, Feb 27, 2026 at 4:10 PM Chuck Lever <cel@kernel.org> wrote:
>>
>> On 2/26/26 8:32 AM, Jan Kara wrote:
>> > On Thu 26-02-26 08:27:00, Chuck Lever wrote:
>> >> On 2/26/26 5:52 AM, Amir Goldstein wrote:
>> >>> On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
>> >>>> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
>> >>>> another idea and I hope that Jan and Amir can tell me that this is
>> >>>> doable...
>> >>>>
>> >>>> Can we extend fsnotify so that it's possible for a filesystem to
>> >>>> register "internal watches" on relevant objects such as mounts and
>> >>>> superblocks and get notified and execute blocking stuff if needed.
>> >>>>
>> >>>
>> >>> You mean like nfsd_file_fsnotify_group? ;)
>> >>>
>> >>>> Then we don't have to add another set of custom notification mechanisms
>> >>>> but have it available in a single subsystem and uniformely available.
>> >>>>
>> >>>
>> >>> I don't see a problem with nfsd registering for FS_UNMOUNT
>> >>> event on sb (once we add it).
>> >>>
>> >>> As a matter of fact, I think that nfsd can already add an inode
>> >>> mark on the export root path for FS_UNMOUNT event.
>> >>
>> >> There isn't much required here aside from getting a synchronous notice
>> >> that the final file system unmount is going on. I'm happy to try
>> >> whatever mechanism VFS maintainers are most comfortable with.
>> >
>> > Yeah, then as Amir writes placing a mark with FS_UNMOUNT event on the
>> > export root path and handling the event in
>> > nfsd_file_fsnotify_handle_event() should do what you need?
>>
>> Turns out FS_UNMOUNT doesn't do what I need.
>>
>> 1/3 here has a fatal flaw: the SRCU notifier does not fire until all
>> files on the mount are closed. The problem is that NFSD holds files
>> open when there is outstanding NFSv4 state. So the SRCU notifier will
>> never fire, on umount, to release that state.
>>
>> FS_UNMOUNT notifiers have the same issue.
>>
>> They fire from fsnotify_sb_delete() inside generic_shutdown_super(),
>> which runs inside deactivate_locked_super(), which runs when s_active
>> drops to 0. That requires all mounts to be freed, which requires all
>> NFSD files to be closed: the same problem.
>>
>> For any notification approach to actually do what is needed, it needs to
>> fire during do_umount(), before propagate_mount_busy(). Something like:
>>
>> do_umount(mnt):
>>     <- NEW: notify subsystems, allow them to release file refs
>>     retval = propagate_mount_busy(mnt, 2)   // now passes
>>     umount_tree(mnt, ...)
>>
>> This is what Christian's "internal watches... execute blocking stuff"
>> would need to enable. The existing fsnotify plumbing (groups, marks,
>> event dispatch) provides the infrastructure, but a new notification hook
>> in do_umount() is required — neither fsnotify_vfsmount_delete() nor
>> fsnotify_sb_delete() fires early enough.
>>
>> But a hook in do_umount() fires for every mount namespace teardown, not
>> just admin-initiated unmounts. NFSD's callback would need to filter
>> (e.g., only act when it's the last mount of a superblock that NFSD is
>> exporting).
>>
>> This is why I originally went with fs_pin. Not saying the series should
>> go back to that, but this is the basic requirement: NFSD needs
>> notification of a umount request while files are still open on that
>> mount, so that it can revoke the NFSv4 state and close those files.
>>
>
> I understand the problem with FS_UNMOUNT, but I fail to understand
> the desired semantics, specifically the "the last mount of a superblock
> that NFSD is exporting".

Perhaps that description nails down too much implementation detail,
and it might be stale. A broader description is this user story:

"As a system administrator, I'd like to be able to unexport an NFSD
share that is being accessed by NFSv4 clients, and then unmount it,
reliably (for example, via automation). Currently the umount step
hangs if there are still outstanding delegations granted to the NFSv4
clients."

The discussion here has added some interesting corner cases: NFSD
can export bind mounts (portions of a local physical file system);
unprivileged users can create and umount file systems using "share".

The goal is to make umount behavior more deterministic without
having to insert additional administrative steps.

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-01 17:20               ` Chuck Lever
@ 2026-03-01 18:09                 ` Amir Goldstein
  2026-03-01 18:19                   ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2026-03-01 18:09 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jan Kara, Christian Brauner, Jan Kara, NeilBrown, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever

On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>
>
>
> On Sun, Mar 1, 2026, at 9:37 AM, Amir Goldstein wrote:
> > On Fri, Feb 27, 2026 at 4:10 PM Chuck Lever <cel@kernel.org> wrote:
> >>
> >> On 2/26/26 8:32 AM, Jan Kara wrote:
> >> > On Thu 26-02-26 08:27:00, Chuck Lever wrote:
> >> >> On 2/26/26 5:52 AM, Amir Goldstein wrote:
> >> >>> On Thu, Feb 26, 2026 at 9:48 AM Christian Brauner <brauner@kernel.org> wrote:
> >> >>>> Another thing: These ad-hoc notifiers are horrific. So I'm pitching
> >> >>>> another idea and I hope that Jan and Amir can tell me that this is
> >> >>>> doable...
> >> >>>>
> >> >>>> Can we extend fsnotify so that it's possible for a filesystem to
> >> >>>> register "internal watches" on relevant objects such as mounts and
> >> >>>> superblocks and get notified and execute blocking stuff if needed.
> >> >>>>
> >> >>>
> >> >>> You mean like nfsd_file_fsnotify_group? ;)
> >> >>>
> >> >>>> Then we don't have to add another set of custom notification mechanisms
> >> >>>> but have it available in a single subsystem and uniformely available.
> >> >>>>
> >> >>>
> >> >>> I don't see a problem with nfsd registering for FS_UNMOUNT
> >> >>> event on sb (once we add it).
> >> >>>
> >> >>> As a matter of fact, I think that nfsd can already add an inode
> >> >>> mark on the export root path for FS_UNMOUNT event.
> >> >>
> >> >> There isn't much required here aside from getting a synchronous notice
> >> >> that the final file system unmount is going on. I'm happy to try
> >> >> whatever mechanism VFS maintainers are most comfortable with.
> >> >
> >> > Yeah, then as Amir writes placing a mark with FS_UNMOUNT event on the
> >> > export root path and handling the event in
> >> > nfsd_file_fsnotify_handle_event() should do what you need?
> >>
> >> Turns out FS_UNMOUNT doesn't do what I need.
> >>
> >> 1/3 here has a fatal flaw: the SRCU notifier does not fire until all
> >> files on the mount are closed. The problem is that NFSD holds files
> >> open when there is outstanding NFSv4 state. So the SRCU notifier will
> >> never fire, on umount, to release that state.
> >>
> >> FS_UNMOUNT notifiers have the same issue.
> >>
> >> They fire from fsnotify_sb_delete() inside generic_shutdown_super(),
> >> which runs inside deactivate_locked_super(), which runs when s_active
> >> drops to 0. That requires all mounts to be freed, which requires all
> >> NFSD files to be closed: the same problem.
> >>
> >> For any notification approach to actually do what is needed, it needs to
> >> fire during do_umount(), before propagate_mount_busy(). Something like:
> >>
> >> do_umount(mnt):
> >>     <- NEW: notify subsystems, allow them to release file refs
> >>     retval = propagate_mount_busy(mnt, 2)   // now passes
> >>     umount_tree(mnt, ...)
> >>
> >> This is what Christian's "internal watches... execute blocking stuff"
> >> would need to enable. The existing fsnotify plumbing (groups, marks,
> >> event dispatch) provides the infrastructure, but a new notification hook
> >> in do_umount() is required — neither fsnotify_vfsmount_delete() nor
> >> fsnotify_sb_delete() fires early enough.
> >>
> >> But a hook in do_umount() fires for every mount namespace teardown, not
> >> just admin-initiated unmounts. NFSD's callback would need to filter
> >> (e.g., only act when it's the last mount of a superblock that NFSD is
> >> exporting).
> >>
> >> This is why I originally went with fs_pin. Not saying the series should
> >> go back to that, but this is the basic requirement: NFSD needs
> >> notification of a umount request while files are still open on that
> >> mount, so that it can revoke the NFSv4 state and close those files.
> >>
> >
> > I understand the problem with FS_UNMOUNT, but I fail to understand
> > the desired semantics, specifically the "the last mount of a superblock
> > that NFSD is exporting".
>
> Perhaps that description nails down too much implementation detail,
> and it might be stale. A broader description is this user story:
>
> "As a system administrator, I'd like to be able to unexport an NFSD

Doesn't "unexporting" involve communicating to nfsd?
That is, calling svc_export_put() to path_put() the
share root path?

> share that is being accessed by NFSv4 clients, and then unmount it,
> reliably (for example, via automation). Currently the umount step
> hangs if there are still outstanding delegations granted to the NFSv4
> clients."

Can't svc_export_put() be the trigger for nfsd to release all resources
associated with this share?

>
> The discussion here has added some interesting corner cases: NFSD
> can export bind mounts (portions of a local physical file system);
> unprivileged users can create and umount file systems using "share".
>

The basic question is whether nfsd is exporting a mount, a filesystem
or something in between (i.e. a subtree of a filesystem).

AFAIK, the current implementation is that nfsd is actually exporting
one specific mount, so changing the properties of this mount
(e.g. readonly) would affect the exported share.

If nfsd wanted to export a subtree of a filesystem it could
in theory use a cloned private mount as overlayfs does, but
then things like audit logs may not have a full path if the original
path used to export the subtree was unmounted.

> The goal is to make umount behavior more deterministic without
> having to insert additional adminstrative steps.
>

I would say that if svc_export_put() succeeded, umount should
be able to work, but I realize that with automounts, the exports
are not actually administered from userspace all the way (?),
so perhaps the problem needs to be spelled out in more detail.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-01 18:09                 ` Amir Goldstein
@ 2026-03-01 18:19                   ` Chuck Lever
  2026-03-02  4:09                     ` NeilBrown
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-03-01 18:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Jan Kara, NeilBrown, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever



On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>> Perhaps that description nails down too much implementation detail,
>> and it might be stale. A broader description is this user story:
>>
>> "As a system administrator, I'd like to be able to unexport an NFSD
>
> Doesn't "unexporting" involve communicating to nfsd?
> Meaning calling to svc_export_put() to path_put() the
> share root path?
>
>> share that is being accessed by NFSv4 clients, and then unmount it,
>> reliably (for example, via automation). Currently the umount step
>> hangs if there are still outstanding delegations granted to the NFSv4
>> clients."
>
> Can't svc_export_put() be the trigger for nfsd to release all resources
> associated with this share?

Currently unexport does not revoke NFSv4 state. So, that would
be a user-visible behavior change. I suggested that approach a
few months ago to linux-nfs@ and there was push-back.


>> The discussion here has added some interesting corner cases: NFSD
>> can export bind mounts (portions of a local physical file system);
>> unprivileged users can create and umount file systems using "share".
>
> The basic question is whether nfsd is exporting a mount, a filesystem
> or something in between (i.e. a subtree of a filesystem).
>
> AFAIK, the current implementation is that nfsd is actually exporting
> one specific mount, so changing the properties of this mount
> (e.g. readonly) would affect the exported share.

AIUI NFSD can export starting at any arbitrary directory
that appears in the mount namespace. It does not have to
start at the local file system's root directory. But even
so, outstanding NFSv4 delegations on a narrow portion of
a mounted file system will still pin that mount.

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-01 18:19                   ` Chuck Lever
@ 2026-03-02  4:09                     ` NeilBrown
  2026-03-02 13:57                       ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2026-03-02  4:09 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel, Chuck Lever

On Mon, 02 Mar 2026, Chuck Lever wrote:
> 
> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> >> Perhaps that description nails down too much implementation detail,
> >> and it might be stale. A broader description is this user story:
> >>
> >> "As a system administrator, I'd like to be able to unexport an NFSD
> >
> > Doesn't "unexporting" involve communicating to nfsd?
> > Meaning calling to svc_export_put() to path_put() the
> > share root path?
> >
> >> share that is being accessed by NFSv4 clients, and then unmount it,
> >> reliably (for example, via automation). Currently the umount step
> >> hangs if there are still outstanding delegations granted to the NFSv4
> >> clients."
> >
> > Can't svc_export_put() be the trigger for nfsd to release all resources
> > associated with this share?
> 
> Currently unexport does not revoke NFSv4 state. So, that would
> be a user-visible behavior change. I suggested that approach a
> few months ago to linux-nfs@ and there was push-back.
> 

Could we add a "-F" or similar flag to "exportfs -u" which implements the
desired semantic?  i.e.  asking nfsd to release all locks and close all
state on the filesystem.

NeilBrown

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02  4:09                     ` NeilBrown
@ 2026-03-02 13:57                       ` Chuck Lever
  2026-03-02 15:26                         ` Jan Kara
  2026-03-02 17:01                         ` Chuck Lever
  0 siblings, 2 replies; 30+ messages in thread
From: Chuck Lever @ 2026-03-02 13:57 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel, Chuck Lever

On 3/1/26 11:09 PM, NeilBrown wrote:
> On Mon, 02 Mar 2026, Chuck Lever wrote:
>>
>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>>>> Perhaps that description nails down too much implementation detail,
>>>> and it might be stale. A broader description is this user story:
>>>>
>>>> "As a system administrator, I'd like to be able to unexport an NFSD
>>>
>>> Doesn't "unexporting" involve communicating to nfsd?
>>> Meaning calling to svc_export_put() to path_put() the
>>> share root path?
>>>
>>>> share that is being accessed by NFSv4 clients, and then unmount it,
>>>> reliably (for example, via automation). Currently the umount step
>>>> hangs if there are still outstanding delegations granted to the NFSv4
>>>> clients."
>>>
>>> Can't svc_export_put() be the trigger for nfsd to release all resources
>>> associated with this share?
>>
>> Currently unexport does not revoke NFSv4 state. So, that would
>> be a user-visible behavior change. I suggested that approach a
>> few months ago to linux-nfs@ and there was push-back.
>>
> 
> Could we add a "-F" or similar flag to "exportfs -u" which implements the
> desired semantic?  i.e.  asking nfsd to release all locks and close all
> state on the filesystem.

That meets my needs, but should be passed by the linux-nfs@ review
committee.

-F could probably just use the existing "unlock filesystem" API
after it does the unexport.
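
For illustration only, a minimal userspace sketch of the "unlock
filesystem" step such a flag might perform after the unexport. The helper
name nfsd_unlock_fs() is hypothetical and is not part of exportfs or
nfs-utils. The control file path is supplied by the caller (the cover
letter refers to it as /proc/fs/nfsd/unlock_fs), and the sketch assumes
the interface accepts the exported path followed by a newline.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write an exported path to an nfsd control file,
 * asking nfsd to release the state it holds on that filesystem.
 *
 * ctl_path is a parameter rather than a hard-coded /proc path so the
 * helper can be exercised against a scratch file.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static int nfsd_unlock_fs(const char *ctl_path, const char *fs_path)
{
	char buf[4096];
	ssize_t written;
	int len, fd, err;

	assert(ctl_path != NULL && fs_path != NULL);

	/* Assumption: the control file takes "<path>\n". */
	len = snprintf(buf, sizeof(buf), "%s\n", fs_path);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -ENAMETOOLONG;

	fd = open(ctl_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* A short write is treated as an I/O error, not retried. */
	written = write(fd, buf, (size_t)len);
	if (written == (ssize_t)len)
		err = 0;
	else if (written < 0)
		err = -errno;
	else
		err = -EIO;

	close(fd);
	return err;
}
```

An automation script would call this once per exported filesystem after
"exportfs -u" and before umount, and treat a non-zero return as a reason
not to proceed with the unmount.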


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 13:57                       ` Chuck Lever
@ 2026-03-02 15:26                         ` Jan Kara
  2026-03-02 17:10                           ` Jeff Layton
  2026-03-02 17:01                         ` Chuck Lever
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Kara @ 2026-03-02 15:26 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel, Chuck Lever

On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> On 3/1/26 11:09 PM, NeilBrown wrote:
> > On Mon, 02 Mar 2026, Chuck Lever wrote:
> >> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> >>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> >>>> Perhaps that description nails down too much implementation detail,
> >>>> and it might be stale. A broader description is this user story:
> >>>>
> >>>> "As a system administrator, I'd like to be able to unexport an NFSD
> >>>
> >>> Doesn't "unexporting" involve communicating to nfsd?
> >>> Meaning calling to svc_export_put() to path_put() the
> >>> share root path?
> >>>
> >>>> share that is being accessed by NFSv4 clients, and then unmount it,
> >>>> reliably (for example, via automation). Currently the umount step
> >>>> hangs if there are still outstanding delegations granted to the NFSv4
> >>>> clients."
> >>>
> >>> Can't svc_export_put() be the trigger for nfsd to release all resources
> >>> associated with this share?
> >>
> >> Currently unexport does not revoke NFSv4 state. So, that would
> >> be a user-visible behavior change. I suggested that approach a
> >> few months ago to linux-nfs@ and there was push-back.
> >>
> > 
> > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > state on the filesystem.
> 
> That meets my needs, but should be passed by the linux-nfs@ review
> committee.
> 
> -F could probably just use the existing "unlock filesystem" API
> after it does the unexport.

If this option flies, then I guess it is the most sensible variant. If it
doesn't work for some reason, then something like ->umount_begin sb
callback could be twisted (may possibly need some extension) to provide
the needed notification? At least in my naive understanding it was created
for use cases like this...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 13:57                       ` Chuck Lever
  2026-03-02 15:26                         ` Jan Kara
@ 2026-03-02 17:01                         ` Chuck Lever
  2026-03-02 20:36                           ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-03-02 17:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On 3/2/26 8:57 AM, Chuck Lever wrote:
> On 3/1/26 11:09 PM, NeilBrown wrote:
>> On Mon, 02 Mar 2026, Chuck Lever wrote:
>>>
>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>>>>> Perhaps that description nails down too much implementation detail,
>>>>> and it might be stale. A broader description is this user story:
>>>>>
>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
>>>>
>>>> Doesn't "unexporting" involve communicating to nfsd?
>>>> Meaning calling to svc_export_put() to path_put() the
>>>> share root path?
>>>>
>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
>>>>> reliably (for example, via automation). Currently the umount step
>>>>> hangs if there are still outstanding delegations granted to the NFSv4
>>>>> clients."
>>>>
>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
>>>> associated with this share?
>>>
>>> Currently unexport does not revoke NFSv4 state. So, that would
>>> be a user-visible behavior change. I suggested that approach a
>>> few months ago to linux-nfs@ and there was push-back.
>>>
>>
>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
>> desired semantic?  i.e.  asking nfsd to release all locks and close all
>> state on the filesystem.
> 
> That meets my needs, but should be passed by the linux-nfs@ review
> committee.

Discussed with the reporter. -F addresses the automation requirement,
but users still expect "exportfs -u" to work the same way for NFSv3 and
NFSv4: "unexport" followed by "unmount" always works.

I am not remembering clearly why the linux-nfs folks thought that NFSv4
delegations should stay in place after unexport. In my view, unexport
should be a security boundary, stopping access to the files on the
export.

But during a warm server reboot, do we want that behavior?


> -F could probably just use the existing "unlock filesystem" API
> after it does the unexport.

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 15:26                         ` Jan Kara
@ 2026-03-02 17:10                           ` Jeff Layton
  2026-03-02 17:37                             ` Jan Kara
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2026-03-02 17:10 UTC (permalink / raw)
  To: Jan Kara, Chuck Lever
  Cc: NeilBrown, Amir Goldstein, Christian Brauner, Jan Kara,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever

On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
> On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> > On 3/1/26 11:09 PM, NeilBrown wrote:
> > > On Mon, 02 Mar 2026, Chuck Lever wrote:
> > > > On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > > > > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > > > > > Perhaps that description nails down too much implementation detail,
> > > > > > and it might be stale. A broader description is this user story:
> > > > > > 
> > > > > > "As a system administrator, I'd like to be able to unexport an NFSD
> > > > > 
> > > > > Doesn't "unexporting" involve communicating to nfsd?
> > > > > Meaning calling to svc_export_put() to path_put() the
> > > > > share root path?
> > > > > 
> > > > > > share that is being accessed by NFSv4 clients, and then unmount it,
> > > > > > reliably (for example, via automation). Currently the umount step
> > > > > > hangs if there are still outstanding delegations granted to the NFSv4
> > > > > > clients."
> > > > > 
> > > > > Can't svc_export_put() be the trigger for nfsd to release all resources
> > > > > associated with this share?
> > > > 
> > > > Currently unexport does not revoke NFSv4 state. So, that would
> > > > be a user-visible behavior change. I suggested that approach a
> > > > few months ago to linux-nfs@ and there was push-back.
> > > > 
> > > 
> > > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > > state on the filesystem.
> > 
> > That meets my needs, but should be passed by the linux-nfs@ review
> > committee.
> > 
> > -F could probably just use the existing "unlock filesystem" API
> > after it does the unexport.
> 
> If this option flies, then I guess it is the most sensible variant. If it
> doesn't work for some reason, then something like ->umount_begin sb
> callback could be twisted (may possibly need some extension) to provide
> the needed notification? At least in my naive understanding it was created
> for usecases like this...
> 
> 								Honza

umount_begin is a superblock op that only occurs when MNT_FORCE is set.
In this case though, we really want something that calls back into
nfsd, rather than to the fs being unmounted.

You could just wire up a bunch of umount_begin() operations but that
seems rather nasty. Maybe you could add some sort of callback that nfsd
could register that runs just before umount_begin does?

That would be preferable from a UI standpoint to dealing with a new
exportfs option, IMO. MNT_FORCE could then (in theory) properly allow
for state teardown in nfsd that way, which seems rather natural.
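
To make the ordering concrete, here is a toy userspace model, not kernel
code, of a registered callback that runs before the busy check. Every
name in it (toy_mount, register_pre_umount, toy_nfsd_release,
toy_do_umount) is invented for illustration. The real umount path also
involves namespace locking, mount propagation, and lazy unmounts, none of
which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HOOKS 4

/* Toy mount: counts references that would keep it busy. */
struct toy_mount {
	int refs;	/* all outstanding references */
	int nfsd_files;	/* subset of refs held by the toy "nfsd" */
};

typedef void (*pre_umount_fn)(struct toy_mount *mnt);

static pre_umount_fn hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Register a callback to run at the start of toy_do_umount(). */
static int register_pre_umount(pre_umount_fn fn)
{
	assert(fn != NULL);
	if (nr_hooks == MAX_HOOKS)
		return -ENOSPC;
	hooks[nr_hooks++] = fn;
	return 0;
}

/* Stand-in for nfsd revoking state and closing the files it holds. */
static void toy_nfsd_release(struct toy_mount *mnt)
{
	mnt->refs -= mnt->nfsd_files;
	mnt->nfsd_files = 0;
}

static int toy_do_umount(struct toy_mount *mnt)
{
	size_t i;

	/* Hooks run first, so subsystems can drop their references. */
	for (i = 0; i < nr_hooks; i++)
		hooks[i](mnt);

	/* Stand-in for the busy check: any remaining reference blocks. */
	if (mnt->refs > 0)
		return -EBUSY;
	return 0;
}
```

In this model a mount pinned only by the toy nfsd unmounts once the hook
is registered, while a mount with any other reference still returns
-EBUSY. The model says nothing about whether the hook should run on every
umount or only when MNT_FORCE is set.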
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 17:10                           ` Jeff Layton
@ 2026-03-02 17:37                             ` Jan Kara
  2026-03-02 17:53                               ` Jeff Layton
  2026-03-02 20:46                               ` NeilBrown
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2026-03-02 17:37 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Chuck Lever, NeilBrown, Amir Goldstein,
	Christian Brauner, Jan Kara, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever

On Mon 02-03-26 12:10:52, Jeff Layton wrote:
> On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
> > On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> > > On 3/1/26 11:09 PM, NeilBrown wrote:
> > > > On Mon, 02 Mar 2026, Chuck Lever wrote:
> > > > > On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > > > > > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > > > > > > Perhaps that description nails down too much implementation detail,
> > > > > > > and it might be stale. A broader description is this user story:
> > > > > > > 
> > > > > > > "As a system administrator, I'd like to be able to unexport an NFSD
> > > > > > 
> > > > > > Doesn't "unexporting" involve communicating to nfsd?
> > > > > > Meaning calling to svc_export_put() to path_put() the
> > > > > > share root path?
> > > > > > 
> > > > > > > share that is being accessed by NFSv4 clients, and then unmount it,
> > > > > > > reliably (for example, via automation). Currently the umount step
> > > > > > > hangs if there are still outstanding delegations granted to the NFSv4
> > > > > > > clients."
> > > > > > 
> > > > > > Can't svc_export_put() be the trigger for nfsd to release all resources
> > > > > > associated with this share?
> > > > > 
> > > > > Currently unexport does not revoke NFSv4 state. So, that would
> > > > > be a user-visible behavior change. I suggested that approach a
> > > > > few months ago to linux-nfs@ and there was push-back.
> > > > > 
> > > > 
> > > > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > > > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > > > state on the filesystem.
> > > 
> > > That meets my needs, but should be passed by the linux-nfs@ review
> > > committee.
> > > 
> > > -F could probably just use the existing "unlock filesystem" API
> > > after it does the unexport.
> > 
> > If this option flies, then I guess it is the most sensible variant. If it
> > doesn't work for some reason, then something like ->umount_begin sb
> > callback could be twisted (may possibly need some extension) to provide
> > the needed notification? At least in my naive understanding it was created
> > for usecases like this...
> > 
> > 								Honza
> 
> umount_begin is a superblock op that only occurs when MNT_FORCE is set.
> In this case though, we really want something that calls back into
> nfsd, rather than to the fs being unmounted.

I see, OK.

> You could just wire up a bunch of umount_begin() operations but that
> seems rather nasty. Maybe you could add some sort of callback that nfsd
> could register that runs just before umount_begin does?

Thinking about this more - Chuck was also writing about the problem of
needing to shut down the state only when this is the last unmount of a
superblock but until we grab namespace_lock(), that's impossible to tell in
a race-free manner? And how about lazy unmounts? There it would seem to be
extra hard to determine when NFS needs to drop its delegations since you
need to figure out whether all file references are NFS internal only? It
all seems like a notification from VFS isn't the right place to solve this
issue...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 17:37                             ` Jan Kara
@ 2026-03-02 17:53                               ` Jeff Layton
  2026-03-04 13:17                                 ` Christian Brauner
  2026-03-02 20:46                               ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2026-03-02 17:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Chuck Lever, NeilBrown, Amir Goldstein, Christian Brauner,
	Jan Kara, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel, Chuck Lever

On Mon, 2026-03-02 at 18:37 +0100, Jan Kara wrote:
> On Mon 02-03-26 12:10:52, Jeff Layton wrote:
> > On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
> > > On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> > > > On 3/1/26 11:09 PM, NeilBrown wrote:
> > > > > On Mon, 02 Mar 2026, Chuck Lever wrote:
> > > > > > On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > > > > > > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > > > > > > > Perhaps that description nails down too much implementation detail,
> > > > > > > > and it might be stale. A broader description is this user story:
> > > > > > > > 
> > > > > > > > "As a system administrator, I'd like to be able to unexport an NFSD
> > > > > > > 
> > > > > > > Doesn't "unexporting" involve communicating to nfsd?
> > > > > > > Meaning calling to svc_export_put() to path_put() the
> > > > > > > share root path?
> > > > > > > 
> > > > > > > > share that is being accessed by NFSv4 clients, and then unmount it,
> > > > > > > > reliably (for example, via automation). Currently the umount step
> > > > > > > > hangs if there are still outstanding delegations granted to the NFSv4
> > > > > > > > clients."
> > > > > > > 
> > > > > > > Can't svc_export_put() be the trigger for nfsd to release all resources
> > > > > > > associated with this share?
> > > > > > 
> > > > > > Currently unexport does not revoke NFSv4 state. So, that would
> > > > > > be a user-visible behavior change. I suggested that approach a
> > > > > > few months ago to linux-nfs@ and there was push-back.
> > > > > > 
> > > > > 
> > > > > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > > > > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > > > > state on the filesystem.
> > > > 
> > > > That meets my needs, but should be passed by the linux-nfs@ review
> > > > committee.
> > > > 
> > > > -F could probably just use the existing "unlock filesystem" API
> > > > after it does the unexport.
> > > 
> > > If this option flies, then I guess it is the most sensible variant. If it
> > > doesn't work for some reason, then something like ->umount_begin sb
> > > callback could be twisted (may possibly need some extension) to provide
> > > the needed notification? At least in my naive understanding it was created
> > > for usecases like this...
> > > 
> > > 								Honza
> > 
> > umount_begin is a superblock op that only occurs when MNT_FORCE is set.
> > In this case though, we really want something that calls back into
> > nfsd, rather than to the fs being unmounted.
> 
> I see OK.
> 
> > You could just wire up a bunch of umount_begin() operations but that
> > seems rather nasty. Maybe you could add some sort of callback that nfsd
> > could register that runs just before umount_begin does?
> 
> Thinking about this more - Chuck was also writing about the problem of
> needing to shutdown the state only when this is the last unmount of a
> superblock but until we grab namespace_lock(), that's impossible to tell in
> a race-free manner? And how about lazy unmounts? There it would seem to be
> extra hard to determine when NFS needs to drop it's delegations since you
> need to figure out whether all file references are NFS internal only? It
> all seems like a notification from VFS isn't the right place to solve this
> issue...
> 

The issue is that traditionally, "exportfs -u" is what unexports the
filesystem and at that point you can (usually) unmount it. We'd ideally
like to have a solution that doesn't create extra steps or change this,
since there is already a lot of automation and muscle memory around
these commands.

This method mostly works with v3, since there is no long term state
(technically lockd can hold some, but that's only for file locking).
With v4 that changed and nfsd holds files open for much longer.

We can't drop all the state when the fs is unexported, as it's not uncommon
for it to be reexported soon afterward, and we can't force a grace
period at that point to allow reclaim.

Unmounting seems like the natural place for this. At the point where
you're unmounting, there can be no more state and the admin's intent is
clear. Tearing down nfsd state at that point seems pretty safe.

If we can't add some sort of hook to the umount path, then I'll
understand, but it would be a nice-to-have for this use-case.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 17:01                         ` Chuck Lever
@ 2026-03-02 20:36                           ` NeilBrown
  2026-03-03 20:02                             ` Chuck Lever
  2026-03-04 13:05                             ` Christian Brauner
  0 siblings, 2 replies; 30+ messages in thread
From: NeilBrown @ 2026-03-02 20:36 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On Tue, 03 Mar 2026, Chuck Lever wrote:
> On 3/2/26 8:57 AM, Chuck Lever wrote:
> > On 3/1/26 11:09 PM, NeilBrown wrote:
> >> On Mon, 02 Mar 2026, Chuck Lever wrote:
> >>>
> >>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> >>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> >>>>> Perhaps that description nails down too much implementation detail,
> >>>>> and it might be stale. A broader description is this user story:
> >>>>>
> >>>>> "As a system administrator, I'd like to be able to unexport an NFSD
> >>>>
> >>>> Doesn't "unexporting" involve communicating to nfsd?
> >>>> Meaning calling to svc_export_put() to path_put() the
> >>>> share root path?
> >>>>
> >>>>> share that is being accessed by NFSv4 clients, and then unmount it,
> >>>>> reliably (for example, via automation). Currently the umount step
> >>>>> hangs if there are still outstanding delegations granted to the NFSv4
> >>>>> clients."
> >>>>
> >>>> Can't svc_export_put() be the trigger for nfsd to release all resources
> >>>> associated with this share?
> >>>
> >>> Currently unexport does not revoke NFSv4 state. So, that would
> >>> be a user-visible behavior change. I suggested that approach a
> >>> few months ago to linux-nfs@ and there was push-back.
> >>>
> >>
> >> Could we add a "-F" or similar flag to "exportfs -u" which implements the
> >> desired semantic?  i.e.  asking nfsd to release all locks and close all
> >> state on the filesystem.
> > 
> > That meets my needs, but should be passed by the linux-nfs@ review
> > committee.
> 
> Discussed with the reporter. -F addresses the automation requirement,
> but users still expect "exportfs -u" to work the same way for NFSv3 and
> NFSv4: "unexport" followed by "unmount" always works.
> 
> I am not remembering clearly why the linux-nfs folks thought that NFSv4
> delegations should stay in place after unexport. In my view, unexport
> should be a security boundary, stopping access to the files on the
> export.

At the time when the API was growing, delegations were barely an
unhatched idea.

unexport may be a security boundary, but it is not so obvious that it is
a state boundary.

The kernel is not directly involved in whether something is exported or
not.  That is under the control of mountd/exportfs.  The kernel keeps a
cache of info from there.  So if you want to impose a state boundary, it
really should involve mountd/exportfs.

There was once this idea floating around that policy didn't belong in
the kernel.

NeilBrown

> 
> But during a warm server reboot, do we want that behavior?
> 
> 
> > -F could probably just use the existing "unlock filesystem" API
> > after it does the unexport.
> 
> -- 
> Chuck Lever
> 


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 17:37                             ` Jan Kara
  2026-03-02 17:53                               ` Jeff Layton
@ 2026-03-02 20:46                               ` NeilBrown
  1 sibling, 0 replies; 30+ messages in thread
From: NeilBrown @ 2026-03-02 20:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Jan Kara, Chuck Lever, Amir Goldstein,
	Christian Brauner, Jan Kara, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever

On Tue, 03 Mar 2026, Jan Kara wrote:
> On Mon 02-03-26 12:10:52, Jeff Layton wrote:
> > On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
> > > On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> > > > On 3/1/26 11:09 PM, NeilBrown wrote:
> > > > > On Mon, 02 Mar 2026, Chuck Lever wrote:
> > > > > > On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > > > > > > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > > > > > > > Perhaps that description nails down too much implementation detail,
> > > > > > > > and it might be stale. A broader description is this user story:
> > > > > > > > 
> > > > > > > > "As a system administrator, I'd like to be able to unexport an NFSD
> > > > > > > 
> > > > > > > Doesn't "unexporting" involve communicating to nfsd?
> > > > > > > Meaning calling to svc_export_put() to path_put() the
> > > > > > > share root path?
> > > > > > > 
> > > > > > > > share that is being accessed by NFSv4 clients, and then unmount it,
> > > > > > > > reliably (for example, via automation). Currently the umount step
> > > > > > > > hangs if there are still outstanding delegations granted to the NFSv4
> > > > > > > > clients."
> > > > > > > 
> > > > > > > Can't svc_export_put() be the trigger for nfsd to release all resources
> > > > > > > associated with this share?
> > > > > > 
> > > > > > Currently unexport does not revoke NFSv4 state. So, that would
> > > > > > be a user-visible behavior change. I suggested that approach a
> > > > > > few months ago to linux-nfs@ and there was push-back.
> > > > > > 
> > > > > 
> > > > > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > > > > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > > > > state on the filesystem.
> > > > 
> > > > That meets my needs, but should be passed by the linux-nfs@ review
> > > > committee.
> > > > 
> > > > -F could probably just use the existing "unlock filesystem" API
> > > > after it does the unexport.
> > > 
> > > If this option flies, then I guess it is the most sensible variant. If it
> > > doesn't work for some reason, then something like ->umount_begin sb
> > > callback could be twisted (may possibly need some extension) to provide
> > > the needed notification? At least in my naive understanding it was created
> > > for usecases like this...
> > > 
> > > 								Honza
> > 
> > umount_begin is a superblock op that only occurs when MNT_FORCE is set.
> > In this case though, we really want something that calls back into
> > nfsd, rather than to the fs being unmounted.
> 
> I see, OK.
> 
> > You could just wire up a bunch of umount_begin() operations but that
> > seems rather nasty. Maybe you could add some sort of callback that nfsd
> > could register that runs just before umount_begin does?
> 
> Thinking about this more - Chuck was also writing about the problem of
> needing to shutdown the state only when this is the last unmount of a
> superblock but until we grab namespace_lock(), that's impossible to tell in
> a race-free manner? And how about lazy unmounts? There it would seem to be
> extra hard to determine when NFS needs to drop its delegations since you
> need to figure out whether all file references are NFS internal only? It
> all seems like a notification from VFS isn't the right place to solve this
> issue...

It isn't clear to me that "last unmount" is the correct target.

The nfsd file cache (which I think is the main focus here - it holds
delegated files etc) caches files.  i.e.  "struct file *".  This
identifies a dentry on a vfsmount.
I think it could be appropriate to drop nfsd state when the vfsmount is
being considered for unmount - including lazy unmount - whether the
superblock has other mounts or not.

An unmount attempt should fail if the vfsmount is still in use,
including in use in the nfsd export table.  And if it is going to
fail, then we don't want to drop nfsd state.  But if it would succeed
after nfsd state was dropped - then and only then do we want to drop
nfsd state.

I think it would be messy to get this dependency correct.

i.e.  I think that guessing, in the kernel, what the user wants is
problematic.  I think giving the user the tools to say exactly what they
want is best.

NeilBrown


> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 20:36                           ` NeilBrown
@ 2026-03-03 20:02                             ` Chuck Lever
  2026-03-03 21:23                               ` NeilBrown
  2026-03-04 13:05                             ` Christian Brauner
  1 sibling, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-03-03 20:02 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On 3/2/26 3:36 PM, NeilBrown wrote:
> On Tue, 03 Mar 2026, Chuck Lever wrote:
>> On 3/2/26 8:57 AM, Chuck Lever wrote:
>>> On 3/1/26 11:09 PM, NeilBrown wrote:
>>>> On Mon, 02 Mar 2026, Chuck Lever wrote:
>>>>>
>>>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
>>>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>>>>>>> Perhaps that description nails down too much implementation detail,
>>>>>>> and it might be stale. A broader description is this user story:
>>>>>>>
>>>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
>>>>>>
>>>>>> Doesn't "unexporting" involve communicating to nfsd?
>>>>>> Meaning calling to svc_export_put() to path_put() the
>>>>>> share root path?
>>>>>>
>>>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
>>>>>>> reliably (for example, via automation). Currently the umount step
>>>>>>> hangs if there are still outstanding delegations granted to the NFSv4
>>>>>>> clients."
>>>>>>
>>>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
>>>>>> associated with this share?
>>>>>
>>>>> Currently unexport does not revoke NFSv4 state. So, that would
>>>>> be a user-visible behavior change. I suggested that approach a
>>>>> few months ago to linux-nfs@ and there was push-back.
>>>>>
>>>>
>>>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
>>>> desired semantic?  i.e.  asking nfsd to release all locks and close all
>>>> state on the filesystem.
>>>
>>> That meets my needs, but should be passed by the linux-nfs@ review
>>> committee.
>>
>> Discussed with the reporter. -F addresses the automation requirement,
>> but users still expect "exportfs -u" to work the same way for NFSv3 and
>> NFSv4: "unexport" followed by "unmount" always works.
>>
>> I am not remembering clearly why the linux-nfs folks thought that NFSv4
>> delegations should stay in place after unexport. In my view, unexport
>> should be a security boundary, stopping access to the files on the
>> export.
> 
> At the time when the API was growing, delegations were barely an
> unhatched idea.
> 
> unexport may be a security boundary, but it is not so obvious that it is
> a state boundary.
> 
> The kernel is not directly involved in whether something is exported or
> not.  That is under the control of mountd/exportfs.  The kernel keeps a
> cache of info from there.  So if you want to impose a state boundary, it
> really should involve mountd/exportfs.
> 
> There was once this idea floating around that policy didn't belong in
> the kernel.

I consider enabling unmount after unexport more "mechanism" than
"policy", but not so much that I'm about to get religious about it. It
appears that the expedient path forward would be teaching exportfs to do
an "unlock filesystem" after it finishes unexporting, and leaving the
kernel untouched.
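
As a rough illustration only, the unexport / unlock / unmount sequence
described above could be sketched as a shell function. The function
names, the DRY_RUN switch, and the arguments are hypothetical. The
unlock file path is an assumption: the cover letter calls it
/proc/fs/nfsd/unlock_fs, and it may be named unlock_filesystem on a
given kernel, so check it before relying on this.

```shell
#!/bin/sh
# Sketch only: unexport a share, ask nfsd to drop state for the
# filesystem, then unmount it. Defines functions; runs nothing itself.

# ASSUMPTION: path of the nfsd "unlock filesystem" control file.
UNLOCK_FILE=${UNLOCK_FILE:-/proc/fs/nfsd/unlock_filesystem}

# Write the mount point to the unlock file, or only report the step
# when DRY_RUN is set.
revoke_nfsd_state() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "write $1 to $UNLOCK_FILE"
    else
        printf '%s\n' "$1" > "$UNLOCK_FILE"
    fi
}

# Hypothetical helper: $1 is a client:/path export spec, $2 the mount point.
unexport_then_unmount() {
    run=${DRY_RUN:+echo}    # prefix commands with echo when DRY_RUN is set
    $run exportfs -u "$1" || return 1
    revoke_nfsd_state "$2" || return 1
    $run umount "$2"
}
```

With DRY_RUN=1 set, calling unexport_then_unmount with an export spec
and a mount point only prints the three steps, which makes it easy to
check the ordering before running it as root.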

The question now is whether exportfs should grow a command-line option
to modulate this behavior:

- Some users consider the current situation as a regression -- unmount
  after unexport used to work seamlessly with NFSv3; still does; but not
  with NFSv4.

- Some users might consider changing the current unexport behavior as
  introducing a regression -- they rely on NFSv4 state continuing to
  exist after unexport. That behavior isn't documented anywhere, I
  suspect.

Thus I'm not sure exactly what change to exportfs is most appropriate.


-- 
Chuck Lever

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-03 20:02                             ` Chuck Lever
@ 2026-03-03 21:23                               ` NeilBrown
  2026-03-03 22:50                                 ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2026-03-03 21:23 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On Wed, 04 Mar 2026, Chuck Lever wrote:
> On 3/2/26 3:36 PM, NeilBrown wrote:
> > On Tue, 03 Mar 2026, Chuck Lever wrote:
> >> On 3/2/26 8:57 AM, Chuck Lever wrote:
> >>> On 3/1/26 11:09 PM, NeilBrown wrote:
> >>>> On Mon, 02 Mar 2026, Chuck Lever wrote:
> >>>>>
> >>>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> >>>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> >>>>>>> Perhaps that description nails down too much implementation detail,
> >>>>>>> and it might be stale. A broader description is this user story:
> >>>>>>>
> >>>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
> >>>>>>
> >>>>>> Doesn't "unexporting" involve communicating to nfsd?
> >>>>>> Meaning calling to svc_export_put() to path_put() the
> >>>>>> share root path?
> >>>>>>
> >>>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
> >>>>>>> reliably (for example, via automation). Currently the umount step
> >>>>>>> hangs if there are still outstanding delegations granted to the NFSv4
> >>>>>>> clients."
> >>>>>>
> >>>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
> >>>>>> associated with this share?
> >>>>>
> >>>>> Currently unexport does not revoke NFSv4 state. So, that would
> >>>>> be a user-visible behavior change. I suggested that approach a
> >>>>> few months ago to linux-nfs@ and there was push-back.
> >>>>>
> >>>>
> >>>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
> >>>> desired semantic?  i.e.  asking nfsd to release all locks and close all
> >>>> state on the filesystem.
> >>>
> >>> That meets my needs, but should be passed by the linux-nfs@ review
> >>> committee.
> >>
> >> Discussed with the reporter. -F addresses the automation requirement,
> >> but users still expect "exportfs -u" to work the same way for NFSv3 and
> >> NFSv4: "unexport" followed by "unmount" always works.
> >>
> >> I am not remembering clearly why the linux-nfs folks thought that NFSv4
> >> delegations should stay in place after unexport. In my view, unexport
> >> should be a security boundary, stopping access to the files on the
> >> export.
> > 
> > At the time when the API was growing, delegations were barely an
> > unhatched idea.
> > 
> > unexport may be a security boundary, but it is not so obvious that it is
> > a state boundary.
> > 
> > The kernel is not directly involved in whether something is exported or
> > not.  That is under the control of mountd/exportfs.  The kernel keeps a
> > cache of info from there.  So if you want to impose a state boundary, it
> > really should involve mountd/exportfs.
> > 
> > There was once this idea floating around that policy didn't belong in
> > the kernel.
> 
> I consider enabling unmount after unexport more "mechanism" than
> "policy", but not so much that I'm about to get religious about it. It
> appears that the expedient path forward would be teaching exportfs to do
> an "unlock filesystem" after it finishes unexporting, and leaving the
> kernel untouched.
> 
> The question now is whether exportfs should grow a command-line option
> to modulate this behavior:
> 
> - Some users consider the current situation as a regression -- unmount
>   after unexport used to work seamlessly with NFSv3; still does; but not
>   with NFSv4.

They are of course welcome to keep using NFSv3 (and to not lock files) :-)

> 
> - Some users might consider changing the current unexport behavior as
>   introducing a regression -- they rely on NFSv4 state continuing to
>   exist after unexport. That behavior isn't documented anywhere, I
>   suspect.
> 
> Thus I'm not sure exactly what change to exportfs is most appropriate.

I think any purging of the cache should happen at unexport time, not
transparently when unmount is attempted as I think the ordering
semantics there are complex.

And as the kernel doesn't know when something has been unexported, it
must be exportfs which initiates the cache purge.

So the only interesting question I can see is:
  do we make "purge on unexport" the default, or do we require an
  explicit request (-F)?
A complexity here is that a given filesystem can be exported to
different clients with different options, and different subtrees can be
exported. If the cache-flush were to be the default, it would need to be
on the last export of any path to the filesystem.  This would need to
include implicit exports via crossmnt.  I think this would be hard to
specify and document well.

So I think an explicit "flush cache" exportfs action is simplest and
best.
Possibly:
   exportfs -F /some/path
would unexport all exports which reference the same mountpoint, then
would tell the kernel to drop all cached data for that mount.

Thanks,
NeilBrown

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-03 21:23                               ` NeilBrown
@ 2026-03-03 22:50                                 ` Chuck Lever
  2026-03-04  1:01                                   ` NeilBrown
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2026-03-03 22:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On 3/3/26 4:23 PM, NeilBrown wrote:
> On Wed, 04 Mar 2026, Chuck Lever wrote:
>> On 3/2/26 3:36 PM, NeilBrown wrote:
>>> On Tue, 03 Mar 2026, Chuck Lever wrote:
>>>> On 3/2/26 8:57 AM, Chuck Lever wrote:
>>>>> On 3/1/26 11:09 PM, NeilBrown wrote:
>>>>>> On Mon, 02 Mar 2026, Chuck Lever wrote:
>>>>>>>
>>>>>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
>>>>>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>>>>>>>>> Perhaps that description nails down too much implementation detail,
>>>>>>>>> and it might be stale. A broader description is this user story:
>>>>>>>>>
>>>>>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
>>>>>>>>
>>>>>>>> Doesn't "unexporting" involve communicating to nfsd?
>>>>>>>> Meaning calling to svc_export_put() to path_put() the
>>>>>>>> share root path?
>>>>>>>>
>>>>>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
>>>>>>>>> reliably (for example, via automation). Currently the umount step
>>>>>>>>> hangs if there are still outstanding delegations granted to the NFSv4
>>>>>>>>> clients."
>>>>>>>>
>>>>>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
>>>>>>>> associated with this share?
>>>>>>>
>>>>>>> Currently unexport does not revoke NFSv4 state. So, that would
>>>>>>> be a user-visible behavior change. I suggested that approach a
>>>>>>> few months ago to linux-nfs@ and there was push-back.
>>>>>>>
>>>>>>
>>>>>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
>>>>>> desired semantic?  i.e.  asking nfsd to release all locks and close all
>>>>>> state on the filesystem.
>>>>>
>>>>> That meets my needs, but should be passed by the linux-nfs@ review
>>>>> committee.
>>>>
>>>> Discussed with the reporter. -F addresses the automation requirement,
>>>> but users still expect "exportfs -u" to work the same way for NFSv3 and
>>>> NFSv4: "unexport" followed by "unmount" always works.
>>>>
>>>> I am not remembering clearly why the linux-nfs folks thought that NFSv4
>>>> delegations should stay in place after unexport. In my view, unexport
>>>> should be a security boundary, stopping access to the files on the
>>>> export.
>>>
>>> At the time when the API was growing, delegations were barely an
>>> unhatched idea.
>>>
>>> unexport may be a security boundary, but it is not so obvious that it is
>>> a state boundary.
>>>
>>> The kernel is not directly involved in whether something is exported or
>>> not.  That is under the control of mountd/exportfs.  The kernel keeps a
>>> cache of info from there.  So if you want to impose a state boundary, it
>>> really should involve mountd/exportfs.
>>>
>>> There was once this idea floating around that policy didn't belong in
>>> the kernel.
>>
>> I consider enabling unmount after unexport more "mechanism" than
>> "policy", but not so much that I'm about to get religious about it. It
>> appears that the expedient path forward would be teaching exportfs to do
>> an "unlock filesystem" after it finishes unexporting, and leaving the
>> kernel untouched.
>>
>> The question now is whether exportfs should grow a command-line option
>> to modulate this behavior:
>>
>> - Some users consider the current situation as a regression -- unmount
>>   after unexport used to work seamlessly with NFSv3; still does; but not
>>   with NFSv4.
> 
> They are of course welcome to keep using NFSv3 (and to not lock files) :-)

>> - Some users might consider changing the current unexport behavior as
>>   introducing a regression -- they rely on NFSv4 state continuing to
>>   exist after unexport. That behavior isn't documented anywhere, I
>>   suspect.
>>
>> Thus I'm not sure exactly what change to exportfs is most appropriate.
> 
> I think any purging of the cache should happen at unexport time, not
> transparently when unmount is attempted as I think the ordering
> semantics there are complex.
> 
> And as the kernel doesn't know when something has been unexported, it
> must be exportfs which initiates the cache purge.
> 
> So the only interesting question I can see is:
>   do we make "purge on unexport" the default, or do we require an
>   explicit request (-F)?

Yes, that's what I was trying to say above.


> A complexity here is that a given filesystem can be exported to
> different clients with different options, and different subtrees can be
> exported. If the cache-flush were to be the default, it would need to be
> on the last export of any path to the filesystem.  This would need to
> include implicit exports via crossmnt.  I think this would be hard to
> specify and document well.

Is there nothing we can do to engineer the exportfs command to remove
some of this complexity?


> So I think an explicit "flush cache" exportfs action is simplest and
> best.
> Possibly:
>    exportfs -F /some/path
> would unexport all exports which reference the same mountpoint, then
> would tell the kernel to drop all cached data for that mount.

I passed along your original "-F" suggestion to the original reporter a
few days ago, and it was not met with universal glee and a huzzah.

Although "-F" can be added to automation easily enough, their
preference, based on their own users' experience, is that the fix should
not require changes in user behavior.


-- 
Chuck Lever

* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-03 22:50                                 ` Chuck Lever
@ 2026-03-04  1:01                                   ` NeilBrown
  0 siblings, 0 replies; 30+ messages in thread
From: NeilBrown @ 2026-03-04  1:01 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Amir Goldstein, Jan Kara, Christian Brauner, Jan Kara,
	Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-fsdevel

On Wed, 04 Mar 2026, Chuck Lever wrote:
> On 3/3/26 4:23 PM, NeilBrown wrote:
> > On Wed, 04 Mar 2026, Chuck Lever wrote:
> >> On 3/2/26 3:36 PM, NeilBrown wrote:
> >>> On Tue, 03 Mar 2026, Chuck Lever wrote:
> >>>> On 3/2/26 8:57 AM, Chuck Lever wrote:
> >>>>> On 3/1/26 11:09 PM, NeilBrown wrote:
> >>>>>> On Mon, 02 Mar 2026, Chuck Lever wrote:
> >>>>>>>
> >>>>>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> >>>>>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> >>>>>>>>> Perhaps that description nails down too much implementation detail,
> >>>>>>>>> and it might be stale. A broader description is this user story:
> >>>>>>>>>
> >>>>>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
> >>>>>>>>
> >>>>>>>> Doesn't "unexporting" involve communicating to nfsd?
> >>>>>>>> Meaning calling to svc_export_put() to path_put() the
> >>>>>>>> share root path?
> >>>>>>>>
> >>>>>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
> >>>>>>>>> reliably (for example, via automation). Currently the umount step
> >>>>>>>>> hangs if there are still outstanding delegations granted to the NFSv4
> >>>>>>>>> clients."
> >>>>>>>>
> >>>>>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
> >>>>>>>> associated with this share?
> >>>>>>>
> >>>>>>> Currently unexport does not revoke NFSv4 state. So, that would
> >>>>>>> be a user-visible behavior change. I suggested that approach a
> >>>>>>> few months ago to linux-nfs@ and there was push-back.
> >>>>>>>
> >>>>>>
> >>>>>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
> >>>>>> desired semantic?  i.e.  asking nfsd to release all locks and close all
> >>>>>> state on the filesystem.
> >>>>>
> >>>>> That meets my needs, but should be passed by the linux-nfs@ review
> >>>>> committee.
> >>>>
> >>>> Discussed with the reporter. -F addresses the automation requirement,
> >>>> but users still expect "exportfs -u" to work the same way for NFSv3 and
> >>>> NFSv4: "unexport" followed by "unmount" always works.
> >>>>
> >>>> I am not remembering clearly why the linux-nfs folks thought that NFSv4
> >>>> delegations should stay in place after unexport. In my view, unexport
> >>>> should be a security boundary, stopping access to the files on the
> >>>> export.
> >>>
> >>> At the time when the API was growing, delegations were barely an
> >>> unhatched idea.
> >>>
> >>> unexport may be a security boundary, but it is not so obvious that it is
> >>> a state boundary.
> >>>
> >>> The kernel is not directly involved in whether something is exported or
> >>> not.  That is under the control of mountd/exportfs.  The kernel keeps a
> >>> cache of info from there.  So if you want to impose a state boundary, it
> >>> really should involve mountd/exportfs.
> >>>
> >>> There was once this idea floating around that policy didn't belong in
> >>> the kernel.
> >>
> >> I consider enabling unmount after unexport more "mechanism" than
> >> "policy", but not so much that I'm about to get religious about it. It
> >> appears that the expedient path forward would be teaching exportfs to do
> >> an "unlock filesystem" after it finishes unexporting, and leaving the
> >> kernel untouched.
> >>
> >> The question now is whether exportfs should grow a command-line option
> >> to modulate this behavior:
> >>
> >> - Some users consider the current situation as a regression -- unmount
> >>   after unexport used to work seamlessly with NFSv3; still does; but not
> >>   with NFSv4.
> > 
> > They are of course welcome to keep using NFSv3 (and to not lock files) :-)
> 
> >> - Some users might consider changing the current unexport behavior as
> >>   introducing a regression -- they rely on NFSv4 state continuing to
> >>   exist after unexport. That behavior isn't documented anywhere, I
> >>   suspect.
> >>
> >> Thus I'm not sure exactly what change to exportfs is most appropriate.
> > 
> > I think any purging of the cache should happen at unexport time, not
> > transparently when unmount is attempted as I think the ordering
> > semantics there are complex.
> > 
> > And as the kernel doesn't know when something has been unexported, it
> > must be exportfs which initiates the cache purge.
> > 
> > So the only interesting question I can see is:
> >   do we make "purge on unexport" the default, or do we require an
> >   explicit request (-F)?
> 
> Yes, that's what I was trying to say above.
> 
> 
> > A complexity here is that a given filesystem can be exported to
> > different clients with different options, and different subtrees can be
> > exported. If the cache-flush were to be the default, it would need to be
> > on the last export of any path to the filesystem.  This would need to
> > include implicit exports via crossmnt.  I think this would be hard to
> > specify and document well.
> 
> Is there nothing we can do to engineer the exportfs command to remove
> some of this complexity?

An argument could certainly be made that the exportfs command attempts
to be too general and consequently is too vague, and that a better
interface could be defined that was more opinionated and less flexible,
and so was easier to use without risk of complexity.

But that would be a different tool.
The complexity I see is not in the implementation, which could be
re-engineered, but in the design/behaviour, which cannot be changed
without a user-visible change.

> 
> 
> > So I think an explicit "flush cache" exportfs action is simplest and
> > best.
> > Possibly:
> >    exportfs -F /some/path
> > would unexport all exports which reference the same mountpoint, then
> > would tell the kernel to drop all cached data for that mount.
> 
> I passed along your original "-F" suggestion to the original reporter a
> few days ago, and it was not met with universal glee and a huzzah.
> 
> Although "-F" can be added to automation easily enough, their
> preference, based on their own users' experience, is that the fix should
> not require changes in user behavior.

I think their preference, while understandable, is naive.
They are doing something that is not documented as supported, is not
actually supported, and has always had the possibility of failure.
They have switched from v3 to v4 (possibly following a default) and now the
failure is more likely...
Maybe the best fix without changes in user behaviour is to switch the
default back to v3....

NeilBrown


> 
> 
> -- 
> Chuck Lever
> 


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 20:36                           ` NeilBrown
  2026-03-03 20:02                             ` Chuck Lever
@ 2026-03-04 13:05                             ` Christian Brauner
  1 sibling, 0 replies; 30+ messages in thread
From: Christian Brauner @ 2026-03-04 13:05 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever, Amir Goldstein, Jan Kara, Jan Kara, Jeff Layton,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel

On Tue, Mar 03, 2026 at 07:36:26AM +1100, NeilBrown wrote:
> On Tue, 03 Mar 2026, Chuck Lever wrote:
> > On 3/2/26 8:57 AM, Chuck Lever wrote:
> > > On 3/1/26 11:09 PM, NeilBrown wrote:
> > >> On Mon, 02 Mar 2026, Chuck Lever wrote:
> > >>>
> > >>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > >>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > >>>>> Perhaps that description nails down too much implementation detail,
> > >>>>> and it might be stale. A broader description is this user story:
> > >>>>>
> > >>>>> "As a system administrator, I'd like to be able to unexport an NFSD
> > >>>>
> > >>>> Doesn't "unexporting" involve communicating to nfsd?
> > >>>> Meaning calling to svc_export_put() to path_put() the
> > >>>> share root path?
> > >>>>
> > >>>>> share that is being accessed by NFSv4 clients, and then unmount it,
> > >>>>> reliably (for example, via automation). Currently the umount step
> > >>>>> hangs if there are still outstanding delegations granted to the NFSv4
> > >>>>> clients."
> > >>>>
> > >>>> Can't svc_export_put() be the trigger for nfsd to release all resources
> > >>>> associated with this share?
> > >>>
> > >>> Currently unexport does not revoke NFSv4 state. So, that would
> > >>> be a user-visible behavior change. I suggested that approach a
> > >>> few months ago to linux-nfs@ and there was push-back.
> > >>>
> > >>
> > >> Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > >> desired semantic?  i.e.  asking nfsd to release all locks and close all
> > >> state on the filesystem.
> > > 
> > > That meets my needs, but should be passed by the linux-nfs@ review
> > > committee.
> > 
> > Discussed with the reporter. -F addresses the automation requirement,
> > but users still expect "exportfs -u" to work the same way for NFSv3 and
> > NFSv4: "unexport" followed by "unmount" always works.
> > 
> > I am not remembering clearly why the linux-nfs folks thought that NFSv4
> > delegations should stay in place after unexport. In my view, unexport
> > should be a security boundary, stopping access to the files on the
> > export.
> 
> At the time when the API was growing, delegations were barely an
> unhatched idea.
> 
> unexport may be a security boundary, but it is not so obvious that it is
> a state boundary.
> 
> The kernel is not directly involved in whether something is exported or
> not.  That is under the control of mountd/exportfs.  The kernel keeps a
> cache of info from there.  So if you want to impose a state boundary, it
> really should involve mountd/exportfs.
> 
> There was once this idea floating around that policy didn't belong in
> the kernel.

Very much agree.


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-02 17:53                               ` Jeff Layton
@ 2026-03-04 13:17                                 ` Christian Brauner
  2026-03-04 15:15                                   ` Chuck Lever
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Brauner @ 2026-03-04 13:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Chuck Lever, NeilBrown, Amir Goldstein, Jan Kara,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel,
	Chuck Lever

On Mon, Mar 02, 2026 at 12:53:17PM -0500, Jeff Layton wrote:
> On Mon, 2026-03-02 at 18:37 +0100, Jan Kara wrote:
> > On Mon 02-03-26 12:10:52, Jeff Layton wrote:
> > > On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
> > > > On Mon 02-03-26 08:57:28, Chuck Lever wrote:
> > > > > On 3/1/26 11:09 PM, NeilBrown wrote:
> > > > > > On Mon, 02 Mar 2026, Chuck Lever wrote:
> > > > > > > On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
> > > > > > > > On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
> > > > > > > > > Perhaps that description nails down too much implementation detail,
> > > > > > > > > and it might be stale. A broader description is this user story:
> > > > > > > > > 
> > > > > > > > > "As a system administrator, I'd like to be able to unexport an NFSD
> > > > > > > > 
> > > > > > > > Doesn't "unexporting" involve communicating to nfsd?
> > > > > > > > Meaning calling to svc_export_put() to path_put() the
> > > > > > > > share root path?
> > > > > > > > 
> > > > > > > > > share that is being accessed by NFSv4 clients, and then unmount it,
> > > > > > > > > reliably (for example, via automation). Currently the umount step
> > > > > > > > > hangs if there are still outstanding delegations granted to the NFSv4
> > > > > > > > > clients."
> > > > > > > > 
> > > > > > > > Can't svc_export_put() be the trigger for nfsd to release all resources
> > > > > > > > associated with this share?
> > > > > > > 
> > > > > > > Currently unexport does not revoke NFSv4 state. So, that would
> > > > > > > be a user-visible behavior change. I suggested that approach a
> > > > > > > few months ago to linux-nfs@ and there was push-back.
> > > > > > > 
> > > > > > 
> > > > > > Could we add a "-F" or similar flag to "exportfs -u" which implements the
> > > > > > desired semantic?  i.e.  asking nfsd to release all locks and close all
> > > > > > state on the filesystem.
> > > > > 
> > > > > That meets my needs, but should be passed by the linux-nfs@ review
> > > > > committee.
> > > > > 
> > > > > -F could probably just use the existing "unlock filesystem" API
> > > > > after it does the unexport.
> > > > 
> > > > If this option flies, then I guess it is the most sensible variant. If it
> > > > doesn't work for some reason, then something like ->umount_begin sb
> > > > callback could be twisted (may possibly need some extension) to provide
> > > > the needed notification? At least in my naive understanding it was created
> > > > for usecases like this...
> > > > 
> > > > 								Honza
> > > 
> > > umount_begin is a superblock op that only occurs when MNT_FORCE is set.
> > > In this case though, we really want something that calls back into
> > > nfsd, rather than to the fs being unmounted.
> > 
> > I see, OK.
> > 
> > > You could just wire up a bunch of umount_begin() operations but that
> > > seems rather nasty. Maybe you could add some sort of callback that nfsd
> > > could register that runs just before umount_begin does?
> > 
> > Thinking about this more - Chuck was also writing about the problem of
> > needing to shutdown the state only when this is the last unmount of a
> > superblock but until we grab namespace_lock(), that's impossible to tell in
> > a race-free manner? And how about lazy unmounts? There it would seem to be
> > extra hard to determine when NFS needs to drop its delegations since you
> > need to figure out whether all file references are NFS internal only? It
> > all seems like a notification from VFS isn't the right place to solve this
> > issue...
> > 
> 
> The issue is that traditionally, "exportfs -u" is what unexports the
> filesystem and at that point you can (usually) unmount it. We'd ideally
> like to have a solution that doesn't create extra steps or change this,
> since there is already a lot of automation and muscle memory around
> these commands.
> 
> This method mostly works with v3, since there is no long term state
> (technically lockd can hold some, but that's only for file locking).
> With v4 that changed and nfsd holds files open for much longer.
> 
> We can't drop all the state when fs is unexported, as it's not uncommon
> for it to be reexported soon afterward, and we can't force a grace
> period at that point to allow reclaim.
> 
> Unmounting seems like the natural place for this. At the point where
> you're unmounting, there can be no more state and the admin's intent is
> clear. Tearing down nfsd state at that point seems pretty safe.
> 
> If we can't add some sort of hook to the umount path, then I'll
> understand, but it would be a nice to have for this use-case.

At first glance, umount seems like a natural place for a lot of things.

The locking and the guarantees that we have traditionally given to
userspace make it a very convoluted codepath and I'm very hesitant to
add more complexity in this part of the code.

Now I suggested the fsnotify mechanism because it's already there and if
it is _reasonably_ easy to provide the notification that nfs needs to
clean up whatever it needs to clean up then this is probably fine. What
I absolutely don't want is to have another custom notification
mechanism in the VFS layer.

But if we can solve this in userspace then it is absolutely the
preferred variant and what we should do.


* Re: [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification
  2026-03-04 13:17                                 ` Christian Brauner
@ 2026-03-04 15:15                                   ` Chuck Lever
  0 siblings, 0 replies; 30+ messages in thread
From: Chuck Lever @ 2026-03-04 15:15 UTC (permalink / raw)
  To: Christian Brauner, Jeff Layton
  Cc: Jan Kara, NeilBrown, Amir Goldstein, Jan Kara, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, linux-nfs, linux-fsdevel, Chuck Lever

On 3/4/26 8:17 AM, Christian Brauner wrote:
> On Mon, Mar 02, 2026 at 12:53:17PM -0500, Jeff Layton wrote:
>> On Mon, 2026-03-02 at 18:37 +0100, Jan Kara wrote:
>>> On Mon 02-03-26 12:10:52, Jeff Layton wrote:
>>>> On Mon, 2026-03-02 at 16:26 +0100, Jan Kara wrote:
>>>>> On Mon 02-03-26 08:57:28, Chuck Lever wrote:
>>>>>> On 3/1/26 11:09 PM, NeilBrown wrote:
>>>>>>> On Mon, 02 Mar 2026, Chuck Lever wrote:
>>>>>>>> On Sun, Mar 1, 2026, at 1:09 PM, Amir Goldstein wrote:
>>>>>>>>> On Sun, Mar 1, 2026 at 6:21 PM Chuck Lever <cel@kernel.org> wrote:
>>>>>>>>>> Perhaps that description nails down too much implementation detail,
>>>>>>>>>> and it might be stale. A broader description is this user story:
>>>>>>>>>>
>>>>>>>>>> "As a system administrator, I'd like to be able to unexport an NFSD
>>>>>>>>>
>>>>>>>>> Doesn't "unexporting" involve communicating to nfsd?
>>>>>>>>> Meaning calling to svc_export_put() to path_put() the
>>>>>>>>> share root path?
>>>>>>>>>
>>>>>>>>>> share that is being accessed by NFSv4 clients, and then unmount it,
>>>>>>>>>> reliably (for example, via automation). Currently the umount step
>>>>>>>>>> hangs if there are still outstanding delegations granted to the NFSv4
>>>>>>>>>> clients."
>>>>>>>>>
>>>>>>>>> Can't svc_export_put() be the trigger for nfsd to release all resources
>>>>>>>>> associated with this share?
>>>>>>>>
>>>>>>>> Currently unexport does not revoke NFSv4 state. So, that would
>>>>>>>> be a user-visible behavior change. I suggested that approach a
>>>>>>>> few months ago to linux-nfs@ and there was push-back.
>>>>>>>>
>>>>>>>
>>>>>>> Could we add a "-F" or similar flag to "exportfs -u" which implements the
>>>>>>> desired semantic?  i.e.  asking nfsd to release all locks and close all
>>>>>>> state on the filesystem.
>>>>>>
>>>>>> That meets my needs, but should be passed by the linux-nfs@ review
>>>>>> committee.
>>>>>>
>>>>>> -F could probably just use the existing "unlock filesystem" API
>>>>>> after it does the unexport.
>>>>>
>>>>> If this option flies, then I guess it is the most sensible variant. If it
>>>>> doesn't work for some reason, then something like ->umount_begin sb
>>>>> callback could be twisted (may possibly need some extension) to provide
>>>>> the needed notification? At least in my naive understanding it was created
>>>>> for usecases like this...
>>>>>
>>>>> 								Honza
>>>>
>>>> umount_begin is a superblock op that only occurs when MNT_FORCE is set.
>>>> In this case though, we really want something that calls back into
>>>> nfsd, rather than to the fs being unmounted.
>>>
>>> I see, OK.
>>>
>>>> You could just wire up a bunch of umount_begin() operations but that
>>>> seems rather nasty. Maybe you could add some sort of callback that nfsd
>>>> could register that runs just before umount_begin does?
>>>
>>> Thinking about this more - Chuck was also writing about the problem of
>>> needing to shutdown the state only when this is the last unmount of a
>>> superblock but until we grab namespace_lock(), that's impossible to tell in
>>> a race-free manner? And how about lazy unmounts? There it would seem to be
>>> extra hard to determine when NFS needs to drop its delegations since you
>>> need to figure out whether all file references are NFS internal only? It
>>> all seems like a notification from VFS isn't the right place to solve this
>>> issue...
>>>
>>
>> The issue is that traditionally, "exportfs -u" is what unexports the
>> filesystem and at that point you can (usually) unmount it. We'd ideally
>> like to have a solution that doesn't create extra steps or change this,
>> since there is already a lot of automation and muscle memory around
>> these commands.
>>
>> This method mostly works with v3, since there is no long term state
>> (technically lockd can hold some, but that's only for file locking).
>> With v4 that changed and nfsd holds files open for much longer.
>>
>> We can't drop all the state when fs is unexported, as it's not uncommon
>> for it to be reexported soon afterward, and we can't force a grace
>> period at that point to allow reclaim.
>>
>> Unmounting seems like the natural place for this. At the point where
>> you're unmounting, there can be no more state and the admin's intent is
>> clear. Tearing down nfsd state at that point seems pretty safe.
>>
>> If we can't add some sort of hook to the umount path, then I'll
>> understand, but it would be a nice to have for this use-case.
> 
> At first glance, umount seems like a natural place for a lot of things.
> 
> The locking and the guarantees that we have traditionally given to
> userspace make it a very convoluted codepath and I'm very hesitant to
> add more complexity in this part of the code.
> 
> Now I suggested the fsnotify mechanism because it's already there and if
> it is _reasonably_ easy to provide the notification that nfs needs to
> clean up whatever it needs to clean up then this is probably fine. What
> I absolutely don't want is to have another custom notification
> mechanism in the VFS layer.
> 
> But if we can solve this in userspace then it is absolutely the
> preferred variant and what we should do.

No problem with that line of reasoning. Right now it looks like we can
do this with changes to exportfs, so I will pursue that.


-- 
Chuck Lever
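
The userspace direction settled on above (unexport, then ask nfsd to
release state through the existing "unlock filesystem" API) could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the eventual exportfs change: the helper name
nfsd_unlock_fs() and its return convention are hypothetical, no "-F"
option exists in exportfs at this point in the thread, and the control
file is passed in as a parameter rather than hard-coded (the cover
letter refers to it as /proc/fs/nfsd/unlock_fs; current kernels appear
to name it /proc/fs/nfsd/unlock_filesystem).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask nfsd to release the state it holds on the
 * filesystem containing fs_path by writing "<fs_path>\n" to ctl_path
 * in a single write().
 *
 * ctl_path is an assumption supplied by the caller; on a live server
 * it would be nfsd's "unlock filesystem" control file. The trailing
 * newline is included because the nfsd handler is believed to require
 * it (assumption, not confirmed in this thread).
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int nfsd_unlock_fs(const char *ctl_path, const char *fs_path)
{
	char buf[4096];
	int len, fd, err = 0;
	ssize_t n;

	/* Build the request; refuse paths that do not fit the buffer. */
	len = snprintf(buf, sizeof(buf), "%s\n", fs_path);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -ENAMETOOLONG;

	fd = open(ctl_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* One write per request; treat a short write as an I/O error. */
	n = write(fd, buf, (size_t)len);
	if (n < 0)
		err = -errno;
	else if (n != len)
		err = -EIO;

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

A tool implementing the proposed "exportfs -u -F" semantic would
presumably remove the export first and only then call something like
this with the export's path, so that no new state can be established
between the revocation and the subsequent umount.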


end of thread, other threads:[~2026-03-04 15:15 UTC | newest]

Thread overview: 30+ messages
2026-02-24 16:39 [PATCH v3 0/3] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
2026-02-24 16:39 ` [PATCH v3 1/3] fs: add umount notifier chain for filesystem unmount notification Chuck Lever
2026-02-26  8:48   ` Christian Brauner
2026-02-26 10:52     ` Amir Goldstein
2026-02-26 13:27       ` Chuck Lever
2026-02-26 13:32         ` Jan Kara
2026-02-27 15:10           ` Chuck Lever
2026-03-01 14:37             ` Amir Goldstein
2026-03-01 17:20               ` Chuck Lever
2026-03-01 18:09                 ` Amir Goldstein
2026-03-01 18:19                   ` Chuck Lever
2026-03-02  4:09                     ` NeilBrown
2026-03-02 13:57                       ` Chuck Lever
2026-03-02 15:26                         ` Jan Kara
2026-03-02 17:10                           ` Jeff Layton
2026-03-02 17:37                             ` Jan Kara
2026-03-02 17:53                               ` Jeff Layton
2026-03-04 13:17                                 ` Christian Brauner
2026-03-04 15:15                                   ` Chuck Lever
2026-03-02 20:46                               ` NeilBrown
2026-03-02 17:01                         ` Chuck Lever
2026-03-02 20:36                           ` NeilBrown
2026-03-03 20:02                             ` Chuck Lever
2026-03-03 21:23                               ` NeilBrown
2026-03-03 22:50                                 ` Chuck Lever
2026-03-04  1:01                                   ` NeilBrown
2026-03-04 13:05                             ` Christian Brauner
2026-02-24 16:39 ` [PATCH v3 2/3] nfsd: revoke NFSv4 state when filesystem is unmounted Chuck Lever
2026-02-24 16:39 ` [PATCH v3 3/3] nfsd: close cached files on filesystem unmount Chuck Lever
2026-02-24 17:14 ` [PATCH v3 0/3] Automatic NFSv4 state revocation " Al Viro
