public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount
@ 2026-03-18 14:15 Chuck Lever
  2026-03-18 14:15 ` [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function Chuck Lever
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

When an NFS server exports a filesystem and clients hold NFSv4
state (opens, locks, delegations), unmounting the underlying
filesystem fails with EBUSY. The /proc/fs/nfsd/unlock_fs procfs
interface revokes all NFSv4 state on a given filesystem's
superblock, but it operates at whole-filesystem granularity and
offers no structured way to extend the operation to other scopes.

This series adds an NFSD_CMD_UNLOCK netlink command that supports
typed, validated attributes and multiple scope values. The
initial "ip" scope provides a netlink equivalent of
write_unlock_ip, releasing NLM locks held by a specific client
address. A "filesystem" scope parallels write_unlock_fs, accepting
a path string and revoking NFSv4 state, NLM locks, async COPY
operations, and cached file handles associated with that path.

The filesystem scope revokes state at export granularity rather
than superblock granularity: when multiple exports share a
filesystem, only state referencing files under the specified
export root is affected. The subtree check walks all dentry
aliases of each inode so that hard-linked files outside the
export are not incorrectly matched. When the export root is the
filesystem root, the subtree check is elided.

The exportfs user space command can invoke NFSD_CMD_UNLOCK with
the filesystem scope after an explicit "exportfs -u", enabling
automated state cleanup without custom scripting or manual
procfs writes.

---
Changes since v3:
- All VFS changes replaced with new netlink "unlock" operation

Changes since v2:
- Replace fs_pin with an SRCU umount notifier chain in VFS
- Merge the pending COPY cancellation patch
- Replace xa_cmpxchg() with xa_insert()
- Use cancel_work_sync() instead of flush_workqueue()
- Remove rcu_barrier()
- Correct misleading claims in kdoc comments and commit messages

Changes since v1:
- Explain why drop_client() is being renamed
- Finish implementing revocation on umount
- Rename pin_insert_group
- Clarified log output and code comments
- Hold nfsd_mutex while closing nfsd_files

---
Chuck Lever (6):
      NFSD: Extract revoke_one_stid() utility function
      NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope
      NFSD: Add filesystem scope to NFSD_CMD_UNLOCK
      NFSD: Refactor find_one_sb_stid() into find_next_sb_stid()
      NFSD: Add export-scoped state revocation
      NFSD: Add nfsd_file_close_export() for file cache cleanup

 Documentation/netlink/specs/nfsd.yaml |  39 ++++++
 fs/nfsd/filecache.c                   |  77 ++++++++++++
 fs/nfsd/filecache.h                   |   2 +
 fs/nfsd/netlink.c                     |  14 +++
 fs/nfsd/netlink.h                     |   1 +
 fs/nfsd/nfs4state.c                   | 219 +++++++++++++++++++++-------------
 fs/nfsd/nfsctl.c                      | 105 +++++++++++++++-
 fs/nfsd/state.h                       |   7 ++
 fs/nfsd/trace.h                       |  13 +-
 include/uapi/linux/nfsd_netlink.h     |  19 +++
 10 files changed, 406 insertions(+), 90 deletions(-)
---
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
change-id: 20260318-umount-kills-nfsv4-state-138218f2f4e0

Best regards,
--  
Chuck Lever


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:21   ` Jeff Layton
  2026-03-18 14:15 ` [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope Chuck Lever
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

The per-stateid revocation logic in nfsd4_revoke_states() handles
four stateid types in a deeply nested switch. Extract two helpers:

revoke_ol_stid() performs admin-revocation of an open or lock
stateid with st_mutex already held: marks the stateid as
SC_STATUS_ADMIN_REVOKED, closes POSIX locks for lock stateids,
and releases file access.

revoke_one_stid() dispatches by sc_type, acquires st_mutex with
the appropriate lockdep class for open and lock stateids, and
handles delegation unhash and layout close inline.

No functional change. Preparation for adding export-scoped state
revocation which reuses revoke_one_stid().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c | 126 ++++++++++++++++++++++++++--------------------------
 1 file changed, 63 insertions(+), 63 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b9c399b89df..d476192e4b27 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1757,6 +1757,68 @@ static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
 	return stid;
 }
 
+static void revoke_ol_stid(struct nfs4_client *clp,
+			   struct nfs4_ol_stateid *stp)
+{
+	struct nfs4_stid *stid = &stp->st_stid;
+
+	lockdep_assert_held(&stp->st_mutex);
+	spin_lock(&clp->cl_lock);
+	if (stid->sc_status == 0) {
+		stid->sc_status |= SC_STATUS_ADMIN_REVOKED;
+		atomic_inc(&clp->cl_admin_revoked);
+		spin_unlock(&clp->cl_lock);
+		if (stid->sc_type == SC_TYPE_LOCK) {
+			struct nfs4_lockowner *lo =
+				lockowner(stp->st_stateowner);
+			struct nfsd_file *nf;
+
+			nf = find_any_file(stp->st_stid.sc_file);
+			if (nf) {
+				get_file(nf->nf_file);
+				filp_close(nf->nf_file, (fl_owner_t)lo);
+				nfsd_file_put(nf);
+			}
+		}
+		release_all_access(stp);
+	} else
+		spin_unlock(&clp->cl_lock);
+}
+
+static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
+{
+	struct nfs4_ol_stateid *stp;
+	struct nfs4_delegation *dp;
+
+	switch (stid->sc_type) {
+	case SC_TYPE_OPEN:
+		stp = openlockstateid(stid);
+		mutex_lock_nested(&stp->st_mutex, OPEN_STATEID_MUTEX);
+		revoke_ol_stid(clp, stp);
+		mutex_unlock(&stp->st_mutex);
+		break;
+	case SC_TYPE_LOCK:
+		stp = openlockstateid(stid);
+		mutex_lock_nested(&stp->st_mutex, LOCK_STATEID_MUTEX);
+		revoke_ol_stid(clp, stp);
+		mutex_unlock(&stp->st_mutex);
+		break;
+	case SC_TYPE_DELEG:
+		refcount_inc(&stid->sc_count);
+		dp = delegstateid(stid);
+		spin_lock(&state_lock);
+		if (!unhash_delegation_locked(dp, SC_STATUS_ADMIN_REVOKED))
+			dp = NULL;
+		spin_unlock(&state_lock);
+		if (dp)
+			revoke_delegation(dp);
+		break;
+	case SC_TYPE_LAYOUT:
+		nfsd4_close_layout(layoutstateid(stid));
+		break;
+	}
+}
+
 /**
  * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
  * @nn:   used to identify instance of nfsd (there is one per net namespace)
@@ -1787,70 +1849,8 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
 			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
 								  sc_types);
 			if (stid) {
-				struct nfs4_ol_stateid *stp;
-				struct nfs4_delegation *dp;
-				struct nfs4_layout_stateid *ls;
-
 				spin_unlock(&nn->client_lock);
-				switch (stid->sc_type) {
-				case SC_TYPE_OPEN:
-					stp = openlockstateid(stid);
-					mutex_lock_nested(&stp->st_mutex,
-							  OPEN_STATEID_MUTEX);
-
-					spin_lock(&clp->cl_lock);
-					if (stid->sc_status == 0) {
-						stid->sc_status |=
-							SC_STATUS_ADMIN_REVOKED;
-						atomic_inc(&clp->cl_admin_revoked);
-						spin_unlock(&clp->cl_lock);
-						release_all_access(stp);
-					} else
-						spin_unlock(&clp->cl_lock);
-					mutex_unlock(&stp->st_mutex);
-					break;
-				case SC_TYPE_LOCK:
-					stp = openlockstateid(stid);
-					mutex_lock_nested(&stp->st_mutex,
-							  LOCK_STATEID_MUTEX);
-					spin_lock(&clp->cl_lock);
-					if (stid->sc_status == 0) {
-						struct nfs4_lockowner *lo =
-							lockowner(stp->st_stateowner);
-						struct nfsd_file *nf;
-
-						stid->sc_status |=
-							SC_STATUS_ADMIN_REVOKED;
-						atomic_inc(&clp->cl_admin_revoked);
-						spin_unlock(&clp->cl_lock);
-						nf = find_any_file(stp->st_stid.sc_file);
-						if (nf) {
-							get_file(nf->nf_file);
-							filp_close(nf->nf_file,
-								   (fl_owner_t)lo);
-							nfsd_file_put(nf);
-						}
-						release_all_access(stp);
-					} else
-						spin_unlock(&clp->cl_lock);
-					mutex_unlock(&stp->st_mutex);
-					break;
-				case SC_TYPE_DELEG:
-					refcount_inc(&stid->sc_count);
-					dp = delegstateid(stid);
-					spin_lock(&state_lock);
-					if (!unhash_delegation_locked(
-						    dp, SC_STATUS_ADMIN_REVOKED))
-						dp = NULL;
-					spin_unlock(&state_lock);
-					if (dp)
-						revoke_delegation(dp);
-					break;
-				case SC_TYPE_LAYOUT:
-					ls = layoutstateid(stid);
-					nfsd4_close_layout(ls);
-					break;
-				}
+				revoke_one_stid(clp, stid);
 				nfs4_put_stid(stid);
 				spin_lock(&nn->client_lock);
 				if (clp->cl_minorversion == 0)

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
  2026-03-18 14:15 ` [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:28   ` Jeff Layton
  2026-03-18 14:15 ` [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK Chuck Lever
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

The existing write_unlock_ip procfs interface releases NLM file
locks held by a specific client IP address, but procfs provides
no structured way to extend that operation to other scopes such
as revoking NFSv4 state. A netlink command allows the operation
to carry typed, validated attributes and supports future scope
values without interface proliferation.

NFSD_CMD_UNLOCK accepts an unlock-type attribute selecting the
scope and an address attribute carrying a binary sockaddr_in
or sockaddr_in6. The handler validates the address family
and length, then calls nlmsvc_unlock_all_by_ip() to release
matching NLM locks. Because lockd is a single global instance,
that call operates across all network namespaces regardless of
which namespace the caller inhabits. The command requires admin
privileges via GENL_ADMIN_PERM.

The unlock-type enum begins with a single value, ip, and is
defined with render-max so that future values can be added
without breaking existing userspace.

The nfsd_ctl_unlock_ip tracepoint is updated from string-based
address logging to __sockaddr, which stores the binary sockaddr
and formats it with %pISpc. This affects both the new netlink
path and the existing procfs write_unlock_ip path, giving
consistent structured output in both cases.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 Documentation/netlink/specs/nfsd.yaml | 32 ++++++++++++++++++
 fs/nfsd/netlink.c                     | 13 ++++++++
 fs/nfsd/netlink.h                     |  1 +
 fs/nfsd/nfsctl.c                      | 63 ++++++++++++++++++++++++++++++++++-
 fs/nfsd/trace.h                       | 13 ++++----
 include/uapi/linux/nfsd_netlink.h     | 17 ++++++++++
 6 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index f87b5a05e5e9..02fadfca22ba 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -6,6 +6,13 @@ uapi-header: linux/nfsd_netlink.h
 
 doc: NFSD configuration over generic netlink.
 
+definitions:
+  -
+    type: enum
+    name: unlock-type
+    render-max: true
+    entries: [ip]
+
 attribute-sets:
   -
     name: rpc-status
@@ -127,6 +134,21 @@ attribute-sets:
       -
         name: npools
         type: u32
+  -
+    name: unlock
+    attributes:
+      -
+        name: type
+        type: u32
+        enum: unlock-type
+      -
+        name: address
+        type: binary
+        doc: >-
+          struct sockaddr_in or struct sockaddr_in6.
+          Required when type is ip.
+        checks:
+          min-len: 16
 
 operations:
   list:
@@ -227,3 +249,13 @@ operations:
           attributes:
             - mode
             - npools
+    -
+      name: unlock
+      doc: release NLM locks by scope
+      attribute-set: unlock
+      flags: [admin-perm]
+      do:
+        request:
+          attributes:
+            - type
+            - address
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index 887525964451..9ec0d56eaa21 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -47,6 +47,12 @@ static const struct nla_policy nfsd_pool_mode_set_nl_policy[NFSD_A_POOL_MODE_MOD
 	[NFSD_A_POOL_MODE_MODE] = { .type = NLA_NUL_STRING, },
 };
 
+/* NFSD_CMD_UNLOCK - do */
+static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_ADDRESS + 1] = {
+	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, 0),
+	[NFSD_A_UNLOCK_ADDRESS] = NLA_POLICY_MIN_LEN(16),
+};
+
 /* Ops table for nfsd */
 static const struct genl_split_ops nfsd_nl_ops[] = {
 	{
@@ -102,6 +108,13 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
 		.doit	= nfsd_nl_pool_mode_get_doit,
 		.flags	= GENL_CMD_CAP_DO,
 	},
+	{
+		.cmd		= NFSD_CMD_UNLOCK,
+		.doit		= nfsd_nl_unlock_doit,
+		.policy		= nfsd_unlock_nl_policy,
+		.maxattr	= NFSD_A_UNLOCK_ADDRESS,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
 };
 
 struct genl_family nfsd_nl_family __ro_after_init = {
diff --git a/fs/nfsd/netlink.h b/fs/nfsd/netlink.h
index 478117ff6b8c..3ece774e5f52 100644
--- a/fs/nfsd/netlink.h
+++ b/fs/nfsd/netlink.h
@@ -26,6 +26,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info);
 int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info);
 int nfsd_nl_pool_mode_set_doit(struct sk_buff *skb, struct genl_info *info);
 int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info);
+int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info);
 
 extern struct genl_family nfsd_nl_family;
 
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 4cc8a58fa56a..858f3803c490 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -236,7 +236,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
 	if (rpc_pton(net, fo_path, size, sap, salen) == 0)
 		return -EINVAL;
 
-	trace_nfsd_ctl_unlock_ip(net, buf);
+	trace_nfsd_ctl_unlock_ip(net, sap, svc_addr_len(sap));
 	return nlmsvc_unlock_all_by_ip(sap);
 }
 
@@ -2142,6 +2142,67 @@ int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info)
 	return err;
 }
 
+/**
+ * nfsd_nl_unlock_by_ip - release NLM locks held by an IP address
+ * @info: netlink metadata and command arguments
+ *
+ * Return: 0 on success or a negative errno.
+ */
+static int nfsd_nl_unlock_by_ip(struct genl_info *info)
+{
+	struct sockaddr *sap;
+
+	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_ADDRESS))
+		return -EINVAL;
+	sap = nla_data(info->attrs[NFSD_A_UNLOCK_ADDRESS]);
+	switch (sap->sa_family) {
+	case AF_INET:
+		if (nla_len(info->attrs[NFSD_A_UNLOCK_ADDRESS]) <
+		    sizeof(struct sockaddr_in))
+			return -EINVAL;
+		break;
+	case AF_INET6:
+		if (nla_len(info->attrs[NFSD_A_UNLOCK_ADDRESS]) <
+		    sizeof(struct sockaddr_in6))
+			return -EINVAL;
+		break;
+	default:
+		return -EAFNOSUPPORT;
+	}
+	/*
+	 * nlmsvc_unlock_all_by_ip() releases matching locks
+	 * across all network namespaces because lockd operates
+	 * a single global instance.
+	 */
+	trace_nfsd_ctl_unlock_ip(genl_info_net(info), sap,
+				 svc_addr_len(sap));
+	return nlmsvc_unlock_all_by_ip(sap);
+}
+
+/**
+ * nfsd_nl_unlock_doit - release NLM locks by scope
+ * @skb: reply buffer
+ * @info: netlink metadata and command arguments
+ *
+ * Return: 0 on success or a negative errno.
+ */
+int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	u32 type;
+
+	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_TYPE))
+		return -EINVAL;
+
+	type = nla_get_u32(info->attrs[NFSD_A_UNLOCK_TYPE]);
+
+	switch (type) {
+	case NFSD_UNLOCK_TYPE_IP:
+		return nfsd_nl_unlock_by_ip(info);
+	default:
+		return -EINVAL;
+	}
+}
+
 /**
  * nfsd_net_init - Prepare the nfsd_net portion of a new net namespace
  * @net: a freshly-created network namespace
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d1d0b0dd0545..c770c5e6b1e7 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1984,19 +1984,20 @@ TRACE_EVENT(nfsd_cb_recall_any_done,
 TRACE_EVENT(nfsd_ctl_unlock_ip,
 	TP_PROTO(
 		const struct net *net,
-		const char *address
+		const struct sockaddr *addr,
+		const unsigned int addrlen
 	),
-	TP_ARGS(net, address),
+	TP_ARGS(net, addr, addrlen),
 	TP_STRUCT__entry(
 		__field(unsigned int, netns_ino)
-		__string(address, address)
+		__sockaddr(addr, addrlen)
 	),
 	TP_fast_assign(
 		__entry->netns_ino = net->ns.inum;
-		__assign_str(address);
+		__assign_sockaddr(addr, addr, addrlen);
 	),
-	TP_printk("address=%s",
-		__get_str(address)
+	TP_printk("addr=%pISpc",
+		__get_sockaddr(addr)
 	)
 );
 
diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
index e9efbc9e63d8..8edd75590f31 100644
--- a/include/uapi/linux/nfsd_netlink.h
+++ b/include/uapi/linux/nfsd_netlink.h
@@ -10,6 +10,14 @@
 #define NFSD_FAMILY_NAME	"nfsd"
 #define NFSD_FAMILY_VERSION	1
 
+enum nfsd_unlock_type {
+	NFSD_UNLOCK_TYPE_IP,
+
+	/* private: */
+	__NFSD_UNLOCK_TYPE_MAX,
+	NFSD_UNLOCK_TYPE_MAX = (__NFSD_UNLOCK_TYPE_MAX - 1)
+};
+
 enum {
 	NFSD_A_RPC_STATUS_XID = 1,
 	NFSD_A_RPC_STATUS_FLAGS,
@@ -80,6 +88,14 @@ enum {
 	NFSD_A_POOL_MODE_MAX = (__NFSD_A_POOL_MODE_MAX - 1)
 };
 
+enum {
+	NFSD_A_UNLOCK_TYPE = 1,
+	NFSD_A_UNLOCK_ADDRESS,
+
+	__NFSD_A_UNLOCK_MAX,
+	NFSD_A_UNLOCK_MAX = (__NFSD_A_UNLOCK_MAX - 1)
+};
+
 enum {
 	NFSD_CMD_RPC_STATUS_GET = 1,
 	NFSD_CMD_THREADS_SET,
@@ -90,6 +106,7 @@ enum {
 	NFSD_CMD_LISTENER_GET,
 	NFSD_CMD_POOL_MODE_SET,
 	NFSD_CMD_POOL_MODE_GET,
+	NFSD_CMD_UNLOCK,
 
 	__NFSD_CMD_MAX,
 	NFSD_CMD_MAX = (__NFSD_CMD_MAX - 1)

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
  2026-03-18 14:15 ` [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function Chuck Lever
  2026-03-18 14:15 ` [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:29   ` Jeff Layton
  2026-03-18 14:15 ` [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid() Chuck Lever
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Add NFSD_UNLOCK_TYPE_FILESYSTEM to the NFSD_CMD_UNLOCK netlink
command, providing a netlink equivalent of /proc/fs/nfsd/unlock_fs.

The filesystem scope requires a "path" string attribute containing
the filesystem path whose state should be released. The handler
resolves the path to its superblock, then cancels async copies,
releases NLM locks, and revokes NFSv4 state on that superblock --
the same operations performed by write_unlock_fs().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 Documentation/netlink/specs/nfsd.yaml | 11 ++++++++--
 fs/nfsd/netlink.c                     |  7 +++---
 fs/nfsd/nfsctl.c                      | 41 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/nfsd_netlink.h     |  2 ++
 4 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index 02fadfca22ba..1083ef60cac3 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -11,7 +11,7 @@ definitions:
     type: enum
     name: unlock-type
     render-max: true
-    entries: [ip]
+    entries: [ip, filesystem]
 
 attribute-sets:
   -
@@ -149,6 +149,12 @@ attribute-sets:
           Required when type is ip.
         checks:
           min-len: 16
+      -
+        name: path
+        type: string
+        doc: >-
+          Filesystem path whose state should be released.
+          Required when type is filesystem.
 
 operations:
   list:
@@ -251,7 +257,7 @@ operations:
             - npools
     -
       name: unlock
-      doc: release NLM locks by scope
+      doc: release locks or revoke NFS state by scope
       attribute-set: unlock
       flags: [admin-perm]
       do:
@@ -259,3 +265,4 @@ operations:
           attributes:
             - type
             - address
+            - path
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index 9ec0d56eaa21..8367d4e3fe4f 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -48,9 +48,10 @@ static const struct nla_policy nfsd_pool_mode_set_nl_policy[NFSD_A_POOL_MODE_MOD
 };
 
 /* NFSD_CMD_UNLOCK - do */
-static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_ADDRESS + 1] = {
-	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, 0),
+static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_PATH + 1] = {
+	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, NFSD_UNLOCK_TYPE_MAX),
 	[NFSD_A_UNLOCK_ADDRESS] = NLA_POLICY_MIN_LEN(16),
+	[NFSD_A_UNLOCK_PATH] = { .type = NLA_NUL_STRING, .len = PATH_MAX - 1, },
 };
 
 /* Ops table for nfsd */
@@ -112,7 +113,7 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
 		.cmd		= NFSD_CMD_UNLOCK,
 		.doit		= nfsd_nl_unlock_doit,
 		.policy		= nfsd_unlock_nl_policy,
-		.maxattr	= NFSD_A_UNLOCK_ADDRESS,
+		.maxattr	= NFSD_A_UNLOCK_PATH,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 };
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 858f3803c490..d3ed343699bd 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2180,7 +2180,44 @@ static int nfsd_nl_unlock_by_ip(struct genl_info *info)
 }
 
 /**
- * nfsd_nl_unlock_doit - release NLM locks by scope
+ * nfsd_nl_unlock_by_filesystem - release locks and state on a filesystem
+ * @info: netlink metadata and command arguments
+ *
+ * Return: 0 on success or a negative errno.
+ */
+static int nfsd_nl_unlock_by_filesystem(struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	struct path path;
+	int error;
+
+	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_PATH))
+		return -EINVAL;
+
+	trace_nfsd_ctl_unlock_fs(net,
+				 nla_data(info->attrs[NFSD_A_UNLOCK_PATH]));
+	error = kern_path(nla_data(info->attrs[NFSD_A_UNLOCK_PATH]),
+			  0, &path);
+	if (error)
+		return error;
+
+	nfsd4_cancel_copy_by_sb(net, path.dentry->d_sb);
+	error = nlmsvc_unlock_all_by_sb(path.dentry->d_sb);
+
+	mutex_lock(&nfsd_mutex);
+	if (nn->nfsd_serv)
+		nfsd4_revoke_states(nn, path.dentry->d_sb);
+	else
+		error = -EINVAL;
+	mutex_unlock(&nfsd_mutex);
+
+	path_put(&path);
+	return error;
+}
+
+/**
+ * nfsd_nl_unlock_doit - release locks or revoke NFS state
  * @skb: reply buffer
  * @info: netlink metadata and command arguments
  *
@@ -2198,6 +2235,8 @@ int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info)
 	switch (type) {
 	case NFSD_UNLOCK_TYPE_IP:
 		return nfsd_nl_unlock_by_ip(info);
+	case NFSD_UNLOCK_TYPE_FILESYSTEM:
+		return nfsd_nl_unlock_by_filesystem(info);
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
index 8edd75590f31..340ad36080fe 100644
--- a/include/uapi/linux/nfsd_netlink.h
+++ b/include/uapi/linux/nfsd_netlink.h
@@ -12,6 +12,7 @@
 
 enum nfsd_unlock_type {
 	NFSD_UNLOCK_TYPE_IP,
+	NFSD_UNLOCK_TYPE_FILESYSTEM,
 
 	/* private: */
 	__NFSD_UNLOCK_TYPE_MAX,
@@ -91,6 +92,7 @@ enum {
 enum {
 	NFSD_A_UNLOCK_TYPE = 1,
 	NFSD_A_UNLOCK_ADDRESS,
+	NFSD_A_UNLOCK_PATH,
 
 	__NFSD_A_UNLOCK_MAX,
 	NFSD_A_UNLOCK_MAX = (__NFSD_A_UNLOCK_MAX - 1)

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid()
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
                   ` (2 preceding siblings ...)
  2026-03-18 14:15 ` [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:30   ` Jeff Layton
  2026-03-18 14:15 ` [PATCH v4 5/6] NFSD: Add export-scoped state revocation Chuck Lever
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Export-scoped state revocation, added in a subsequent commit,
needs to advance past non-matching stids within the same
client without restarting from position zero on each call.

Extract find_next_sb_stid(), which accepts a starting IDR
position and can resume iteration across multiple calls.
find_one_sb_stid() becomes a thin wrapper that starts from
position zero.

No change in behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d476192e4b27..891669b32804 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1738,25 +1738,40 @@ static void release_openowner(struct nfs4_openowner *oo)
 	nfs4_put_stateowner(&oo->oo_owner);
 }
 
-static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
-					  struct super_block *sb,
-					  unsigned int sc_types)
+/*
+ * On return of a non-NULL stid, @id holds that entry's IDR key;
+ * caller must increment @id before the next call to advance past it.
+ */
+static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
+					   struct super_block *sb,
+					   unsigned int sc_types,
+					   unsigned long *id)
 {
-	unsigned long id, tmp;
 	struct nfs4_stid *stid;
 
 	spin_lock(&clp->cl_lock);
-	idr_for_each_entry_ul(&clp->cl_stateids, stid, tmp, id)
+	while ((stid = idr_get_next_ul(&clp->cl_stateids, id)) != NULL) {
 		if ((stid->sc_type & sc_types) &&
 		    stid->sc_status == 0 &&
 		    stid->sc_file->fi_inode->i_sb == sb) {
 			refcount_inc(&stid->sc_count);
 			break;
 		}
+		(*id)++;
+	}
 	spin_unlock(&clp->cl_lock);
 	return stid;
 }
 
+static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
+					  struct super_block *sb,
+					  unsigned int sc_types)
+{
+	unsigned long id = 0;
+
+	return find_next_sb_stid(clp, sb, sc_types, &id);
+}
+
 static void revoke_ol_stid(struct nfs4_client *clp,
 			   struct nfs4_ol_stateid *stp)
 {

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 5/6] NFSD: Add export-scoped state revocation
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
                   ` (3 preceding siblings ...)
  2026-03-18 14:15 ` [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid() Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:47   ` Jeff Layton
  2026-03-18 14:15 ` [PATCH v4 6/6] NFSD: Add nfsd_file_close_export() for file cache cleanup Chuck Lever
  2026-03-18 14:24 ` [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Jeff Layton
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

nfsd4_revoke_states() revokes all NFSv4 state on an entire
superblock, which is too coarse when multiple exports share a
filesystem. Add nfsd4_revoke_export_states() to revoke only
state associated with files under a specific export root, then
convert nfsd4_revoke_states() to a thin wrapper that passes
sb->s_root.

nfsd4_revoke_export_states() uses find_next_sb_stid() to locate
candidate stids, then verifies each against the export root via
nfsd_file_inode_is_in_subtree(). That helper is placed in the
file cache layer (filecache.c) because it operates on VFS types
with no NFSv4 state dependency. It walks all of an inode's
dentry aliases rather than calling d_find_any_alias(), because
for hard-linked files an arbitrary alias may fall outside the
export subtree even when another alias is inside it.

When the export root is the filesystem root, the subtree check
is elided and every stid matching the superblock is revoked
directly.

The NFSD_UNLOCK_TYPE_FILESYSTEM handler now calls
nfsd4_revoke_export_states() with the resolved path dentry,
enabling subtree-scoped revocation through the netlink
interface.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c | 32 +++++++++++++++++++
 fs/nfsd/filecache.h |  1 +
 fs/nfsd/nfs4state.c | 92 +++++++++++++++++++++++++++++++++++++----------------
 fs/nfsd/nfsctl.c    |  3 +-
 fs/nfsd/state.h     |  7 ++++
 5 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1e2b38ed1d35..cd09be0c5465 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -894,6 +894,38 @@ __nfsd_file_cache_purge(struct net *net)
 	nfsd_file_dispose_list(&dispose);
 }
 
+/**
+ * nfsd_file_inode_is_in_subtree - check whether an inode is under a subtree
+ * @inode:        inode to test
+ * @root_dentry:  dentry of the subtree root
+ *
+ * Check whether @inode has any dentry alias that falls within the
+ * subtree rooted at @root_dentry.  Hard-linked files can have aliases
+ * in multiple directories, so all aliases must be tested.
+ *
+ * Return: %true if any dentry alias of @inode is at or below
+ * @root_dentry, %false otherwise.
+ */
+bool nfsd_file_inode_is_in_subtree(struct inode *inode,
+				   struct dentry *root_dentry)
+{
+	struct dentry *alias;
+	bool found = false;
+
+	/* i_lock stabilizes the alias list; is_subdir() nests
+	 * rename_lock (a seqlock) beneath it but does not sleep.
+	 */
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
+		if (is_subdir(alias, root_dentry)) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&inode->i_lock);
+	return found;
+}
+
 static struct nfsd_fcache_disposal *
 nfsd_alloc_fcache_disposal(void)
 {
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index b383dbc5b921..36c9a8e388d2 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
+bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
 __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 891669b32804..581f38395c42 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1763,15 +1763,6 @@ static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
 	return stid;
 }
 
-static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
-					  struct super_block *sb,
-					  unsigned int sc_types)
-{
-	unsigned long id = 0;
-
-	return find_next_sb_stid(clp, sb, sc_types, &id);
-}
-
 static void revoke_ol_stid(struct nfs4_client *clp,
 			   struct nfs4_ol_stateid *stp)
 {
@@ -1835,20 +1826,19 @@ static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
 }
 
 /**
- * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
- * @nn:   used to identify instance of nfsd (there is one per net namespace)
- * @sb:   super_block used to identify target filesystem
+ * nfsd4_revoke_export_states - revoke states associated with a given export
+ * @nn:           nfsd_net identifying the nfsd instance (one per net namespace)
+ * @sb:           super_block of the export's filesystem
+ * @root_dentry:  dentry of the export root directory
  *
  * All nfs4 states (open, lock, delegation, layout) held by the server instance
- * and associated with a file on the given filesystem will be revoked resulting
- * in any files being closed and so all references from nfsd to the filesystem
- * being released.  Thus nfsd will no longer prevent the filesystem from being
- * unmounted.
- *
- * The clients which own the states will subsequently being notified that the
- * states have been "admin-revoked".
+ * and associated with files under the given export will be revoked.  When
+ * @root_dentry is the filesystem root, all state on @sb is revoked (equivalent
+ * to nfsd4_revoke_states).  When @root_dentry is a subdirectory, only state on
+ * files within that subtree is revoked.
  */
-void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
+void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
+				struct dentry *root_dentry)
 {
 	unsigned int idhashval;
 	unsigned int sc_types;
@@ -1861,18 +1851,53 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
 		struct nfs4_client *clp;
 	retry:
 		list_for_each_entry(clp, head, cl_idhash) {
-			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
-								  sc_types);
-			if (stid) {
-				spin_unlock(&nn->client_lock);
+			struct nfs4_stid *stid;
+			/* Resets to zero on each retry; revocation may
+			 * alter the IDR, so a stale cursor is unsafe.
+			 */
+			unsigned long id = 0;
+
+			while ((stid = find_next_sb_stid(clp, sb,
+						sc_types, &id)) != NULL) {
+				if (root_dentry != sb->s_root) {
+					bool match;
+
+					/* Bare inc to pin clp; get_client_locked() is
+					 * not used because its courtesy-to-active
+					 * transition is unwanted during revocation.
+					 */
+					atomic_inc(&clp->cl_rpc_users);
+					spin_unlock(&nn->client_lock);
+					match = nfsd_file_inode_is_in_subtree(
+							stid->sc_file->fi_inode,
+							root_dentry);
+					if (!match) {
+						nfs4_put_stid(stid);
+						spin_lock(&nn->client_lock);
+						put_client_renew_locked(clp);
+						id++;
+						continue;
+					}
+				} else {
+					/* Whole-sb: goto retry restarts the
+					 * client list immediately after
+					 * revocation, so clp needs no pin.
+					 */
+					spin_unlock(&nn->client_lock);
+				}
+
 				revoke_one_stid(clp, stid);
 				nfs4_put_stid(stid);
 				spin_lock(&nn->client_lock);
+				if (root_dentry != sb->s_root)
+					put_client_renew_locked(clp);
 				if (clp->cl_minorversion == 0)
 					/* Allow cleanup after a lease period.
-					 * store_release ensures cleanup will
-					 * see any newly revoked states if it
-					 * sees the time updated.
+					 * The lock/unlock pair orders this
+					 * store after revoke_one_stid(), so
+					 * nfs40_clean_admin_revoked() sees
+					 * newly revoked states if it sees
+					 * the updated time.
 					 */
 					nn->nfs40_last_revoke =
 						ktime_get_boottime_seconds();
@@ -1883,6 +1908,19 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
 	spin_unlock(&nn->client_lock);
 }
 
+/**
+ * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
+ * @nn:   nfsd_net identifying the nfsd instance
+ * @sb:   super_block used to identify target filesystem
+ *
+ * Convenience wrapper around nfsd4_revoke_export_states() that revokes
+ * all state on @sb by passing sb->s_root as the export root.
+ */
+void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
+{
+	nfsd4_revoke_export_states(nn, sb, sb->s_root);
+}
+
 static inline int
 hash_sessionid(struct nfs4_sessionid *sessionid)
 {
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d3ed343699bd..d9c61c939059 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2207,7 +2207,8 @@ static int nfsd_nl_unlock_by_filesystem(struct genl_info *info)
 
 	mutex_lock(&nfsd_mutex);
 	if (nn->nfsd_serv)
-		nfsd4_revoke_states(nn, path.dentry->d_sb);
+		nfsd4_revoke_export_states(nn, path.dentry->d_sb,
+					   path.dentry);
 	else
 		error = -EINVAL;
 	mutex_unlock(&nfsd_mutex);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 6fcbf1e427d4..9e7c7884831c 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -843,11 +843,18 @@ struct nfsd_file *find_any_file(struct nfs4_file *f);
 
 #ifdef CONFIG_NFSD_V4
 void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb);
+void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
+				struct dentry *root_dentry);
 void nfsd4_cancel_copy_by_sb(struct net *net, struct super_block *sb);
 #else
 static inline void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
 {
 }
+static inline void nfsd4_revoke_export_states(struct nfsd_net *nn,
+					      struct super_block *sb,
+					      struct dentry *root_dentry)
+{
+}
 static inline void nfsd4_cancel_copy_by_sb(struct net *net, struct super_block *sb)
 {
 }

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 6/6] NFSD: Add nfsd_file_close_export() for file cache cleanup
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
                   ` (4 preceding siblings ...)
  2026-03-18 14:15 ` [PATCH v4 5/6] NFSD: Add export-scoped state revocation Chuck Lever
@ 2026-03-18 14:15 ` Chuck Lever
  2026-03-18 14:24 ` [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Jeff Layton
  6 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:15 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

After NFSv4 state revocation, GC-managed entries in the
nfsd_file cache can still pin a filesystem and prevent unmount. Add
nfsd_file_close_export() to close cached files scoped to a specific
export subtree, matching the granularity of
nfsd4_revoke_export_states().

nfsd_file_close_export() walks the nfsd_file rhashtable and
disposes GC-managed entries whose backing file resides on the
target superblock and under the export root dentry. When the export
root is the filesystem root, the is_subdir() check is elided.

nfsd_nl_unlock_by_filesystem() calls nfsd_file_close_export() after
state revocation to release any remaining file cache references.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h |  1 +
 fs/nfsd/nfsctl.c    |  6 ++++--
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index cd09be0c5465..0944019ef645 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -926,6 +926,51 @@ bool nfsd_file_inode_is_in_subtree(struct inode *inode,
 	return found;
 }
 
+/**
+ * nfsd_file_close_export - close cached files for an export
+ * @sb:           super_block of the export's filesystem
+ * @root_dentry:  dentry of the export root directory
+ *
+ * Walk the nfsd_file cache and close GC-managed entries whose files
+ * reside under @root_dentry on @sb.  When @root_dentry is the
+ * filesystem root, all GC-managed entries on @sb are closed.
+ *
+ * Non-GC entries are opened by NFSv4, which closes them explicitly
+ * via CLOSE or state revocation; they require no cache-level flush.
+ */
+void nfsd_file_close_export(struct super_block *sb, struct dentry *root_dentry)
+{
+	struct rhashtable_iter iter;
+	struct nfsd_file *nf;
+	LIST_HEAD(dispose);
+
+	lockdep_assert_held(&nfsd_mutex);
+	if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
+		return;
+
+	rhltable_walk_enter(&nfsd_file_rhltable, &iter);
+	do {
+		rhashtable_walk_start(&iter);
+
+		nf = rhashtable_walk_next(&iter);
+		while (!IS_ERR_OR_NULL(nf)) {
+			if (test_bit(NFSD_FILE_GC, &nf->nf_flags) &&
+			    nf->nf_file &&
+			    file_inode(nf->nf_file)->i_sb == sb &&
+			    (root_dentry == sb->s_root ||
+			     nfsd_file_inode_is_in_subtree(
+				file_inode(nf->nf_file), root_dentry)))
+				nfsd_file_cond_queue(nf, &dispose);
+			nf = rhashtable_walk_next(&iter);
+		}
+
+		rhashtable_walk_stop(&iter);
+	} while (nf == ERR_PTR(-EAGAIN));
+	rhashtable_walk_exit(&iter);
+
+	nfsd_file_dispose_list(&dispose);
+}
+
 static struct nfsd_fcache_disposal *
 nfsd_alloc_fcache_disposal(void)
 {
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 36c9a8e388d2..c324f85b495f 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -71,6 +71,7 @@ struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
+void nfsd_file_close_export(struct super_block *sb, struct dentry *root_dentry);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
 __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d9c61c939059..55392f2d7882 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2206,11 +2206,13 @@ static int nfsd_nl_unlock_by_filesystem(struct genl_info *info)
 	error = nlmsvc_unlock_all_by_sb(path.dentry->d_sb);
 
 	mutex_lock(&nfsd_mutex);
-	if (nn->nfsd_serv)
+	if (nn->nfsd_serv) {
 		nfsd4_revoke_export_states(nn, path.dentry->d_sb,
 					   path.dentry);
-	else
+		nfsd_file_close_export(path.dentry->d_sb, path.dentry);
+	} else {
 		error = -EINVAL;
+	}
 	mutex_unlock(&nfsd_mutex);
 
 	path_put(&path);

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function
  2026-03-18 14:15 ` [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function Chuck Lever
@ 2026-03-18 14:21   ` Jeff Layton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:21 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The per-stateid revocation logic in nfsd4_revoke_states() handles
> four stateid types in a deeply nested switch. Extract two helpers:
> 
> revoke_ol_stid() performs admin-revocation of an open or lock
> stateid with st_mutex already held: marks the stateid as
> SC_STATUS_ADMIN_REVOKED, closes POSIX locks for lock stateids,
> and releases file access.
> 
> revoke_one_stid() dispatches by sc_type, acquires st_mutex with
> the appropriate lockdep class for open and lock stateids, and
> handles delegation unhash and layout close inline.
> 
> No functional change. Preparation for adding export-scoped state
> revocation which reuses revoke_one_stid().
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 126 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 63 insertions(+), 63 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 6b9c399b89df..d476192e4b27 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1757,6 +1757,68 @@ static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
>  	return stid;
>  }
>  
> +static void revoke_ol_stid(struct nfs4_client *clp,
> +			   struct nfs4_ol_stateid *stp)
> +{
> +	struct nfs4_stid *stid = &stp->st_stid;
> +
> +	lockdep_assert_held(&stp->st_mutex);
> +	spin_lock(&clp->cl_lock);
> +	if (stid->sc_status == 0) {
> +		stid->sc_status |= SC_STATUS_ADMIN_REVOKED;
> +		atomic_inc(&clp->cl_admin_revoked);
> +		spin_unlock(&clp->cl_lock);
> +		if (stid->sc_type == SC_TYPE_LOCK) {
> +			struct nfs4_lockowner *lo =
> +				lockowner(stp->st_stateowner);
> +			struct nfsd_file *nf;
> +
> +			nf = find_any_file(stp->st_stid.sc_file);
> +			if (nf) {
> +				get_file(nf->nf_file);
> +				filp_close(nf->nf_file, (fl_owner_t)lo);
> +				nfsd_file_put(nf);
> +			}
> +		}
> +		release_all_access(stp);
> +	} else
> +		spin_unlock(&clp->cl_lock);
> +}
> +
> +static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
> +{
> +	struct nfs4_ol_stateid *stp;
> +	struct nfs4_delegation *dp;
> +
> +	switch (stid->sc_type) {
> +	case SC_TYPE_OPEN:
> +		stp = openlockstateid(stid);
> +		mutex_lock_nested(&stp->st_mutex, OPEN_STATEID_MUTEX);
> +		revoke_ol_stid(clp, stp);
> +		mutex_unlock(&stp->st_mutex);
> +		break;
> +	case SC_TYPE_LOCK:
> +		stp = openlockstateid(stid);
> +		mutex_lock_nested(&stp->st_mutex, LOCK_STATEID_MUTEX);
> +		revoke_ol_stid(clp, stp);
> +		mutex_unlock(&stp->st_mutex);
> +		break;
> +	case SC_TYPE_DELEG:
> +		refcount_inc(&stid->sc_count);
> +		dp = delegstateid(stid);
> +		spin_lock(&state_lock);
> +		if (!unhash_delegation_locked(dp, SC_STATUS_ADMIN_REVOKED))
> +			dp = NULL;
> +		spin_unlock(&state_lock);
> +		if (dp)
> +			revoke_delegation(dp);
> +		break;
> +	case SC_TYPE_LAYOUT:
> +		nfsd4_close_layout(layoutstateid(stid));
> +		break;
> +	}
> +}
> +
>  /**
>   * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
>   * @nn:   used to identify instance of nfsd (there is one per net namespace)
> @@ -1787,70 +1849,8 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>  			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
>  								  sc_types);
>  			if (stid) {
> -				struct nfs4_ol_stateid *stp;
> -				struct nfs4_delegation *dp;
> -				struct nfs4_layout_stateid *ls;
> -
>  				spin_unlock(&nn->client_lock);
> -				switch (stid->sc_type) {
> -				case SC_TYPE_OPEN:
> -					stp = openlockstateid(stid);
> -					mutex_lock_nested(&stp->st_mutex,
> -							  OPEN_STATEID_MUTEX);
> -
> -					spin_lock(&clp->cl_lock);
> -					if (stid->sc_status == 0) {
> -						stid->sc_status |=
> -							SC_STATUS_ADMIN_REVOKED;
> -						atomic_inc(&clp->cl_admin_revoked);
> -						spin_unlock(&clp->cl_lock);
> -						release_all_access(stp);
> -					} else
> -						spin_unlock(&clp->cl_lock);
> -					mutex_unlock(&stp->st_mutex);
> -					break;
> -				case SC_TYPE_LOCK:
> -					stp = openlockstateid(stid);
> -					mutex_lock_nested(&stp->st_mutex,
> -							  LOCK_STATEID_MUTEX);
> -					spin_lock(&clp->cl_lock);
> -					if (stid->sc_status == 0) {
> -						struct nfs4_lockowner *lo =
> -							lockowner(stp->st_stateowner);
> -						struct nfsd_file *nf;
> -
> -						stid->sc_status |=
> -							SC_STATUS_ADMIN_REVOKED;
> -						atomic_inc(&clp->cl_admin_revoked);
> -						spin_unlock(&clp->cl_lock);
> -						nf = find_any_file(stp->st_stid.sc_file);
> -						if (nf) {
> -							get_file(nf->nf_file);
> -							filp_close(nf->nf_file,
> -								   (fl_owner_t)lo);
> -							nfsd_file_put(nf);
> -						}
> -						release_all_access(stp);
> -					} else
> -						spin_unlock(&clp->cl_lock);
> -					mutex_unlock(&stp->st_mutex);
> -					break;
> -				case SC_TYPE_DELEG:
> -					refcount_inc(&stid->sc_count);
> -					dp = delegstateid(stid);
> -					spin_lock(&state_lock);
> -					if (!unhash_delegation_locked(
> -						    dp, SC_STATUS_ADMIN_REVOKED))
> -						dp = NULL;
> -					spin_unlock(&state_lock);
> -					if (dp)
> -						revoke_delegation(dp);
> -					break;
> -				case SC_TYPE_LAYOUT:
> -					ls = layoutstateid(stid);
> -					nfsd4_close_layout(ls);
> -					break;
> -				}
> +				revoke_one_stid(clp, stid);
>  				nfs4_put_stid(stid);
>  				spin_lock(&nn->client_lock);
>  				if (clp->cl_minorversion == 0)

Much nicer anyway.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount
  2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
                   ` (5 preceding siblings ...)
  2026-03-18 14:15 ` [PATCH v4 6/6] NFSD: Add nfsd_file_close_export() for file cache cleanup Chuck Lever
@ 2026-03-18 14:24 ` Jeff Layton
  6 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:24 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> When an NFS server exports a filesystem and clients hold NFSv4
> state (opens, locks, delegations), unmounting the underlying
> filesystem fails with EBUSY. The /proc/fs/nfsd/unlock_fs procfs
> interface revokes all NFSv4 state on a given filesystem's
> superblock, but it operates at whole-filesystem granularity and
> offers no structured way to extend the operation to other scopes.
> 
> This series adds an NFSD_CMD_UNLOCK netlink command that supports
> typed, validated attributes and multiple scope values. The
> initial "ip" scope provides a netlink equivalent of
> write_unlock_ip, releasing NLM locks held by a specific client
> address. A "filesystem" scope parallels write_unlock_fs, accepting
> a path string and revoking NFSv4 state, NLM locks, async COPY
> operations, and cached file handles associated with that path.
> 
> The filesystem scope revokes state at export granularity rather
> than superblock granularity: when multiple exports share a
> filesystem, only state referencing files under the specified
> export root is affected. The subtree check walks all dentry
> aliases of each inode so that hard-linked files outside the
> export are not incorrectly matched. When the export root is the
> filesystem root, the subtree check is elided.
> 
> The exportfs user space command can invoke NFSD_CMD_UNLOCK with
> the filesystem scope after an explicit "exportfs -u", enabling
> automated state cleanup without custom scripting or manual
> procfs writes.
> 

I'll note that the mountd/exportd/exportfs nfs-utils patchset that I
sent recently adds basic netlink support to exportfs. The userland bits
for this should nestle on top of that nicely.

> ---
> Changes since v3:
> - All VFS changes replaced with new netlink "unlock" operation
> 
> Changes since v2:
> - Replace fs_pin with an SRCU umount notifier chain in VFS
> - Merge the pending COPY cancellation patch
> - Replace xa_cmpxchg() with xa_insert()
> - Use cancel_work_sync() instead of flush_workqueue()
> - Remove rcu_barrier()
> - Correct misleading claims in kdoc comments and commit messages
> 
> Changes since v1:
> - Explain why drop_client() is being renamed
> - Finish implementing revocation on umount
> - Rename pin_insert_group
> - Clarified log output and code comments
> - Hold nfsd_mutex while closing nfsd_files
> 
> ---
> Chuck Lever (6):
>       NFSD: Extract revoke_one_stid() utility function
>       NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope
>       NFSD: Add filesystem scope to NFSD_CMD_UNLOCK
>       NFSD: Refactor find_one_sb_stid() into find_next_sb_stid()
>       NFSD: Add export-scoped state revocation
>       NFSD: Add nfsd_file_close_export() for file cache cleanup
> 
>  Documentation/netlink/specs/nfsd.yaml |  39 ++++++
>  fs/nfsd/filecache.c                   |  77 ++++++++++++
>  fs/nfsd/filecache.h                   |   2 +
>  fs/nfsd/netlink.c                     |  14 +++
>  fs/nfsd/netlink.h                     |   1 +
>  fs/nfsd/nfs4state.c                   | 219 +++++++++++++++++++++-------------
>  fs/nfsd/nfsctl.c                      | 105 +++++++++++++++-
>  fs/nfsd/state.h                       |   7 ++
>  fs/nfsd/trace.h                       |  13 +-
>  include/uapi/linux/nfsd_netlink.h     |  19 +++
>  10 files changed, 406 insertions(+), 90 deletions(-)
> ---
> base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
> change-id: 20260318-umount-kills-nfsv4-state-138218f2f4e0
> 
> Best regards,
> --  
> Chuck Lever

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope
  2026-03-18 14:15 ` [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope Chuck Lever
@ 2026-03-18 14:28   ` Jeff Layton
  2026-03-18 14:32     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:28 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The existing write_unlock_ip procfs interface releases NLM file
> locks held by a specific client IP address, but procfs provides
> no structured way to extend that operation to other scopes such
> as revoking NFSv4 state. A netlink command allows the operation
> to carry typed, validated attributes and supports future scope
> values without interface proliferation.
> 
> NFSD_CMD_UNLOCK accepts an unlock-type attribute selecting the
> scope and an address attribute carrying a binary sockaddr_in
> or sockaddr_in6. The handler validates the address family
> and length, then calls nlmsvc_unlock_all_by_ip() to release
> matching NLM locks. Because lockd is a single global instance,
> that call operates across all network namespaces regardless of
> which namespace the caller inhabits. The command requires admin
> privileges via GENL_ADMIN_PERM.
> 
> The unlock-type enum begins with a single value, ip, and is
> defined with render-max so that future values can be added
> without breaking existing userspace.
> 
> The nfsd_ctl_unlock_ip tracepoint is updated from string-based
> address logging to __sockaddr, which stores the binary sockaddr
> and formats it with %pISpc. This affects both the new netlink
> path and the existing procfs write_unlock_ip path, giving
> consistent structured output in both cases.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  Documentation/netlink/specs/nfsd.yaml | 32 ++++++++++++++++++
>  fs/nfsd/netlink.c                     | 13 ++++++++
>  fs/nfsd/netlink.h                     |  1 +
>  fs/nfsd/nfsctl.c                      | 63 ++++++++++++++++++++++++++++++++++-
>  fs/nfsd/trace.h                       | 13 ++++----
>  include/uapi/linux/nfsd_netlink.h     | 17 ++++++++++
>  6 files changed, 132 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
> index f87b5a05e5e9..02fadfca22ba 100644
> --- a/Documentation/netlink/specs/nfsd.yaml
> +++ b/Documentation/netlink/specs/nfsd.yaml
> @@ -6,6 +6,13 @@ uapi-header: linux/nfsd_netlink.h
>  
>  doc: NFSD configuration over generic netlink.
>  
> +definitions:
> +  -
> +    type: enum
> +    name: unlock-type
> +    render-max: true
> +    entries: [ip]
> +
>  attribute-sets:
>    -
>      name: rpc-status
> @@ -127,6 +134,21 @@ attribute-sets:
>        -
>          name: npools
>          type: u32
> +  -
> +    name: unlock
> +    attributes:
> +      -
> +        name: type
> +        type: u32
> +        enum: unlock-type
> +      -
> +        name: address
> +        type: binary
> +        doc: >-
> +          struct sockaddr_in or struct sockaddr_in6.
> +          Required when type is ip.
> +        checks:
> +          min-len: 16
>  
>  operations:
>    list:
> @@ -227,3 +249,13 @@ operations:
>            attributes:
>              - mode
>              - npools
> +    -
> +      name: unlock
> +      doc: release NLM locks by scope
> +      attribute-set: unlock
> +      flags: [admin-perm]
> +      do:
> +        request:
> +          attributes:
> +            - type
> +            - address

I wonder if we'd be better served with different commands instead of
passing a type value to a single command? Different types are going to
require different attributes, and it'll be easier to validate those if
they use different commands.

> diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
> index 887525964451..9ec0d56eaa21 100644
> --- a/fs/nfsd/netlink.c
> +++ b/fs/nfsd/netlink.c
> @@ -47,6 +47,12 @@ static const struct nla_policy nfsd_pool_mode_set_nl_policy[NFSD_A_POOL_MODE_MOD
>  	[NFSD_A_POOL_MODE_MODE] = { .type = NLA_NUL_STRING, },
>  };
>  
> +/* NFSD_CMD_UNLOCK - do */
> +static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_ADDRESS + 1] = {
> +	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, 0),
> +	[NFSD_A_UNLOCK_ADDRESS] = NLA_POLICY_MIN_LEN(16),
> +};
> +
>  /* Ops table for nfsd */
>  static const struct genl_split_ops nfsd_nl_ops[] = {
>  	{
> @@ -102,6 +108,13 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
>  		.doit	= nfsd_nl_pool_mode_get_doit,
>  		.flags	= GENL_CMD_CAP_DO,
>  	},
> +	{
> +		.cmd		= NFSD_CMD_UNLOCK,
> +		.doit		= nfsd_nl_unlock_doit,
> +		.policy		= nfsd_unlock_nl_policy,
> +		.maxattr	= NFSD_A_UNLOCK_ADDRESS,
> +		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> +	},
>  };
>  
>  struct genl_family nfsd_nl_family __ro_after_init = {
> diff --git a/fs/nfsd/netlink.h b/fs/nfsd/netlink.h
> index 478117ff6b8c..3ece774e5f52 100644
> --- a/fs/nfsd/netlink.h
> +++ b/fs/nfsd/netlink.h
> @@ -26,6 +26,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info);
>  int nfsd_nl_listener_get_doit(struct sk_buff *skb, struct genl_info *info);
>  int nfsd_nl_pool_mode_set_doit(struct sk_buff *skb, struct genl_info *info);
>  int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info);
> +int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info);
>  
>  extern struct genl_family nfsd_nl_family;
>  
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 4cc8a58fa56a..858f3803c490 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -236,7 +236,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
>  	if (rpc_pton(net, fo_path, size, sap, salen) == 0)
>  		return -EINVAL;
>  
> -	trace_nfsd_ctl_unlock_ip(net, buf);
> +	trace_nfsd_ctl_unlock_ip(net, sap, svc_addr_len(sap));
>  	return nlmsvc_unlock_all_by_ip(sap);
>  }
>  
> @@ -2142,6 +2142,67 @@ int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info)
>  	return err;
>  }
>  
> +/**
> + * nfsd_nl_unlock_by_ip - release NLM locks held by an IP address
> + * @info: netlink metadata and command arguments
> + *
> + * Return: 0 on success or a negative errno.
> + */
> +static int nfsd_nl_unlock_by_ip(struct genl_info *info)
> +{
> +	struct sockaddr *sap;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_ADDRESS))
> +		return -EINVAL;
> +	sap = nla_data(info->attrs[NFSD_A_UNLOCK_ADDRESS]);
> +	switch (sap->sa_family) {
> +	case AF_INET:
> +		if (nla_len(info->attrs[NFSD_A_UNLOCK_ADDRESS]) <
> +		    sizeof(struct sockaddr_in))
> +			return -EINVAL;
> +		break;
> +	case AF_INET6:
> +		if (nla_len(info->attrs[NFSD_A_UNLOCK_ADDRESS]) <
> +		    sizeof(struct sockaddr_in6))
> +			return -EINVAL;
> +		break;
> +	default:
> +		return -EAFNOSUPPORT;
> +	}
> +	/*
> +	 * nlmsvc_unlock_all_by_ip() releases matching locks
> +	 * across all network namespaces because lockd operates
> +	 * a single global instance.
> +	 */
> +	trace_nfsd_ctl_unlock_ip(genl_info_net(info), sap,
> +				 svc_addr_len(sap));
> +	return nlmsvc_unlock_all_by_ip(sap);
> +}
> +
> +/**
> + * nfsd_nl_unlock_doit - release NLM locks by scope
> + * @skb: reply buffer
> + * @info: netlink metadata and command arguments
> + *
> + * Return: 0 on success or a negative errno.
> + */
> +int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info)
> +{
> +	u32 type;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_TYPE))
> +		return -EINVAL;
> +
> +	type = nla_get_u32(info->attrs[NFSD_A_UNLOCK_TYPE]);
> +
> +	switch (type) {
> +	case NFSD_UNLOCK_TYPE_IP:
> +		return nfsd_nl_unlock_by_ip(info);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  /**
>   * nfsd_net_init - Prepare the nfsd_net portion of a new net namespace
>   * @net: a freshly-created network namespace
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index d1d0b0dd0545..c770c5e6b1e7 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -1984,19 +1984,20 @@ TRACE_EVENT(nfsd_cb_recall_any_done,
>  TRACE_EVENT(nfsd_ctl_unlock_ip,
>  	TP_PROTO(
>  		const struct net *net,
> -		const char *address
> +		const struct sockaddr *addr,
> +		const unsigned int addrlen
>  	),
> -	TP_ARGS(net, address),
> +	TP_ARGS(net, addr, addrlen),
>  	TP_STRUCT__entry(
>  		__field(unsigned int, netns_ino)
> -		__string(address, address)
> +		__sockaddr(addr, addrlen)
>  	),
>  	TP_fast_assign(
>  		__entry->netns_ino = net->ns.inum;
> -		__assign_str(address);
> +		__assign_sockaddr(addr, addr, addrlen);
>  	),
> -	TP_printk("address=%s",
> -		__get_str(address)
> +	TP_printk("addr=%pISpc",
> +		__get_sockaddr(addr)
>  	)
>  );
>  
> diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
> index e9efbc9e63d8..8edd75590f31 100644
> --- a/include/uapi/linux/nfsd_netlink.h
> +++ b/include/uapi/linux/nfsd_netlink.h
> @@ -10,6 +10,14 @@
>  #define NFSD_FAMILY_NAME	"nfsd"
>  #define NFSD_FAMILY_VERSION	1
>  
> +enum nfsd_unlock_type {
> +	NFSD_UNLOCK_TYPE_IP,
> +
> +	/* private: */
> +	__NFSD_UNLOCK_TYPE_MAX,
> +	NFSD_UNLOCK_TYPE_MAX = (__NFSD_UNLOCK_TYPE_MAX - 1)
> +};
> +
>  enum {
>  	NFSD_A_RPC_STATUS_XID = 1,
>  	NFSD_A_RPC_STATUS_FLAGS,
> @@ -80,6 +88,14 @@ enum {
>  	NFSD_A_POOL_MODE_MAX = (__NFSD_A_POOL_MODE_MAX - 1)
>  };
>  
> +enum {
> +	NFSD_A_UNLOCK_TYPE = 1,
> +	NFSD_A_UNLOCK_ADDRESS,
> +
> +	__NFSD_A_UNLOCK_MAX,
> +	NFSD_A_UNLOCK_MAX = (__NFSD_A_UNLOCK_MAX - 1)
> +};
> +
>  enum {
>  	NFSD_CMD_RPC_STATUS_GET = 1,
>  	NFSD_CMD_THREADS_SET,
> @@ -90,6 +106,7 @@ enum {
>  	NFSD_CMD_LISTENER_GET,
>  	NFSD_CMD_POOL_MODE_SET,
>  	NFSD_CMD_POOL_MODE_GET,
> +	NFSD_CMD_UNLOCK,
>  
>  	__NFSD_CMD_MAX,
>  	NFSD_CMD_MAX = (__NFSD_CMD_MAX - 1)

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK
  2026-03-18 14:15 ` [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK Chuck Lever
@ 2026-03-18 14:29   ` Jeff Layton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:29 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Add NFSD_UNLOCK_TYPE_FILESYSTEM to the NFSD_CMD_UNLOCK netlink
> command, providing a netlink equivalent of /proc/fs/nfsd/unlock_fs.
> 
> The filesystem scope requires a "path" string attribute containing
> the filesystem path whose state should be released. The handler
> resolves the path to its superblock, then cancels async copies,
> releases NLM locks, and revokes NFSv4 state on that superblock --
> the same operations performed by write_unlock_fs().
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  Documentation/netlink/specs/nfsd.yaml | 11 ++++++++--
>  fs/nfsd/netlink.c                     |  7 +++---
>  fs/nfsd/nfsctl.c                      | 41 ++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/nfsd_netlink.h     |  2 ++
>  4 files changed, 55 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
> index 02fadfca22ba..1083ef60cac3 100644
> --- a/Documentation/netlink/specs/nfsd.yaml
> +++ b/Documentation/netlink/specs/nfsd.yaml
> @@ -11,7 +11,7 @@ definitions:
>      type: enum
>      name: unlock-type
>      render-max: true
> -    entries: [ip]
> +    entries: [ip, filesystem]
>  
>  attribute-sets:
>    -
> @@ -149,6 +149,12 @@ attribute-sets:
>            Required when type is ip.
>          checks:
>            min-len: 16
> +      -
> +        name: path
> +        type: string
> +        doc: >-
> +          Filesystem path whose state should be released.
> +          Required when type is filesystem.
>  
>  operations:
>    list:
> @@ -251,7 +257,7 @@ operations:
>              - npools
>      -
>        name: unlock
> -      doc: release NLM locks by scope
> +      doc: release locks or revoke NFS state by scope
>        attribute-set: unlock
>        flags: [admin-perm]
>        do:
> @@ -259,3 +265,4 @@ operations:
>            attributes:
>              - type
>              - address
> +            - path
> diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
> index 9ec0d56eaa21..8367d4e3fe4f 100644
> --- a/fs/nfsd/netlink.c
> +++ b/fs/nfsd/netlink.c
> @@ -48,9 +48,10 @@ static const struct nla_policy nfsd_pool_mode_set_nl_policy[NFSD_A_POOL_MODE_MOD
>  };
>  
>  /* NFSD_CMD_UNLOCK - do */
> -static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_ADDRESS + 1] = {
> -	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, 0),
> +static const struct nla_policy nfsd_unlock_nl_policy[NFSD_A_UNLOCK_PATH + 1] = {
> +	[NFSD_A_UNLOCK_TYPE] = NLA_POLICY_MAX(NLA_U32, NFSD_UNLOCK_TYPE_MAX),
>  	[NFSD_A_UNLOCK_ADDRESS] = NLA_POLICY_MIN_LEN(16),
> +	[NFSD_A_UNLOCK_PATH] = { .type = NLA_NUL_STRING, .len = PATH_MAX - 1, },
>  };
>  
>  /* Ops table for nfsd */
> @@ -112,7 +113,7 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
>  		.cmd		= NFSD_CMD_UNLOCK,
>  		.doit		= nfsd_nl_unlock_doit,
>  		.policy		= nfsd_unlock_nl_policy,
> -		.maxattr	= NFSD_A_UNLOCK_ADDRESS,
> +		.maxattr	= NFSD_A_UNLOCK_PATH,
>  		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
>  	},
>  };
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 858f3803c490..d3ed343699bd 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2180,7 +2180,44 @@ static int nfsd_nl_unlock_by_ip(struct genl_info *info)
>  }
>  
>  /**
> - * nfsd_nl_unlock_doit - release NLM locks by scope
> + * nfsd_nl_unlock_by_filesystem - release locks and state on a filesystem
> + * @info: netlink metadata and command arguments
> + *
> + * Return: 0 on success or a negative errno.
> + */
> +static int nfsd_nl_unlock_by_filesystem(struct genl_info *info)
> +{
> +	struct net *net = genl_info_net(info);
> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	struct path path;
> +	int error;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, NFSD_A_UNLOCK_PATH))
> +		return -EINVAL;
> +
> +	trace_nfsd_ctl_unlock_fs(net,
> +				 nla_data(info->attrs[NFSD_A_UNLOCK_PATH]));
> +	error = kern_path(nla_data(info->attrs[NFSD_A_UNLOCK_PATH]),
> +			  0, &path);
> +	if (error)
> +		return error;
> +
> +	nfsd4_cancel_copy_by_sb(net, path.dentry->d_sb);
> +	error = nlmsvc_unlock_all_by_sb(path.dentry->d_sb);
> +
> +	mutex_lock(&nfsd_mutex);
> +	if (nn->nfsd_serv)
> +		nfsd4_revoke_states(nn, path.dentry->d_sb);
> +	else
> +		error = -EINVAL;
> +	mutex_unlock(&nfsd_mutex);
> +
> +	path_put(&path);
> +	return error;
> +}
> +
> +/**
> + * nfsd_nl_unlock_doit - release locks or revoke NFS state
>   * @skb: reply buffer
>   * @info: netlink metadata and command arguments
>   *
> @@ -2198,6 +2235,8 @@ int nfsd_nl_unlock_doit(struct sk_buff *skb, struct genl_info *info)
>  	switch (type) {
>  	case NFSD_UNLOCK_TYPE_IP:
>  		return nfsd_nl_unlock_by_ip(info);
> +	case NFSD_UNLOCK_TYPE_FILESYSTEM:
> +		return nfsd_nl_unlock_by_filesystem(info);
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
> index 8edd75590f31..340ad36080fe 100644
> --- a/include/uapi/linux/nfsd_netlink.h
> +++ b/include/uapi/linux/nfsd_netlink.h
> @@ -12,6 +12,7 @@
>  
>  enum nfsd_unlock_type {
>  	NFSD_UNLOCK_TYPE_IP,
> +	NFSD_UNLOCK_TYPE_FILESYSTEM,
>  
>  	/* private: */
>  	__NFSD_UNLOCK_TYPE_MAX,
> @@ -91,6 +92,7 @@ enum {
>  enum {
>  	NFSD_A_UNLOCK_TYPE = 1,
>  	NFSD_A_UNLOCK_ADDRESS,
> +	NFSD_A_UNLOCK_PATH,
>  
>  	__NFSD_A_UNLOCK_MAX,
>  	NFSD_A_UNLOCK_MAX = (__NFSD_A_UNLOCK_MAX - 1)

Yeah, following up on my last mail. I think it would be cleaner to just
implement a new UNLOCK_FILESYSTEM command instead of overloading the
one that unlocks by IP.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid()
  2026-03-18 14:15 ` [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid() Chuck Lever
@ 2026-03-18 14:30   ` Jeff Layton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:30 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Export-scoped state revocation, added in a subsequent commit,
> needs to advance past non-matching stids within the same
> client without restarting from position zero on each call.
> 
> Extract find_next_sb_stid(), which accepts a starting IDR
> position and can resume iteration across multiple calls.
> find_one_sb_stid() becomes a thin wrapper that starts from
> position zero.
> 
> No change in behavior.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d476192e4b27..891669b32804 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1738,25 +1738,40 @@ static void release_openowner(struct nfs4_openowner *oo)
>  	nfs4_put_stateowner(&oo->oo_owner);
>  }
>  
> -static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
> -					  struct super_block *sb,
> -					  unsigned int sc_types)
> +/*
> + * On return of a non-NULL stid, @id holds that entry's IDR key;
> + * caller must increment @id before the next call to advance past it.
> + */
> +static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
> +					   struct super_block *sb,
> +					   unsigned int sc_types,
> +					   unsigned long *id)
>  {
> -	unsigned long id, tmp;
>  	struct nfs4_stid *stid;
>  
>  	spin_lock(&clp->cl_lock);
> -	idr_for_each_entry_ul(&clp->cl_stateids, stid, tmp, id)
> +	while ((stid = idr_get_next_ul(&clp->cl_stateids, id)) != NULL) {
>  		if ((stid->sc_type & sc_types) &&
>  		    stid->sc_status == 0 &&
>  		    stid->sc_file->fi_inode->i_sb == sb) {
>  			refcount_inc(&stid->sc_count);
>  			break;
>  		}
> +		(*id)++;
> +	}
>  	spin_unlock(&clp->cl_lock);
>  	return stid;
>  }
>  
> +static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
> +					  struct super_block *sb,
> +					  unsigned int sc_types)
> +{
> +	unsigned long id = 0;
> +
> +	return find_next_sb_stid(clp, sb, sc_types, &id);
> +}
> +
>  static void revoke_ol_stid(struct nfs4_client *clp,
>  			   struct nfs4_ol_stateid *stp)
>  {

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope
  2026-03-18 14:28   ` Jeff Layton
@ 2026-03-18 14:32     ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:32 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On 3/18/26 10:28 AM, Jeff Layton wrote:
> On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>

>> @@ -227,3 +249,13 @@ operations:
>>            attributes:
>>              - mode
>>              - npools
>> +    -
>> +      name: unlock
>> +      doc: release NLM locks by scope
>> +      attribute-set: unlock
>> +      flags: [admin-perm]
>> +      do:
>> +        request:
>> +          attributes:
>> +            - type
>> +            - address
> 
> I wonder if we'd be better served with different commands instead of
> passing a type value to a single command? Different types are going to
> require different attributes, and it'll be easier to validate those if
> they use different commands.

I was following your philosophy about the THREADS command. But it's
correct that the YAML/netlink infrastructure struggles a bit when the
arguments for each of the subcommands are so different from each other.
I don't think it would be difficult to break UNLOCK into three separate
unlock netlink commands.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 5/6] NFSD: Add export-scoped state revocation
  2026-03-18 14:15 ` [PATCH v4 5/6] NFSD: Add export-scoped state revocation Chuck Lever
@ 2026-03-18 14:47   ` Jeff Layton
  2026-03-18 14:51     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:47 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> nfsd4_revoke_states() revokes all NFSv4 state on an entire
> superblock, which is too coarse when multiple exports share a
> filesystem. Add nfsd4_revoke_export_states() to revoke only
> state associated with files under a specific export root, then
> convert nfsd4_revoke_states() to a thin wrapper that passes
> sb->s_root.
> 
> nfsd4_revoke_export_states() uses find_next_sb_stid() to locate
> candidate stids, then verifies each against the export root via
> nfsd_file_inode_is_in_subtree(). That helper is placed in the
> file cache layer (filecache.c) because it operates on VFS types
> with no NFSv4 state dependency. It walks all of an inode's
> dentry aliases rather than calling d_find_any_alias(), because
> for hard-linked files an arbitrary alias may fall outside the
> export subtree even when another alias is inside it.
> 
> When the export root is the filesystem root, the subtree check
> is elided and every stid matching the superblock is revoked
> directly.
> 
> The NFSD_UNLOCK_TYPE_FILESYSTEM handler now calls
> nfsd4_revoke_export_states() with the resolved path dentry,
> enabling subtree-scoped revocation through the netlink
> interface.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 32 +++++++++++++++++++
>  fs/nfsd/filecache.h |  1 +
>  fs/nfsd/nfs4state.c | 92 +++++++++++++++++++++++++++++++++++++----------------
>  fs/nfsd/nfsctl.c    |  3 +-
>  fs/nfsd/state.h     |  7 ++++
>  5 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 1e2b38ed1d35..cd09be0c5465 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -894,6 +894,38 @@ __nfsd_file_cache_purge(struct net *net)
>  	nfsd_file_dispose_list(&dispose);
>  }
>  
> +/**
> + * nfsd_file_inode_is_in_subtree - check whether an inode is under a subtree
> + * @inode:        inode to test
> + * @root_dentry:  dentry of the subtree root
> + *
> + * Check whether @inode has any dentry alias that falls within the
> + * subtree rooted at @root_dentry.  Hard-linked files can have aliases
> + * in multiple directories, so all aliases must be tested.
> + *
> + * Return: %true if any dentry alias of @inode is at or below
> + * @root_dentry, %false otherwise.
> + */
> +bool nfsd_file_inode_is_in_subtree(struct inode *inode,
> +				   struct dentry *root_dentry)
> +{
> +	struct dentry *alias;
> +	bool found = false;
> +
> +	/* i_lock stabilizes the alias list; is_subdir() nests
> +	 * rename_lock (a seqlock) beneath it but does not sleep.
> +	 */
> +	spin_lock(&inode->i_lock);
> +	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
> +		if (is_subdir(alias, root_dentry)) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	spin_unlock(&inode->i_lock);
> +	return found;
> +}
> +
>  static struct nfsd_fcache_disposal *
>  nfsd_alloc_fcache_disposal(void)
>  {
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index b383dbc5b921..36c9a8e388d2 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>  struct file *nfsd_file_file(struct nfsd_file *nf);
>  void nfsd_file_close_inode_sync(struct inode *inode);
> +bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
>  void nfsd_file_net_dispose(struct nfsd_net *nn);
>  bool nfsd_file_is_cached(struct inode *inode);
>  __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 891669b32804..581f38395c42 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1763,15 +1763,6 @@ static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
>  	return stid;
>  }
>  
> -static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
> -					  struct super_block *sb,
> -					  unsigned int sc_types)
> -{
> -	unsigned long id = 0;
> -
> -	return find_next_sb_stid(clp, sb, sc_types, &id);
> -}
> -
>  static void revoke_ol_stid(struct nfs4_client *clp,
>  			   struct nfs4_ol_stateid *stp)
>  {
> @@ -1835,20 +1826,19 @@ static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
>  }
>  
>  /**
> - * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
> - * @nn:   used to identify instance of nfsd (there is one per net namespace)
> - * @sb:   super_block used to identify target filesystem
> + * nfsd4_revoke_export_states - revoke states associated with a given export
> + * @nn:           nfsd_net identifying the nfsd instance (one per net namespace)
> + * @sb:           super_block of the export's filesystem
> + * @root_dentry:  dentry of the export root directory
>   *
>   * All nfs4 states (open, lock, delegation, layout) held by the server instance
> - * and associated with a file on the given filesystem will be revoked resulting
> - * in any files being closed and so all references from nfsd to the filesystem
> - * being released.  Thus nfsd will no longer prevent the filesystem from being
> - * unmounted.
> - *
> - * The clients which own the states will subsequently being notified that the
> - * states have been "admin-revoked".
> + * and associated with files under the given export will be revoked.  When
> + * @root_dentry is the filesystem root, all state on @sb is revoked (equivalent
> + * to nfsd4_revoke_states).  When @root_dentry is a subdirectory, only state on
> + * files within that subtree is revoked.
>   */
> -void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
> +void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
> +				struct dentry *root_dentry)
>  {
>  	unsigned int idhashval;
>  	unsigned int sc_types;
> @@ -1861,18 +1851,53 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>  		struct nfs4_client *clp;
>  	retry:
>  		list_for_each_entry(clp, head, cl_idhash) {
> -			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
> -								  sc_types);
> -			if (stid) {
> -				spin_unlock(&nn->client_lock);
> +			struct nfs4_stid *stid;
> +			/* Resets to zero on each retry; revocation may
> +			 * alter the IDR, so a stale cursor is unsafe.
> +			 */
> +			unsigned long id = 0;
> +
> +			while ((stid = find_next_sb_stid(clp, sb,
> +						sc_types, &id)) != NULL) {
> +				if (root_dentry != sb->s_root) {
> +					bool match;
> +
> +					/* Bare inc to pin clp; get_client_locked() is
> +					 * not used because its courtesy-to-active
> +					 * transition is unwanted during revocation.
> +					 */
> +					atomic_inc(&clp->cl_rpc_users);
> +					spin_unlock(&nn->client_lock);
> +					match = nfsd_file_inode_is_in_subtree(
> +							stid->sc_file->fi_inode,
> +							root_dentry);

Ouch the hardlinked thing makes this hard to reason about.

Ok, so suppose we have two exports on the same superblock.

/export/foo
/export/bar

One is exported to one client foo and one to another to client bar.
There is a file hardlinked across those directories:

$ touch /export/foo/baz
$ ln /export/bar/baz /export/foo/baz

Now, client foo opens /export/foo/baz, and client bar opens
/export/bar/baz.

/export/bar is unexported and state under it revoked. Won't client
foo's state end up being revoked too in that case?

Note that the different hardlinks should end up with different
filehandles since they are exposed to the clients via different
exports.

I wonder... do we need keep track of the export under which a stateid
was acquired so we can properly revoke the right ones in this
situation?

> +					if (!match) {
> +						nfs4_put_stid(stid);
> +						spin_lock(&nn->client_lock);
> +						put_client_renew_locked(clp);
> +						id++;
> +						continue;
> +					}
> +				} else {
> +					/* Whole-sb: goto retry restarts the
> +					 * client list immediately after
> +					 * revocation, so clp needs no pin.
> +					 */
> +					spin_unlock(&nn->client_lock);
> +				}
> +
>  				revoke_one_stid(clp, stid);
>  				nfs4_put_stid(stid);
>  				spin_lock(&nn->client_lock);
> +				if (root_dentry != sb->s_root)
> +					put_client_renew_locked(clp);
>  				if (clp->cl_minorversion == 0)
>  					/* Allow cleanup after a lease period.
> -					 * store_release ensures cleanup will
> -					 * see any newly revoked states if it
> -					 * sees the time updated.
> +					 * The lock/unlock pair orders this
> +					 * store after revoke_one_stid(), so
> +					 * nfs40_clean_admin_revoked() sees
> +					 * newly revoked states if it sees
> +					 * the updated time.
>  					 */
>  					nn->nfs40_last_revoke =
>  						ktime_get_boottime_seconds();
> @@ -1883,6 +1908,19 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>  	spin_unlock(&nn->client_lock);
>  }
>  
> +/**
> + * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
> + * @nn:   nfsd_net identifying the nfsd instance
> + * @sb:   super_block used to identify target filesystem
> + *
> + * Convenience wrapper around nfsd4_revoke_export_states() that revokes
> + * all state on @sb by passing sb->s_root as the export root.
> + */
> +void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
> +{
> +	nfsd4_revoke_export_states(nn, sb, sb->s_root);
> +}
> +
>  static inline int
>  hash_sessionid(struct nfs4_sessionid *sessionid)
>  {
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index d3ed343699bd..d9c61c939059 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2207,7 +2207,8 @@ static int nfsd_nl_unlock_by_filesystem(struct genl_info *info)
>  
>  	mutex_lock(&nfsd_mutex);
>  	if (nn->nfsd_serv)
> -		nfsd4_revoke_states(nn, path.dentry->d_sb);
> +		nfsd4_revoke_export_states(nn, path.dentry->d_sb,
> +					   path.dentry);
>  	else
>  		error = -EINVAL;
>  	mutex_unlock(&nfsd_mutex);
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 6fcbf1e427d4..9e7c7884831c 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -843,11 +843,18 @@ struct nfsd_file *find_any_file(struct nfs4_file *f);
>  
>  #ifdef CONFIG_NFSD_V4
>  void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb);
> +void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
> +				struct dentry *root_dentry);
>  void nfsd4_cancel_copy_by_sb(struct net *net, struct super_block *sb);
>  #else
>  static inline void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>  {
>  }
> +static inline void nfsd4_revoke_export_states(struct nfsd_net *nn,
> +					      struct super_block *sb,
> +					      struct dentry *root_dentry)
> +{
> +}
>  static inline void nfsd4_cancel_copy_by_sb(struct net *net, struct super_block *sb)
>  {
>  }

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 5/6] NFSD: Add export-scoped state revocation
  2026-03-18 14:47   ` Jeff Layton
@ 2026-03-18 14:51     ` Chuck Lever
  2026-03-18 14:58       ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 14:51 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On 3/18/26 10:47 AM, Jeff Layton wrote:
> On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> nfsd4_revoke_states() revokes all NFSv4 state on an entire
>> superblock, which is too coarse when multiple exports share a
>> filesystem. Add nfsd4_revoke_export_states() to revoke only
>> state associated with files under a specific export root, then
>> convert nfsd4_revoke_states() to a thin wrapper that passes
>> sb->s_root.
>>
>> nfsd4_revoke_export_states() uses find_next_sb_stid() to locate
>> candidate stids, then verifies each against the export root via
>> nfsd_file_inode_is_in_subtree(). That helper is placed in the
>> file cache layer (filecache.c) because it operates on VFS types
>> with no NFSv4 state dependency. It walks all of an inode's
>> dentry aliases rather than calling d_find_any_alias(), because
>> for hard-linked files an arbitrary alias may fall outside the
>> export subtree even when another alias is inside it.
>>
>> When the export root is the filesystem root, the subtree check
>> is elided and every stid matching the superblock is revoked
>> directly.
>>
>> The NFSD_UNLOCK_TYPE_FILESYSTEM handler now calls
>> nfsd4_revoke_export_states() with the resolved path dentry,
>> enabling subtree-scoped revocation through the netlink
>> interface.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/filecache.c | 32 +++++++++++++++++++
>>  fs/nfsd/filecache.h |  1 +
>>  fs/nfsd/nfs4state.c | 92 +++++++++++++++++++++++++++++++++++++----------------
>>  fs/nfsd/nfsctl.c    |  3 +-
>>  fs/nfsd/state.h     |  7 ++++
>>  5 files changed, 107 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 1e2b38ed1d35..cd09be0c5465 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -894,6 +894,38 @@ __nfsd_file_cache_purge(struct net *net)
>>  	nfsd_file_dispose_list(&dispose);
>>  }
>>  
>> +/**
>> + * nfsd_file_inode_is_in_subtree - check whether an inode is under a subtree
>> + * @inode:        inode to test
>> + * @root_dentry:  dentry of the subtree root
>> + *
>> + * Check whether @inode has any dentry alias that falls within the
>> + * subtree rooted at @root_dentry.  Hard-linked files can have aliases
>> + * in multiple directories, so all aliases must be tested.
>> + *
>> + * Return: %true if any dentry alias of @inode is at or below
>> + * @root_dentry, %false otherwise.
>> + */
>> +bool nfsd_file_inode_is_in_subtree(struct inode *inode,
>> +				   struct dentry *root_dentry)
>> +{
>> +	struct dentry *alias;
>> +	bool found = false;
>> +
>> +	/* i_lock stabilizes the alias list; is_subdir() nests
>> +	 * rename_lock (a seqlock) beneath it but does not sleep.
>> +	 */
>> +	spin_lock(&inode->i_lock);
>> +	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
>> +		if (is_subdir(alias, root_dentry)) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +	spin_unlock(&inode->i_lock);
>> +	return found;
>> +}
>> +
>>  static struct nfsd_fcache_disposal *
>>  nfsd_alloc_fcache_disposal(void)
>>  {
>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>> index b383dbc5b921..36c9a8e388d2 100644
>> --- a/fs/nfsd/filecache.h
>> +++ b/fs/nfsd/filecache.h
>> @@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
>>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>>  struct file *nfsd_file_file(struct nfsd_file *nf);
>>  void nfsd_file_close_inode_sync(struct inode *inode);
>> +bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
>>  void nfsd_file_net_dispose(struct nfsd_net *nn);
>>  bool nfsd_file_is_cached(struct inode *inode);
>>  __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 891669b32804..581f38395c42 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1763,15 +1763,6 @@ static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
>>  	return stid;
>>  }
>>  
>> -static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
>> -					  struct super_block *sb,
>> -					  unsigned int sc_types)
>> -{
>> -	unsigned long id = 0;
>> -
>> -	return find_next_sb_stid(clp, sb, sc_types, &id);
>> -}
>> -
>>  static void revoke_ol_stid(struct nfs4_client *clp,
>>  			   struct nfs4_ol_stateid *stp)
>>  {
>> @@ -1835,20 +1826,19 @@ static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
>>  }
>>  
>>  /**
>> - * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
>> - * @nn:   used to identify instance of nfsd (there is one per net namespace)
>> - * @sb:   super_block used to identify target filesystem
>> + * nfsd4_revoke_export_states - revoke states associated with a given export
>> + * @nn:           nfsd_net identifying the nfsd instance (one per net namespace)
>> + * @sb:           super_block of the export's filesystem
>> + * @root_dentry:  dentry of the export root directory
>>   *
>>   * All nfs4 states (open, lock, delegation, layout) held by the server instance
>> - * and associated with a file on the given filesystem will be revoked resulting
>> - * in any files being closed and so all references from nfsd to the filesystem
>> - * being released.  Thus nfsd will no longer prevent the filesystem from being
>> - * unmounted.
>> - *
>> - * The clients which own the states will subsequently being notified that the
>> - * states have been "admin-revoked".
>> + * and associated with files under the given export will be revoked.  When
>> + * @root_dentry is the filesystem root, all state on @sb is revoked (equivalent
>> + * to nfsd4_revoke_states).  When @root_dentry is a subdirectory, only state on
>> + * files within that subtree is revoked.
>>   */
>> -void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>> +void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
>> +				struct dentry *root_dentry)
>>  {
>>  	unsigned int idhashval;
>>  	unsigned int sc_types;
>> @@ -1861,18 +1851,53 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>>  		struct nfs4_client *clp;
>>  	retry:
>>  		list_for_each_entry(clp, head, cl_idhash) {
>> -			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
>> -								  sc_types);
>> -			if (stid) {
>> -				spin_unlock(&nn->client_lock);
>> +			struct nfs4_stid *stid;
>> +			/* Resets to zero on each retry; revocation may
>> +			 * alter the IDR, so a stale cursor is unsafe.
>> +			 */
>> +			unsigned long id = 0;
>> +
>> +			while ((stid = find_next_sb_stid(clp, sb,
>> +						sc_types, &id)) != NULL) {
>> +				if (root_dentry != sb->s_root) {
>> +					bool match;
>> +
>> +					/* Bare inc to pin clp; get_client_locked() is
>> +					 * not used because its courtesy-to-active
>> +					 * transition is unwanted during revocation.
>> +					 */
>> +					atomic_inc(&clp->cl_rpc_users);
>> +					spin_unlock(&nn->client_lock);
>> +					match = nfsd_file_inode_is_in_subtree(
>> +							stid->sc_file->fi_inode,
>> +							root_dentry);
> 
> Ouch the hardlinked thing makes this hard to reason about.
> 
> Ok, so suppose we have two exports on the same superblock.
> 
> /export/foo
> /export/bar
> 
> One is exported to one client foo and one to another to client bar.
> There is a file hardlinked across those directories:
> 
> $ touch /export/foo/baz
> $ ln /export/bar/baz /export/foo/baz
> 
> Now, client foo opens /export/foo/baz, and client bar opens
> /export/bar/baz.
> 
> /export/bar is unexported and state under it revoked. Won't client
> foo's state end up being revoked too in that case?
> 
> Note that the different hardlinks should end up with different
> filehandles since they are exposed to the clients via different
> exports.
> 
> I wonder... do we need keep track of the export under which a stateid
> was acquired so we can properly revoke the right ones in this
> situation?

If I understand your comment correctly, I believe that is what the
earlier patches in this series do -- tie the state IDs to a particular
export.



-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 5/6] NFSD: Add export-scoped state revocation
  2026-03-18 14:51     ` Chuck Lever
@ 2026-03-18 14:58       ` Jeff Layton
  2026-03-18 17:57         ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2026-03-18 14:58 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On Wed, 2026-03-18 at 10:51 -0400, Chuck Lever wrote:
> On 3/18/26 10:47 AM, Jeff Layton wrote:
> > On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > nfsd4_revoke_states() revokes all NFSv4 state on an entire
> > > superblock, which is too coarse when multiple exports share a
> > > filesystem. Add nfsd4_revoke_export_states() to revoke only
> > > state associated with files under a specific export root, then
> > > convert nfsd4_revoke_states() to a thin wrapper that passes
> > > sb->s_root.
> > > 
> > > nfsd4_revoke_export_states() uses find_next_sb_stid() to locate
> > > candidate stids, then verifies each against the export root via
> > > nfsd_file_inode_is_in_subtree(). That helper is placed in the
> > > file cache layer (filecache.c) because it operates on VFS types
> > > with no NFSv4 state dependency. It walks all of an inode's
> > > dentry aliases rather than calling d_find_any_alias(), because
> > > for hard-linked files an arbitrary alias may fall outside the
> > > export subtree even when another alias is inside it.
> > > 
> > > When the export root is the filesystem root, the subtree check
> > > is elided and every stid matching the superblock is revoked
> > > directly.
> > > 
> > > The NFSD_UNLOCK_TYPE_FILESYSTEM handler now calls
> > > nfsd4_revoke_export_states() with the resolved path dentry,
> > > enabling subtree-scoped revocation through the netlink
> > > interface.
> > > 
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > >  fs/nfsd/filecache.c | 32 +++++++++++++++++++
> > >  fs/nfsd/filecache.h |  1 +
> > >  fs/nfsd/nfs4state.c | 92 +++++++++++++++++++++++++++++++++++++----------------
> > >  fs/nfsd/nfsctl.c    |  3 +-
> > >  fs/nfsd/state.h     |  7 ++++
> > >  5 files changed, 107 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 1e2b38ed1d35..cd09be0c5465 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -894,6 +894,38 @@ __nfsd_file_cache_purge(struct net *net)
> > >  	nfsd_file_dispose_list(&dispose);
> > >  }
> > >  
> > > +/**
> > > + * nfsd_file_inode_is_in_subtree - check whether an inode is under a subtree
> > > + * @inode:        inode to test
> > > + * @root_dentry:  dentry of the subtree root
> > > + *
> > > + * Check whether @inode has any dentry alias that falls within the
> > > + * subtree rooted at @root_dentry.  Hard-linked files can have aliases
> > > + * in multiple directories, so all aliases must be tested.
> > > + *
> > > + * Return: %true if any dentry alias of @inode is at or below
> > > + * @root_dentry, %false otherwise.
> > > + */
> > > +bool nfsd_file_inode_is_in_subtree(struct inode *inode,
> > > +				   struct dentry *root_dentry)
> > > +{
> > > +	struct dentry *alias;
> > > +	bool found = false;
> > > +
> > > +	/* i_lock stabilizes the alias list; is_subdir() nests
> > > +	 * rename_lock (a seqlock) beneath it but does not sleep.
> > > +	 */
> > > +	spin_lock(&inode->i_lock);
> > > +	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
> > > +		if (is_subdir(alias, root_dentry)) {
> > > +			found = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +	spin_unlock(&inode->i_lock);
> > > +	return found;
> > > +}
> > > +
> > >  static struct nfsd_fcache_disposal *
> > >  nfsd_alloc_fcache_disposal(void)
> > >  {
> > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > index b383dbc5b921..36c9a8e388d2 100644
> > > --- a/fs/nfsd/filecache.h
> > > +++ b/fs/nfsd/filecache.h
> > > @@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > >  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > >  struct file *nfsd_file_file(struct nfsd_file *nf);
> > >  void nfsd_file_close_inode_sync(struct inode *inode);
> > > +bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
> > >  void nfsd_file_net_dispose(struct nfsd_net *nn);
> > >  bool nfsd_file_is_cached(struct inode *inode);
> > >  __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 891669b32804..581f38395c42 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1763,15 +1763,6 @@ static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
> > >  	return stid;
> > >  }
> > >  
> > > -static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
> > > -					  struct super_block *sb,
> > > -					  unsigned int sc_types)
> > > -{
> > > -	unsigned long id = 0;
> > > -
> > > -	return find_next_sb_stid(clp, sb, sc_types, &id);
> > > -}
> > > -
> > >  static void revoke_ol_stid(struct nfs4_client *clp,
> > >  			   struct nfs4_ol_stateid *stp)
> > >  {
> > > @@ -1835,20 +1826,19 @@ static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
> > >  }
> > >  
> > >  /**
> > > - * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
> > > - * @nn:   used to identify instance of nfsd (there is one per net namespace)
> > > - * @sb:   super_block used to identify target filesystem
> > > + * nfsd4_revoke_export_states - revoke states associated with a given export
> > > + * @nn:           nfsd_net identifying the nfsd instance (one per net namespace)
> > > + * @sb:           super_block of the export's filesystem
> > > + * @root_dentry:  dentry of the export root directory
> > >   *
> > >   * All nfs4 states (open, lock, delegation, layout) held by the server instance
> > > - * and associated with a file on the given filesystem will be revoked resulting
> > > - * in any files being closed and so all references from nfsd to the filesystem
> > > - * being released.  Thus nfsd will no longer prevent the filesystem from being
> > > - * unmounted.
> > > - *
> > > - * The clients which own the states will subsequently being notified that the
> > > - * states have been "admin-revoked".
> > > + * and associated with files under the given export will be revoked.  When
> > > + * @root_dentry is the filesystem root, all state on @sb is revoked (equivalent
> > > + * to nfsd4_revoke_states).  When @root_dentry is a subdirectory, only state on
> > > + * files within that subtree is revoked.
> > >   */
> > > -void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
> > > +void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
> > > +				struct dentry *root_dentry)
> > >  {
> > >  	unsigned int idhashval;
> > >  	unsigned int sc_types;
> > > @@ -1861,18 +1851,53 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
> > >  		struct nfs4_client *clp;
> > >  	retry:
> > >  		list_for_each_entry(clp, head, cl_idhash) {
> > > -			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
> > > -								  sc_types);
> > > -			if (stid) {
> > > -				spin_unlock(&nn->client_lock);
> > > +			struct nfs4_stid *stid;
> > > +			/* Resets to zero on each retry; revocation may
> > > +			 * alter the IDR, so a stale cursor is unsafe.
> > > +			 */
> > > +			unsigned long id = 0;
> > > +
> > > +			while ((stid = find_next_sb_stid(clp, sb,
> > > +						sc_types, &id)) != NULL) {
> > > +				if (root_dentry != sb->s_root) {
> > > +					bool match;
> > > +
> > > +					/* Bare inc to pin clp; get_client_locked() is
> > > +					 * not used because its courtesy-to-active
> > > +					 * transition is unwanted during revocation.
> > > +					 */
> > > +					atomic_inc(&clp->cl_rpc_users);
> > > +					spin_unlock(&nn->client_lock);
> > > +					match = nfsd_file_inode_is_in_subtree(
> > > +							stid->sc_file->fi_inode,
> > > +							root_dentry);
> > 
> > Ouch the hardlinked thing makes this hard to reason about.
> > 
> > Ok, so suppose we have two exports on the same superblock.
> > 
> > /export/foo
> > /export/bar
> > 
> > One is exported to one client foo and one to another to client bar.
> > There is a file hardlinked across those directories:
> > 
> > $ touch /export/foo/baz
> > $ ln /export/bar/baz /export/foo/baz
> > 
> > Now, client foo opens /export/foo/baz, and client bar opens
> > /export/bar/baz.
> > 
> > /export/bar is unexported and state under it revoked. Won't client
> > foo's state end up being revoked too in that case?
> > 
> > Note that the different hardlinks should end up with different
> > filehandles since they are exposed to the clients via different
> > exports.
> > 
> > I wonder... do we need keep track of the export under which a stateid
> > was acquired so we can properly revoke the right ones in this
> > situation?
> 
> If I understand your comment correctly, I believe that is what the
> earlier patches in this series do -- tie the state IDs to a particular
> export.
> 
> 

I don't think so, not AFAICT. 

+ * Check whether @inode has any dentry alias that falls within the
+ * subtree rooted at @root_dentry.  Hard-linked files can have aliases
+ * in multiple directories, so all aliases must be tested.

In this case, you're calling nfsd_file_inode_is_in_subtree(). In the
above example, that's going to return true for foo's state, even though
bar's export is the one being revoked.

I think to do this properly, you'd have to track an st_export field in
struct nfs4_ol_stateid (or maybe in struct nfs4_stid).

OTOH, hardlinks are rare so maybe this isn't worth fretting over.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 5/6] NFSD: Add export-scoped state revocation
  2026-03-18 14:58       ` Jeff Layton
@ 2026-03-18 17:57         ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2026-03-18 17:57 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-fsdevel, Chuck Lever

On 3/18/26 10:58 AM, Jeff Layton wrote:
> On Wed, 2026-03-18 at 10:51 -0400, Chuck Lever wrote:
>> On 3/18/26 10:47 AM, Jeff Layton wrote:
>>> On Wed, 2026-03-18 at 10:15 -0400, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> nfsd4_revoke_states() revokes all NFSv4 state on an entire
>>>> superblock, which is too coarse when multiple exports share a
>>>> filesystem. Add nfsd4_revoke_export_states() to revoke only
>>>> state associated with files under a specific export root, then
>>>> convert nfsd4_revoke_states() to a thin wrapper that passes
>>>> sb->s_root.
>>>>
>>>> nfsd4_revoke_export_states() uses find_next_sb_stid() to locate
>>>> candidate stids, then verifies each against the export root via
>>>> nfsd_file_inode_is_in_subtree(). That helper is placed in the
>>>> file cache layer (filecache.c) because it operates on VFS types
>>>> with no NFSv4 state dependency. It walks all of an inode's
>>>> dentry aliases rather than calling d_find_any_alias(), because
>>>> for hard-linked files an arbitrary alias may fall outside the
>>>> export subtree even when another alias is inside it.
>>>>
>>>> When the export root is the filesystem root, the subtree check
>>>> is elided and every stid matching the superblock is revoked
>>>> directly.
>>>>
>>>> The NFSD_UNLOCK_TYPE_FILESYSTEM handler now calls
>>>> nfsd4_revoke_export_states() with the resolved path dentry,
>>>> enabling subtree-scoped revocation through the netlink
>>>> interface.
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>  fs/nfsd/filecache.c | 32 +++++++++++++++++++
>>>>  fs/nfsd/filecache.h |  1 +
>>>>  fs/nfsd/nfs4state.c | 92 +++++++++++++++++++++++++++++++++++++----------------
>>>>  fs/nfsd/nfsctl.c    |  3 +-
>>>>  fs/nfsd/state.h     |  7 ++++
>>>>  5 files changed, 107 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>> index 1e2b38ed1d35..cd09be0c5465 100644
>>>> --- a/fs/nfsd/filecache.c
>>>> +++ b/fs/nfsd/filecache.c
>>>> @@ -894,6 +894,38 @@ __nfsd_file_cache_purge(struct net *net)
>>>>  	nfsd_file_dispose_list(&dispose);
>>>>  }
>>>>  
>>>> +/**
>>>> + * nfsd_file_inode_is_in_subtree - check whether an inode is under a subtree
>>>> + * @inode:        inode to test
>>>> + * @root_dentry:  dentry of the subtree root
>>>> + *
>>>> + * Check whether @inode has any dentry alias that falls within the
>>>> + * subtree rooted at @root_dentry.  Hard-linked files can have aliases
>>>> + * in multiple directories, so all aliases must be tested.
>>>> + *
>>>> + * Return: %true if any dentry alias of @inode is at or below
>>>> + * @root_dentry, %false otherwise.
>>>> + */
>>>> +bool nfsd_file_inode_is_in_subtree(struct inode *inode,
>>>> +				   struct dentry *root_dentry)
>>>> +{
>>>> +	struct dentry *alias;
>>>> +	bool found = false;
>>>> +
>>>> +	/* i_lock stabilizes the alias list; is_subdir() nests
>>>> +	 * rename_lock (a seqlock) beneath it but does not sleep.
>>>> +	 */
>>>> +	spin_lock(&inode->i_lock);
>>>> +	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
>>>> +		if (is_subdir(alias, root_dentry)) {
>>>> +			found = true;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +	spin_unlock(&inode->i_lock);
>>>> +	return found;
>>>> +}
>>>> +
>>>>  static struct nfsd_fcache_disposal *
>>>>  nfsd_alloc_fcache_disposal(void)
>>>>  {
>>>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>>>> index b383dbc5b921..36c9a8e388d2 100644
>>>> --- a/fs/nfsd/filecache.h
>>>> +++ b/fs/nfsd/filecache.h
>>>> @@ -70,6 +70,7 @@ struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
>>>>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>>>>  struct file *nfsd_file_file(struct nfsd_file *nf);
>>>>  void nfsd_file_close_inode_sync(struct inode *inode);
>>>> +bool nfsd_file_inode_is_in_subtree(struct inode *inode, struct dentry *root_dentry);
>>>>  void nfsd_file_net_dispose(struct nfsd_net *nn);
>>>>  bool nfsd_file_is_cached(struct inode *inode);
>>>>  __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index 891669b32804..581f38395c42 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -1763,15 +1763,6 @@ static struct nfs4_stid *find_next_sb_stid(struct nfs4_client *clp,
>>>>  	return stid;
>>>>  }
>>>>  
>>>> -static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
>>>> -					  struct super_block *sb,
>>>> -					  unsigned int sc_types)
>>>> -{
>>>> -	unsigned long id = 0;
>>>> -
>>>> -	return find_next_sb_stid(clp, sb, sc_types, &id);
>>>> -}
>>>> -
>>>>  static void revoke_ol_stid(struct nfs4_client *clp,
>>>>  			   struct nfs4_ol_stateid *stp)
>>>>  {
>>>> @@ -1835,20 +1826,19 @@ static void revoke_one_stid(struct nfs4_client *clp, struct nfs4_stid *stid)
>>>>  }
>>>>  
>>>>  /**
>>>> - * nfsd4_revoke_states - revoke all nfsv4 states associated with given filesystem
>>>> - * @nn:   used to identify instance of nfsd (there is one per net namespace)
>>>> - * @sb:   super_block used to identify target filesystem
>>>> + * nfsd4_revoke_export_states - revoke states associated with a given export
>>>> + * @nn:           nfsd_net identifying the nfsd instance (one per net namespace)
>>>> + * @sb:           super_block of the export's filesystem
>>>> + * @root_dentry:  dentry of the export root directory
>>>>   *
>>>>   * All nfs4 states (open, lock, delegation, layout) held by the server instance
>>>> - * and associated with a file on the given filesystem will be revoked resulting
>>>> - * in any files being closed and so all references from nfsd to the filesystem
>>>> - * being released.  Thus nfsd will no longer prevent the filesystem from being
>>>> - * unmounted.
>>>> - *
>>>> - * The clients which own the states will subsequently being notified that the
>>>> - * states have been "admin-revoked".
>>>> + * and associated with files under the given export will be revoked.  When
>>>> + * @root_dentry is the filesystem root, all state on @sb is revoked (equivalent
>>>> + * to nfsd4_revoke_states).  When @root_dentry is a subdirectory, only state on
>>>> + * files within that subtree is revoked.
>>>>   */
>>>> -void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>>>> +void nfsd4_revoke_export_states(struct nfsd_net *nn, struct super_block *sb,
>>>> +				struct dentry *root_dentry)
>>>>  {
>>>>  	unsigned int idhashval;
>>>>  	unsigned int sc_types;
>>>> @@ -1861,18 +1851,53 @@ void nfsd4_revoke_states(struct nfsd_net *nn, struct super_block *sb)
>>>>  		struct nfs4_client *clp;
>>>>  	retry:
>>>>  		list_for_each_entry(clp, head, cl_idhash) {
>>>> -			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
>>>> -								  sc_types);
>>>> -			if (stid) {
>>>> -				spin_unlock(&nn->client_lock);
>>>> +			struct nfs4_stid *stid;
>>>> +			/* Resets to zero on each retry; revocation may
>>>> +			 * alter the IDR, so a stale cursor is unsafe.
>>>> +			 */
>>>> +			unsigned long id = 0;
>>>> +
>>>> +			while ((stid = find_next_sb_stid(clp, sb,
>>>> +						sc_types, &id)) != NULL) {
>>>> +				if (root_dentry != sb->s_root) {
>>>> +					bool match;
>>>> +
>>>> +					/* Bare inc to pin clp; get_client_locked() is
>>>> +					 * not used because its courtesy-to-active
>>>> +					 * transition is unwanted during revocation.
>>>> +					 */
>>>> +					atomic_inc(&clp->cl_rpc_users);
>>>> +					spin_unlock(&nn->client_lock);
>>>> +					match = nfsd_file_inode_is_in_subtree(
>>>> +							stid->sc_file->fi_inode,
>>>> +							root_dentry);
>>>
>>> Ouch the hardlinked thing makes this hard to reason about.
>>>
>>> Ok, so suppose we have two exports on the same superblock.
>>>
>>> /export/foo
>>> /export/bar
>>>
>>> One is exported to one client foo and one to another to client bar.
>>> There is a file hardlinked across those directories:
>>>
>>> $ touch /export/foo/baz
>>> $ ln /export/bar/baz /export/foo/baz
>>>
>>> Now, client foo opens /export/foo/baz, and client bar opens
>>> /export/bar/baz.
>>>
>>> /export/bar is unexported and state under it revoked. Won't client
>>> foo's state end up being revoked too in that case?
>>>
>>> Note that the different hardlinks should end up with different
>>> filehandles since they are exposed to the clients via different
>>> exports.
>>>
>>> I wonder... do we need keep track of the export under which a stateid
>>> was acquired so we can properly revoke the right ones in this
>>> situation?
>>
>> If I understand your comment correctly, I believe that is what the
>> earlier patches in this series do -- tie the state IDs to a particular
>> export.
>>
>>
> 
> I don't think so, not AFAICT. 
> 
> + * Check whether @inode has any dentry alias that falls within the
> + * subtree rooted at @root_dentry.  Hard-linked files can have aliases
> + * in multiple directories, so all aliases must be tested.
> 
> In this case, you're calling nfsd_file_inode_is_in_subtree(). In the
> above example, that's going to return true for foo's state, even though
> bar's export is the one being revoked.
> 
> I think to do this properly, you'd have to track an st_export field in
> struct nfs4_ol_stateid (or maybe in struct nfs4_stid).
> 
> OTOH, hardlinks are rare so maybe this isn't worth fretting over.

Probably rare, but the point is the current approach is difficult to
reason about, thus will be hard to maintain. For v5, I'll try linking
NFSv4 state to exports.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-03-18 17:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-18 14:15 [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Chuck Lever
2026-03-18 14:15 ` [PATCH v4 1/6] NFSD: Extract revoke_one_stid() utility function Chuck Lever
2026-03-18 14:21   ` Jeff Layton
2026-03-18 14:15 ` [PATCH v4 2/6] NFSD: Add NFSD_CMD_UNLOCK netlink command with ip scope Chuck Lever
2026-03-18 14:28   ` Jeff Layton
2026-03-18 14:32     ` Chuck Lever
2026-03-18 14:15 ` [PATCH v4 3/6] NFSD: Add filesystem scope to NFSD_CMD_UNLOCK Chuck Lever
2026-03-18 14:29   ` Jeff Layton
2026-03-18 14:15 ` [PATCH v4 4/6] NFSD: Refactor find_one_sb_stid() into find_next_sb_stid() Chuck Lever
2026-03-18 14:30   ` Jeff Layton
2026-03-18 14:15 ` [PATCH v4 5/6] NFSD: Add export-scoped state revocation Chuck Lever
2026-03-18 14:47   ` Jeff Layton
2026-03-18 14:51     ` Chuck Lever
2026-03-18 14:58       ` Jeff Layton
2026-03-18 17:57         ` Chuck Lever
2026-03-18 14:15 ` [PATCH v4 6/6] NFSD: Add nfsd_file_close_export() for file cache cleanup Chuck Lever
2026-03-18 14:24 ` [PATCH v4 0/6] Automatic NFSv4 state revocation on filesystem unmount Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox