Linux NFS development
* [PATCH 0/2] nfsd: provide locking for v4_end_grace
@ 2025-07-04  7:20 NeilBrown
  2025-07-04  7:20 ` [PATCH 1/2] " NeilBrown
  2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
  0 siblings, 2 replies; 8+ messages in thread
From: NeilBrown @ 2025-07-04  7:20 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng

Writing to v4_end_grace can race with server shutdown and result in a
use-after-free (UAF).  The first patch fixes the problem in a way that is
suitable for backport.  The second patch improves it slightly (in my
opinion) but in a way that cannot be backported very far.

Note that I've used a different Closes: link to the one Chuck suggested
in a recent email.

NeilBrown


 [PATCH 1/2] nfsd: provide locking for v4_end_grace
 [PATCH 2/2] nfsd: discard client_tracking_active and instead disable


* [PATCH 1/2] nfsd: provide locking for v4_end_grace
  2025-07-04  7:20 [PATCH 0/2] nfsd: provide locking for v4_end_grace NeilBrown
@ 2025-07-04  7:20 ` NeilBrown
  2025-07-05 11:15   ` Jeff Layton
  2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
  1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2025-07-04  7:20 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng

Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particular.

We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_ops->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.

nfsd4_end_grace() is also called by the laundromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.

However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it.  For this we add
a new flag protected by a spinlock; nn->client_lock is suitable.
The flag is set only while it is safe to make client tracking calls, and
v4_end_grace only schedules work while the flag is set and with the
spinlock held.

So this patch adds an nfsd_net field "client_tracking_active" which is
set as described.  Another field, "grace_end_forced", is set when
v4_end_grace is written.  After this is set, and provided
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.

This resolves a race which can result in use-after-free.
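
The flag-under-lock pattern described above can be sketched in userspace
as follows.  This is only an illustrative model under stated assumptions,
not the kernel code: struct grace_model and the model_*() helpers are
hypothetical names, a pthread mutex stands in for nn->client_lock, and a
counter stands in for scheduling laundromat_work.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant nfsd_net fields. */
struct grace_model {
	pthread_mutex_t lock;        /* plays the role of nn->client_lock */
	bool tracking_ops_set;       /* models nn->client_tracking_ops != NULL */
	bool client_tracking_active; /* true only between start and shutdown */
	bool grace_end_forced;       /* set when v4_end_grace is written */
	int work_scheduled;          /* counts requests to schedule the work */
};

static void model_init(struct grace_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->tracking_ops_set = false;
	m->client_tracking_active = false;
	m->grace_end_forced = false;
	m->work_scheduled = 0;
}

/* Models server start: client tracking is ready, so work may be scheduled. */
static void model_start(struct grace_model *m)
{
	m->tracking_ops_set = true;
	pthread_mutex_lock(&m->lock);
	m->client_tracking_active = true;
	pthread_mutex_unlock(&m->lock);
}

/*
 * Models the first step of shutdown: clear the flag under the lock.
 * The real code would then cancel the work, wait for it, and free state.
 * tracking_ops_set is deliberately left true here to model the window
 * in which the ops pointer has not been cleared yet.
 */
static void model_shutdown(struct grace_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->client_tracking_active = false;
	pthread_mutex_unlock(&m->lock);
}

/* Models a write to v4_end_grace. */
static bool model_force_end_grace(struct grace_model *m)
{
	if (!m->tracking_ops_set)
		return false;
	pthread_mutex_lock(&m->lock);
	if (m->client_tracking_active) {
		m->grace_end_forced = true;
		m->work_scheduled++;
	}
	pthread_mutex_unlock(&m->lock);
	return true;
}
```

Because the flag is tested and the work is scheduled under the same lock
that shutdown takes to clear the flag, a write that arrives after shutdown
has cleared the flag cannot schedule the work again.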

Reported-and-tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250513074305.3362209-1-lilingfeng3@huawei.com
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfsd/netns.h     |  2 ++
 fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/nfsctl.c    |  6 +++---
 fs/nfsd/state.h     |  2 +-
 4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
 
 	struct lock_manager nfsd4_manager;
 	bool grace_ended;
+	bool grace_end_forced;
+	bool client_tracking_active;
 	time64_t boot_time;
 
 	struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d5694987f86f..124fe4f669aa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
 /* forward declarations */
 static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
 static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
 static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
 static void nfsd4_file_hash_remove(struct nfs4_file *fi);
 static void deleg_reaper(struct nfsd_net *nn);
@@ -6458,7 +6458,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	return nfs_ok;
 }
 
-void
+static void
 nfsd4_end_grace(struct nfsd_net *nn)
 {
 	/* do nothing if grace period already ended */
@@ -6491,6 +6491,36 @@ nfsd4_end_grace(struct nfsd_net *nn)
 	 */
 }
 
+/**
+ * nfsd4_force_end_grace - forcibly end the grace period
+ * @nn: nfsd_net in which the grace period must end.
+ *
+ *
+ * An NFSv4 grace period can be terminated early if it is known that
+ * no more clients could reclaim state.  Sometimes user-space can provide
+ * that information - which will potentially be provided asynchronously
+ * w.r.t. server startup or shutdown.
+ *
+ * nfsd4_force_end_grace() causes the grace period to end and takes
+ * care to ensure races with server start/stop are not problematic.
+ *
+ * Return value:  %false if the NFS server was not active and
+ *      %true if the server was, or may have been, active.
+ */
+bool
+nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+	if (!nn->client_tracking_ops)
+		return false;
+	spin_lock(&nn->client_lock);
+	if (nn->client_tracking_active) {
+		nn->grace_end_forced = true;
+		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+	}
+	spin_unlock(&nn->client_lock);
+	return true;
+}
+
 /*
  * If we've waited a lease period but there are still clients trying to
  * reclaim, wait a little longer to give them a chance to finish.
@@ -6500,6 +6530,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
 	time64_t double_grace_period_end = nn->boot_time +
 					   2 * nn->nfsd4_lease;
 
+	if (nn->grace_end_forced)
+		return false;
 	if (nn->track_reclaim_completes &&
 			atomic_read(&nn->nr_reclaim_complete) ==
 			nn->reclaim_str_hashtbl_size)
@@ -8807,6 +8839,8 @@ static int nfs4_state_create_net(struct net *net)
 	nn->unconf_name_tree = RB_ROOT;
 	nn->boot_time = ktime_get_real_seconds();
 	nn->grace_ended = false;
+	nn->grace_end_forced = false;
+	nn->client_tracking_active = false;
 	nn->nfsd4_manager.block_opens = true;
 	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
 	INIT_LIST_HEAD(&nn->client_lru);
@@ -8887,6 +8921,10 @@ nfs4_state_start_net(struct net *net)
 		return ret;
 	locks_start_grace(net, &nn->nfsd4_manager);
 	nfsd4_client_tracking_init(net);
+	/* safe for laundromat to run now */
+	spin_lock(&nn->client_lock);
+	nn->client_tracking_active = true;
+	spin_unlock(&nn->client_lock);
 	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
 		goto skip_grace;
 	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -8935,6 +8973,9 @@ nfs4_state_shutdown_net(struct net *net)
 
 	shrinker_free(nn->nfsd_client_shrinker);
 	cancel_work_sync(&nn->nfsd_shrinker_work);
+	spin_lock(&nn->client_lock);
+	nn->client_tracking_active = false;
+	spin_unlock(&nn->client_lock);
 	cancel_delayed_work_sync(&nn->laundromat_work);
 	locks_end_grace(&nn->nfsd4_manager);
 
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3f3e9f6c4250..658f3f86a59f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,10 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
 		case 'Y':
 		case 'y':
 		case '1':
-			if (!nn->nfsd_serv)
-				return -EBUSY;
 			trace_nfsd_end_grace(netns(file));
-			nfsd4_end_grace(nn);
+			if (!nfsd4_force_end_grace(nn))
+				return -EBUSY;
+
 			break;
 		default:
 			return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1995bca158b8..05eabc69de40 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -836,7 +836,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 #endif
 
 /* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
 
 /* nfs4recover operations */
 extern int nfsd4_client_tracking_init(struct net *net);
-- 
2.49.0



* [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work
  2025-07-04  7:20 [PATCH 0/2] nfsd: provide locking for v4_end_grace NeilBrown
  2025-07-04  7:20 ` [PATCH 1/2] " NeilBrown
@ 2025-07-04  7:20 ` NeilBrown
  2025-07-04 15:10   ` Chuck Lever
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: NeilBrown @ 2025-07-04  7:20 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng

We currently set client_tracking_active precisely when it is safe for
laundromat_work to be scheduled.  It is possible to enable/disable
laundromat_work, so we can do that instead of having a separate flag.

Doing this avoids overloading ->client_lock with a use that is only
tangentially related to the other uses.
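
The enable/disable behaviour relied on here can be sketched in userspace
as follows.  This is a simplified model under stated assumptions, not the
workqueue implementation: struct work_model and the model_*() helpers are
hypothetical names, and the model only captures that a request to queue a
disabled work item is ignored.

```c
#include <stdbool.h>

/* Hypothetical model of a work item that carries a disable depth. */
struct work_model {
	int disable_depth; /* > 0 means requests to queue are ignored */
	bool pending;      /* true once the work has been queued */
};

static void model_init_work(struct work_model *w)
{
	w->disable_depth = 0;
	w->pending = false;
}

/* Models disabling: drop any pending instance and refuse new requests. */
static void model_disable(struct work_model *w)
{
	w->disable_depth++;
	w->pending = false;
}

/* Models enabling: returns true once the work may be queued again. */
static bool model_enable(struct work_model *w)
{
	if (w->disable_depth > 0)
		w->disable_depth--;
	return w->disable_depth == 0;
}

/* Models a request to queue the work: a no-op while disabled. */
static bool model_queue(struct work_model *w)
{
	if (w->disable_depth > 0)
		return false;
	w->pending = true;
	return true;
}
```

In this model the work item starts out disabled, is enabled once client
tracking is initialised, and is disabled again at shutdown, so no separate
flag or lock is needed to decide whether a queue request is safe.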

Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfsd/netns.h     |  1 -
 fs/nfsd/nfs4state.c | 24 ++++++++++--------------
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index fe8338735e7c..d83c68872c4c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -67,7 +67,6 @@ struct nfsd_net {
 	struct lock_manager nfsd4_manager;
 	bool grace_ended;
 	bool grace_end_forced;
-	bool client_tracking_active;
 	time64_t boot_time;
 
 	struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 124fe4f669aa..db292ac473c6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6512,12 +6512,12 @@ nfsd4_force_end_grace(struct nfsd_net *nn)
 {
 	if (!nn->client_tracking_ops)
 		return false;
-	spin_lock(&nn->client_lock);
-	if (nn->client_tracking_active) {
-		nn->grace_end_forced = true;
-		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
-	}
-	spin_unlock(&nn->client_lock);
+	/* laundromat_work must be initialised now, though it might be disabled */
+	nn->grace_end_forced = true;
+	/* This is a no-op after nfs4_state_shutdown_net() has called
+	 * disable_delayed_work_sync()
+	 */
+	mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
 	return true;
 }
 
@@ -8840,7 +8840,6 @@ static int nfs4_state_create_net(struct net *net)
 	nn->boot_time = ktime_get_real_seconds();
 	nn->grace_ended = false;
 	nn->grace_end_forced = false;
-	nn->client_tracking_active = false;
 	nn->nfsd4_manager.block_opens = true;
 	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
 	INIT_LIST_HEAD(&nn->client_lru);
@@ -8855,6 +8854,8 @@ static int nfs4_state_create_net(struct net *net)
 	INIT_LIST_HEAD(&nn->blocked_locks_lru);
 
 	INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main);
+	/* Make sure this cannot run until client tracking is initialised */
+	disable_delayed_work(&nn->laundromat_work);
 	INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
 	get_net(net);
 
@@ -8922,9 +8923,7 @@ nfs4_state_start_net(struct net *net)
 	locks_start_grace(net, &nn->nfsd4_manager);
 	nfsd4_client_tracking_init(net);
 	/* safe for laundromat to run now */
-	spin_lock(&nn->client_lock);
-	nn->client_tracking_active = true;
-	spin_unlock(&nn->client_lock);
+	enable_delayed_work(&nn->laundromat_work);
 	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
 		goto skip_grace;
 	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -8973,10 +8972,7 @@ nfs4_state_shutdown_net(struct net *net)
 
 	shrinker_free(nn->nfsd_client_shrinker);
 	cancel_work_sync(&nn->nfsd_shrinker_work);
-	spin_lock(&nn->client_lock);
-	nn->client_tracking_active = false;
-	spin_unlock(&nn->client_lock);
-	cancel_delayed_work_sync(&nn->laundromat_work);
+	disable_delayed_work_sync(&nn->laundromat_work);
 	locks_end_grace(&nn->nfsd4_manager);
 
 	INIT_LIST_HEAD(&reaplist);
-- 
2.49.0



* Re: [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work
  2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
@ 2025-07-04 15:10   ` Chuck Lever
  2025-07-04 15:13   ` Chuck Lever
  2025-07-05 11:20   ` Jeff Layton
  2 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2025-07-04 15:10 UTC (permalink / raw)
  To: NeilBrown, Li Lingfeng
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Jeff Layton

On 7/4/25 3:20 AM, NeilBrown wrote:
> We currently set client_tracking_active precisely when it is safe for
> laundromat_work to be scheduled.  

Consider adding "Since v6.10, it is possible ..."

> It is possible to enable/disable
> laundromat_work, so we can do that instead of having a separate flag.
> 
> Doing this avoids overloading ->state_lock with a use that is only
> tangentially related to the other uses.

Please note here that this change is contained in a separate patch so
that the previous patch may be backported cleanly to LTS kernels.
Otherwise we have the situation where a patch adds a structure field
and the next patch immediately removes it, and that kind of churn
without explanation is generally to be avoided.

I would like to see a fresh "Tested-by" for the previous patch by
itself (any recent kernel) and then for 6.16 with both of these applied,
please?


> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/nfsd/netns.h     |  1 -
>  fs/nfsd/nfs4state.c | 24 ++++++++++--------------
>  2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index fe8338735e7c..d83c68872c4c 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -67,7 +67,6 @@ struct nfsd_net {
>  	struct lock_manager nfsd4_manager;
>  	bool grace_ended;
>  	bool grace_end_forced;
> -	bool client_tracking_active;
>  	time64_t boot_time;
>  
>  	struct dentry *nfsd_client_dir;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 124fe4f669aa..db292ac473c6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6512,12 +6512,12 @@ nfsd4_force_end_grace(struct nfsd_net *nn)
>  {
>  	if (!nn->client_tracking_ops)
>  		return false;
> -	spin_lock(&nn->client_lock);
> -	if (nn->client_tracking_active) {
> -		nn->grace_end_forced = true;
> -		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -	}
> -	spin_unlock(&nn->client_lock);
> +	/* laundromat_work must be initialised now, though it might be disabled */
> +	nn->grace_end_forced = true;
> +	/* This is a no-op after nfs4_state_shutdown_net() has called
> +	 * disable_delayed_work_sync()
> +	 */
> +	mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
>  	return true;
>  }
>  
> @@ -8840,7 +8840,6 @@ static int nfs4_state_create_net(struct net *net)
>  	nn->boot_time = ktime_get_real_seconds();
>  	nn->grace_ended = false;
>  	nn->grace_end_forced = false;
> -	nn->client_tracking_active = false;
>  	nn->nfsd4_manager.block_opens = true;
>  	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
>  	INIT_LIST_HEAD(&nn->client_lru);
> @@ -8855,6 +8854,8 @@ static int nfs4_state_create_net(struct net *net)
>  	INIT_LIST_HEAD(&nn->blocked_locks_lru);
>  
>  	INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main);
> +	/* Make sure his cannot run until client tracking is initialised */
> +	disable_delayed_work(&nn->laundromat_work);
>  	INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
>  	get_net(net);
>  
> @@ -8922,9 +8923,7 @@ nfs4_state_start_net(struct net *net)
>  	locks_start_grace(net, &nn->nfsd4_manager);
>  	nfsd4_client_tracking_init(net);
>  	/* safe for laundromat to run now */
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = true;
> -	spin_unlock(&nn->client_lock);
> +	enable_delayed_work(&nn->laundromat_work);
>  	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
>  		goto skip_grace;
>  	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
> @@ -8973,10 +8972,7 @@ nfs4_state_shutdown_net(struct net *net)
>  
>  	shrinker_free(nn->nfsd_client_shrinker);
>  	cancel_work_sync(&nn->nfsd_shrinker_work);
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = false;
> -	spin_unlock(&nn->client_lock);
> -	cancel_delayed_work_sync(&nn->laundromat_work);
> +	disable_delayed_work_sync(&nn->laundromat_work);
>  	locks_end_grace(&nn->nfsd4_manager);
>  
>  	INIT_LIST_HEAD(&reaplist);


-- 
Chuck Lever


* Re: [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work
  2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
  2025-07-04 15:10   ` Chuck Lever
@ 2025-07-04 15:13   ` Chuck Lever
  2025-07-05 11:20   ` Jeff Layton
  2 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2025-07-04 15:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng,
	Jeff Layton

On 7/4/25 3:20 AM, NeilBrown wrote:
> We currently set client_tracking_active precisely when it is safe for
> laundromat_work to be scheduled.  It is possible to enable/disable
> laundromat_work, so we can do that instead of having a separate flag.
> 
> Doing this avoids overloading ->state_lock with a use that is only
> tangentially related to the other uses.
> 
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/nfsd/netns.h     |  1 -
>  fs/nfsd/nfs4state.c | 24 ++++++++++--------------
>  2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index fe8338735e7c..d83c68872c4c 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -67,7 +67,6 @@ struct nfsd_net {
>  	struct lock_manager nfsd4_manager;
>  	bool grace_ended;
>  	bool grace_end_forced;
> -	bool client_tracking_active;
>  	time64_t boot_time;
>  
>  	struct dentry *nfsd_client_dir;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 124fe4f669aa..db292ac473c6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6512,12 +6512,12 @@ nfsd4_force_end_grace(struct nfsd_net *nn)
>  {
>  	if (!nn->client_tracking_ops)
>  		return false;
> -	spin_lock(&nn->client_lock);
> -	if (nn->client_tracking_active) {
> -		nn->grace_end_forced = true;
> -		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -	}
> -	spin_unlock(&nn->client_lock);
> +	/* laundromat_work must be initialised now, though it might be disabled */
> +	nn->grace_end_forced = true;
> +	/* This is a no-op after nfs4_state_shutdown_net() has called
> +	 * disable_delayed_work_sync()
> +	 */
> +	mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
>  	return true;
>  }
>  
> @@ -8840,7 +8840,6 @@ static int nfs4_state_create_net(struct net *net)
>  	nn->boot_time = ktime_get_real_seconds();
>  	nn->grace_ended = false;
>  	nn->grace_end_forced = false;
> -	nn->client_tracking_active = false;
>  	nn->nfsd4_manager.block_opens = true;
>  	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
>  	INIT_LIST_HEAD(&nn->client_lru);
> @@ -8855,6 +8854,8 @@ static int nfs4_state_create_net(struct net *net)
>  	INIT_LIST_HEAD(&nn->blocked_locks_lru);
>  
>  	INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main);
> +	/* Make sure his cannot run until client tracking is initialised */

Nit: maybe it's "Make sure /t/his cannot run " ?

I agree that backporting enable_delayed_work() is not practical.


> +	disable_delayed_work(&nn->laundromat_work);
>  	INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
>  	get_net(net);
>  
> @@ -8922,9 +8923,7 @@ nfs4_state_start_net(struct net *net)
>  	locks_start_grace(net, &nn->nfsd4_manager);
>  	nfsd4_client_tracking_init(net);
>  	/* safe for laundromat to run now */
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = true;
> -	spin_unlock(&nn->client_lock);
> +	enable_delayed_work(&nn->laundromat_work);
>  	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
>  		goto skip_grace;
>  	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
> @@ -8973,10 +8972,7 @@ nfs4_state_shutdown_net(struct net *net)
>  
>  	shrinker_free(nn->nfsd_client_shrinker);
>  	cancel_work_sync(&nn->nfsd_shrinker_work);
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = false;
> -	spin_unlock(&nn->client_lock);
> -	cancel_delayed_work_sync(&nn->laundromat_work);
> +	disable_delayed_work_sync(&nn->laundromat_work);
>  	locks_end_grace(&nn->nfsd4_manager);
>  
>  	INIT_LIST_HEAD(&reaplist);


-- 
Chuck Lever


* Re: [PATCH 1/2] nfsd: provide locking for v4_end_grace
  2025-07-04  7:20 ` [PATCH 1/2] " NeilBrown
@ 2025-07-05 11:15   ` Jeff Layton
  2025-07-06  5:56     ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2025-07-05 11:15 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng

On Fri, 2025-07-04 at 17:20 +1000, NeilBrown wrote:
> Writing to v4_end_grace can race with server shutdown and result in
> memory being accessed after it was freed - reclaim_str_hashtbl in
> particular.
> 
> We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
> held while client_tracking_op->init() is called and that can wait for
> an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
> deadlock.
> 
> nfsd4_end_grace() is also called by the landromat work queue and this
> doesn't require locking as server shutdown will stop the work and wait
> for it before freeing anything that nfsd4_end_grace() might access.
> 
> However, we must be sure that writing to v4_end_grace doesn't restart
> the work item after shutdown has already waited for it.  For this we add
> a new flag protected with a spin_lock, and nn->client_lock is suitable.
> It is set only while it is safe to make client tracking calls, and
> v4_end_grace only schedules work while the flag is set and with the
> spin_lock held.
> 
> So this patch adds an nfsd_net field "client_tracking_active" which is
> set as described.  Another field "grace_end_forced", is set when
> v4_end_grace is written.  After this is set, and providing
> client_tracking_active is set, the laundromat is scheduled.
> This "grace_end_forced" field bypasses other checks for whether the
> grace period has finished.
> 
> This resolves a race which can result in use-after-free.
> 
> Reported-and-tested-by: Li Lingfeng <lilingfeng3@huawei.com>
> Closes: https://lore.kernel.org/linux-nfs/20250513074305.3362209-1-lilingfeng3@huawei.com
> Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
> Cc: stable@vger.kernel.org
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/nfsd/netns.h     |  2 ++
>  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/nfsd/nfsctl.c    |  6 +++---
>  fs/nfsd/state.h     |  2 +-
>  4 files changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 3e2d0fde80a7..fe8338735e7c 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -66,6 +66,8 @@ struct nfsd_net {
>  
>  	struct lock_manager nfsd4_manager;
>  	bool grace_ended;
> +	bool grace_end_forced;
> +	bool client_tracking_active;

ISTM that the client_tracking_active bool is set and cleared similarly
to the nn->client_tracking_ops pointer itself. It _might_ be possible
to eliminate this bool and just use that pointer instead, though they
are not exactly cleared at the same time.

>  	time64_t boot_time;
>  
>  	struct dentry *nfsd_client_dir;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d5694987f86f..124fe4f669aa 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
>  /* forward declarations */
>  static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
>  static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
> -void nfsd4_end_grace(struct nfsd_net *nn);
> +static void nfsd4_end_grace(struct nfsd_net *nn);
>  static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
>  static void nfsd4_file_hash_remove(struct nfs4_file *fi);
>  static void deleg_reaper(struct nfsd_net *nn);
> @@ -6458,7 +6458,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	return nfs_ok;
>  }
>  
> -void
> +static void
>  nfsd4_end_grace(struct nfsd_net *nn)
>  {
>  	/* do nothing if grace period already ended */
> @@ -6491,6 +6491,36 @@ nfsd4_end_grace(struct nfsd_net *nn)
>  	 */
>  }
>  
> +/**
> + * nfsd4_force_end_grace - forcibly end the grace period
> + * @nn: nfsd_net in which the grace period must end.
> + *
> + *
> + * An nfsv4 grace period can be terminated early if it is known that
> + * no more client could reclaim state.  Sometimes user-space can provide
> + * that information - which will potentially be provided asychnronously
> + * w.r.t. server startup or shutdown.
> + *
> + * nfsd4_force_end_grace() causing the grace period to end and takes
> + * care to ensure races with server start/stop are not problematic.
> + *
> + * Return value:  %false if the NFS server was not active and
> + *      %true if the server was, or may have been, active.
> + */
> +bool
> +nfsd4_force_end_grace(struct nfsd_net *nn)
> +{
> +	if (!nn->client_tracking_ops)
> +		return false;
> +	spin_lock(&nn->client_lock);
> +	if (nn->client_tracking_active) {
> +		nn->grace_end_forced = true;
> +		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> +	}
> +	spin_unlock(&nn->client_lock);
> +	return true;
> +}
> +
>  /*
>   * If we've waited a lease period but there are still clients trying to
>   * reclaim, wait a little longer to give them a chance to finish.
> @@ -6500,6 +6530,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
>  	time64_t double_grace_period_end = nn->boot_time +
>  					   2 * nn->nfsd4_lease;
>  
> +	if (nn->grace_end_forced)
> +		return false;
>  	if (nn->track_reclaim_completes &&
>  			atomic_read(&nn->nr_reclaim_complete) ==
>  			nn->reclaim_str_hashtbl_size)
> @@ -8807,6 +8839,8 @@ static int nfs4_state_create_net(struct net *net)
>  	nn->unconf_name_tree = RB_ROOT;
>  	nn->boot_time = ktime_get_real_seconds();
>  	nn->grace_ended = false;
> +	nn->grace_end_forced = false;
> +	nn->client_tracking_active = false;
>  	nn->nfsd4_manager.block_opens = true;
>  	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
>  	INIT_LIST_HEAD(&nn->client_lru);
> @@ -8887,6 +8921,10 @@ nfs4_state_start_net(struct net *net)
>  		return ret;
>  	locks_start_grace(net, &nn->nfsd4_manager);
>  	nfsd4_client_tracking_init(net);
> +	/* safe for laundromat to run now */
> +	spin_lock(&nn->client_lock);
> +	nn->client_tracking_active = true;
> +	spin_unlock(&nn->client_lock);
>  	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
>  		goto skip_grace;
>  	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
> @@ -8935,6 +8973,9 @@ nfs4_state_shutdown_net(struct net *net)
>  
>  	shrinker_free(nn->nfsd_client_shrinker);
>  	cancel_work_sync(&nn->nfsd_shrinker_work);
> +	spin_lock(&nn->client_lock);
> +	nn->client_tracking_active = false;
> +	spin_unlock(&nn->client_lock);
>  	cancel_delayed_work_sync(&nn->laundromat_work);
>  	locks_end_grace(&nn->nfsd4_manager);
>  
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 3f3e9f6c4250..658f3f86a59f 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1082,10 +1082,10 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
>  		case 'Y':
>  		case 'y':
>  		case '1':
> -			if (!nn->nfsd_serv)
> -				return -EBUSY;
>  			trace_nfsd_end_grace(netns(file));
> -			nfsd4_end_grace(nn);
> +			if (!nfsd4_force_end_grace(nn))
> +				return -EBUSY;
> +
>  			break;
>  		default:
>  			return -EINVAL;
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 1995bca158b8..05eabc69de40 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -836,7 +836,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
>  #endif
>  
>  /* grace period management */
> -void nfsd4_end_grace(struct nfsd_net *nn);
> +bool nfsd4_force_end_grace(struct nfsd_net *nn);
>  
>  /* nfs4recover operations */
>  extern int nfsd4_client_tracking_init(struct net *net);

The patch itself and the new bool are fine though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work
  2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
  2025-07-04 15:10   ` Chuck Lever
  2025-07-04 15:13   ` Chuck Lever
@ 2025-07-05 11:20   ` Jeff Layton
  2 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2025-07-05 11:20 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Li Lingfeng

On Fri, 2025-07-04 at 17:20 +1000, NeilBrown wrote:
> We currently set client_tracking_active precisely when it is safe for
> laundromat_work to be scheduled.  It is possible to enable/disable
> laundromat_work, so we can do that instead of having a separate flag.
> 
> Doing this avoids overloading ->state_lock with a use that is only
> tangentially related to the other uses.
> 
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/nfsd/netns.h     |  1 -
>  fs/nfsd/nfs4state.c | 24 ++++++++++--------------
>  2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index fe8338735e7c..d83c68872c4c 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -67,7 +67,6 @@ struct nfsd_net {
>  	struct lock_manager nfsd4_manager;
>  	bool grace_ended;
>  	bool grace_end_forced;
> -	bool client_tracking_active;
>  	time64_t boot_time;
>  
>  	struct dentry *nfsd_client_dir;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 124fe4f669aa..db292ac473c6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6512,12 +6512,12 @@ nfsd4_force_end_grace(struct nfsd_net *nn)
>  {
>  	if (!nn->client_tracking_ops)
>  		return false;
> -	spin_lock(&nn->client_lock);
> -	if (nn->client_tracking_active) {
> -		nn->grace_end_forced = true;
> -		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -	}
> -	spin_unlock(&nn->client_lock);
> +	/* laundromat_work must be initialised now, though it might be disabled */
> +	nn->grace_end_forced = true;
> +	/* This is a no-op after nfs4_state_shutdown_net() has called
> +	 * disable_delayed_work_sync()
> +	 */
> +	mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
>  	return true;
>  }
>  
> @@ -8840,7 +8840,6 @@ static int nfs4_state_create_net(struct net *net)
>  	nn->boot_time = ktime_get_real_seconds();
>  	nn->grace_ended = false;
>  	nn->grace_end_forced = false;
> -	nn->client_tracking_active = false;
>  	nn->nfsd4_manager.block_opens = true;
>  	INIT_LIST_HEAD(&nn->nfsd4_manager.list);
>  	INIT_LIST_HEAD(&nn->client_lru);
> @@ -8855,6 +8854,8 @@ static int nfs4_state_create_net(struct net *net)
>  	INIT_LIST_HEAD(&nn->blocked_locks_lru);
>  
>  	INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main);
> +	/* Make sure his cannot run until client tracking is initialised */
> +	disable_delayed_work(&nn->laundromat_work);
>  	INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
>  	get_net(net);
>  
> @@ -8922,9 +8923,7 @@ nfs4_state_start_net(struct net *net)
>  	locks_start_grace(net, &nn->nfsd4_manager);
>  	nfsd4_client_tracking_init(net);
>  	/* safe for laundromat to run now */
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = true;
> -	spin_unlock(&nn->client_lock);
> +	enable_delayed_work(&nn->laundromat_work);
>  	if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
>  		goto skip_grace;
>  	printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
> @@ -8973,10 +8972,7 @@ nfs4_state_shutdown_net(struct net *net)
>  
>  	shrinker_free(nn->nfsd_client_shrinker);
>  	cancel_work_sync(&nn->nfsd_shrinker_work);
> -	spin_lock(&nn->client_lock);
> -	nn->client_tracking_active = false;
> -	spin_unlock(&nn->client_lock);
> -	cancel_delayed_work_sync(&nn->laundromat_work);
> +	disable_delayed_work_sync(&nn->laundromat_work);
>  	locks_end_grace(&nn->nfsd4_manager);
>  
>  	INIT_LIST_HEAD(&reaplist);
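
For readers following the patch, the disable/enable behaviour it relies on can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_dwork and the model_*() helpers are hypothetical names, not the kernel workqueue API, and the return value of model_mod() is simplified to "was the work queued". It shows only that a request to queue the work is ignored while the disable depth is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a delayed work item that can be disabled. */
struct model_dwork {
	int disable_depth;	/* > 0 means queueing requests are ignored */
	bool pending;		/* true once the work has been queued */
};

/* Models disabling: bump the depth and drop any pending instance. */
static void model_disable(struct model_dwork *w)
{
	w->disable_depth++;
	w->pending = false;
}

/* Models enabling: must balance an earlier model_disable(). */
static void model_enable(struct model_dwork *w)
{
	assert(w->disable_depth > 0);
	w->disable_depth--;
}

/* Models a queueing request: a no-op while the work is disabled. */
static bool model_mod(struct model_dwork *w)
{
	if (w->disable_depth > 0)
		return false;
	w->pending = true;
	return true;
}
```

In this model, a write to v4_end_grace before start-up or after shutdown corresponds to calling model_mod() while disable_depth is non-zero, which leaves the work unqueued.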

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 1/2] nfsd: provide locking for v4_end_grace
  2025-07-05 11:15   ` Jeff Layton
@ 2025-07-06  5:56     ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2025-07-06  5:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Li Lingfeng

On Sat, 05 Jul 2025, Jeff Layton wrote:
> On Fri, 2025-07-04 at 17:20 +1000, NeilBrown wrote:
> > Writing to v4_end_grace can race with server shutdown and result in
> > memory being accessed after it was freed - reclaim_str_hashtbl in
> > particular.
> > 
> > We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
> > held while client_tracking_op->init() is called and that can wait for
> > an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
> > deadlock.
> > 
> > nfsd4_end_grace() is also called by the laundromat work queue and this
> > doesn't require locking as server shutdown will stop the work and wait
> > for it before freeing anything that nfsd4_end_grace() might access.
> > 
> > However, we must be sure that writing to v4_end_grace doesn't restart
> > the work item after shutdown has already waited for it.  For this we add
> > a new flag protected with a spin_lock, and nn->client_lock is suitable.
> > It is set only while it is safe to make client tracking calls, and
> > v4_end_grace only schedules work while the flag is set and with the
> > spin_lock held.
> > 
> > So this patch adds an nfsd_net field "client_tracking_active" which is
> > set as described.  Another field, "grace_end_forced", is set when
> > v4_end_grace is written.  After this is set, and providing
> > client_tracking_active is set, the laundromat is scheduled.
> > This "grace_end_forced" field bypasses other checks for whether the
> > grace period has finished.
> > 
> > This resolves a race which can result in use-after-free.
> > 
> > Reported-and-tested-by: Li Lingfeng <lilingfeng3@huawei.com>
> > Closes: https://lore.kernel.org/linux-nfs/20250513074305.3362209-1-lilingfeng3@huawei.com
> > Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> >  fs/nfsd/netns.h     |  2 ++
> >  fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
> >  fs/nfsd/nfsctl.c    |  6 +++---
> >  fs/nfsd/state.h     |  2 +-
> >  4 files changed, 49 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 3e2d0fde80a7..fe8338735e7c 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -66,6 +66,8 @@ struct nfsd_net {
> >  
> >  	struct lock_manager nfsd4_manager;
> >  	bool grace_ended;
> > +	bool grace_end_forced;
> > +	bool client_tracking_active;
> 
> ISTM that the client_tracking_active bool is set and cleared similarly
> to the nn->client_tracking_ops pointer itself. It _might_ be possible
> to eliminate this bool and just use that pointer instead, though they
> are not exactly cleared at the same time.

Yes it might.
We currently set nn->client_tracking_ops before calling
nn->client_tracking_ops->init(),
and only clear it *after* calling nn->client_tracking_ops->exit().
If the ->init and ->exit functions never need nn->client_tracking_ops
(and I don't think they do) then we could set it after a successful
init, and clear it before calling ->exit.  Then we could use it as the
flag. We would need the exit path to be
  spin_lock(&nn->client_lock);
  ops = xchg(&nn->client_tracking_ops, NULL);
  spin_unlock(&nn->client_lock);
  cancel_delayed_work_sync(&nn->laundromat_work);
  ops->exit(net);

which is a little less abstraction than I would like, but should be ok.
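
The exit-path ordering sketched above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_net, struct tracking_ops and the model_*() functions are hypothetical, a pthread mutex stands in for nn->client_lock, and a counter stands in for scheduling laundromat_work. The point it shows is that the ops pointer doubles as the "active" flag: it is published only after init succeeds, and cleared under the lock before the work is cancelled and ->exit is called.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client tracking operations. */
struct tracking_ops {
	void (*exit)(void);
};

/* Hypothetical stand-in for the relevant parts of struct nfsd_net. */
struct model_net {
	pthread_mutex_t client_lock;	/* models nn->client_lock */
	const struct tracking_ops *ops;	/* models nn->client_tracking_ops */
	bool grace_end_forced;
	int work_queued;		/* counts requests to run the laundromat */
};

static int exit_calls;
static void model_exit(void) { exit_calls++; }
static const struct tracking_ops model_ops = { .exit = model_exit };

static void model_init(struct model_net *nn)
{
	assert(nn != NULL);
	pthread_mutex_init(&nn->client_lock, NULL);
	nn->ops = NULL;
	nn->grace_end_forced = false;
	nn->work_queued = 0;
}

/* Publish the ops pointer only once init has succeeded. */
static void model_start(struct model_net *nn, const struct tracking_ops *ops)
{
	pthread_mutex_lock(&nn->client_lock);
	nn->ops = ops;
	pthread_mutex_unlock(&nn->client_lock);
}

/* Models a write to v4_end_grace: queue work only while ops is set. */
static bool model_force_end_grace(struct model_net *nn)
{
	bool queued = false;

	pthread_mutex_lock(&nn->client_lock);
	if (nn->ops) {
		nn->grace_end_forced = true;
		nn->work_queued++;
		queued = true;
	}
	pthread_mutex_unlock(&nn->client_lock);
	return queued;
}

/* Exit path: clear ops under the lock, cancel the work, then call ->exit. */
static void model_shutdown(struct model_net *nn)
{
	const struct tracking_ops *ops;

	pthread_mutex_lock(&nn->client_lock);
	ops = nn->ops;
	nn->ops = NULL;
	pthread_mutex_unlock(&nn->client_lock);

	nn->work_queued = 0;	/* models cancelling the work and waiting for it */
	if (ops)
		ops->exit();
}
```

Once model_shutdown() has cleared the pointer, a later model_force_end_grace() cannot queue the work again, which is the property the fix needs.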

This might be a useful simplification but I don't think we should try it
before submitting the fix.

Thanks for the suggestion and for the review.

NeilBrown



end of thread, newest message: 2025-07-06  5:56 UTC

Thread overview: 8+ messages
2025-07-04  7:20 [PATCH 0/2] nfsd: provide locking for v4_end_grace NeilBrown
2025-07-04  7:20 ` [PATCH 1/2] " NeilBrown
2025-07-05 11:15   ` Jeff Layton
2025-07-06  5:56     ` NeilBrown
2025-07-04  7:20 ` [PATCH 2/2] nfsd: discard client_tracking_active and instead disable laundromat_work NeilBrown
2025-07-04 15:10   ` Chuck Lever
2025-07-04 15:13   ` Chuck Lever
2025-07-05 11:20   ` Jeff Layton
