[PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
@ 2024-11-13  5:38 NeilBrown
  2024-11-13  5:38 ` [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC NeilBrown
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: NeilBrown @ 2024-11-13  5:38 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

This patch set aims to allocate session-based DRC slots on demand, and
free them when not in use, or when memory is tight.

I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
overly agreesive, and with lots of printks, and it seems to do the right
thing, though memory pressure has never freed anything - I think you
need several clients with a non-trivial number of slots allocated before
the thresholds in the shrinker code will trigger any freeing.

I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
useful that is.  There are certainly cases where simply setting the
target in a SEQUENCE reply might not be enough, but I doubt they are
very common.  You would need a session to be completely idle, with the
last request received on it indicating that lots of slots were still in
use.

Currently we allocate slots one at a time when the last available slot
was used by the client, and only if a NOWAIT allocation can succeed.  It
is possible that this isn't quite agreesive enough.  When performing a
lot of writeback it can be useful to have lots of slots, but memory
pressure is also likely to build up on the server so GFP_NOWAIT is likely
to fail.  Maybe occasionally using a firmer request (outside the
spinlock) would be justified.

We free slots when the number of unused slots passes some threshold -
currently 6 (because ...  why not).  Possible a hysteresis should be
added so we don't free unused slots for a least N seconds.

When the shrinker wants to apply presure we remove slots equally from
all sessions.  Maybe there should be some proportionality but that would
be more complex and I'm not sure it would gain much.  Slot 0 can never
be freed of course.

I'm very interested to see what people think of the over-all approach,
and of the specifics of the code.

Thanks,
NeilBrown

 [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
 [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
 [PATCH 3/4] nfsd: free unused session-DRC slots
 [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
@ 2024-11-13  5:38 ` NeilBrown
  2024-11-13 16:48   ` Jeff Layton
  2024-11-13  5:38 ` [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand NeilBrown
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-11-13  5:38 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

Rather than guessing how much space it might be safe to use for the DRC,
simply try allocating slots and be prepared to accept failure.

The first slot for each session is allocated with GFP_KERNEL which is
unlikely to fail.  Subsequent slots are allocated with the addition of
__GFP_NORETRY which is expected to fail if there isn't much free memory.

This is probably too aggressive but clears the way for adding a
shrinker interface to free extra slots when memory is tight.

Also *always* allow NFSD_MAX_SLOTS_PER_SESSION slot pointers per
session.  This allows the session to be allocated before we know how
many slots we will successfully allocate, and will be useful when we
starting dynamically increasing the number of allocated slots.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 91 +++++++--------------------------------------
 fs/nfsd/nfsd.h      |  3 --
 fs/nfsd/nfssvc.c    | 32 ----------------
 fs/nfsd/state.h     |  2 +-
 4 files changed, 14 insertions(+), 114 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 551d2958ec29..2dcba0c83c10 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1935,59 +1935,6 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
 	return size + sizeof(struct nfsd4_slot);
 }
 
-/*
- * XXX: If we run out of reserved DRC memory we could (up to a point)
- * re-negotiate active sessions and reduce their slot usage to make
- * room for new connections. For now we just fail the create session.
- */
-static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
-{
-	u32 slotsize = slot_bytes(ca);
-	u32 num = ca->maxreqs;
-	unsigned long avail, total_avail;
-	unsigned int scale_factor;
-
-	spin_lock(&nfsd_drc_lock);
-	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
-		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
-	else
-		/* We have handed out more space than we chose in
-		 * set_max_drc() to allow.  That isn't really a
-		 * problem as long as that doesn't make us think we
-		 * have lots more due to integer overflow.
-		 */
-		total_avail = 0;
-	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
-	/*
-	 * Never use more than a fraction of the remaining memory,
-	 * unless it's the only way to give this client a slot.
-	 * The chosen fraction is either 1/8 or 1/number of threads,
-	 * whichever is smaller.  This ensures there are adequate
-	 * slots to support multiple clients per thread.
-	 * Give the client one slot even if that would require
-	 * over-allocation--it is better than failure.
-	 */
-	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
-
-	avail = clamp_t(unsigned long, avail, slotsize,
-			total_avail/scale_factor);
-	num = min_t(int, num, avail / slotsize);
-	num = max_t(int, num, 1);
-	nfsd_drc_mem_used += num * slotsize;
-	spin_unlock(&nfsd_drc_lock);
-
-	return num;
-}
-
-static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
-{
-	int slotsize = slot_bytes(ca);
-
-	spin_lock(&nfsd_drc_lock);
-	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
-	spin_unlock(&nfsd_drc_lock);
-}
-
 static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 					   struct nfsd4_channel_attrs *battrs)
 {
@@ -1996,26 +1943,27 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 	struct nfsd4_session *new;
 	int i;
 
-	BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION)
-		     > PAGE_SIZE);
+	BUILD_BUG_ON(sizeof(new) > PAGE_SIZE);
 
-	new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL);
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
 	if (!new)
 		return NULL;
 	/* allocate each struct nfsd4_slot and data cache in one piece */
-	for (i = 0; i < numslots; i++) {
-		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL);
+	new->se_slots[0] = kzalloc(slotsize, GFP_KERNEL);
+	if (!new->se_slots[0])
+		goto out_free;
+
+	for (i = 1; i < numslots; i++) {
+		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL | __GFP_NORETRY);
 		if (!new->se_slots[i])
-			goto out_free;
+			break;
 	}
-
+	fattrs->maxreqs = i;
 	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
 	memcpy(&new->se_bchannel, battrs, sizeof(struct nfsd4_channel_attrs));
 
 	return new;
 out_free:
-	while (i--)
-		kfree(new->se_slots[i]);
 	kfree(new);
 	return NULL;
 }
@@ -2128,7 +2076,6 @@ static void __free_session(struct nfsd4_session *ses)
 static void free_session(struct nfsd4_session *ses)
 {
 	nfsd4_del_conns(ses);
-	nfsd4_put_drc_mem(&ses->se_fchannel);
 	__free_session(ses);
 }
 
@@ -3748,17 +3695,6 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
 	ca->maxresp_cached = min_t(u32, ca->maxresp_cached,
 			NFSD_SLOT_CACHE_SIZE + NFSD_MIN_HDR_SEQ_SZ);
 	ca->maxreqs = min_t(u32, ca->maxreqs, NFSD_MAX_SLOTS_PER_SESSION);
-	/*
-	 * Note decreasing slot size below client's request may make it
-	 * difficult for client to function correctly, whereas
-	 * decreasing the number of slots will (just?) affect
-	 * performance.  When short on memory we therefore prefer to
-	 * decrease number of slots instead of their size.  Clients that
-	 * request larger slots than they need will get poor results:
-	 * Note that we always allow at least one slot, because our
-	 * accounting is soft and provides no guarantees either way.
-	 */
-	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
 
 	return nfs_ok;
 }
@@ -3836,11 +3772,11 @@ nfsd4_create_session(struct svc_rqst *rqstp,
 		return status;
 	status = check_backchannel_attrs(&cr_ses->back_channel);
 	if (status)
-		goto out_release_drc_mem;
+		goto out_err;
 	status = nfserr_jukebox;
 	new = alloc_session(&cr_ses->fore_channel, &cr_ses->back_channel);
 	if (!new)
-		goto out_release_drc_mem;
+		goto out_err;
 	conn = alloc_conn_from_crses(rqstp, cr_ses);
 	if (!conn)
 		goto out_free_session;
@@ -3950,8 +3886,7 @@ nfsd4_create_session(struct svc_rqst *rqstp,
 		expire_client(old);
 out_free_session:
 	__free_session(new);
-out_release_drc_mem:
-	nfsd4_put_drc_mem(&cr_ses->fore_channel);
+out_err:
 	return status;
 }
 
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 4b56ba1e8e48..3eb21e63b921 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -88,9 +88,6 @@ struct nfsd_genl_rqstp {
 extern struct svc_program	nfsd_programs[];
 extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
 extern struct mutex		nfsd_mutex;
-extern spinlock_t		nfsd_drc_lock;
-extern unsigned long		nfsd_drc_max_mem;
-extern unsigned long		nfsd_drc_mem_used;
 extern atomic_t			nfsd_th_cnt;		/* number of available threads */
 
 extern const struct seq_operations nfs_exports_op;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 49e2f32102ab..3dbaefc96608 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -70,16 +70,6 @@ static __be32			nfsd_init_request(struct svc_rqst *,
  */
 DEFINE_MUTEX(nfsd_mutex);
 
-/*
- * nfsd_drc_lock protects nfsd_drc_max_pages and nfsd_drc_pages_used.
- * nfsd_drc_max_pages limits the total amount of memory available for
- * version 4.1 DRC caches.
- * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage.
- */
-DEFINE_SPINLOCK(nfsd_drc_lock);
-unsigned long	nfsd_drc_max_mem;
-unsigned long	nfsd_drc_mem_used;
-
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 static const struct svc_version *localio_versions[] = {
 	[1] = &localio_version1,
@@ -575,27 +565,6 @@ void nfsd_reset_versions(struct nfsd_net *nn)
 		}
 }
 
-/*
- * Each session guarantees a negotiated per slot memory cache for replies
- * which in turn consumes memory beyond the v2/v3/v4.0 server. A dedicated
- * NFSv4.1 server might want to use more memory for a DRC than a machine
- * with mutiple services.
- *
- * Impose a hard limit on the number of pages for the DRC which varies
- * according to the machines free pages. This is of course only a default.
- *
- * For now this is a #defined shift which could be under admin control
- * in the future.
- */
-static void set_max_drc(void)
-{
-	#define NFSD_DRC_SIZE_SHIFT	7
-	nfsd_drc_max_mem = (nr_free_buffer_pages()
-					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
-	nfsd_drc_mem_used = 0;
-	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
-}
-
 static int nfsd_get_default_max_blksize(void)
 {
 	struct sysinfo i;
@@ -678,7 +647,6 @@ int nfsd_create_serv(struct net *net)
 	nn->nfsd_serv = serv;
 	spin_unlock(&nfsd_notifier_lock);
 
-	set_max_drc();
 	/* check if the notifier is already set */
 	if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
 		register_inetaddr_notifier(&nfsd_inetaddr_notifier);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 35b3564c065f..c052e9eea81b 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -310,7 +310,7 @@ struct nfsd4_session {
 	struct list_head	se_conns;
 	u32			se_cb_prog;
 	u32			se_cb_seq_nr;
-	struct nfsd4_slot	*se_slots[];	/* forward channel slots */
+	struct nfsd4_slot	*se_slots[NFSD_MAX_SLOTS_PER_SESSION];	/* forward channel slots */
 };
 
 /* formatted contents of nfs4_sessionid */
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
  2024-11-13  5:38 ` [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC NeilBrown
@ 2024-11-13 16:48   ` Jeff Layton
  2024-11-15  4:01     ` NeilBrown
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2024-11-13 16:48 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-11-13 at 16:38 +1100, NeilBrown wrote:
> Rather than guessing how much space it might be safe to use for the DRC,
> simply try allocating slots and be prepared to accept failure.
> 
> The first slot for each session is allocated with GFP_KERNEL which is
> unlikely to fail.  Subsequent slots are allocated with the addition of
> __GFP_NORETRY which is expected to fail if there isn't much free memory.
> 
> This is probably too aggressive but clears the way for adding a
> shrinker interface to free extra slots when memory is tight.
> 
> Also *always* allow NFSD_MAX_SLOTS_PER_SESSION slot pointers per
> session.  This allows the session to be allocated before we know how
> many slots we will successfully allocate, and will be useful when we
> starting dynamically increasing the number of allocated slots.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 91 +++++++--------------------------------------
>  fs/nfsd/nfsd.h      |  3 --
>  fs/nfsd/nfssvc.c    | 32 ----------------
>  fs/nfsd/state.h     |  2 +-
>  4 files changed, 14 insertions(+), 114 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 551d2958ec29..2dcba0c83c10 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1935,59 +1935,6 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
>  	return size + sizeof(struct nfsd4_slot);
>  }
>  
> -/*
> - * XXX: If we run out of reserved DRC memory we could (up to a point)
> - * re-negotiate active sessions and reduce their slot usage to make
> - * room for new connections. For now we just fail the create session.
> - */
> -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> -{
> -	u32 slotsize = slot_bytes(ca);
> -	u32 num = ca->maxreqs;
> -	unsigned long avail, total_avail;
> -	unsigned int scale_factor;
> -
> -	spin_lock(&nfsd_drc_lock);
> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> -	else
> -		/* We have handed out more space than we chose in
> -		 * set_max_drc() to allow.  That isn't really a
> -		 * problem as long as that doesn't make us think we
> -		 * have lots more due to integer overflow.
> -		 */
> -		total_avail = 0;
> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> -	/*
> -	 * Never use more than a fraction of the remaining memory,
> -	 * unless it's the only way to give this client a slot.
> -	 * The chosen fraction is either 1/8 or 1/number of threads,
> -	 * whichever is smaller.  This ensures there are adequate
> -	 * slots to support multiple clients per thread.
> -	 * Give the client one slot even if that would require
> -	 * over-allocation--it is better than failure.
> -	 */
> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> -
> -	avail = clamp_t(unsigned long, avail, slotsize,
> -			total_avail/scale_factor);
> -	num = min_t(int, num, avail / slotsize);
> -	num = max_t(int, num, 1);
> -	nfsd_drc_mem_used += num * slotsize;
> -	spin_unlock(&nfsd_drc_lock);
> -
> -	return num;
> -}
> -
> -static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
> -{
> -	int slotsize = slot_bytes(ca);
> -
> -	spin_lock(&nfsd_drc_lock);
> -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> -	spin_unlock(&nfsd_drc_lock);
> -}
> -
>  static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>  					   struct nfsd4_channel_attrs *battrs)
>  {
> @@ -1996,26 +1943,27 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>  	struct nfsd4_session *new;
>  	int i;
>  
> -	BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION)
> -		     > PAGE_SIZE);
> +	BUILD_BUG_ON(sizeof(new) > PAGE_SIZE);
>  
> -	new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL);
> +	new = kzalloc(sizeof(*new), GFP_KERNEL);
>  	if (!new)
>  		return NULL;
>  	/* allocate each struct nfsd4_slot and data cache in one piece */
> -	for (i = 0; i < numslots; i++) {
> -		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL);
> +	new->se_slots[0] = kzalloc(slotsize, GFP_KERNEL);
> +	if (!new->se_slots[0])
> +		goto out_free;
> +
> +	for (i = 1; i < numslots; i++) {
> +		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL | __GFP_NORETRY);
>  		if (!new->se_slots[i])
> -			goto out_free;
> +			break;
>  	}
> -
> +	fattrs->maxreqs = i;
>  	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
>  	memcpy(&new->se_bchannel, battrs, sizeof(struct nfsd4_channel_attrs));
>  
>  	return new;
>  out_free:
> -	while (i--)
> -		kfree(new->se_slots[i]);
>  	kfree(new);
>  	return NULL;
>  }
> @@ -2128,7 +2076,6 @@ static void __free_session(struct nfsd4_session *ses)
>  static void free_session(struct nfsd4_session *ses)
>  {
>  	nfsd4_del_conns(ses);
> -	nfsd4_put_drc_mem(&ses->se_fchannel);
>  	__free_session(ses);
>  }
>  
> @@ -3748,17 +3695,6 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
>  	ca->maxresp_cached = min_t(u32, ca->maxresp_cached,
>  			NFSD_SLOT_CACHE_SIZE + NFSD_MIN_HDR_SEQ_SZ);
>  	ca->maxreqs = min_t(u32, ca->maxreqs, NFSD_MAX_SLOTS_PER_SESSION);
> -	/*
> -	 * Note decreasing slot size below client's request may make it
> -	 * difficult for client to function correctly, whereas
> -	 * decreasing the number of slots will (just?) affect
> -	 * performance.  When short on memory we therefore prefer to
> -	 * decrease number of slots instead of their size.  Clients that
> -	 * request larger slots than they need will get poor results:
> -	 * Note that we always allow at least one slot, because our
> -	 * accounting is soft and provides no guarantees either way.
> -	 */
> -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
>  
>  	return nfs_ok;
>  }
> @@ -3836,11 +3772,11 @@ nfsd4_create_session(struct svc_rqst *rqstp,
>  		return status;
>  	status = check_backchannel_attrs(&cr_ses->back_channel);
>  	if (status)
> -		goto out_release_drc_mem;
> +		goto out_err;
>  	status = nfserr_jukebox;
>  	new = alloc_session(&cr_ses->fore_channel, &cr_ses->back_channel);
>  	if (!new)
> -		goto out_release_drc_mem;
> +		goto out_err;
>  	conn = alloc_conn_from_crses(rqstp, cr_ses);
>  	if (!conn)
>  		goto out_free_session;
> @@ -3950,8 +3886,7 @@ nfsd4_create_session(struct svc_rqst *rqstp,
>  		expire_client(old);
>  out_free_session:
>  	__free_session(new);
> -out_release_drc_mem:
> -	nfsd4_put_drc_mem(&cr_ses->fore_channel);
> +out_err:
>  	return status;
>  }
>  
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 4b56ba1e8e48..3eb21e63b921 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -88,9 +88,6 @@ struct nfsd_genl_rqstp {
>  extern struct svc_program	nfsd_programs[];
>  extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
>  extern struct mutex		nfsd_mutex;
> -extern spinlock_t		nfsd_drc_lock;
> -extern unsigned long		nfsd_drc_max_mem;
> -extern unsigned long		nfsd_drc_mem_used;
>  extern atomic_t			nfsd_th_cnt;		/* number of available threads */
>  
>  extern const struct seq_operations nfs_exports_op;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 49e2f32102ab..3dbaefc96608 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -70,16 +70,6 @@ static __be32			nfsd_init_request(struct svc_rqst *,
>   */
>  DEFINE_MUTEX(nfsd_mutex);
>  
> -/*
> - * nfsd_drc_lock protects nfsd_drc_max_pages and nfsd_drc_pages_used.
> - * nfsd_drc_max_pages limits the total amount of memory available for
> - * version 4.1 DRC caches.
> - * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage.
> - */
> -DEFINE_SPINLOCK(nfsd_drc_lock);
> -unsigned long	nfsd_drc_max_mem;
> -unsigned long	nfsd_drc_mem_used;
> -
>  #if IS_ENABLED(CONFIG_NFS_LOCALIO)
>  static const struct svc_version *localio_versions[] = {
>  	[1] = &localio_version1,
> @@ -575,27 +565,6 @@ void nfsd_reset_versions(struct nfsd_net *nn)
>  		}
>  }
>  
> -/*
> - * Each session guarantees a negotiated per slot memory cache for replies
> - * which in turn consumes memory beyond the v2/v3/v4.0 server. A dedicated
> - * NFSv4.1 server might want to use more memory for a DRC than a machine
> - * with mutiple services.
> - *
> - * Impose a hard limit on the number of pages for the DRC which varies
> - * according to the machines free pages. This is of course only a default.
> - *
> - * For now this is a #defined shift which could be under admin control
> - * in the future.
> - */
> -static void set_max_drc(void)
> -{
> -	#define NFSD_DRC_SIZE_SHIFT	7
> -	nfsd_drc_max_mem = (nr_free_buffer_pages()
> -					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> -	nfsd_drc_mem_used = 0;
> -	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
> -}
> -
>  static int nfsd_get_default_max_blksize(void)
>  {
>  	struct sysinfo i;
> @@ -678,7 +647,6 @@ int nfsd_create_serv(struct net *net)
>  	nn->nfsd_serv = serv;
>  	spin_unlock(&nfsd_notifier_lock);
>  
> -	set_max_drc();
>  	/* check if the notifier is already set */
>  	if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
>  		register_inetaddr_notifier(&nfsd_inetaddr_notifier);
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 35b3564c065f..c052e9eea81b 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -310,7 +310,7 @@ struct nfsd4_session {
>  	struct list_head	se_conns;
>  	u32			se_cb_prog;
>  	u32			se_cb_seq_nr;
> -	struct nfsd4_slot	*se_slots[];	/* forward channel slots */
> +	struct nfsd4_slot	*se_slots[NFSD_MAX_SLOTS_PER_SESSION];	/* forward channel slots */
>  };
>  
>  /* formatted contents of nfs4_sessionid */

I like this patch overall. One thing we could consider is allocating
slots aggressively with GFP_KERNEL up to a particular threshold, and
then switch to GFP_NOWAIT, but this seems like a reasonable place to
start.
 
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
  2024-11-13 16:48   ` Jeff Layton
@ 2024-11-15  4:01     ` NeilBrown
  2024-11-15 14:07       ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-11-15  4:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Thu, 14 Nov 2024, Jeff Layton wrote:
> On Wed, 2024-11-13 at 16:38 +1100, NeilBrown wrote:
> > Rather than guessing how much space it might be safe to use for the DRC,
> > simply try allocating slots and be prepared to accept failure.
> > 
> > The first slot for each session is allocated with GFP_KERNEL which is
> > unlikely to fail.  Subsequent slots are allocated with the addition of
> > __GFP_NORETRY which is expected to fail if there isn't much free memory.
> > 
> > This is probably too aggressive but clears the way for adding a
> > shrinker interface to free extra slots when memory is tight.
> > 
> > Also *always* allow NFSD_MAX_SLOTS_PER_SESSION slot pointers per
> > session.  This allows the session to be allocated before we know how
> > many slots we will successfully allocate, and will be useful when we
> > starting dynamically increasing the number of allocated slots.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/nfsd/nfs4state.c | 91 +++++++--------------------------------------
> >  fs/nfsd/nfsd.h      |  3 --
> >  fs/nfsd/nfssvc.c    | 32 ----------------
> >  fs/nfsd/state.h     |  2 +-
> >  4 files changed, 14 insertions(+), 114 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 551d2958ec29..2dcba0c83c10 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1935,59 +1935,6 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
> >  	return size + sizeof(struct nfsd4_slot);
> >  }
> >  
> > -/*
> > - * XXX: If we run out of reserved DRC memory we could (up to a point)
> > - * re-negotiate active sessions and reduce their slot usage to make
> > - * room for new connections. For now we just fail the create session.
> > - */
> > -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> > -{
> > -	u32 slotsize = slot_bytes(ca);
> > -	u32 num = ca->maxreqs;
> > -	unsigned long avail, total_avail;
> > -	unsigned int scale_factor;
> > -
> > -	spin_lock(&nfsd_drc_lock);
> > -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> > -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> > -	else
> > -		/* We have handed out more space than we chose in
> > -		 * set_max_drc() to allow.  That isn't really a
> > -		 * problem as long as that doesn't make us think we
> > -		 * have lots more due to integer overflow.
> > -		 */
> > -		total_avail = 0;
> > -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> > -	/*
> > -	 * Never use more than a fraction of the remaining memory,
> > -	 * unless it's the only way to give this client a slot.
> > -	 * The chosen fraction is either 1/8 or 1/number of threads,
> > -	 * whichever is smaller.  This ensures there are adequate
> > -	 * slots to support multiple clients per thread.
> > -	 * Give the client one slot even if that would require
> > -	 * over-allocation--it is better than failure.
> > -	 */
> > -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> > -
> > -	avail = clamp_t(unsigned long, avail, slotsize,
> > -			total_avail/scale_factor);
> > -	num = min_t(int, num, avail / slotsize);
> > -	num = max_t(int, num, 1);
> > -	nfsd_drc_mem_used += num * slotsize;
> > -	spin_unlock(&nfsd_drc_lock);
> > -
> > -	return num;
> > -}
> > -
> > -static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
> > -{
> > -	int slotsize = slot_bytes(ca);
> > -
> > -	spin_lock(&nfsd_drc_lock);
> > -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> > -	spin_unlock(&nfsd_drc_lock);
> > -}
> > -
> >  static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
> >  					   struct nfsd4_channel_attrs *battrs)
> >  {
> > @@ -1996,26 +1943,27 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
> >  	struct nfsd4_session *new;
> >  	int i;
> >  
> > -	BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION)
> > -		     > PAGE_SIZE);
> > +	BUILD_BUG_ON(sizeof(new) > PAGE_SIZE);
> >  
> > -	new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL);
> > +	new = kzalloc(sizeof(*new), GFP_KERNEL);
> >  	if (!new)
> >  		return NULL;
> >  	/* allocate each struct nfsd4_slot and data cache in one piece */
> > -	for (i = 0; i < numslots; i++) {
> > -		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL);
> > +	new->se_slots[0] = kzalloc(slotsize, GFP_KERNEL);
> > +	if (!new->se_slots[0])
> > +		goto out_free;
> > +
> > +	for (i = 1; i < numslots; i++) {
> > +		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL | __GFP_NORETRY);
> >  		if (!new->se_slots[i])
> > -			goto out_free;
> > +			break;
> >  	}
> > -
> > +	fattrs->maxreqs = i;
> >  	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
> >  	memcpy(&new->se_bchannel, battrs, sizeof(struct nfsd4_channel_attrs));
> >  
> >  	return new;
> >  out_free:
> > -	while (i--)
> > -		kfree(new->se_slots[i]);
> >  	kfree(new);
> >  	return NULL;
> >  }
> > @@ -2128,7 +2076,6 @@ static void __free_session(struct nfsd4_session *ses)
> >  static void free_session(struct nfsd4_session *ses)
> >  {
> >  	nfsd4_del_conns(ses);
> > -	nfsd4_put_drc_mem(&ses->se_fchannel);
> >  	__free_session(ses);
> >  }
> >  
> > @@ -3748,17 +3695,6 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
> >  	ca->maxresp_cached = min_t(u32, ca->maxresp_cached,
> >  			NFSD_SLOT_CACHE_SIZE + NFSD_MIN_HDR_SEQ_SZ);
> >  	ca->maxreqs = min_t(u32, ca->maxreqs, NFSD_MAX_SLOTS_PER_SESSION);
> > -	/*
> > -	 * Note decreasing slot size below client's request may make it
> > -	 * difficult for client to function correctly, whereas
> > -	 * decreasing the number of slots will (just?) affect
> > -	 * performance.  When short on memory we therefore prefer to
> > -	 * decrease number of slots instead of their size.  Clients that
> > -	 * request larger slots than they need will get poor results:
> > -	 * Note that we always allow at least one slot, because our
> > -	 * accounting is soft and provides no guarantees either way.
> > -	 */
> > -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> >  
> >  	return nfs_ok;
> >  }
> > @@ -3836,11 +3772,11 @@ nfsd4_create_session(struct svc_rqst *rqstp,
> >  		return status;
> >  	status = check_backchannel_attrs(&cr_ses->back_channel);
> >  	if (status)
> > -		goto out_release_drc_mem;
> > +		goto out_err;
> >  	status = nfserr_jukebox;
> >  	new = alloc_session(&cr_ses->fore_channel, &cr_ses->back_channel);
> >  	if (!new)
> > -		goto out_release_drc_mem;
> > +		goto out_err;
> >  	conn = alloc_conn_from_crses(rqstp, cr_ses);
> >  	if (!conn)
> >  		goto out_free_session;
> > @@ -3950,8 +3886,7 @@ nfsd4_create_session(struct svc_rqst *rqstp,
> >  		expire_client(old);
> >  out_free_session:
> >  	__free_session(new);
> > -out_release_drc_mem:
> > -	nfsd4_put_drc_mem(&cr_ses->fore_channel);
> > +out_err:
> >  	return status;
> >  }
> >  
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index 4b56ba1e8e48..3eb21e63b921 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -88,9 +88,6 @@ struct nfsd_genl_rqstp {
> >  extern struct svc_program	nfsd_programs[];
> >  extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
> >  extern struct mutex		nfsd_mutex;
> > -extern spinlock_t		nfsd_drc_lock;
> > -extern unsigned long		nfsd_drc_max_mem;
> > -extern unsigned long		nfsd_drc_mem_used;
> >  extern atomic_t			nfsd_th_cnt;		/* number of available threads */
> >  
> >  extern const struct seq_operations nfs_exports_op;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 49e2f32102ab..3dbaefc96608 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -70,16 +70,6 @@ static __be32			nfsd_init_request(struct svc_rqst *,
> >   */
> >  DEFINE_MUTEX(nfsd_mutex);
> >  
> > -/*
> > - * nfsd_drc_lock protects nfsd_drc_max_pages and nfsd_drc_pages_used.
> > - * nfsd_drc_max_pages limits the total amount of memory available for
> > - * version 4.1 DRC caches.
> > - * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage.
> > - */
> > -DEFINE_SPINLOCK(nfsd_drc_lock);
> > -unsigned long	nfsd_drc_max_mem;
> > -unsigned long	nfsd_drc_mem_used;
> > -
> >  #if IS_ENABLED(CONFIG_NFS_LOCALIO)
> >  static const struct svc_version *localio_versions[] = {
> >  	[1] = &localio_version1,
> > @@ -575,27 +565,6 @@ void nfsd_reset_versions(struct nfsd_net *nn)
> >  		}
> >  }
> >  
> > -/*
> > - * Each session guarantees a negotiated per slot memory cache for replies
> > - * which in turn consumes memory beyond the v2/v3/v4.0 server. A dedicated
> > - * NFSv4.1 server might want to use more memory for a DRC than a machine
> > - * with mutiple services.
> > - *
> > - * Impose a hard limit on the number of pages for the DRC which varies
> > - * according to the machines free pages. This is of course only a default.
> > - *
> > - * For now this is a #defined shift which could be under admin control
> > - * in the future.
> > - */
> > -static void set_max_drc(void)
> > -{
> > -	#define NFSD_DRC_SIZE_SHIFT	7
> > -	nfsd_drc_max_mem = (nr_free_buffer_pages()
> > -					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> > -	nfsd_drc_mem_used = 0;
> > -	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
> > -}
> > -
> >  static int nfsd_get_default_max_blksize(void)
> >  {
> >  	struct sysinfo i;
> > @@ -678,7 +647,6 @@ int nfsd_create_serv(struct net *net)
> >  	nn->nfsd_serv = serv;
> >  	spin_unlock(&nfsd_notifier_lock);
> >  
> > -	set_max_drc();
> >  	/* check if the notifier is already set */
> >  	if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
> >  		register_inetaddr_notifier(&nfsd_inetaddr_notifier);
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 35b3564c065f..c052e9eea81b 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -310,7 +310,7 @@ struct nfsd4_session {
> >  	struct list_head	se_conns;
> >  	u32			se_cb_prog;
> >  	u32			se_cb_seq_nr;
> > -	struct nfsd4_slot	*se_slots[];	/* forward channel slots */
> > +	struct nfsd4_slot	*se_slots[NFSD_MAX_SLOTS_PER_SESSION];	/* forward channel slots */
> >  };
> >  
> >  /* formatted contents of nfs4_sessionid */
> 
> I like this patch overall. One thing we could consider is allocating
> slots aggressively with GFP_KERNEL up to a particular threshold, and
> then switch to GFP_NOWAIT, but this seems like a reasonable place to
> start.

At this point (when creating the session) it is __GFP_NORETRY , not
GET_NOWAIT.  So it makes a decent effort but does give up earlier than
GFP_KERNEL. 

What threshold should we use?  "1" seemed good to me as 1 is essential
but more is just for improved performance: wait for the client to
actually use them.

I'd rather allocate JIT than ASAP.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
  2024-11-15  4:01     ` NeilBrown
@ 2024-11-15 14:07       ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2024-11-15 14:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Fri, 2024-11-15 at 15:01 +1100, NeilBrown wrote:
> On Thu, 14 Nov 2024, Jeff Layton wrote:
> > On Wed, 2024-11-13 at 16:38 +1100, NeilBrown wrote:
> > > Rather than guessing how much space it might be safe to use for the DRC,
> > > simply try allocating slots and be prepared to accept failure.
> > > 
> > > The first slot for each session is allocated with GFP_KERNEL which is
> > > unlikely to fail.  Subsequent slots are allocated with the addition of
> > > __GFP_NORETRY which is expected to fail if there isn't much free memory.
> > > 
> > > This is probably too aggressive but clears the way for adding a
> > > shrinker interface to free extra slots when memory is tight.
> > > 
> > > Also *always* allow NFSD_MAX_SLOTS_PER_SESSION slot pointers per
> > > session.  This allows the session to be allocated before we know how
> > > many slots we will successfully allocate, and will be useful when we
> > > starting dynamically increasing the number of allocated slots.
> > > 
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > >  fs/nfsd/nfs4state.c | 91 +++++++--------------------------------------
> > >  fs/nfsd/nfsd.h      |  3 --
> > >  fs/nfsd/nfssvc.c    | 32 ----------------
> > >  fs/nfsd/state.h     |  2 +-
> > >  4 files changed, 14 insertions(+), 114 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 551d2958ec29..2dcba0c83c10 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1935,59 +1935,6 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
> > >  	return size + sizeof(struct nfsd4_slot);
> > >  }
> > >  
> > > -/*
> > > - * XXX: If we run out of reserved DRC memory we could (up to a point)
> > > - * re-negotiate active sessions and reduce their slot usage to make
> > > - * room for new connections. For now we just fail the create session.
> > > - */
> > > -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> > > -{
> > > -	u32 slotsize = slot_bytes(ca);
> > > -	u32 num = ca->maxreqs;
> > > -	unsigned long avail, total_avail;
> > > -	unsigned int scale_factor;
> > > -
> > > -	spin_lock(&nfsd_drc_lock);
> > > -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> > > -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> > > -	else
> > > -		/* We have handed out more space than we chose in
> > > -		 * set_max_drc() to allow.  That isn't really a
> > > -		 * problem as long as that doesn't make us think we
> > > -		 * have lots more due to integer overflow.
> > > -		 */
> > > -		total_avail = 0;
> > > -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> > > -	/*
> > > -	 * Never use more than a fraction of the remaining memory,
> > > -	 * unless it's the only way to give this client a slot.
> > > -	 * The chosen fraction is either 1/8 or 1/number of threads,
> > > -	 * whichever is smaller.  This ensures there are adequate
> > > -	 * slots to support multiple clients per thread.
> > > -	 * Give the client one slot even if that would require
> > > -	 * over-allocation--it is better than failure.
> > > -	 */
> > > -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> > > -
> > > -	avail = clamp_t(unsigned long, avail, slotsize,
> > > -			total_avail/scale_factor);
> > > -	num = min_t(int, num, avail / slotsize);
> > > -	num = max_t(int, num, 1);
> > > -	nfsd_drc_mem_used += num * slotsize;
> > > -	spin_unlock(&nfsd_drc_lock);
> > > -
> > > -	return num;
> > > -}
> > > -
> > > -static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
> > > -{
> > > -	int slotsize = slot_bytes(ca);
> > > -
> > > -	spin_lock(&nfsd_drc_lock);
> > > -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> > > -	spin_unlock(&nfsd_drc_lock);
> > > -}
> > > -
> > >  static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
> > >  					   struct nfsd4_channel_attrs *battrs)
> > >  {
> > > @@ -1996,26 +1943,27 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
> > >  	struct nfsd4_session *new;
> > >  	int i;
> > >  
> > > -	BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION)
> > > -		     > PAGE_SIZE);
> > > +	BUILD_BUG_ON(sizeof(new) > PAGE_SIZE);
> > >  
> > > -	new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL);
> > > +	new = kzalloc(sizeof(*new), GFP_KERNEL);
> > >  	if (!new)
> > >  		return NULL;
> > >  	/* allocate each struct nfsd4_slot and data cache in one piece */
> > > -	for (i = 0; i < numslots; i++) {
> > > -		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL);
> > > +	new->se_slots[0] = kzalloc(slotsize, GFP_KERNEL);
> > > +	if (!new->se_slots[0])
> > > +		goto out_free;
> > > +
> > > +	for (i = 1; i < numslots; i++) {
> > > +		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL | __GFP_NORETRY);
> > >  		if (!new->se_slots[i])
> > > -			goto out_free;
> > > +			break;
> > >  	}
> > > -
> > > +	fattrs->maxreqs = i;
> > >  	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
> > >  	memcpy(&new->se_bchannel, battrs, sizeof(struct nfsd4_channel_attrs));
> > >  
> > >  	return new;
> > >  out_free:
> > > -	while (i--)
> > > -		kfree(new->se_slots[i]);
> > >  	kfree(new);
> > >  	return NULL;
> > >  }
> > > @@ -2128,7 +2076,6 @@ static void __free_session(struct nfsd4_session *ses)
> > >  static void free_session(struct nfsd4_session *ses)
> > >  {
> > >  	nfsd4_del_conns(ses);
> > > -	nfsd4_put_drc_mem(&ses->se_fchannel);
> > >  	__free_session(ses);
> > >  }
> > >  
> > > @@ -3748,17 +3695,6 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
> > >  	ca->maxresp_cached = min_t(u32, ca->maxresp_cached,
> > >  			NFSD_SLOT_CACHE_SIZE + NFSD_MIN_HDR_SEQ_SZ);
> > >  	ca->maxreqs = min_t(u32, ca->maxreqs, NFSD_MAX_SLOTS_PER_SESSION);
> > > -	/*
> > > -	 * Note decreasing slot size below client's request may make it
> > > -	 * difficult for client to function correctly, whereas
> > > -	 * decreasing the number of slots will (just?) affect
> > > -	 * performance.  When short on memory we therefore prefer to
> > > -	 * decrease number of slots instead of their size.  Clients that
> > > -	 * request larger slots than they need will get poor results:
> > > -	 * Note that we always allow at least one slot, because our
> > > -	 * accounting is soft and provides no guarantees either way.
> > > -	 */
> > > -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> > >  
> > >  	return nfs_ok;
> > >  }
> > > @@ -3836,11 +3772,11 @@ nfsd4_create_session(struct svc_rqst *rqstp,
> > >  		return status;
> > >  	status = check_backchannel_attrs(&cr_ses->back_channel);
> > >  	if (status)
> > > -		goto out_release_drc_mem;
> > > +		goto out_err;
> > >  	status = nfserr_jukebox;
> > >  	new = alloc_session(&cr_ses->fore_channel, &cr_ses->back_channel);
> > >  	if (!new)
> > > -		goto out_release_drc_mem;
> > > +		goto out_err;
> > >  	conn = alloc_conn_from_crses(rqstp, cr_ses);
> > >  	if (!conn)
> > >  		goto out_free_session;
> > > @@ -3950,8 +3886,7 @@ nfsd4_create_session(struct svc_rqst *rqstp,
> > >  		expire_client(old);
> > >  out_free_session:
> > >  	__free_session(new);
> > > -out_release_drc_mem:
> > > -	nfsd4_put_drc_mem(&cr_ses->fore_channel);
> > > +out_err:
> > >  	return status;
> > >  }
> > >  
> > > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > > index 4b56ba1e8e48..3eb21e63b921 100644
> > > --- a/fs/nfsd/nfsd.h
> > > +++ b/fs/nfsd/nfsd.h
> > > @@ -88,9 +88,6 @@ struct nfsd_genl_rqstp {
> > >  extern struct svc_program	nfsd_programs[];
> > >  extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
> > >  extern struct mutex		nfsd_mutex;
> > > -extern spinlock_t		nfsd_drc_lock;
> > > -extern unsigned long		nfsd_drc_max_mem;
> > > -extern unsigned long		nfsd_drc_mem_used;
> > >  extern atomic_t			nfsd_th_cnt;		/* number of available threads */
> > >  
> > >  extern const struct seq_operations nfs_exports_op;
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index 49e2f32102ab..3dbaefc96608 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -70,16 +70,6 @@ static __be32			nfsd_init_request(struct svc_rqst *,
> > >   */
> > >  DEFINE_MUTEX(nfsd_mutex);
> > >  
> > > -/*
> > > - * nfsd_drc_lock protects nfsd_drc_max_pages and nfsd_drc_pages_used.
> > > - * nfsd_drc_max_pages limits the total amount of memory available for
> > > - * version 4.1 DRC caches.
> > > - * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage.
> > > - */
> > > -DEFINE_SPINLOCK(nfsd_drc_lock);
> > > -unsigned long	nfsd_drc_max_mem;
> > > -unsigned long	nfsd_drc_mem_used;
> > > -
> > >  #if IS_ENABLED(CONFIG_NFS_LOCALIO)
> > >  static const struct svc_version *localio_versions[] = {
> > >  	[1] = &localio_version1,
> > > @@ -575,27 +565,6 @@ void nfsd_reset_versions(struct nfsd_net *nn)
> > >  		}
> > >  }
> > >  
> > > -/*
> > > - * Each session guarantees a negotiated per slot memory cache for replies
> > > - * which in turn consumes memory beyond the v2/v3/v4.0 server. A dedicated
> > > - * NFSv4.1 server might want to use more memory for a DRC than a machine
> > > - * with mutiple services.
> > > - *
> > > - * Impose a hard limit on the number of pages for the DRC which varies
> > > - * according to the machines free pages. This is of course only a default.
> > > - *
> > > - * For now this is a #defined shift which could be under admin control
> > > - * in the future.
> > > - */
> > > -static void set_max_drc(void)
> > > -{
> > > -	#define NFSD_DRC_SIZE_SHIFT	7
> > > -	nfsd_drc_max_mem = (nr_free_buffer_pages()
> > > -					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> > > -	nfsd_drc_mem_used = 0;
> > > -	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
> > > -}
> > > -
> > >  static int nfsd_get_default_max_blksize(void)
> > >  {
> > >  	struct sysinfo i;
> > > @@ -678,7 +647,6 @@ int nfsd_create_serv(struct net *net)
> > >  	nn->nfsd_serv = serv;
> > >  	spin_unlock(&nfsd_notifier_lock);
> > >  
> > > -	set_max_drc();
> > >  	/* check if the notifier is already set */
> > >  	if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
> > >  		register_inetaddr_notifier(&nfsd_inetaddr_notifier);
> > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > > index 35b3564c065f..c052e9eea81b 100644
> > > --- a/fs/nfsd/state.h
> > > +++ b/fs/nfsd/state.h
> > > @@ -310,7 +310,7 @@ struct nfsd4_session {
> > >  	struct list_head	se_conns;
> > >  	u32			se_cb_prog;
> > >  	u32			se_cb_seq_nr;
> > > -	struct nfsd4_slot	*se_slots[];	/* forward channel slots */
> > > +	struct nfsd4_slot	*se_slots[NFSD_MAX_SLOTS_PER_SESSION];	/* forward channel slots */
> > >  };
> > >  
> > >  /* formatted contents of nfs4_sessionid */
> > 
> > I like this patch overall. One thing we could consider is allocating
> > slots aggressively with GFP_KERNEL up to a particular threshold, and
> > then switch to GFP_NOWAIT, but this seems like a reasonable place to
> > start.
> 
> At this point (when creating the session) it is __GFP_NORETRY , not
> GET_NOWAIT.  So it makes a decent effort but does give up earlier than
> GFP_KERNEL. 
> 
> What threshold should we use?  "1" seemed good to me as 1 is essential
> but more is just for improved performance: wait for the client to
> actually use them.
> 
> I'd rather allocate JIT than ASAP.
> 

I think so too mostly. I'm just wondering whether the floor for the
slot table size ought to be bigger than 1. If memory is really tight
then it could really slow things down. Maybe that's a good thing though
since it'll keep nfsd from bogging down the system further. I think
we'll just have to experiment.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
  2024-11-13  5:38 ` [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC NeilBrown
@ 2024-11-13  5:38 ` NeilBrown
  2024-11-13  5:38 ` [PATCH 3/4] nfsd: free unused session-DRC slots NeilBrown
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2024-11-13  5:38 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

If a client ever uses the highest available slot for a given session,
attempt to allocate another slot so there is room for the client to use
more slots if wanted.  GFP_NOWAIT is used so if there is not plenty of
free memory, failure is expected - which is what we want.  It also
allows the allocation while holding a spinlock.

We would expect to stablise with one more slot available than the client
actually uses.

Now that we grow the slot table on demand we can start with a smaller
allocation.  Define NFSD_MAX_UNUSED_SLOTS and allocate at most that many
when session is created.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 22 ++++++++++++++++------
 fs/nfsd/state.h     |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2dcba0c83c10..683bb908039b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1953,7 +1953,7 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 	if (!new->se_slots[0])
 		goto out_free;
 
-	for (i = 1; i < numslots; i++) {
+	for (i = 1; i < numslots && i < NFSD_MAX_UNUSED_SLOTS; i++) {
 		new->se_slots[i] = kzalloc(slotsize, GFP_KERNEL | __GFP_NORETRY);
 		if (!new->se_slots[i])
 			break;
@@ -4187,11 +4187,6 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	slot = session->se_slots[seq->slotid];
 	dprintk("%s: slotid %d\n", __func__, seq->slotid);
 
-	/* We do not negotiate the number of slots yet, so set the
-	 * maxslots to the session maxreqs which is used to encode
-	 * sr_highest_slotid and the sr_target_slot id to maxslots */
-	seq->maxslots = session->se_fchannel.maxreqs;
-
 	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
 	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
 					slot->sl_flags & NFSD4_SLOT_INUSE);
@@ -4241,6 +4236,21 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	cstate->session = session;
 	cstate->clp = clp;
 
+	/*
+	 * If the client ever uses the highest available slot,
+	 * gently try to allocate another one.
+	 */
+	if (seq->slotid == session->se_fchannel.maxreqs - 1 &&
+	    session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) {
+		int s = session->se_fchannel.maxreqs;
+
+		session->se_slots[s] = kzalloc(slot_bytes(&session->se_fchannel),
+					       GFP_NOWAIT);
+		if (session->se_slots[s])
+			session->se_fchannel.maxreqs += 1;
+	}
+	seq->maxslots = session->se_fchannel.maxreqs;
+
 out:
 	switch (clp->cl_cb_state) {
 	case NFSD4_CB_DOWN:
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index c052e9eea81b..012b68a0bafa 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -215,6 +215,8 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
 
 /* Maximum number of slots per session. 160 is useful for long haul TCP */
 #define NFSD_MAX_SLOTS_PER_SESSION     160
+/* Maximum number of slots per session that are allocated by unused */
+#define NFSD_MAX_UNUSED_SLOTS		6
 /* Maximum  session per slot cache size */
 #define NFSD_SLOT_CACHE_SIZE		2048
 /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] nfsd: free unused session-DRC slots
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
  2024-11-13  5:38 ` [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC NeilBrown
  2024-11-13  5:38 ` [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand NeilBrown
@ 2024-11-13  5:38 ` NeilBrown
  2024-11-13 16:52   ` Jeff Layton
  2024-11-13  5:38 ` [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated per session NeilBrown
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-11-13  5:38 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

When the client confirms that some currently allocated DRC slots are not
used, free some.  We currently keep 6 (NFSD_MAX_UNUSED_SLOTS) unused
slots to support bursts of client activity.  This could be tuned with a
shrinker.

When we free a slot we store the seqid in the slot pointer so that it can
be restored when we reactivate the slot.  The RFC requires the seqid for
each slot to increment on each request and does not permit it ever to be
reset.

We decoding sa_highest_slotid into maxslots we need to add 1 - this
matches how it is encoded for the reply.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 25 +++++++++++++++++++------
 fs/nfsd/nfs4xdr.c   |  3 ++-
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 683bb908039b..15de62416243 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1910,14 +1910,21 @@ gen_sessionid(struct nfsd4_session *ses)
 #define NFSD_MIN_HDR_SEQ_SZ  (24 + 12 + 44)
 
 static void
-free_session_slots(struct nfsd4_session *ses)
+free_session_slots(struct nfsd4_session *ses, int from)
 {
 	int i;
 
-	for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
+	if (from >= ses->se_fchannel.maxreqs)
+		return;
+
+	for (i = from; i < ses->se_fchannel.maxreqs; i++) {
+		uintptr_t seqid = ses->se_slots[i]->sl_seqid;
 		free_svc_cred(&ses->se_slots[i]->sl_cred);
 		kfree(ses->se_slots[i]);
+		/* Save the seqid in case we reactivate this slot */
+		ses->se_slots[i] = (void *)seqid;
 	}
+	ses->se_fchannel.maxreqs = from;
 }
 
 /*
@@ -2069,7 +2076,7 @@ static void nfsd4_del_conns(struct nfsd4_session *s)
 
 static void __free_session(struct nfsd4_session *ses)
 {
-	free_session_slots(ses);
+	free_session_slots(ses, 0);
 	kfree(ses);
 }
 
@@ -4214,6 +4221,9 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		goto out_put_session;
 
+	/* If there are lots of unused slots, free some */
+	free_session_slots(session, seq->maxslots + NFSD_MAX_UNUSED_SLOTS);
+
 	buflen = (seq->cachethis) ?
 			session->se_fchannel.maxresp_cached :
 			session->se_fchannel.maxresp_sz;
@@ -4244,10 +4254,13 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	    session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) {
 		int s = session->se_fchannel.maxreqs;
 
-		session->se_slots[s] = kzalloc(slot_bytes(&session->se_fchannel),
-					       GFP_NOWAIT);
-		if (session->se_slots[s])
+		struct nfsd4_slot *slot = kzalloc(slot_bytes(&session->se_fchannel),
+						  GFP_NOWAIT);
+		if (slot) {
+			slot->sl_seqid = (uintptr_t)session->se_slots[s];
+			session->se_slots[s] = slot;
 			session->se_fchannel.maxreqs += 1;
+		}
 	}
 	seq->maxslots = session->se_fchannel.maxreqs;
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f118921250c3..846ed52fdaf5 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1884,7 +1884,8 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp,
 		return nfserr_bad_xdr;
 	seq->seqid = be32_to_cpup(p++);
 	seq->slotid = be32_to_cpup(p++);
-	seq->maxslots = be32_to_cpup(p++);
+	/* sa_highest_slotid counts from 0 but maxslots  counts from 1 ... */
+	seq->maxslots = be32_to_cpup(p++) + 1;
 	seq->cachethis = be32_to_cpup(p);
 
 	seq->status_flags = 0;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] nfsd: free unused session-DRC slots
  2024-11-13  5:38 ` [PATCH 3/4] nfsd: free unused session-DRC slots NeilBrown
@ 2024-11-13 16:52   ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2024-11-13 16:52 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-11-13 at 16:38 +1100, NeilBrown wrote:
> When the client confirms that some currently allocated DRC slots are not
> used, free some.  We currently keep 6 (NFSD_MAX_UNUSED_SLOTS) unused
> slots to support bursts of client activity.  This could be tuned with a
> shrinker.
> 
> When we free a slot we store the seqid in the slot pointer so that it can
> be restored when we reactivate the slot.  The RFC requires the seqid for
> each slot to increment on each request and does not permit it ever to be
> reset.
> 

I think we need to get some clarification on the above. 

IMO, once you free a slot, it's gone and its seqid should also be
forgotten. A reactivated slot, IOW, is effectively "new" and its seqid
should start again at 1. I don't think the spec intended to require
that both ends to remember seqids for defunct slots.

> We decoding sa_highest_slotid into maxslots we need to add 1 - this
> matches how it is encoded for the reply.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 25 +++++++++++++++++++------
>  fs/nfsd/nfs4xdr.c   |  3 ++-
>  2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 683bb908039b..15de62416243 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1910,14 +1910,21 @@ gen_sessionid(struct nfsd4_session *ses)
>  #define NFSD_MIN_HDR_SEQ_SZ  (24 + 12 + 44)
>  
>  static void
> -free_session_slots(struct nfsd4_session *ses)
> +free_session_slots(struct nfsd4_session *ses, int from)
>  {
>  	int i;
>  
> -	for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
> +	if (from >= ses->se_fchannel.maxreqs)
> +		return;
> +
> +	for (i = from; i < ses->se_fchannel.maxreqs; i++) {
> +		uintptr_t seqid = ses->se_slots[i]->sl_seqid;
>  		free_svc_cred(&ses->se_slots[i]->sl_cred);
>  		kfree(ses->se_slots[i]);
> +		/* Save the seqid in case we reactivate this slot */
> +		ses->se_slots[i] = (void *)seqid;
>  	}
> +	ses->se_fchannel.maxreqs = from;
>  }
>  
>  /*
> @@ -2069,7 +2076,7 @@ static void nfsd4_del_conns(struct nfsd4_session *s)
>  
>  static void __free_session(struct nfsd4_session *ses)
>  {
> -	free_session_slots(ses);
> +	free_session_slots(ses, 0);
>  	kfree(ses);
>  }
>  
> @@ -4214,6 +4221,9 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (status)
>  		goto out_put_session;
>  
> +	/* If there are lots of unused slots, free some */
> +	free_session_slots(session, seq->maxslots + NFSD_MAX_UNUSED_SLOTS);
> +
>  	buflen = (seq->cachethis) ?
>  			session->se_fchannel.maxresp_cached :
>  			session->se_fchannel.maxresp_sz;
> @@ -4244,10 +4254,13 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	    session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) {
>  		int s = session->se_fchannel.maxreqs;
>  
> -		session->se_slots[s] = kzalloc(slot_bytes(&session->se_fchannel),
> -					       GFP_NOWAIT);
> -		if (session->se_slots[s])
> +		struct nfsd4_slot *slot = kzalloc(slot_bytes(&session->se_fchannel),
> +						  GFP_NOWAIT);
> +		if (slot) {
> +			slot->sl_seqid = (uintptr_t)session->se_slots[s];
> +			session->se_slots[s] = slot;
>  			session->se_fchannel.maxreqs += 1;
> +		}
>  	}
>  	seq->maxslots = session->se_fchannel.maxreqs;
>  
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index f118921250c3..846ed52fdaf5 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1884,7 +1884,8 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp,
>  		return nfserr_bad_xdr;
>  	seq->seqid = be32_to_cpup(p++);
>  	seq->slotid = be32_to_cpup(p++);
> -	seq->maxslots = be32_to_cpup(p++);
> +	/* sa_highest_slotid counts from 0 but maxslots  counts from 1 ... */
> +	seq->maxslots = be32_to_cpup(p++) + 1;
>  	seq->cachethis = be32_to_cpup(p);
>  
>  	seq->status_flags = 0;

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated per session
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
                   ` (2 preceding siblings ...)
  2024-11-13  5:38 ` [PATCH 3/4] nfsd: free unused session-DRC slots NeilBrown
@ 2024-11-13  5:38 ` NeilBrown
  2024-11-13 11:23 ` [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand Daire Byrne
  2024-11-13 14:48 ` Chuck Lever III
  5 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2024-11-13  5:38 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

Add a shrinker which frees unused slots and may ask the clients to use
fewer slots on each session.

Each session now tracks se_client_maxreqs which is the most recent
max-requests-in-use reported by the client, and se_target_maxreqs which
is a target number of requests which is reduced by the shrinker.

The shrinker iterates over all sessions on all client in all
net-namespaces and reduces the target by 1 for each.  The shrinker may
get called multiple times to reduce by more than 1 each.

If se_target_maxreqs is above se_client_maxreqs, those slots can be
freed immediately.  If not the client will be ask to reduce its usage
and as the usage goes down slots will be freed.

Once the usage has dropped to match the target, the target can be
increased if the client uses all available slots and if a GFP_NOWAIT
allocation succeeds.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 82 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/nfs4xdr.c   |  2 +-
 fs/nfsd/state.h     |  3 ++
 fs/nfsd/xdr4.h      |  2 --
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 15de62416243..bbc365002885 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1925,6 +1925,8 @@ free_session_slots(struct nfsd4_session *ses, int from)
 		ses->se_slots[i] = (void *)seqid;
 	}
 	ses->se_fchannel.maxreqs = from;
+	if (ses->se_target_maxreqs > from)
+		ses->se_target_maxreqs = from;
 }
 
 /*
@@ -1968,6 +1970,7 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 	fattrs->maxreqs = i;
 	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
 	memcpy(&new->se_bchannel, battrs, sizeof(struct nfsd4_channel_attrs));
+	new->se_target_maxreqs = i;
 
 	return new;
 out_free:
@@ -2086,6 +2089,57 @@ static void free_session(struct nfsd4_session *ses)
 	__free_session(ses);
 }
 
+
+static DEFINE_SPINLOCK(nfsd_session_list_lock);
+static LIST_HEAD(nfsd_session_list);
+
+static unsigned long
+nfsd_slot_count(struct shrinker *s, struct shrink_control *sc)
+{
+	struct nfsd4_session *ses;
+	unsigned long cnt = 0;
+
+	spin_lock(&nfsd_session_list_lock);
+	list_for_each_entry(ses, &nfsd_session_list, se_all_sessions)
+		if (ses->se_target_maxreqs > 1)
+			cnt += ses->se_target_maxreqs - 1;
+	spin_unlock(&nfsd_session_list_lock);
+	return cnt ? cnt : SHRINK_EMPTY;
+}
+
+static unsigned long
+nfsd_slot_scan(struct shrinker *s, struct shrink_control *sc)
+{
+	struct nfsd4_session *ses;
+	unsigned long scanned = 0;
+	unsigned long freed = 0;
+
+	spin_lock(&nfsd_session_list_lock);
+	list_for_each_entry(ses, &nfsd_session_list, se_all_sessions) {
+		struct nfsd_net *nn = net_generic(ses->se_client->net,
+						  nfsd_net_id);
+
+		spin_lock(&nn->client_lock);
+		if (ses->se_fchannel.maxreqs > 1 &&
+		    ses->se_target_maxreqs > 1) {
+			freed += 1;
+			ses->se_target_maxreqs -= 1;
+			free_session_slots(ses, max(ses->se_target_maxreqs,
+						    ses->se_client_maxreqs));
+		}
+		spin_unlock(&nn->client_lock);
+		scanned += 1;
+		if (scanned >= sc->nr_to_scan) {
+			/* Move starting point for next scan */
+			list_move(&nfsd_session_list, &ses->se_all_sessions);
+			break;
+		}
+	}
+	spin_unlock(&nfsd_session_list_lock);
+	sc->nr_scanned = scanned;
+	return freed;
+}
+
 static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, struct nfs4_client *clp, struct nfsd4_create_session *cses)
 {
 	int idx;
@@ -2107,6 +2161,10 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
 	list_add(&new->se_perclnt, &clp->cl_sessions);
 	spin_unlock(&clp->cl_lock);
 
+	spin_lock(&nfsd_session_list_lock);
+	list_add_tail(&new->se_all_sessions, &nfsd_session_list);
+	spin_unlock(&nfsd_session_list_lock);
+
 	{
 		struct sockaddr *sa = svc_addr(rqstp);
 		/*
@@ -2176,6 +2234,9 @@ unhash_session(struct nfsd4_session *ses)
 	spin_lock(&ses->se_client->cl_lock);
 	list_del(&ses->se_perclnt);
 	spin_unlock(&ses->se_client->cl_lock);
+	spin_lock(&nfsd_session_list_lock);
+	list_del(&ses->se_all_sessions);
+	spin_unlock(&nfsd_session_list_lock);
 }
 
 /* SETCLIENTID and SETCLIENTID_CONFIRM Helper functions */
@@ -4221,8 +4282,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		goto out_put_session;
 
-	/* If there are lots of unused slots, free some */
+	/* we can safely free some slots */
 	free_session_slots(session, seq->maxslots + NFSD_MAX_UNUSED_SLOTS);
+	free_session_slots(session, max(seq->maxslots,
+					session->se_target_maxreqs));
+	session->se_client_maxreqs = seq->maxslots;
 
 	buflen = (seq->cachethis) ?
 			session->se_fchannel.maxresp_cached :
@@ -4251,6 +4315,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	 * gently try to allocate another one.
 	 */
 	if (seq->slotid == session->se_fchannel.maxreqs - 1 &&
+	    session->se_target_maxreqs >= session->se_fchannel.maxreqs &&
 	    session->se_fchannel.maxreqs < NFSD_MAX_SLOTS_PER_SESSION) {
 		int s = session->se_fchannel.maxreqs;
 
@@ -4260,9 +4325,11 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			slot->sl_seqid = (uintptr_t)session->se_slots[s];
 			session->se_slots[s] = slot;
 			session->se_fchannel.maxreqs += 1;
+			session->se_target_maxreqs = session->se_fchannel.maxreqs;
 		}
 	}
 	seq->maxslots = session->se_fchannel.maxreqs;
+	seq->target_maxslots = session->se_target_maxreqs;
 
 out:
 	switch (clp->cl_cb_state) {
@@ -8653,6 +8720,8 @@ nfs4_state_start_net(struct net *net)
 
 /* initialization to perform when the nfsd service is started: */
 
+static struct shrinker *nfsd_slot_shrinker;
+
 int
 nfs4_state_start(void)
 {
@@ -8662,6 +8731,16 @@ nfs4_state_start(void)
 	if (ret)
 		return ret;
 
+	nfsd_slot_shrinker = shrinker_alloc(0, "nfsd-DRC-slot");
+	if (!nfsd_slot_shrinker) {
+		rhltable_destroy(&nfs4_file_rhltable);
+		return -ENOMEM;
+	}
+	nfsd_slot_shrinker->count_objects = nfsd_slot_count;
+	nfsd_slot_shrinker->scan_objects = nfsd_slot_scan;
+	nfsd_slot_shrinker->seeks = 1;
+	shrinker_register(nfsd_slot_shrinker);
+
 	set_max_delegations();
 	return 0;
 }
@@ -8703,6 +8782,7 @@ void
 nfs4_state_shutdown(void)
 {
 	rhltable_destroy(&nfs4_file_rhltable);
+	shrinker_free(nfsd_slot_shrinker);
 }
 
 static void
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 846ed52fdaf5..ac3376c2e5cc 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4956,7 +4956,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
 	if (nfserr != nfs_ok)
 		return nfserr;
 	/* sr_target_highest_slotid */
-	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
+	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
 	if (nfserr != nfs_ok)
 		return nfserr;
 	/* sr_status_flags */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 012b68a0bafa..a25f3cfaab09 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -304,12 +304,15 @@ struct nfsd4_session {
 /* See SESSION4_PERSIST, etc. for standard flags; this is internal-only: */
 #define NFS4_SESSION_DEAD	0x010
 	u32			se_flags;
+	struct list_head	se_all_sessions;/* global list of sessions */
 	struct nfs4_client	*se_client;
 	struct nfs4_sessionid	se_sessionid;
 	struct nfsd4_channel_attrs se_fchannel;
 	struct nfsd4_channel_attrs se_bchannel;
 	struct nfsd4_cb_sec	se_cb_sec;
 	struct list_head	se_conns;
+	u8			se_target_maxreqs;
+	u8			se_client_maxreqs;
 	u32			se_cb_prog;
 	u32			se_cb_seq_nr;
 	struct nfsd4_slot	*se_slots[NFSD_MAX_SLOTS_PER_SESSION];	/* forward channel slots */
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 2a21a7662e03..71b87190a4a6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -575,9 +575,7 @@ struct nfsd4_sequence {
 	u32			slotid;			/* request/response */
 	u32			maxslots;		/* request/response */
 	u32			cachethis;		/* request */
-#if 0
 	u32			target_maxslots;	/* response */
-#endif /* not yet */
 	u32			status_flags;		/* response */
 };
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
                   ` (3 preceding siblings ...)
  2024-11-13  5:38 ` [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated per session NeilBrown
@ 2024-11-13 11:23 ` Daire Byrne
  2024-11-15  4:56   ` NeilBrown
  2024-11-13 14:48 ` Chuck Lever III
  5 siblings, 1 reply; 16+ messages in thread
From: Daire Byrne @ 2024-11-13 11:23 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

Neil,

I'm curious if this work relates to:

https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

As my thread described, we currently use NFSv3 for our high latency
NFS re-export cases simply because it performs way better for parallel
client operations. You see, when you use re-exporting serving many
clients, you are in effect taking all those client operations and
stuffing them through a single client (the re-export server) which
then becomes a bottleneck. Add any kind of latency on top (>10ms) and
the NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
bottleneck for a single client (re-export server).

We also used your "VFS: support parallel updates in the one directory"
patches for similar reasons up until I couldn't port it to newer
kernels anymore (my kernel code munging skills are not sufficient!).

Sorry to spam the thread if I am misinterpreting what this patch set
is all about.

Daire


On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
>
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
>
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly agreesive, and with lots of printks, and it seems to do the right
> thing, though memory pressure has never freed anything - I think you
> need several clients with a non-trivial number of slots allocated before
> the thresholds in the shrinker code will trigger any freeing.
>
> I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> useful that is.  There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common.  You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still in
> use.
>
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.  It
> is possible that this isn't quite agreesive enough.  When performing a
> lot of writeback it can be useful to have lots of slots, but memory
> pressure is also likely to build up on the server so GFP_NOWAIT is likely
> to fail.  Maybe occasionally using a firmer request (outside the
> spinlock) would be justified.
>
> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ...  why not).  Possible a hysteresis should be
> added so we don't free unused slots for a least N seconds.
>
> When the shrinker wants to apply presure we remove slots equally from
> all sessions.  Maybe there should be some proportionality but that would
> be more complex and I'm not sure it would gain much.  Slot 0 can never
> be freed of course.
>
> I'm very interested to see what people think of the over-all approach,
> and of the specifics of the code.
>
> Thanks,
> NeilBrown
>
>
>  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
>  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
>  [PATCH 3/4] nfsd: free unused session-DRC slots
>  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-13 11:23 ` [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand Daire Byrne
@ 2024-11-15  4:56   ` NeilBrown
  2024-11-15 14:18     ` Chuck Lever
  2024-11-18 15:43     ` Daire Byrne
  0 siblings, 2 replies; 16+ messages in thread
From: NeilBrown @ 2024-11-15  4:56 UTC (permalink / raw)
  To: Daire Byrne
  Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

On Wed, 13 Nov 2024, Daire Byrne wrote:
> Neil,
> 
> I'm curious if this work relates to:
> 
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

Yes, it could possibly help with that - though more work would be
needed.
nfsd currently has a hard limit of 160 slots per session.  That wouldn't
be enough I suspect.  The Linux client has a hard limit of 1024.  That
might be enough.
Allowed nfsd to have 1024 (or more) shouldn't be too hard...

> 
> As my thread described, we currently use NFSv3 for our high latency
> NFS re-export cases simply because it performs way better for parallel
> client operations. You see, when you use re-exporting serving many
> clients, you are in effect taking all those client operations and
> stuffing them through a single client (the re-export server) which
> then becomes a bottleneck. Add any kind of latency on top (>10ms) and
> the NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
> bottleneck for a single client (re-export server).
> 
> We also used your "VFS: support parallel updates in the one directory"
> patches for similar reasons up until I couldn't port it to newer
> kernels anymore (my kernel code munging skills are not sufficient!).

Yeah - I really should get back to that.  Al and Linus suggested some
changes and I just never got around to making them.

Thanks,
NeilBrown


> 
> Sorry to spam the thread if I am misinterpreting what this patch set
> is all about.
> 
> Daire
> 
> 
> On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
> >
> > This patch set aims to allocate session-based DRC slots on demand, and
> > free them when not in use, or when memory is tight.
> >
> > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > overly agreesive, and with lots of printks, and it seems to do the right
> > thing, though memory pressure has never freed anything - I think you
> > need several clients with a non-trivial number of slots allocated before
> > the thresholds in the shrinker code will trigger any freeing.
> >
> > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > useful that is.  There are certainly cases where simply setting the
> > target in a SEQUENCE reply might not be enough, but I doubt they are
> > very common.  You would need a session to be completely idle, with the
> > last request received on it indicating that lots of slots were still in
> > use.
> >
> > Currently we allocate slots one at a time when the last available slot
> > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > is possible that this isn't quite agreesive enough.  When performing a
> > lot of writeback it can be useful to have lots of slots, but memory
> > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > to fail.  Maybe occasionally using a firmer request (outside the
> > spinlock) would be justified.
> >
> > We free slots when the number of unused slots passes some threshold -
> > currently 6 (because ...  why not).  Possible a hysteresis should be
> > added so we don't free unused slots for a least N seconds.
> >
> > When the shrinker wants to apply presure we remove slots equally from
> > all sessions.  Maybe there should be some proportionality but that would
> > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > be freed of course.
> >
> > I'm very interested to see what people think of the over-all approach,
> > and of the specifics of the code.
> >
> > Thanks,
> > NeilBrown
> >
> >
> >  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> >  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> >  [PATCH 3/4] nfsd: free unused session-DRC slots
> >  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> >
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-15  4:56   ` NeilBrown
@ 2024-11-15 14:18     ` Chuck Lever
  2024-11-18 15:43     ` Daire Byrne
  1 sibling, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2024-11-15 14:18 UTC (permalink / raw)
  To: NeilBrown
  Cc: Daire Byrne, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

On Fri, Nov 15, 2024 at 03:56:08PM +1100, NeilBrown wrote:
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> > 
> > I'm curious if this work relates to:
> > 
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
> 
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session.  That wouldn't
> be enough I suspect.  The Linux client has a hard limit of 1024.  That
> might be enough.
> Allowed nfsd to have 1024 (or more) shouldn't be too hard...

1024 could be in the range where having a shrinker (when there are
multiple clients with that many slots) will start to bring some
value.

Maybe 1024 is too large at the beginning of a session, but enabling
NFSD to grow a session to that size would not be a bad thing.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-15  4:56   ` NeilBrown
  2024-11-15 14:18     ` Chuck Lever
@ 2024-11-18 15:43     ` Daire Byrne
  1 sibling, 0 replies; 16+ messages in thread
From: Daire Byrne @ 2024-11-18 15:43 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

On Fri, 15 Nov 2024 at 04:56, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> >
> > I'm curious if this work relates to:
> >
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
>
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session.  That wouldn't
> be enough I suspect.  The Linux client has a hard limit of 1024.  That
> might be enough.
> Allowed nfsd to have 1024 (or more) shouldn't be too hard...

That would be cool - I'd love to be able to switch out NFSv3 for NFSv4
for our use cases. Although saying that, any changes to nfsd to
support that would likely take many years to make it into the RHEL
based storage servers we currently use.

Our re-export case is pretty unique and niche in the sense that a
single client is essentially trying to do the work of many clients.

> > We also used your "VFS: support parallel updates in the one directory"
> > patches for similar reasons up until I couldn't port it to newer
> > kernels anymore (my kernel code munging skills are not sufficient!).
>
> Yeah - I really should get back to that.  Al and Linus suggested some
> changes and I just never got around to making them.

That would also be awesome. Again, our specific niche use case (many
clients writing to the same directory via a re-export) is probably the
main beneficiary, but maybe it helps with other (more common)
workloads too.

Cheers,

Daire

> Thanks,
> NeilBrown
>
>
> >
> > Sorry to spam the thread if I am misinterpreting what this patch set
> > is all about.
> >
> > Daire
> >
> >
> > On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
> > >
> > > This patch set aims to allocate session-based DRC slots on demand, and
> > > free them when not in use, or when memory is tight.
> > >
> > > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > > overly agreesive, and with lots of printks, and it seems to do the right
> > > thing, though memory pressure has never freed anything - I think you
> > > need several clients with a non-trivial number of slots allocated before
> > > the thresholds in the shrinker code will trigger any freeing.
> > >
> > > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > > useful that is.  There are certainly cases where simply setting the
> > > target in a SEQUENCE reply might not be enough, but I doubt they are
> > > very common.  You would need a session to be completely idle, with the
> > > last request received on it indicating that lots of slots were still in
> > > use.
> > >
> > > Currently we allocate slots one at a time when the last available slot
> > > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > > is possible that this isn't quite agreesive enough.  When performing a
> > > lot of writeback it can be useful to have lots of slots, but memory
> > > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > > to fail.  Maybe occasionally using a firmer request (outside the
> > > spinlock) would be justified.
> > >
> > > We free slots when the number of unused slots passes some threshold -
> > > currently 6 (because ...  why not).  Possible a hysteresis should be
> > > added so we don't free unused slots for a least N seconds.
> > >
> > > When the shrinker wants to apply presure we remove slots equally from
> > > all sessions.  Maybe there should be some proportionality but that would
> > > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > > be freed of course.
> > >
> > > I'm very interested to see what people think of the over-all approach,
> > > and of the specifics of the code.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > >  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> > >  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> > >  [PATCH 3/4] nfsd: free unused session-DRC slots
> > >  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> > >
> >
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
                   ` (4 preceding siblings ...)
  2024-11-13 11:23 ` [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand Daire Byrne
@ 2024-11-13 14:48 ` Chuck Lever III
  2024-11-15  9:03   ` NeilBrown
  5 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever III @ 2024-11-13 14:48 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jeff Layton, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo,
	Tom Talpey



> On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> 
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
> 
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly agreesive, and with lots of printks, and it seems to do the right
> thing, though memory pressure has never freed anything - I think you
> need several clients with a non-trivial number of slots allocated before
> the thresholds in the shrinker code will trigger any freeing.

Can you describe your test set-up? Generally a system
with less than 4GB of memory can trigger shrinkers
pretty easily.

If we never see the mechanism being triggered due to
memory exhaustion, then I wonder if the additional
complexity is adding substantial value.


> I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> useful that is.  There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common.  You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still in
> use.
> 
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.  It
> is possible that this isn't quite agreesive enough.  When performing a
> lot of writeback it can be useful to have lots of slots, but memory
> pressure is also likely to build up on the server so GFP_NOWAIT is likely
> to fail.  Maybe occasionally using a firmer request (outside the
> spinlock) would be justified.

I'm wondering why GFP_NOWAIT is used here, and I admit
I'm not strongly familiar with the code or mechanism.
Why not always use GFP_KERNEL ?


> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ...  why not).  Possible a hysteresis should be
> added so we don't free unused slots for a least N seconds.

Generally freeing unused resources is un-Linux like. :-)
Can you provide a rationale for why this is needed?


> When the shrinker wants to apply presure we remove slots equally from
> all sessions.  Maybe there should be some proportionality but that would
> be more complex and I'm not sure it would gain much.  Slot 0 can never
> be freed of course.
> 
> I'm very interested to see what people think of the over-all approach,
> and of the specifics of the code.
> 
> Thanks,
> NeilBrown
> 
> 
> [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> [PATCH 3/4] nfsd: free unused session-DRC slots
> [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> 

--
Chuck Lever



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-13 14:48 ` Chuck Lever III
@ 2024-11-15  9:03   ` NeilBrown
  2024-11-15 14:27     ` Chuck Lever
  0 siblings, 1 reply; 16+ messages in thread
From: NeilBrown @ 2024-11-15  9:03 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

On Thu, 14 Nov 2024, Chuck Lever III wrote:
> 
> 
> > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > This patch set aims to allocate session-based DRC slots on demand, and
> > free them when not in use, or when memory is tight.
> > 
> > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > overly agreesive, and with lots of printks, and it seems to do the right
> > thing, though memory pressure has never freed anything - I think you
> > need several clients with a non-trivial number of slots allocated before
> > the thresholds in the shrinker code will trigger any freeing.
> 
> Can you describe your test set-up? Generally a system
> with less than 4GB of memory can trigger shrinkers
> pretty easily.
> 
> If we never see the mechanism being triggered due to
> memory exhaustion, then I wonder if the additional
> complexity is adding substantial value.

Just a single VM with 1G RAM.  Only one client so only one session.
The default batch count for shrinkers is 64 and the reported count of
freeable items is normally scaled down a lot until memory gets really
tight.  So if I only have 6 slots that could be freed the shrinker isn't
going to notice.
I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
things.  This allowed me to see some bugs.

One that I haven't resolved yet is the need to wait to get confirmation
from the client before rejecting requests with larger numbered slots.

> 
> 
> > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > useful that is.  There are certainly cases where simply setting the
> > target in a SEQUENCE reply might not be enough, but I doubt they are
> > very common.  You would need a session to be completely idle, with the
> > last request received on it indicating that lots of slots were still in
> > use.
> > 
> > Currently we allocate slots one at a time when the last available slot
> > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > is possible that this isn't quite agreesive enough.  When performing a
> > lot of writeback it can be useful to have lots of slots, but memory
> > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > to fail.  Maybe occasionally using a firmer request (outside the
> > spinlock) would be justified.
> 
> I'm wondering why GFP_NOWAIT is used here, and I admit
> I'm not strongly familiar with the code or mechanism.
> Why not always use GFP_KERNEL ?

Partly because the kmalloc call is under a spinlock, so we cannot wait. 
But that could be changed with a bit of work.

GFP_KERNEL can block indefinitely, and we don't actually need the
allocation to succeed to satisfy the current request, so it seems wrong
to block at all when we don't need to.

I'm hoping that GFP_NOWAIT will succeed often enough that the slot table
will grow when there is demand - maybe not instantly but not too slowly.
If GFP_NOWAIT doesn't succeed, then reclaim will be happening and the
shrinker will probably ask us to return some slots soon - maybe it isn't
worth trying hard to allocate something we will have to return soon.

> 
> 
> > We free slots when the number of unused slots passes some threshold -
> > currently 6 (because ...  why not).  Possible a hysteresis should be
> > added so we don't free unused slots for a least N seconds.
> 
> Generally freeing unused resources is un-Linux like. :-)
> Can you provide a rationale for why this is needed?

Uhm...  No.  I added it so that patch which adds slot retirement could do
something useful before the shrinker was added, and when I added the
shrinker I couldn't bring myself to remove it.  Probably I should.


Thanks for your thoughtful review.

NeilBrown

> 
> 
> > When the shrinker wants to apply presure we remove slots equally from
> > all sessions.  Maybe there should be some proportionality but that would
> > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > be freed of course.
> > 
> > I'm very interested to see what people think of the over-all approach,
> > and of the specifics of the code.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> > [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> > [PATCH 3/4] nfsd: free unused session-DRC slots
> > [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> > 
> 
> --
> Chuck Lever
> 
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand
  2024-11-15  9:03   ` NeilBrown
@ 2024-11-15 14:27     ` Chuck Lever
  0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2024-11-15 14:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo,
	Tom Talpey

On Fri, Nov 15, 2024 at 08:03:00PM +1100, NeilBrown wrote:
> On Thu, 14 Nov 2024, Chuck Lever III wrote:
> > 
> > 
> > > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > This patch set aims to allocate session-based DRC slots on demand, and
> > > free them when not in use, or when memory is tight.
> > > 
> > > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > > overly agreesive, and with lots of printks, and it seems to do the right
> > > thing, though memory pressure has never freed anything - I think you
> > > need several clients with a non-trivial number of slots allocated before
> > > the thresholds in the shrinker code will trigger any freeing.
> > 
> > Can you describe your test set-up? Generally a system
> > with less than 4GB of memory can trigger shrinkers
> > pretty easily.
> > 
> > If we never see the mechanism being triggered due to
> > memory exhaustion, then I wonder if the additional
> > complexity is adding substantial value.
> 
> Just a single VM with 1G RAM.  Only one client so only one session.

That's a good RAM size for this test, but I think you do need to
have more than a single client to stress the server a little more.


> The default batch count for shrinkers is 64 and the reported count of
> freeable items is normally scaled down a lot until memory gets really
> tight.  So if I only have 6 slots that could be freed the shrinker isn't
> going to notice.
> I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
> things.  This allowed me to see some bugs.
> 
> One that I haven't resolved yet is the need to wait to get confirmation
> from the client before rejecting requests with larger numbered slots.
> 
> > 
> > 
> > > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > > useful that is.  There are certainly cases where simply setting the
> > > target in a SEQUENCE reply might not be enough, but I doubt they are
> > > very common.  You would need a session to be completely idle, with the
> > > last request received on it indicating that lots of slots were still in
> > > use.
> > > 
> > > Currently we allocate slots one at a time when the last available slot
> > > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > > is possible that this isn't quite agreesive enough.  When performing a
> > > lot of writeback it can be useful to have lots of slots, but memory
> > > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > > to fail.  Maybe occasionally using a firmer request (outside the
> > > spinlock) would be justified.
> > 
> > I'm wondering why GFP_NOWAIT is used here, and I admit
> > I'm not strongly familiar with the code or mechanism.
> > Why not always use GFP_KERNEL ?
> 
> Partly because the kmalloc call is under a spinlock, so we cannot wait. 
> But that could be changed with a bit of work.
> 
> GFP_KERNEL can block indefinitely, and we don't actually need the
> allocation to succeed to satisfy the current request, so it seems wrong
> to block at all when we don't need to.

Ah, that is sensible. A comment might be added that explains that
the server response to the client's "request" to increase the slot
table size does not have to be immediate, and in fact, the failure
is a sign the server is under memory pressure and shouldn't grow the
slot table size anyway.


> I'm hoping that GFP_NOWAIT will succeed often enough that the slot table
> will grow when there is demand - maybe not instantly but not too slowly.

I don't think it will ever fail if the requested allocation is below
4KB, so this might be no more than a fussy detail. It does indeed
make sense to me, though.


> If GFP_NOWAIT doesn't succeed, then reclaim will be happening and the
> shrinker will probably ask us to return some slots soon - maybe it isn't
> worth trying hard to allocate something we will have to return soon.

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-11-18 15:43 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-13  5:38 [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand NeilBrown
2024-11-13  5:38 ` [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC NeilBrown
2024-11-13 16:48   ` Jeff Layton
2024-11-15  4:01     ` NeilBrown
2024-11-15 14:07       ` Jeff Layton
2024-11-13  5:38 ` [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand NeilBrown
2024-11-13  5:38 ` [PATCH 3/4] nfsd: free unused session-DRC slots NeilBrown
2024-11-13 16:52   ` Jeff Layton
2024-11-13  5:38 ` [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated per session NeilBrown
2024-11-13 11:23 ` [PATCH 0/4 RFC] nfsd: allocate/free session-based DRC slots on demand Daire Byrne
2024-11-15  4:56   ` NeilBrown
2024-11-15 14:18     ` Chuck Lever
2024-11-18 15:43     ` Daire Byrne
2024-11-13 14:48 ` Chuck Lever III
2024-11-15  9:03   ` NeilBrown
2024-11-15 14:27     ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox