* [PATCH v2 0/2] NFSD: memory shrinker for NFSv4 clients
@ 2022-08-29 0:47 Dai Ngo
2022-08-29 0:47 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo
2022-08-29 0:47 ` [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition Dai Ngo
0 siblings, 2 replies; 11+ messages in thread
From: Dai Ngo @ 2022-08-29 0:47 UTC (permalink / raw)
To: chuck.lever; +Cc: linux-nfs
This patch series implements a memory shrinker for NFSv4 clients
to react to system low-memory conditions.
The memory shrinker's count callback is used to trigger the laundromat.
The actual work of destroying the expired clients is done by the
laundromat itself. We cannot destroy the expired clients in the
memory shrinker's scan callback context, to avoid possible deadlock.
Destroying the expired clients also releases all states associated
with those clients.
v2:
. fix kernel test robot errors in nfsd.h when CONFIG_NFSD_V4 is not defined.
---
Dai Ngo (2):
NFSD: keep track of the number of courtesy clients in the system
NFSD: add shrinker to reap courtesy clients on low memory condition
fs/nfsd/netns.h | 5 ++++
fs/nfsd/nfs4state.c | 69 ++++++++++++++++++++++++++++++++++++++++++------
fs/nfsd/nfsctl.c | 6 +++--
fs/nfsd/nfsd.h | 9 +++++--
4 files changed, 77 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system 2022-08-29 0:47 [PATCH v2 0/2] NFSD: memory shrinker for NFSv4 clients Dai Ngo @ 2022-08-29 0:47 ` Dai Ngo 2022-08-29 16:48 ` Jeff Layton 2022-08-29 0:47 ` [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition Dai Ngo 1 sibling, 1 reply; 11+ messages in thread From: Dai Ngo @ 2022-08-29 0:47 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs Add counter nfs4_courtesy_client_count to nfsd_net to keep track of the number of courtesy clients in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 20 +++++++++++++++----- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ffe17743cc74..2695dff1378a 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -192,6 +192,8 @@ struct nfsd_net { atomic_t nfs4_client_count; int nfs4_max_clients; + + atomic_t nfsd_courtesy_client_count; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c5d199d7e6b4..3d8d7ebb5b91 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -169,7 +169,8 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); - clp->cl_state = NFSD4_ACTIVE; + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); return nfs_ok; } @@ -190,7 +191,8 @@ renew_client_locked(struct nfs4_client *clp) list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); - clp->cl_state = NFSD4_ACTIVE; + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); } static void put_client_renew_locked(struct nfs4_client *clp) @@ -2233,6 +2235,8 @@ 
__destroy_client(struct nfs4_client *clp) if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); atomic_add_unless(&nn->nfs4_client_count, -1, 0); + if (clp->cl_state != NFSD4_ACTIVE) + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); free_client(clp); wake_up_all(&expiry_wq); } @@ -4356,6 +4360,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); max_clients *= NFS4_CLIENTS_PER_GB; nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); + + atomic_set(&nn->nfsd_courtesy_client_count, 0); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5864,7 +5870,7 @@ static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) { - unsigned int maxreap, reapcnt = 0; + unsigned int oldstate, maxreap, reapcnt = 0; struct list_head *pos, *next; struct nfs4_client *clp; @@ -5878,8 +5884,12 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, goto exp_client; if (!state_expired(lt, clp->cl_time)) break; - if (!atomic_read(&clp->cl_rpc_users)) - clp->cl_state = NFSD4_COURTESY; + oldstate = NFSD4_ACTIVE; + if (!atomic_read(&clp->cl_rpc_users)) { + oldstate = xchg(&clp->cl_state, NFSD4_COURTESY); + if (oldstate == NFSD4_ACTIVE) + atomic_inc(&nn->nfsd_courtesy_client_count); + } if (!client_has_state(clp)) goto exp_client; if (!nfs4_anylock_blockers(clp)) -- 2.9.5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system 2022-08-29 0:47 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo @ 2022-08-29 16:48 ` Jeff Layton 2022-08-29 18:24 ` dai.ngo 0 siblings, 1 reply; 11+ messages in thread From: Jeff Layton @ 2022-08-29 16:48 UTC (permalink / raw) To: Dai Ngo, chuck.lever; +Cc: linux-nfs On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: > Add counter nfs4_courtesy_client_count to nfsd_net to keep track > of the number of courtesy clients in the system. > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > --- > fs/nfsd/netns.h | 2 ++ > fs/nfsd/nfs4state.c | 20 +++++++++++++++----- > 2 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h > index ffe17743cc74..2695dff1378a 100644 > --- a/fs/nfsd/netns.h > +++ b/fs/nfsd/netns.h > @@ -192,6 +192,8 @@ struct nfsd_net { > > atomic_t nfs4_client_count; > int nfs4_max_clients; > + > + atomic_t nfsd_courtesy_client_count; > }; > > /* Simple check to find out if a given net was properly initialized */ > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index c5d199d7e6b4..3d8d7ebb5b91 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -169,7 +169,8 @@ static __be32 get_client_locked(struct nfs4_client *clp) > if (is_client_expired(clp)) > return nfserr_expired; > atomic_inc(&clp->cl_rpc_users); > - clp->cl_state = NFSD4_ACTIVE; > + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) The xchg calls seem like overkill. The cl_state is protected by the nn->client_lock. Nothing else can race in and change it here. 
> + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > return nfs_ok; > } > > @@ -190,7 +191,8 @@ renew_client_locked(struct nfs4_client *clp) > > list_move_tail(&clp->cl_lru, &nn->client_lru); > clp->cl_time = ktime_get_boottime_seconds(); > - clp->cl_state = NFSD4_ACTIVE; > + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) > + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > } > > static void put_client_renew_locked(struct nfs4_client *clp) > @@ -2233,6 +2235,8 @@ __destroy_client(struct nfs4_client *clp) > if (clp->cl_cb_conn.cb_xprt) > svc_xprt_put(clp->cl_cb_conn.cb_xprt); > atomic_add_unless(&nn->nfs4_client_count, -1, 0); > + if (clp->cl_state != NFSD4_ACTIVE) > + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > free_client(clp); > wake_up_all(&expiry_wq); > } > @@ -4356,6 +4360,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) > max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); > max_clients *= NFS4_CLIENTS_PER_GB; > nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); > + > + atomic_set(&nn->nfsd_courtesy_client_count, 0); > } > > static void init_nfs4_replay(struct nfs4_replay *rp) > @@ -5864,7 +5870,7 @@ static void > nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > struct laundry_time *lt) > { > - unsigned int maxreap, reapcnt = 0; > + unsigned int oldstate, maxreap, reapcnt = 0; > struct list_head *pos, *next; > struct nfs4_client *clp; > > @@ -5878,8 +5884,12 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > goto exp_client; > if (!state_expired(lt, clp->cl_time)) > break; > - if (!atomic_read(&clp->cl_rpc_users)) > - clp->cl_state = NFSD4_COURTESY; > + oldstate = NFSD4_ACTIVE; > + if (!atomic_read(&clp->cl_rpc_users)) { > + oldstate = xchg(&clp->cl_state, NFSD4_COURTESY); > + if (oldstate == NFSD4_ACTIVE) > + atomic_inc(&nn->nfsd_courtesy_client_count); > + } > if (!client_has_state(clp)) > goto exp_client; > if 
(!nfs4_anylock_blockers(clp)) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system 2022-08-29 16:48 ` Jeff Layton @ 2022-08-29 18:24 ` dai.ngo 2022-08-29 18:27 ` Jeff Layton 0 siblings, 1 reply; 11+ messages in thread From: dai.ngo @ 2022-08-29 18:24 UTC (permalink / raw) To: Jeff Layton, chuck.lever; +Cc: linux-nfs On 8/29/22 9:48 AM, Jeff Layton wrote: > On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: >> Add counter nfs4_courtesy_client_count to nfsd_net to keep track >> of the number of courtesy clients in the system. >> >> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >> --- >> fs/nfsd/netns.h | 2 ++ >> fs/nfsd/nfs4state.c | 20 +++++++++++++++----- >> 2 files changed, 17 insertions(+), 5 deletions(-) >> >> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h >> index ffe17743cc74..2695dff1378a 100644 >> --- a/fs/nfsd/netns.h >> +++ b/fs/nfsd/netns.h >> @@ -192,6 +192,8 @@ struct nfsd_net { >> >> atomic_t nfs4_client_count; >> int nfs4_max_clients; >> + >> + atomic_t nfsd_courtesy_client_count; >> }; >> >> /* Simple check to find out if a given net was properly initialized */ >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index c5d199d7e6b4..3d8d7ebb5b91 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -169,7 +169,8 @@ static __be32 get_client_locked(struct nfs4_client *clp) >> if (is_client_expired(clp)) >> return nfserr_expired; >> atomic_inc(&clp->cl_rpc_users); >> - clp->cl_state = NFSD4_ACTIVE; >> + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) > The xchg calls seem like overkill. The cl_state is protected by the > nn->client_lock. Nothing else can race in and change it here. I use the 'xchg' calls for convenience and readability and not for protection in this case. But if you think this is overkill or unnecessary then I remove it. Fix in v2. 
Thanks, -Dai > >> + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); >> return nfs_ok; >> } >> >> @@ -190,7 +191,8 @@ renew_client_locked(struct nfs4_client *clp) >> >> list_move_tail(&clp->cl_lru, &nn->client_lru); >> clp->cl_time = ktime_get_boottime_seconds(); >> - clp->cl_state = NFSD4_ACTIVE; >> + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) >> + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); >> } >> >> static void put_client_renew_locked(struct nfs4_client *clp) >> @@ -2233,6 +2235,8 @@ __destroy_client(struct nfs4_client *clp) >> if (clp->cl_cb_conn.cb_xprt) >> svc_xprt_put(clp->cl_cb_conn.cb_xprt); >> atomic_add_unless(&nn->nfs4_client_count, -1, 0); >> + if (clp->cl_state != NFSD4_ACTIVE) >> + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); >> free_client(clp); >> wake_up_all(&expiry_wq); >> } >> @@ -4356,6 +4360,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) >> max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); >> max_clients *= NFS4_CLIENTS_PER_GB; >> nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); >> + >> + atomic_set(&nn->nfsd_courtesy_client_count, 0); >> } >> >> static void init_nfs4_replay(struct nfs4_replay *rp) >> @@ -5864,7 +5870,7 @@ static void >> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> struct laundry_time *lt) >> { >> - unsigned int maxreap, reapcnt = 0; >> + unsigned int oldstate, maxreap, reapcnt = 0; >> struct list_head *pos, *next; >> struct nfs4_client *clp; >> >> @@ -5878,8 +5884,12 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> goto exp_client; >> if (!state_expired(lt, clp->cl_time)) >> break; >> - if (!atomic_read(&clp->cl_rpc_users)) >> - clp->cl_state = NFSD4_COURTESY; >> + oldstate = NFSD4_ACTIVE; >> + if (!atomic_read(&clp->cl_rpc_users)) { >> + oldstate = xchg(&clp->cl_state, NFSD4_COURTESY); >> + if (oldstate == NFSD4_ACTIVE) >> + 
atomic_inc(&nn->nfsd_courtesy_client_count); >> + } >> if (!client_has_state(clp)) >> goto exp_client; >> if (!nfs4_anylock_blockers(clp)) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system 2022-08-29 18:24 ` dai.ngo @ 2022-08-29 18:27 ` Jeff Layton 0 siblings, 0 replies; 11+ messages in thread From: Jeff Layton @ 2022-08-29 18:27 UTC (permalink / raw) To: dai.ngo, chuck.lever; +Cc: linux-nfs On Mon, 2022-08-29 at 11:24 -0700, dai.ngo@oracle.com wrote: > On 8/29/22 9:48 AM, Jeff Layton wrote: > > On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: > > > Add counter nfs4_courtesy_client_count to nfsd_net to keep track > > > of the number of courtesy clients in the system. > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > > > --- > > > fs/nfsd/netns.h | 2 ++ > > > fs/nfsd/nfs4state.c | 20 +++++++++++++++----- > > > 2 files changed, 17 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h > > > index ffe17743cc74..2695dff1378a 100644 > > > --- a/fs/nfsd/netns.h > > > +++ b/fs/nfsd/netns.h > > > @@ -192,6 +192,8 @@ struct nfsd_net { > > > > > > atomic_t nfs4_client_count; > > > int nfs4_max_clients; > > > + > > > + atomic_t nfsd_courtesy_client_count; > > > }; > > > > > > /* Simple check to find out if a given net was properly initialized */ > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > index c5d199d7e6b4..3d8d7ebb5b91 100644 > > > --- a/fs/nfsd/nfs4state.c > > > +++ b/fs/nfsd/nfs4state.c > > > @@ -169,7 +169,8 @@ static __be32 get_client_locked(struct nfs4_client *clp) > > > if (is_client_expired(clp)) > > > return nfserr_expired; > > > atomic_inc(&clp->cl_rpc_users); > > > - clp->cl_state = NFSD4_ACTIVE; > > > + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) > > The xchg calls seem like overkill. The cl_state is protected by the > > nn->client_lock. Nothing else can race in and change it here. > > I use the 'xchg' calls for convenience and readability and not for > protection in this case. But if you think this is overkill or > unnecessary then I remove it. > > Fix in v2. 
> > Thanks, > -Dai No worries, it's a minor thing. It probably doesn't hurt anything in this context, but the xchg operation itself has to be atomic. We already hold the spinlock in these places, so there's no need to demand that here. > > > + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > > > return nfs_ok; > > > } > > > > > > @@ -190,7 +191,8 @@ renew_client_locked(struct nfs4_client *clp) > > > > > > list_move_tail(&clp->cl_lru, &nn->client_lru); > > > clp->cl_time = ktime_get_boottime_seconds(); > > > - clp->cl_state = NFSD4_ACTIVE; > > > + if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE) > > > + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > > > } > > > > > > static void put_client_renew_locked(struct nfs4_client *clp) > > > @@ -2233,6 +2235,8 @@ __destroy_client(struct nfs4_client *clp) > > > if (clp->cl_cb_conn.cb_xprt) > > > svc_xprt_put(clp->cl_cb_conn.cb_xprt); > > > atomic_add_unless(&nn->nfs4_client_count, -1, 0); > > > + if (clp->cl_state != NFSD4_ACTIVE) > > > + atomic_add_unless(&nn->nfsd_courtesy_client_count, -1, 0); > > > free_client(clp); > > > wake_up_all(&expiry_wq); > > > } > > > @@ -4356,6 +4360,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) > > > max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); > > > max_clients *= NFS4_CLIENTS_PER_GB; > > > nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); > > > + > > > + atomic_set(&nn->nfsd_courtesy_client_count, 0); > > > } > > > > > > static void init_nfs4_replay(struct nfs4_replay *rp) > > > @@ -5864,7 +5870,7 @@ static void > > > nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > > > struct laundry_time *lt) > > > { > > > - unsigned int maxreap, reapcnt = 0; > > > + unsigned int oldstate, maxreap, reapcnt = 0; > > > struct list_head *pos, *next; > > > struct nfs4_client *clp; > > > > > > @@ -5878,8 +5884,12 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > > > goto 
exp_client; > > > if (!state_expired(lt, clp->cl_time)) > > > break; > > > - if (!atomic_read(&clp->cl_rpc_users)) > > > - clp->cl_state = NFSD4_COURTESY; > > > + oldstate = NFSD4_ACTIVE; > > > + if (!atomic_read(&clp->cl_rpc_users)) { > > > + oldstate = xchg(&clp->cl_state, NFSD4_COURTESY); > > > + if (oldstate == NFSD4_ACTIVE) > > > + atomic_inc(&nn->nfsd_courtesy_client_count); > > > + } > > > if (!client_has_state(clp)) > > > goto exp_client; > > > if (!nfs4_anylock_blockers(clp)) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition 2022-08-29 0:47 [PATCH v2 0/2] NFSD: memory shrinker for NFSv4 clients Dai Ngo 2022-08-29 0:47 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo @ 2022-08-29 0:47 ` Dai Ngo 2022-08-29 17:15 ` Jeff Layton 1 sibling, 1 reply; 11+ messages in thread From: Dai Ngo @ 2022-08-29 0:47 UTC (permalink / raw) To: chuck.lever; +Cc: linux-nfs Add the courtesy client shrinker to react to the low memory condition triggered by the memory shrinker. On the shrinker's count callback, we increment a callback counter and return the number of outstanding courtesy clients. When the laundromat runs, it checks if this counter is not zero and starts reaping old courtesy clients. The maximum number of clients to be reaped is limited to NFSD_CLIENT_MAX_TRIM_PER_RUN (128). This limit is to prevent the laundromat from spending too much time reaping the clients and not processing other tasks in a timely manner. The laundromat is rescheduled to run sooner if it detects a low memory condition and there are more clients to reap. On the shrinker's scan callback, we return the number of clients that were reaped since the last scan callback. We cannot reap the clients in the scan callback context since destroying the client might require calls into the underlying filesystem or other subsystems, which might allocate memory and cause deadlock.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/netns.h | 3 +++ fs/nfsd/nfs4state.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/nfsctl.c | 6 ++++-- fs/nfsd/nfsd.h | 9 +++++++-- 4 files changed, 61 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 2695dff1378a..2a604951623f 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -194,6 +194,9 @@ struct nfsd_net { int nfs4_max_clients; atomic_t nfsd_courtesy_client_count; + atomic_t nfsd_client_shrinker_cb_count; + atomic_t nfsd_client_shrinker_reapcount; + struct shrinker nfsd_client_shrinker; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3d8d7ebb5b91..9d5a20f0c3c4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4341,7 +4341,39 @@ nfsd4_init_slabs(void) return -ENOMEM; } -void nfsd4_init_leases_net(struct nfsd_net *nn) +static unsigned long +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) +{ + struct nfsd_net *nn = container_of(shrink, + struct nfsd_net, nfsd_client_shrinker); + + atomic_inc(&nn->nfsd_client_shrinker_cb_count); + return (unsigned long)atomic_read(&nn->nfsd_courtesy_client_count); +} + +static unsigned long +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + struct nfsd_net *nn = container_of(shrink, + struct nfsd_net, nfsd_client_shrinker); + unsigned long cnt; + + cnt = atomic_read(&nn->nfsd_client_shrinker_reapcount); + atomic_set(&nn->nfsd_client_shrinker_reapcount, 0); + return cnt; +} + +static int +nfsd_register_client_shrinker(struct nfsd_net *nn) +{ + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; + return register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"); +} + +int +nfsd4_init_leases_net(struct nfsd_net *nn) { struct 
sysinfo si; u64 max_clients; @@ -4362,6 +4394,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); atomic_set(&nn->nfsd_courtesy_client_count, 0); + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); + return nfsd_register_client_shrinker(nn); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5870,12 +5904,17 @@ static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) { - unsigned int oldstate, maxreap, reapcnt = 0; + unsigned int oldstate, maxreap = 0, reapcnt = 0; + int cb_cnt; struct list_head *pos, *next; struct nfs4_client *clp; - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || + cb_cnt) { + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); + } INIT_LIST_HEAD(reaplist); spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { @@ -5902,6 +5941,8 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, } } spin_unlock(&nn->client_lock); + if (cb_cnt) + atomic_add(reapcnt, &nn->nfsd_client_shrinker_reapcount); } static time64_t @@ -5942,6 +5983,8 @@ nfs4_laundromat(struct nfsd_net *nn) list_del_init(&clp->cl_lru); expire_client(clp); } + if (atomic_read(&nn->nfsd_client_shrinker_cb_count) > 0) + lt.new_timeo = NFSD_LAUNDROMAT_MINTIMEOUT; spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 917fa1892fd2..597a26ad4183 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; + 
retval = nfsd4_init_leases_net(nn); + if (retval) + goto out_drc_error; retval = nfsd_reply_cache_init(nn); if (retval) goto out_drc_error; - nfsd4_init_leases_net(nn); - get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); + nfsd4_leases_net_shutdown(nn); } static struct pernet_operations nfsd_net_ops = { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 57a468ed85c3..7e05ab7a3532 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -498,7 +498,11 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif -extern void nfsd4_init_leases_net(struct nfsd_net *nn); +extern int nfsd4_init_leases_net(struct nfsd_net *nn); +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) +{ + unregister_shrinker(&nn->nfsd_client_shrinker); +}; #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) @@ -506,7 +510,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) return 0; } -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) { }; #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0) -- 2.9.5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition 2022-08-29 0:47 ` [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition Dai Ngo @ 2022-08-29 17:15 ` Jeff Layton 2022-08-29 18:25 ` dai.ngo 0 siblings, 1 reply; 11+ messages in thread From: Jeff Layton @ 2022-08-29 17:15 UTC (permalink / raw) To: Dai Ngo, chuck.lever; +Cc: linux-nfs On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: > Add the courtesy client shrinker to react to low memory condition > triggered by the memory shrinker. > > On the shrinker's count callback, we increment a callback counter > and return the number of outstanding courtesy clients. When the > laundromat runs, it checks if this counter is not zero and starts > reaping old courtesy clients. The maximum number of clients to be > reaped is limited to NFSD_CIENT_MAX_TRIM_PER_RUN (128). This limit > is to prevent the laundromat from spending too much time reaping > the clients and not processing other tasks in a timely manner. > > The laundromat is rescheduled to run sooner if it detects low > low memory condition and there are more clients to reap. > > On the shrinker's scan callback, we return the number of clients > That were reaped since the last scan callback. We can not reap > the clients on the scan callback context since destroying the > client might require call into the underlying filesystem or other > subsystems which might allocate memory which can cause deadlock. 
> > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > --- > fs/nfsd/netns.h | 3 +++ > fs/nfsd/nfs4state.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- > fs/nfsd/nfsctl.c | 6 ++++-- > fs/nfsd/nfsd.h | 9 +++++++-- > 4 files changed, 61 insertions(+), 8 deletions(-) > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h > index 2695dff1378a..2a604951623f 100644 > --- a/fs/nfsd/netns.h > +++ b/fs/nfsd/netns.h > @@ -194,6 +194,9 @@ struct nfsd_net { > int nfs4_max_clients; > > atomic_t nfsd_courtesy_client_count; > + atomic_t nfsd_client_shrinker_cb_count; > + atomic_t nfsd_client_shrinker_reapcount; > + struct shrinker nfsd_client_shrinker; > }; > > /* Simple check to find out if a given net was properly initialized */ > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 3d8d7ebb5b91..9d5a20f0c3c4 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -4341,7 +4341,39 @@ nfsd4_init_slabs(void) > return -ENOMEM; > } > > -void nfsd4_init_leases_net(struct nfsd_net *nn) > +static unsigned long > +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) > +{ > + struct nfsd_net *nn = container_of(shrink, > + struct nfsd_net, nfsd_client_shrinker); > + > + atomic_inc(&nn->nfsd_client_shrinker_cb_count); > + return (unsigned long)atomic_read(&nn->nfsd_courtesy_client_count); > +} > + > +static unsigned long > +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) > +{ > + struct nfsd_net *nn = container_of(shrink, > + struct nfsd_net, nfsd_client_shrinker); > + unsigned long cnt; > + > + cnt = atomic_read(&nn->nfsd_client_shrinker_reapcount); > + atomic_set(&nn->nfsd_client_shrinker_reapcount, 0); Is it legit to return that we freed these objects when it hasn't actually been done yet? Maybe this should always return 0? I'm not sure what the rules are with shrinkers. Either way, it seems like "scan" should cue the laundromat to run ASAP. 
When this is called, it may be quite some time before the laundromat runs again. Currently, it's always just scheduled it out to when we know there may be work to be done, but this is a different situation. > + return cnt; > +} > + > +static int > +nfsd_register_client_shrinker(struct nfsd_net *nn) > +{ > + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; > + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; > + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; > + return register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"); > +} > + > +int > +nfsd4_init_leases_net(struct nfsd_net *nn) > { > struct sysinfo si; > u64 max_clients; > @@ -4362,6 +4394,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) > nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); > > atomic_set(&nn->nfsd_courtesy_client_count, 0); > + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); > + return nfsd_register_client_shrinker(nn); > } > > static void init_nfs4_replay(struct nfs4_replay *rp) > @@ -5870,12 +5904,17 @@ static void > nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > struct laundry_time *lt) > { > - unsigned int oldstate, maxreap, reapcnt = 0; > + unsigned int oldstate, maxreap = 0, reapcnt = 0; > + int cb_cnt; > struct list_head *pos, *next; > struct nfs4_client *clp; > > - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? 
> - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; > + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || > + cb_cnt) { > + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; > + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); > + } > INIT_LIST_HEAD(reaplist); > spin_lock(&nn->client_lock); > list_for_each_safe(pos, next, &nn->client_lru) { > @@ -5902,6 +5941,8 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > } > } > spin_unlock(&nn->client_lock); > + if (cb_cnt) > + atomic_add(reapcnt, &nn->nfsd_client_shrinker_reapcount); > } > > static time64_t > @@ -5942,6 +5983,8 @@ nfs4_laundromat(struct nfsd_net *nn) > list_del_init(&clp->cl_lru); > expire_client(clp); > } > + if (atomic_read(&nn->nfsd_client_shrinker_cb_count) > 0) > + lt.new_timeo = NFSD_LAUNDROMAT_MINTIMEOUT; > spin_lock(&state_lock); > list_for_each_safe(pos, next, &nn->del_recall_lru) { > dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > index 917fa1892fd2..597a26ad4183 100644 > --- a/fs/nfsd/nfsctl.c > +++ b/fs/nfsd/nfsctl.c > @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) > goto out_idmap_error; > nn->nfsd_versions = NULL; > nn->nfsd4_minorversions = NULL; > + retval = nfsd4_init_leases_net(nn); > + if (retval) > + goto out_drc_error; > retval = nfsd_reply_cache_init(nn); > if (retval) > goto out_drc_error; > - nfsd4_init_leases_net(nn); > - > get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); > seqlock_init(&nn->writeverf_lock); > > @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) > nfsd_idmap_shutdown(net); > nfsd_export_shutdown(net); > nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); > + nfsd4_leases_net_shutdown(nn); > } > > static struct pernet_operations nfsd_net_ops = { > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h > index 57a468ed85c3..7e05ab7a3532 100644 > --- 
a/fs/nfsd/nfsd.h > +++ b/fs/nfsd/nfsd.h > @@ -498,7 +498,11 @@ extern void unregister_cld_notifier(void); > extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); > #endif > > -extern void nfsd4_init_leases_net(struct nfsd_net *nn); > +extern int nfsd4_init_leases_net(struct nfsd_net *nn); > +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) > +{ > + unregister_shrinker(&nn->nfsd_client_shrinker); > +}; > > #else /* CONFIG_NFSD_V4 */ > static inline int nfsd4_is_junction(struct dentry *dentry) > @@ -506,7 +510,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) > return 0; > } > > -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; > +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; > +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) { }; > > #define register_cld_notifier() 0 > #define unregister_cld_notifier() do { } while(0) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition 2022-08-29 17:15 ` Jeff Layton @ 2022-08-29 18:25 ` dai.ngo 2022-08-29 18:40 ` Jeff Layton 0 siblings, 1 reply; 11+ messages in thread From: dai.ngo @ 2022-08-29 18:25 UTC (permalink / raw) To: Jeff Layton, chuck.lever; +Cc: linux-nfs On 8/29/22 10:15 AM, Jeff Layton wrote: > On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: >> Add the courtesy client shrinker to react to the low memory condition >> triggered by the memory shrinker. >> >> On the shrinker's count callback, we increment a callback counter >> and return the number of outstanding courtesy clients. When the >> laundromat runs, it checks if this counter is not zero and starts >> reaping old courtesy clients. The maximum number of clients to be >> reaped is limited to NFSD_CLIENT_MAX_TRIM_PER_RUN (128). This limit >> is to prevent the laundromat from spending too much time reaping >> the clients and not processing other tasks in a timely manner. >> >> The laundromat is rescheduled to run sooner if it detects a low >> memory condition and there are more clients to reap. >> >> On the shrinker's scan callback, we return the number of clients >> that were reaped since the last scan callback. We cannot reap >> the clients in the scan callback context since destroying a >> client might require a call into the underlying filesystem or other >> subsystems which might allocate memory, which can cause a deadlock. 
>> >> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >> --- >> fs/nfsd/netns.h | 3 +++ >> fs/nfsd/nfs4state.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- >> fs/nfsd/nfsctl.c | 6 ++++-- >> fs/nfsd/nfsd.h | 9 +++++++-- >> 4 files changed, 61 insertions(+), 8 deletions(-) >> >> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h >> index 2695dff1378a..2a604951623f 100644 >> --- a/fs/nfsd/netns.h >> +++ b/fs/nfsd/netns.h >> @@ -194,6 +194,9 @@ struct nfsd_net { >> int nfs4_max_clients; >> >> atomic_t nfsd_courtesy_client_count; >> + atomic_t nfsd_client_shrinker_cb_count; >> + atomic_t nfsd_client_shrinker_reapcount; >> + struct shrinker nfsd_client_shrinker; >> }; >> >> /* Simple check to find out if a given net was properly initialized */ >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index 3d8d7ebb5b91..9d5a20f0c3c4 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -4341,7 +4341,39 @@ nfsd4_init_slabs(void) >> return -ENOMEM; >> } >> >> -void nfsd4_init_leases_net(struct nfsd_net *nn) >> +static unsigned long >> +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) >> +{ >> + struct nfsd_net *nn = container_of(shrink, >> + struct nfsd_net, nfsd_client_shrinker); >> + >> + atomic_inc(&nn->nfsd_client_shrinker_cb_count); >> + return (unsigned long)atomic_read(&nn->nfsd_courtesy_client_count); >> +} >> + >> +static unsigned long >> +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) >> +{ >> + struct nfsd_net *nn = container_of(shrink, >> + struct nfsd_net, nfsd_client_shrinker); >> + unsigned long cnt; >> + >> + cnt = atomic_read(&nn->nfsd_client_shrinker_reapcount); >> + atomic_set(&nn->nfsd_client_shrinker_reapcount, 0); > Is it legit to return that we freed these objects when it hasn't > actually been done yet? Maybe this should always return 0? I'm not sure > what the rules are with shrinkers. 
nfsd_client_shrinker_reapcount is the actual number of clients that the laundromat was able to destroy in its last run. > > Either way, it seems like "scan" should cue the laundromat to run ASAP. > When this is called, it may be quite some time before the laundromat > runs again. Currently, it's always just scheduled out to when we know > there may be work to be done, but this is a different situation. Normally the "scan" callback is used to free unused resources and return the number of objects freed. For the NFSv4 clients, we cannot free the clients in the "scan" context due to potential deadlock, and also the laundromat might block while destroying the clients. Because of this we use the "count" callback to notify the laundromat of the low memory condition. In the "count" callback we do not call mod_delayed_work to kick-start the laundromat since we do not want to rely on the inner workings of mod_delayed_work to guarantee no deadlock. I also think we should do the minimal amount of work in the memory shrinker's callback context. Once the laundromat runs it monitors the low memory condition and reschedules itself to run immediately (NFSD_LAUNDROMAT_MINTIMEOUT) if needed. 
Thanks, -Dai > >> + return cnt; >> +} >> + >> +static int >> +nfsd_register_client_shrinker(struct nfsd_net *nn) >> +{ >> + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; >> + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; >> + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; >> + return register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"); >> +} >> + >> +int >> +nfsd4_init_leases_net(struct nfsd_net *nn) >> { >> struct sysinfo si; >> u64 max_clients; >> @@ -4362,6 +4394,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) >> nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); >> >> atomic_set(&nn->nfsd_courtesy_client_count, 0); >> + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); >> + return nfsd_register_client_shrinker(nn); >> } >> >> static void init_nfs4_replay(struct nfs4_replay *rp) >> @@ -5870,12 +5904,17 @@ static void >> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> struct laundry_time *lt) >> { >> - unsigned int oldstate, maxreap, reapcnt = 0; >> + unsigned int oldstate, maxreap = 0, reapcnt = 0; >> + int cb_cnt; >> struct list_head *pos, *next; >> struct nfs4_client *clp; >> >> - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? 
>> - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; >> + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); >> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || >> + cb_cnt) { >> + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; >> + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); >> + } >> INIT_LIST_HEAD(reaplist); >> spin_lock(&nn->client_lock); >> list_for_each_safe(pos, next, &nn->client_lru) { >> @@ -5902,6 +5941,8 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >> } >> } >> spin_unlock(&nn->client_lock); >> + if (cb_cnt) >> + atomic_add(reapcnt, &nn->nfsd_client_shrinker_reapcount); >> } >> >> static time64_t >> @@ -5942,6 +5983,8 @@ nfs4_laundromat(struct nfsd_net *nn) >> list_del_init(&clp->cl_lru); >> expire_client(clp); >> } >> + if (atomic_read(&nn->nfsd_client_shrinker_cb_count) > 0) >> + lt.new_timeo = NFSD_LAUNDROMAT_MINTIMEOUT; >> spin_lock(&state_lock); >> list_for_each_safe(pos, next, &nn->del_recall_lru) { >> dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); >> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c >> index 917fa1892fd2..597a26ad4183 100644 >> --- a/fs/nfsd/nfsctl.c >> +++ b/fs/nfsd/nfsctl.c >> @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) >> goto out_idmap_error; >> nn->nfsd_versions = NULL; >> nn->nfsd4_minorversions = NULL; >> + retval = nfsd4_init_leases_net(nn); >> + if (retval) >> + goto out_drc_error; >> retval = nfsd_reply_cache_init(nn); >> if (retval) >> goto out_drc_error; >> - nfsd4_init_leases_net(nn); >> - >> get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); >> seqlock_init(&nn->writeverf_lock); >> >> @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) >> nfsd_idmap_shutdown(net); >> nfsd_export_shutdown(net); >> nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); >> + nfsd4_leases_net_shutdown(nn); >> } >> >> static struct pernet_operations nfsd_net_ops = { >> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h 
>> index 57a468ed85c3..7e05ab7a3532 100644 >> --- a/fs/nfsd/nfsd.h >> +++ b/fs/nfsd/nfsd.h >> @@ -498,7 +498,11 @@ extern void unregister_cld_notifier(void); >> extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); >> #endif >> >> -extern void nfsd4_init_leases_net(struct nfsd_net *nn); >> +extern int nfsd4_init_leases_net(struct nfsd_net *nn); >> +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) >> +{ >> + unregister_shrinker(&nn->nfsd_client_shrinker); >> +}; >> >> #else /* CONFIG_NFSD_V4 */ >> static inline int nfsd4_is_junction(struct dentry *dentry) >> @@ -506,7 +510,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) >> return 0; >> } >> >> -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; >> +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; >> +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) { }; >> >> #define register_cld_notifier() 0 >> #define unregister_cld_notifier() do { } while(0) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition 2022-08-29 18:25 ` dai.ngo @ 2022-08-29 18:40 ` Jeff Layton 2022-08-29 18:52 ` dai.ngo 0 siblings, 1 reply; 11+ messages in thread From: Jeff Layton @ 2022-08-29 18:40 UTC (permalink / raw) To: dai.ngo, chuck.lever; +Cc: linux-nfs On Mon, 2022-08-29 at 11:25 -0700, dai.ngo@oracle.com wrote: > On 8/29/22 10:15 AM, Jeff Layton wrote: > > On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: > > > Add the courtesy client shrinker to react to low memory condition > > > triggered by the memory shrinker. > > > > > > On the shrinker's count callback, we increment a callback counter > > > and return the number of outstanding courtesy clients. When the > > > laundromat runs, it checks if this counter is not zero and starts > > > reaping old courtesy clients. The maximum number of clients to be > > > reaped is limited to NFSD_CIENT_MAX_TRIM_PER_RUN (128). This limit > > > is to prevent the laundromat from spending too much time reaping > > > the clients and not processing other tasks in a timely manner. > > > > > > The laundromat is rescheduled to run sooner if it detects low > > > low memory condition and there are more clients to reap. > > > > > > On the shrinker's scan callback, we return the number of clients > > > That were reaped since the last scan callback. We can not reap > > > the clients on the scan callback context since destroying the > > > client might require call into the underlying filesystem or other > > > subsystems which might allocate memory which can cause deadlock. 
> > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > > > --- > > > fs/nfsd/netns.h | 3 +++ > > > fs/nfsd/nfs4state.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- > > > fs/nfsd/nfsctl.c | 6 ++++-- > > > fs/nfsd/nfsd.h | 9 +++++++-- > > > 4 files changed, 61 insertions(+), 8 deletions(-) > > > > > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h > > > index 2695dff1378a..2a604951623f 100644 > > > --- a/fs/nfsd/netns.h > > > +++ b/fs/nfsd/netns.h > > > @@ -194,6 +194,9 @@ struct nfsd_net { > > > int nfs4_max_clients; > > > > > > atomic_t nfsd_courtesy_client_count; > > > + atomic_t nfsd_client_shrinker_cb_count; > > > + atomic_t nfsd_client_shrinker_reapcount; > > > + struct shrinker nfsd_client_shrinker; > > > }; > > > > > > /* Simple check to find out if a given net was properly initialized */ > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > index 3d8d7ebb5b91..9d5a20f0c3c4 100644 > > > --- a/fs/nfsd/nfs4state.c > > > +++ b/fs/nfsd/nfs4state.c > > > @@ -4341,7 +4341,39 @@ nfsd4_init_slabs(void) > > > return -ENOMEM; > > > } > > > > > > -void nfsd4_init_leases_net(struct nfsd_net *nn) > > > +static unsigned long > > > +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) > > > +{ > > > + struct nfsd_net *nn = container_of(shrink, > > > + struct nfsd_net, nfsd_client_shrinker); > > > + > > > + atomic_inc(&nn->nfsd_client_shrinker_cb_count); > > > + return (unsigned long)atomic_read(&nn->nfsd_courtesy_client_count); > > > +} > > > + > > > +static unsigned long > > > +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) > > > +{ > > > + struct nfsd_net *nn = container_of(shrink, > > > + struct nfsd_net, nfsd_client_shrinker); > > > + unsigned long cnt; > > > + > > > + cnt = atomic_read(&nn->nfsd_client_shrinker_reapcount); > > > + atomic_set(&nn->nfsd_client_shrinker_reapcount, 0); > > Is it legit to return that we freed these objects when it hasn't > > actually been done yet? 
Maybe this should always return 0? I'm not sure > > what the rules are with shrinkers. > > nfsd_client_shrinker_reapcount is the actual number of clients that > the laundromat was able to destroy in its last run. > > > > > > Either way, it seems like "scan" should cue the laundromat to run ASAP. > > When this is called, it may be quite some time before the laundromat > > runs again. Currently, it's always just scheduled out to when we know > > there may be work to be done, but this is a different situation. > > Normally the "scan" callback is used to free unused resources and return > the number of objects freed. For the NFSv4 clients, we cannot free the > clients in the "scan" context due to potential deadlock, and also the > laundromat might block while destroying the clients. Because of this we > use the "count" callback to notify the laundromat of the low memory > condition. > > In the "count" callback we do not call mod_delayed_work to kick-start > the laundromat since we do not want to rely on the inner workings of > mod_delayed_work to guarantee no deadlock. I also think we should do > the minimal amount of work in the memory shrinker's callback context. > Are you aware of a specific problem with shrinkers and queueing work? As I understand it, shrinkers are run in the context of an allocation. An allocation crosses a threshold of some sort and we start trying to free stuff using shrinkers. Queueing delayed work will never allocate anything, so I would think it'd be safe to do in the count callback. > Once the laundromat runs it monitors the low memory condition and > reschedules itself to run immediately (NFSD_LAUNDROMAT_MINTIMEOUT) if > needed. 
> > Thanks, > -Dai > > > > > > + return cnt; > > > +} > > > + > > > +static int > > > +nfsd_register_client_shrinker(struct nfsd_net *nn) > > > +{ > > > + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; > > > + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; > > > + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; > > > + return register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"); > > > +} > > > + > > > +int > > > +nfsd4_init_leases_net(struct nfsd_net *nn) > > > { > > > struct sysinfo si; > > > u64 max_clients; > > > @@ -4362,6 +4394,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) > > > nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); > > > > > > atomic_set(&nn->nfsd_courtesy_client_count, 0); > > > + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); > > > + return nfsd_register_client_shrinker(nn); > > > } > > > > > > static void init_nfs4_replay(struct nfs4_replay *rp) > > > @@ -5870,12 +5904,17 @@ static void > > > nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > > > struct laundry_time *lt) > > > { > > > - unsigned int oldstate, maxreap, reapcnt = 0; > > > + unsigned int oldstate, maxreap = 0, reapcnt = 0; > > > + int cb_cnt; > > > struct list_head *pos, *next; > > > struct nfs4_client *clp; > > > > > > - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? 
> > > - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; > > > + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); > > > + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || > > > + cb_cnt) { > > > + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; > > > + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); > > > + } > > > INIT_LIST_HEAD(reaplist); > > > spin_lock(&nn->client_lock); > > > list_for_each_safe(pos, next, &nn->client_lru) { > > > @@ -5902,6 +5941,8 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, > > > } > > > } > > > spin_unlock(&nn->client_lock); > > > + if (cb_cnt) > > > + atomic_add(reapcnt, &nn->nfsd_client_shrinker_reapcount); > > > } > > > > > > static time64_t > > > @@ -5942,6 +5983,8 @@ nfs4_laundromat(struct nfsd_net *nn) > > > list_del_init(&clp->cl_lru); > > > expire_client(clp); > > > } > > > + if (atomic_read(&nn->nfsd_client_shrinker_cb_count) > 0) > > > + lt.new_timeo = NFSD_LAUNDROMAT_MINTIMEOUT; > > > spin_lock(&state_lock); > > > list_for_each_safe(pos, next, &nn->del_recall_lru) { > > > dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > > index 917fa1892fd2..597a26ad4183 100644 > > > --- a/fs/nfsd/nfsctl.c > > > +++ b/fs/nfsd/nfsctl.c > > > @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) > > > goto out_idmap_error; > > > nn->nfsd_versions = NULL; > > > nn->nfsd4_minorversions = NULL; > > > + retval = nfsd4_init_leases_net(nn); > > > + if (retval) > > > + goto out_drc_error; > > > retval = nfsd_reply_cache_init(nn); > > > if (retval) > > > goto out_drc_error; > > > - nfsd4_init_leases_net(nn); > > > - > > > get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); > > > seqlock_init(&nn->writeverf_lock); > > > > > > @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) > > > nfsd_idmap_shutdown(net); > > > nfsd_export_shutdown(net); > > > nfsd_netns_free_versions(net_generic(net, 
nfsd_net_id)); > > > + nfsd4_leases_net_shutdown(nn); > > > } > > > > > > static struct pernet_operations nfsd_net_ops = { > > > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h > > > index 57a468ed85c3..7e05ab7a3532 100644 > > > --- a/fs/nfsd/nfsd.h > > > +++ b/fs/nfsd/nfsd.h > > > @@ -498,7 +498,11 @@ extern void unregister_cld_notifier(void); > > > extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); > > > #endif > > > > > > -extern void nfsd4_init_leases_net(struct nfsd_net *nn); > > > +extern int nfsd4_init_leases_net(struct nfsd_net *nn); > > > +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) > > > +{ > > > + unregister_shrinker(&nn->nfsd_client_shrinker); > > > +}; > > > > > > #else /* CONFIG_NFSD_V4 */ > > > static inline int nfsd4_is_junction(struct dentry *dentry) > > > @@ -506,7 +510,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) > > > return 0; > > > } > > > > > > -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; > > > +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; > > > +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) { }; > > > > > > #define register_cld_notifier() 0 > > > #define unregister_cld_notifier() do { } while(0) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
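The approach suggested above — kicking the laundromat directly from the shrinker's "count" callback — could look roughly like the following. This is only a sketch, not code from the posted patches: it assumes the existing laundry_wq workqueue and nn->laundromat_work delayed work in fs/nfsd/nfs4state.c, and pulls the next laundromat run forward with a zero delay.

```c
/* Sketch only: schedule the laundromat immediately from the shrinker's
 * count callback.  Queueing work does not allocate memory, so it should
 * be safe in the direct-reclaim context the shrinker runs in. */
static unsigned long
nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc)
{
	struct nfsd_net *nn = container_of(shrink,
			struct nfsd_net, nfsd_client_shrinker);
	unsigned long count;

	count = atomic_read(&nn->nfsd_courtesy_client_count);
	if (count)
		/* run the laundromat as soon as possible */
		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
	return count;
}
```

With something like this in place, the scan callback could simply return 0 (or SHRINK_STOP), since the actual freeing still happens in laundromat context.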
* Re: [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition 2022-08-29 18:40 ` Jeff Layton @ 2022-08-29 18:52 ` dai.ngo 0 siblings, 0 replies; 11+ messages in thread From: dai.ngo @ 2022-08-29 18:52 UTC (permalink / raw) To: Jeff Layton, chuck.lever; +Cc: linux-nfs On 8/29/22 11:40 AM, Jeff Layton wrote: > On Mon, 2022-08-29 at 11:25 -0700, dai.ngo@oracle.com wrote: >> On 8/29/22 10:15 AM, Jeff Layton wrote: >>> On Sun, 2022-08-28 at 17:47 -0700, Dai Ngo wrote: >>>> Add the courtesy client shrinker to react to low memory condition >>>> triggered by the memory shrinker. >>>> >>>> On the shrinker's count callback, we increment a callback counter >>>> and return the number of outstanding courtesy clients. When the >>>> laundromat runs, it checks if this counter is not zero and starts >>>> reaping old courtesy clients. The maximum number of clients to be >>>> reaped is limited to NFSD_CIENT_MAX_TRIM_PER_RUN (128). This limit >>>> is to prevent the laundromat from spending too much time reaping >>>> the clients and not processing other tasks in a timely manner. >>>> >>>> The laundromat is rescheduled to run sooner if it detects low >>>> low memory condition and there are more clients to reap. >>>> >>>> On the shrinker's scan callback, we return the number of clients >>>> That were reaped since the last scan callback. We can not reap >>>> the clients on the scan callback context since destroying the >>>> client might require call into the underlying filesystem or other >>>> subsystems which might allocate memory which can cause deadlock. 
>>>> >>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >>>> --- >>>> fs/nfsd/netns.h | 3 +++ >>>> fs/nfsd/nfs4state.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- >>>> fs/nfsd/nfsctl.c | 6 ++++-- >>>> fs/nfsd/nfsd.h | 9 +++++++-- >>>> 4 files changed, 61 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h >>>> index 2695dff1378a..2a604951623f 100644 >>>> --- a/fs/nfsd/netns.h >>>> +++ b/fs/nfsd/netns.h >>>> @@ -194,6 +194,9 @@ struct nfsd_net { >>>> int nfs4_max_clients; >>>> >>>> atomic_t nfsd_courtesy_client_count; >>>> + atomic_t nfsd_client_shrinker_cb_count; >>>> + atomic_t nfsd_client_shrinker_reapcount; >>>> + struct shrinker nfsd_client_shrinker; >>>> }; >>>> >>>> /* Simple check to find out if a given net was properly initialized */ >>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>> index 3d8d7ebb5b91..9d5a20f0c3c4 100644 >>>> --- a/fs/nfsd/nfs4state.c >>>> +++ b/fs/nfsd/nfs4state.c >>>> @@ -4341,7 +4341,39 @@ nfsd4_init_slabs(void) >>>> return -ENOMEM; >>>> } >>>> >>>> -void nfsd4_init_leases_net(struct nfsd_net *nn) >>>> +static unsigned long >>>> +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) >>>> +{ >>>> + struct nfsd_net *nn = container_of(shrink, >>>> + struct nfsd_net, nfsd_client_shrinker); >>>> + >>>> + atomic_inc(&nn->nfsd_client_shrinker_cb_count); >>>> + return (unsigned long)atomic_read(&nn->nfsd_courtesy_client_count); >>>> +} >>>> + >>>> +static unsigned long >>>> +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) >>>> +{ >>>> + struct nfsd_net *nn = container_of(shrink, >>>> + struct nfsd_net, nfsd_client_shrinker); >>>> + unsigned long cnt; >>>> + >>>> + cnt = atomic_read(&nn->nfsd_client_shrinker_reapcount); >>>> + atomic_set(&nn->nfsd_client_shrinker_reapcount, 0); >>> Is it legit to return that we freed these objects when it hasn't >>> actually been done yet? Maybe this should always return 0? 
I'm not sure >>> what the rules are with shrinkers. >> nfsd_client_shrinker_reapcount is the actual number of clients that >> the laundromat was able to destroy in its last run. >> >>> Either way, it seems like "scan" should cue the laundromat to run ASAP. >>> When this is called, it may be quite some time before the laundromat >>> runs again. Currently, it's always just scheduled out to when we know >>> there may be work to be done, but this is a different situation. >> Normally the "scan" callback is used to free unused resources and return >> the number of objects freed. For the NFSv4 clients, we cannot free the >> clients in the "scan" context due to potential deadlock, and also the >> laundromat might block while destroying the clients. Because of this we >> use the "count" callback to notify the laundromat of the low memory >> condition. >> >> In the "count" callback we do not call mod_delayed_work to kick-start >> the laundromat since we do not want to rely on the inner workings of >> mod_delayed_work to guarantee no deadlock. I also think we should do >> the minimal amount of work in the memory shrinker's callback context. >> > Are you aware of a specific problem with shrinkers and queueing work? > > As I understand it, shrinkers are run in the context of an allocation. > An allocation crosses a threshold of some sort and we start trying > to free stuff using shrinkers. Queueing delayed work will never allocate > anything, so I would think it'd be safe to do in the count callback. ok, I'll try to add mod_delayed_work on the "count" callback. I remember vaguely that I ran into some trouble before but that might have been caused by something else. -Dai > >> Once the laundromat runs it monitors the low memory condition and >> reschedules itself to run immediately (NFSD_LAUNDROMAT_MINTIMEOUT) if >> needed. 
>> >> Thanks, >> -Dai >> >>>> + return cnt; >>>> +} >>>> + >>>> +static int >>>> +nfsd_register_client_shrinker(struct nfsd_net *nn) >>>> +{ >>>> + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; >>>> + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; >>>> + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; >>>> + return register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"); >>>> +} >>>> + >>>> +int >>>> +nfsd4_init_leases_net(struct nfsd_net *nn) >>>> { >>>> struct sysinfo si; >>>> u64 max_clients; >>>> @@ -4362,6 +4394,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) >>>> nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); >>>> >>>> atomic_set(&nn->nfsd_courtesy_client_count, 0); >>>> + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); >>>> + return nfsd_register_client_shrinker(nn); >>>> } >>>> >>>> static void init_nfs4_replay(struct nfs4_replay *rp) >>>> @@ -5870,12 +5904,17 @@ static void >>>> nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >>>> struct laundry_time *lt) >>>> { >>>> - unsigned int oldstate, maxreap, reapcnt = 0; >>>> + unsigned int oldstate, maxreap = 0, reapcnt = 0; >>>> + int cb_cnt; >>>> struct list_head *pos, *next; >>>> struct nfs4_client *clp; >>>> >>>> - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? 
>>>> - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; >>>> + cb_cnt = atomic_read(&nn->nfsd_client_shrinker_cb_count); >>>> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients || >>>> + cb_cnt) { >>>> + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; >>>> + atomic_set(&nn->nfsd_client_shrinker_cb_count, 0); >>>> + } >>>> INIT_LIST_HEAD(reaplist); >>>> spin_lock(&nn->client_lock); >>>> list_for_each_safe(pos, next, &nn->client_lru) { >>>> @@ -5902,6 +5941,8 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, >>>> } >>>> } >>>> spin_unlock(&nn->client_lock); >>>> + if (cb_cnt) >>>> + atomic_add(reapcnt, &nn->nfsd_client_shrinker_reapcount); >>>> } >>>> >>>> static time64_t >>>> @@ -5942,6 +5983,8 @@ nfs4_laundromat(struct nfsd_net *nn) >>>> list_del_init(&clp->cl_lru); >>>> expire_client(clp); >>>> } >>>> + if (atomic_read(&nn->nfsd_client_shrinker_cb_count) > 0) >>>> + lt.new_timeo = NFSD_LAUNDROMAT_MINTIMEOUT; >>>> spin_lock(&state_lock); >>>> list_for_each_safe(pos, next, &nn->del_recall_lru) { >>>> dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); >>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c >>>> index 917fa1892fd2..597a26ad4183 100644 >>>> --- a/fs/nfsd/nfsctl.c >>>> +++ b/fs/nfsd/nfsctl.c >>>> @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) >>>> goto out_idmap_error; >>>> nn->nfsd_versions = NULL; >>>> nn->nfsd4_minorversions = NULL; >>>> + retval = nfsd4_init_leases_net(nn); >>>> + if (retval) >>>> + goto out_drc_error; >>>> retval = nfsd_reply_cache_init(nn); >>>> if (retval) >>>> goto out_drc_error; >>>> - nfsd4_init_leases_net(nn); >>>> - >>>> get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); >>>> seqlock_init(&nn->writeverf_lock); >>>> >>>> @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) >>>> nfsd_idmap_shutdown(net); >>>> nfsd_export_shutdown(net); >>>> nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); >>>> + nfsd4_leases_net_shutdown(nn); 
>>>> } >>>> >>>> static struct pernet_operations nfsd_net_ops = { >>>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h >>>> index 57a468ed85c3..7e05ab7a3532 100644 >>>> --- a/fs/nfsd/nfsd.h >>>> +++ b/fs/nfsd/nfsd.h >>>> @@ -498,7 +498,11 @@ extern void unregister_cld_notifier(void); >>>> extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); >>>> #endif >>>> >>>> -extern void nfsd4_init_leases_net(struct nfsd_net *nn); >>>> +extern int nfsd4_init_leases_net(struct nfsd_net *nn); >>>> +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) >>>> +{ >>>> + unregister_shrinker(&nn->nfsd_client_shrinker); >>>> +}; >>>> >>>> #else /* CONFIG_NFSD_V4 */ >>>> static inline int nfsd4_is_junction(struct dentry *dentry) >>>> @@ -506,7 +510,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) >>>> return 0; >>>> } >>>> >>>> -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; >>>> +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; >>>> +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) { }; >>>> >>>> #define register_cld_notifier() 0 >>>> #define unregister_cld_notifier() do { } while(0) ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 0/2] NFSD: handling memory shortage problem with Courteous server
@ 2022-07-04 19:05 Dai Ngo
2022-07-04 19:05 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo
0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2022-07-04 19:05 UTC (permalink / raw)
To: chuck.lever; +Cc: linux-nfs
Currently the idle timeout for a courtesy client is fixed at 1 day. If
lots of courtesy clients remain in the system, they can cause a
memory shortage that affects the operation of other modules
in the kernel. This problem can be observed by running the pynfs nfs4.0
CID5 test in a loop. Eventually the system runs out of memory and rpc.gssd
fails to add a new watch:
rpc.gssd[3851]: ERROR: inotify_add_watch failed for nfsd4_cb/clnt6c2e:
No space left on device
and alloc_inode also fails with out of memory:
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x42
dump_header+0x4a/0x1ed
oom_kill_process+0x80/0x10d
out_of_memory+0x237/0x25f
__alloc_pages_slowpath.constprop.0+0x617/0x7b6
__alloc_pages+0x132/0x1e3
alloc_slab_page+0x15/0x33
allocate_slab+0x78/0x1ab
? alloc_inode+0x38/0x8d
___slab_alloc+0x2af/0x373
? alloc_inode+0x38/0x8d
? slab_pre_alloc_hook.constprop.0+0x9f/0x158
? alloc_inode+0x38/0x8d
__slab_alloc.constprop.0+0x1c/0x24
kmem_cache_alloc_lru+0x8c/0x142
alloc_inode+0x38/0x8d
iget_locked+0x60/0x126
kernfs_get_inode+0x18/0x105
kernfs_iop_lookup+0x6d/0xbc
__lookup_slow+0xb7/0xf9
lookup_slow+0x3a/0x52
walk_component+0x90/0x100
? inode_permission+0x87/0x128
link_path_walk.part.0.constprop.0+0x266/0x2ea
? path_init+0x101/0x2f2
path_lookupat+0x4c/0xfa
filename_lookup+0x63/0xd7
? getname_flags+0x32/0x17a
? kmem_cache_alloc+0x11f/0x144
? getname_flags+0x16c/0x17a
user_path_at_empty+0x37/0x4b
do_readlinkat+0x61/0x102
__x64_sys_readlinkat+0x18/0x1b
do_syscall_64+0x57/0x72
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This patch addresses this problem by:
. removing the fixed 1-day idle time limit for courtesy clients.
A courtesy client is now allowed to remain valid as long as the
available system memory is above 80%.
. when available system memory drops below 80%, the laundromat starts
trimming older courtesy clients. The number of courtesy clients
to trim is a percentage of the total number of courtesy clients
in the system. This percentage is computed based on
the current percentage of available system memory.
. the percentage of courtesy clients to be trimmed
is based on this table:
----------------------------------
| % memory | % courtesy clients |
| available | to trim |
----------------------------------
| > 80 | 0 |
| > 70 | 10 |
| > 60 | 20 |
| > 50 | 40 |
| > 40 | 60 |
| > 30 | 80 |
| < 30 | 100 |
----------------------------------
. due to the overhead associated with removing a client record,
there is a limit of 128 clients to be trimmed for each
laundromat run. This is done to prevent the laundromat from
spending too long destroying clients and failing to perform
its other tasks in a timely manner.
. the laundromat is scheduled to run sooner if there are more
courtesy clients that need to be destroyed.
The shrinker approach was evaluated and found unsuitable for
this problem, for the following reasons:
. destroying an NFSv4 client in shrinker context can cause a
deadlock, since nfsd_file_put calls into the underlying
filesystem code and we have no control over what it does, as
seen in this stack trace:
======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc2_sk+ #1 Not tainted
------------------------------------------------------
lck/31847 is trying to acquire lock:
ffff88811d268850 (&sb->s_type->i_mutex_key#16){+.+.}-{3:3}, at: btrfs_inode_lock+0x38/0x70
but task is already holding lock:
ffffffffb41848c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x506/0x1db0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0xc0/0x100
__kmalloc+0x51/0x320
btrfs_buffered_write+0x2eb/0xd90
btrfs_do_write_iter+0x6bf/0x11c0
do_iter_readv_writev+0x2bb/0x5a0
do_iter_write+0x131/0x630
nfsd_vfs_write+0x4da/0x1900 [nfsd]
nfsd4_write+0x2ac/0x760 [nfsd]
nfsd4_proc_compound+0xce8/0x23e0 [nfsd]
nfsd_dispatch+0x4ed/0xc10 [nfsd]
svc_process_common+0xd3f/0x1b00 [sunrpc]
svc_process+0x361/0x4f0 [sunrpc]
nfsd+0x2d6/0x570 [nfsd]
kthread+0x2a1/0x340
ret_from_fork+0x22/0x30
-> #0 (&sb->s_type->i_mutex_key#16){+.+.}-{3:3}:
__lock_acquire+0x318d/0x7830
lock_acquire+0x1bb/0x500
down_write+0x82/0x130
btrfs_inode_lock+0x38/0x70
btrfs_sync_file+0x280/0x1010
nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
nfsd_file_put+0xd4/0x110 [nfsd]
release_all_access+0x13a/0x220 [nfsd]
nfs4_free_ol_stateid+0x40/0x90 [nfsd]
free_ol_stateid_reaplist+0x131/0x210 [nfsd]
release_openowner+0xf7/0x160 [nfsd]
__destroy_client+0x3cc/0x740 [nfsd]
nfsd_cc_lru_scan+0x271/0x410 [nfsd]
shrink_slab.constprop.0+0x31e/0x7d0
shrink_node+0x54b/0xe50
try_to_free_pages+0x394/0xba0
__alloc_pages_slowpath.constprop.0+0x5d2/0x1db0
__alloc_pages+0x4d6/0x580
__handle_mm_fault+0xc25/0x2810
handle_mm_fault+0x136/0x480
do_user_addr_fault+0x3d8/0xec0
exc_page_fault+0x5d/0xc0
asm_exc_page_fault+0x27/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(&sb->s_type->i_mutex_key#16);
lock(fs_reclaim);
lock(&sb->s_type->i_mutex_key#16);
*** DEADLOCK ***
. the shrinker kicks in only when available memory drops really
low, below about 5%. By that time, other components in the
system have already run into memory-shortage problems. For
example, rpc.gssd starts failing to add watches in
/var/lib/nfs/rpc_pipefs/nfsd4_cb once the memory consumed by
these watches reaches about 1% of available system memory.
. destroying an NFSv4 client has significant overhead, due to
the upcall to user space to remove the client record, which
might access the storage device. There is a potential deadlock
if the storage subsystem needs to allocate memory.
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system
  2022-07-04 19:05 [PATCH v2 0/2] NFSD: handling memory shortage problem with Courteous server Dai Ngo
@ 2022-07-04 19:05 ` Dai Ngo
  0 siblings, 0 replies; 11+ messages in thread
From: Dai Ngo @ 2022-07-04 19:05 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs

Add counter courtesy_client_count to keep track of the number of
courtesy clients in the system.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9409a0dc1b76..a34ffb0d8c77 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -126,11 +126,13 @@ static const struct nfsd4_callback_ops nfsd4_cb_recall_ops;
 static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
 
 static struct workqueue_struct *laundry_wq;
+static atomic_t courtesy_client_count;
 
 int nfsd4_create_laundry_wq(void)
 {
 	int rc = 0;
 
+	atomic_set(&courtesy_client_count, 0);
 	laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4");
 	if (laundry_wq == NULL)
 		rc = -ENOMEM;
@@ -169,7 +171,8 @@ static __be32 get_client_locked(struct nfs4_client *clp)
 	if (is_client_expired(clp))
 		return nfserr_expired;
 	atomic_inc(&clp->cl_rpc_users);
-	clp->cl_state = NFSD4_ACTIVE;
+	if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE)
+		atomic_add_unless(&courtesy_client_count, -1, 0);
 	return nfs_ok;
 }
 
@@ -190,7 +193,8 @@ renew_client_locked(struct nfs4_client *clp)
 	list_move_tail(&clp->cl_lru, &nn->client_lru);
 	clp->cl_time = ktime_get_boottime_seconds();
-	clp->cl_state = NFSD4_ACTIVE;
+	if (xchg(&clp->cl_state, NFSD4_ACTIVE) != NFSD4_ACTIVE)
+		atomic_add_unless(&courtesy_client_count, -1, 0);
 }
 
 static void put_client_renew_locked(struct nfs4_client *clp)
@@ -2226,6 +2230,8 @@ __destroy_client(struct nfs4_client *clp)
 	nfsd4_shutdown_callback(clp);
 	if (clp->cl_cb_conn.cb_xprt)
 		svc_xprt_put(clp->cl_cb_conn.cb_xprt);
+	if (clp->cl_state != NFSD4_ACTIVE)
+		atomic_add_unless(&courtesy_client_count, -1, 0);
 	free_client(clp);
 	wake_up_all(&expiry_wq);
 }
@@ -5803,8 +5809,11 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist,
 			goto exp_client;
 		if (!state_expired(lt, clp->cl_time))
 			break;
-		if (!atomic_read(&clp->cl_rpc_users))
-			clp->cl_state = NFSD4_COURTESY;
+		if (!atomic_read(&clp->cl_rpc_users)) {
+			if (xchg(&clp->cl_state, NFSD4_COURTESY) ==
+					NFSD4_ACTIVE)
+				atomic_inc(&courtesy_client_count);
+		}
 		if (!client_has_state(clp) ||
 				ktime_get_boottime_seconds() >=
 				(clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT))
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-08-29 18:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2022-08-29  0:47 [PATCH v2 0/2] NFSD: memory shrinker for NFSv4 clients Dai Ngo
2022-08-29  0:47 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo
2022-08-29 16:48   ` Jeff Layton
2022-08-29 18:24     ` dai.ngo
2022-08-29 18:27       ` Jeff Layton
2022-08-29  0:47 ` [PATCH v2 2/2] NFSD: add shrinker to reap courtesy clients on low memory condition Dai Ngo
2022-08-29 17:15   ` Jeff Layton
2022-08-29 18:25     ` dai.ngo
2022-08-29 18:40       ` Jeff Layton
2022-08-29 18:52         ` dai.ngo
  -- strict thread matches above, loose matches on Subject: below --
2022-07-04 19:05 [PATCH v2 0/2] NFSD: handling memory shortage problem with Courteous server Dai Ngo
2022-07-04 19:05 ` [PATCH v2 1/2] NFSD: keep track of the number of courtesy clients in the system Dai Ngo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox