From: Chuck Lever <chuck.lever@oracle.com>
To: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Li Lingfeng <lilingfeng3@huawei.com>,
chenxiaosong@chenxiaosong.com
Subject: Re: [PATCH 2/3] nfsd: use kref and new mutex for global config management
Date: Thu, 19 Jun 2025 10:06:05 -0400 [thread overview]
Message-ID: <00bac421-cef6-451d-b868-592ed34c15af@oracle.com> (raw)
In-Reply-To: <20250618213347.425503-3-neil@brown.name>
On 6/18/25 5:31 PM, NeilBrown wrote:
> nfsd_mutex is used for two quite different things:
> 1/ it prevents races when start/stoping global resources:
> the filecache and the v4 state table
> 2/ it prevents races for per-netns config, typically
> ensure config changes are synchronised w.r.t. server
> startup/shutdown.
>
> This patch splits out the first used. A subsequent patch improves the
> second.
>
> "nfsd_users" is changed to a kref which is can be taken to delay
> the shutdown of global services. nfsd_startup_get(), it is succeeds,
> ensure the global services will remain until nfsd_startup_put().
>
> The new mutex, nfsd_startup_mutex, is only take for startup and
> shutdown. It is not needed to protect the kref.
>
> The locking needed by nfsd_file_cache_purge() is now provided internally
> by that function so calls don't need to be concerned.
>
> This replaces NFSD_FILE_CACHE_UP which is effective just a flag which
> says nfsd_users is non-zero.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/nfsd/export.c | 6 ------
> fs/nfsd/filecache.c | 31 +++++++++++--------------------
> fs/nfsd/nfsd.h | 3 +++
> fs/nfsd/nfssvc.c | 41 +++++++++++++++++++++++++++--------------
> 4 files changed, 41 insertions(+), 40 deletions(-)
>
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index cadfc2bae60e..1ea3d72ef5c9 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -243,13 +243,7 @@ static struct cache_head *expkey_alloc(void)
>
> static void expkey_flush(void)
> {
> - /*
> - * Take the nfsd_mutex here to ensure that the file cache is not
> - * destroyed while we're in the middle of flushing.
> - */
> - mutex_lock(&nfsd_mutex);
> nfsd_file_cache_purge(current->nsproxy->net_ns);
> - mutex_unlock(&nfsd_mutex);
> }
>
> static const struct cache_detail svc_expkey_cache_template = {
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index e108b6c705b4..0a9116b7530c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -50,8 +50,6 @@
>
> #define NFSD_LAUNDRETTE_DELAY (2 * HZ)
>
> -#define NFSD_FILE_CACHE_UP (0)
> -
> /* We only care about NFSD_MAY_READ/WRITE for this cache */
> #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>
> @@ -70,7 +68,6 @@ struct nfsd_fcache_disposal {
> static struct kmem_cache *nfsd_file_slab;
> static struct kmem_cache *nfsd_file_mark_slab;
> static struct list_lru nfsd_file_lru;
> -static unsigned long nfsd_file_flags;
> static struct fsnotify_group *nfsd_file_fsnotify_group;
> static struct delayed_work nfsd_filecache_laundrette;
> static struct rhltable nfsd_file_rhltable
> @@ -112,9 +109,12 @@ static const struct rhashtable_params nfsd_file_rhash_params = {
> static void
> nfsd_file_schedule_laundrette(void)
> {
> - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags))
> - queue_delayed_work(system_unbound_wq, &nfsd_filecache_laundrette,
> + if (nfsd_startup_get()) {
> + queue_delayed_work(system_unbound_wq,
> + &nfsd_filecache_laundrette,
> NFSD_LAUNDRETTE_DELAY);
> + nfsd_startup_put();
> + }
> }
>
> static void
> @@ -795,10 +795,6 @@ nfsd_file_cache_init(void)
> {
> int ret;
>
> - lockdep_assert_held(&nfsd_mutex);
> - if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
> - return 0;
> -
> ret = rhltable_init(&nfsd_file_rhltable, &nfsd_file_rhash_params);
> if (ret)
> goto out;
> @@ -853,8 +849,6 @@ nfsd_file_cache_init(void)
>
> INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_gc_worker);
> out:
> - if (ret)
> - clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags);
> return ret;
> out_notifier:
> lease_unregister_notifier(&nfsd_file_lease_notifier);
> @@ -958,9 +952,10 @@ nfsd_file_cache_start_net(struct net *net)
> void
> nfsd_file_cache_purge(struct net *net)
> {
> - lockdep_assert_held(&nfsd_mutex);
> - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1)
> + if (nfsd_startup_get()) {
> __nfsd_file_cache_purge(net);
> + nfsd_startup_put();
> + }
> }
>
> void
> @@ -975,10 +970,6 @@ nfsd_file_cache_shutdown(void)
> {
> int i;
>
> - lockdep_assert_held(&nfsd_mutex);
> - if (test_and_clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0)
> - return;
> -
> lease_unregister_notifier(&nfsd_file_lease_notifier);
> shrinker_free(nfsd_file_shrinker);
> /*
> @@ -1347,8 +1338,7 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
> unsigned long lru = 0, total_age = 0;
>
> /* Serialize with server shutdown */
> - mutex_lock(&nfsd_mutex);
> - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) {
> + if (nfsd_startup_get()) {
> struct bucket_table *tbl;
> struct rhashtable *ht;
>
> @@ -1360,8 +1350,9 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
> tbl = rht_dereference_rcu(ht->tbl, ht);
> buckets = tbl->size;
> rcu_read_unlock();
> +
> + nfsd_startup_put();
> }
> - mutex_unlock(&nfsd_mutex);
>
> for_each_possible_cpu(i) {
> hits += per_cpu(nfsd_file_cache_hits, i);
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 1bfd0b4e9af7..8ad9fcc23789 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -80,6 +80,9 @@ extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
> extern struct mutex nfsd_mutex;
> extern atomic_t nfsd_th_cnt; /* number of available threads */
>
> +bool nfsd_startup_get(void);
> +void nfsd_startup_put(void);
> +
> extern const struct seq_operations nfs_exports_op;
>
> /*
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 82b0111ac469..b2080e5a71e6 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -270,38 +270,51 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
> return 0;
> }
>
> -static int nfsd_users = 0;
> +static struct kref nfsd_users = KREF_INIT(0);
> +static DEFINE_MUTEX(nfsd_startup_mutex);
>
> static int nfsd_startup_generic(void)
> {
> - int ret;
> + int ret = 0;
>
> - if (nfsd_users++)
> + if (kref_get_unless_zero(&nfsd_users))
> return 0;
> + mutex_lock(&nfsd_startup_mutex);
> + if (kref_get_unless_zero(&nfsd_users))
> + goto out_unlock;
>
> ret = nfsd_file_cache_init();
> if (ret)
> - goto dec_users;
> + goto out_unlock;
>
> ret = nfs4_state_start();
> if (ret)
> goto out_file_cache;
> - return 0;
> + kref_init(&nfsd_users);
> +out_unlock:
> + mutex_unlock(&nfsd_startup_mutex);
> + return ret;
>
> out_file_cache:
> nfsd_file_cache_shutdown();
> -dec_users:
> - nfsd_users--;
> - return ret;
> + goto out_unlock;
> }
>
> -static void nfsd_shutdown_generic(void)
> +static void nfsd_shutdown_cb(struct kref *kref)
> {
> - if (--nfsd_users)
> - return;
> -
> nfs4_state_shutdown();
> nfsd_file_cache_shutdown();
> + mutex_unlock(&nfsd_startup_mutex);
> +}
> +
> +bool nfsd_startup_get(void)
> +{
> + return kref_get_unless_zero(&nfsd_users);
> +}
> +
> +void nfsd_startup_put(void)
> +{
> + kref_put_mutex(&nfsd_users, nfsd_shutdown_cb, &nfsd_startup_mutex);
> }
>
> static bool nfsd_needs_lockd(struct nfsd_net *nn)
> @@ -416,7 +429,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
> nn->lockd_up = false;
> }
> out_socks:
> - nfsd_shutdown_generic();
> + nfsd_startup_put();
> return ret;
> }
>
> @@ -443,7 +456,7 @@ static void nfsd_shutdown_net(struct net *net)
> percpu_ref_exit(&nn->nfsd_net_ref);
>
> nn->nfsd_net_up = false;
> - nfsd_shutdown_generic();
> + nfsd_startup_put();
> }
>
> static DEFINE_SPINLOCK(nfsd_notifier_lock);
[ Adding Cc: chenxiaosong@chenxiaosong.com who seems to be working on a
crasher related to nfsd_users. ]
I like where this is going.
- I'm struggling a little with the names of nfsd_startup_get/put. But I
don't have a better suggestion. I think we're marking a critical
section of sorts. pin/unpin maybe?
- Maybe someone has a way to exercise NFSD service startup and shutdown.
We don't have much testing in this area, upstream. A CI-style "do this
thing 1000 times while the server is under load" might be good.
--
Chuck Lever
next prev parent reply other threads:[~2025-06-19 14:06 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-18 21:31 [PATCH 0/3 RFC] improve some nfsd_mutex locking NeilBrown
2025-06-18 21:31 ` [PATCH 1/3] nfsd: provide proper locking for all write_ function NeilBrown
2025-06-20 12:59 ` Jeff Layton
2025-06-18 21:31 ` [PATCH 2/3] nfsd: use kref and new mutex for global config management NeilBrown
2025-06-19 14:06 ` Chuck Lever [this message]
2025-06-20 13:01 ` Jeff Layton
2025-06-18 21:31 ` [PATCH 3/3] nfsd: split nfsd_mutex into one mutex per net-namespace NeilBrown
2025-06-19 12:33 ` kernel test robot
2025-06-20 13:13 ` Jeff Layton
-- strict thread matches above, loose matches on Subject: below --
2025-06-20 23:33 [PATCH 0/3 RFC] improve some nfsd_mutex locking NeilBrown
2025-06-20 23:33 ` [PATCH 2/3] nfsd: use kref and new mutex for global config management NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=00bac421-cef6-451d-b868-592ed34c15af@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Dai.Ngo@oracle.com \
--cc=chenxiaosong@chenxiaosong.com \
--cc=jlayton@kernel.org \
--cc=lilingfeng3@huawei.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox