public inbox for linux-bcache@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz,
	roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org,
	paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com,
	cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com,
	gregkh@linuxfoundation.org, muchun.song@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
	linux-erofs@lists.ozlabs.org,
	linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com,
	linux-nfs@vger.kernel.org, linux-mtd@lists.infradead.org,
	rcu@vger.kernel.org, netdev@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
	dm-devel@redhat.com, linux-raid@vger.kernel.org,
	linux-bcache@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 44/47] mm: shrinker: make global slab shrink lockless
Date: Thu, 27 Jul 2023 09:09:42 +1000
Message-ID: <ZMGnthZAh48JF+eV@dread.disaster.area>
In-Reply-To: <19ad6d06-8a14-6102-5eae-2134dc2c5061@bytedance.com>

On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:
> On 2023/7/26 16:08, Dave Chinner wrote:
> > On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:
> > > @@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
> > >   void shrinker_register(struct shrinker *shrinker);
> > >   void shrinker_unregister(struct shrinker *shrinker);
> > > +static inline bool shrinker_try_get(struct shrinker *shrinker)
> > > +{
> > > +	return READ_ONCE(shrinker->registered) &&
> > > +	       refcount_inc_not_zero(&shrinker->refcount);
> > > +}
> > 
> > Why do we care about shrinker->registered here? If we don't set
> > the refcount to 1 until we have fully initialised everything, then
> > the shrinker code can key entirely off the reference count and
> > none of the lookup code needs to care about whether the shrinker is
> > registered or not.
> 
> The purpose of checking shrinker->registered here is to stop running
> shrinker after calling shrinker_free(), which can prevent the following
> situations from happening:
> 
> CPU 0                 CPU 1
> 
> shrinker_try_get()
> 
>                        shrinker_try_get()
> 
> shrinker_put()
> shrinker_try_get()
>                        shrinker_put()

I don't see a race here. What is wrong with having multiple active
users at once?

> > 
> > This should use a completion, then it is always safe under
> > rcu_read_lock().  This also gets rid of the shrinker_lock spin lock,
> > which only exists because we can't take a blocking lock under
> > rcu_read_lock(). i.e:
> > 
> > 
> > void shrinker_put(struct shrinker *shrinker)
> > {
> > 	if (refcount_dec_and_test(&shrinker->refcount))
> > 		complete(&shrinker->done);
> > }
> > 
> > void shrinker_free()
> > {
> > 	.....
> > 	refcount_dec(&shrinker->refcount);
> 
> I guess what you mean is shrinker_put(), because here may be the last
> refcount.

Yes, I did.

> > 	wait_for_completion(&shrinker->done);
> > 	/*
> > 	 * lookups on the shrinker will now all fail as refcount has
> > 	 * fallen to zero. We can now remove it from the lists and
> > 	 * free it.
> > 	 */
> > 	down_write(shrinker_rwsem);
> > 	list_del_rcu(&shrinker->list);
> > 	up_write(&shrinker_rwsem);
> > 	call_rcu(shrinker->rcu_head, shrinker_free_rcu_cb);
> > }
> > 
> > ....
> > 
> > > @@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
> > >   void shrinker_register(struct shrinker *shrinker)
> > >   {
> > > -	down_write(&shrinker_rwsem);
> > > -	list_add_tail(&shrinker->list, &shrinker_list);
> > > -	shrinker->flags |= SHRINKER_REGISTERED;
> > > +	refcount_set(&shrinker->refcount, 1);
> > > +
> > > +	spin_lock(&shrinker_lock);
> > > +	list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > > +	spin_unlock(&shrinker_lock);
> > > +
> > >   	shrinker_debugfs_add(shrinker);
> > > -	up_write(&shrinker_rwsem);
> > > +	WRITE_ONCE(shrinker->registered, true);
> > >   }
> > >   EXPORT_SYMBOL(shrinker_register);
> > 
> > This just looks wrong - you are trying to use WRITE_ONCE() as a
> > release barrier to indicate that the shrinker is now set up fully.
> > That's not necessary - the refcount is an atomic and along with the
> > > rcu locks they should provide all the barriers we need. i.e.
> 
> The reason I used WRITE_ONCE() here is because the shrinker->registered
> will be read and written concurrently (read in shrinker_try_get() and
> written in shrinker_free()), which is why I added shrinker::registered
> field instead of using SHRINKER_REGISTERED flag (this can reduce the
> addition of WRITE_ONCE()/READ_ONCE()).

WRITE_ONCE()/READ_ONCE() don't provide the memory barriers needed to
use the field like this. You need release/acquire memory ordering
here, i.e. smp_store_release()/smp_load_acquire().
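
A rough userspace illustration of that release/acquire pairing, using C11
atomics rather than the kernel primitives. The struct and function names
(struct obj, obj_publish(), obj_read()) are made up for this sketch, and
the C11 memory orders are only an approximate analogue of
smp_store_release()/smp_load_acquire():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a structure published to lockless readers. */
struct obj {
	int payload;		/* plain field, written before publication */
	atomic_bool ready;	/* publication flag */
};

static void obj_init(struct obj *o)
{
	o->payload = 0;
	atomic_init(&o->ready, false);
}

/*
 * Writer: initialise the payload, then publish with release ordering
 * (userspace analogue of smp_store_release()).
 */
static void obj_publish(struct obj *o, int value)
{
	o->payload = value;
	atomic_store_explicit(&o->ready, true, memory_order_release);
}

/*
 * Reader: acquire load (analogue of smp_load_acquire()). If it observes
 * ready == true, the earlier payload write is guaranteed to be visible.
 */
static bool obj_read(struct obj *o, int *out)
{
	if (!atomic_load_explicit(&o->ready, memory_order_acquire))
		return false;
	*out = o->payload;
	return true;
}
```

A plain volatile-style access to the flag, which is roughly what
WRITE_ONCE()/READ_ONCE() give you, would stop the compiler from tearing
or caching the flag but would not order the payload write against it.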

As documented in include/linux/refcount.h, refcount_inc_not_zero()
provides a control dependency and refcount_dec_and_test() provides
release memory ordering. The only thing I think we may need
is a write barrier before refcount_set(), such that if
refcount_inc_not_zero() sees a non-zero value, it is guaranteed to
see an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem.
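
To make the "key entirely off the reference count" idea concrete, here is
a minimal userspace sketch with C11 atomics. It is not the kernel
refcount_t implementation: struct obj and the obj_*() helpers are
hypothetical, the release store in obj_register() stands in for the
write barrier before refcount_set(), and the try-get uses acquire
ordering on success, which is stronger than the control dependency the
kernel helper documents:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct shrinker. */
struct obj {
	int state;		/* set up before the first reference exists */
	atomic_uint refcount;	/* 0 means "not visible to lookups" */
};

static void obj_init(struct obj *o)
{
	o->state = 0;
	atomic_init(&o->refcount, 0);
}

/* Finish setup first, then take the initial reference with release order. */
static void obj_register(struct obj *o, int state)
{
	o->state = state;
	atomic_store_explicit(&o->refcount, 1, memory_order_release);
}

/* Analogue of refcount_inc_not_zero(): never resurrect a zero count. */
static bool obj_try_get(struct obj *o)
{
	unsigned int old = atomic_load_explicit(&o->refcount,
						memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&o->refcount,
			&old, old + 1,
			memory_order_acquire, memory_order_relaxed));
	return true;
}

/* Returns true when the caller dropped the last reference. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub_explicit(&o->refcount, 1,
					 memory_order_acq_rel) == 1;
}
```

Lookups fail both before registration and after the count has fallen to
zero, and any number of users can hold references in between, so no
separate "registered" flag is consulted on the lookup path.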


> > void shrinker_register(struct shrinker *shrinker)
> > {
> > 	down_write(&shrinker_rwsem);
> > 	list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > 	shrinker->flags |= SHRINKER_REGISTERED;
> > 	shrinker_debugfs_add(shrinker);
> > 	up_write(&shrinker_rwsem);
> > 
> > 	/*
> > 	 * now the shrinker is fully set up, take the first
> > 	 * reference to it to indicate that lookup operations are
> > 	 * now allowed to use it via shrinker_try_get().
> > 	 */
> > 	refcount_set(&shrinker->refcount, 1);
> > }
> > 
> > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > > index f1becfd45853..c5573066adbf 100644
> > > --- a/mm/shrinker_debug.c
> > > +++ b/mm/shrinker_debug.c
> > > @@ -5,6 +5,7 @@
> > >   #include <linux/seq_file.h>
> > >   #include <linux/shrinker.h>
> > >   #include <linux/memcontrol.h>
> > > +#include <linux/rculist.h>
> > >   /* defined in vmscan.c */
> > >   extern struct rw_semaphore shrinker_rwsem;
> > > @@ -161,17 +162,21 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   {
> > >   	struct dentry *entry;
> > >   	char buf[128];
> > > -	int id;
> > > -
> > > -	lockdep_assert_held(&shrinker_rwsem);
> > > +	int id, ret = 0;
> > >   	/* debugfs isn't initialized yet, add debugfs entries later. */
> > >   	if (!shrinker_debugfs_root)
> > >   		return 0;
> > > +	down_write(&shrinker_rwsem);
> > > +	if (shrinker->debugfs_entry)
> > > +		goto fail;
> > > +
> > >   	id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> > > -	if (id < 0)
> > > -		return id;
> > > +	if (id < 0) {
> > > +		ret = id;
> > > +		goto fail;
> > > +	}
> > >   	shrinker->debugfs_id = id;
> > >   	snprintf(buf, sizeof(buf), "%s-%d", shrinker->name, id);
> > > @@ -180,7 +185,8 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> > >   	if (IS_ERR(entry)) {
> > >   		ida_free(&shrinker_debugfs_ida, id);
> > > -		return PTR_ERR(entry);
> > > +		ret = PTR_ERR(entry);
> > > +		goto fail;
> > >   	}
> > >   	shrinker->debugfs_entry = entry;
> > > @@ -188,7 +194,10 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   			    &shrinker_debugfs_count_fops);
> > >   	debugfs_create_file("scan", 0220, entry, shrinker,
> > >   			    &shrinker_debugfs_scan_fops);
> > > -	return 0;
> > > +
> > > +fail:
> > > +	up_write(&shrinker_rwsem);
> > > +	return ret;
> > >   }
> > >   int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
> > > @@ -243,6 +252,11 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
> > >   	shrinker->name = NULL;
> > >   	*debugfs_id = entry ? shrinker->debugfs_id : -1;
> > > +	/*
> > > +	 * Ensure that shrinker->registered has been set to false before
> > > +	 * shrinker->debugfs_entry is set to NULL.
> > > +	 */
> > > +	smp_wmb();
> > >   	shrinker->debugfs_entry = NULL;
> > >   	return entry;
> > > @@ -266,14 +280,26 @@ static int __init shrinker_debugfs_init(void)
> > >   	shrinker_debugfs_root = dentry;
> > >   	/* Create debugfs entries for shrinkers registered at boot */
> > > -	down_write(&shrinker_rwsem);
> > > -	list_for_each_entry(shrinker, &shrinker_list, list)
> > > +	rcu_read_lock();
> > > +	list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> > > +		if (!shrinker_try_get(shrinker))
> > > +			continue;
> > > +		rcu_read_unlock();
> > > +
> > >   		if (!shrinker->debugfs_entry) {
> > > -			ret = shrinker_debugfs_add(shrinker);
> > > -			if (ret)
> > > -				break;
> > > +			/* Paired with smp_wmb() in shrinker_debugfs_detach() */
> > > +			smp_rmb();
> > > +			if (READ_ONCE(shrinker->registered))
> > > +				ret = shrinker_debugfs_add(shrinker);
> > >   		}
> > > -	up_write(&shrinker_rwsem);
> > > +
> > > +		rcu_read_lock();
> > > +		shrinker_put(shrinker);
> > > +
> > > +		if (ret)
> > > +			break;
> > > +	}
> > > +	rcu_read_unlock();
> > >   	return ret;
> > >   }
> > 
> > And all this churn and complexity can go away because the
> > shrinker_rwsem is still used to protect shrinker_register()
> > entirely....
> 
> My consideration is that during this process, there may be a
> driver probe failure and then shrinker_free() is called (the
> shrinker_debugfs_init() is called in late_initcall stage). In
> this case, we need to use RCU+refcount to ensure that the shrinker
> is not freed.

Yeah, you're trying to work around the lack of a
wait_for_completion() call in shrinker_free().

With that, this doesn't need RCU at all, and the iteration can be
done fully under the shrinker_rwsem safely and so none of this
code needs to change.
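
For reference, the put/free pairing I am describing can be sketched in
userspace roughly like this. It is an illustration only: the completion
is emulated with a pthread mutex and condition variable, struct obj, the
obj_*() helpers and the "freed" field are hypothetical, and the "freed"
field stands in for the list removal and the actual free:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal completion emulated with a mutex and condition variable. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical userspace stand-in for struct shrinker. */
struct obj {
	atomic_uint refcount;
	struct completion done;
	bool freed;	/* stands in for list removal and the real free */
};

static void obj_register(struct obj *o)
{
	completion_init(&o->done);
	o->freed = false;
	atomic_init(&o->refcount, 0);
	/* Setup is finished: take the initial reference. */
	atomic_store_explicit(&o->refcount, 1, memory_order_release);
}

static bool obj_try_get(struct obj *o)
{
	unsigned int old = atomic_load_explicit(&o->refcount,
						memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&o->refcount,
			&old, old + 1,
			memory_order_acquire, memory_order_relaxed));
	return true;
}

/* The last put signals the completion instead of freeing directly. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refcount, 1,
				      memory_order_acq_rel) == 1)
		complete(&o->done);
}

static void obj_free(struct obj *o)
{
	obj_put(o);			/* drop the initial reference */
	wait_for_completion(&o->done);	/* wait for the last active user */
	o->freed = true;		/* now safe to unlink and free */
}
```

If another thread still held a reference, obj_free() would block in
wait_for_completion() until that thread called obj_put(); once it
returns, every later obj_try_get() fails.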

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
