linux-fsdevel.vger.kernel.org archive mirror
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Dave Chinner <david@fromorbit.com>
Cc: Kirill Tkhai <tkhai@ya.ru>,
	akpm@linux-foundation.org, vbabka@suse.cz,
	viro@zeniv.linux.org.uk, brauner@kernel.org, djwong@kernel.org,
	hughd@google.com, paulmck@kernel.org, muchun.song@linux.dev,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	zhengqi.arch@bytedance.com
Subject: Re: [PATCH v2 3/3] fs: Use delayed shrinker unregistration
Date: Mon, 5 Jun 2023 19:56:59 -0700
Message-ID: <ZH6ge3yiGAotYRR9@P9FQF9L96D>
In-Reply-To: <ZH6K0McWBeCjaf16@dread.disaster.area>

On Tue, Jun 06, 2023 at 11:24:32AM +1000, Dave Chinner wrote:
> On Mon, Jun 05, 2023 at 05:38:27PM -0700, Roman Gushchin wrote:
> > On Mon, Jun 05, 2023 at 10:03:25PM +0300, Kirill Tkhai wrote:
> > > Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
> > > test case caused by commit: f95bdb700bc6 ("mm: vmscan: make global slab
> > > > shrink lockless"). Qi Zheng found that the cause is the long SRCU
> > > > synchronize_srcu() call occurring in unregister_shrinker().
> > > 
> > > > This patch fixes the problem by using the new unregistration interfaces,
> > > > which split unregister_shrinker() into two parts. The first part only
> > > > notifies the shrinker subsystem of the unregistration and prevents
> > > > future shrinker method calls. The second part completes the unregistration
> > > > and ensures that struct shrinker is no longer used during shrinker chain
> > > > iteration, so the shrinker memory may be freed. Since the long second
> > > > part is called asynchronously from delayed work, it hides the
> > > > synchronize_srcu() delay from the user.
> > > 
> > > Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
> > > ---
> > >  fs/super.c |    3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/super.c b/fs/super.c
> > > index 8d8d68799b34..f3e4f205ec79 100644
> > > --- a/fs/super.c
> > > +++ b/fs/super.c
> > > @@ -159,6 +159,7 @@ static void destroy_super_work(struct work_struct *work)
> > >  							destroy_work);
> > >  	int i;
> > >  
> > > +	unregister_shrinker_delayed_finalize(&s->s_shrink);
> > >  	for (i = 0; i < SB_FREEZE_LEVELS; i++)
> > >  		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > >  	kfree(s);
> > > @@ -327,7 +328,7 @@ void deactivate_locked_super(struct super_block *s)
> > >  {
> > >  	struct file_system_type *fs = s->s_type;
> > >  	if (atomic_dec_and_test(&s->s_active)) {
> > > -		unregister_shrinker(&s->s_shrink);
> > > +		unregister_shrinker_delayed_initiate(&s->s_shrink);
> > 
> > > Hm, it makes the API more complex and easier to misuse. For example, what
> > > happens if the second part is never called? Or if it's called without the
> > > first part being called first?
> 
> Bad things.

Agree.

> Also, it doesn't fix the three other unregister_shrinker() calls in
> the XFS unmount path, nor the three in the ext4/mbcache/jbd2 unmount
> path.
> 
> Those are just some of the unregister_shrinker() calls that have
> dynamic contexts that would also need this same fix; I haven't
> audited the 3 dozen other unregister_shrinker() calls around the
> kernel to determine if any of them need similar treatment, too.
> 
> IOWs, this patchset is purely a band-aid to fix the reported
> regression, not an actual fix for the underlying problems caused by
> moving the shrinker infrastructure to SRCU protection.  This is why
> I really want the SRCU changeover reverted.
> 
> > Not only does it make significant changes to the API necessary, it has
> > put the entire shrinker path under an SRCU critical section. AIUI,
> this means while the shrinkers are running the RCU grace period
> cannot expire and no RCU freed memory will actually get freed until
> the srcu read lock is dropped by the shrinker.
> 
> Given the superblock shrinkers are freeing dentry and inode objects
> by RCU freeing, this is also a fairly significant change of
> behaviour. i.e.  cond_resched() in the shrinker processing loops no
> > longer allows RCU grace periods to expire and have memory freed while
> > the shrinkers are running.
> 
> Are there problems this will cause? I don't know, but I'm pretty
> sure they haven't even been considered until now....
> 
> > Isn't it possible to hide it from a user and call the second part from a work
> > context automatically?
> 
> Nope, because it has to be done before the struct shrinker is freed.
> Those are embedded into other structures rather than being
> dynamically allocated objects.

We might consider revisiting this part if it helps solve other problems.
An extra memory allocation (or two) per mount point doesn't look
that expensive. Again, iff it helps with more important problems.
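
A rough userspace sketch of that idea, for illustration only: the shrinker
is allocated separately from its owner, so the owner can be freed at once
while the shrinker itself is freed later from a deferred context. Every
name below (shrinker_model, super_model, super_alloc, super_destroy,
deferred_free_run) is hypothetical and is not a kernel API; the deferred
list merely stands in for work that would run after the SRCU grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel structures. */
struct shrinker_model {
	bool registered;             /* cleared by the fast teardown step */
	struct shrinker_model *next; /* link on the deferred-free list */
};

struct super_model {
	struct shrinker_model *shrink; /* a pointer, not an embedded struct */
};

/* Shrinkers waiting to be freed by the slow, deferred step. */
static struct shrinker_model *deferred_list;

/* Allocate the owner and its shrinker as two separate objects. */
static struct super_model *super_alloc(void)
{
	struct super_model *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->shrink = calloc(1, sizeof(*s->shrink));
	if (!s->shrink) {
		free(s);
		return NULL;
	}
	s->shrink->registered = true;
	return s;
}

/*
 * Fast path: mark the shrinker dead, queue it for later freeing, and
 * free the owner immediately. The owner's lifetime no longer depends
 * on the shrinker's.
 */
static void super_destroy(struct super_model *s)
{
	assert(s && s->shrink);
	s->shrink->registered = false;
	s->shrink->next = deferred_list;
	deferred_list = s->shrink;
	free(s);
}

/*
 * Slow path: models work that would run once no reader can still see
 * the queued shrinkers. Returns the number of shrinkers freed.
 */
static int deferred_free_run(void)
{
	int freed = 0;

	while (deferred_list) {
		struct shrinker_model *sh = deferred_list;

		deferred_list = sh->next;
		free(sh);
		freed++;
	}
	return freed;
}
```

In this model the caller never has to pair two calls by hand: super_destroy()
is the only teardown entry point, and the deferred step needs nothing from
the already-freed owner.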

Thanks!
