From: Al Viro <viro@ZenIV.linux.org.uk>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] fs: don't scan the inode cache before SB_ACTIVE is set
Date: Mon, 26 Mar 2018 06:51:37 +0100
Message-ID: <20180326055137.GP30522@ZenIV.linux.org.uk>
In-Reply-To: <20180326053151.GO30522@ZenIV.linux.org.uk>
On Mon, Mar 26, 2018 at 06:31:51AM +0100, Al Viro wrote:
> On Mon, Mar 26, 2018 at 03:35:03PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > We recently had an oops reported on a 4.14 kernel in
> > xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
> > and so the m_perag_tree lookup walked into lala land.
> >
> > We found a mount in a failed state, blocked on the shrinker rwsem
> > here:
> >
> > mount_bdev()
> > deactivate_locked_super()
> > unregister_shrinker()
> >
> > Essentially, the machine was under memory pressure when the mount
> > was being run, xfs_fs_fill_super() failed after allocating the
> > xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and
> > freed the xfs_mount, but the sb->s_fs_info field still pointed to
> > the freed memory. Hence when the superblock shrinker then ran
> > it fell off the bad pointer.
> >
> > This is reproduced by using the mount_delay sysfs control as added
> > in the previous patch. It produces an oops down this path during the
> > stalled mount:
>
> > The problem is that the superblock shrinker is running before the
> > filesystem structures it depends on have been fully set up. i.e.
> > the shrinker is registered in sget(), before ->fill_super() has been
> > called, and the shrinker can call into the filesystem before
> > fill_super() does its setup work.
>
> Wait a sec... How the hell does it get through trylock_super() before
> ->s_root is set and ->s_umount is unlocked?
I see... So basically the story is:
* super_cache_count() lacks trylock_super(), making it possible for it to
be called too early, on a half-set-up superblock.
* it can't be called too late (during fs shutdown), since the shrinker is
unregistered before the call of ->kill_sb()
* making sure it won't get called too early can be done by checking SB_ACTIVE.
It's potentially racy, though - don't we need a barrier between setting the
things up and setting SB_ACTIVE?
And that, BTW, means that we want SB_BORN instead of SB_ACTIVE - unlike the
latter, the former is set only in one place. So I'd suggest switching to
checking that, with a barrier pair added (one in mount_fs() before setting
the sucker, another in super_cache_count() before doing the scan).
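For illustration only, here is a userspace sketch of the barrier pair suggested above, written with C11 atomics rather than kernel primitives. Every name in it (demo_sb, demo_fs_info, DEMO_SB_BORN, demo_mount_fs(), demo_cache_count()) is hypothetical, standing in for struct super_block, ->s_fs_info, SB_BORN, mount_fs() and super_cache_count(); the release and acquire fences play the roles of smp_wmb() and smp_rmb(). It models the ordering argument under those assumptions and is not the patch that was actually posted.

```c
/*
 * Hypothetical userspace model of "publish setup via a flag, check the
 * flag before scanning". Not kernel code: the fences below stand in for
 * smp_wmb() (writer) and smp_rmb() (reader).
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_SB_BORN 0x1u                /* stand-in for SB_BORN */

struct demo_fs_info {
	long nr_cached_inodes;           /* stand-in for fs-private reclaim state */
};

struct demo_sb {
	atomic_uint s_flags;             /* only DEMO_SB_BORN is modelled */
	struct demo_fs_info *s_fs_info;  /* plain pointer, published via the flag */
};

static void demo_sb_init(struct demo_sb *sb)
{
	atomic_init(&sb->s_flags, 0u);
	sb->s_fs_info = NULL;
}

/* Writer side: finish the setup work, then publish it by setting the flag. */
static void demo_mount_fs(struct demo_sb *sb, struct demo_fs_info *info)
{
	/* The flag is set in exactly one place, once per superblock. */
	assert(!(atomic_load_explicit(&sb->s_flags, memory_order_relaxed) &
		 DEMO_SB_BORN));

	sb->s_fs_info = info;                        /* "fill_super" work */
	atomic_thread_fence(memory_order_release);   /* ~ smp_wmb() */
	atomic_fetch_or_explicit(&sb->s_flags, DEMO_SB_BORN,
				 memory_order_relaxed);
}

/* Reader side: report nothing until the flag is seen, then order the reads. */
static long demo_cache_count(struct demo_sb *sb)
{
	unsigned int flags = atomic_load_explicit(&sb->s_flags,
						  memory_order_relaxed);

	if (!(flags & DEMO_SB_BORN))
		return 0;                            /* half-set-up: don't touch s_fs_info */
	atomic_thread_fence(memory_order_acquire);   /* ~ smp_rmb() */
	return sb->s_fs_info->nr_cached_inodes;
}
```

In this model the reader dereferences s_fs_info only after it has observed the flag and executed its acquire fence, which pairs with the writer's release fence. A mount that fails before demo_mount_fs() is reached never sets the flag, so the count callback keeps returning 0 instead of chasing a stale pointer.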