From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Lucas Stach <l.stach@pengutronix.de>, linux-xfs@vger.kernel.org
Subject: Re: Regular FS shutdown while rsync is running
Date: Thu, 24 Jan 2019 07:45:28 +1100 [thread overview]
Message-ID: <20190123204528.GL4205@dastard> (raw)
In-Reply-To: <20190123130306.GB27284@bfoster>
On Wed, Jan 23, 2019 at 08:03:07AM -0500, Brian Foster wrote:
> On Wed, Jan 23, 2019 at 07:11:53AM -0500, Brian Foster wrote:
> > On Wed, Jan 23, 2019 at 12:14:17PM +0100, Lucas Stach wrote:
> > > build_ino_tree(), which is shared between XFS_BTNUM_INO and
> > > XFS_BTNUM_FINO btree rebuilds contains the following in one of the
> > > loops:
> > >
> > > if (lptr->num_recs_pb > 0)
> > > prop_ino_cursor(mp, agno, btree_curs,
> > > ino_rec->ino_startnum, 0);
> > >
> > > prop_ino_cursor() calls libxfs_btree_init_block() with the btnum
> > > parameter being a fixed XFS_BTNUM_INO.
Nice find, Lucas!
> > It's been a while since I've looked at the guts of this code but that
> > definitely does not look right. build_ino_tree() is where we abstract
> > construction of the different tree types. Nice catch!
> >
> > If this is the problem, I'll have to take a closer look to figure out
> > why this problem isn't more prevalent..
> >
>
> Hmm... so it looks like this is the code responsible for _growing_ the
> interior nodes of the tree (i.e., level > 0). If I'm following the code
> correctly, the high level algorithm is to first set up a block for each
> level of the tree. This occurs early in build_ino_tree() and uses the
> correct btnum. We then start populating the leaf nodes with the finobt
> records that have previously been built up in-core. For each leaf block,
> we populate the higher level block with the key based on the first
> record in the leaf then fill in the remaining records in the leaf. This
> repeats until the block at the next level up is full, at which point we
> grab a new level > 0 block and initialize it with the inobt magic.
>
> I haven't confirmed, but I think this basically means that we'd
> initialize every intermediate block after the left-most block at each
> level in the tree with the inobt magic. I think Dave already recognized
> in the other subthread that you have a 2 block level 1 tree. I suspect
> the only reasons we haven't run into this until now are that the finobt
> is intended to remain fairly small by only tracking free inode records
> and thus prioritizing allocation from those records (removing them from
> the tree) and this is limited to users who repair a filesystem in such a
> state.
>
> I'm guessing you have a large enough set of inodes and a sustained
> enough deletion pattern in your workload that you eventually free enough
> inodes across a large enough set of records to populate a multi-block
> level 1 tree. Then as soon as you try to allocate from that side of the
> tree the finer grained checks in __xfs_btree_check_[l|s]block() discover
> the wrong magic and generate the error.
Yeah, this is the "lots of sparse free inodes" sort of filesystem
that finobt is intended to really help with. :/
> The positive here is that this isn't really a coherency problem with the
> filesystem. The finobt itself is coherent, consistent with the inobt and
> the magic value doesn't really dictate how the trees are used/updated at
> runtime. This means that the only real corruption should be the magic
> value.
*nod*
Which means it should be fixable by a bit of xfs_db magic to find
and rewrite the magic numbers without needing a full repair to run.
> I am wondering whether this warrants a special case in the magic
> checking to convert the runtime internal error into more of a warning to
> unmount and run xfs_repair (similar to how we detected the agfl size
> mismatch problem). Dave, any thoughts on that?
I don't think it has the scope and breadth of the AGFL issue - there
isn't an actual fatal corruption on disk that will lead to loss or
corruption of user data. It's just the wrong magic number being
placed in the btree block.
IMO, after fixing the repair bug, we should:
a) make the inobt/finobt block verifier more robust to make
sure we have the correct magic number for the btree type. In
hindsight, if we did that in the first place then the
repair bug would have not escaped into the wild.
b) write a xfs_db finobt traversal command that rewrites the
magic numbers of all the blocks in the tree. Probably with
an xfs_admin wrapper to make it nice for users who need to
run it.
c) audit all the other verifiers to determine whether the
combined magic number checks can/should be split so in
future we detect this class of bug during development and
testing.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2019-01-23 20:45 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-26 13:29 Regular FS shutdown while rsync is running Lucas Stach
2018-11-26 15:32 ` Brian Foster
2019-01-21 10:41 ` Lucas Stach
2019-01-21 13:01 ` Brian Foster
2019-01-21 16:23 ` Lucas Stach
2019-01-21 18:11 ` Brian Foster
2019-01-21 18:21 ` Lucas Stach
2019-01-22 10:39 ` Lucas Stach
2019-01-22 13:02 ` Brian Foster
2019-01-23 11:14 ` Lucas Stach
2019-01-23 12:11 ` Brian Foster
2019-01-23 13:03 ` Brian Foster
2019-01-23 18:58 ` Brian Foster
2019-01-24 8:53 ` Lucas Stach
2019-01-23 20:45 ` Dave Chinner [this message]
2019-01-24 13:31 ` Brian Foster
2019-01-24 19:11 ` Brian Foster
2019-01-28 22:34 ` Dave Chinner
2019-01-29 13:46 ` Brian Foster
2019-01-21 21:18 ` Dave Chinner
2019-01-22 9:15 ` Lucas Stach
2019-01-22 21:41 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190123204528.GL4205@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=l.stach@pengutronix.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).