public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
Date: Thu, 16 Dec 2021 11:25:53 +1100	[thread overview]
Message-ID: <20211216002553.GV449541@dread.disaster.area> (raw)
In-Reply-To: <d797a746-dc8f-c74a-bb15-637b6b3ecbe7@sandeen.net>

On Wed, Dec 15, 2021 at 06:21:16PM -0600, Eric Sandeen wrote:
> On 12/15/21 6:17 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Got a report that a repeated crash test of a container host would
> > eventually fail with a log recovery error preventing the system from
> > mounting the root filesystem. It manifested as a directory leaf node
> > corruption on writeback like so:
> > 
> >   XFS (loop0): Mounting V5 Filesystem
> >   XFS (loop0): Starting recovery (logdev: internal)
> >   XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> >   XFS (loop0): Unmount and run xfs_repair
> >   XFS (loop0): First 128 bytes of corrupted metadata buffer:
> >   00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
> >   00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
> >   00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
> >   00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
> >   00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
> >   00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
> >   00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
> >   00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
> >   XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
> >   XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> >   XFS (loop0): log mount/recovery failed: error -117
> >   XFS (loop0): log mount failed
> > 
> > Tracing indicated that we were recovering changes from a transaction
> > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > That is, log recovery was overwriting a buffer with newer changes on
> > disk than was in the transaction. Tracing indicated that we were
> > hitting the "recovery immediately" case in
> > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > buffer.
> > 
> > The code was extracting the LSN correctly, then ignoring it because
> > the UUID in the buffer did not match the superblock UUID. The
> > problem arises because the UUID check uses the wrong UUID - it
> > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > correct matching sb_meta_uuid in it, it's just the code checked it
> > against the wrong superblock uuid.
> > 
> > The is no corruption in the filesystem, and failing to recover the
> > buffer due to a write verifier failure means the recovery bug did
> > not propagate the corruption to disk. Hence there is no corruption
> > before or after this bug has manifested, the impact is limited
> > simply to an unmountable filesystem....
> > 
> > This was missed back in 2015 during an audit of incorrect sb_uuid
> > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > magicda buffers....
> > 
> > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Kind of amazing it took so long to turn up and get found, eh?

Especially considering that I use sb_uuid != sb_meta_uuid for all
the test device filesystems I run fstests on and have for the past 5
years. I think that means we *never* run log recovery tests on test
devices in fstests.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2021-12-16  0:48 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
2021-12-16  0:21 ` Eric Sandeen
2021-12-16  0:25   ` Dave Chinner [this message]
2021-12-16  0:23 ` Dave Chinner
2021-12-16  1:10 ` Darrick J. Wong
2021-12-16  3:58   ` Darrick J. Wong
2021-12-16  5:16     ` Dave Chinner
2021-12-24  7:07 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211216002553.GV449541@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox