From: Brian Foster <bfoster@redhat.com>
To: Emmanuel Florac <eflorac@intellique.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>,
"'linux-xfs@vger.kernel.org'" <linux-xfs@vger.kernel.org>
Subject: Re: Weird xfs_repair error
Date: Mon, 24 Jul 2017 10:51:25 -0400
Message-ID: <20170724145125.GA12097@bfoster.bfoster>
In-Reply-To: <20170724162728.2a77797a@harpe.intellique.com>

On Mon, Jul 24, 2017 at 04:27:28PM +0200, Emmanuel Florac wrote:
> On Mon, 17 Jul 2017 13:11:29 -0400,
> Brian Foster <bfoster@redhat.com> wrote:
>
> > On Tue, Jul 11, 2017 at 03:23:52PM +0200, Emmanuel Florac wrote:
> > > On Fri, 7 Jul 2017 08:36:33 -0700,
> > > "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> > >
> > > > > fatal error -- name create failed in lost+found (28), filesystem
> > > > > may be out of space
> > > >
> > > > Would be helpful to have a metadump of this goobered-up lost+found
> > > > fs...
> > > >
> > >
> > > The metadump is here for anyone who would like to have a look:
> > >
> > > http://update2.intellique.com/pub/bign.metadump.xz
> > >
> > > The filesystem is about 115 TiB.
> > >
> >
> > Thanks for posting this. The first thing to note is that this
> > filesystem is severely corrupted.
>
> I had worked that out myself: many runs of xfs_repair (across several
> versions: v4.7, 4.9, 4.11...) can't get it into a stable state (i.e.
> one that won't crash when accessed).
>
> > Nonetheless, I've been playing
> > around with trying to get the latest for-next xfs_repair to run
> > through this fs (via gdb) and have definitely hit a few issues:
> >
> > - xfs_sb_verify() was changed to use bp->b_maps[0].bm_bn rather than
> >   bp->b_bn in libxfs commit 85428dd23f ("xfs: fix superblock
> >   inprogress check"). b_maps isn't allocated if the buffer was
> >   initialized with libxfs_initbuf() (rather than libxfs_initbuf_map()).
> >   This causes a sigsegv here, though only if I disable -O2 optimization
> >   for some reason that I haven't dug into yet.
> > - libxfs commit 0268fdc3fe ("xfs: remove xfs_trans_get_block_res")
> >   replaced the use of xfs_trans_get_block_res() in
> >   xfs_bmbt_alloc_block() which causes the -ENOSPC error. The previous
> >   function was hardcoded to return 1 such that this would never occur.
> > - The recently added directory sf format verifier (xfs_iformat_fork()
> >   -> xfs_dir2_sf_verify()) seems to cause a premature repair failure in
> >   at least one case.
> >
> > I was able to eventually get repair to complete with some quick hacks
> > to bypass those issues. I did have to run repair two or three times
> > to get the fs to a clean state. The fs mounts and otherwise appears
> > clean to xfs_repair, but it's not clear to me how usable the
> > resulting fs really is (repair is for fs consistency after all, not
> > necessarily data recovery). Note that lost+found appears to be loaded
> > with 18T of data across almost 2 million inodes. :/
>
> Thank you for your efforts, the loaded lost+found matches my own
> results, however some of the files there have been present for possibly
> years. In fact this filesystem has crashed several times in the past
> years but always went back online at some point, until... now.
>
> So what could I do, at least to be able to mount it and copy everything
> elsewhere before mkfs'ing it all again? Do you have an xfs_repair
> binary at hand that I could use, or should I dig into the latest
> source?
>
There are several fixes in-flight for the issues uncovered by this
metadump. I think you'll want to apply the following 3 patches to
xfsprogs:

  http://marc.info/?l=linux-xfs&m=150047977108174&w=2
  http://marc.info/?l=linux-xfs&m=150040481220074&w=2
  http://marc.info/?l=linux-xfs&m=150040481820076&w=2

Note that the last 2 patches are probably going to be reworked into a
different implementation. The idea here is ultimately to avoid running
the verifier in cases where it disrupts xfs_repair, so this
intermediate patch series should be good enough to build a custom binary
that allows xfs_repair to eventually piece the fs back together. You
could alternatively just hack xfs_dir2_sf_verify() to return 0.

Note that I would highly recommend testing whatever you build against
your metadump before running it on the original fs.

Brian

> --
> ------------------------------------------------------------------------
> Emmanuel Florac | Direction technique
> | Intellique
> | <eflorac@intellique.com>
> | +33 1 78 94 84 02
> ------------------------------------------------------------------------
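
The advice above to test a custom build against the metadump before
touching the original fs could be rehearsed roughly as follows. This is
only a sketch under stated assumptions: the image name bign.img and the
./repair/xfs_repair path (where an xfsprogs source tree is assumed to
leave the patched binary) are placeholders, not names from the thread.
The script only prints the commands unless DRY_RUN=0 is set. Keep in
mind that a metadump carries metadata only, and that the restored image
is sparse but has the apparent size of the whole filesystem (about
115 TiB here), so the host filesystem has to allow a file that large.

```shell
#!/bin/sh
# Sketch: rehearse a patched xfs_repair on an image restored from the metadump.
# File names and the binary path are assumptions; adjust before use.
set -eu

METADUMP_XZ=${METADUMP_XZ:-bign.metadump.xz}   # download from the URL posted in the thread
IMAGE=${IMAGE:-bign.img}                       # hypothetical name for the restored sparse image
XFS_REPAIR=${XFS_REPAIR:-./repair/xfs_repair}  # assumed location of the custom build
DRY_RUN=${DRY_RUN:-1}                          # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rehearse_repair() {
    # Decompress the metadump, keeping the .xz file (-k).
    run xz -dk "$METADUMP_XZ"
    # Rebuild a (sparse) filesystem image from the metadump.
    run xfs_mdrestore "${METADUMP_XZ%.xz}" "$IMAGE"
    # -f: the target is a regular file; -n: check only, modify nothing.
    # With -n, a non-zero exit status means corruption was found.
    run "$XFS_REPAIR" -f -n "$IMAGE" || echo "check-only pass reported problems"
    # Real repair pass on the image; it may need repeating, since the
    # thread reports two or three runs before the fs came out clean.
    run "$XFS_REPAIR" -f "$IMAGE"
    # Final check-only pass to see whether the image is now consistent.
    run "$XFS_REPAIR" -f -n "$IMAGE" || echo "image is still not clean, repeat the repair pass"
}

rehearse_repair
```

Running it as-is lists the commands for review; DRY_RUN=0 executes
them. Only once the patched binary gets the image to a clean state
without crashing does it make sense to point it at the real device.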