public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 0/9] xfsprogs: big, broken filesystems cause pain
Date: Tue, 22 Dec 2015 08:37:00 +1100
Message-ID: <1450733829-9319-1-git-send-email-david@fromorbit.com>

Hi folks,

This is a work-in-progress patchset that I've spent the last
week on, trying to get xfs_repair to run cleanly through a busted
30TB filesystem image. The first 2 patches were needed just to get
metadump to create the filesystem image; the third is helpful in
telling me exactly how much of the 38GB of metadata has been
restored.

The next two patches parallelise parts of the repair process;
uncertain inode processing in phase 3 was taking more than 20
minutes, and phase 7 was taking almost 2 hours. Both are trivially
parallelisable - phase 3 is now down to under 5 minutes, but I
haven't fully tested the phase 7 code because I haven't managed to
get a full repair of the original image past phase 6 since I wrote
this patch. I have run it through xfstests many times, but that's
not the same as having it process and correct the link counts on
several million inodes....
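
To make "trivially parallelisable" concrete, here is a toy model of
the phase 7 link count correction, split into one independent job per
AG. It is a sketch only, not the code in patch 4: the per-AG split,
the function names and the 'disk_nlink'/'counted_nlink' fields are
all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fix_link_counts_in_ag(inodes):
    """Correct the link count of every inode in one AG.

    `inodes` is a list of dicts with two hypothetical keys:
    'disk_nlink' (what the inode claims) and 'counted_nlink'
    (what the earlier directory scan counted).
    Returns the number of inodes that were corrected.
    """
    fixed = 0
    for ino in inodes:
        if ino["disk_nlink"] != ino["counted_nlink"]:
            ino["disk_nlink"] = ino["counted_nlink"]
            fixed += 1
    return fixed


def fix_link_counts(ags, nthreads=4):
    """Run one job per AG; no job touches another AG's inodes."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return sum(pool.map(fix_link_counts_in_ag, ags))


# Three AGs; two inodes carry a wrong link count.
ags = [
    [{"disk_nlink": 1, "counted_nlink": 2},
     {"disk_nlink": 3, "counted_nlink": 3}],
    [{"disk_nlink": 0, "counted_nlink": 1}],
    [],
]
corrected = fix_link_counts(ags)
```

Because each job only reads and writes inodes in its own AG, the
workers need no locking between them in this model.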

Patch 6 was the first crash problem I fixed - this is a 17 year old
bug in the directory code, and will also need to be fixed in the
kernel.

Patches 7-9 fix the major problem that was causing issues - the
cache's handling of buffers that were dirty but still corrupt.
xfs_repair doesn't fix all the problems in a buffer in a single pass
- it may make modifications in early phases and then use those
modifications to trigger specific repairs in later phases. However,
when you have 38GB of metadata to check and correct, the buffer
cache is not going to hold all these buffers, and so the reclaim
algorithms are going to have an impact.

That impact was pretty bad - the partially corrected buffers were
being tossed away because their write verifiers were failing, and
so they never made it to disk. When a later phase re-read
the buffer, it pulled the original uncorrected, corrupt blocks back in
from disk, and so phases 5, 6 and 7 were tripping over corruptions
that were assumed to be fixed. That was causing random memory
corruptions, use after free, etc.
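
The failure mode can be reproduced with a toy model. This is not the
libxfs cache: the class, the dict-based "buffers" and the verifier
are hypothetical stand-ins, and every cached buffer is treated as
dirty to keep the sketch short.

```python
class DiscardingCache:
    """Toy cache that drops a buffer when its write verifier fails."""

    def __init__(self, disk, verifier, capacity):
        self.disk = disk          # block number -> on-disk contents
        self.verifier = verifier  # True if the contents may be written
        self.capacity = capacity
        self.bufs = {}            # cached buffers, oldest first

    def read(self, blkno):
        if blkno not in self.bufs:
            self._reclaim()
            self.bufs[blkno] = dict(self.disk[blkno])
        return self.bufs[blkno]

    def _reclaim(self):
        while len(self.bufs) >= self.capacity:
            blkno = next(iter(self.bufs))  # oldest buffer
            buf = self.bufs.pop(blkno)
            if self.verifier(buf):
                self.disk[blkno] = dict(buf)
            # else: write refused, in-memory repairs silently lost


def is_clean(buf):
    return all(v == "ok" for v in buf.values())


disk = {0: {"a": "bad", "b": "bad"}, 1: {"a": "ok", "b": "ok"}}
cache = DiscardingCache(disk, is_clean, capacity=1)

cache.read(0)["a"] = "ok"      # early phase: partial repair of block 0
cache.read(1)                  # memory pressure evicts block 0
reread = dict(cache.read(0))   # later phase: the repair of "a" is gone
```

After the eviction, `reread` holds the original corrupt contents of
block 0, so the later phase runs against metadata it assumes was
already fixed.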

These three patches are a pretty nasty hack to keep the dirty
buffers around until they are fully repaired. The whole userspace
libxfs buffer cache is really showing its limitations here; it
doesn't scale effectively, it doesn't isolate operations between
independent threads (i.e. per-ag threads), it doesn't handle dirty
objects or writeback failures sanely and it has an overly
complex cache abstraction that has only one user. Ultimately, we need
to rewrite it from scratch, but in the meantime we need to make
repair actually complete properly, hence these patches to hack
the necessary fixes into it.
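
The idea of keeping dirty buffers around until they are fully
repaired can be sketched with a toy model. It is not the
implementation in patches 7-9: the class, the side list for
unflushable buffers and the verifier are hypothetical, and every
cached buffer is treated as dirty.

```python
class RetainingCache:
    """Toy cache that parks buffers it is not allowed to write."""

    def __init__(self, disk, verifier, capacity):
        self.disk = disk          # block number -> on-disk contents
        self.verifier = verifier  # True if the contents may be written
        self.capacity = capacity
        self.bufs = {}            # reclaimable buffers, oldest first
        self.unflushable = {}     # buffers whose write was refused

    def read(self, blkno):
        if blkno in self.bufs:
            return self.bufs[blkno]
        self._reclaim()
        if blkno in self.unflushable:
            # Hand back the partially repaired copy, not the disk copy.
            buf = self.unflushable.pop(blkno)
        else:
            buf = dict(self.disk[blkno])
        self.bufs[blkno] = buf
        return buf

    def _reclaim(self):
        while len(self.bufs) >= self.capacity:
            blkno = next(iter(self.bufs))  # oldest buffer
            buf = self.bufs.pop(blkno)
            if self.verifier(buf):
                self.disk[blkno] = dict(buf)
            else:
                self.unflushable[blkno] = buf  # keep, off the reclaim list


def is_clean(buf):
    return all(v == "ok" for v in buf.values())


disk = {0: {"a": "bad", "b": "bad"}, 1: {"a": "ok", "b": "ok"}}
cache = RetainingCache(disk, is_clean, capacity=1)

cache.read(0)["a"] = "ok"   # early phase: partial repair
cache.read(1)               # eviction parks block 0 instead of dropping it
buf = cache.read(0)         # later phase sees the earlier repair
seen_by_later_phase = dict(buf)
buf["b"] = "ok"             # later phase finishes the repair
cache.read(1)               # block 0 now passes the verifier and is written
```

The cost in this model is that parked buffers are not bounded by
`capacity`, so memory use grows with the number of buffers that are
still corrupt.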

With these, repair is getting deep into phase 6 on the original
image, before failing while moving an inode to lost+found because the
inode has a mismatch between the bmbt size and the number of records
supposedly in the bmbt. That's a new failure I haven't seen before,
so there are still more fixes to come....

-Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 21+ messages
2015-12-21 21:37 Dave Chinner [this message]
2015-12-21 21:37 ` [PATCH 1/9] metadump: clean up btree block region zeroing Dave Chinner
2016-01-04 19:11   ` Brian Foster
2015-12-21 21:37 ` [PATCH 2/9] metadump: bounds check btree block regions being zeroed Dave Chinner
2016-01-04 19:11   ` Brian Foster
2015-12-21 21:37 ` [PATCH 3/9] xfs_mdrestore: correctly account bytes read Dave Chinner
2016-01-04 19:12   ` Brian Foster
2015-12-21 21:37 ` [PATCH 4/9] repair: parallelise phase 7 Dave Chinner
2016-01-04 19:12   ` Brian Foster
2015-12-21 21:37 ` [PATCH 5/9] repair: parallelise uncertin inode processing in phase 3 Dave Chinner
2016-01-04 19:12   ` Brian Foster
2015-12-21 21:37 ` [PATCH 6/9] libxfs: directory node splitting does not have an extra block Dave Chinner
2016-01-05 18:34   ` Brian Foster
2016-01-05 22:07     ` Dave Chinner
2015-12-21 21:37 ` [PATCH 7/9] libxfs: don't discard dirty buffers Dave Chinner
2016-01-05 18:34   ` Brian Foster
2015-12-21 21:37 ` [PATCH 8/9] libxfs: don't repeatedly shake unwritable buffers Dave Chinner
2016-01-05 18:34   ` Brian Foster
2015-12-21 21:37 ` [PATCH 9/9] libxfs: keep unflushable buffers off the cache MRUs Dave Chinner
2016-01-05 18:34   ` Brian Foster
2016-01-05 23:58     ` Dave Chinner
