linux-btrfs.vger.kernel.org archive mirror
From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Eric Sandeen <sandeen@redhat.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: What is the vision for btrfs fs repair?
Date: Thu, 09 Oct 2014 07:29:23 -0400
Message-ID: <54367193.6000202@gmail.com>
In-Reply-To: <54358C77.2070808@redhat.com>

On 2014-10-08 15:11, Eric Sandeen wrote:
> I was looking at Marc's post:
>
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
>
> and it feels like there isn't exactly a cohesive, overarching vision for
> repair of a corrupted btrfs filesystem.
>
> In other words - I'm an admin cruising along, when the kernel throws some
> fs corruption error, or for whatever reason btrfs fails to mount.
> What should I do?
>
> Marc lays out several steps, but to me this highlights that there seem to
> be a lot of disjoint mechanisms out there to deal with these problems;
> mostly from Marc's blog, with some bits of my own:
>
> * btrfs scrub
> 	"Errors are corrected along if possible" (what *is* possible?)
> * mount -o recovery
> 	"Enable autorecovery attempts if a bad tree root is found at mount time."
> * mount -o degraded
> 	"Allow mounts to continue with missing devices."
> 	(This isn't really a way to recover from corruption, right?)
> * btrfs-zero-log
> 	"remove the log tree if log tree is corrupt"
> * btrfs rescue
> 	"Recover a damaged btrfs filesystem"
> 	chunk-recover
> 	super-recover
> 	How does this relate to btrfs check?
> * btrfs check
> 	"repair a btrfs filesystem"
> 	--repair
> 	--init-csum-tree
> 	--init-extent-tree
> 	How does this relate to btrfs rescue?
> * btrfs restore
> 	"try to salvage files from a damaged filesystem"
> 	(not really repair, it's disk-scraping)
>
>
> What's the vision for, say, scrub vs. check vs. rescue?  Should they repair the
> same errors, only online vs. offline?  If not, what class of errors does one fix vs.
> the other?  How would an admin know?  Can btrfs check recover a bad tree root
> in the same way that mount -o recovery does?  How would I know if I should use
> --init-*-tree, or chunk-recover, and what are the ramifications of using
> these options?
>
> It feels like recovery tools have been badly splintered, and if there's an
> overarching design or vision for btrfs fs repair, I can't tell what it is.
> Can anyone help me?

Well, based on my understanding:
* btrfs scrub is intended to be almost exactly equivalent to scrubbing a 
RAID volume; that is, it fixes disparity between multiple copies of the 
same block.  IOW, it isn't really repair per se, but more preventative 
maintenance.  Currently, it only works for cases where you have multiple 
copies of a block (dup, raid1, and raid10 profiles), but support is 
planned for error correction of raid5 and raid6 profiles.
* mount -o recovery I don't know much about, but AFAICT, it's more for 
dealing with metadata-related FS corruption.
* mount -o degraded is used to mount a fs configured for a raid storage 
profile with fewer devices than the profile minimum.  It's primarily so 
that you can get the fs into a state where you can run 'btrfs device 
replace'.
* btrfs-zero-log only deals with log tree corruption.  This would be 
roughly equivalent to zeroing out the journal on an XFS or ext4 
filesystem, and should almost never be needed.
* btrfs rescue is intended for low-level recovery from corruption on an 
offline fs.
     * chunk-recover I'm not entirely sure about, but I believe it's 
like scrub for a single chunk on an offline fs
     * super-recover is for dealing with a corrupted superblock, and 
tries to replace it with one of the other copies (which hopefully isn't 
corrupted)
* btrfs check is intended to (eventually) be equivalent to the fsck 
utility for most other filesystems.  Currently, it's relatively good at 
identifying corruption, but less so at actually fixing it.  There are, 
however, some things that it won't catch, like a superblock pointing to 
a corrupted root tree.
* btrfs restore is essentially disk scraping, but with built-in 
knowledge of the filesystem's on-disk structure, which makes it more 
reliable than more generic tools like scalpel for files that are too big 
to fit in the metadata blocks, and it is pretty much essential for 
dealing with transparently compressed files.
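
To make the scrub point above a bit more concrete, here is a toy model of 
what "fixing disparity between multiple copies" means.  This is only a 
sketch under assumptions: the function names are made up for illustration, 
and zlib's crc32 stands in for the checksum (btrfs itself used crc32c at 
the time).  It is not how the kernel code is structured.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in checksum for illustration only (not crc32c).
    return zlib.crc32(data)


def read_block(copies, expected):
    """Return the first copy whose checksum matches; leave the others alone."""
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("no good copy left")


def scrub_block(copies, expected):
    """Rewrite every bad copy from a good one; return how many were repaired."""
    good = read_block(copies, expected)
    repaired = 0
    for i, data in enumerate(copies):
        if checksum(data) != expected:
            copies[i] = good
            repaired += 1
    return repaired


original = b"important data"
expected = checksum(original)
copies = [b"important dat\x00", original]  # first copy has bit-rot

assert read_block(copies, expected) == original  # a plain read still succeeds
assert copies[0] != original                     # ...but the bad copy stays bad
print(scrub_block(copies, expected))             # number of copies rewritten
assert copies[0] == original                     # scrub fixed it in place
```

With only one copy in the list (the single profile), read_block has nothing 
to fall back on, which is why scrub can detect but not repair in that case.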

In general, my personal procedure for handling a misbehaving BTRFS 
filesystem is:
* Run btrfs check on it WITHOUT ANY OTHER OPTIONS to try to identify 
what's wrong
* Try mounting it using -o recovery
* Try mounting it using -o ro,recovery
* Use -o degraded only if it's a BTRFS raid set that lost a disk
* If btrfs check AND dmesg both seem to indicate that the log tree is 
corrupt, try btrfs-zero-log
* If btrfs check indicated a corrupt superblock, try btrfs rescue 
super-recover
* If all of the above fails, ask for advice on the mailing list or IRC
Also, you should be running btrfs scrub regularly to correct bit-rot and 
force remapping of blocks with read errors.  While BTRFS technically 
handles both transparently on reads, it only corrects things on disk when 
you do a scrub.
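
The procedure above can be written down as a small dry-run planner.  This 
is just a sketch: it only builds the list of commands in the order I'd try 
them and runs nothing.  The function name, the three boolean flags, and the 
/dev/sdX and /mnt placeholders are made up for illustration; the flags are 
things you would decide yourself from the btrfs check output and dmesg.  
The command spellings are the ones used in this mail, so check the man 
pages for your installed btrfs-progs before running any of them.

```python
def triage_plan(device, mountpoint, lost_disk=False,
                log_tree_corrupt=False, superblock_corrupt=False):
    """Return the steps to try, in order.  Nothing is executed."""
    plan = [
        # Diagnosis only: no --repair or other options.
        f"btrfs check {device}",
        f"mount -o recovery {device} {mountpoint}",
        f"mount -o ro,recovery {device} {mountpoint}",
    ]
    if lost_disk:
        # Only for a BTRFS raid set that is missing a device.
        plan.append(f"mount -o degraded {device} {mountpoint}")
    if log_tree_corrupt:
        # Only when btrfs check AND dmesg both point at the log tree.
        plan.append(f"btrfs-zero-log {device}")
    if superblock_corrupt:
        plan.append(f"btrfs rescue super-recover {device}")
    plan.append("ask on the mailing list or IRC")
    return plan


# Hypothetical device and mount point.
for step in triage_plan("/dev/sdX", "/mnt", log_tree_corrupt=True):
    print(step)
```

Each later step is only worth trying if the earlier ones didn't get the 
filesystem mounted.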

