Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash

From: "Maximilian Bräutigam" <m@xbra.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash
Date: Mon, 14 Apr 2014 09:12:44 +0200
Message-ID: <534B8A6C.4090808@xbra.de>
In-Reply-To: <pan$b81be$f24a3d23$40041e1f$3538cdeb@cox.net>

Am 14.04.2014 00:42, schrieb Duncan:
> Maximilian Bräutigam posted on Sun, 13 Apr 2014 22:18:21 +0200 as
> excerpted:
> 
>> unfortunately, I am very, very desperate and I highly appreciate any help.
>> One week ago, I moved my entire system to btrfs to set up a RAID1. I
>> created the RAID between devices /dev/sdb and /dev/sdc with no partition
>> table on normal HDDs. Everything was working smoothly until my computer
>> crashed and at reboot I was not able to mount the device (my home dir)
>> again and got the following messages:
> 
> You did your research before switching to a new filesystem and know that 
> (as the btrfs kernel config option implies, and as the mkfs.btrfs command 
> said at least last I used it, tho that was the v3.12 version) btrfs isn't 
> entirely stable yet, and that (even more than with fully stable 
> filesystems, where the general principle still applies) you should keep 
> tested-to-be-usable backups when running it, or by action if not words, 
> you're demonstrating that you really don't care about the data you place 
> on it and don't mind if it gets trashed, right?
> 
> Good.  Then you either have a backup and can simply mkfs from your rescue 
> method and restore from that backup, or you've demonstrated by your 
> actions that the data wasn't of any major value to you anyway. No big 
> deal either way! =:^)
> 
> In case you didn't, well, you still have a reasonably good chance at 
> recovery =:^), but regardless of whether it's recovered or not, do chalk 
> this up to a learning experience and do your research and have those 
> backups ready and tested next time, OK?
> 
> [snip dmesg output from first attempt to mount]
> 
>> So I cleared the cache with trying the mount option clear_cache
> 
> Good.  First thing to try. =:^)
> 
>> but it stayed problematic and I was not able to mount it:
>>
>> [  368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755:
>> errno=-5 IO failure
>> [  368.159602] BTRFS: error (device sdc) in
>> btrfs_run_delayed_refs:2713: errno=-5 IO failure
>> [  368.165584] BTRFS warning (device sdc): Skipping commit of aborted
>> transaction.
>> [  368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545:
>> errno=-5 IO failure
>> [  368.165787] BTRFS: error (device sdc) in
>> open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
>> [  368.227161] BTRFS: open_ctree failed
> 
> OK, there's several things to try based on that output...
> 
>> Now, if I tried to mount it manually with degraded option enabled:
>>
>> # mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>>        missing codepage or helper program, or other error
>>
>>        In some cases useful info is found in syslog - try dmesg | tail
>>        or so.
> 
> FWIW, the degraded option could be used if you didn't have both devices 
> available, but the above dmesg got beyond that, so degraded isn't likely 
> to help here.
> 
> 
>> Now I run btrfsck with repair option enabled but still I cannot mount
>> it.
> 
> That was a mistake, as you'd have known if you had read this list before 
> you tried your btrfs test.  btrfsck --repair can fix some problems, but 
> the code is rather new and not well tested and it can also make some 
> problems it doesn't know about worse, so the recommendation is to try it 
> last, after all other attempts to either fix the problem or simply 
> recover the data have failed and the next step would be a mkfs, so you're 
> not losing anything by trying it anyway.  Either that, or run it in 
> repair mode (without --repair it's OK since it's read-only and thus can't 
> do further damage) only after being told to do so by a dev who can read 
> the output from the read-only run and other diagnostics and is thus 
> relatively confident it will fix the problems without doing further 
> damage.
> 
>> Here you can find the dmesg and btrfsck outputs:
>> dmesg: http://pastebin.com/zsaKQ0h1
>> btrfsck: http://pastebin.com/xva6uJwT
>>
>> Please, help me! ;( Are there other options to investigate my RAID or to
>> even temporarily mount it to get some data? What went wrong here? What
>> can I do? Why is a simple crash making my RAID unusable? Can I use other
>> tools for a recovery?
> 
>> Archlinux, linux-3.14-5, btrfs-progs-3.14-1
> 
> Good.  You're using current kernel and tools. =:^)
> 
> As hinted above, there are indeed additional tools to try, and there's a 
> fair chance you can at least recover some/most of the data.  =:^)  Tho 
> you didn't do yourself any favors running btrfsck --repair before trying 
> them. =:^(
> 
> Please read the wiki and manpages before doing anything else so as to 
> increase the chances of recovery without further damage, but there's the 
> recovery mount option (which often works best with ro), and tools to 
> bypass the log tree and to recover from previous tree roots, among other 
> things.
> 
> wiki start page (suitable for memory or bookmarking):
> 
> https://btrfs.wiki.kernel.org
> 
> Here's the wiki's btrfsck page, which has a nice list of other things to 
> try before you use it with --repair (and a link to a list regular's page
> with further detail, too), but they will hopefully work afterward 
> as well.  Given the log-tree error in your dmesg, the btrfs-zero-log tool 
> might be useful.  But I'd definitely try mount -o ro,recovery first, and 
> if that works, get everything to backup before trying anything else.
> 
> https://btrfs.wiki.kernel.org/index.php/Btrfsck
> 

Hi Duncan,

I was not really afraid for my data, since I have several external
backups of the important data or git repos of what I do for work. But I
would have lost some very recent photos, which would not have been nice.
And I am (still) afraid of setting up and configuring a properly working
home dir on another fs again; that is just time-consuming. Furthermore,
I thought that btrfs had reached a certain level of maturity, which to
me implies some fail-safety. But "filesystem disk format is no longer
unstable" [1] obviously does not mean that there is an intact ecosystem
of repair tools (or, better said, one program that simply tries its
best).

I tried several things according to [2].

1) btrfs restore
This did not really work; it recovered only a few GB of my data.

2) Then I noticed some "transid verify failed" messages, so I ran
btrfs-zero-log DEVICE

3) After that I was able to mount my volume again, so I could save my
latest photos. ;)
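
As a rough sketch, the steps above can be written down in the
least-destructive-first order Duncan suggested (read-only recovery mount,
then btrfs restore, then btrfs-zero-log), which differs from the order I
actually used. The function only prints the commands and runs nothing.
print_recovery_plan is a made-up helper name, and /dev/sdX, /mnt/rescue
and /mnt/backup are placeholders, not paths from my system.

```shell
#!/bin/sh
# Print (do not run) a least-destructive-first recovery order for a btrfs
# device. All three arguments are placeholders chosen by the caller.
print_recovery_plan() {
    dev=$1
    mnt=$2
    dest=$3
    # 1. Read-only mount with the recovery option; nothing is written.
    printf 'mount -t btrfs -o ro,recovery %s %s\n' "$dev" "$mnt"
    # 2. If that fails, copy files off the unmounted device.
    printf 'btrfs restore %s %s\n' "$dev" "$dest"
    # 3. Only then discard the log tree, which modifies the filesystem.
    printf 'btrfs-zero-log %s\n' "$dev"
}

print_recovery_plan /dev/sdX /mnt/rescue /mnt/backup
```

Printing the plan first makes it easy to review each command, and to stop
as soon as one step succeeds, before anything touches the disks.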

When I mount my volume with autodefrag,compress=lzo,subvolid=0, I end up
with an "rw" mounted device. Then I copy some data, e.g. with rsync, and
at some point it turns "ro". I noticed this when I wanted to scrub the
devices, which naturally only works on writable mounts. And, I don't
know why, it is still not possible to boot from the device.
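
To see when the mount flips from "rw" to "ro", something like the
following could be run repeatedly during the copy. It is only a minimal
sketch: mount_mode is a hypothetical helper name, and it assumes input in
the /proc/mounts format (mount point in the second field, comma-separated
options in the fourth).

```shell
#!/bin/sh
# Report whether a mount point is currently "rw" or "ro".
# Reads /proc/mounts-style lines on stdin and prints nothing if the mount
# point is not listed. If it is listed more than once, the last entry wins.
# Usage: mount_mode /mnt/sonst < /proc/mounts
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++)
                if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
        }
        END { if (mode != "") print mode }
    '
}
```

Once it reports "ro", dmesg | tail should show why the kernel forced the
filesystem read-only.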

Things to do next: try again with the recovery option. If that does not
work: roll back to ext4. But I really like the idea behind COW,
subvolumes, no partitioning, RAID and everything in one fs. Snapshots
against user mistakes, RAID against disk failure: perfectly safe, if it
were not for the fs itself.

So far, so good. The problem is that even if I can get back to a fully
working device or RAID, the workload (that I have to put in just
because my computer crashed) is much too high for something as
fundamental as a home dir.

Duncan, I appreciate your email. Unfortunately, the only thing I have
learned so far is to give btrfs some more decades to age. ;)

Best wishes and thanks again,
Max

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page
[2] https://unix.stackexchange.com/questions/32440/how-do-i-fix-btrfs
