Btrfs RAID1 corrupted after crash

linux-btrfs.vger.kernel.org archive mirror

* Btrfs RAID1 corrupted after crash
@ 2014-04-13 20:18 Maximilian Bräutigam
  2014-04-13 22:42 ` Duncan
From: Maximilian Bräutigam @ 2014-04-13 20:18 UTC
  To: linux-btrfs

Dear all,

Unfortunately, I am very desperate and would highly appreciate any help.
One week ago, I moved my entire system to btrfs to set up a RAID1. I
created the RAID across /dev/sdb and /dev/sdc (ordinary HDDs, no
partition table). Everything was working smoothly until my computer
crashed; at reboot I was unable to mount the device (my home dir)
and got the following messages:

[  125.834802] BTRFS info (device sdc): disk space caching is enabled
[  130.600101] BTRFS error (device sdc): block group 1268688879616 has wrong amount of free space
[  130.600113] BTRFS error (device sdc): failed to load free space cache for block group 1268688879616
[  130.751274] BTRFS critical (device sdc): corrupt leaf, slot offset bad: block=1268477591552,root=1, slot=137
[  130.751659] BTRFS critical (device sdc): corrupt leaf, slot offset bad: block=1268477591552,root=1, slot=137

So I tried clearing the cache with the mount option clear_cache, but
the problem persisted and I still could not mount it:

[  368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755: errno=-5 IO failure
[  368.159602] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2713: errno=-5 IO failure
[  368.165584] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[  368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545: errno=-5 IO failure
[  368.165787] BTRFS: error (device sdc) in open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
[  368.227161] BTRFS: open_ctree failed

Next, I tried to mount it manually with the degraded option:

# mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

I then ran btrfsck with the repair option, but I still cannot mount it.
Here you can find the dmesg and btrfsck outputs:
dmesg: http://pastebin.com/zsaKQ0h1
btrfsck: http://pastebin.com/xva6uJwT

Please, help me! ;( Are there other options to investigate my RAID or
to even temporarily mount it to get some data? What went wrong here?
What can I do? Why is a simple crash making my RAID unusable? Can I
use other tools for a recovery?

Again, any help is highly appreciated.
Best wishes,
Max
PS: Archlinux, linux-3.14-5, btrfs-progs-3.14-1

* Re: Btrfs RAID1 corrupted after crash
  2014-04-13 20:18 Btrfs RAID1 corrupted after crash Maximilian Bräutigam
@ 2014-04-13 22:42 ` Duncan
  2014-04-14  7:12   ` [PARTIALLY SOLVED] " Maximilian Bräutigam
From: Duncan @ 2014-04-13 22:42 UTC
  To: linux-btrfs

Maximilian Bräutigam posted on Sun, 13 Apr 2014 22:18:21 +0200 as
excerpted:

> Unfortunately, I am very desperate and would highly appreciate any help.
> One week ago, I moved my entire system to btrfs to set up a RAID1. I
> created the RAID across /dev/sdb and /dev/sdc (ordinary HDDs, no
> partition table). Everything was working smoothly until my computer
> crashed; at reboot I was unable to mount the device (my home dir)
> and got the following messages:

You did your research before switching to a new filesystem and know that 
(as the btrfs kernel config option implies, and as the mkfs.btrfs command 
said at least last I used it, tho that was the v3.12 version) btrfs isn't 
entirely stable yet, and that (even more than with fully stable 
filesystems, where the general principle still applies) you should keep 
tested-to-be-usable backups when running it, or by action if not words, 
you're demonstrating that you really don't care about the data you place 
on it and don't mind if it gets trashed, right?

Good.  Then you either have a backup and can simply mkfs from your rescue 
method and restore from that backup, or you've demonstrated by your 
actions that the data wasn't of any major value to you anyway. No big 
deal either way! =:^)

In case you didn't, well, you still have a reasonably good chance at 
recovery =:^), but regardless of whether it's recovered or not, do chalk 
this up to a learning experience and do your research and have those 
backups ready and tested next time, OK?

[snip dmesg output from first attempt to mount]

> So I tried clearing the cache with the mount option clear_cache

Good.  First thing to try. =:^)

> but the problem persisted and I still could not mount it:
> 
> [  368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755:
> errno=-5 IO failure
> [  368.159602] BTRFS: error (device sdc) in
> btrfs_run_delayed_refs:2713: errno=-5 IO failure
> [  368.165584] BTRFS warning (device sdc): Skipping commit of aborted
> transaction.
> [  368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545:
> errno=-5 IO failure
> [  368.165787] BTRFS: error (device sdc) in
> open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
> [  368.227161] BTRFS: open_ctree failed

OK, there are several things to try based on that output...

> Next, I tried to mount it manually with the degraded option:
> 
> # mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try dmesg | tail
>        or so.

FWIW, the degraded option could be used if you didn't have both devices 
available, but the above dmesg got beyond that, so degraded isn't likely 
to help here.

> I then ran btrfsck with the repair option, but I still cannot mount
> it.

That was a mistake, as you'd have known if you had read this list before 
you tried your btrfs test.  btrfsck --repair can fix some problems, but 
the code is rather new and not well tested and it can also make some 
problems it doesn't know about worse, so the recommendation is to try it 
last, after all other attempts to either fix the problem or simply 
recover the data have failed and the next step would be a mkfs, so you're 
not losing anything by trying it anyway.  Either that, or run it in 
repair mode (without --repair it's OK since it's read-only and thus can't 
do further damage) only after being told to do so by a dev who can read 
the output from the read-only run and other diagnostics and is thus 
relatively confident it will fix the problems without doing further 
damage.
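
To make the read-only run concrete, here is a minimal sketch, not a
definitive procedure. It assumes /dev/sdb from the original post; the
helper name check_readonly, the RUN switch and the log file name are
hypothetical. By default it only prints the command so it can be
reviewed before anything touches the disk.

```shell
# Print (or, with RUN=1, execute) a read-only btrfsck run and keep its
# output, so it can be posted to the list before anyone uses --repair.
# Without --repair, btrfsck does not write to the device.
# The default log file name is a hypothetical choice.
check_readonly() {
    dev="$1"
    log="${2:-btrfsck-readonly.log}"
    if [ "${RUN:-0}" = "1" ]; then
        btrfsck "$dev" 2>&1 | tee "$log"
    else
        echo "btrfsck $dev 2>&1 | tee $log"
    fi
}
```

Calling check_readonly /dev/sdb prints the command; RUN=1
check_readonly /dev/sdb would actually run it. Newer btrfs-progs spell
the same tool btrfs check.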

> Here you can find the dmesg and btrfsck outputs:
> dmesg: http://pastebin.com/zsaKQ0h1
> btrfsck: http://pastebin.com/xva6uJwT
> 
> Please, help me! ;( Are there other options to investigate my RAID or to
> even temporarily mount it to get some data? What went wrong here? What
> can I do? Why is a simple crash making my RAID unusable? Can I use other
> tools for a recovery?

> Archlinux, linux-3.14-5, btrfs-progs-3.14-1

Good.  You're using current kernel and tools. =:^)

As hinted above, there are indeed additional tools to try, and there's a 
fair chance you can at least recover some/most of the data.  =:^)  Tho 
you didn't do yourself any favors running btrfsck --repair before trying 
them. =:^(

Please read the wiki and manpages before doing anything else so as to 
increase the chances of recovery without further damage, but there's the 
recovery mount option (which often works best with ro), and tools to 
bypass the log tree and to recover from previous tree roots, among other 
things.

wiki start page (suitable for memory or bookmarking):

https://btrfs.wiki.kernel.org

Here's the wiki's btrfsck page, which has a nice list of other things to 
try before you use it with --repair (and a link to a list regular's page 
with further detail, too); hopefully they will still work afterward 
as well.  Given the log-tree error in your dmesg, the btrfs-zero-log tool 
might be useful.  But I'd definitely try mount -o ro,recovery first, and 
if that works, get everything to backup before trying anything else.

https://btrfs.wiki.kernel.org/index.php/Btrfsck
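
The order suggested above, least invasive first, can be sketched as a
dry-run helper. This is only an illustration under stated assumptions:
the function name recovery_plan and the backup path /mnt/backup are
hypothetical, the device and mount point come from the original post,
and nothing is executed.

```shell
# Print the recovery steps from this message in the suggested order.
# The function only echoes commands; it never runs them.
recovery_plan() {
    dev="$1"; mnt="$2"; backup="$3"
    # 1) read-only mount that may fall back to an older tree root
    echo "mount -t btrfs -o ro,recovery $dev $mnt"
    # 2) if that mounts, copy everything off before trying anything else
    echo "cp -a $mnt/. $backup/"
    # 3) discard the log tree that failed to replay (writes to the device)
    echo "btrfs-zero-log $dev"
    # 4) last resort, only when the next step would be mkfs anyway
    echo "btrfsck --repair $dev"
}
```

Usage: recovery_plan /dev/sdb /mnt/sonst /mnt/backup. As far as I know,
newer kernels and btrfs-progs spell steps 1 and 3 as the usebackuproot
mount option and btrfs rescue zero-log.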

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash
  2014-04-13 22:42 ` Duncan
@ 2014-04-14  7:12   ` Maximilian Bräutigam
  2014-04-14 11:02     ` Duncan
From: Maximilian Bräutigam @ 2014-04-14  7:12 UTC
  To: linux-btrfs

On 14.04.2014 00:42, Duncan wrote:
> [snip full quote of the previous message]

Hi Duncan,

I was not really afraid for my data, since I have several external backups
of the important data and git repos of what I do for work. But I would
have lost some very recent photos, which would not have been nice. And I
am (still) dreading having to set up and configure a properly working home
dir on another fs again. That is just time-consuming. Furthermore, I
thought that btrfs had reached a certain level of maturity, which to me
implies some degree of fail-safety. But "filesystem disk format is no
longer unstable" [1] obviously does not mean that there is an intact
ecosystem of repair tools (or, better said, one program that simply tries
its best).

I tried several things, following [2].

1) btrfs restore
This did not really work; it recovered only a few GB of my data.

2) Then I noticed some "transid verify failed" messages, so I ran
btrfs-zero-log DEVICE

3) After that I was able to mount my volume again, so I could save my
latest photos. ;)
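
For reference, the three steps above correspond roughly to the commands
below. This is a sketch, not a transcript of what was typed: the helper
name steps_taken and the rescue path /mnt/rescue are hypothetical, the
device and mount point are the ones from the original post, and the
function only prints the commands.

```shell
# Print the commands behind steps 1) to 3); nothing is executed.
steps_taken() {
    dev="$1"; rescue="$2"; mnt="$3"
    # 1) copy files off the unmounted device onto a separate filesystem
    echo "btrfs restore $dev $rescue"
    # 2) clear the log tree, which failed to replay at mount time
    echo "btrfs-zero-log $dev"
    # 3) mount normally again
    echo "mount -t btrfs $dev $mnt"
}
```

Usage: steps_taken /dev/sdb /mnt/rescue /mnt/sonst. The rescue target
should be a different, healthy filesystem with enough free space.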

When I mount my volume with autodefrag,compress=lzo,subvolid=0, it is
initially mounted "rw". When I then copy some data, e.g. with rsync, it
switches to "ro" at some point. I noticed this when I wanted to scrub the
devices, which naturally only works on writable mounts. And, I don't know
why, it is still not possible to boot from the device.
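
A mount that flips from "rw" to "ro" under load has most likely hit an
error and been forced read-only by the kernel, and dmesg should say
why. A small sketch for checking the current state from
/proc/mounts-style input follows; the helper name mount_mode is
hypothetical, and /mnt/sonst is the mount point from the original post.

```shell
# Report "rw" or "ro" for a mount point, reading mount table lines
# (device, mount point, fs type, options, ...) from stdin.
mount_mode() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        mode = "rw"
        # only an exact "ro" option counts, not e.g. errors=remount-ro
        for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        print mode
    }'
}
```

Usage: mount_mode /mnt/sonst < /proc/mounts, run before and after the
rsync. If the answer changes, dmesg | grep -i btrfs should show the
error that triggered the switch.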

Things to do next: try again with the recovery option. If that does not
work: roll back to ext4. But I really like the idea behind COW,
subvolumes, no partitioning, RAID and everything in one fs. Snapshots
against user mistakes, RAID against disk failure: perfectly safe, if
it were not for the fs itself.

So far, so good. The problem is that even if I can get back to a fully
working device or RAID, the workload (that I have to put in just
because my computer crashed) is much too high for something as
fundamental as a home dir.

Duncan, I appreciate your email. Unfortunately, the only thing I have
learned so far is to give btrfs some more decades to age. ;)

Best wishes and thanks again,
Max

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page
[2] https://unix.stackexchange.com/questions/32440/how-do-i-fix-btrfs

* Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash
  2014-04-14  7:12   ` [PARTIALLY SOLVED] " Maximilian Bräutigam
@ 2014-04-14 11:02     ` Duncan
From: Duncan @ 2014-04-14 11:02 UTC
  To: linux-btrfs

Maximilian Bräutigam posted on Mon, 14 Apr 2014 09:12:44 +0200 as
excerpted:

> Duncan, I appreciate your email. Unfortunately, the only thing I have
> learned so far is to give btrfs some more decades to age. ;)

Well maybe not decades, but a year or possibly two, or if you're 
conservative and haven't even switched from ext3 to ext4 yet, perhaps 
five...

Tho it's certainly getting better.  But there are still some sore 
spots left, and as you said, the potential workload's still a bit high 
for people who just want it to work.
