* Btrfs RAID1 corrupted after crash
@ 2014-04-13 20:18 Maximilian Bräutigam
2014-04-13 22:42 ` Duncan
0 siblings, 1 reply; 4+ messages in thread
From: Maximilian Bräutigam @ 2014-04-13 20:18 UTC (permalink / raw)
To: linux-btrfs
Dear all,
Unfortunately, I am quite desperate and would highly appreciate any help.
One week ago, I moved my entire system to btrfs to set up a RAID1. I
created the RAID across the devices /dev/sdb and /dev/sdc, normal HDDs
with no partition table. Everything worked smoothly until
my computer crashed; at reboot I was not able to mount the filesystem
(my home dir) again and got the following messages:
[ 125.834802] BTRFS info (device sdc): disk space caching is enabled
[ 130.600101] BTRFS error (device sdc): block group 1268688879616 has
wrong amount of free space
[ 130.600113] BTRFS error (device sdc): failed to load free space
cache for block group 1268688879616
[ 130.751274] BTRFS critical (device sdc): corrupt leaf, slot offset
bad: block=1268477591552,root=1, slot=137
[ 130.751659] BTRFS critical (device sdc): corrupt leaf, slot offset
bad: block=1268477591552,root=1, slot=137
So I tried clearing the cache with the clear_cache mount option, but
the problem persisted and I still could not mount it:
[ 368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755:
errno=-5 IO failure
[ 368.159602] BTRFS: error (device sdc) in
btrfs_run_delayed_refs:2713: errno=-5 IO failure
[ 368.165584] BTRFS warning (device sdc): Skipping commit of aborted
transaction.
[ 368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545:
errno=-5 IO failure
[ 368.165787] BTRFS: error (device sdc) in open_ctree:2839: errno=-5
IO failure (Failed to recover log tree)
[ 368.227161] BTRFS: open_ctree failed
Next, I tried to mount it manually with the degraded option enabled:
# mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
I then ran btrfsck with the repair option enabled, but I still cannot mount it.
Here you can find the dmesg and btrfsck outputs:
dmesg: http://pastebin.com/zsaKQ0h1
btrfsck: http://pastebin.com/xva6uJwT
Please help me! ;( Are there other options to investigate my RAID, or
to mount it even temporarily to get some data off? What went wrong here?
What can I do? Why does a simple crash make my RAID unusable? Can I
use other tools for recovery?
Again, any help is highly appreciated.
Best wishes,
Max
PS: Archlinux, linux-3.14-5, btrfs-progs-3.14-1
* Re: Btrfs RAID1 corrupted after crash
2014-04-13 20:18 Btrfs RAID1 corrupted after crash Maximilian Bräutigam
@ 2014-04-13 22:42 ` Duncan
2014-04-14 7:12 ` [PARTIALLY SOLVED] " Maximilian Bräutigam
0 siblings, 1 reply; 4+ messages in thread
From: Duncan @ 2014-04-13 22:42 UTC (permalink / raw)
To: linux-btrfs

Maximilian Bräutigam posted on Sun, 13 Apr 2014 22:18:21 +0200 as
excerpted:

> unfortunately, I am very very deperate and I highly appreciate any help.
> One week ago, I move my entire system to btrfs to setup a RAID1. I
> created the RAID between device /dev/sdb and /dev/sdc with no partition
> table on normal HDDs. Everything was working smoothly until my computer
> crashed and at reboot I was not able to mount the device (my home dir)
> again and got the following messages:

You did your research before switching to a new filesystem and know that
(as the btrfs kernel config option implies, and as the mkfs.btrfs command
said at least last I used it, tho that was the v3.12 version) btrfs isn't
entirely stable yet, and that (even more than with fully stable
filesystems, where the general principle still applies) you should keep
tested-to-be-usable backups when running it, or by action if not words,
you're demonstrating that you really don't care about the data you place
on it and don't mind if it gets trashed, right?

Good. Then you either have a backup and can simply mkfs from your rescue
method and restore from that backup, or you've demonstrated by your
actions that the data wasn't of any major value to you anyway. No big
deal either way! =:^)

In case you didn't, well, you still have a reasonably good chance at
recovery =:^), but regardless of whether it's recovered or not, do chalk
this up to a learning experience and do your research and have those
backups ready and tested next time, OK?

[snip dmesg output from first attempt to mount]

> So I cleared the cache with trying the mount option clear_cache

Good. First thing to try. =:^)

> but it stayed problematic and I was not able to mount it:
>
> [ 368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755:
> errno=-5 IO failure
> [ 368.159602] BTRFS: error (device sdc) in
> btrfs_run_delayed_refs:2713: errno=-5 IO failure
> [ 368.165584] BTRFS warning (device sdc): Skipping commit of aborted
> transaction.
> [ 368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545:
> errno=-5 IO failure
> [ 368.165787] BTRFS: error (device sdc) in
> open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
> [ 368.227161] BTRFS: open_ctree failed

OK, there's several things to try based on that output...

> Now, if I tried to mount it manually with degraded option enabled:
>
> # mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try dmesg | tail
> or so.

FWIW, the degraded option could be used if you didn't have both devices
available, but the above dmesg got beyond that, so degraded isn't likely
to help here.

> Now I run btrfsck with repair option enabled but still I cannot mount
> it.

That was a mistake, as you'd have known if you had read this list before
you tried your btrfs test. btrfsck --repair can fix some problems, but
the code is rather new and not well tested and it can also make some
problems it doesn't know about worse, so the recommendation is to try it
last, after all other attempts to either fix the problem or simply
recover the data have failed and the next step would be a mkfs, so you're
not losing anything by trying it anyway. Either that, or run it in
repair mode (without --repair it's OK since it's read-only and thus can't
do further damage) only after being told to do so by a dev who can read
the output from the read-only run and other diagnostics and is thus
relatively confident it will fix the problems without doing further
damage.

> Here you can find the dmesg and btrfsck outputs:
> dmesg: http://pastebin.com/zsaKQ0h1
> btrfsck: http://pastebin.com/xva6uJwT
>
> Please, help me! ;( Are there other options to investigate my RAID or to
> even temporarily mount it to get some data? What went wrong here? What
> can I do? Why is a simple crash making my RAID unusable? Can I use other
> tools for a recovery?

> Archlinux, linux-3.14-5, btrfs-progs-3.14-1

Good. You're using current kernel and tools. =:^)

As hinted above, there are indeed additional tools to try, and there's a
fair chance you can at least recover some/most of the data. =:^) Tho
you didn't do yourself any favors running btrfsck --repair before trying
them. =:^(

Please read the wiki and manpages before doing anything else so as to
increase the chances of recovery without further damage, but there's the
recovery mount option (which often works best with ro), and tools to
bypass the log tree and to recover from previous tree roots, among other
things.

wiki start page (suitable for memory or bookmarking):

https://btrfs.wiki.kernel.org

Here's the wiki's btrfsck page, which has a nice list of other things to
try before you use it with --repair (and a link to the page of a list
regular with further detail, too), but they will hopefully work afterward
as well. Given the log-tree error in your dmesg, the btrfs-zero-log tool
might be useful. But I'd definitely try mount -o ro,recovery first, and
if that works, get everything to backup before trying anything else.

https://btrfs.wiki.kernel.org/index.php/Btrfsck

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

* Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash
2014-04-13 22:42 ` Duncan
@ 2014-04-14 7:12 ` Maximilian Bräutigam
2014-04-14 11:02 ` Duncan
0 siblings, 1 reply; 4+ messages in thread
From: Maximilian Bräutigam @ 2014-04-14 7:12 UTC (permalink / raw)
To: linux-btrfs

On 14.04.2014 00:42, Duncan wrote:

[snip full quote of previous message]

Hi Duncan,

I was not really worried about my data, since I have several external
backups of the important data, plus git repos of what I do for work. But
I would have lost some very recent photos, which would not have been
nice. And I am (still) afraid of having to set up and configure a
properly working home dir on another fs again. That is just
time-consuming.

Furthermore, I thought that btrfs had reached a certain level of
maturity, which to me implied some fail-safety. But "filesystem disk
format is no longer unstable" [1] obviously does not mean that there is
an intact ecosystem of repair tools (or, better said, one program that
simply tries its best).

I tried several things according to [2].

1) btrfs restore
This did not really work; it recovered only a few GB of my data.

2) Then I noticed some "transid verify failed" messages, so I ran
btrfs-zero-log DEVICE

3) From there I was able to mount my volume again, so I could save my
latest photos. ;)

When I mount my volume with autodefrag,compress=lzo,subvolid=0, I end up
with a "rw" mounted device. Then I copy some data, e.g. with rsync, and
at some point it turns "ro". I found this out when I wanted to scrub the
devices, which naturally only works on writable mounts.

And it is still not possible to boot from the device again; I don't know
why.

Things to do next: try again with the recovery option. If that does not
work: roll back to ext4. But I really like the idea behind COW,
subvolumes, no partitioning, RAID and everything in one fs. Snapshots
against user mistakes, RAID against disk failure: perfectly safe, if it
were not for the fs itself.

So far, so good. The problem is that even if I can get back to a fully
working device or RAID, the workload (which I have to put in just
because my computer crashed) is much too high for something as
fundamental as a home dir.

Duncan, I appreciate your email. Unfortunately, the only thing I have
learned so far is to give btrfs some more decades to age. ;)

Best wishes and thanks again,
Max

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page
[2] https://unix.stackexchange.com/questions/32440/how-do-i-fix-btrfs

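The rw mount that later turns "ro", as described in this message, would be consistent with btrfs aborting a transaction and forcing the filesystem read-only, as the earlier dmesg output suggests. A minimal sketch for checking the current mode of a mount point follows. The function name mount_mode and its optional second argument are assumptions made for illustration; only the /proc/mounts layout (device, mount point, fs type, options) is relied on.

```shell
#!/bin/sh
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Prints "rw" or "ro" for the given mount point, or "unmounted" if it is
# not listed. MOUNTS_FILE defaults to /proc/mounts; the second argument is
# a hypothetical convenience so the function can be tried on a saved copy.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated option list, e.g. "ro,relatime".
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { print (mode == "" ? "unmounted" : mode) }
    ' "${2:-/proc/mounts}"
}
```

Calling `mount_mode /mnt/sonst` before and after a large rsync run would show whether the flip happened; the kernel log (`dmesg | grep -i btrfs`) should then contain the error that triggered it.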
* Re: [PARTIALLY SOLVED] Btrfs RAID1 corrupted after crash
2014-04-14 7:12 ` [PARTIALLY SOLVED] " Maximilian Bräutigam
@ 2014-04-14 11:02 ` Duncan
0 siblings, 0 replies; 4+ messages in thread
From: Duncan @ 2014-04-14 11:02 UTC (permalink / raw)
To: linux-btrfs

Maximilian Bräutigam posted on Mon, 14 Apr 2014 09:12:44 +0200 as
excerpted:

> Duncan, I appreciate your email. Unfortunately, the only thing I learned
> to far is to give btrfs some more decades to age. ;)

Well maybe not decades, but a year or possibly two, or if you're
conservative and haven't even switched from ext3 to ext4 yet, perhaps
five...

Tho it's certainly getting better. But there's certainly still some sore
spots left, and as you said, the potential workload's still a bit high
for people who just want it to work.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

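The order of recovery attempts discussed in this thread, least invasive first and btrfsck --repair last, can be sketched as a dry-run script. It only prints the commands and executes none of them. /dev/sdb and /mnt/sonst come from the original report; the backup path /mnt/backup, the variable names and the function names are assumptions made for illustration, and placing btrfs restore before btrfs-zero-log is one reasonable reading of the advice rather than something the thread states.

```shell
#!/bin/sh
# Dry-run sketch: prints recovery commands, least invasive first.
# Nothing here touches a device; each step is only echoed.
DEV="${DEV:-/dev/sdb}"        # device from the original report
MNT="${MNT:-/mnt/sonst}"      # mount point from the original report
DEST="${DEST:-/mnt/backup}"   # hypothetical backup location

# Print one numbered step.
step() {
    printf '%s. %s\n' "$1" "$2"
}

recovery_plan() {
    # Read-only mount with the recovery option, as suggested in the thread.
    step 1 "mount -t btrfs -o ro,recovery $DEV $MNT"
    # If that mounts, copy everything off before trying anything else.
    step 2 "cp -a $MNT/. $DEST/"
    # If it does not mount, try copying files off the unmounted device.
    step 3 "btrfs restore $DEV $DEST"
    # For the "Failed to recover log tree" error: drop the log tree
    # (the most recent fsynced writes may be lost).
    step 4 "btrfs-zero-log $DEV"
    # Last resort, only once the data is saved or written off.
    step 5 "btrfsck --repair $DEV"
}

recovery_plan
```

Keeping the plan as printed text rather than live commands is deliberate: each step should be run by hand only after the previous one has failed and its output has been read.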