* Help needed when btrfs raid5 crashed
@ 2015-04-23 17:30 - -
2015-04-24 4:30 ` Duncan
0 siblings, 1 reply; 2+ messages in thread
From: - - @ 2015-04-23 17:30 UTC (permalink / raw)
To: linux-btrfs
Hello,
I had a 3 disk raid5 system with btrfs installed. Unfortunately one of the disks
crashed. Now I cannot mount the system any more, not even with the degraded
option. I suspect the failed disk to have a hw failure. I think part of the
problem might be that I configured the system to not only have the data and
metadata, but also the system data in raid config. Is there any chance that I
might get my data back from the file system?
Currently the system does not boot any more. It was a debian testing system with
btrfs version 3.17. The kernel was originally 3.16.0-4-amd64, but now I also
have 3.19.0-trunk-amd64 installed.
When I run btrfs fi show, I get an error message:
Check tree block failed, want=<big number>, have=<another big number>
read block failed check_tree_block
Couldn't read chunk root
warning, device 2 is missing
Sorry, I cannot copy/paste as the machine does not boot anymore.
Can anyone give me some help or can explain to me what other kind of info you
need? Thanks.
---
Kind regards
Felix
* Re: Help needed when btrfs raid5 crashed
2015-04-23 17:30 Help needed when btrfs raid5 crashed - -
@ 2015-04-24 4:30 ` Duncan
0 siblings, 0 replies; 2+ messages in thread
From: Duncan @ 2015-04-24 4:30 UTC (permalink / raw)
To: linux-btrfs
- - posted on Thu, 23 Apr 2015 19:30:56 +0200 as excerpted:
> Hello,
>
> I had a 3 disk raid5 system with btrfs installed. Unfortunately one of
> the disks crashed. Now I cannot mount the system any more, not even with
> the degraded option. I suspect the failed disk to have a hw failure. I
> think part of the problem might be that I configured the system to not
> only have the data and metadata, but also the system data in raid
> config. Is there any chance that I might get my data back from the file
> system?
>
> Currently the system does not boot any more. It was a debian testing
> system with btrfs version 3.17. The kernel was originally
> 3.16.0-4-amd64, but now I also have 3.19.0-trunk-amd64 installed.
>
> When I run btrfs fi show, I get an error message:
> Check tree block failed, want=<big number>, have=<another big number>
> read block failed check_tree_block
> Couldn't read chunk root
> warning, device 2 is missing
>
> Sorry, I cannot copy/paste as the machine does not boot anymore.
>
> Can anyone give me some help or can explain to me what other kind of
> info you need? Thanks.
Full recovery support for btrfs raid5 is very *VERY* new. Kernel 3.19
was the first version that was supposed to have it at all, and due to the
newness, it could be expected to be buggy, so you should really have 4.0
and be prepared to upgrade kernels pretty quickly for a few releases
until the raid56 mode support matures a bit. Before 3.19, the normal raid56
runtime code was there, but support for recovery wasn't complete, so in
effect you were running a slow raid0, with effectively no available
protection against device failure at all (parity was calculated and
written, the runtime side, but the code to use it for recovery was
incomplete).
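To make the parity point concrete, here is a minimal Python sketch of the XOR idea behind raid5 on three devices. It is only an illustration under simplifying assumptions: the block contents and the helper name xor_blocks are made up for the example, and it says nothing about how btrfs actually lays out stripes on disk. The first half is the "runtime side" (computing parity on write); the second half is the recovery step that was incomplete before 3.19.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe across three devices: two data blocks plus one parity block.
# The contents are arbitrary example bytes.
data_a = b"hello wo"
data_b = b"rld 1234"

# Runtime side: parity is calculated and written alongside the data.
parity = xor_blocks(data_a, data_b)

# Recovery side: the device holding data_b is gone, so its block is
# rebuilt by XORing everything that survived.
rebuilt_b = xor_blocks(data_a, parity)
assert rebuilt_b == data_b
```

Without the second half, the parity block is written but never used, which is why the result behaves like raid0 when a device fails.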
So first off, for btrfs raid5 recovery, forget kernels previous to 3.19
and preferably use 4.0. Second, try a similarly current userspace. I'm
not actually sure on userspace raid5 status, but 3.17 is certainly not
current userspace, and given the newness of raid5 recovery support, I'd
strongly recommend 3.19 or 3.19.1 (current as of two days ago at least)
userspace as well, just to be sure.
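The version advice above can be written down as a small check. This is a sketch only: the helper names parse_version and at_least are invented for the example, and the minimums (3.19 for both kernel and userspace) are simply the ones recommended in this message. The version strings are the ones from the original report.

```python
import re


def parse_version(text: str) -> tuple:
    """Return the first dotted number found, e.g. '3.16.0-4-amd64' -> (3, 16, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: str, minimum: str) -> bool:
    """True if version >= minimum, padding the shorter one with zeros."""
    a, b = parse_version(version), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Values taken from the report; the 3.19 minimum is the advice given here.
for label, version in [
    ("original kernel", "3.16.0-4-amd64"),
    ("newer kernel", "3.19.0-trunk-amd64"),
    ("userspace", "3.17"),
]:
    verdict = "ok" if at_least(version, "3.19") else "too old for raid5 recovery"
    print(f"{label}: {version}: {verdict}")
```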
Beyond that... I'm running raid1 mode here and have only followed raid56
mode development at a certain distance, so my help will be limited.
However...
Third, I'm not sure if the wiki (https://btrfs.wiki.kernel.org) has been
well updated for raid56 or not, but the user-level guy with the most
testing and experience with it (pre-full-recovery-support, at least) is
Marc MERLIN, and there should be a link from the wiki's raid56 discussion
to his blog, which has FAR more detail, although as I said, some of it may
be a bit dated now if he hasn't updated. But that's likely to be some of
the better help you can get.
Fourth... those "big numbers" you mentioned are probably generation aka
transaction-id numbers. The generation/transid is a monotonically
increasing number bumped every time the root block is updated, which is
every 30 seconds (by default) if anything has changed on the btrfs. So
on an active btrfs around for any length of time, yes, it'll be a "big
number". But because it's monotonically increasing, the difference
between the wanted and have values can give you a hint at how bad the
situation is. If wanted is only a bit higher, the generations are fairly
close and the chances of recovery are reasonably good. If wanted is a
LOT higher, then you may well still be able to recover, but the number of
files that may revert to old copies is higher. If wanted is LOWER than
have, you probably hit the bug from a couple kernels ago that was
resetting generation. That's an entirely different situation with its
own recovery scenario.
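The three cases above, written as a small Python sketch. It assumes the two numbers really are generations, as guessed above. The function name classify_generation_gap and the near_threshold cutoff are made up for illustration; nothing here defines where "a bit higher" ends and "a LOT higher" begins, so pick a cutoff that suits you.

```python
def classify_generation_gap(wanted: int, have: int, near_threshold: int = 100) -> str:
    """Classify a want/have pair following the reasoning in this message.

    near_threshold is an arbitrary, hypothetical cutoff between
    "a bit higher" and "a LOT higher".
    """
    if wanted < have:
        # Different situation entirely: possibly the generation-reset bug.
        return "wanted-lower"
    if wanted - have <= near_threshold:
        # Generations are close: recovery chances are reasonably good.
        return "close"
    # Recovery may still work, but more files may revert to old copies.
    return "far"


# Example with made-up numbers typed in from the error message.
print(classify_generation_gap(wanted=120345, have=120340))
```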
Fifth, on the wiki there's a (somewhat dated last I looked) writeup on
using btrfs-find-root and btrfs restore, to try to recover files from an
unmounted filesystem, writing them to some other location as it finds
them. btrfs restore doesn't write anything to the damaged btrfs, so unlike
other tools, has no chance of making the damage worse. You can use it to
pull files off the filesystem, if you don't have a current backup (which
you certainly should have had of btrfs raid5, given that before 3.19 it
was effectively btrfs raid0, if you placed any value on the data at all,
but unfortunately, people often learn about the importance of backups the
hard way). The general idea is that you find a good generation using
find-root, and then feed that to restore if the current generation isn't
usable, to get as current a valid version of your files as possible.
*BUT*, I'm not entirely sure of btrfs restore's ability to work with
raid5, that being so new. Hopefully it works and you're good, but...
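As a rough sketch of that workflow, the Python below only builds the command lines; it does not run anything. Treat the details as assumptions to check against the wiki writeup and the help output of your own btrfs-progs version: as far as I know, btrfs-find-root takes the device as its argument, and btrfs restore accepts -t with the byte number of a tree root reported by find-root, plus -D for a dry run. The device /dev/sdb, the destination /mnt/rescue and the byte number are placeholders, not values from this thread.

```python
def find_root_command(device: str) -> list:
    """Command to list candidate tree roots on an unmounted device."""
    return ["btrfs-find-root", device]


def restore_command(device: str, destination: str,
                    tree_root_bytenr=None, dry_run: bool = False) -> list:
    """Command to copy files off an unmounted btrfs into destination.

    tree_root_bytenr: optional byte number of an older tree root, taken
    from btrfs-find-root output, for when the current root is unusable.
    """
    cmd = ["btrfs", "restore"]
    if dry_run:
        cmd.append("-D")  # assumed flag: list what would be restored
    if tree_root_bytenr is not None:
        cmd += ["-t", str(tree_root_bytenr)]  # assumed flag: alternate tree root
    cmd += [device, destination]
    return cmd


# Placeholder values; substitute your own device, target and byte number.
print(" ".join(find_root_command("/dev/sdb")))
print(" ".join(restore_command("/dev/sdb", "/mnt/rescue", dry_run=True)))
print(" ".join(restore_command("/dev/sdb", "/mnt/rescue", tree_root_bytenr=123456)))
```

The destination must be on a different, healthy filesystem with enough room for whatever gets pulled off.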
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman