* BTRFS RAID 1 broken: Mounted drive(s) basically empty after repair attempt
From: Quanttek Jonas @ 2016-05-17 14:00 UTC
To: linux-btrfs
Hey there,
first up, I'll provide the basic information:
(I'm using Fedora 23 on my system with the newest available kernel, etc.,
but I'm currently booted from a live USB stick)
$ uname -a
Linux localhost 4.2.3-300.fc23.x86_64 #1 SMP Mon Oct 5 15:42:54
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.2.2
$ sudo btrfs fi show
Label: 'Fedora' uuid: 4ba9cb30-37d0-493d-8e52-59b69ec8d6ba
Total devices 1 FS bytes used 27.23GiB
devid 1 size 138.47GiB used 65.01GiB path /dev/sda5
Label: 'Data' uuid: a4cddadf-0a26-4eca-ba7f-7df179152247
Total devices 2 FS bytes used 1.49TiB
devid 1 size 1.82TiB used 1.52TiB path /dev/sdc
devid 2 size 1.82TiB used 1.52TiB path /dev/sdd
$ sudo btrfs fi df /mnt/sdc
Data, RAID1: total=1.52TiB, used=1.49TiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=240.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=3.00GiB, used=2.05GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
FYI, GParted reports 762.88 GiB used of the 1.82 TiB disk, which seems
correct as far as I remember.
dmesg.log: http://paste.fedoraproject.org/366440/32506321/raw/. I
should note that I originally got multiple stack traces and errors,
with soft lockups of the CPU cores (like here:
http://ubuntuforums.org/showthread.php?t=2205211) and "task
btrfs-transacti blocked for more than 120 seconds", and I wasn't able to
mount the filesystem, but that changed (probably because of some of the
stupid stuff I tried later).
___________________
So to start with the "story": I locked the screen for a few minutes,
and when I came back my system was frozen, so I used the reset button,
but then I wasn't able to boot (it took forever). So I forced systemd
into emergency mode and checked the logs, which said that systemd was
unable to mount /home. I tried to mount it manually, which also took
forever and couldn't be cancelled, so I forcefully restarted again. The
same happened with the "recovery" boot option. Looking into the logs,
the errors laid out above appeared.
I restarted and ran "btrfsck /dev/sdc": the error output was fairly
extensive (unlike the current one linked below as "btrfsck output.txt"),
talking about "Errors found in extent allocation tree or chunk
allocation", "cache appears valid but isnt" and "total csum bytes: 0",
IIRC (unfortunately I didn't save the output and only vaguely
remember it).
So I did some googling, and "btrfsck --repair" was recommended, which I
ran (and later learned was a big mistake). I got thousands of lines of
the kind shown in "btrfsck output.txt", just with "Trying to rebuild
inode" added at the beginning of each line. I waited multiple hours,
but it never finished.
Interestingly, after that I was able to mount the filesystem, but only
the top-level dirs existed (Music, Downloads, Videos etc.) and they were
all empty. After that I tried mounting each drive with
"-o degraded,ro,recovery", but still the same problem. Basically I then
followed the quoted comment here:
https://bbs.archlinux.org/viewtopic.php?pid=1321989#p1321989 (though I
didn't use "btrfsck --repair" a second time). Interestingly, the
"btrfsck /dev/sdc" output was now relatively short, similar to the
output at the beginning. Trying different superblocks with "btrfsck -s #
/dev/sdc" didn't change the reported errors, however.
Following this http://askubuntu.com/questions/157917/how-do-i-recover-a-btrfs-partition-that-will-not-mount
I tried mounting with "-o recovery,nospace_cache,clear_cache", which
brought back the long btrfsck output; that is also the log attached
below. At this point I also created a btrfs-image (following the
bbs.archlinux.org post), which is only 1.6 GB in size, as well as a
"real" image using dd.
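Roughly, those steps looked like this (again from memory; the image
paths are just placeholders for a directory on another disk):
$ sudo mount -o recovery,nospace_cache,clear_cache /dev/sdc /mnt/sdc
$ sudo btrfs-image -c9 -t4 /dev/sdc /path/to/other/disk/data-metadata.img   # metadata-only image, ~1.6 GB
$ sudo dd if=/dev/sdc of=/path/to/other/disk/sdc.dd bs=4M                   # full block-level copy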
I cleared the log tree with "btrfs-zero-log" and tried "btrfs check
--init-extent-tree" and then btrfs restore, with basically no results,
so I restored the dd image. On that I also tried btrfs restore, but I
only got hundreds of "parent transid verify failed on" messages, with
"extent buffer leak: start 31571968 len 16384" at the end, and it was
also only able to recover the top-level folders (like ~/Downloads).
Running "btrfs-find-root" left me with this:
http://paste.fedoraproject.org/366427/24974814/. And running btrfs check
now mostly results in those "parent transid verify failed" errors.
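For completeness, those attempts were basically the standard
invocations (the restore target path is just a placeholder for a
directory on another disk):
$ sudo btrfs-zero-log /dev/sdc
$ sudo btrfs check --init-extent-tree /dev/sdc
$ sudo btrfs restore -v /dev/sdc /path/to/other/disk/restore/   # only recovered the empty top-level folders
$ sudo btrfs-find-root /dev/sdc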
___________________
So, the question is: How can I recover from this? How do I get my data
back, after foolishly using "btrfsck --repair"?
Thanks for any help in advance!
___________________
"btrfsck output.txt": http://paste.ubuntu.com/16382852/
* Re: BTRFS RAID 1 broken: Mounted drive(s) basically empty after repair attempt
From: Duncan @ 2016-05-18 7:38 UTC
To: linux-btrfs
Quanttek Jonas posted on Tue, 17 May 2016 10:00:41 -0400 as excerpted:
> So, the question is: How can I recover from this? How do I get my data
> back, after foolishly using "btrfsck --repair"?
First, let me note that I'm a list regular and btrfs user, not a dev,
and that as such much of your post was beyond my technical
understanding; thus I snipped it above. For a technical take, perhaps
one of the devs will help, and other users and general (but not
btrfs-specific) devs will likely post their thoughts as well.
But here I'd probably declare the filesystem beyond full repair and
focus on getting whatever files I could off it using the method
described below, restoring what I couldn't get from the damaged
filesystem from backup.
It's worth pausing at this point to note the sysadmin's rule of
backups, which in its simplest form states that if you don't have at
least one level of backup, then by choosing not to make that backup you
are defining your data as worth less than the trouble and resources
necessary to make it. Thus, by definition, you /always/ save what was
of most importance to you: either the data, if you decided it was worth
making that backup, or your valuable time and resources, if by your
actions you defined them as more valuable than the data, even if you
then lost what you had defined to be of lower value, namely the data
itself.
And that rule applies under normal conditions, using fully mature and
long-term stable filesystems such as ext3/4, xfs, or (the one I still
use on my spinning rust; I only use btrfs on my ssds) reiserfs. Btrfs, while
stabilizing, is not yet fully stable and mature, definitely not to the
level of the above filesystems, so the rule applies even more strongly
there (a less simple form of the rule takes into account varying levels
of risk and varying data value, along with multiple levels of backup;
100 levels of backup, some of them offsite in other locations, may not
be enough for extremely high-value data).
So I'll assume that much like me you keep backups where the data is
valuable enough to warrant it, but you may not always have /current/
backups, because the value of the data in the delta between the last
backup and current simply doesn't warrant the hassle of refreshing the
backup, yet, given the limited risk of /future/ loss. However, once the
potential loss happens, the question changes. Now it's a matter of
whether the hassle of further recovery efforts is justified, vs. the
known loss of the data in that delta between the last backup and the last
"good" state before things started going bad.
As it happens, btrfs has a really useful tool called btrfs restore that
can often help you recover your data in a state very close to the last
good one, or at least one more recent than your last backup. It has
certainly helped me recover this from-last-backup-delta data a couple
times here, allowing me to use it instead of having to fall back to the
older and more stale backup. One nice thing about btrfs restore is that
it's read-only with respect to the damaged filesystem, so you can safely
use it on a filesystem to restore what you can before trying more
dangerous things that might cause even more damage; since restore
itself is purely read-only, it won't add any damage of its own. =:^)
There's a page on the wiki that describes this process in more detail,
but be aware, once you get beyond where automatic mode can help and you
have to try manual, it gets quite technical, and a lot of folks find they
need some additional help from a human, beyond the wiki.
Before I link the wiki page, here's an introduction...
Btrfs restore works on the /unmounted/ filesystem, writing any files it
recovers to some other filesystem, which of course means that you need
enough space on that other filesystem to store whatever you wish to
recover. By default it will write them as root, using root's umask, with
current timestamps, and will skip writing symlinks or restoring extended
attributes, but there are options that will restore ownership/perms/
timestamps, extended attributes, and symlinks, if desired.
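As a very rough sketch (the target path is just a placeholder, and the
exact option letters may differ a bit between btrfs-progs versions, so
check btrfs restore --help first):
# plain run: recovered files are written as root, with current timestamps
btrfs restore /dev/sdc /mnt/recovery/
# also restore ownership/permissions/timestamps (-m), extended attributes
# (-x) and symlinks (-S), where your btrfs-progs version supports them
btrfs restore -m -x -S /dev/sdc /mnt/recovery/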
Normally, btrfs restore will use a mechanism similar to the recovery
mount option to try to find a copy of the root tree of the filesystem
within a few commits (which are 30 seconds apart by default) of what the
superblocks say is current.
If that works, great. If not, you have to use a much more manual mode,
telling btrfs restore what root to try, while using btrfs-find-root to
find older roots (by generation, aka transid), then feeding the addresses
found to btrfs restore -t, first with the -l option to list the other
trees available from that root, then if it finds all the critical trees,
using it with --dry-run to see if it seems to find most of the expected
files, before trying the real restore if things look good.
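In rough outline, that manual sequence looks something like this (the
bytenr 1234567890 is purely made up; you'd substitute the addresses
that btrfs-find-root actually reports, highest plausible generation
first, and again the target path is just a placeholder):
# list candidate tree roots by generation/transid
btrfs-find-root /dev/sdc
# for a given candidate root, list the trees reachable from it
btrfs restore -t 1234567890 -l /dev/sdc
# dry run first, to see whether the expected files would be restored
btrfs restore -t 1234567890 -D -v /dev/sdc /mnt/recovery/
# if that looks sane, do the real restore
btrfs restore -t 1234567890 /dev/sdc /mnt/recovery/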
With that, here's the wiki page link. Try the normal mode first. If it
fails and you need further help with the advanced usage stuff, you can
ask more questions then.
https://btrfs.wiki.kernel.org/index.php/Restore
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman