* BTRFS RAID 1 broken: Mounted drive(s) basically empty after repair attempt
From: Quanttek Jonas @ 2016-05-17 14:00 UTC
To: linux-btrfs
Hey there,
first up, I'll provide the basic information:
(I'm using Fedora 23 on my system with the newest available kernel, etc.,
but I'm currently booted from a live USB stick)
$ uname -a
Linux localhost 4.2.3-300.fc23.x86_64 #1 SMP Mon Oct 5 15:42:54
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.2.2
$ sudo btrfs fi show
Label: 'Fedora' uuid: 4ba9cb30-37d0-493d-8e52-59b69ec8d6ba
Total devices 1 FS bytes used 27.23GiB
devid 1 size 138.47GiB used 65.01GiB path /dev/sda5
Label: 'Data' uuid: a4cddadf-0a26-4eca-ba7f-7df179152247
Total devices 2 FS bytes used 1.49TiB
devid 1 size 1.82TiB used 1.52TiB path /dev/sdc
devid 2 size 1.82TiB used 1.52TiB path /dev/sdd
$ sudo btrfs fi df /mnt/sdc
Data, RAID1: total=1.52TiB, used=1.49TiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=240.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=3.00GiB, used=2.05GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
FYI, GParted reports 762.88 GiB used of the 1.82 TiB disk, which seems
correct as far as I remember.
dmesg.log: http://paste.fedoraproject.org/366440/32506321/raw/. I
should note that I originally got multiple stack traces and errors,
with soft lockups of the CPU cores (like here:
http://ubuntuforums.org/showthread.php?t=2205211) and "task
btrfs-transacti blocked for more than 120 seconds", and I wasn't able to
mount the filesystem, but that changed (probably because of some of the
stupid stuff I tried later).
___________________
So to start with the "story": I locked the screen for a few minutes,
and when I came back my system was frozen, so I used the reset button,
but then I wasn't able to boot (it took forever). So I forced systemd
into emergency mode and checked the logs, which said that systemd was
unable to mount /home. I tried to mount it manually, which also took
forever and couldn't be cancelled, so I forcefully restarted again. The
same happened with the "recovery" boot option. Looking into the logs,
the errors laid out above appeared.
I restarted and ran "btrfsck /dev/sdc": the error output was fairly
extensive (unlike the current one linked below as "btrfsck output.txt"),
talking about "Errors found in extent allocation tree or chunk
allocation", "cache appears valid but isnt" and "total csum bytes: 0",
IIRC (unfortunately I didn't save the output and only vaguely
remember it).
So I did some googling, and "btrfsck --repair" was recommended, which I
ran (and later learned was a big mistake). I got thousands of lines of
the kind shown in "btrfsck output.txt", just with "Trying to rebuild
inode" added at the beginning of each line. I waited multiple hours,
but it never finished.
Interestingly, after that I was able to mount the filesystem, but only
the top-level dirs existed (Music, Downloads, Videos etc.) and they were
all empty. After that I tried mounting each drive with
"-o degraded,ro,recovery", but still the same problem. Basically I then
followed the quoted comment here:
https://bbs.archlinux.org/viewtopic.php?pid=1321989#p1321989 (though I
didn't use "btrfsck --repair" a second time). Interestingly, the
"btrfsck /dev/sdc" output was now relatively short, similar to the
output at the beginning. Trying different superblocks with "btrfsck -s #
/dev/sdc" didn't change the reported errors, however.
Following this http://askubuntu.com/questions/157917/how-do-i-recover-a-btrfs-partition-that-will-not-mount
I tried mounting with "-o recovery,nospace_cache,clear_cache", which
brought back the long btrfsck output; that is also the log attached
below. At this point I also created a btrfs-image (following the
bbs.archlinux.org post), which is only 1.6 GB in size, as well as a
"real" image using dd.
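Roughly, those steps looked like this (again from memory; the image
paths are just placeholders for a directory on another disk):
$ sudo mount -o recovery,nospace_cache,clear_cache /dev/sdc /mnt/sdc
$ sudo btrfs-image -c9 -t4 /dev/sdc /path/to/other/disk/data-metadata.img   # metadata-only image, ~1.6 GB
$ sudo dd if=/dev/sdc of=/path/to/other/disk/sdc.dd bs=4M                   # full block-level copy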
I cleared the log tree with "btrfs-zero-log" and tried "btrfs check
--init-extent-tree" and then btrfs restore, with basically no results,
so I restored the dd image. On that I also tried btrfs restore, but I
only got hundreds of "parent transid verify failed on" messages, with
"extent buffer leak: start 31571968 len 16384" at the end, and it was
also only able to recover the top-level folders (like ~/Downloads).
Running "btrfs-find-root" left me with this:
http://paste.fedoraproject.org/366427/24974814/. And running btrfs check
now mostly results in those "parent transid verify failed" errors.
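For completeness, those attempts were basically the standard
invocations (the restore target path is just a placeholder for a
directory on another disk):
$ sudo btrfs-zero-log /dev/sdc
$ sudo btrfs check --init-extent-tree /dev/sdc
$ sudo btrfs restore -v /dev/sdc /path/to/other/disk/restore/   # only recovered the empty top-level folders
$ sudo btrfs-find-root /dev/sdc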
___________________
So, the question is: How can I recover from this? How do I get my data
back, after foolishly using "btrfsck --repair"?
Thanks for any help in advance!
___________________
"btrfsck output.txt": http://paste.ubuntu.com/16382852/
* Re: BTRFS RAID 1 broken: Mounted drive(s) basically empty after repair attempt
From: Duncan @ 2016-05-18 7:38 UTC
To: linux-btrfs
Quanttek Jonas posted on Tue, 17 May 2016 10:00:41 -0400 as excerpted:
> So, the question is: How can I recover from this? How do I get my data
> back, after foolishly using "btrfsck --repair"?
First, let me note that I'm a list regular and btrfs user, not a dev,
and that as such much of your post was beyond my technical
understanding; thus I snipped it above. For a technical take, perhaps
one of the devs will help, and other users and general (but not
btrfs-specific) devs will likely post their thoughts as well.
But here I'd probably declare the filesystem beyond full repair and
focus on getting whatever files I could off it using the method
described below, restoring what I couldn't get from the damaged
filesystem from backup.
It's worth pausing at this point to note the sysadmin's rule of
backups, which in its simplest form states that if you don't have at
least one level of backup, then by choosing not to make that backup you
are defining your data as worth less than the trouble and resources
necessary to make it. Thus, by definition, you /always/ save what was
of most importance to you: either the data, if you decided it was worth
making that backup, or your valuable time and resources, if by your
actions you defined them as more valuable than the data, even if you
then lost what you had defined to be of lower value, namely the data
itself.
And that rule applies under normal conditions, using fully mature and
long-term stable filesystems such as ext3/4, xfs, or (the one I still
use on my spinning rust; I only use btrfs on my ssds) reiserfs. Btrfs, while
stabilizing, is not yet fully stable and mature, definitely not to the
level of the above filesystems, so the rule applies even more strongly
there (a less simple form of the rule takes into account varying levels
of risk and varying data value, along with multiple levels of backup;
100 levels of backup, some of them offsite in other locations, may not
be enough for extremely high-value data).
So I'll assume that much like me you keep backups where the data is
valuable enough to warrant it, but you may not always have /current/
backups, because the value of the data in the delta between the last
backup and current simply doesn't warrant the hassle of refreshing the
backup, yet, given the limited risk of /future/ loss. However, once the
potential loss happens, the question changes. Now it's a matter of
whether the hassle of further recovery efforts is justified, vs. the
known loss of the data in that delta between the last backup and the last
"good" state before things started going bad.
As it happens, btrfs has a really useful tool called btrfs restore that
can often help you recover your data in a state very close to the last
good one, or at least one more recent than your last backup. It has
certainly helped me recover this from-last-backup-delta data a couple
times here, allowing me to use it instead of having to fall back to the
older and more stale backup. One nice thing about btrfs restore is that
it's read-only with respect to the damaged filesystem, so you can safely
use it on a filesystem to restore what you can before trying more
dangerous things that might cause even more damage; since restore
itself is purely read-only, it won't add any damage of its own. =:^)
There's a page on the wiki that describes this process in more detail,
but be aware, once you get beyond where automatic mode can help and you
have to try manual, it gets quite technical, and a lot of folks find they
need some additional help from a human, beyond the wiki.
Before I link the wiki page, here's an introduction...
Btrfs restore works on the /unmounted/ filesystem, writing any files it
recovers to some other filesystem, which of course means that you need
enough space on that other filesystem to store whatever you wish to
recover. By default it will write them as root, using root's umask, with
current timestamps, and will skip writing symlinks or restoring extended
attributes, but there are options that will restore ownership/perms/
timestamps, extended attributes, and symlinks, if desired.
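As a very rough sketch (the target path is just a placeholder, and the
exact option letters may differ a bit between btrfs-progs versions, so
check btrfs restore --help first):
# plain run: recovered files are written as root, with current timestamps
btrfs restore /dev/sdc /mnt/recovery/
# also restore ownership/permissions/timestamps (-m), extended attributes
# (-x) and symlinks (-S), where your btrfs-progs version supports them
btrfs restore -m -x -S /dev/sdc /mnt/recovery/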
Normally, btrfs restore will use a mechanism similar to the recovery
mount option to try to find a copy of the root tree of the filesystem
within a few commits (which are 30 seconds apart by default) of what the
superblocks say is current.
If that works, great. If not, you have to use a much more manual mode,
telling btrfs restore what root to try, while using btrfs-find-root to
find older roots (by generation, aka transid), then feeding the addresses
found to btrfs restore -t, first with the -l option to list the other
trees available from that root, then if it finds all the critical trees,
using it with --dry-run to see if it seems to find most of the expected
files, before trying the real restore if things look good.
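In rough outline, that manual sequence looks something like this (the
bytenr 1234567890 is purely made up; you'd substitute the addresses
that btrfs-find-root actually reports, highest plausible generation
first, and again the target path is just a placeholder):
# list candidate tree roots by generation/transid
btrfs-find-root /dev/sdc
# for a given candidate root, list the trees reachable from it
btrfs restore -t 1234567890 -l /dev/sdc
# dry run first, to see whether the expected files would be restored
btrfs restore -t 1234567890 -D -v /dev/sdc /mnt/recovery/
# if that looks sane, do the real restore
btrfs restore -t 1234567890 /dev/sdc /mnt/recovery/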
With that, here's the wiki page link. Try the normal mode first. If it
fails and you need further help with the advanced usage stuff, you can
ask more questions then.
https://btrfs.wiki.kernel.org/index.php/Restore
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman