From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Problem with unmountable filesystem.
Date: Wed, 17 Sep 2014 07:23:46 -0400
Message-ID: <54196F42.4030101@gmail.com>
In-Reply-To: <2A2CB71A-7516-43CD-94E1-BCB2198F5FC4@colorremedies.com>
On 2014-09-16 16:57, Chris Murphy wrote:
>
> On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>
>> Based on the kernel messages, the primary issue is log corruption, and
>> in theory btrfs-zero-log should fix it.
>
> Can you provide a complete dmesg somewhere for this initial failure, just for reference? I'm curious what this indication looks like compared to other problems.
>
Okay, I can't really get a 'complete' dmesg, because the system panics
on the mount failure (the filesystem in question is the system's root
filesystem), the system has no serial ports, and I didn't think to
build in support for console on ttyUSB0. I can however get what the
recovery environment (locally compiled based on buildroot) shows when I
try to mount the filesystem:
[ 30.871036] BTRFS: device label gentoo devid 1 transid 160615 /dev/sda3
[ 30.875225] BTRFS info (device sda3): disk space caching is enabled
[ 30.917091] BTRFS: detected SSD devices, enabling SSD mode
[ 30.920536] BTRFS: bad tree block start 0 130402254848
[ 30.924018] BTRFS: bad tree block start 0 130402254848
[ 30.926234] BTRFS: failed to read log tree
[ 30.953055] BTRFS: open_ctree failed
>> The actual issue however, is
>> that the primary superblock appears to be pointing at a corrupted root
>> tree, which causes pretty much everything that does anything other than
>> just read the sb to fail. The first backup sb does point to a good
>> tree, but only btrfs check and btrfs restore have any option to ignore
>> the first sb and use one of the backups instead.
>
> Maybe use wipefs -a on this volume, which removes the magic from only the first superblock by default (you can specify another location). And then try btrfs-show-super -F which "dumps" supers with bad magic.
>
Thanks for the suggestion, I hadn't thought of that...
> I just tried this:
> # wipefs -a /dev/sdb
> /dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x5c1196d7 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic ........ [DON'T MATCH]
> […]
> # btrfs-show-super -i1 /dev/sdb
> superblock: bytenr=67108864, device=/dev/sdb
> ---------------------------------------------------------
> csum 0xfc70be19 [match]
> bytenr 67108864
> flags 0x1
> magic _BHRfS_M [match]
>
> So the mirror is definitely there and valid.
> # btrfs rescue super-recover -yv /dev/sdb
> No valid Btrfs found on /dev/sdb
> Usage or syntax errors
>
> Not expected at all, man page says "Recover bad superblocks from good copies." There's a good copy, it's not being found by btrfs rescue super-recover. Seems like a bug.
>
>
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
>
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> OK it finds it, maybe a --repair will fix the bad first one?
> # btrfs check -s1 --repair /dev/sdb
> using SB copy 1, bytenr 67108864
> enabling repair mode
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> No indication of repair
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> [root@f21v ~]# btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x5c1196d7 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic ........ [DON'T MATCH]
>
>
> Still not fixed. Maybe I needed to corrupt something else in the superblock other than the magic and this behavior is intentional, otherwise wipefs -a, followed by btrfsck would resurrect an intentionally wiped btrfs fs, potentially wiping out some newer file system in the process.
>
...though maybe it's a good thing I didn't.
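For reference, the superblock copies selected by btrfs-show-super -i and btrfs check -s sit at fixed byte offsets. Below is a minimal sketch of that arithmetic, assuming the usual btrfs layout (64 KiB for the primary copy, then 16 KiB shifted left by 12 bits per copy). sb_offset is a hypothetical helper name, and only the first two values are confirmed by the output quoted above; the third is my understanding of the on-disk format.

```shell
#!/bin/sh
# Print the byte offset of btrfs superblock copy N (0, 1, or 2).
# Assumption: copy 0 is at 64 KiB, later copies at 16 KiB << (12 * N).
sb_offset() {
    case "$1" in
        0) echo 65536 ;;
        1|2) echo $(( 16384 << (12 * $1) )) ;;
        *) echo "unsupported superblock copy: $1" >&2; return 1 ;;
    esac
}

# List the offsets; a copy only exists if the device is large enough.
for n in 0 1 2; do
    printf 'copy %s: bytenr=%s\n' "$n" "$(sb_offset "$n")"
done
```

The value for copy 1 matches the "bytenr=67108864" line that btrfs-show-super -i1 printed, which is a quick sanity check that the formula lines up with the tools.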
>
>
>> I'm fine using dd to replace the primary sb with one of the
>> backups, but don't know the exact parameters that would be needed.
>
> Here's an idea:
>
> # btrfs-show-super /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x92aa51ab [match]
> [snip]
> So I know what I'm looking for starts at LBA 65536/512
>
> # dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
> 00000000 92 aa 51 ab 00 00 00 00 00 00 00 00 00 00 00 00 |..Q.............|
> [snip]
>
> And as it turns out the csum is right at the beginning, 4 bytes. So use bs of 4 bytes, seek 65536/4, count of 1. This should zero just 4 bytes starting at 65536 bytes in.
>
> # dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1
>
> Checked it with the earlier skip=128 command and it looks like everything else is intact.
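The skip= and seek= values above are just the byte offset divided by the block size. Here is a minimal sketch of that calculation, exercised on a scratch image file rather than a real device. dd_blocks is a hypothetical helper name, and the 'ABCD' payload is only a visible stand-in for the four zero bytes written above.

```shell
#!/bin/sh
# Convert a byte offset into a dd block count for a given block size.
# Fails if the offset is not a whole number of blocks.
dd_blocks() {
    offset=$1 bs=$2
    if [ $(( offset % bs )) -ne 0 ]; then
        echo "offset $offset is not a multiple of bs=$bs" >&2
        return 1
    fi
    echo $(( offset / bs ))
}

dd_blocks 65536 512   # skip= when reading with dd's default 512-byte blocks
dd_blocks 65536 4     # seek= when overwriting the 4-byte csum

# Try it on a scratch file: 17 * 4096 bytes, just past the 64 KiB mark.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=4096 count=17 2>/dev/null
# conv=notrunc keeps dd from truncating a regular file at the write point.
printf 'ABCD' | dd of="$img" bs=4 seek="$(dd_blocks 65536 4)" conv=notrunc 2>/dev/null
# Read the same 4 bytes back to confirm they landed at byte 65536.
dd if="$img" bs=4 skip="$(dd_blocks 65536 4)" count=1 2>/dev/null
echo
rm -f "$img"
```

On a block device conv=notrunc makes no practical difference, but when practising on an image file it stops dd from cutting the file off right after the bytes it wrote.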
>
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x00000000 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic _BHRfS_M [match]
> [snip]
> OK so the csum is bad, the magic is good. Now see if btrfs rescue super-recover does anything
> # btrfs rescue super-recover /dev/sdb
> Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: Y
> Recovered bad superblocks successful
> *** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
> /lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
> /lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
> /lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
> btrfs[0x425ec6]
> btrfs[0x406902]
> /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
> btrfs[0x406a04]
> ======= Memory m
> [snip]
>
> kaboom!
>
> But was it really successful?
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x92aa51ab [match]
> [skip]
> Looks fixed. And it mounts.
>
> NOW, I didn't actually have my first superblock pointing to a corrupt root tree. So it's possible that while the csum was fixed in my case, that the subsequent crash has not properly copied all good parts of superblock1 to superblock0. *shrug*
>
> And since it crashes, looks like I found a bug.
>
>> I'm using btrfs-progs 3.16 and
>> kernel 3.16.1.
>
> So did I for all of the above.
>
>
Since posting this, I realized that the recovery environment I'm working from is actually running btrfs-progs 3.14.1 and kernel 3.14.5. I need to make a point of updating that once I get the system working again.
I've also discovered, while trying to use btrfs restore to copy the data out to a different system, that restore in 3.14.1 apparently chokes on filesystems that have lzo compression turned on. It reports errors trying to inflate compressed files, and I know for a fact that none of those files were even open, let alone being written to, when the system crashed. I don't know whether this is a known bug, or whether it is still the case with btrfs-progs 3.16, but I figured I'd mention it because I haven't seen anything about it anywhere.
Interestingly, I also didn't get the crash you saw above with btrfs rescue super-recover, so that might be a regression in btrfs-progs 3.16.
Thanks for all the help.