From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Problem with unmountable filesystem.
Date: Wed, 17 Sep 2014 07:23:46 -0400
Message-ID: <54196F42.4030101@gmail.com>
In-Reply-To: <2A2CB71A-7516-43CD-94E1-BCB2198F5FC4@colorremedies.com>
On 2014-09-16 16:57, Chris Murphy wrote:
>
> On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>
>> Based on the kernel messages, the primary issue is log corruption, and
>> in theory btrfs-zero-log should fix it.
>
> Can you provide a complete dmesg somewhere for this initial failure, just for reference? I'm curious what this indication looks like compared to other problems.
>
Okay, I can't really get a 'complete' dmesg, because the system panics
on the mount failure (the filesystem in question is the system's root
filesystem), the system has no serial ports, and I didn't think to
build in support for console on ttyUSB0. I can however get what the
recovery environment (locally compiled based on buildroot) shows when I
try to mount the filesystem:
[ 30.871036] BTRFS: device label gentoo devid 1 transid 160615 /dev/sda3
[ 30.875225] BTRFS info (device sda3): disk space caching is enabled
[ 30.917091] BTRFS: detected SSD devices, enabling SSD mode
[ 30.920536] BTRFS: bad tree block start 0 130402254848
[ 30.924018] BTRFS: bad tree block start 0 130402254848
[ 30.926234] BTRFS: failed to read log tree
[ 30.953055] BTRFS: open_ctree failed
>> The actual issue however, is
>> that the primary superblock appears to be pointing at a corrupted root
>> tree, which causes pretty much everything that does anything other than
>> just read the sb to fail. The first backup sb does point to a good
>> tree, but only btrfs check and btrfs restore have any option to ignore
>> the first sb and use one of the backups instead.
>
> Maybe use wipefs -a on this volume, which removes the magic from only the first superblock by default (you can specify another location). And then try btrfs-show-super -F which "dumps" supers with bad magic.
>
Thanks for the suggestion, I hadn't thought of that...
> I just tried this:
> # wipefs -a /dev/sdb
> /dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x5c1196d7 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic ........ [DON'T MATCH]
> […]
> # btrfs-show-super -i1 /dev/sdb
> superblock: bytenr=67108864, device=/dev/sdb
> ---------------------------------------------------------
> csum 0xfc70be19 [match]
> bytenr 67108864
> flags 0x1
> magic _BHRfS_M [match]
>
> So the mirror is definitely there and valid.
> # btrfs rescue super-recover -yv /dev/sdb
> No valid Btrfs found on /dev/sdb
> Usage or syntax errors
>
> Not expected at all, man page says "Recover bad superblocks from good copies." There's a good copy, it's not being found by btrfs rescue super-recover. Seems like a bug.
>
>
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
>
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> OK it finds it, maybe a --repair will fix the bad first one?
> # btrfs check -s1 --repair /dev/sdb
> using SB copy 1, bytenr 67108864
> enabling repair mode
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> No indication of repair
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> [root@f21v ~]# btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x5c1196d7 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic ........ [DON'T MATCH]
>
>
> Still not fixed. Maybe I needed to corrupt something else in the superblock other than the magic and this behavior is intentional, otherwise wipefs -a, followed by btrfsck would resurrect an intentionally wiped btrfs fs, potentially wiping out some newer file system in the process.
>
...though maybe it's a good thing I didn't.
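For reference, the two bytenr values in your output (65536 and 67108864) match the usual btrfs superblock layout as I understand it: the primary copy sits at 64 KiB, and copy N sits at 16 KiB << (12 * N). Here is a rough sketch of that arithmetic; sb_offset is just a name I made up for illustration, not anything from btrfs-progs, and the 64-byte magic offset is inferred from the 0x00010040 that wipefs printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): print the byte offset of
# btrfs superblock copy $1.
# Assumption: standard layout, copy 0 at 64 KiB, copy N at 16 KiB << (12 * N).
sb_offset() {
    if [ "$1" -eq 0 ]; then
        echo 65536
    else
        echo $(( 16384 << (12 * $1) ))
    fi
}

# List the offsets of the first three copies.
for i in 0 1 2; do
    echo "copy $i: bytenr=$(sb_offset "$i")"
done

# wipefs reported the erased magic at 0x00010040, i.e. 64 bytes into copy 0.
printf 'magic of copy 0 at 0x%08x\n' $(( $(sb_offset 0) + 64 ))
```

The third copy would only exist on devices larger than 256 GiB, so on a small test disk only the first two are relevant.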
>
>
>> I'm fine using dd to replace the primary sb with one of the
>> backups, but don't know the exact parameters that would be needed.
>
> Here's an idea:
>
> # btrfs-show-super /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x92aa51ab [match]
> [snip]
> So I know what I'm looking for starts at LBA 65536/512
>
> # dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
> 00000000 92 aa 51 ab 00 00 00 00 00 00 00 00 00 00 00 00 |..Q.............|
> [snip]
>
> And as it turns out the csum is right at the beginning, 4 bytes. So use bs of 4 bytes, seek 65536/4, count of 1. This should zero just 4 bytes starting at 65536 bytes in.
>
> # dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1
>
> Checked it with the earlier skip=128 command and it looks like everything else is intact.
>
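Before pointing dd at a real device, the same geometry can be rehearsed on a scratch file. This is only a sketch: the temp file stands in for /dev/sdb, the 0xff fill is there purely so the zeroed bytes stand out, and conv=notrunc is my addition because dd would otherwise truncate a regular file after the write (it makes no difference on a block device).

```shell
#!/bin/sh
# Scratch-file rehearsal of the 4-byte overwrite; never touches a real device.
set -eu
img=$(mktemp)                 # hypothetical stand-in for /dev/sdb
trap 'rm -f "$img"' EXIT

# Fill 128 KiB with 0xff bytes so that zeroed bytes are easy to spot.
dd if=/dev/zero bs=1024 count=128 2>/dev/null | LC_ALL=C tr '\000' '\377' > "$img"

# Same geometry as the quoted command: 4-byte blocks, seek 65536/4 = 16384.
# conv=notrunc keeps dd from truncating the regular file after the write.
dd if=/dev/zero of="$img" bs=4 seek=$((65536 / 4)) count=1 conv=notrunc 2>/dev/null

# Dump 8 bytes starting at offset 65536: the first 4 should now be zero
# and the following 4 should still be 0xff.
od -A d -t x1 -j 65536 -N 8 "$img"
```

If the dump shows anything other than four zero bytes followed by untouched data, the bs/seek arithmetic is off and should be rechecked before going near the real disk.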
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x00000000 [DON'T MATCH]
> bytenr 65536
> flags 0x1
> magic _BHRfS_M [match]
> [snip]
> OK so the csum is bad, the magic is good. Now see if btrfs rescue super-recover does anything
> # btrfs rescue super-recover /dev/sdb
> Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: Y
> Recovered bad superblocks successful
> *** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
> /lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
> /lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
> /lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
> btrfs[0x425ec6]
> btrfs[0x406902]
> /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
> btrfs[0x406a04]
> ======= Memory m
> [snip]
>
> kaboom!
>
> But was it really successful?
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum 0x92aa51ab [match]
> [snip]
> Looks fixed. And it mounts.
>
> NOW, I didn't actually have my first superblock pointing to a corrupt root tree. So it's possible that, while the csum was fixed in my case, the crash kept the tool from properly copying all the good parts of superblock1 to superblock0. *shrug*
>
> And since it crashes, looks like I found a bug.
>
>> I'm using btrfs-progs 3.16 and
>> kernel 3.16.1.
>
> So did I for all of the above.
>
>
Since posting this, I realized that the recovery environment I'm working from actually has btrfs-progs 3.14.1 and kernel 3.14.5; I need to make a point of updating that once I get the system working again.
I've also discovered, while trying to use btrfs restore to copy the data out to a different system, that restore from 3.14.1 apparently chokes on filesystems that have lzo compression turned on. It reports errors trying to inflate compressed files, and I know for a fact that none of those files were even open, let alone being written to, when the system crashed. I don't know whether this is a known bug, or whether it still happens with btrfs-progs 3.16, but I figured I'd mention it because I haven't seen it reported anywhere.
Also, interestingly, I didn't get the crash you saw above with btrfs rescue super-recover, so that might be a regression in btrfs-progs 3.16.
Thanks for all the help.