linux-btrfs.vger.kernel.org archive mirror
From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Problem with unmountable filesystem.
Date: Wed, 17 Sep 2014 07:23:46 -0400
Message-ID: <54196F42.4030101@gmail.com>
In-Reply-To: <2A2CB71A-7516-43CD-94E1-BCB2198F5FC4@colorremedies.com>

On 2014-09-16 16:57, Chris Murphy wrote:
> 
> On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
> 
>> Based on the kernel messages, the primary issue is log corruption, and
>> in theory btrfs-zero-log should fix it.
> 
> Can you provide a complete dmesg somewhere for this initial failure, just for reference? I'm curious what this indication looks like compared to other problems.
> 
Okay, I can't really get a 'complete' dmesg, because the system panics 
on the mount failure (the filesystem in question is the system's root 
filesystem), the system has no serial ports, and I didn't think to 
build in support for console on ttyUSB0.  I can however get what the 
recovery environment (locally compiled based on buildroot) shows when I 
try to mount the filesystem:
[   30.871036] BTRFS: device label gentoo devid 1 transid 160615 /dev/sda3
[   30.875225] BTRFS info (device sda3): disk space caching is enabled
[   30.917091] BTRFS: detected SSD devices, enabling SSD mode
[   30.920536] BTRFS: bad tree block start 0 130402254848
[   30.924018] BTRFS: bad tree block start 0 130402254848
[   30.926234] BTRFS: failed to read log tree
[   30.953055] BTRFS: open_ctree failed
>>  The actual issue however, is
>> that the primary superblock appears to be pointing at a corrupted root
>> tree, which causes pretty much everything that does anything other than
>> just read the sb to fail.  The first backup sb does point to a good
>> tree, but only btrfs check and btrfs restore have any option to ignore
>> the first sb and use one of the backups instead.
> 
> Maybe use wipefs -a on this volume, which removes the magic from only the first superblock by default (you can specify another location). And then try btrfs-show-super -F which "dumps" supers with bad magic.
> 
Thanks for the suggestion, I hadn't thought of that...
> I just tried this:
> # wipefs -a /dev/sdb
> /dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x5c1196d7 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			........ [DON'T MATCH]
> […]
> # btrfs-show-super -i1 /dev/sdb
> superblock: bytenr=67108864, device=/dev/sdb
> ---------------------------------------------------------
> csum			0xfc70be19 [match]
> bytenr			67108864
> flags			0x1
> magic			_BHRfS_M [match]
> 
> So the mirror is definitely there and valid.
> # btrfs rescue super-recover -yv /dev/sdb
> No valid Btrfs found on /dev/sdb
> Usage or syntax errors
> 
> Not expected at all, man page says "Recover bad superblocks from good copies." There's a good copy, it's not being found by btrfs rescue super-recover. Seems like a bug.
> 
> 
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> 
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> OK it finds it, maybe a --repair will fix the bad first one?
> # btrfs check -s1 --repair /dev/sdb
> using SB copy 1, bytenr 67108864
> enabling repair mode
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> No indication of repair
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> [root@f21v ~]# btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x5c1196d7 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			........ [DON'T MATCH]
> 
> 
> Still not fixed. Maybe I needed to corrupt something else in the superblock other than the magic and this behavior is intentional, otherwise wipefs -a, followed by btrfsck would resurrect an intentionally wiped btrfs fs, potentially wiping out some newer file system in the process.
> 
...though maybe it's a good thing I didn't.
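
For reference, the check that btrfs-show-super performs on the magic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the offsets 65536 and 67108864 and the magic position (0x10040 - 0x10000 = 0x40) come from the output quoted above, the 256 GiB third copy comes from the btrfs on-disk format rather than from this thread, and the function names are made up for the example. The demo runs against a scratch image file, not a real device.

```python
import os
import tempfile

# Byte offsets of the superblock copies. The first two appear in the
# btrfs-show-super output quoted above; the third (256 GiB) is taken from
# the btrfs on-disk format and is not mentioned in this thread.
SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
MAGIC_OFFSET = 0x40  # wipefs reported the magic at 0x10040 = 65536 + 0x40
MAGIC = b"_BHRfS_M"


def check_magic(path):
    """Map each superblock offset that fits in `path` to True/False (magic present)."""
    results = {}
    with open(path, "rb") as f:
        for offset in SUPERBLOCK_OFFSETS:
            f.seek(offset + MAGIC_OFFSET)
            data = f.read(len(MAGIC))
            if len(data) < len(MAGIC):
                continue  # this copy lies beyond the end of the image
            results[offset] = data == MAGIC
    return results


def make_scratch_image(path):
    """Hypothetical helper: an image whose only intact magic is in copy 1,
    which is roughly the state left behind by `wipefs -a`."""
    with open(path, "wb") as f:
        f.truncate(64 * 1024**2 + 4096)
        f.seek(64 * 1024**2 + MAGIC_OFFSET)
        f.write(MAGIC)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "scratch.img")
        make_scratch_image(image)
        for offset, ok in check_magic(image).items():
            print(offset, "match" if ok else "DON'T MATCH")
```

This only looks at the magic; it does not verify the csum or any other field, so a "match" here says less than the [match] printed by btrfs-show-super.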
> 
> 
>> I'm fine using dd to replace the primary sb with one of the
>> backups, but don't know the exact parameters that would be needed.
> 
> Here's an idea:
> 
> # btrfs-show-super /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x92aa51ab [match]
> [snip]
> So I know what I'm looking for starts at LBA 65536/512
> 
> # dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
> 00000000  92 aa 51 ab 00 00 00 00  00 00 00 00 00 00 00 00  |..Q.............|
> [snip]
> 
> And as it turns out the csum is right at the beginning, 4 bytes. So use bs of 4 bytes, seek 65536/4, count of 1. This should zero just 4 bytes starting at 65536 bytes in.
> 
> # dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1
> 
> Checked it with the earlier skip=128 command and it looks like everything else is intact.
> 
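
The dd arithmetic above (skip=128 with the default 512-byte blocks, seek=16384 with bs=4) can be double-checked with a short Python sketch. The helper names are hypothetical, and zero_bytes() is only an assumed equivalent of the dd invocation for use on a scratch file, not something to point at a real device without care.

```python
import os
import tempfile

PRIMARY_SUPERBLOCK = 65536  # bytenr of the first superblock, from the output above


def dd_block_index(byte_offset, block_size):
    """Return the skip=/seek= value that addresses byte_offset in block_size units."""
    index, remainder = divmod(byte_offset, block_size)
    if remainder:
        raise ValueError("offset is not a multiple of the block size")
    return index


def zero_bytes(path, byte_offset, length):
    """Overwrite `length` bytes at `byte_offset` with zeros, leaving the rest intact."""
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        f.write(b"\x00" * length)


if __name__ == "__main__":
    # Reading with dd's default 512-byte blocks: skip=65536/512
    print("skip =", dd_block_index(PRIMARY_SUPERBLOCK, 512))
    # Writing with bs=4: seek=65536/4
    print("seek =", dd_block_index(PRIMARY_SUPERBLOCK, 4))

    # Try the 4-byte overwrite on a scratch file filled with 0xff.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = os.path.join(tmp, "scratch.img")
        with open(scratch, "wb") as f:
            f.write(b"\xff" * (PRIMARY_SUPERBLOCK + 16))
        zero_bytes(scratch, PRIMARY_SUPERBLOCK, 4)
        with open(scratch, "rb") as f:
            f.seek(PRIMARY_SUPERBLOCK - 4)
            print(f.read(12).hex())
```

Raising on a non-multiple offset is a deliberate choice: dd can only address whole blocks, so an offset that does not divide evenly needs a smaller bs rather than a rounded seek value.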
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x00000000 [DON'T MATCH]
> bytenr			65536
> flags			0x1
> magic			_BHRfS_M [match]
> [snip]
> OK so the csum is bad, the magic is good. Now see if btrfs rescue super-recover does anything
> # btrfs rescue super-recover /dev/sdb
> Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: Y
> Recovered bad superblocks successful
> *** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
> /lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
> /lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
> /lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
> btrfs[0x425ec6]
> btrfs[0x406902]
> /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
> btrfs[0x406a04]
> ======= Memory m
> [snip]
> 
> kaboom!
> 
> But was it really successful?
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum			0x92aa51ab [match]
> [snip]
> Looks fixed. And it mounts.
> 
> NOW, I didn't actually have my first superblock pointing to a corrupt root tree. So it's possible that, while the csum was fixed in my case, the subsequent crash kept the tool from properly copying all the good parts of superblock 1 to superblock 0. *shrug*
> 
> And since it crashes, looks like I found a bug.
> 
>> I'm using btrfs-progs 3.16 and
>> kernel 3.16.1.
> 
> So did I for all of the above.
> 
> 
Since posting this, I realized that the recovery environment I'm working from is actually btrfs-progs 3.14.1 and kernel 3.14.5. I need to make a point of updating that once I get the system working again.

I've also discovered, while trying to use btrfs restore to copy the data out to a different system, that restore from 3.14.1 apparently chokes on filesystems that have lzo compression turned on.  It reports errors trying to inflate compressed files, and I know for a fact that none of those files were even open, let alone being written to, when the system crashed.  I don't know whether this is a known bug, or whether it is still the case with btrfs-progs 3.16, but I figured I'd mention it because I haven't seen anything about it anywhere.

Interestingly, I also didn't get the crash you saw above with btrfs rescue super-recover, so that might be a regression in btrfs-progs 3.16.

Thanks for all the help.

