From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Peter Chant <pete@petezilla.co.uk>,
	Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Mount issue, mount /dev/sdc2: can't read superblock
Date: Mon, 24 Dec 2018 20:02:20 +0800
Message-ID: <0024a4b2-7117-8d76-45c5-240e23edc29b@gmx.com>
In-Reply-To: <99716398-e99c-6ee9-e256-6d05fdc48122@petezilla.co.uk>


On 2018/12/24 7:31 PM, Peter Chant wrote:
> On 12/24/18 12:58 AM, Chris Murphy wrote:
>> On Sat, Dec 22, 2018 at 10:22 AM Peter Chant <pete@petezilla.co.uk> wrote:
>>
>>> btrfs rescue super -v /dev/sdb2
>> ...
>>> All supers are valid, no need to recover
>>>
>>>
>>> btrfs insp dump-s -f <dev>
>> ...
>>> generation              7937947
>> ...
>>>         backup 0:
>>>                 backup_tree_root:       1113909100544   gen: 7937935    level: 1
>> ...
>>>         backup 1:
>>>                 backup_tree_root:       1113907347456   gen: 7937936    level: 1
>> ...
>>>         backup 2:
>>>                 backup_tree_root:       1113911951360   gen: 7937937    level: 1
>> ...
>>>         backup 3:
>>>                 backup_tree_root:       1113907494912   gen: 7937934    level: 1
>> ...
>>
>>
>> The kernel wrote out three valid checksummed supers, with what seems
>> to be a rather significant sanity violation. The super generation and
>> tree root address do not match any of the backup tree roots. The
>> *current* tree root is supposed to be in one of the backups as well.
>>
> 
> I wonder if this is a result of my trying to fix things?  E.g. btrfs
> rescue super-recover or my attempts using the tools (and kernel) in Mint
> 18.1 at one point?

super-recover, at least, is not responsible for this.
btrfs check --repair, on the other hand, could indeed cause such problems.

So that may be the case.
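
To make the mismatch Chris describes above concrete, here is a minimal sketch (not part of btrfs-progs) that reads `btrfs insp dump-s -f` output and reports whether the super generation appears among the backup roots. The function names and the regular expressions are my own assumptions, based only on the output layout quoted in this thread.

```python
import re

# Assumed line layouts, taken from the dump-super excerpt quoted above.
GEN_RE = re.compile(r"^generation\s+(\d+)\s*$")
BACKUP_RE = re.compile(r"^backup_tree_root:\s+(\d+)\s+gen:\s+(\d+)")


def parse_dump_super(text):
    """Return (super_generation, [(backup_tree_root, backup_gen), ...])."""
    super_gen = None
    backups = []
    for raw in text.splitlines():
        line = raw.strip()
        m = GEN_RE.match(line)
        if m and super_gen is None:
            # Only the plain "generation" field; fields such as
            # chunk_root_generation do not match the anchored pattern.
            super_gen = int(m.group(1))
            continue
        m = BACKUP_RE.match(line)
        if m:
            backups.append((int(m.group(1)), int(m.group(2))))
    return super_gen, backups


def current_generation_in_backups(text):
    """True if some backup tree root carries the super's generation."""
    super_gen, backups = parse_dump_super(text)
    if super_gen is None:
        raise ValueError("no 'generation' line found")
    return any(gen == super_gen for _, gen in backups)


# Values copied from the report quoted earlier in this thread.
SAMPLE = """\
generation              7937947
        backup 0:
                backup_tree_root:       1113909100544   gen: 7937935    level: 1
        backup 1:
                backup_tree_root:       1113907347456   gen: 7937936    level: 1
        backup 2:
                backup_tree_root:       1113911951360   gen: 7937937    level: 1
        backup 3:
                backup_tree_root:       1113907494912   gen: 7937934    level: 1
"""

if __name__ == "__main__":
    print(current_generation_in_backups(SAMPLE))
```

On the values from this report the newest backup is at generation 7937937 while the super is at 7937947, so no backup matches the current generation.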

> 
> I must admit, early on I had assumed that either this file system was a
> simple fix or was completely trashed, so I thought I'd have a quick go
> at fixing it, or wipe it and start again.  But then I seemed to get
> close with only the one error, but unmountable.
> 
> 
>> Qu, any idea how this is even theoretically possible? Bit flip right
>> before the super is computed and checksummed? Seems like some kind of
>> corruption before checksum is computed.
>>
>>
>>> I'm getting suspicious of the drive as when I was trying the various
>>> btrfs rescue * tools I saw a 'bad block', or similar, error displayed.
>>> I also have a separate basic install on ext4 on the same disk.  Though
>>> e2fsck shows no errors and mounts fine I cannot log into that install.
>>> Maybe a coincidence, but too many bad things thrown up make me
>>> suspicious.  Whatever is happening this seems to be really fighting me.
>>
>> I'm not sure how even a bad device accounts for the super generation
>> and backup mismatches. That's damn strange.
> 
> I'm less suspicious of the drive now.  I've been using an ext4 partition
> on the same drive for a few days now, having reinstalled on that and
> everything _seems_ fine.  Mind you, apart from usb sticks, I've not
> experienced a ssd failure.  Perhaps my hdd failure experience is not
> relevant, i.e. they work until they start throwing errors and then
> rapidly fail?

I don't really believe a drive would corrupt just a few specific bits
while leaving all the other bits intact.

> 
> 
>>
>> If you get bored with the back and forth and just want to give up,
>> that's fine. I suggest that if you have the time and space, to take a
>> btrfs-image in case Qu or some other developer wants to look at this
>> file system at some point. The btrfs-image is a read only process, can
>> be set to scrub filenames, and only contains metadata. Size of the
>> resulting file is around 1/2 of the size of metadata, when doing
>> 'btrfs filesystem usage' or 'btrfs filesystem df'. So you'll need that
>> much free space to direct the command to.
>>
>> btrfs-image -ss -c9 -t4 <devicetoimage> pathtofile
> 
> Just done that:
> bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2 /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
> WARNING: cannot find a hash collision for '..', generating garbage, it
> won't match indexes
> 
> 
> 
>>
>> It might fail, if so you can try adding -w and see if that helps.
> 
> 
> OK, try with -w:
> 
> OK, many many complaints about hash collisions:
> ...
> WARNING: cannot find a hash collision for 'ifup', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'catv', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FDPC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'LIBS', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'INTC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SPI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'PDCA', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'EBI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SMC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'WIFI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'LWIP', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'HID', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'yun', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'avr4', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'avr6', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'WiFi', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'TFT', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'Knob', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FP.h', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SD.h', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'Beep', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FORK', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'CHM', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'HandS', generating garbage,
> it won't match indexes
> WARNING: cannot find a hash collision for 'dm-0', generating garbage, it
> won't match indexes
> 
> 
> Now it seems to have stopped producing output.  I can't see if it is doing
> something useful.  (Note: I started it again and got more such messages.)

I can't speak for other developers, but I normally don't like
btrfs-image -ss at all.

Even plain btrfs-image isn't that helpful, especially considering its size.

Anyway, from all the data you collected, I suspect a corruption in
tree block allocation, maybe caused by a btrfs bug in older kernels,
which planted a dangerous seed in the fs by breaking metadata CoW.

Then one day an unexpected power loss made that seed grow and screw up
the fs.

Just a personal recommendation: for btrfs, especially when used with
older kernels, it's highly recommended to run btrfs check --readonly
after a power loss, before mounting the fs.
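
As a rough sketch of that recommendation, the wrapper below runs the read-only check first and mounts only if the check succeeds. The function name and the injectable `run` parameter are hypothetical, and the sketch assumes that btrfs check exits with a non-zero status when it finds problems.

```python
import subprocess


def check_then_mount(device, mountpoint, run=subprocess.run):
    """Run a read-only btrfs check and mount only if it passes.

    `run` defaults to subprocess.run; it is a parameter only so the
    logic can be exercised without touching a real device.
    Assumption: `btrfs check` returns non-zero when it finds errors.
    """
    # --readonly never writes to the device, so it is safe to run
    # on a filesystem that may be damaged.
    check = run(["btrfs", "check", "--readonly", device])
    if check.returncode != 0:
        # Leave the filesystem unmounted so the damage can be
        # investigated before anything writes to it.
        return False
    run(["mount", device, mountpoint], check=True)
    return True
```

Something like check_then_mount("/dev/sdX2", "/mnt") would need root and an unmounted device; the device path here is a placeholder.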

Thanks,
Qu

> 
> 
>>
>> There is no log listed in the super so zero-log isn't indicated, and
>> also tells me there were no fsync's still flushing at the time of the
>> crash. The loss should be at most a minute of data, not an
>> inconsistent file system that can't be mounted anymore. Pretty weird.
>>
> 
> I think I ran zero-log to see if that helped.  Given that there was no
> important data and I'd assumed I'd either easily fix it, or wipe it and
> start over, I may have taken the 'monkey randomly pounding the buttons'
> approach, short of 'btrfs check --repair'.  I only posted here as I
> thought I'd fixed it apart from the one error!  If it were a simple fix
> then it was worth asking.
> 
> 
>> What were your mount options? Defaults? Anything custom like discard,
>> commit=, notreelog? Any non-default mount options themselves would not
>> be the cause of the problem, but might suggest partial ideas for what
>> might have happened.
>>
> fstab states:
> autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.2,compress=lzo
> 
> However, I used an initrd, so I'm not sure if that is correct?
> 
> Ok, digging into init within my initrd, the line where the root partition
> is mounted:
>   mount -o ro -t $ROOTFS $ROOTDEV /mnt
> 
> Where $ROOTFS is:
> btrfs -o subvol=_r_sl14.2
> 
> and $ROOTDEV is:
> /dev/disk/by-uuid/6496aabd-d6aa-49e0-96ca-e49c316edd8e
> 
> 
> 
> Pete
> 

