public inbox for linux-btrfs@vger.kernel.org
From: <weldon@newfietech.com>
To: "'Qu Wenruo'" <quwenruo.btrfs@gmx.com>, <linux-btrfs@vger.kernel.org>
Subject: RE: BTRFS fails mount after power failure
Date: Mon, 23 Aug 2021 18:10:44 -0600
Message-ID: <001401d7987c$7bb81ff0$73285fd0$@newfietech.com>
In-Reply-To: <0be8ec2b-7226-f3d1-a02b-608e757bda24@gmx.com>

Thank you for the reply Qu.

The hardware setup is a bit wonky in a home lab, but is as follows:

Dell PowerEdge R510 Chassis
Dell PERC H700
6 * 4TB SATA Disks in a RAID 5 configuration
ESXi 6.5 hypervisor sees storage as local DELL Disk, 18.19TB

17.66TB Provisioned as a Datastore on the hypervisor, VMFS5.
- 14.5TB provisioned as a vmdk and presented as local disk to Ubuntu virtual machine, mounted as /data (btrfs)
- 200GB provisioned as vmdk and presented as local disk to Ubuntu virtual machine, mounted as / (ext4)
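
As a quick sanity check on the sizes above, here is a minimal sketch of the arithmetic. It assumes the "4TB" disks are decimal terabytes, that RAID 5 spends one disk's worth of space on parity, and that the hypervisor reports binary units (TiB); the sector count is the one dmesg reports for sdb in the quoted log further down.

```python
# Capacity check for the layout described above (decimal TB vs binary TiB).
TB = 10**12   # decimal terabyte, as used on disk labels (assumption)
TIB = 2**40   # binary tebibyte, as most tools report

disks = 6
disk_size_tb = 4

# RAID 5 spends one disk's worth of space on parity.
raid5_usable_bytes = (disks - 1) * disk_size_tb * TB
raid5_usable_tib = raid5_usable_bytes / TIB

# Sector count that dmesg reports for sdb (512-byte logical blocks).
sdb_sectors = 31138512896
sdb_bytes = sdb_sectors * 512

print(f"RAID 5 usable: {raid5_usable_tib:.2f} TiB")
print(f"sdb: {sdb_bytes / TB:.1f} TB = {sdb_bytes / TIB:.1f} TiB")
```

Under those assumptions the numbers line up with the 18.19TB the hypervisor sees and the 15.9 TB/14.5 TiB that the kernel prints for sdb.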

Happy and willing to try any suggestions you may have.

root@onyx:/home# btrfs ins dump-tree /dev/sdb1
btrfs-progs v5.4.1
parent transid verify failed on 7939752886272 wanted 120260 found 120262
parent transid verify failed on 7939752886272 wanted 120260 found 120265
parent transid verify failed on 7939752886272 wanted 120260 found 120265
Ignoring transid failure
WARNING: could not setup extent tree, skipping it
Couldn't setup device tree
ERROR: unable to open /dev/sdb1
root@onyx:/home#


Thanks in advance,
Weldon


-----Original Message-----
From: Qu Wenruo <quwenruo.btrfs@gmx.com> 
Sent: August 23, 2021 5:55 PM
To: weldon@newfietech.com; linux-btrfs@vger.kernel.org
Subject: Re: BTRFS fails mount after power failure



On 2021/8/24 4:52 AM, weldon@newfietech.com wrote:
> Good day folks,
>
> I awoke this morning to find that my UPS had died overnight and my 
> Ubuntu server with a 14.5TB (Raid 5) BTRFS volume went down with it.

RAID5 has a known write-hole bug. Although that bug won't cause immediate problems, it slowly degrades the whole array with each corrupted sector or unexpected power loss.

With enough degradation, this eventually brings down the array.

> The machine
> rebooted fine and the hardware reports no errors, however the BTRFS
> volume will no longer mount.  The OS boots fine, the 14.5TB volume is
> for data storage only.  gparted shows the volume/partition, and
> correctly reports space used as well as total size.  I've never
> encountered this type of issue over the past year while using btrfs
> and I'm not sure where to start.  A number of google search results
> express caution when attempting to recover/repair, so I'm hoping for some expert advice.
>
> My dmesg log exceeds the 100,000 bytes restriction, so I'm unable to 
> attach it, so please ask if there's anything specific I can include otherwise.
>
> # uname -a
> Linux onyx 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v5.4.1
>
> # btrfs fi show
> Label: 'Data'  uuid: 7f500ee1-32b7-45a3-b1e9-deb7e1f59632
>          Total devices 1 FS bytes used 7.17TiB
>          devid    1 size 14.50TiB used 7.40TiB path /dev/sdb1
>
> # dmesg | grep sdb
> [    2.312875] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> [    2.313010] sd 32:0:1:0: [sdb] 31138512896 512-byte logical blocks: (15.9 TB/14.5 TiB)
> [    2.313062] sd 32:0:1:0: [sdb] Write Protect is off
> [    2.313065] sd 32:0:1:0: [sdb] Mode Sense: 61 00 00 00
> [    2.313116] sd 32:0:1:0: [sdb] Cache data unavailable
> [    2.313119] sd 32:0:1:0: [sdb] Assuming drive cache: write through
> [    2.333321] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> [    2.396761]  sdb: sdb1
> [    2.397170] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> [    2.397261] sd 32:0:1:0: [sdb] Attached SCSI disk
> [    4.709963] BTRFS: device label Data devid 1 transid 120260 /dev/sdb1
> [   21.849570] BTRFS info (device sdb1): disk space caching is enabled
> [   21.849573] BTRFS info (device sdb1): has skinny extents
> [   22.023224] BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120262
> [   22.047940] BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120265

This already shows a mismatch between the on-disk data and the data recovered from parity.

In other words, the on-disk data and parity have drifted apart, which is exactly the write-hole problem.

Furthermore, the disk has newer data than what we expect.
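
To make the "newer than expected" observation concrete, here is a small sketch that pulls the wanted/found generations out of the two error lines quoted above and compares them. The function name `transid_mismatches` is made up for illustration; the script only reads log text and does not touch the filesystem.

```python
import re

# Two error lines copied from the dmesg output quoted above.
LOG = """\
BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120262
BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120265
"""

PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_mismatches(text):
    """Return (bytenr, wanted, found) tuples for each transid error in text."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(text)]


mismatches = transid_mismatches(LOG)
for bytenr, wanted, found in mismatches:
    relation = "newer" if found > wanted else "older"
    print(f"block {bytenr}: wanted {wanted}, found {found} "
          f"({abs(found - wanted)} generations {relation})")
```

In both lines the generation found on disk is higher than the one the parent expects.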

What's the device model? This looks like misbehavior, but I'm not sure whether it comes from the hardware or from the btrfs code.
RAID56 has been marked as unsafe for a while, so it gets little attention and few code fixes; thus both cases are possible.

> [   22.047949] BTRFS warning (device sdb1): failed to read tree root
> [   22.089003] BTRFS error (device sdb1): open_ctree failed
>
> root@onyx:/home/weldon# btrfs-find-root /dev/sdb1
> parent transid verify failed on 7939752886272 wanted 120260 found 120262
> parent transid verify failed on 7939752886272 wanted 120260 found 120265
> parent transid verify failed on 7939752886272 wanted 120260 found 120265
> Ignoring transid failure
> WARNING: could not setup extent tree, skipping it
> Couldn't setup device tree
> Superblock thinks the generation is 120260
> Superblock thinks the level is 1
> Well block 7939758882816(gen: 120264 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> Well block 7939747938304(gen: 120263 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> Well block 7939756146688(gen: 120262 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> Well block 7939751559168(gen: 120261 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>
> *** A large selection of block references was removed due to character 
> count... if needed, I can resend with the full output.
>
> Well block 1316967743488(gen: 1293 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> Well block 1316909662208(gen: 1283 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> Well block 1316908711936(gen: 1283 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
> root@onyx:/home#
>
> Any help or assistance would be greatly appreciated.  Important data 
> has been backed up, however if it's possible to recover without 
> thrashing the entire volume, that would be preferred.

First things first: don't expect too much in the way of magically turning the fs back to a fully functional state.
Transid errors are always tricky for btrfs.


But in your case, I'm guessing your sdb1 does not have the latest super block.
We have newer tree roots on disk, but an older super block.
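
The "newer tree roots, older super block" reading can be checked against the btrfs-find-root output quoted earlier. The sketch below is only an illustration: it parses a few of the quoted lines (the helper name `parse_find_root` is hypothetical) and lists the candidate roots whose generation is higher than the one the super block records. It only reads text and makes no attempt at recovery.

```python
import re

# A few lines copied from the btrfs-find-root output quoted above.
SAMPLE = """\
Superblock thinks the generation is 120260
Superblock thinks the level is 1
Well block 7939758882816(gen: 120264 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939747938304(gen: 120263 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939756146688(gen: 120262 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939751559168(gen: 120261 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 1316967743488(gen: 1293 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
"""

SUPER_GEN = re.compile(r"Superblock thinks the generation is (\d+)")
CANDIDATE = re.compile(r"Well block (\d+)\(gen: (\d+) level: (\d+)\)")


def parse_find_root(text):
    """Return (superblock generation, [(gen, bytenr, level), ...])."""
    super_gen = int(SUPER_GEN.search(text).group(1))
    candidates = [
        (int(gen), int(bytenr), int(level))
        for bytenr, gen, level in CANDIDATE.findall(text)
    ]
    return super_gen, candidates


super_gen, candidates = parse_find_root(SAMPLE)

# Candidate roots with a generation above the super block's, newest first.
newer = sorted((c for c in candidates if c[0] > super_gen), reverse=True)

print(f"super block generation: {super_gen}")
for gen, bytenr, level in newer:
    print(f"  newer root candidate: block {bytenr} gen {gen} level {level}")
```

For the quoted sample, four candidates (generations 120261 through 120264) are newer than the super block's 120260.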

Maybe you would like to try "btrfs ins dump-tree" on all the involved disks and check whether any of them has a newer super block.

Thanks,
Qu
>
> Regards,
> Weldon
>

