I've attached the complete results of the btrfs-find-root command. If I understood your directions correctly, below is the result:

root@onyx:/home# btrfs check -r 7939758882816 /dev/sdb1
Opening filesystem to check...
parent transid verify failed on 7939758882816 wanted 120260 found 120264
parent transid verify failed on 7939758882816 wanted 120260 found 120264
parent transid verify failed on 7939758882816 wanted 120260 found 120264
Ignoring transid failure
parent transid verify failed on 7939751723008 wanted 120264 found 120262
parent transid verify failed on 7939751723008 wanted 120264 found 120266
parent transid verify failed on 7939751723008 wanted 120264 found 120266
Ignoring transid failure
parent transid verify failed on 7939735683072 wanted 120264 found 120263
parent transid verify failed on 7939735683072 wanted 120261 found 120264
Ignoring transid failure
parent transid verify failed on 7939734437888 wanted 120264 found 120253
Checking filesystem on /dev/sdb1
UUID: 7f500ee1-32b7-45a3-b1e9-deb7e1f59632
[1/7] checking root items
Error: could not find extent items for root 18446744073709551607
ERROR: failed to repair root items: No such file or directory
root@onyx:/home#
root@onyx:/home# btrfs check -r 7939747938304 /dev/sdb1
Opening filesystem to check...
parent transid verify failed on 7939747938304 wanted 120260 found 120263
parent transid verify failed on 7939747938304 wanted 120260 found 120265
parent transid verify failed on 7939747938304 wanted 120260 found 120265
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system
root@onyx:/home#
root@onyx:/home# btrfs check -r 7939756146688 /dev/sdb1
Opening filesystem to check...
parent transid verify failed on 7939756146688 wanted 120260 found 120262
parent transid verify failed on 7939756146688 wanted 120260 found 120264
parent transid verify failed on 7939756146688 wanted 120260 found 120264
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system
root@onyx:/home#
root@onyx:/home# btrfs check -r 7939751559168 /dev/sdb1
Opening filesystem to check...
parent transid verify failed on 7939751559168 wanted 120260 found 120261
parent transid verify failed on 7939751559168 wanted 120260 found 120261
parent transid verify failed on 7939751559168 wanted 120260 found 120261
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system
root@onyx:/home#

Thanks again,
Weldon

-----Original Message-----
From: Qu Wenruo
Sent: August 23, 2021 6:39 PM
To: weldon@newfietech.com; linux-btrfs@vger.kernel.org
Subject: Re: BTRFS fails mount after power failure

On 2021/8/24 8:10 AM, weldon@newfietech.com wrote:
> Thank you for the reply Qu.
>
> The hardware setup is a bit wonky in a home lab, but is as follows:
>
> Dell PowerEdge R510 Chassis
> Dell PERC H700
> 6 * 4TB SATA Disks in a RAID 5 configuration

So the RAID5 is not provided by btrfs, but by a hardware RAID controller?

Then we don't need to worry about the btrfs RAID5 bug.

But still, this means the RAID controller or the HDD is not doing proper flush/FUA.

This means that the next time your UPS goes down or a kernel crash happens, you may still hit a similar problem. And this time, we're pretty sure the btrfs code is less likely to be at fault.

> ESXi 6.5 hypervisor sees storage as local DELL Disk, 18.19TB
>
> 17.66TB Provisioned as a Datastore on the hypervisor, VMFS5.
> - 14.5TB provisioned as a vmdk and presented as local disk to Ubuntu
>   virtual machine, mounted as /data (btrfs)
> - 200GB provisioned as vmdk and presented as local disk to Ubuntu
>   virtual machine, mounted as / (ext4)
>
> Happy and willing to try any suggestions you may have.
>
> root@onyx:/home# btrfs ins dump-tree /dev/sdb1

My bad, I meant "btrfs ins dump-super -fFa", but that's for the case of a btrfs RAID5 setup.

Since you're using a hardware RAID5 controller, we can go directly to the recovery part.

Your previous find-root output would be pretty helpful.

You can try btrfs check with the -r option:

# btrfs check -r 7939758882816 /dev/sdb1

to see how many errors it throws. If it shows almost no errors, then there is a pretty high chance of recovering the data.

You can also try other bytenrs from your find-root output, but I guess you only need to try the first 4 bytenrs.

Thanks,
Qu

> btrfs-progs v5.4.1
> parent transid verify failed on 7939752886272 wanted 120260 found 120262
> parent transid verify failed on 7939752886272 wanted 120260 found 120265
> parent transid verify failed on 7939752886272 wanted 120260 found 120265
> Ignoring transid failure
> WARNING: could not setup extent tree, skipping it
> Couldn't setup device tree
> ERROR: unable to open /dev/sdb1
> root@onyx:/home#
>
>
> Thanks in advance,
> Weldon
>
>
> -----Original Message-----
> From: Qu Wenruo
> Sent: August 23, 2021 5:55 PM
> To: weldon@newfietech.com; linux-btrfs@vger.kernel.org
> Subject: Re: BTRFS fails mount after power failure
>
>
>
> On 2021/8/24 4:52 AM, weldon@newfietech.com wrote:
>> Good day folks,
>>
>> I awoke this morning to find that my UPS had died overnight and my
>> Ubuntu server with a 14.5TB (Raid 5) BTRFS volume went down with it.
>
> RAID5 has known write hole bug, and although that bug won't cause immediate problems, it slowly degrades the whole array with each corrupted sector or unexpected power loss.
>
> This would eventually bring down the array with enough degradation.
>
>> The machine
>> rebooted fine and the hardware reports no errors, however the BTRFS
>> volume will no longer mount. The OS boots fine, the 14.5TB volume
>> is for data storage only. gparted shows the volume/partition, and
>> correctly reports space used as well as total size.
>> I've never encountered this type of issue over the past year while using btrfs
>> and I'm not sure where to start. A number of google search results
>> express caution when attempting to recover/repair, so I'm hoping for some expert advice.
>>
>> My dmesg log exceeds the 100,000 bytes restriction, so I'm unable to
>> attach it, so please ask if there's anything specific I can include otherwise.
>>
>> # uname -a
>> Linux onyx 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>>
>> # btrfs --version
>> btrfs-progs v5.4.1
>>
>> # btrfs fi show
>> Label: 'Data' uuid: 7f500ee1-32b7-45a3-b1e9-deb7e1f59632
>> Total devices 1 FS bytes used 7.17TiB
>> devid 1 size 14.50TiB used 7.40TiB path /dev/sdb1
>>
>> # dmesg | grep sdb
>> [ 2.312875] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
>> [ 2.313010] sd 32:0:1:0: [sdb] 31138512896 512-byte logical blocks: (15.9 TB/14.5 TiB)
>> [ 2.313062] sd 32:0:1:0: [sdb] Write Protect is off
>> [ 2.313065] sd 32:0:1:0: [sdb] Mode Sense: 61 00 00 00
>> [ 2.313116] sd 32:0:1:0: [sdb] Cache data unavailable
>> [ 2.313119] sd 32:0:1:0: [sdb] Assuming drive cache: write through
>> [ 2.333321] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
>> [ 2.396761] sdb: sdb1
>> [ 2.397170] sd 32:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
>> [ 2.397261] sd 32:0:1:0: [sdb] Attached SCSI disk
>> [ 4.709963] BTRFS: device label Data devid 1 transid 120260 /dev/sdb1
>> [ 21.849570] BTRFS info (device sdb1): disk space caching is enabled
>> [ 21.849573] BTRFS info (device sdb1): has skinny extents
>> [ 22.023224] BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120262
>> [ 22.047940] BTRFS error (device sdb1): parent transid verify failed on 7939752886272 wanted 120260 found 120265
>
> This already shows some mismatch in on-disk data and recovered data from parity.
>
> This shows the on-disk data and parity have drifted from each other, exactly the write hole problem.
>
> Furthermore, the disk has newer data than what we expect.
>
> What's the device model? It looks like a misbehavior, not sure if it's from the hardware, or the btrfs code.
> As RAID56 is already marked as unsafe for a while, not that much love nor code fix is directed to RAID56, thus both cases are possible.
>
>> [ 22.047949] BTRFS warning (device sdb1): failed to read tree root
>> [ 22.089003] BTRFS error (device sdb1): open_ctree failed
>>
>> root@onyx:/home/weldon# btrfs-find-root /dev/sdb1
>> parent transid verify failed on 7939752886272 wanted 120260 found 120262
>> parent transid verify failed on 7939752886272 wanted 120260 found 120265
>> parent transid verify failed on 7939752886272 wanted 120260 found 120265
>> Ignoring transid failure
>> WARNING: could not setup extent tree, skipping it
>> Couldn't setup device tree
>> Superblock thinks the generation is 120260
>> Superblock thinks the level is 1
>> Well block 7939758882816(gen: 120264 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> Well block 7939747938304(gen: 120263 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> Well block 7939756146688(gen: 120262 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> Well block 7939751559168(gen: 120261 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>>
>> *** A large selection of block references was removed due to
>> character count... if needed, I can resend with the full output.
>>
>> Well block 1316967743488(gen: 1293 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> Well block 1316909662208(gen: 1283 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> Well block 1316908711936(gen: 1283 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
>> root@onyx:/home#
>>
>> Any help or assistance would be greatly appreciated. Important data
>> has been backed up, however if it's possible to recover without
>> thrashing the entire volume, that would be preferred.
>
> First thing first, don't expect too much about magically turning the fs back to fully functional status.
> Transid error is always tricky for btrfs.
>
>
> But for your case, I'm guessing your sdb1 does not have the latest super block.
> We have newer tree roots on disk, but older super block.
>
> Maybe you would like to try "btrfs ins dump-tree" on all the involved disks, and find if there is newer super blocks.
>
> Thanks,
> Qu
>>
>> Regards,
>> Weldon
>>
>
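
For anyone following the same steps: the advice in this thread to try the first few bytenrs from the find-root output with "btrfs check -r" can be scripted. Below is a minimal sketch, assuming the find-root output uses the "Well block ...(gen: ... level: ...)" wording quoted in this thread. The function name candidate_roots, its limit parameter, and the /dev/sdb1 path in the usage part are illustrative only and are not part of btrfs-progs. The script only prints the commands and does not run them.

```python
import re

# Pattern based on the "Well block" lines quoted in this thread; other
# btrfs-progs versions may word this output differently (assumption).
WELL_BLOCK = re.compile(r"Well block (\d+)\(gen:\s*(\d+)\s+level:\s*(\d+)\)")


def candidate_roots(find_root_output, limit=4):
    """Return up to `limit` (bytenr, generation, level) tuples, newest first.

    `candidate_roots` and `limit` are illustrative names, not btrfs-progs API.
    """
    # Collect into a set so repeated lines do not produce duplicate candidates.
    found = {tuple(int(x) for x in m) for m in WELL_BLOCK.findall(find_root_output)}
    # Sort by generation (then bytenr) in descending order.
    return sorted(found, key=lambda c: (c[1], c[0]), reverse=True)[:limit]


if __name__ == "__main__":
    # Sample lines copied from the find-root output earlier in this thread.
    sample = """\
Well block 7939758882816(gen: 120264 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939747938304(gen: 120263 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939756146688(gen: 120262 level: 1) seems good, but generation/level doesn't match, want gen: 120260 level: 1
Well block 7939751559168(gen: 120261 level: 0) seems good, but generation/level doesn't match, want gen: 120260 level: 1
"""
    for bytenr, gen, level in candidate_roots(sample):
        # Print the command for manual review instead of executing it.
        print(f"btrfs check -r {bytenr} /dev/sdb1   # gen {gen}, level {level}")
```

Printing rather than executing keeps the operator in control of each attempt. Without --repair, btrfs check runs in read-only mode, so trying several candidate roots this way should not modify the device.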