ERROR: failed to read block groups: Input/output error

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* ERROR: failed to read block groups: Input/output error
@ 2021-01-13 23:09 Dāvis Mosāns
  2021-01-13 23:39 ` Dāvis Mosāns
  2021-02-19 19:29 ` Zygo Blaxell
  0 siblings, 2 replies; 10+ messages in thread
From: Dāvis Mosāns @ 2021-01-13 23:09 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,

I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
caused some corruption.
When I try to mount it I get
$ mount /dev/sdt /mnt
mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
missing codepage or helper program, or other error
$ dmesg | tail -n 9
[  617.158962] BTRFS info (device sdt): disk space caching is enabled
[  617.158965] BTRFS info (device sdt): has skinny extents
[  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
0, flush 0, corrupt 473, gen 0
[  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
rd 18765, flush 178, corrupt 5841, gen 0
[  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
rd 2640, flush 178, corrupt 1066, gen 0
[  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
[  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
[  631.376038] BTRFS error (device sdt): failed to read block groups: -5
[  631.422811] BTRFS error (device sdt): open_ctree failed

$ uname -r
5.9.14-arch1-1
$ btrfs --version
btrfs-progs v5.9
$ btrfs check /dev/sdt
Opening filesystem to check...
checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
checksum verify failed on 21057101103104 found 0000009C wanted 00000075
checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
Csum didn't match
ERROR: failed to read block groups: Input/output error
ERROR: cannot open file system

$ btrfs filesystem show
Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
Total devices 6 FS bytes used 4.69TiB
devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
devid    4 size 2.73TiB used 1.70TiB path /dev/sds
devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
devid    6 size 2.73TiB used 1.69TiB path /dev/sdc

My guess is that some drives dropped out while kernel was still
writing to rest thus causing inconsistency.
There should be some way to find out which drives has the most
up-to-date info and assume those are correct.
I tried to mount with
$ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
but that didn't make any difference

So any idea how to fix this filesystem?

Thanks!

Best regards,
Dāvis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-01-13 23:09 ERROR: failed to read block groups: Input/output error Dāvis Mosāns
@ 2021-01-13 23:39 ` Dāvis Mosāns
  2021-02-19  3:03   ` Dāvis Mosāns
  2021-02-19 19:29 ` Zygo Blaxell
  1 sibling, 1 reply; 10+ messages in thread
From: Dāvis Mosāns @ 2021-01-13 23:39 UTC (permalink / raw)
  To: Btrfs BTRFS

>
> Hi,
>
> I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> caused some corruption.
> When I try to mount it I get
> $ mount /dev/sdt /mnt
> mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> missing codepage or helper program, or other error
> $ dmesg | tail -n 9
> [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> [  617.158965] BTRFS info (device sdt): has skinny extents
> [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> 0, flush 0, corrupt 473, gen 0
> [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> rd 18765, flush 178, corrupt 5841, gen 0
> [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> rd 2640, flush 178, corrupt 1066, gen 0
> [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
> [  631.376038] BTRFS error (device sdt): failed to read block groups: -5
> [  631.422811] BTRFS error (device sdt): open_ctree failed
>
> $ uname -r
> 5.9.14-arch1-1
> $ btrfs --version
> btrfs-progs v5.9
> $ btrfs check /dev/sdt
> Opening filesystem to check...
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> Csum didn't match
> ERROR: failed to read block groups: Input/output error
> ERROR: cannot open file system
>
> $ btrfs filesystem show
> Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> Total devices 6 FS bytes used 4.69TiB
> devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
> devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
> devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
> devid    4 size 2.73TiB used 1.70TiB path /dev/sds
> devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
> devid    6 size 2.73TiB used 1.69TiB path /dev/sdc
>
>
> My guess is that some drives dropped out while kernel was still
> writing to rest thus causing inconsistency.
> There should be some way to find out which drives has the most
> up-to-date info and assume those are correct.
> I tried to mount with
> $ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
> but that didn't make any difference
>
> So any idea how to fix this filesystem?
>
> Thanks!
>
> Best regards,
> Dāvis

By the way

$ btrfs-find-root /dev/sdt
ERROR: failed to read block groups: Input/output error
Superblock thinks the generation is 2262739
Superblock thinks the level is 1
Found tree root at 21057011679232 gen 2262739 level 1
Well block 21056933724160(gen: 2262738 level: 1) seems good, but
generation/level doesn't match, want gen: 2262739 level: 1
Well block 21056867319808(gen: 2262737 level: 1) seems good, but
generation/level doesn't match, want gen: 2262739 level: 1
Well block 21056855900160(gen: 2262736 level: 1) seems good, but
generation/level doesn't match, want gen: 2262739 level: 1
Well block 21056850739200(gen: 2120504 level: 0) seems good, but
generation/level doesn't match, want gen: 2262739 level: 1

$ btrfs restore -l /dev/sdt
tree key (EXTENT_TREE ROOT_ITEM 0) 21057008975872 level 3
tree key (DEV_TREE ROOT_ITEM 0) 21056861863936 level 1
tree key (FS_TREE ROOT_ITEM 0) 21063463993344 level 1
tree key (CSUM_TREE ROOT_ITEM 0) 21057010728960 level 3
tree key (UUID_TREE ROOT_ITEM 0) 21061425545216 level 0
tree key (262 ROOT_ITEM 0) 21063533002752 level 0
tree key (263 ROOT_ITEM 0) 21058890629120 level 2
tree key (418 ROOT_ITEM 0) 21057902198784 level 2
tree key (421 ROOT_ITEM 0) 21060222500864 level 2
tree key (427 ROOT_ITEM 0) 21061262114816 level 2
tree key (428 ROOT_ITEM 0) 21061278040064 level 2
tree key (440 ROOT_ITEM 0) 21061362417664 level 2
tree key (451 ROOT_ITEM 0) 21061017174016 level 2
tree key (454 ROOT_ITEM 0) 21559581114368 level 1
tree key (455 ROOT_ITEM 0) 21079314776064 level 1
tree key (456 ROOT_ITEM 0) 21058026831872 level 2
tree key (457 ROOT_ITEM 0) 21060907909120 level 3
tree key (497 ROOT_ITEM 0) 21058120990720 level 2
tree key (571 ROOT_ITEM 0) 21058195668992 level 2
tree key (599 ROOT_ITEM 0) 21058818015232 level 2
tree key (635 ROOT_ITEM 0) 21056973766656 level 2
tree key (638 ROOT_ITEM 0) 21061023072256 level 0
tree key (676 ROOT_ITEM 0) 21061314330624 level 2
tree key (3937 ROOT_ITEM 0) 21061408686080 level 0
tree key (3938 ROOT_ITEM 0) 21079315841024 level 1
tree key (3957 ROOT_ITEM 0) 21061419139072 level 2
tree key (6128 ROOT_ITEM 0) 21061400018944 level 1
tree key (8575 ROOT_ITEM 0) 21061023055872 level 0
tree key (18949 ROOT_ITEM 1728623) 21080421875712 level 1
tree key (18950 ROOT_ITEM 1728624) 21080424726528 level 2
tree key (18951 ROOT_ITEM 1728625) 21080424824832 level 2
tree key (18952 ROOT_ITEM 1728626) 21080426004480 level 3
tree key (18953 ROOT_ITEM 1728627) 21080422105088 level 2
tree key (18954 ROOT_ITEM 1728628) 21080424497152 level 2
tree key (18955 ROOT_ITEM 1728629) 21080426332160 level 2
tree key (18956 ROOT_ITEM 1728631) 21080423645184 level 2
tree key (18957 ROOT_ITEM 1728632) 21080425316352 level 2
tree key (18958 ROOT_ITEM 1728633) 21080423972864 level 2
tree key (18959 ROOT_ITEM 1728634) 21080422400000 level 2
tree key (18960 ROOT_ITEM 1728635) 21080422662144 level 2
tree key (18961 ROOT_ITEM 1728636) 21080423153664 level 2
tree key (18962 ROOT_ITEM 1728637) 21080425414656 level 2
tree key (18963 ROOT_ITEM 1728638) 21080421171200 level 1
tree key (18964 ROOT_ITEM 1728639) 21080423481344 level 2
tree key (19721 ROOT_ITEM 0) 21076937326592 level 2
checksum verify failed on 21057125580800 found 00000026 wanted 00000035
checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
checksum verify failed on 21057108082688 found 000000ED wanted FFFFFFC5
checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
Csum didn't match

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-01-13 23:39 ` Dāvis Mosāns
@ 2021-02-19  3:03   ` Dāvis Mosāns
  2021-02-19  5:16     ` Chris Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Dāvis Mosāns @ 2021-02-19  3:03 UTC (permalink / raw)
  To: Btrfs BTRFS

ceturtd., 2021. g. 14. janv., plkst. 01:39 — lietotājs Dāvis Mosāns
(<davispuh@gmail.com>) rakstīja:
>
> >
> > Hi,
> >
> > I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> > caused some corruption.
> > When I try to mount it I get
> > $ mount /dev/sdt /mnt
> > mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> > missing codepage or helper program, or other error
> > $ dmesg | tail -n 9
> > [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> > [  617.158965] BTRFS info (device sdt): has skinny extents
> > [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> > 0, flush 0, corrupt 473, gen 0
> > [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> > rd 18765, flush 178, corrupt 5841, gen 0
> > [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> > rd 2640, flush 178, corrupt 1066, gen 0
> > [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> > on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> > [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> > on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
> > [  631.376038] BTRFS error (device sdt): failed to read block groups: -5
> > [  631.422811] BTRFS error (device sdt): open_ctree failed
> >
> > $ uname -r
> > 5.9.14-arch1-1
> > $ btrfs --version
> > btrfs-progs v5.9
> > $ btrfs check /dev/sdt
> > Opening filesystem to check...
> > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > Csum didn't match
> > ERROR: failed to read block groups: Input/output error
> > ERROR: cannot open file system
> >
> > $ btrfs filesystem show
> > Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> > Total devices 6 FS bytes used 4.69TiB
> > devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
> > devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
> > devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
> > devid    4 size 2.73TiB used 1.70TiB path /dev/sds
> > devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
> > devid    6 size 2.73TiB used 1.69TiB path /dev/sdc
> >
> >
> > My guess is that some drives dropped out while kernel was still
> > writing to rest thus causing inconsistency.
> > There should be some way to find out which drives has the most
> > up-to-date info and assume those are correct.
> > I tried to mount with
> > $ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
> > but that didn't make any difference
> >
> > So any idea how to fix this filesystem?
> >
> > Thanks!
> >
> > Best regards,
> > Dāvis
>
> By the way
>
> $ btrfs-find-root /dev/sdt
> ERROR: failed to read block groups: Input/output error
> Superblock thinks the generation is 2262739
> Superblock thinks the level is 1
> Found tree root at 21057011679232 gen 2262739 level 1
> Well block 21056933724160(gen: 2262738 level: 1) seems good, but
> generation/level doesn't match, want gen: 2262739 level: 1
> Well block 21056867319808(gen: 2262737 level: 1) seems good, but
> generation/level doesn't match, want gen: 2262739 level: 1
> Well block 21056855900160(gen: 2262736 level: 1) seems good, but
> generation/level doesn't match, want gen: 2262739 level: 1
> Well block 21056850739200(gen: 2120504 level: 0) seems good, but
> generation/level doesn't match, want gen: 2262739 level: 1
>
> $ btrfs restore -l /dev/sdt
> tree key (EXTENT_TREE ROOT_ITEM 0) 21057008975872 level 3
> tree key (DEV_TREE ROOT_ITEM 0) 21056861863936 level 1
> tree key (FS_TREE ROOT_ITEM 0) 21063463993344 level 1
> tree key (CSUM_TREE ROOT_ITEM 0) 21057010728960 level 3
> tree key (UUID_TREE ROOT_ITEM 0) 21061425545216 level 0
> tree key (262 ROOT_ITEM 0) 21063533002752 level 0
> tree key (263 ROOT_ITEM 0) 21058890629120 level 2
> tree key (418 ROOT_ITEM 0) 21057902198784 level 2
> tree key (421 ROOT_ITEM 0) 21060222500864 level 2
> tree key (427 ROOT_ITEM 0) 21061262114816 level 2
> tree key (428 ROOT_ITEM 0) 21061278040064 level 2
> tree key (440 ROOT_ITEM 0) 21061362417664 level 2
> tree key (451 ROOT_ITEM 0) 21061017174016 level 2
> tree key (454 ROOT_ITEM 0) 21559581114368 level 1
> tree key (455 ROOT_ITEM 0) 21079314776064 level 1
> tree key (456 ROOT_ITEM 0) 21058026831872 level 2
> tree key (457 ROOT_ITEM 0) 21060907909120 level 3
> tree key (497 ROOT_ITEM 0) 21058120990720 level 2
> tree key (571 ROOT_ITEM 0) 21058195668992 level 2
> tree key (599 ROOT_ITEM 0) 21058818015232 level 2
> tree key (635 ROOT_ITEM 0) 21056973766656 level 2
> tree key (638 ROOT_ITEM 0) 21061023072256 level 0
> tree key (676 ROOT_ITEM 0) 21061314330624 level 2
> tree key (3937 ROOT_ITEM 0) 21061408686080 level 0
> tree key (3938 ROOT_ITEM 0) 21079315841024 level 1
> tree key (3957 ROOT_ITEM 0) 21061419139072 level 2
> tree key (6128 ROOT_ITEM 0) 21061400018944 level 1
> tree key (8575 ROOT_ITEM 0) 21061023055872 level 0
> tree key (18949 ROOT_ITEM 1728623) 21080421875712 level 1
> tree key (18950 ROOT_ITEM 1728624) 21080424726528 level 2
> tree key (18951 ROOT_ITEM 1728625) 21080424824832 level 2
> tree key (18952 ROOT_ITEM 1728626) 21080426004480 level 3
> tree key (18953 ROOT_ITEM 1728627) 21080422105088 level 2
> tree key (18954 ROOT_ITEM 1728628) 21080424497152 level 2
> tree key (18955 ROOT_ITEM 1728629) 21080426332160 level 2
> tree key (18956 ROOT_ITEM 1728631) 21080423645184 level 2
> tree key (18957 ROOT_ITEM 1728632) 21080425316352 level 2
> tree key (18958 ROOT_ITEM 1728633) 21080423972864 level 2
> tree key (18959 ROOT_ITEM 1728634) 21080422400000 level 2
> tree key (18960 ROOT_ITEM 1728635) 21080422662144 level 2
> tree key (18961 ROOT_ITEM 1728636) 21080423153664 level 2
> tree key (18962 ROOT_ITEM 1728637) 21080425414656 level 2
> tree key (18963 ROOT_ITEM 1728638) 21080421171200 level 1
> tree key (18964 ROOT_ITEM 1728639) 21080423481344 level 2
> tree key (19721 ROOT_ITEM 0) 21076937326592 level 2
> checksum verify failed on 21057125580800 found 00000026 wanted 00000035
> checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
> checksum verify failed on 21057108082688 found 000000ED wanted FFFFFFC5
> checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
> Csum didn't match

From what I understand it seems that some EXTENT_ITEM is corrupted and
when mount tries to read block groups it encounters csum mismatch for
it and immediatly aborts.
Is there some tool I could use to check this EXTENT_ITEM and see if it
can be fixed or maybe just removed?
Basically I guess I need to find physical location on disk from this
block number.
Also I think ignoring csum for btrfs inspect would be useful.

$ btrfs inspect dump-tree -b 21057050689536 /dev/sda
btrfs-progs v5.10.1
node 21057050689536 level 1 items 281 free space 212 generation
2262739 owner EXTENT_TREE
node 21057050689536 flags 0x1(WRITTEN) backref revision 1
fs uuid 8aef11a9-beb6-49ea-9b2d-7876611a39e5
chunk uuid 4ffec48c-28ed-419d-ba87-229c0adb2ab9
[...]
key (19264654909440 EXTENT_ITEM 524288) block 21057101103104 gen 2262739
[...]


$ btrfs inspect dump-tree -b 21057101103104 /dev/sda
btrfs-progs v5.10.1
checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
checksum verify failed on 21057101103104 found 0000009C wanted 00000075
checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
Csum didn't match
ERROR: failed to read tree block 21057101103104


Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-19  3:03   ` Dāvis Mosāns
@ 2021-02-19  5:16     ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-02-19  5:16 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: Btrfs BTRFS

On Thu, Feb 18, 2021 at 8:08 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
>
> ceturtd., 2021. g. 14. janv., plkst. 01:39 — lietotājs Dāvis Mosāns
> (<davispuh@gmail.com>) rakstīja:
> >
> > >
> > > Hi,
> > >
> > > I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> > > caused some corruption.
> > > When I try to mount it I get
> > > $ mount /dev/sdt /mnt
> > > mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> > > missing codepage or helper program, or other error
> > > $ dmesg | tail -n 9
> > > [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> > > [  617.158965] BTRFS info (device sdt): has skinny extents
> > > [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> > > 0, flush 0, corrupt 473, gen 0
> > > [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> > > rd 18765, flush 178, corrupt 5841, gen 0
> > > [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> > > rd 2640, flush 178, corrupt 1066, gen 0
> > > [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> > > on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> > > [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> > > on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
> > > [  631.376038] BTRFS error (device sdt): failed to read block groups: -5
> > > [  631.422811] BTRFS error (device sdt): open_ctree failed
> > >
> > > $ uname -r
> > > 5.9.14-arch1-1
> > > $ btrfs --version
> > > btrfs-progs v5.9
> > > $ btrfs check /dev/sdt
> > > Opening filesystem to check...
> > > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > > checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> > > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > > Csum didn't match
> > > ERROR: failed to read block groups: Input/output error
> > > ERROR: cannot open file system
> > >
> > > $ btrfs filesystem show
> > > Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> > > Total devices 6 FS bytes used 4.69TiB
> > > devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
> > > devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
> > > devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
> > > devid    4 size 2.73TiB used 1.70TiB path /dev/sds
> > > devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
> > > devid    6 size 2.73TiB used 1.69TiB path /dev/sdc
> > >
> > >
> > > My guess is that some drives dropped out while kernel was still
> > > writing to rest thus causing inconsistency.
> > > There should be some way to find out which drives has the most
> > > up-to-date info and assume those are correct.
> > > I tried to mount with
> > > $ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
> > > but that didn't make any difference
> > >
> > > So any idea how to fix this filesystem?
> > >
> > > Thanks!
> > >
> > > Best regards,
> > > Dāvis
> >
> > By the way
> >
> > $ btrfs-find-root /dev/sdt
> > ERROR: failed to read block groups: Input/output error
> > Superblock thinks the generation is 2262739
> > Superblock thinks the level is 1
> > Found tree root at 21057011679232 gen 2262739 level 1
> > Well block 21056933724160(gen: 2262738 level: 1) seems good, but
> > generation/level doesn't match, want gen: 2262739 level: 1
> > Well block 21056867319808(gen: 2262737 level: 1) seems good, but
> > generation/level doesn't match, want gen: 2262739 level: 1
> > Well block 21056855900160(gen: 2262736 level: 1) seems good, but
> > generation/level doesn't match, want gen: 2262739 level: 1
> > Well block 21056850739200(gen: 2120504 level: 0) seems good, but
> > generation/level doesn't match, want gen: 2262739 level: 1
> >
> > $ btrfs restore -l /dev/sdt
> > tree key (EXTENT_TREE ROOT_ITEM 0) 21057008975872 level 3
> > tree key (DEV_TREE ROOT_ITEM 0) 21056861863936 level 1
> > tree key (FS_TREE ROOT_ITEM 0) 21063463993344 level 1
> > tree key (CSUM_TREE ROOT_ITEM 0) 21057010728960 level 3
> > tree key (UUID_TREE ROOT_ITEM 0) 21061425545216 level 0
> > tree key (262 ROOT_ITEM 0) 21063533002752 level 0
> > tree key (263 ROOT_ITEM 0) 21058890629120 level 2
> > tree key (418 ROOT_ITEM 0) 21057902198784 level 2
> > tree key (421 ROOT_ITEM 0) 21060222500864 level 2
> > tree key (427 ROOT_ITEM 0) 21061262114816 level 2
> > tree key (428 ROOT_ITEM 0) 21061278040064 level 2
> > tree key (440 ROOT_ITEM 0) 21061362417664 level 2
> > tree key (451 ROOT_ITEM 0) 21061017174016 level 2
> > tree key (454 ROOT_ITEM 0) 21559581114368 level 1
> > tree key (455 ROOT_ITEM 0) 21079314776064 level 1
> > tree key (456 ROOT_ITEM 0) 21058026831872 level 2
> > tree key (457 ROOT_ITEM 0) 21060907909120 level 3
> > tree key (497 ROOT_ITEM 0) 21058120990720 level 2
> > tree key (571 ROOT_ITEM 0) 21058195668992 level 2
> > tree key (599 ROOT_ITEM 0) 21058818015232 level 2
> > tree key (635 ROOT_ITEM 0) 21056973766656 level 2
> > tree key (638 ROOT_ITEM 0) 21061023072256 level 0
> > tree key (676 ROOT_ITEM 0) 21061314330624 level 2
> > tree key (3937 ROOT_ITEM 0) 21061408686080 level 0
> > tree key (3938 ROOT_ITEM 0) 21079315841024 level 1
> > tree key (3957 ROOT_ITEM 0) 21061419139072 level 2
> > tree key (6128 ROOT_ITEM 0) 21061400018944 level 1
> > tree key (8575 ROOT_ITEM 0) 21061023055872 level 0
> > tree key (18949 ROOT_ITEM 1728623) 21080421875712 level 1
> > tree key (18950 ROOT_ITEM 1728624) 21080424726528 level 2
> > tree key (18951 ROOT_ITEM 1728625) 21080424824832 level 2
> > tree key (18952 ROOT_ITEM 1728626) 21080426004480 level 3
> > tree key (18953 ROOT_ITEM 1728627) 21080422105088 level 2
> > tree key (18954 ROOT_ITEM 1728628) 21080424497152 level 2
> > tree key (18955 ROOT_ITEM 1728629) 21080426332160 level 2
> > tree key (18956 ROOT_ITEM 1728631) 21080423645184 level 2
> > tree key (18957 ROOT_ITEM 1728632) 21080425316352 level 2
> > tree key (18958 ROOT_ITEM 1728633) 21080423972864 level 2
> > tree key (18959 ROOT_ITEM 1728634) 21080422400000 level 2
> > tree key (18960 ROOT_ITEM 1728635) 21080422662144 level 2
> > tree key (18961 ROOT_ITEM 1728636) 21080423153664 level 2
> > tree key (18962 ROOT_ITEM 1728637) 21080425414656 level 2
> > tree key (18963 ROOT_ITEM 1728638) 21080421171200 level 1
> > tree key (18964 ROOT_ITEM 1728639) 21080423481344 level 2
> > tree key (19721 ROOT_ITEM 0) 21076937326592 level 2
> > checksum verify failed on 21057125580800 found 00000026 wanted 00000035
> > checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
> > checksum verify failed on 21057108082688 found 000000ED wanted FFFFFFC5
> > checksum verify failed on 21057108082688 found 00000074 wanted FFFFFFC5
> > Csum didn't match
>
> From what I understand it seems that some EXTENT_ITEM is corrupted and
> when mount tries to read block groups it encounters csum mismatch for
> it and immediatly aborts.
> Is there some tool I could use to check this EXTENT_ITEM and see if it
> can be fixed or maybe just removed?
> Basically I guess I need to find physical location on disk from this
> block number.
> Also I think ignoring csum for btrfs inspect would be useful.
>
> $ btrfs inspect dump-tree -b 21057050689536 /dev/sda
> btrfs-progs v5.10.1
> node 21057050689536 level 1 items 281 free space 212 generation
> 2262739 owner EXTENT_TREE
> node 21057050689536 flags 0x1(WRITTEN) backref revision 1
> fs uuid 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> chunk uuid 4ffec48c-28ed-419d-ba87-229c0adb2ab9
> [...]
> key (19264654909440 EXTENT_ITEM 524288) block 21057101103104 gen 2262739
> [...]
>
>
> $ btrfs inspect dump-tree -b 21057101103104 /dev/sda
> btrfs-progs v5.10.1
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> Csum didn't match
> ERROR: failed to read tree block 21057101103104
>
>
> Thanks!

What do you get for

btrfs rescue super -v /dev/

btrfs check -b /dev/

You might try kernel 5.11 which has a new mount option that will skip
bad roots and csums. It's 'mount -o ro,rescue=all' and while it won't
let you fix it, in the off chance it mounts, it'll let you get data
out before trying to repair the file system, which sometimes makes
things worse.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-01-13 23:09 ERROR: failed to read block groups: Input/output error Dāvis Mosāns
  2021-01-13 23:39 ` Dāvis Mosāns
@ 2021-02-19 19:29 ` Zygo Blaxell
  2021-02-20 23:45   ` Dāvis Mosāns
  1 sibling, 1 reply; 10+ messages in thread
From: Zygo Blaxell @ 2021-02-19 19:29 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: Btrfs BTRFS

On Thu, Jan 14, 2021 at 01:09:40AM +0200, Dāvis Mosāns wrote:
> Hi,
> 
> I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> caused some corruption.
> When I try to mount it I get
> $ mount /dev/sdt /mnt
> mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> missing codepage or helper program, or other error
> $ dmesg | tail -n 9
> [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> [  617.158965] BTRFS info (device sdt): has skinny extents
> [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> 0, flush 0, corrupt 473, gen 0
> [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> rd 18765, flush 178, corrupt 5841, gen 0
> [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> rd 2640, flush 178, corrupt 1066, gen 0

You have write errors on 2 disks, read errors on 3 disks, and raid1
tolerates only 1 disk failure, so successful recovery is unlikely.

> [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0

Both copies of this metadata block are corrupted, differently.

This is consistent with some kinds of HBA failure:  every outgoing block
from the host is potentially corrupted, usually silently.  Due to the HBA
failure, there is no indication of failure available to the filesystem
until after several corrupt blocks are written to disk.  By the time
failure is detected, damage is extensive, especially for metadata where
overwrites are frequent.

This is failure mode that you need backups to recover from (or mirror
disks on separate, non-failing HBA hardware).

> [  631.376038] BTRFS error (device sdt): failed to read block groups: -5
> [  631.422811] BTRFS error (device sdt): open_ctree failed
> 
> $ uname -r
> 5.9.14-arch1-1
> $ btrfs --version
> btrfs-progs v5.9
> $ btrfs check /dev/sdt
> Opening filesystem to check...
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> Csum didn't match
> ERROR: failed to read block groups: Input/output error
> ERROR: cannot open file system
> 
> $ btrfs filesystem show
> Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> Total devices 6 FS bytes used 4.69TiB
> devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
> devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
> devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
> devid    4 size 2.73TiB used 1.70TiB path /dev/sds
> devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
> devid    6 size 2.73TiB used 1.69TiB path /dev/sdc
> 
> 
> My guess is that some drives dropped out while kernel was still
> writing to rest thus causing inconsistency.
> There should be some way to find out which drives has the most
> up-to-date info and assume those are correct.

Neither available copy is correct, so the kernel's self-healing mechanism
doesn't work.  Thousands of pages are damaged, possibly only with minor
errors, but multiply a minor error by a thousand and it's no longer minor.

At this point it is a forensic recovery exercise.

> I tried to mount with
> $ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
> but that didn't make any difference
> 
> So any idea how to fix this filesystem?

Before you can mount the filesystem read-write again, you would need to
rebuild the extent tree from the surviving pages of the subvol trees.
All other metadata pages on the filesystem must be scanned, any excess
reference items must be deleted, and any missing reference items must
be inserted.  Once the metadata references are correct, btrfs can
rebuild the free space maps, and then you can scrub and delete/replace
any damaged data files.

'btrfs check --repair' might work if only a handful of blocks are
corrupted (it takes a few short cuts and can repair minor damage)
but according to your dev stats you have thousands of corrupted blocks,
so the filesystem is probably beyond the capabilities of this tool.

'btrfs check --repair --init-extent-tree' is a brute-force operation that
will more or less rebuild the entire filesystem by scraping metadata
leaf pages off the disks.  This is your only hope here, and it's not a
good one.

Both methods are likely to fail in the presence of so much corruption
and they may take so long to run that mkfs + restore from backups could
be significantly faster.  Definitely extract any data from the filesystem
that you want to keep _before_ attempting any of these operations.

It might be possible to recover by manually inspecting the corrupted
metadata blocks and making guesses and adjustments, but that could take
even longer than check --repair if there are thousands of damaged pages.

> Thanks!
> 
> Best regards,
> Dāvis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-19 19:29 ` Zygo Blaxell
@ 2021-02-20 23:45   ` Dāvis Mosāns
  2021-02-21  1:03     ` Dāvis Mosāns
  2021-02-22  5:22     ` Zygo Blaxell
  0 siblings, 2 replies; 10+ messages in thread
From: Dāvis Mosāns @ 2021-02-20 23:45 UTC (permalink / raw)
  To: Zygo Blaxell, Chris Murphy; +Cc: Btrfs BTRFS

piektd., 2021. g. 19. febr., plkst. 07:16 — lietotājs Chris Murphy
(<lists@colorremedies.com>) rakstīja:
> [...]
> What do you get for
>
> btrfs rescue super -v /dev/
>

That seems to be all good

$ btrfs rescue super -v /dev/sda
All Devices:
Device: id = 2, name = /dev/sdt
Device: id = 4, name = /dev/sdj
Device: id = 3, name = /dev/sdg
Device: id = 6, name = /dev/sdb
Device: id = 1, name = /dev/sdl
Device: id = 5, name = /dev/sda

Before Recovering:
[All good supers]:
device name = /dev/sdt
superblock bytenr = 65536

device name = /dev/sdt
superblock bytenr = 67108864

device name = /dev/sdt
superblock bytenr = 274877906944

device name = /dev/sdj
superblock bytenr = 65536

device name = /dev/sdj
superblock bytenr = 67108864

device name = /dev/sdj
superblock bytenr = 274877906944

device name = /dev/sdg
superblock bytenr = 65536

device name = /dev/sdg
superblock bytenr = 67108864

device name = /dev/sdg
superblock bytenr = 274877906944

device name = /dev/sdb
superblock bytenr = 65536

device name = /dev/sdb
superblock bytenr = 67108864

device name = /dev/sdb
superblock bytenr = 274877906944

device name = /dev/sdl
superblock bytenr = 65536

device name = /dev/sdl
superblock bytenr = 67108864

device name = /dev/sdl
superblock bytenr = 274877906944

device name = /dev/sda
superblock bytenr = 65536

device name = /dev/sda
superblock bytenr = 67108864

device name = /dev/sda
superblock bytenr = 274877906944

[All bad supers]:

All supers are valid, no need to recover


$ btrfs inspect dump-super -f /dev/sda
superblock: bytenr=65536, device=/dev/sda
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xf72e6634 [match]
bytenr                  65536
flags                   0x1
( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    8aef11a9-beb6-49ea-9b2d-7876611a39e5
metadata_uuid           8aef11a9-beb6-49ea-9b2d-7876611a39e5
label                   RAID
generation              2262739
root                    21057011679232
sys_array_size          129
chunk_root_generation   2205349
root_level              1
chunk_root              21056798736384
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             18003557892096
bytes_used              5154539671552
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             6
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x36b
( MIXED_BACKREF |
DEFAULT_SUBVOL |
COMPRESS_LZO |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA |
NO_HOLES )
cache_generation        2262739
uuid_tree_generation    1807368
dev_item.uuid           098e5987-adf9-4a37-aad0-dff0819c6588
dev_item.fsid           8aef11a9-beb6-49ea-9b2d-7876611a39e5 [match]
dev_item.type           0
dev_item.total_bytes    3000592982016
dev_item.bytes_used     1860828135424
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          5
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 21056797999104)
length 33554432 owner 2 stripe_len 65536 type SYSTEM|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 1
stripe 0 devid 5 offset 4329570304
dev_uuid 098e5987-adf9-4a37-aad0-dff0819c6588
stripe 1 devid 6 offset 4329570304
dev_uuid 7036ea10-4dce-48c6-b6d5-66378ba54b03
backup_roots[4]:
backup 0:
backup_tree_root:       21056867319808  gen: 2262737    level: 1
backup_chunk_root:      21056798736384  gen: 2205349    level: 1
backup_extent_root:     21056867106816  gen: 2262737    level: 3
backup_fs_root:         21063463993344  gen: 2095377    level: 1
backup_dev_root:        21056861863936  gen: 2262736    level: 1
backup_csum_root:       21056868122624  gen: 2262737    level: 3
backup_total_bytes:     18003557892096
backup_bytes_used:      5154539933696
backup_num_devices:     6

backup 1:
backup_tree_root:       21056933724160  gen: 2262738    level: 1
backup_chunk_root:      21056798736384  gen: 2205349    level: 1
backup_extent_root:     21056867762176  gen: 2262738    level: 3
backup_fs_root:         21063463993344  gen: 2095377    level: 1
backup_dev_root:        21056861863936  gen: 2262736    level: 1
backup_csum_root:       21056944685056  gen: 2262738    level: 3
backup_total_bytes:     18003557892096
backup_bytes_used:      5154548318208
backup_num_devices:     6

backup 2:
backup_tree_root:       21057011679232  gen: 2262739    level: 1
backup_chunk_root:      21056798736384  gen: 2205349    level: 1
backup_extent_root:     21057133690880  gen: 2262740    level: 3
backup_fs_root:         21063463993344  gen: 2095377    level: 1
backup_dev_root:        21056861863936  gen: 2262736    level: 1
backup_csum_root:       21057139916800  gen: 2262740    level: 3
backup_total_bytes:     18003557892096
backup_bytes_used:      5154540572672
backup_num_devices:     6

backup 3:
backup_tree_root:       21056855900160  gen: 2262736    level: 1
backup_chunk_root:      21056798736384  gen: 2205349    level: 1
backup_extent_root:     21056854228992  gen: 2262736    level: 3
backup_fs_root:         21063463993344  gen: 2095377    level: 1
backup_dev_root:        21056861863936  gen: 2262736    level: 1
backup_csum_root:       21056857341952  gen: 2262736    level: 3
backup_total_bytes:     18003557892096
backup_bytes_used:      5154539933696
backup_num_devices:     6


> btrfs check -b /dev/
>

This gives lots of errors and not sure if main superblock can be fixed
with less errors.

$ btrfs check -b /dev/sda
Opening filesystem to check...
Checking filesystem on /dev/sda
UUID: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 20733568155648 has wrong amount of free space, free space
cache has 696320 block group has 729088
failed to load free space cache for block group 20733568155648
[4/7] checking fs roots
root 457 inode 682022 errors 1040, bad file extent, some csum missing
root 457 inode 2438260 errors 1040, bad file extent, some csum missing
root 599 inode 228661 errors 1040, bad file extent, some csum missing
root 18950 inode 2298187 errors 1040, bad file extent, some csum missing

[...] 21068 entries like: root 18950 inode X errors 1040, bad file
extent, some csum missing

root 18950 inode 13845002 errors 1040, bad file extent, some csum missing
root 18952 inode 682022 errors 1040, bad file extent, some csum missing
root 18952 inode 2438161 errors 1040, bad file extent, some csum missing
root 18952 inode 2438162 errors 1040, bad file extent, some csum missing
root 18952 inode 2438166 errors 1040, bad file extent, some csum missing
root 18952 inode 2438167 errors 1040, bad file extent, some csum missing
root 18952 inode 2438170 errors 1040, bad file extent, some csum missing
root 18952 inode 2438187 errors 1040, bad file extent, some csum missing
root 18952 inode 2438260 errors 1040, bad file extent, some csum missing
root 18955 inode 228661 errors 1040, bad file extent, some csum missing
root 28874 inode 682022 errors 1040, bad file extent, some csum missing
root 28874 inode 2438162 errors 1040, bad file extent, some csum missing
root 28874 inode 2438187 errors 1040, bad file extent, some csum missing
root 28874 inode 2438260 errors 1040, bad file extent, some csum missing
root 28877 inode 228661 errors 1040, bad file extent, some csum missing
root 29405 inode 682022 errors 1040, bad file extent, some csum missing
root 29405 inode 2438260 errors 1040, bad file extent, some csum missing
root 29408 inode 228661 errors 1040, bad file extent, some csum missing
root 29581 inode 682022 errors 1040, bad file extent, some csum missing
root 29581 inode 2438260 errors 1040, bad file extent, some csum missing
root 29584 inode 228661 errors 1040, bad file extent, some csum missing
root 29597 inode 682022 errors 1040, bad file extent, some csum missing
root 29597 inode 2438260 errors 1040, bad file extent, some csum missing
root 29600 inode 228661 errors 1040, bad file extent, some csum missing
root 29613 inode 682022 errors 1040, bad file extent, some csum missing
root 29613 inode 2438260 errors 1040, bad file extent, some csum missing
root 29616 inode 228661 errors 1040, bad file extent, some csum missing
root 29629 inode 682022 errors 1040, bad file extent, some csum missing
root 29629 inode 2438260 errors 1040, bad file extent, some csum missing
root 29632 inode 228661 errors 1040, bad file extent, some csum missing
root 29645 inode 682022 errors 1040, bad file extent, some csum missing
root 29645 inode 2438260 errors 1040, bad file extent, some csum missing
root 29648 inode 228661 errors 1040, bad file extent, some csum missing
root 29661 inode 682022 errors 1040, bad file extent, some csum missing
root 29661 inode 2438260 errors 1040, bad file extent, some csum missing
root 29664 inode 228661 errors 1040, bad file extent, some csum missing
ERROR: errors found in fs roots
found 5152420646912 bytes used, error(s) found
total csum bytes: 4748365652
total tree bytes: 22860578816
total fs tree bytes: 15688564736
total extent tree bytes: 1642725376
btree space waste bytes: 3881167880
file data blocks allocated: 24721653870592
referenced 7836810440704

I also tried
$ btrfs check -r 21056933724160 /dev/sda
but the output was exactly same so seems it doesn't really make difference.

So I think btrfs check -b --repair should be able to fix most of things.

> You might try kernel 5.11 which has a new mount option that will skip
> bad roots and csums. It's 'mount -o ro,rescue=all' and while it won't
> let you fix it, in the off chance it mounts, it'll let you get data
> out before trying to repair the file system, which sometimes makes
> things worse.
>
>

It doesn't make any difference, still doesn't mount
$ uname -r
5.11.0-arch2-1
$ sudo mount -o ro,rescue=all /dev/sda ./RAID
mount: /mnt/RAID: wrong fs type, bad option, bad superblock on
/dev/sda, missing codepage or helper program, or other error.

BTRFS warning (device sdl): sdl checksum verify failed on
21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
BTRFS warning (device sdl): sdl checksum verify failed on
21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
BTRFS error (device sdl): failed to read block groups: -5
BTRFS error (device sdl): open_ctree failed

It seems there should be a way to mount with backup tree root like I
did for check but strangly usebackuproot doesn't do that...

piektd., 2021. g. 19. febr., plkst. 21:29 — lietotājs Zygo Blaxell
(<ce3g8jdj@umail.furryterror.org>) rakstīja:
>
> On Thu, Jan 14, 2021 at 01:09:40AM +0200, Dāvis Mosāns wrote:
> > Hi,
> >
> > I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> > caused some corruption.
> > When I try to mount it I get
> > $ mount /dev/sdt /mnt
> > mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> > missing codepage or helper program, or other error
> > $ dmesg | tail -n 9
> > [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> > [  617.158965] BTRFS info (device sdt): has skinny extents
> > [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> > 0, flush 0, corrupt 473, gen 0
> > [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> > rd 18765, flush 178, corrupt 5841, gen 0
> > [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> > rd 2640, flush 178, corrupt 1066, gen 0
>
> You have write errors on 2 disks, read errors on 3 disks, and raid1
> tolerates only 1 disk failure, so successful recovery is unlikely.
>

Those wr/rd/corrupt error counts are inflated/misleading, in past when
some HDD drops out I've had them increase in huge numbers, but after
running scrub usually it was able to fix almost everything except like
few files that could be just deleted. Only now it's possible that it
failed while scrub was running making it a lot worse.

> > [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> > on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> > [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> > on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
>
> Both copies of this metadata block are corrupted, differently.
>
> This is consistent with some kinds of HBA failure:  every outgoing block
> from the host is potentially corrupted, usually silently.  Due to the HBA
> failure, there is no indication of failure available to the filesystem
> until after several corrupt blocks are written to disk.  By the time
> failure is detected, damage is extensive, especially for metadata where
> overwrites are frequent.
>

I don't think it's that bad here. My guess is that it failed while
updating extent tree and some part of it didn't got written to disk. I
want to check how it looks like on disk, is there some tool to map
block number to offset in disk?

> This is failure mode that you need backups to recover from (or mirror
> disks on separate, non-failing HBA hardware).
>

I don't know how btrfs decides in which disks it stores copies or if
it's always in same disk. To prevent such failure in future I could
split RAID1 across 2 different HBAs but it's not clear which disks
would need to be seperated.

> > [  631.376038] BTRFS error (device sdt): failed to read block groups: -5
> > [  631.422811] BTRFS error (device sdt): open_ctree failed
> >
> > $ uname -r
> > 5.9.14-arch1-1
> > $ btrfs --version
> > btrfs-progs v5.9
> > $ btrfs check /dev/sdt
> > Opening filesystem to check...
> > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > checksum verify failed on 21057101103104 found 0000009C wanted 00000075
> > checksum verify failed on 21057101103104 found 000000B9 wanted 00000075
> > Csum didn't match
> > ERROR: failed to read block groups: Input/output error
> > ERROR: cannot open file system
> >
> > $ btrfs filesystem show
> > Label: 'RAID'  uuid: 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> > Total devices 6 FS bytes used 4.69TiB
> > devid    1 size 2.73TiB used 1.71TiB path /dev/sdt
> > devid    2 size 2.73TiB used 1.70TiB path /dev/sdl
> > devid    3 size 2.73TiB used 1.71TiB path /dev/sdj
> > devid    4 size 2.73TiB used 1.70TiB path /dev/sds
> > devid    5 size 2.73TiB used 1.69TiB path /dev/sdg
> > devid    6 size 2.73TiB used 1.69TiB path /dev/sdc
> >
> >
> > My guess is that some drives dropped out while kernel was still
> > writing to rest thus causing inconsistency.
> > There should be some way to find out which drives has the most
> > up-to-date info and assume those are correct.
>
> Neither available copy is correct, so the kernel's self-healing mechanism
> doesn't work.  Thousands of pages are damaged, possibly only with minor
> errors, but multiply a minor error by a thousand and it's no longer minor.
>
> At this point it is a forensic recovery exercise.
>
> > I tried to mount with
> > $ mount -o ro,degraded,rescue=usebackuproot /dev/sdt /mnt
> > but that didn't make any difference
> >
> > So any idea how to fix this filesystem?
>
> Before you can mount the filesystem read-write again, you would need to
> rebuild the extent tree from the surviving pages of the subvol trees.
> All other metadata pages on the filesystem must be scanned, any excess
> reference items must be deleted, and any missing reference items must
> be inserted.  Once the metadata references are correct, btrfs can
> rebuild the free space maps, and then you can scrub and delete/replace
> any damaged data files.
>
> 'btrfs check --repair' might work if only a handful of blocks are
> corrupted (it takes a few short cuts and can repair minor damage)
> but according to your dev stats you have thousands of corrupted blocks,
> so the filesystem is probably beyond the capabilities of this tool.
>
> 'btrfs check --repair --init-extent-tree' is a brute-force operation that
> will more or less rebuild the entire filesystem by scraping metadata
> leaf pages off the disks.  This is your only hope here, and it's not a
> good one.
>

I don't want to use --init-extent-tree because I don't want to reset
everything, but only corrupted things. Also btrfs check --repair
doesn't work as it aborts too quickly, only using with -b flag could
fix it I think.

> Both methods are likely to fail in the presence of so much corruption
> and they may take so long to run that mkfs + restore from backups could
> be significantly faster.  Definitely extract any data from the filesystem
> that you want to keep _before_ attempting any of these operations.
>

It's not really about time, I rather want to reduce possilble data
loss as much as possible.
I can't mount it even read-only so it seems the only way to get data
out is by using btrfs restore which seems to work fine but does it
verify file checksums? It looks like it doesn't... I have some files
where it said:
We seem to be looping a lot on ./something, do you want to keep going
on ? (y/N/a)

When I checked this file I see that it's corrupted. Basically I want
restore only files with valid checksums and then have a list of
corrupted files. From currupted files there are few I want to see if
they can be recovered. I have lot of of snapshots but even the oldest
ones are corrupted in exactly same way - they're identical. It seems I
need to find previous copy of this file if it exsists at all... Any
idea how to find previous version of file?
I tried
$ btrfs restore -u 2 -t 21056933724160
with different superblocks/tree roots but they all give same corrupted file.
The file looks like this
$ hexdump -C file | head -n 5
00000000  27 47 10 00 6d 64 61 74  7b 7b 7b 7b 7b 7b 7b 7b  |'G..mdat{{{{{{{{|
00000010  7a 7a 79 79 7a 7a 7a 7a  7b 7b 7b 7c 7c 7c 7b 7b  |zzyyzzzz{{{|||{{|
00000020  7c 7c 7c 7b 7b 7b 7b 7b  7c 7c 7c 7c 7c 7b 7b 7b  ||||{{{{{|||||{{{|
00000030  7b 7b 7b 7b 7b 7b 7b 7b  7b 7a 7a 7a 7a 79 79 7a  |{{{{{{{{{zzzzyyz|
00000040  7b 7b 7b 7b 7a 7b 7b 7c  7b 7c 7c 7b 7c 7c 7b 7b  |{{{{z{{|{||{||{{|

Those repeated 7a/b/c is wrong data. Also I'm not sure if these files
have been corrupted now or more in past... So I need to check if
checksum matches.

> It might be possible to recover by manually inspecting the corrupted
> metadata blocks and making guesses and adjustments, but that could take
> even longer than check --repair if there are thousands of damaged pages.
>

I want to look into this but not sure if there are any tools with
which it would be easy to inspect data. dump-tree is nice but it
doesn't work when checksum is incorrect.

My current plan is:
1. btrfs restore only valid files, how? I don't want to mix good files
with corrupted ones together
2. look into how exactly extent tree is corrupted
3. try to see if few of corrupted files can be recovered in some way
4. do btrfs check -b --repair (maybe if extent tree can be fixed then
wouldn't need to use -b flag)
5. try to mount and btrfs scrub
6. maybe wipe and recreate new fielsystem

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-20 23:45   ` Dāvis Mosāns
@ 2021-02-21  1:03     ` Dāvis Mosāns
  2021-02-21  1:08       ` Qu Wenruo
  2021-02-22  5:22     ` Zygo Blaxell
  1 sibling, 1 reply; 10+ messages in thread
From: Dāvis Mosāns @ 2021-02-21  1:03 UTC (permalink / raw)
  To: Zygo Blaxell, Chris Murphy; +Cc: Btrfs BTRFS

I just found something really strange, it seems pointers for extent
tree and csum tree have somehow gotten swapped...

$ btrfs inspect dump-super -f /dev/sda | grep backup_extent_root
backup_extent_root:     21056867106816  gen: 2262737    level: 3
backup_extent_root:     21056867762176  gen: 2262738    level: 3
backup_extent_root:     21057133690880  gen: 2262740    level: 3 <<
points to CSUM_TREE
backup_extent_root:     21056854228992  gen: 2262736    level: 3

$ btrfs inspect dump-super -f /dev/sda | grep backup_csum_root
backup_csum_root:       21056868122624  gen: 2262737    level: 3
backup_csum_root:       21056944685056  gen: 2262738    level: 3
backup_csum_root:       21057139916800  gen: 2262740    level: 3 <<
points to EXTENT_TREE
backup_csum_root:       21056857341952  gen: 2262736    level: 3

$ btrfs inspect dump-tree -b 21057133690880 /dev/sda | head -n 2
btrfs-progs v5.10.1
node 21057133690880 level 1 items 316 free space 177 generation
2262698 owner CSUM_TREE

$ btrfs inspect dump-tree -b 21057139916800 /dev/sda | head -n 2
btrfs-progs v5.10.1
leaf 21057139916800 items 166 free space 6367 generation 2262696 owner
EXTENT_TREE


Previous gen is fine

$ btrfs inspect dump-tree -b 21056867762176 /dev/sda | head -n 2
btrfs-progs v5.10.1
node 21056867762176 level 3 items 2 free space 491 generation 2262738
owner EXTENT_TREE

$ btrfs inspect dump-tree -b 21056944685056 /dev/sda | head -n 2
btrfs-progs v5.10.1
node 21056944685056 level 3 items 5 free space 488 generation 2262738
owner CSUM_TREE

Also generation specified in backup root doesn't match with one in
block so seems like latest gen wasn't written to disk or something
like that.

In root tree there is different extent tree used than one specified in
backup root.
$ btrfs inspect dump-tree -b 21057011679232 /dev/sda | head -n 6
btrfs-progs v5.10.1
node 21057011679232 level 1 items 126 free space 367 generation
2262739 owner ROOT_TREE
node 21057011679232 flags 0x1(WRITTEN) backref revision 1
fs uuid 8aef11a9-beb6-49ea-9b2d-7876611a39e5
chunk uuid 4ffec48c-28ed-419d-ba87-229c0adb2ab9
key (EXTENT_TREE ROOT_ITEM 0) block 21057018363904 gen 2262739

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-21  1:03     ` Dāvis Mosāns
@ 2021-02-21  1:08       ` Qu Wenruo
  2021-02-21  2:21         ` Dāvis Mosāns
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2021-02-21  1:08 UTC (permalink / raw)
  To: Dāvis Mosāns, Zygo Blaxell, Chris Murphy; +Cc: Btrfs BTRFS



On 2021/2/21 上午9:03, Dāvis Mosāns wrote:
> I just found something really strange, it seems pointers for extent
> tree and csum tree have somehow gotten swapped...

Only the latest 2 backup roots are supposed to be correct, older ones
are no longer ensured to be correct.

This is not strange.

Thanks,
Qu
>
> $ btrfs inspect dump-super -f /dev/sda | grep backup_extent_root
> backup_extent_root:     21056867106816  gen: 2262737    level: 3
> backup_extent_root:     21056867762176  gen: 2262738    level: 3
> backup_extent_root:     21057133690880  gen: 2262740    level: 3 <<
> points to CSUM_TREE
> backup_extent_root:     21056854228992  gen: 2262736    level: 3
>
> $ btrfs inspect dump-super -f /dev/sda | grep backup_csum_root
> backup_csum_root:       21056868122624  gen: 2262737    level: 3
> backup_csum_root:       21056944685056  gen: 2262738    level: 3
> backup_csum_root:       21057139916800  gen: 2262740    level: 3 <<
> points to EXTENT_TREE
> backup_csum_root:       21056857341952  gen: 2262736    level: 3
>
> $ btrfs inspect dump-tree -b 21057133690880 /dev/sda | head -n 2
> btrfs-progs v5.10.1
> node 21057133690880 level 1 items 316 free space 177 generation
> 2262698 owner CSUM_TREE
>
> $ btrfs inspect dump-tree -b 21057139916800 /dev/sda | head -n 2
> btrfs-progs v5.10.1
> leaf 21057139916800 items 166 free space 6367 generation 2262696 owner
> EXTENT_TREE
>
>
> Previous gen is fine
>
> $ btrfs inspect dump-tree -b 21056867762176 /dev/sda | head -n 2
> btrfs-progs v5.10.1
> node 21056867762176 level 3 items 2 free space 491 generation 2262738
> owner EXTENT_TREE
>
> $ btrfs inspect dump-tree -b 21056944685056 /dev/sda | head -n 2
> btrfs-progs v5.10.1
> node 21056944685056 level 3 items 5 free space 488 generation 2262738
> owner CSUM_TREE
>
> Also generation specified in backup root doesn't match with one in
> block so seems like latest gen wasn't written to disk or something
> like that.
>
> In root tree there is different extent tree used than one specified in
> backup root.
> $ btrfs inspect dump-tree -b 21057011679232 /dev/sda | head -n 6
> btrfs-progs v5.10.1
> node 21057011679232 level 1 items 126 free space 367 generation
> 2262739 owner ROOT_TREE
> node 21057011679232 flags 0x1(WRITTEN) backref revision 1
> fs uuid 8aef11a9-beb6-49ea-9b2d-7876611a39e5
> chunk uuid 4ffec48c-28ed-419d-ba87-229c0adb2ab9
> key (EXTENT_TREE ROOT_ITEM 0) block 21057018363904 gen 2262739
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-21  1:08       ` Qu Wenruo
@ 2021-02-21  2:21         ` Dāvis Mosāns
  0 siblings, 0 replies; 10+ messages in thread
From: Dāvis Mosāns @ 2021-02-21  2:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Zygo Blaxell, Chris Murphy, Btrfs BTRFS

svētd., 2021. g. 21. febr., plkst. 03:08 — lietotājs Qu Wenruo
(<quwenruo.btrfs@gmx.com>) rakstīja:
>
>
>
> On 2021/2/21 上午9:03, Dāvis Mosāns wrote:
> > I just found something really strange, it seems pointers for extent
> > tree and csum tree have somehow gotten swapped...
>
> Only the latest 2 backup roots are supposed to be correct, older ones
> are no longer ensured to be correct.
>
> This is not strange.
>

Well here it is latest backup root, it has the highest generation but
looks like it's not issue as it's not used.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ERROR: failed to read block groups: Input/output error
  2021-02-20 23:45   ` Dāvis Mosāns
  2021-02-21  1:03     ` Dāvis Mosāns
@ 2021-02-22  5:22     ` Zygo Blaxell
  1 sibling, 0 replies; 10+ messages in thread
From: Zygo Blaxell @ 2021-02-22  5:22 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Feb 21, 2021 at 01:45:10AM +0200, Dāvis Mosāns wrote:
> piektd., 2021. g. 19. febr., plkst. 07:16 — lietotājs Chris Murphy
> (<lists@colorremedies.com>) rakstīja:
[...]
> > btrfs check -b /dev/
> So I think btrfs check -b --repair should be able to fix most of things.

Why do you think that?  If the HBA was failing for more than a minute
before the filesystem detected a failure and went read-only, it will
have damaged metadata all over the tree.

> > You might try kernel 5.11 which has a new mount option that will skip
> > bad roots and csums. It's 'mount -o ro,rescue=all' and while it won't
> > let you fix it, in the off chance it mounts, it'll let you get data
> > out before trying to repair the file system, which sometimes makes
> > things worse.

rescue=all (or rescue=ignorebadroots) only takes effect if the extent
tree root cannot be found at all.  In this case, you have an available
extent tree root, but some of the leaves are damaged, so the filesystem
fails later on.

There's an opportunity for a patch here that doesn't even try to use the
extent tree, just pretends it wasn't found.  There is similar logic in the
patch for ignoredatacsums that can be used as a reference.  Maybe that's
what rescue=ignorebadroots should always do...I mean if we think the
non-essential roots are bad, why would we even try to use any of them?

On the other hand, such a patch would only get past this problem so
we can encounter the next problem.  You say you want to check csums,
so you need to read the csum tree, but the csum tree would probably be
similarly damaged.  Large numbers of csums may not be available without
rebuilding the csum tree.  Note this is not the same as recomputing the
csums--we just want to scrape up all the surviving csum tree metadata
leaf pages and build a new csum tree out of them.  I'm not sure if any
of the existing tools do that.

> It doesn't make any difference, still doesn't mount
> $ uname -r
> 5.11.0-arch2-1
> $ sudo mount -o ro,rescue=all /dev/sda ./RAID
> mount: /mnt/RAID: wrong fs type, bad option, bad superblock on
> /dev/sda, missing codepage or helper program, or other error.
> 
> BTRFS warning (device sdl): sdl checksum verify failed on
> 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
> BTRFS warning (device sdl): sdl checksum verify failed on
> 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> BTRFS error (device sdl): failed to read block groups: -5
> BTRFS error (device sdl): open_ctree failed
> 
> It seems there should be a way to mount with backup tree root like I
> did for check but strangly usebackuproot doesn't do that...

All but the last few backup trees are normally destroyed within a few
minutes by writes from new transactions.

> piektd., 2021. g. 19. febr., plkst. 21:29 — lietotājs Zygo Blaxell
> (<ce3g8jdj@umail.furryterror.org>) rakstīja:
> >
> > On Thu, Jan 14, 2021 at 01:09:40AM +0200, Dāvis Mosāns wrote:
> > > Hi,
> > >
> > > I've 6x 3TB HDD RAID1 BTRFS filesystem where HBA card failed and
> > > caused some corruption.
> > > When I try to mount it I get
> > > $ mount /dev/sdt /mnt
> > > mount: /mnt/: wrong fs type, bad option, bad superblock on /dev/sdt,
> > > missing codepage or helper program, or other error
> > > $ dmesg | tail -n 9
> > > [  617.158962] BTRFS info (device sdt): disk space caching is enabled
> > > [  617.158965] BTRFS info (device sdt): has skinny extents
> > > [  617.756924] BTRFS info (device sdt): bdev /dev/sdl errs: wr 0, rd
> > > 0, flush 0, corrupt 473, gen 0
> > > [  617.756929] BTRFS info (device sdt): bdev /dev/sdj errs: wr 31626,
> > > rd 18765, flush 178, corrupt 5841, gen 0
> > > [  617.756933] BTRFS info (device sdt): bdev /dev/sdg errs: wr 6867,
> > > rd 2640, flush 178, corrupt 1066, gen 0
> >
> > You have write errors on 2 disks, read errors on 3 disks, and raid1
> > tolerates only 1 disk failure, so successful recovery is unlikely.
> 
> Those wr/rd/corrupt error counts are inflated/misleading, in past when
> some HDD drops out I've had them increase in huge numbers, but after
> running scrub usually it was able to fix almost everything except like
> few files that could be just deleted. 

That is very bad.

With recoverable failures on raid1, scrub should be able to fix every
error, every time (to the extent that btrfs can detect errors, and there
is a known issue with writing to pages in memory while doing direct IO at
the same time).  If scrub fails to recover all errors then your system
was likely exceeding the RAID1 failure tolerances (at most 1 failure
per mirrored block) and if that is not fixed then loss of filesystem
is inevitable.

Healthy disks should never drop off the SATA bus.  If that is happening
then problem needs to be identified and resolved, i.e. find the bad cable,
bad PSU, bad HBA, bad disk, bad disk firmware, mismatched kernel/SCTERC
timeouts, failing disk, BIOS misconfiguration, etc. and replace the bad
hardware or fix the configuration.

> Only now it's possible that it
> failed while scrub was running making it a lot worse.

If the HBA is mangling command headers (turning reads into writes, or
changing the LBA of a write) then scrub could do some damage:  corrupted
reads would trigger self-repair which would try to overwrite the
affected blocks with correct data.  If the HBA corrupts the next write
command address then that data could end up written somewhere it's not
supposed to be.  The error rate would have to be very high or bursty
before scrub could damage metadata--~99% of btrfs scrub IO is data,
so metadata corruption caused by scrub is very rare.

Scrub will heat up the HBA chip which could increase the error rate
temporarily or trigger a more permanent failure if it was already faulty.
That theory is consistent with the observations so far.

The HBA could have been randomly corrupting data and dropping drives off
the bus the whole time, and finally the corruption landed on a metadata
block recently.  That theory is also consistent with the observations
so far.

> > > [  631.353725] BTRFS warning (device sdt): sdt checksum verify failed
> > > on 21057101103104 wanted 0x753cdd5f found 0x9c0ba035 level 0
> > > [  631.376024] BTRFS warning (device sdt): sdt checksum verify failed
> > > on 21057101103104 wanted 0x753cdd5f found 0xb908effa level 0
> >
> > Both copies of this metadata block are corrupted, differently.
> >
> > This is consistent with some kinds of HBA failure:  every outgoing block
> > from the host is potentially corrupted, usually silently.  Due to the HBA
> > failure, there is no indication of failure available to the filesystem
> > until after several corrupt blocks are written to disk.  By the time
> > failure is detected, damage is extensive, especially for metadata where
> > overwrites are frequent.
> 
> I don't think it's that bad here. My guess is that it failed while
> updating extent tree and some part of it didn't got written to disk. 

"some part of [the extent tree] didn't get written to disk" is the worst
case scenario.  It doesn't matter if the filesystem lost 1 extent tree
page or 1000.  If there are any missing interior nodes then the trees
need to be rebuilt before they are usable again.

If metadata interior nodes are missing, then the leaf pages behind them
are no longer accessible without brute-force search of all metadata
chunks.  There's no other way to find them because they could be anywhere
in a metadata chunk.  If leaf pages are missing then reference counts
are no longer reliable and safely writing to the filesystem is not
possible.

It's theoretically possible to scan for just the leaf pages and rebuild
the interior nodes so that read-only access to the data is possible,
but I don't know of an existing tool that does only that.  Check will
read from the subvol trees and try to rebuild the extent tree, which
is not quite the same thing.

> I
> want to check how it looks like on disk, is there some tool to map
> block number to offset in disk?

'btrfs-map-logical' does the translation internally, but it will go ahead
and read the block for you, it doesn't tell you the translation result.

> > This is failure mode that you need backups to recover from (or mirror
> > disks on separate, non-failing HBA hardware).
> 
> I don't know how btrfs decides in which disks it stores copies or if
> it's always in same disk. To prevent such failure in future I could
> split RAID1 across 2 different HBAs but it's not clear which disks
> would need to be seperated.

Ideally in any RAID system every disk gets its own separate controller.
There would be isolated hardware for every disk, including HBA and
power supply.  This isn't very practical or cost-effective (except
on purpose-built NAS and server hardware), so most people backup the
filesystem content on another host instead.

Using a separate host means you get easy and effective isolation from
failures in CPU, RAM, HBA, PCI bridges, power supply, cooling, kernel,
and any of a dozen other points of failure that could break a filesystem.

> I don't want to use --init-extent-tree because I don't want to reset
> everything, but only corrupted things. 

Same thing.

For read-write mounts, you can remove all the corrupted things,
but after that some of the reference data is missing from the tree,
so you have to verify the entire tree to remove the inconsistencies.
The easiest way to do that is to build a new consistent tree out of the
old metadata items so you can query it for duplicate and missing items,
but if you've already built a new consistent tree then you might as well
keep it and throw away the old one.

For read-only mounts, you will need to search for all the leaf pages of
the tree where an interior node is damaged and no longer points to the
leaf page.

> Also btrfs check --repair
> doesn't work as it aborts too quickly, only using with -b flag could
> fix it I think.

The difference between --repair and --init-extent-tree is that --repair
can make small fixes to trees that are otherwise intact.  If --repair
encounters something that cannot be repaired in place (like missing
interior nodes of the tree) then it will abort.  In such cases you will
need to do --init-extent-tree to make the filesystem writable again
(or even readable in some cases).

I would not advise trying to make this filesystem writable again.
I'd copy the data off, then mkfs and copy it back (or swap disks with
the copy).  The HBA failure has likely corrupted recently written data
too, so I'd stick with trying to recover older files and assume all new
files are bad.  Old files can be compared with backups to see if they
were damaged by misaddressed writes.

I wouldn't advise using this filesystem as a btrfs check --repair
test case.  It's easy to build a randomly corrupted filesystem for
development testing, and check --repair needs more development work to
cope with much gentler corruption cases.

> > Both methods are likely to fail in the presence of so much corruption
> > and they may take so long to run that mkfs + restore from backups could
> > be significantly faster.  Definitely extract any data from the filesystem
> > that you want to keep _before_ attempting any of these operations.
> 
> It's not really about time, I rather want to reduce possilble data
> loss as much as possible.
> I can't mount it even read-only so it seems the only way to get data
> out is by using btrfs restore which seems to work fine but does it
> verify file checksums? It looks like it doesn't... 

btrfs restore does not verify checksums (nor is it confused by metadata
inconsistency or corruption in the csum tree).  You get the data that
is on disk in all its corrupt glory.

There's another patch opportunity:  teach btrfs restore how to read the
csum tree.

Note that if you have damaged metadata, the file data checksums may no
longer be available, or a brute-force search may be required to locate
them, so even if you get csum verification working, it might not be
very useful.

> I have some files
> where it said:
> We seem to be looping a lot on ./something, do you want to keep going
> on ? (y/N/a)

That's normal, the looping detection threshold is very low.  Most people
hit 'a' here.  It doesn't affect correctness of output.  It only prevents
getting stuck in infinite loops when trying to locate the metadata page
for the next extent in the file.

> When I checked this file I see that it's corrupted. Basically I want
> restore only files with valid checksums and then have a list of
> corrupted files. From currupted files there are few I want to see if
> they can be recovered. I have lot of of snapshots but even the oldest
> ones are corrupted in exactly same way - they're identical. It seems I
> need to find previous copy of this file if it exsists at all... Any
> idea how to find previous version of file?

If they were snapshots then the files share physical storage, i.e. there
are not two distinct versions of the shared content, they are two
references to the same content, and corruption in one will appear in
the other.

> I tried
> $ btrfs restore -u 2 -t 21056933724160
> with different superblocks/tree roots but they all give same corrupted file.
> The file looks like this
> $ hexdump -C file | head -n 5
> 00000000  27 47 10 00 6d 64 61 74  7b 7b 7b 7b 7b 7b 7b 7b  |'G..mdat{{{{{{{{|
> 00000010  7a 7a 79 79 7a 7a 7a 7a  7b 7b 7b 7c 7c 7c 7b 7b  |zzyyzzzz{{{|||{{|
> 00000020  7c 7c 7c 7b 7b 7b 7b 7b  7c 7c 7c 7c 7c 7b 7b 7b  ||||{{{{{|||||{{{|
> 00000030  7b 7b 7b 7b 7b 7b 7b 7b  7b 7a 7a 7a 7a 79 79 7a  |{{{{{{{{{zzzzyyz|
> 00000040  7b 7b 7b 7b 7a 7b 7b 7c  7b 7c 7c 7b 7c 7c 7b 7b  |{{{{z{{|{||{||{{|

All tree roots will ultimately point to the same data blocks unless the
file was written in between.  Tree roots share all pages except for the
unique pages that are introduced in the tree root's transaction.

> Those repeated 7a/b/c is wrong data. Also I'm not sure if these files
> have been corrupted now or more in past... So I need to check if
> checksum matches.
> 
> > It might be possible to recover by manually inspecting the corrupted
> > metadata blocks and making guesses and adjustments, but that could take
> > even longer than check --repair if there are thousands of damaged pages.
> >
> 
> I want to look into this but not sure if there are any tools with
> which it would be easy to inspect data. dump-tree is nice but it
> doesn't work when checksum is incorrect.

It also doesn't work if an interior tree page is destroyed.  In that
case, nothing less than --init-extent-tree can find the page--and even
init-extent-tree needs to read a few pages to work.

> My current plan is:
> 1. btrfs restore only valid files, how? I don't want to mix good files
> with corrupted ones together
> 2. look into how exactly extent tree is corrupted
> 3. try to see if few of corrupted files can be recovered in some way
> 4. do btrfs check -b --repair (maybe if extent tree can be fixed then
> wouldn't need to use -b flag)
> 5. try to mount and btrfs scrub
> 6. maybe wipe and recreate new fielsystem

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-02-22  5:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-01-13 23:09 ERROR: failed to read block groups: Input/output error Dāvis Mosāns
2021-01-13 23:39 ` Dāvis Mosāns
2021-02-19  3:03   ` Dāvis Mosāns
2021-02-19  5:16     ` Chris Murphy
2021-02-19 19:29 ` Zygo Blaxell
2021-02-20 23:45   ` Dāvis Mosāns
2021-02-21  1:03     ` Dāvis Mosāns
2021-02-21  1:08       ` Qu Wenruo
2021-02-21  2:21         ` Dāvis Mosāns
2021-02-22  5:22     ` Zygo Blaxell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).