public inbox for linux-btrfs@vger.kernel.org
* Recovery of BTRFS critical (device md126): corrupt  leaf, bad key order: block=10872141938688, root=1, slot=119
@ 2022-04-28 13:54 alex.challis
  2022-04-28 14:22 ` Hugo Mills
  0 siblings, 1 reply; 4+ messages in thread
From: alex.challis @ 2022-04-28 13:54 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3599 bytes --]

Dear BTRFS Team

Have a NetGear ReadyNas that uses btrfs for the data volume
(/dev/disk/by-label/33eaff11\:HDD1).

Was attempting to stop a running container (Docker CE) around the time 
the failure happened. Had just docker pulled a new version of 
container. Not 100% sure they were related but NAS dropped data volume 
into RO mode around the time of stopping the container. Subsequent 
attempts to docker rm the container failed with read-only file system 
errors. Upon re-boot the data volume would no longer mount.

  uname -a:
Linux fatterboy 4.4.218.x86_64.1 #1 SMP Sun Nov 7 15:20:05 UTC 2021 x86_64 GNU/Linux

   btrfs --version:
btrfs-progs v4.16

   btrfs fi show:
Label: '33eaff11:root'  uuid: e360cd8a-7496-4714-a0b7-dadb4829e6f5
         Total devices 1 FS bytes used 993.29MiB
         devid    1 size 4.00GiB used 2.45GiB path /dev/md0

Label: '33eaff11:HDD1'  uuid: 9dbd11f2-da2f-4f68-a4e9-552cbc90d1e0
         Total devices 2 FS bytes used 4.25TiB
         devid    1 size 5.44TiB used 4.41TiB path /dev/md126
         devid    2 size 461.13GiB used 7.03GiB path /dev/md127

   btrfs fi df /HDD1 :
Data, single: total=2.04GiB, used=979.09MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=204.56MiB, used=14.19MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

   dmesg > dmesg.log
Attached


Culprit seems to be:
  dmesg | grep -i btrfs
[    1.337264] Btrfs loaded, crc32c=crc32c-generic
[   23.296969] BTRFS: device label 33eaff11:root devid 1 transid 2341967 /dev/md0
[   23.297437] BTRFS info (device md0): has skinny extents
[   24.505292] BTRFS: device label 33eaff11:HDD1 devid 2 transid 1424350 /dev/md127
[   24.643613] BTRFS: device label 33eaff11:HDD1 devid 1 transid 1424350 /dev/md126
[   24.800256] BTRFS info (device md126): has skinny extents
[   24.894582] BTRFS critical (device md126): corrupt leaf, bad key order: block=10872141938688, root=1, slot=119
[   24.894596] BTRFS error (device md126): failed to read block groups: -5
[   24.894811] BTRFS error (device md126): failed to read block groups: -17
[   24.898074] BTRFS error (device md126): failed to read block groups: -17
[   24.912298] BTRFS error (device md126): failed to read block groups: -17
[   24.912851] BTRFS error (device md126): parent transid verify failed on 10872188272640 wanted 1424347 found 1424349
[   24.912857] BTRFS warning (device md126): failed to read tree root
[   24.933058] BTRFS error (device md126): open_ctree failed
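
The numbers after "failed to read block groups:" are negated errno values
returned by the kernel. As a small sketch for decoding them, the snippet
below looks the codes up with Python's standard errno module; the helper
name describe() is made up for illustration and is not part of any btrfs
tool.

```python
import errno
import os


def describe(kernel_rc: int) -> str:
    """Map a negative kernel return code to its errno symbol and message."""
    code = -kernel_rc
    return f"{errno.errorcode[code]}: {os.strerror(code)}"


# The two return codes seen in the dmesg output above.
codes = {rc: errno.errorcode[-rc] for rc in (-5, -17)}

for rc in codes:
    print(rc, describe(rc))
```

On Linux, -5 is EIO (I/O error) and -17 is EEXIST (file exists).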

  btrfs-debug-tree -b 10872141938688 /dev/disk/by-label/33eaff11\:HDD1
<clip>
         item 117 key (1127493074944 METADATA_ITEM 0) itemoff 27954 itemsize 33
                 refs 1 gen 23101 flags TREE_BLOCK
                 tree block skinny level 0
                 tree block backref root 7
         item 118 key (1127493107712 METADATA_ITEM 0) itemoff 27894 itemsize 60
                 refs 4 gen 718838 flags TREE_BLOCK|FULL_BACKREF
                 tree block skinny level 0
                 shared block backref parent 4593432821760
                 shared block backref parent 4593432788992
                 shared block backref parent 4593432756224
                 shared block backref parent 4593432723456
         item 119 key (2211708928 UNKNOWN.0 0) itemoff 27834 itemsize 60
         item 120 key (1127493173248 METADATA_ITEM 0) itemoff 27801 itemsize 33
                 refs 1 gen 29828 flags TREE_BLOCK
                 tree block skinny level 0
                 tree block backref root 7
<clip>

Key 119 is out of sequence and type UNKNOWN (!?)
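
To make "out of sequence" concrete: keys in a leaf must be strictly
increasing when compared as (objectid, type, offset) tuples. The sketch
below replays that comparison on the four items from the dump. It is a
simplified illustration, not the kernel's actual check; the function name
first_bad_slot() is made up here, and 169 is the numeric value of
METADATA_ITEM.

```python
# (slot, (objectid, type, offset)) for items 117-120 from the dump above.
# 169 is BTRFS_METADATA_ITEM_KEY; the UNKNOWN.0 item has type 0.
ITEMS = [
    (117, (1127493074944, 169, 0)),
    (118, (1127493107712, 169, 0)),
    (119, (2211708928, 0, 0)),
    (120, (1127493173248, 169, 0)),
]


def first_bad_slot(items):
    """Return the first slot whose key is not strictly greater than the
    key in the slot before it, or None if the keys are in order."""
    for (_, prev_key), (slot, key) in zip(items, items[1:]):
        if key <= prev_key:  # tuple comparison: objectid, then type, then offset
            return slot
    return None


bad = first_bad_slot(ITEMS)
print("first out-of-order slot:", bad)
```

With these four items the comparison fails at slot 119, matching the slot
named in the kernel message.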



Please advise on recovery.


Cheers
Alex.

[-- Attachment #2: dmesg.log --]
[-- Type: application/unknown, Size: 57145 bytes --]


* Re: Recovery of BTRFS critical (device md126): corrupt  leaf, bad key order: block=10872141938688, root=1, slot=119
  2022-04-28 13:54 Recovery of BTRFS critical (device md126): corrupt leaf, bad key order: block=10872141938688, root=1, slot=119 alex.challis
@ 2022-04-28 14:22 ` Hugo Mills
  2022-04-29 17:02   ` alex.challis
  0 siblings, 1 reply; 4+ messages in thread
From: Hugo Mills @ 2022-04-28 14:22 UTC (permalink / raw)
  To: alex.challis; +Cc: linux-btrfs

On Thu, Apr 28, 2022 at 02:54:09PM +0100, alex.challis wrote:
> Dear BTRFS Team
> 
> Have a NetGear ReadyNas that uses btrfs for the data volume
> (/dev/disk/by-label/33eaff11\:HDD1).
> 
> Was attempting to stop a running container (Docker CE) around the time the
> failure happened. Had just docker pulled a new version of container. Not
> 100% sure they were related but NAS dropped data volume into RO mode around
> the time of stopping the container. Subsequent attempts to docker rm the
> container failed with read-only file system errors. Upon re-boot the data
> volume would no longer mount.
> 
>  uname -a:
> Linux fatterboy 4.4.218.x86_64.1 #1 SMP Sun Nov 7 15:20:05 UTC 2021 x86_64
> GNU/Linux
> 
>   btrfs --version:
> btrfs-progs v4.16
> 
>   btrfs fi show:
> Label: '33eaff11:root'  uuid: e360cd8a-7496-4714-a0b7-dadb4829e6f5
>         Total devices 1 FS bytes used 993.29MiB
>         devid    1 size 4.00GiB used 2.45GiB path /dev/md0
> 
> Label: '33eaff11:HDD1'  uuid: 9dbd11f2-da2f-4f68-a4e9-552cbc90d1e0
>         Total devices 2 FS bytes used 4.25TiB
>         devid    1 size 5.44TiB used 4.41TiB path /dev/md126
>         devid    2 size 461.13GiB used 7.03GiB path /dev/md127
> 
>   btrfs fi df /HDD1 :
> Data, single: total=2.04GiB, used=979.09MiB
> System, DUP: total=8.00MiB, used=16.00KiB
> Metadata, DUP: total=204.56MiB, used=14.19MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
> 
>   dmesg > dmesg.log
> Attached
> 
> 
> Culprit seems to be:
>  dmesg | grep -i btrfs
> [    1.337264] Btrfs loaded, crc32c=crc32c-generic
> [   23.296969] BTRFS: device label 33eaff11:root devid 1 transid 2341967
> /dev/md0
> [   23.297437] BTRFS info (device md0): has skinny extents
> [   24.505292] BTRFS: device label 33eaff11:HDD1 devid 2 transid 1424350
> /dev/md127
> [   24.643613] BTRFS: device label 33eaff11:HDD1 devid 1 transid 1424350
> /dev/md126
> [   24.800256] BTRFS info (device md126): has skinny extents
> [   24.894582] BTRFS critical (device md126): corrupt leaf, bad key order:
> block=10872141938688, root=1, slot=119
> [   24.894596] BTRFS error (device md126): failed to read block groups: -5
> [   24.894811] BTRFS error (device md126): failed to read block groups: -17
> [   24.898074] BTRFS error (device md126): failed to read block groups: -17
> [   24.912298] BTRFS error (device md126): failed to read block groups: -17
> [   24.912851] BTRFS error (device md126): parent transid verify failed on
> 10872188272640 wanted 1424347 found 1424349
> [   24.912857] BTRFS warning (device md126): failed to read tree root
> [   24.933058] BTRFS error (device md126): open_ctree failed
> 
>  btrfs-debug-tree -b 10872141938688 /dev/disk/by-label/33eaff11\:HDD1
> <clip>
>         item 117 key (1127493074944 METADATA_ITEM 0) itemoff 27954 itemsize
> 33
>                 refs 1 gen 23101 flags TREE_BLOCK
>                 tree block skinny level 0
>                 tree block backref root 7
>         item 118 key (1127493107712 METADATA_ITEM 0) itemoff 27894 itemsize
> 60
>                 refs 4 gen 718838 flags TREE_BLOCK|FULL_BACKREF
>                 tree block skinny level 0
>                 shared block backref parent 4593432821760
>                 shared block backref parent 4593432788992
>                 shared block backref parent 4593432756224
>                 shared block backref parent 4593432723456
>         item 119 key (2211708928 UNKNOWN.0 0) itemoff 27834 itemsize 60
>         item 120 key (1127493173248 METADATA_ITEM 0) itemoff 27801 itemsize
> 33
>                 refs 1 gen 29828 flags TREE_BLOCK
>                 tree block skinny level 0
>                 tree block backref root 7
> <clip>
> 
> Key 119 is out of sequence and type UNKNOWN (!?)

The first elements of the key tuples for 118-120 are:

0x10683d38000
0x00083d40000
0x10683d48000

   This, along with the UNKNOWN.0, suggests that something has written
a very small number of zero bytes into the metadata page while it was
in RAM (probably 4 or 8 bytes, as nothing else seems to be damaged).
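
   The comparison above can be reproduced with a short sketch. It assumes
the on-disk key layout of a little-endian u64 objectid, a u8 type and a
u64 offset, and METADATA_ITEM = 169. The "intended" objectid for item 119
is a guess (the surviving low 32 bits combined with the high bits of its
neighbour, item 118), not something read from the filesystem.

```python
import struct

# Decimal objectids copied from the btrfs-debug-tree output (items 118-120).
KEYS = {118: 1127493107712, 119: 2211708928, 120: 1127493173248}
METADATA_ITEM_KEY = 169  # numeric value of METADATA_ITEM


def disk_key(objectid: int, key_type: int, offset: int) -> bytes:
    """Pack a key as little-endian u64 objectid, u8 type, u64 offset."""
    return struct.pack("<QBQ", objectid, key_type, offset)


for slot, objectid in KEYS.items():
    print(slot, hex(objectid))

# ASSUMPTION: the intended objectid keeps the surviving low 32 bits of
# item 119 and shares its high bits with item 118.
guess = KEYS[119] | (KEYS[118] & ~0xFFFFFFFF)

expected = disk_key(guess, METADATA_ITEM_KEY, 0)
found = disk_key(KEYS[119], 0, 0)  # UNKNOWN.0 means the type byte is 0

# Byte offsets within the 17-byte key where the two versions differ.
changed = [i for i, (a, b) in enumerate(zip(expected, found)) if a != b]
print("guessed objectid:", guess, hex(guess))
print("differing byte offsets:", changed)
```

   Under that assumption the guessed objectid sits between items 118 and
120, and the differing offsets come out as 4, 5 and 8, all of them zero in
the damaged key. That is consistent with a short run of zero bytes
covering the top of the objectid and the type byte.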

   It's definitely happened in RAM, as the checksum is correct. We'd
have had a csum failure if the corruption happened on disk.
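
   A toy model of that reasoning, as a sketch only: the checksum is
computed when the block is written out, so damage done in memory before
that point gets a valid checksum, while damage done afterwards does not.
zlib.crc32 stands in for the crc32c that btrfs actually uses, and the
64-byte "block" and helper names are invented for the example.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte checksum, computed at write time."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def verify_block(block: bytes) -> bool:
    """Recompute the checksum over the payload and compare with the stored one."""
    stored = int.from_bytes(block[:4], "little")
    return stored == zlib.crc32(block[4:])


def zero_bytes(buf: bytes, start: int, count: int) -> bytes:
    """Return a copy of buf with `count` bytes zeroed from `start`."""
    return buf[:start] + bytes(count) + buf[start + count:]


good = bytes(range(1, 65))  # toy 64-byte metadata payload, no zero bytes

# Case 1: bytes are zeroed in memory, then the block is checksummed and written.
damaged_in_ram = write_block(zero_bytes(good, 20, 4))

# Case 2: the block is written correctly, then the stored payload is damaged.
written = write_block(good)
damaged_on_disk = written[:4] + zero_bytes(written[4:], 20, 4)

ram_ok = verify_block(damaged_in_ram)
disk_ok = verify_block(damaged_on_disk)
print("damaged in RAM, checksum verifies: ", ram_ok)
print("damaged on disk, checksum verifies:", disk_ok)
```

   The first case verifies and the second does not, which is why a block
with bad contents but a correct checksum points at memory rather than the
disk.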

   This is an indication either of a broken driver that's done some
bad pointer arithmetic and stomped on memory that it doesn't own, or
(more likely, in my opinion) some bad RAM that's flipped a bit on an
address held in kernel memory somewhere, and led something to zero the
wrong area of RAM.

> Please advise on recovery.

   I don't think there's anything in btrfs check that could fix this
(although I might be wrong). Your first task, though, should be to try
to identify and replace the broken RAM on this machine. Once that's
done, one of the devs may be able to help you with a custom patch to
btrfs check to fix it -- but don't do that until the hardware's
repaired.

   Hugo.


-- 
Hugo Mills             | I spent most of my money on drink, women and fast
hugo@... carfax.org.uk | cars. The rest I wasted.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                            James Hunt


* Re: Recovery of BTRFS critical (device md126): corrupt  leaf, bad key order: block=10872141938688, root=1, slot=119
  2022-04-28 14:22 ` Hugo Mills
@ 2022-04-29 17:02   ` alex.challis
  2022-05-06 13:43     ` alex.challis
  0 siblings, 1 reply; 4+ messages in thread
From: alex.challis @ 2022-04-29 17:02 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs

Thank you for the advice Hugo, have now replaced the RAM.

Would one of the devs be able to help with a custom patch to btrfs check
to fix it, please?

Many thanks!

Cheers
Alex.





* Re: Recovery of BTRFS critical (device md126): corrupt  leaf, bad key order: block=10872141938688, root=1, slot=119
  2022-04-29 17:02   ` alex.challis
@ 2022-05-06 13:43     ` alex.challis
  0 siblings, 0 replies; 4+ messages in thread
From: alex.challis @ 2022-05-06 13:43 UTC (permalink / raw)
  To: Hugo Mills, alex.challis, linux-btrfs

Dear BTRFS Team

I'm still hoping one of the devs can help me recover the unmountable
volume with a custom patch to btrfs check.
I realise you all give up your own time, so I'm very happy to make a
donation to the dev's charity of choice if that helps justify the time
spent helping me out.

Thank you
Cheers
Alex.




