From: "alex.challis" <alex.challis@btinternet.com>
To: Hugo Mills <hugo@carfax.org.uk>, linux-btrfs@vger.kernel.org
Subject: Re: Recovery of BTRFS critical (device md126): corrupt leaf, bad key order: block=10872141938688, root=1, slot=119
Date: Fri, 29 Apr 2022 18:02:12 +0100 (BST)
Message-ID: <7122e22a.9e4d.1807645de56.Webtop.83@btinternet.com>
In-Reply-To: <20220428142248.GF15632@savella.carfax.org.uk>
Thank you for the advice, Hugo. I have now replaced the RAM.
Would one of the devs be able to help with a custom patch to btrfs check
to fix it, please?
Many thanks!
Cheers
Alex.
------ Original Message ------
From: "Hugo Mills" <hugo@carfax.org.uk>
To: "alex.challis" <alex.challis@btinternet.com>
Cc: linux-btrfs@vger.kernel.org
Sent: Thursday, 28 Apr, 2022 At 15:22
Subject: Re: Recovery of BTRFS critical (device md126): corrupt leaf,
bad key order: block=10872141938688, root=1, slot=119
On Thu, Apr 28, 2022 at 02:54:09PM +0100, alex.challis wrote:
Dear BTRFS Team
I have a NetGear ReadyNAS that uses btrfs for the data volume
(/dev/disk/by-label/33eaff11\:HDD1).
I was attempting to stop a running container (Docker CE) around the time the
failure happened, and had just pulled a new version of the container with
docker pull. I am not 100% sure the two are related, but the NAS dropped the
data volume into read-only mode around the time I stopped the container.
Subsequent attempts to docker rm the container failed with read-only file
system errors. After a reboot, the data volume would no longer mount.
uname -a:
Linux fatterboy 4.4.218.x86_64.1 #1 SMP Sun Nov 7 15:20:05 UTC 2021 x86_64 GNU/Linux
btrfs --version:
btrfs-progs v4.16
btrfs fi show:
Label: '33eaff11:root' uuid: e360cd8a-7496-4714-a0b7-dadb4829e6f5
Total devices 1 FS bytes used 993.29MiB
devid 1 size 4.00GiB used 2.45GiB path /dev/md0
Label: '33eaff11:HDD1' uuid: 9dbd11f2-da2f-4f68-a4e9-552cbc90d1e0
Total devices 2 FS bytes used 4.25TiB
devid 1 size 5.44TiB used 4.41TiB path /dev/md126
devid 2 size 461.13GiB used 7.03GiB path /dev/md127
btrfs fi df /HDD1 :
Data, single: total=2.04GiB, used=979.09MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=204.56MiB, used=14.19MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
dmesg > dmesg.log
Attached
Culprit seems to be:
dmesg | grep -i btrfs
[ 1.337264] Btrfs loaded, crc32c=crc32c-generic
[ 23.296969] BTRFS: device label 33eaff11:root devid 1 transid 2341967 /dev/md0
[ 23.297437] BTRFS info (device md0): has skinny extents
[ 24.505292] BTRFS: device label 33eaff11:HDD1 devid 2 transid 1424350 /dev/md127
[ 24.643613] BTRFS: device label 33eaff11:HDD1 devid 1 transid 1424350 /dev/md126
[ 24.800256] BTRFS info (device md126): has skinny extents
[ 24.894582] BTRFS critical (device md126): corrupt leaf, bad key order: block=10872141938688, root=1, slot=119
[ 24.894596] BTRFS error (device md126): failed to read block groups: -5
[ 24.894811] BTRFS error (device md126): failed to read block groups: -17
[ 24.898074] BTRFS error (device md126): failed to read block groups: -17
[ 24.912298] BTRFS error (device md126): failed to read block groups: -17
[ 24.912851] BTRFS error (device md126): parent transid verify failed on 10872188272640 wanted 1424347 found 1424349
[ 24.912857] BTRFS warning (device md126): failed to read tree root
[ 24.933058] BTRFS error (device md126): open_ctree failed
btrfs-debug-tree -b 10872141938688 /dev/disk/by-label/33eaff11\:HDD1
<clip>
item 117 key (1127493074944 METADATA_ITEM 0) itemoff 27954 itemsize 33
refs 1 gen 23101 flags TREE_BLOCK
tree block skinny level 0
tree block backref root 7
item 118 key (1127493107712 METADATA_ITEM 0) itemoff 27894 itemsize 60
refs 4 gen 718838 flags TREE_BLOCK|FULL_BACKREF
tree block skinny level 0
shared block backref parent 4593432821760
shared block backref parent 4593432788992
shared block backref parent 4593432756224
shared block backref parent 4593432723456
item 119 key (2211708928 UNKNOWN.0 0) itemoff 27834 itemsize 60
item 120 key (1127493173248 METADATA_ITEM 0) itemoff 27801 itemsize 33
refs 1 gen 29828 flags TREE_BLOCK
tree block skinny level 0
tree block backref root 7
<clip>
Key 119 is out of sequence and type UNKNOWN (!?)
The first elements of the key tuples for 118-120 are:
0x10683d38000
0x00083d40000
0x10683d48000
This, along with the UNKNOWN.0, suggests that something has written
a very small number of zero bytes into the metadata page while it was
in RAM (probably 4 or 8 bytes, as nothing else seems to be damaged).
It's definitely happened in RAM, as the checksum is correct. We'd
have had a csum failure if the corruption happened on disk.
This is an indication either of a broken driver that's done some
bad pointer arithmetic and stomped on memory that it doesn't own, or
(more likely, in my opinion) some bad RAM that's flipped a bit on an
address held in kernel memory somewhere, and led something to zero the
wrong area of RAM.
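As a rough cross-check of the hex arithmetic above, the following Python sketch (not part of the original exchange) prints the three objectids in hex and XORs the value in slot 119 against a guessed original. The guess assumes the three tree blocks are evenly spaced, so the midpoint of items 118 and 120 is only a hypothesis; the names objectids and guess are illustrative.

```python
# Key objectids of items 118-120, copied from the btrfs-debug-tree output (decimal).
objectids = {118: 1127493107712, 119: 2211708928, 120: 1127493173248}

# ASSUMPTION: the damaged key originally sat midway between its neighbours,
# i.e. the three tree blocks are evenly spaced. This is a guess, not a fact.
guess = (objectids[118] + objectids[120]) // 2

# Bits that differ between the guessed original and what is on disk now.
damage = guess ^ objectids[119]

for slot, oid in sorted(objectids.items()):
    print(f"item {slot}: {oid:#013x}")
print(f"guessed original for item 119: {guess:#013x}")
print(f"differing bits:                {damage:#013x}")
```

Under that assumption, only the upper 32 bits of the objectid differ and the lower 32 bits match what is on disk. Together with the key type printing as UNKNOWN.0, that would be consistent with a short run of zero bytes landing on the key.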
Please advise on recovery.
I don't think there's anything in btrfs check that could fix this
(although I might be wrong). Your first task, though, should be to try
to identify and replace the broken RAM on this machine. Once that's
done, one of the devs may be able to help you with a custom patch to
btrfs check to fix it -- but don't do that until the hardware's
repaired.
Hugo.
--
Hugo Mills             | I spent most of my money on drink, women and fast
hugo@... carfax.org.uk | cars. The rest I wasted.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                        James Hunt