From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: BERGUE Kevin <kevin.bergue-externe@hemeria-group.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Root level mismatch after sudden shutdown
Date: Thu, 10 Feb 2022 09:33:45 +0800
Message-ID: <51b5c958-1df6-e95e-d394-c95a0863ea0f@gmx.com>
In-Reply-To: <776a73dbf91d4518a36b465ac9ac2d5a@hemeria-group.com>
On 2022/2/9 18:53, BERGUE Kevin wrote:
> Hello everyone.
>
>
> After a sudden shutdown my btrfs partition seems to be corrupted. After a few hours of reading documentation and trying various repair methods, I found an error message I can't find documented anywhere, so I'm sending it your way hoping you can at least explain what the issue is. The disk was running Linux 5.16.5 at the moment of the crash; my recovery environment is a Linux 5.15.16 machine with btrfs-progs v5.16.
>
>
> To retrace my steps a bit:
>
> - I tried to mount my partition normally:
> # mount /dev/mapper/SSD-Root /mnt/broken/
> mount: /mnt/broken: wrong fs type, bad option, bad superblock on /dev/mapper/SSD-Root, missing codepage or helper program, or other error.
>
> - I then looked at the relevant logs from dmesg:
> # dmesg
> [ 2118.381387] BTRFS info (device dm-1): flagging fs with big metadata feature
> [ 2118.381394] BTRFS info (device dm-1): disk space caching is enabled
> [ 2118.381395] BTRFS info (device dm-1): has skinny extents
> [ 2118.384626] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
> [ 2118.384900] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
Transid mismatch, where newer tree blocks have overwritten older ones.

This points to a few possible situations:

- The SSD is lying about its flush/fua operations
  This means the SSD firmware doesn't really write all its data back to
  its NAND when it reports flush/fua operations as finished.
  In that case the SSD is not reliable for any filesystem, although
  traditional journal-based filesystems may have a better chance of
  surviving.

- A corrupted v1 space cache making btrfs break its CoW guarantees
  This can only happen with the v1 space cache, and is pretty unlikely,
  as the v1 cache has its own checksum. The v2 cache is now the default
  for newly created filesystems. (See the clearing commands after this
  list.)

- A corrupted extent tree screwing up metadata CoW
  Normally this would cause a transaction abort in the first place,
  though.

- A complex storage stack adding extra points where FLUSH/FUA can fail
  If you're using things like LVM/LUKS/bcache, any layer in the middle
  can mishandle FLUSH/FUA if it has bugs.
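
For the v1 space cache case, the cache can be dropped offline and will
be rebuilt on the next mount. A minimal sketch (adjust the device path
to yours):

  # btrfs check --clear-space-cache v1 /dev/mapper/SSD-Root

Or, if the fs ever mounts again, switch to the v2 free space tree:

  # mount -o clear_cache,space_cache=v2 /dev/mapper/SSD-Root /mnt

Note this only rules the cache out as a future problem; it won't undo
the existing transid mismatch.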
Mind sharing the following info? (Example commands follow below.)

- Storage stack
  From the hardware disk up to btrfs at the top, everything like
  LVM/LUKS/bcache in between would help us understand the situation.

- SSD model
  To see if it's a model known to have FLUSH/FUA problems.
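
For example (smartctl is from the smartmontools package; replace
/dev/sdX with the underlying physical disk):

  # lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINT
  # smartctl -i /dev/sdX

lsblk prints the whole dm/LVM/LUKS layering in one tree, and
smartctl -i shows the device model and firmware version.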
Thanks,
Qu
> [ 2118.384905] BTRFS warning (device dm-1): failed to read root (objectid=4): -5
> [ 2118.385304] BTRFS error (device dm-1): open_ctree failed
>
> - After some reading about the "parent transid verify failed" errors I tried to mount the volume with the usebackuproot flag:
> # mount -t btrfs -o ro,rescue=usebackuproot /dev/mapper/SSD-Root /mnt/broken/
> mount: /mnt/broken: wrong fs type, bad option, bad superblock on /dev/mapper/SSD-Root, missing codepage or helper program, or other error.
> And the dmesg content:
> [ 2442.117867] BTRFS info (device dm-1): flagging fs with big metadata feature
> [ 2442.117871] BTRFS info (device dm-1): trying to use backup root at mount time
> [ 2442.117872] BTRFS info (device dm-1): disk space caching is enabled
> [ 2442.117873] BTRFS info (device dm-1): has skinny extents
> [ 2442.123056] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
> [ 2442.123344] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
> [ 2442.123348] BTRFS warning (device dm-1): failed to read root (objectid=4): -5
> [ 2442.124743] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
> [ 2442.124939] BTRFS error (device dm-1): parent transid verify failed on 1869491683328 wanted 526959 found 526999
> [ 2442.124942] BTRFS warning (device dm-1): failed to read root (objectid=4): -5
> [ 2442.125196] BTRFS critical (device dm-1): corrupt leaf: block=1869863370752 slot=97 extent bytenr=920192651264 len=4096 invalid generation, have 527002 expect (0, 527001]
> [ 2442.125201] BTRFS error (device dm-1): block=1869863370752 read time tree block corruption detected
> [ 2442.125500] BTRFS critical (device dm-1): corrupt leaf: block=1869863370752 slot=97 extent bytenr=920192651264 len=4096 invalid generation, have 527002 expect (0, 527001]
> [ 2442.125502] BTRFS error (device dm-1): block=1869863370752 read time tree block corruption detected
> [ 2442.125508] BTRFS warning (device dm-1): couldn't read tree root
> [ 2442.125806] BTRFS critical (device dm-1): corrupt leaf: block=1869866401792 slot=117 extent bytenr=906206486528 len=4096 invalid generation, have 527003 expect (0, 527002]
> [ 2442.125808] BTRFS error (device dm-1): block=1869866401792 read time tree block corruption detected
> [ 2442.126174] BTRFS critical (device dm-1): corrupt leaf: block=1869866401792 slot=117 extent bytenr=906206486528 len=4096 invalid generation, have 527003 expect (0, 527002]
> [ 2442.126175] BTRFS error (device dm-1): block=1869866401792 read time tree block corruption detected
> [ 2442.126184] BTRFS warning (device dm-1): couldn't read tree root
> [ 2442.126599] BTRFS error (device dm-1): open_ctree failed
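
Since your recovery kernel is 5.15, you can also try the combined
rescue options (a sketch; rescue=all implies usebackuproot,
nologreplay, ignorebadroots and ignoredatacsums, all available since
v5.11):

  # mount -t btrfs -o ro,rescue=all /dev/mapper/SSD-Root /mnt/broken/

It's read-only and can get past some unreadable non-essential roots,
though it won't help if the tree root itself can't be read.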
>
> - I then tried a check:
> # btrfs check /dev/mapper/SSD-Root
> Opening filesystem to check...
> parent transid verify failed on 1869491683328 wanted 526959 found 526999
> parent transid verify failed on 1869491683328 wanted 526959 found 526999
> parent transid verify failed on 1869491683328 wanted 526959 found 526999
> Ignoring transid failure
> ERROR: root [4 0] level 0 does not match 1
>
> Couldn't setup device tree
> ERROR: cannot open file system
>
>
> I think the "root [4 0] level 0 does not match 1" error is the real cuprit but I can't seem to find any info on this message anywhere. I tried a bunch of other commands including:
> - btrfs rescue zero-log
> - btrfs rescue chunk-recover
> - btrfs check --repair
> - btrfs rescue super-recover
> - btrfs check --repair with the tree root found by btrfs-find-root
> - btrfs check --repair --init-csum-tree --init-extent-tree
> - btrfs restore
>
> I'm aware I probably executed some commands that don't make much sense in my context, but all of them failed with either the "root [4 0] level 0 does not match 1" message or a more generic "could not open ctree" or equivalent.
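
About the "root [4 0] level 0 does not match 1" message: the root item
for objectid 4 (the device tree) says its root node should be at level
1, but the tree block actually read back has level 0 in its header,
i.e. a stale block, consistent with the transid mismatches above. You
can dump the block btrfs is complaining about to see what is really
there (a sketch):

  # btrfs inspect-internal dump-tree -b 1869491683328 /dev/mapper/SSD-Root

And if btrfs-find-root turns up an older, intact tree root, btrfs
restore can pull files out through it without writing to the device
(the bytenr and destination are yours to fill in):

  # btrfs restore -t <bytenr> /dev/mapper/SSD-Root /path/to/recovery/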