* BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
@ 2024-09-17 18:04 Chris Murphy
2024-09-17 21:40 ` Qu Wenruo
0 siblings, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2024-09-17 18:04 UTC (permalink / raw)
To: Btrfs BTRFS
Happens with 6.10.6-6.10.9, does not happen with 6.9.7.
Complete kernel messages are attached to the bug report
https://bugzilla.redhat.com/show_bug.cgi?id=2312886
kernel message excerpts:
Sep 17 00:55:42 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef2f60 pfn:0x1528ae
Sep 17 00:55:42 kernel: memcg:ffff9a2180399000
Sep 17 00:55:42 kernel: aops:btree_aops ino:1
Sep 17 00:55:42 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
Sep 17 00:55:42 kernel: raw: 0017ffffd600422e ffffe31e054a2bc8 ffffe31e054a2b48 ffff9a2180488338
Sep 17 00:55:42 kernel: raw: 0000000000ef2f60 ffff9a232a13c1e0 00000004ffffffff ffff9a2180399000
Sep 17 00:55:42 kernel: page dumped because: eb page dump
Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt node: root=2 block=64205750272 slot=121, unaligned pointer, have 64012238993 should be aligned to 4096
Sep 17 00:55:43 kernel: BTRFS info (device vda3): node 64205750272 level 1 gen 2593 total ptrs 206 free spc 287 owner 2
...
Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64205750272 write time tree block corruption detected
Sep 17 00:55:43 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef3a90 pfn:0x1ce336
Sep 17 00:55:43 kernel: memcg:ffff9a2180399000
Sep 17 00:55:43 kernel: aops:btree_aops ino:1
Sep 17 00:55:43 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
Sep 17 00:55:43 kernel: raw: 0017ffffd600422e ffffe31e04b94988 ffffe31e0738cdc8 ffff9a2180488338
Sep 17 00:55:43 kernel: raw: 0000000000ef3a90 ffff9a2192f701e0 00000004ffffffff ffff9a2180399000
Sep 17 00:55:43 kernel: page dumped because: eb page dump
Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=64217481216 slot=3 ino=16205860, invalid dir item type, have 33 expect (0, 9)
...
Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64217481216 write time tree block corruption detected
Sep 17 00:55:43 kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
Sep 17 00:55:43 kernel: BTRFS info (device vda3 state E): forced readonly
--
Chris Murphy
* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
2024-09-17 18:04 BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096 Chris Murphy
@ 2024-09-17 21:40 ` Qu Wenruo
2024-09-23 22:02 ` Chris Murphy
0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2024-09-17 21:40 UTC (permalink / raw)
To: Chris Murphy, Btrfs BTRFS
On 2024/9/18 03:34, Chris Murphy wrote:
> Happens with 6.10.6-6.10.9, does not happen with 6.9.7.
>
> Complete kernel messages are attached to the bug report
> https://bugzilla.redhat.com/show_bug.cgi?id=2312886
>
> kernel message excerpts:
>
> Sep 17 00:55:42 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef2f60 pfn:0x1528ae
> Sep 17 00:55:42 kernel: memcg:ffff9a2180399000
> Sep 17 00:55:42 kernel: aops:btree_aops ino:1
> Sep 17 00:55:42 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
> Sep 17 00:55:42 kernel: raw: 0017ffffd600422e ffffe31e054a2bc8 ffffe31e054a2b48 ffff9a2180488338
> Sep 17 00:55:42 kernel: raw: 0000000000ef2f60 ffff9a232a13c1e0 00000004ffffffff ffff9a2180399000
> Sep 17 00:55:42 kernel: page dumped because: eb page dump
> Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt node: root=2 block=64205750272 slot=121, unaligned pointer, have 64012238993 should be aligned to 4096
> Sep 17 00:55:43 kernel: BTRFS info (device vda3): node 64205750272 level 1 gen 2593 total ptrs 206 free spc 287 owner 2
> ...
> Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64205750272 write time tree block corruption detected
> Sep 17 00:55:43 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef3a90 pfn:0x1ce336
> Sep 17 00:55:43 kernel: memcg:ffff9a2180399000
> Sep 17 00:55:43 kernel: aops:btree_aops ino:1
> Sep 17 00:55:43 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
> Sep 17 00:55:43 kernel: raw: 0017ffffd600422e ffffe31e04b94988 ffffe31e0738cdc8 ffff9a2180488338
> Sep 17 00:55:43 kernel: raw: 0000000000ef3a90 ffff9a2192f701e0 00000004ffffffff ffff9a2180399000
> Sep 17 00:55:43 kernel: page dumped because: eb page dump
> Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=64217481216 slot=3 ino=16205860, invalid dir item type, have 33 expect (0, 9)
>
> ...
> Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64217481216 write time tree block corruption detected
> Sep 17 00:55:43 kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
> Sep 17 00:55:43 kernel: BTRFS info (device vda3 state E): forced readonly
>
>
[17 Sep]
It shows everything we need to know:
kernel:   key 120 (40535728128 168 4096) block 64012189696 gen 2592
kernel:   key 121 (2774366982960963584 113 5236482350604877970) block 64012238993 gen 2592
kernel:   key 122 (40538132480 168 4096) block 64012255232 gen 2592
Key 121 is obviously corrupted: it is out of sequence with the keys
around it. The generation still looks good, so it looks like a
contiguous range of memory was corrupted.
The affected range covers the key and the block pointer bytenr.
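The two observations above can be reproduced from the dumped slots alone. The following is a minimal sketch, not the kernel's tree-checker: the `slots` list is copied by hand from the three dump lines, `SECTORSIZE` is taken from the "should be aligned to 4096" message, and the helper names are hypothetical. It assumes keys compare as (objectid, type, offset) tuples.

```python
SECTORSIZE = 4096  # from "should be aligned to 4096" in the kernel message

# (slot, (objectid, type, offset), block pointer), copied from the dump
slots = [
    (120, (40535728128, 168, 4096), 64012189696),
    (121, (2774366982960963584, 113, 5236482350604877970), 64012238993),
    (122, (40538132480, 168, 4096), 64012255232),
]

def unaligned(slots, align=SECTORSIZE):
    """Return (slot, remainder) for every block pointer not aligned to `align`."""
    return [(slot, ptr % align) for slot, _, ptr in slots if ptr % align]

def out_of_order(slots):
    """Return slots whose key is not strictly greater than the previous key.

    Tuple comparison is lexicographic: objectid, then type, then offset.
    """
    bad = []
    for (_, prev_key, _), (slot, key, _) in zip(slots, slots[1:]):
        if key <= prev_key:
            bad.append(slot)
    return bad

print(unaligned(slots))     # slots with a misaligned block pointer
print(out_of_order(slots))  # slots where key ordering breaks
```

Because the corrupted key in slot 121 is far larger than its neighbours, the ordering break shows up at the slot that follows it, while the alignment check points at slot 121 directly.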
As for the other write-time failure, it may be a single bit flip
(0x01 -> 0x21); at least there is no other obvious corruption, unlike
the node pointer error.
Considering it's a VM for the Fedora project, I guess it has ECC memory,
so we can rule out memory corruption caused by hardware.
[9 SEP]
Is the dmesg truncated? For every EUCLEAN case from
__btrfs_free_extent() we should have an error message line (that's the
standard practice to have an error message for each EUCLEAN error).
I cannot see the needed error line, so it is hard to say.
Thanks,
Qu
> --
> Chris Murphy
>
* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
2024-09-17 21:40 ` Qu Wenruo
@ 2024-09-23 22:02 ` Chris Murphy
2024-09-24 14:39 ` Josef Bacik
0 siblings, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2024-09-23 22:02 UTC (permalink / raw)
To: Qu Wenruo, Btrfs BTRFS
New comment in the bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2312886#c4
Excerpted kernel messages. What's interesting:
- multiple arches (also s390x but I don't have any kernel messages for it);
- single workload (koji composes)
- the btrfs error is different for each, which makes me think we're not dealing with a btrfs bug
But how can we get more information about why this is happening? Is running a KASAN-enabled kernel useful?
6.10.5-200.fc40.aarch64
Sep 21 14:06:14 kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=553541238784 slot=9 ino=1551466593, name hash mismatch with key, have 0x00000000cb63b37b expect 0x00000000fec4d99b
Sep 21 14:06:14 kernel: BTRFS info (device vda3): leaf 553541238784 gen 623185 total ptrs 91 free space 5959 owner 256
Sep 21 14:06:14 kernel: BTRFS error (device vda3): block=553541238784 write time tree block corruption detected
Sep 21 14:06:14 kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
Sep 21 14:06:14 kernel: BTRFS info (device vda3 state E): forced readonly
Sep 21 14:06:14 kernel: BTRFS warning (device vda3 state E): Skipping commit of aborted transaction.
Sep 21 14:06:14 kernel: BTRFS error (device vda3 state EA): Transaction aborted (error -5)
Sep 21 14:06:14 kernel: BTRFS: error (device vda3 state EA) in cleanup_transaction:1999: errno=-5 IO failure
Sep 21 14:06:14 kernel: BTRFS: error (device vda3 state EA) in btrfs_sync_log:3174: errno=-5 IO failure
6.10.5-200.fc40.x86_64
Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum hole found for disk bytenr range [4880289792, 4880293888)
Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum hole found for disk bytenr range [4880293888, 4880297984)
Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum failed root 256 ino 281 off 66670592 csum 0xb85d0050 expected csum 0x00000000 mirror 1
Sep 23 05:17:20 kernel: BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
...
Sep 23 09:38:53 kernel: BTRFS info (device loop0p4): last unmount of filesystem 2cf58b61-2878-4bc4-b0b0-64e75d467fdf
Sep 23 09:39:30 kernel: BTRFS info (device loop2p4): last unmount of filesystem 57ad289b-2ac5-48ea-86bf-841e20af8720
Sep 23 10:24:52 kernel: BTRFS critical (device vda3): corrupt leaf: root=7 block=1374024122368 slot=150, unexpected item end, have 4706614063 expect 14527
Sep 23 10:24:52 kernel: BTRFS info (device vda3): leaf 1374024122368 gen 699790 total ptrs 346 free space 2909 owner 7
--
Chris Murphy
* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
2024-09-23 22:02 ` Chris Murphy
@ 2024-09-24 14:39 ` Josef Bacik
0 siblings, 0 replies; 4+ messages in thread
From: Josef Bacik @ 2024-09-24 14:39 UTC (permalink / raw)
To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS
On Mon, Sep 23, 2024 at 06:02:31PM -0400, Chris Murphy wrote:
> New comment in the bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2312886#c4
>
> Excerpted kernel messages. What's interesting:
> - multiple arches (also s390x but I don't have any kernel messages for it);
> - single workload (koji composes)
> - the btrfs error is different for each, which makes me think we're not dealing with a btrfs bug
>
> But how can we get more information about why this is happening? Is running a KASAN-enabled kernel useful?
Yeah try KASAN. I'm in the middle of a different investigation, once I wrap
that up I'll dig into this some more with Qu. Thanks,
Josef