linux-btrfs.vger.kernel.org archive mirror
* BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
From: Chris Murphy @ 2024-09-17 18:04 UTC (permalink / raw)
  To: Btrfs BTRFS

Happens with 6.10.6-6.10.9, does not happen with 6.9.7.

Complete kernel messages are attached to the bug report
https://bugzilla.redhat.com/show_bug.cgi?id=2312886

kernel message excerpts:

Sep 17 00:55:42 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef2f60 pfn:0x1528ae
Sep 17 00:55:42 kernel: memcg:ffff9a2180399000
Sep 17 00:55:42 kernel: aops:btree_aops ino:1
Sep 17 00:55:42 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
Sep 17 00:55:42 kernel: raw: 0017ffffd600422e ffffe31e054a2bc8 ffffe31e054a2b48 ffff9a2180488338
Sep 17 00:55:42 kernel: raw: 0000000000ef2f60 ffff9a232a13c1e0 00000004ffffffff ffff9a2180399000
Sep 17 00:55:42 kernel: page dumped because: eb page dump
Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt node: root=2 block=64205750272 slot=121, unaligned pointer, have 64012238993 should be aligned to 4096
Sep 17 00:55:43 kernel: BTRFS info (device vda3): node 64205750272 level 1 gen 2593 total ptrs 206 free spc 287 owner 2
...
Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64205750272 write time tree block corruption detected
Sep 17 00:55:43 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef3a90 pfn:0x1ce336
Sep 17 00:55:43 kernel: memcg:ffff9a2180399000
Sep 17 00:55:43 kernel: aops:btree_aops ino:1
Sep 17 00:55:43 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
Sep 17 00:55:43 kernel: raw: 0017ffffd600422e ffffe31e04b94988 ffffe31e0738cdc8 ffff9a2180488338
Sep 17 00:55:43 kernel: raw: 0000000000ef3a90 ffff9a2192f701e0 00000004ffffffff ffff9a2180399000
Sep 17 00:55:43 kernel: page dumped because: eb page dump
Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=64217481216 slot=3 ino=16205860, invalid dir item type, have 33 expect (0, 9)

...
Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64217481216 write time tree block corruption detected
Sep 17 00:55:43 kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
Sep 17 00:55:43 kernel: BTRFS info (device vda3 state E): forced readonly


--
Chris Murphy


* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
From: Qu Wenruo @ 2024-09-17 21:40 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS



On 2024/9/18 03:34, Chris Murphy wrote:
> Happens with 6.10.6-6.10.9, does not happen with 6.9.7.
>
> Complete kernel messages are attached to the bug report
> https://bugzilla.redhat.com/show_bug.cgi?id=2312886
>
> kernel message excerpts:
>
> Sep 17 00:55:42 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef2f60 pfn:0x1528ae
> Sep 17 00:55:42 kernel: memcg:ffff9a2180399000
> Sep 17 00:55:42 kernel: aops:btree_aops ino:1
> Sep 17 00:55:42 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
> Sep 17 00:55:42 kernel: raw: 0017ffffd600422e ffffe31e054a2bc8 ffffe31e054a2b48 ffff9a2180488338
> Sep 17 00:55:42 kernel: raw: 0000000000ef2f60 ffff9a232a13c1e0 00000004ffffffff ffff9a2180399000
> Sep 17 00:55:42 kernel: page dumped because: eb page dump
> Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt node: root=2 block=64205750272 slot=121, unaligned pointer, have 64012238993 should be aligned to 4096
> Sep 17 00:55:43 kernel: BTRFS info (device vda3): node 64205750272 level 1 gen 2593 total ptrs 206 free spc 287 owner 2
> ...
> Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64205750272 write time tree block corruption detected
> Sep 17 00:55:43 kernel: page: refcount:4 mapcount:0 mapping:00000000339eecab index:0xef3a90 pfn:0x1ce336
> Sep 17 00:55:43 kernel: memcg:ffff9a2180399000
> Sep 17 00:55:43 kernel: aops:btree_aops ino:1
> Sep 17 00:55:43 kernel: flags: 0x17ffffd600422e(referenced|uptodate|lru|workingset|private|writeback|node=0|zone=2|lastcpupid=0x1fffff)
> Sep 17 00:55:43 kernel: raw: 0017ffffd600422e ffffe31e04b94988 ffffe31e0738cdc8 ffff9a2180488338
> Sep 17 00:55:43 kernel: raw: 0000000000ef3a90 ffff9a2192f701e0 00000004ffffffff ffff9a2180399000
> Sep 17 00:55:43 kernel: page dumped because: eb page dump
> Sep 17 00:55:43 kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=64217481216 slot=3 ino=16205860, invalid dir item type, have 33 expect (0, 9)
>
> ...
> Sep 17 00:55:43 kernel: BTRFS error (device vda3): block=64217481216 write time tree block corruption detected
> Sep 17 00:55:43 kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
> Sep 17 00:55:43 kernel: BTRFS info (device vda3 state E): forced readonly
>
>

[17 Sep]
It shows everything we need to know:

  kernel: 	key 120 (40535728128 168 4096) block 64012189696 gen 2592
  kernel: 	key 121 (2774366982960963584 113 5236482350604877970) block 64012238993 gen 2592
  kernel: 	key 122 (40538132480 168 4096) block 64012255232 gen 2592

Obviously key 121 is corrupted: it does not fall in sequence between
keys 120 and 122.

Furthermore, the generation looks good, so it looks like a range of
memory is corrupted.
The affected range includes the key and block ptr bytenr.
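
For anyone following along, the two observations above (the unaligned block
pointer and the out-of-sequence key) can be reproduced from the dumped values.
This is only an illustrative sketch in Python, not the kernel's tree-checker
code. The 4096 alignment is taken from the error message, and the helper
names is_aligned() and keys_sorted() are made up for this example.

```python
# Values copied from the node dump quoted above: slot -> (key, block pointer).
SECTORSIZE = 4096  # from "should be aligned to 4096" in the kernel message

slots = {
    120: ((40535728128, 168, 4096), 64012189696),
    121: ((2774366982960963584, 113, 5236482350604877970), 64012238993),
    122: ((40538132480, 168, 4096), 64012255232),
}


def is_aligned(bytenr, alignment=SECTORSIZE):
    """Hypothetical helper: True if bytenr is a multiple of alignment."""
    return bytenr % alignment == 0


def keys_sorted(keys):
    """Hypothetical helper: keys compare as (objectid, type, offset) tuples
    and must be strictly increasing within a node."""
    return all(a < b for a, b in zip(keys, keys[1:]))


for slot, (key, bytenr) in sorted(slots.items()):
    if is_aligned(bytenr):
        print(slot, "aligned")
    else:
        print(slot, "unaligned, remainder", bytenr % SECTORSIZE)

print("keys in order:", keys_sorted([slots[s][0] for s in sorted(slots)]))
print("in order without slot 121:",
      keys_sorted([slots[120][0], slots[122][0]]))
```

Only slot 121 fails both checks, while its neighbours pass, which is
consistent with the damage being confined to that one slot.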

And for the other write time failure, it may be a bit flip (0x01 ->
0x21), at least there is no other obvious corruption unlike the node
pointer error.
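
A quick way to check the bit flip guess is sketched below in Python. The
"have 33" value is from the kernel message; that the original value was 0x01
is only the assumption stated above, not something the log itself proves.

```python
# "have 33" comes from the tree-checker message; 0x01 as the intended dir
# item type is an assumption (the hypothesis in this mail), not a logged fact.
have = 33        # 0x21, reported by the kernel
assumed = 0x01   # assumed original value

diff = have ^ assumed
flipped_bits = [bit for bit in range(8) if (diff >> bit) & 1]

print(f"xor = {diff:#04x}, flipped bit positions = {flipped_bits}")
print("single bit flip:", len(flipped_bits) == 1)
```

Under that assumption the two values differ in exactly one bit (bit 5), which
is what a single bit flip would look like.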

Considering it's a VM for the Fedora project, I guess the host has ECC
memory, so we can rule out memory corruption caused by hardware.

[9 SEP]
Is the dmesg truncated? For every EUCLEAN case from
__btrfs_free_extent() there should be an error message line (it is
standard practice to emit an error message for each EUCLEAN error).

I cannot see the expected error line, so it is hard to say.

Thanks,
Qu

> --
> Chris Murphy
>


* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
From: Chris Murphy @ 2024-09-23 22:02 UTC (permalink / raw)
  To: Qu Wenruo, Btrfs BTRFS

New comment in the bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2312886#c4

Kernel message excerpts are below. What's interesting:
-  multiple arches (also s390x, but I don't have any kernel messages for it);
-  single workload (koji composes);
-  the btrfs error is different for each, which makes me think we're not dealing with a btrfs bug.

But how can we get more information about why this is happening? Would running a KASAN-enabled kernel be useful?


6.10.5-200.fc40.aarch64

Sep 21 14:06:14  kernel: BTRFS critical (device vda3): corrupt leaf: root=256 block=553541238784 slot=9 ino=1551466593, name hash mismatch with key, have 0x00000000cb63b37b expect 0x00000000fec4d99b
Sep 21 14:06:14  kernel: BTRFS info (device vda3): leaf 553541238784 gen 623185 total ptrs 91 free space 5959 owner 256
Sep 21 14:06:14  kernel: BTRFS error (device vda3): block=553541238784 write time tree block corruption detected
Sep 21 14:06:14  kernel: BTRFS: error (device vda3) in btrfs_commit_transaction:2505: errno=-5 IO failure (Error while writing out transaction)
Sep 21 14:06:14  kernel: BTRFS info (device vda3 state E): forced readonly
Sep 21 14:06:14  kernel: BTRFS warning (device vda3 state E): Skipping commit of aborted transaction.
Sep 21 14:06:14  kernel: BTRFS error (device vda3 state EA): Transaction aborted (error -5)
Sep 21 14:06:14  kernel: BTRFS: error (device vda3 state EA) in cleanup_transaction:1999: errno=-5 IO failure
Sep 21 14:06:14  kernel: BTRFS: error (device vda3 state EA) in btrfs_sync_log:3174: errno=-5 IO failure


6.10.5-200.fc40.x86_64

Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum hole found for disk bytenr range [4880289792, 4880293888)
Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum hole found for disk bytenr range [4880293888, 4880297984)
Sep 23 05:17:20 kernel: BTRFS warning (device vda3): csum failed root 256 ino 281 off 66670592 csum 0xb85d0050 expected csum 0x00000000 mirror 1
Sep 23 05:17:20 kernel: BTRFS error (device vda3): bdev /dev/vda3 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
...
Sep 23 09:38:53 kernel: BTRFS info (device loop0p4): last unmount of filesystem 2cf58b61-2878-4bc4-b0b0-64e75d467fdf
Sep 23 09:39:30 kernel: BTRFS info (device loop2p4): last unmount of filesystem 57ad289b-2ac5-48ea-86bf-841e20af8720
Sep 23 10:24:52 kernel: BTRFS critical (device vda3): corrupt leaf: root=7 block=1374024122368 slot=150, unexpected item end, have 4706614063 expect 14527
Sep 23 10:24:52 kernel: BTRFS info (device vda3): leaf 1374024122368 gen 699790 total ptrs 346 free space 2909 owner 7

--
Chris Murphy


* Re: BTRFS critical, corrupt node, unaligned pointer, should be aligned to 4096
From: Josef Bacik @ 2024-09-24 14:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, Btrfs BTRFS

On Mon, Sep 23, 2024 at 06:02:31PM -0400, Chris Murphy wrote:
> New comment in the bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2312886#c4
> 
> Kernel message excerpts are below. What's interesting:
> -  multiple arches (also s390x, but I don't have any kernel messages for it);
> -  single workload (koji composes);
> -  the btrfs error is different for each, which makes me think we're not dealing with a btrfs bug.
> 
> But how can we get more information about why this is happening? Is running a KASAN enabled kernel useful?

Yeah, try KASAN.  I'm in the middle of a different investigation; once I wrap
that up I'll dig into this some more with Qu.  Thanks,

Josef
