From: Jean-Louis Dupond <jean-louis@dupond.be>
To: linux-ext4@vger.kernel.org
Subject: Re: ext4 metadata corruption - snapshot related?
Date: Wed, 2 Jul 2025 15:43:25 +0200
Message-ID: <7b9c7a42-de7b-4408-91a6-1c35e14cc380@dupond.be>
In-Reply-To: <e90d9c7f-adf8-453d-a3c2-f1d28ee9d9b3@dupond.be>

We updated a machine to a newer kernel (6.15.2-1.el8.elrepo.x86_64), and
what looks like the same bug reoccurred after some time.
The error was the following:
Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1):
ext4_lookup:1791: inode #44962812: comm imap: deleted inode referenced:
44997932
Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1):
ext4_lookup:1791: inode #44962812: comm imap: deleted inode referenced:
44997932
Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1):
ext4_lookup:1791: inode #44962812: comm imap: deleted inode referenced:
44997932
Jul 02 11:04:03 xxxxx kernel: EXT4-fs error (device sdd1):
ext4_lookup:1791: inode #44962812: comm imap: deleted inode referenced:
44997932
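
For triage, repeated messages like the ones above can be split into fields and counted per device, function and inode. The following is only a minimal sketch: the names `parse_ext4_error` and `summarize` are hypothetical helpers, and it assumes each wrapped log entry has first been re-joined into a single line in the `EXT4-fs error (device ...): func:line: inode #N: comm ...: message` form seen here. Entries without an `inode #` field (for example the block bitmap checksum errors) are not matched by this pattern.

```python
import re
from collections import Counter

# Pattern for entries of the form:
#   EXT4-fs error (device X): func:line: inode #N: comm C: message
# Assumption: wrapped log entries have been re-joined into one line each.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): "
    r"(?P<message>.*)"
)


def parse_ext4_error(line):
    """Return a dict of fields for a matching EXT4-fs error line, else None."""
    match = EXT4_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["inode"] = int(fields["inode"])
    return fields


def summarize(lines):
    """Count matching errors per (device, function, inode) to collapse repeats."""
    counts = Counter()
    for line in lines:
        fields = parse_ext4_error(line)
        if fields:
            counts[(fields["device"], fields["func"], fields["inode"])] += 1
    return counts


# One of the entries from this mail, re-joined into a single line.
sample = (
    "Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1): "
    "ext4_lookup:1791: inode #44962812: comm imap: deleted inode referenced: "
    "44997932"
)

print(parse_ext4_error(sample))
print(summarize([sample] * 4))
```

Grouping by directory inode makes it easier to see whether the errors cluster on a few directories or are spread across the filesystem.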
Any ideas on how this could be debugged further?
Thanks
Jean-Louis
On 12/06/2025 16:43, Jean-Louis Dupond wrote:
> Hi,
>
> We have around 200 VMs running on qemu (on an AlmaLinux 9 based
> hypervisor).
> All those VMs were migrated from physical machines recently.
>
> But when we enable backups on those VMs (which triggers snapshots),
> we notice some weird/random ext4 corruption within the VM itself.
> The VM itself runs CloudLinux 8 (4.18.0-553.40.1.lve.el8.x86_64 kernel).
>
> These are some examples of the corruption we see:
> 1)
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1036:
> inode #19280823: comm lsphp: Directory block failed checksum
> kernel: EXT4-fs error (device sdc1): ext4_empty_dir:2801: inode
> #19280823: comm lsphp: Directory block failed checksum
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1036:
> inode #19280820: comm lsphp: Directory block failed checksum
> kernel: EXT4-fs error (device sdc1): ext4_empty_dir:2801: inode
> #19280820: comm lsphp: Directory block failed checksum
>
> 2)
> kernel: EXT4-fs error (device sdc1): ext4_lookup:1645: inode
> #49419787: comm lsphp: deleted inode referenced: 49422454
> kernel: EXT4-fs error (device sdc1): ext4_lookup:1645: inode
> #49419787: comm lsphp: deleted inode referenced: 49422454
> kernel: EXT4-fs error (device sdc1): ext4_lookup:1645: inode
> #49419787: comm lsphp: deleted inode referenced: 49422454
>
> 3)
> kernel: EXT4-fs error (device sdb1): ext4_validate_block_bitmap:384:
> comm kworker/u240:3: bg 308: bad block bitmap checksum
> kernel: EXT4-fs (sdb1): Delayed block allocation failed for inode
> 2513946 at logical offset 2 with max blocks 1 with error 74
> kernel: EXT4-fs (sdb1): This should not happen!! Data will be lost
> kernel: EXT4-fs (sdb1): Inode 2513946 (00000000265d63ca):
> i_reserved_data_blocks (1) not cleared!
> kernel: EXT4-fs (sdb1): error count since last fsck: 1
> kernel: EXT4-fs (sdb1): initial error at time 1747923211:
> ext4_validate_block_bitmap:384
> kernel: EXT4-fs (sdb1): last error at time 1747923211:
> ext4_validate_block_bitmap:384
> kernel: EXT4-fs (sdb1): error count since last fsck: 1
> kernel: EXT4-fs (sdb1): initial error at time 1747923211:
> ext4_validate_block_bitmap:384
> kernel: EXT4-fs (sdb1): last error at time 1747923211:
> ext4_validate_block_bitmap:384
>
> 4)
> kernel: EXT4-fs (sdc1): error count since last fsck: 4
> kernel: EXT4-fs (sdc1): initial error at time 1746616017:
> ext4_validate_block_bitmap:384
> kernel: EXT4-fs (sdc1): last error at time 1746621676:
> ext4_mb_generate_buddy:808
>
>
> Now, as a test, we upgraded to a newer (backported) kernel, more
> specifically: 5.14.0-284.1101
> After doing some backups again, we had another error:
>
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1073:
> inode #34752060: comm tar: Directory block failed checksum
> kernel: EXT4-fs warning (device sdc1): ext4_dirblock_csum_verify:405:
> inode #34752232: comm tar: No space for directory leaf checksum.
> Please run e2fsck -D.
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1073:
> inode #34752232: comm tar: Directory block failed checksum
> kernel: EXT4-fs warning (device sdc1): ext4_dirblock_csum_verify:405:
> inode #34752064: comm tar: No space for directory leaf checksum.
> Please run e2fsck -D.
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1073:
> inode #34752064: comm tar: Directory block failed checksum
> kernel: EXT4-fs warning (device sdc1): ext4_dirblock_csum_verify:405:
> inode #34752167: comm tar: No space for directory leaf checksum.
> Please run e2fsck -D.
> kernel: EXT4-fs error (device sdc1): htree_dirblock_to_tree:1073:
> inode #34752167: comm tar: Directory block failed checksum
>
>
> So now we are wondering what could cause this corruption here.
> - We have more VMs on the same kind of setup, without seeing any
> corruption. The only differences there are that those VMs run
> Debian, have smaller disks, and do not use quota.
> - If we disable backups/snapshots, no corruption is observed
> - Even if we disable the qemu-guest-agent (so no fsfreeze is
> executed), the corruption still occurs
>
> We (for now at least) only see the corruption on filesystems where
> quota is enabled (both usrjquota and usrquota).
> The filesystems are between 600GB and 2TB.
> And today I noticed (as the filesystems are resized during setup), the
> journal size is only 64M (could this potentially be an issue?).
>
> The big question in the whole story here is: could it be an in-guest
> (ext4?) bug/issue, or do we really need to look into the layer below
> (i.e. qemu/hypervisor)?
> If somebody has other ideas, feel free to share, along with any
> additional things that could help to troubleshoot the issue.
>
> Thanks
> Jean-Louis
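
Two values in the quoted logs are easier to correlate with backup runs once decoded: the `initial error at time` / `last error at time` fields, which look like Unix epoch seconds, and the `error 74` from the delayed allocation failure. The sketch below is an illustration only. It assumes the timestamps are seconds since the epoch in UTC, and that it runs on Linux, where errno 74 is `EBADMSG`. In the kernel sources ext4 defines `EFSBADCRC` as `EBADMSG`, which would be consistent with the bad block bitmap checksum reported just before it. The helper name `decode_error_time` is hypothetical.

```python
import errno
import os
from datetime import datetime, timezone


def decode_error_time(epoch_seconds):
    """Convert an 'error at time N' value to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the quoted kernel messages.
error_times = {
    "sdb1 initial/last error": 1747923211,
    "sdc1 initial error": 1746616017,
    "sdc1 last error": 1746621676,
}

for label, seconds in error_times.items():
    print(f"{label}: {decode_error_time(seconds).isoformat()}")

# Time between the first and the last recorded error on sdc1.
gap = decode_error_time(1746621676) - decode_error_time(1746616017)
print(f"sdc1 first-to-last error gap: {gap}")

# Symbolic name and description of errno 74 on the platform running this.
# The numeric-to-name mapping is platform dependent; Linux is assumed here.
print(errno.errorcode.get(74), "-", os.strerror(74))
```

Comparing the decoded times against the snapshot schedule on the hypervisor may help confirm or rule out a correlation with the backups.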