Hello!

We are experiencing issues with the ext4 file system in automated tests.
Here is the required information:

[2.] Full description of the problem/report:

Sometimes, accessing a file on an ext4 file system fails with an error
message in the kernel log. So far, we have observed three different
kinds of messages:

- EXT4-fs error (device mmcblk1p2): ext4_lookup:1785: inode #10287: comm ostree: iget: checksum invalid
- EXT4-fs error (device mmcblk0p3): __ext4_find_entry:1623: inode #258562: comm gst-launch-1.0: checksumming directory block 0
- EXT4-fs error (device mmcblk0p3): ext4_validate_block_bitmap:390: comm fstrim: bg 16: bad block bitmap checksum

The first issue was apparently fixed by patching our kernel with this
patchset:
https://lore.kernel.org/all/20210901020955.1657340-1-yi.zhang@huawei.com/

The second issue seems to happen with all kinds of programs. In this
instance, it was gstreamer opening a file. It can also happen when mkdir
creates a directory.

The third issue seems to happen only with fstrim.

The issue appears to be random: it cannot be reproduced easily, and we
have no procedure to reproduce it. Each time a test suite is run, the
image is freshly written to the device. The same test run multiple times
will sometimes fail and sometimes pass.

[3.] Keywords (i.e., modules, networking, kernel):

ext4, checksum

[4.] Kernel information

We use a modified version of the Debian kernel. The source code is here:
https://gitlab.apertis.org/pkg/linux

None of our patches modify the ext4 filesystem code.

[4.1.] Kernel version (from /proc/version):

It is hard to determine when the issue started appearing. An educated
guess would be when we upgraded from 5.15.1 to 5.15.22.

One version where this failed is:

Linux version 5.15.0-trunk-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Apertis 10.2.1-6+apertis6bv2023dev1b2) 10.2.1 20210110, GNU ld (GNU Binutils for Apertis) 2.35.2) #1 SMP Debian 5.15.22-0~apertis2 (2022-02-16)

[4.2.] Kernel .config file:

See the attached config.txt file.

[5.] Most recent kernel version which did not have the bug:

My guess is 5.15.1, but I cannot be sure of this.

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/admin-guide/bug-hunting.rst)

N/A

[7.] A small shell script or example program which triggers the
     problem (if possible)

N/A

However, you can check the full output of a failing run here, for
example (the link points to the line of the error):
https://lava.collabora.co.uk/scheduler/job/5756873#L12901

[8.] Environment

We use two deployment types for our images: APT (classic Debian apt) and
OSTree. The issue seems to happen only with OSTree images.

The issue has also happened on multiple different boards with multiple
architectures (amd64, armhf and arm64), so failing hardware is unlikely
to be at fault here.

[8.1.] Software (add the output of the ver_linux script here)

[8.2.] Processor information (from /proc/cpuinfo):

Not related

[8.3.] Module information (from /proc/modules):

See the attached modules.txt file.

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

Happens on different hardware.

[8.5.] PCI information ('lspci -vvv' as root)

Not related

[8.6.] SCSI information (from /proc/scsi/scsi)

Not related

[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

See the output of mount (amd64) in mount.txt.

The issues can happen on the rootfs or the home partition.

[X.] Other notes, patches, fixes, workarounds:

Because I am not familiar with the internals of the ext4 file system,
and the issue is random and hard to reproduce, I am mainly asking for
pointers or for patches under review that we could try.

I can provide more information as needed.

Regards,
Detlev.