ext4: failed to convert unwritten extents (6.12.31 regression)

public inbox for linux-ext4@vger.kernel.org

* ext4: failed to convert unwritten extents (6.12.31 regression)
@ 2025-09-22 11:11 Andrea Biardi
  2025-09-22 12:41 ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Biardi @ 2025-09-22 11:11 UTC
  To: linux-ext4

Hi All,

The CI process of a product that I'm working on involves the creation of a temporary KVM VM which boots a cdrom image containing a custom kernel + busybox in order to flash a filesystem image to /dev/vda, then shuts it down and exports the VM (that's my "deliverable" for the next stage).

For this custom kernel, I have used 6.6.x for a long time; after upgrading to 6.12, I started observing filesystem corruption in the deliverable image and these messages in dmesg (these are produced by the imaging kernel during flashing):

[   10.188754] EXT4-fs (vda2): mounted filesystem 42e94213-17de-4a91-9c58-c39852446bf2 r/w with ordered data mode. Quota mode: none.
[   11.612142] EXT4-fs (vda1): mounted filesystem e32da11b-d5d4-4621-a7d4-8b9bc5034c83 r/w with ordered data mode. Quota mode: none.
[  174.903010] I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903023] I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903027] I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903031] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
[  174.903106] I/O error, dev vda, sector 169986 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903172] I/O error, dev vda, sector 172018 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903176] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 86009)
[  174.903239] I/O error, dev vda, sector 172034 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903297] I/O error, dev vda, sector 174066 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903300] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 87033)
[  174.903371] I/O error, dev vda, sector 174082 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903401] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  174.906697] Buffer I/O error on device vda1, logical block 84993
[  174.906708] Buffer I/O error on device vda1, logical block 84994
[  174.906710] Buffer I/O error on device vda1, logical block 84995
[  174.906712] Buffer I/O error on device vda1, logical block 84996
[  174.906716] Buffer I/O error on device vda1, logical block 84997
[  174.906718] Buffer I/O error on device vda1, logical block 84998
[  174.906719] Buffer I/O error on device vda1, logical block 84999
[  174.906721] Buffer I/O error on device vda1, logical block 85000
[  174.906723] Buffer I/O error on device vda1, logical block 85001
[  174.906724] Buffer I/O error on device vda1, logical block 85002
[  174.928451] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 83961)
[  174.928787] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  175.019677] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 88169)
[  175.019752] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  183.121276] EXT4-fs (vda1): unmounting filesystem e32da11b-d5d4-4621-a7d4-8b9bc5034c83.
[  183.711275] EXT4-fs (vda2): unmounting filesystem 42e94213-17de-4a91-9c58-c39852446bf2.
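For triage, the failing ranges in the first burst of errors can be pulled out of the log mechanically. The following is a minimal Python sketch that uses only lines copied from the dmesg output above (timestamps stripped); the helper names are hypothetical. It shows that the failing sectors alternate in steps of 16 and 2032 sectors, and that each "starting block" reported by ext4 is exactly half of a failing sector number, which would be consistent with 1 KiB filesystem blocks, although the block size is not stated in this report.

```python
import re

# First burst of errors, copied from the dmesg output above (timestamps stripped).
LOG = """\
I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
I/O error, dev vda, sector 169986 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
I/O error, dev vda, sector 172018 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 86009)
I/O error, dev vda, sector 172034 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
I/O error, dev vda, sector 174066 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 87033)
I/O error, dev vda, sector 174082 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
"""


def failed_sectors(log):
    """Sector numbers from the block layer's 'I/O error, dev ..., sector N' lines."""
    return [int(s) for s in re.findall(r"I/O error, dev \w+, sector (\d+)", log)]


def failed_blocks(log):
    """Block numbers from the ext4_end_bio 'starting block N' warnings."""
    return [int(b) for b in re.findall(r"starting block (\d+)", log)]


sectors = failed_sectors(LOG)
blocks = failed_blocks(LOG)
# Distance between consecutive failing sectors.
gaps = [b - a for a, b in zip(sectors, sectors[1:])]

print(sectors[0], sectors[-1])   # 167922 174082
print(gaps)                      # [16, 2032, 16, 2032, 16, 2032, 16]
print([2 * b for b in blocks])   # [169970, 172018, 174066]
```

Each 16 + 2032 pair adds up to 2048 sectors, i.e. 1 MiB at 512 bytes per sector.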

The relevant sequence of events inside the imaging VM is:
1) sfdisk /dev/vda (creates: vda1 for /boot, vda2 for the root filesystem)
2) mke2fs -t ext4 on both
3) mount at /mnt and /mnt/boot and rsync the source image (~100k files)
4) chroot to make a couple of modifications, install grub and rebuild the initrd
5) shutdown

The error I'm now seeing always occurs while rebuilding the initrd. I'm not sure why, since the rsync generates far more I/O over the 3 preceding minutes. As the sole purpose of this VM is to flash a filesystem image, nothing else is happening in the background.

I've done a rough bisection based on kernel releases and this problem occurs on 6.12.31 (212 out of 365 runs) and later, including 6.16.7 (6.12.30 is fine, just as 6.6.106 was).

Looking at the changelog for 6.12.31, commit 785ac699113320e3c3968754ca0c78d40a013107 "ext4: do not convert the unwritten extents if data writeback fails" stands out.

The configuration of the custom kernel used in the VM is fairly generic -- mostly a default x86_64 config with stuff that I don't need turned off: IPv6, sound, wireless, a few other bits.

I can rule out issues with the underlying hardware (tried on 3 different KVM hosts and nothing in host's dmesg either).

Also, I have a similar procedure (same custom kernel, same imaging scripts) that runs against ESXi and Hyper-V hypervisors (to create ESXi or Hyper-V VM images, respectively) and neither exhibits this problem (the notable difference, I suppose, is the block device being sda, i.e. not virtio).

For reasons that I don't understand, the regression occurs only if the imaging involves 2 distinct partitions / filesystems (boot and root). If I make a single partition/filesystem and mount that at /mnt, the error doesn't trigger. This may be a coincidence, however it's hard to ignore the fact that the file corruption always happens on the mounted /boot (that's where dracut writes the initrd), and in the single-partition case there's a single ext4 filesystem (disclaimer: haven't done hundreds of runs for this case).

Any ideas?

Thanks
Andrea.

* Re: ext4: failed to convert unwritten extents (6.12.31 regression)
  2025-09-22 11:11 ext4: failed to convert unwritten extents (6.12.31 regression) Andrea Biardi
@ 2025-09-22 12:41 ` Theodore Ts'o
  2025-09-22 14:00   ` Andrea Biardi
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2025-09-22 12:41 UTC
  To: Andrea Biardi; +Cc: linux-ext4

On Mon, Sep 22, 2025 at 11:11:15AM +0000, Andrea Biardi wrote:
> 
> The CI process of a product that I'm working on involves the creation of a temporary KVM VM which boots a cdrom image containing a custom kernel + busybox in order to flash a filesystem image to /dev/vda, then shuts it down and exports the VM (that's my "deliverable" for the next stage).
>
> [  174.903010] I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
> [  174.903023] I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
> [  174.903027] I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
> [  174.903031] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)

The failure is coming from the block device, which in your case, is
the virtio device.  The only causes for this are:

1)  An underlying hardware failure
2)  A bug in the block virtio device
3)  A bug in the VMM (I assume qemu in your case).

The bug might be triggered by a change in the behavior of ext4, but
ultimately, there is nothing that a file system can do that could
result in an I/O error other than (1), (2), or (3), above.

The only thing I can suggest is to do a full bisection between 6.12.30
and 6.12.31.  Or take a look at commits that were landed between
6.12.30 and 6.12.31, focusing on changes in /drivers/block,
/drivers/virtio, and /block.  I doubt that it's /block, given that no
one else is reporting it.

One other thing you might try is changing your qemu
configuration to use virtio-scsi or NVMe emulation.  Most commercial
cloud products (e.g., Amazon, Azure, Google Cloud) tend to use
emulated SCSI and NVMe, instead of virtio-blk.  It's true that
virtio-blk is more efficient, but the virtual SCSI and NVMe devices
are more similar to Real Hardware(tm), which is why commercial cloud
products tend to use them; they tend to be easier for companies doing
"lift and shift".  As a result, it's likely that issues with
virtio-blk might not be noticed, given that it gets less
testing.

I do regular regression testing of ext4 using Google Cloud[1], and it
uses either SCSI or NVMe devices (depending on whether the VM type
supports SCSI or NVMe --- the more expensive, higher performance VM's
tend to use NVMe because it allows better performance for the
high-performance block devices).  I *can* run kvm-xfstests using
virtio-blk, but when gce-xfstests takes 2-3 hours of wall clock time
(running on a dozen VM's running in parallel), or 24 hours if I were
to run the identical tests using kvm-xfstests, there's a reason why I
rarely use kvm-xfstests/qemu-xfstests.  If I'm someplace without
network access, and all I have is qemu using MacOS's Hypervisor
Framework (hvf) on my Macbook Air, sure, I'll use qemu-xfstests.  But
it's not something I'll do unless I don't have any other alternatives.

[1] https://thunk.org/gce-xfstests

Cheers,

						- Ted

* Re: ext4: failed to convert unwritten extents (6.12.31 regression)
  2025-09-22 12:41 ` Theodore Ts'o
@ 2025-09-22 14:00   ` Andrea Biardi
  2025-09-22 20:51     ` Andreas Dilger
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Biardi @ 2025-09-22 14:00 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 22-Sep-2025,  "Theodore Ts'o" <tytso@mit.edu> wrote:

> > [  174.903010] I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
> > [  174.903023] I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
> > [  174.903027] I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
> > [  174.903031] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
>
> The failure is coming from the block device, which in your case, is
> the virtio device.  The only causes for this are:
> 1)  An underlying hardware failure
> 2)  A bug in the block virtio device
> 3)  A bug in the VMM (I assume qemu in your case).

Thank you for the quick response!

You do have a point there, the first reported problem is effectively a write failure on vda.
I tried with virtio-scsi, and can't reproduce the bug. I will try with a newer version of qemu first, and then look into virtio-blk.

Cheers,
Andrea.

* Re: ext4: failed to convert unwritten extents (6.12.31 regression)
  2025-09-22 14:00   ` Andrea Biardi
@ 2025-09-22 20:51     ` Andreas Dilger
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Dilger @ 2025-09-22 20:51 UTC
  To: Andrea Biardi; +Cc: Theodore Ts'o, linux-ext4

On Sep 22, 2025, at 8:00 AM, Andrea Biardi <Andrea.Biardi@viavisolutions.com> wrote:
> 
> On 22-Sep-2025,  "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
>>> [  174.903010] I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
>>> [  174.903023] I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
>>> [  174.903027] I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
>>> [  174.903031] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
>> 
>> The failure is coming from the block device, which in your case, is
>> the virtio device.  The only causes for this are:
>> 1)  An underlying hardware failure
>> 2)  A bug in the block virtio device
>> 3)  A bug in the VMM (I assume qemu in your case).
> 
> Thank you for the quick response!
> 
> You do have a point there, the first reported problem is effectively a write failure on vda.
> I tried with virtio-scsi, and can't reproduce the bug. I will try with a newer version of qemu first, and then look into virtio-blk.

It _could_ still be a software issue, if these vda errors are caused by writes
beyond the end of the block device.  These are showing errors at sector 167922+
so if the vda1=/boot filesystem is just below 82MiB=83968 blocks=167936 sectors
in size then this might be the issue?
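The unit conversion above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 1 KiB block size is an assumption implied by the figures quoted, the 82 MiB size of /boot is hypothetical, and the thread does not say where vda1 starts on vda, so the comparison with the logged sector is indicative at best.

```python
SECTOR_SIZE = 512    # bytes per sector in the kernel's "I/O error ... sector N" messages
BLOCK_SIZE = 1024    # assumed ext4 block size (1 KiB); not confirmed in the thread


def size_in_units(mib):
    """Convert a size in MiB to (filesystem blocks, 512-byte sectors)."""
    size_bytes = mib * 1024 * 1024
    return size_bytes // BLOCK_SIZE, size_bytes // SECTOR_SIZE


blocks, sectors = size_in_units(82)   # hypothetical 82 MiB /boot
print(blocks, sectors)                # 83968 167936

# First failing sector from the log, compared with that hypothetical end.
first_failed_sector = 167922
print(sectors - first_failed_sector)  # 14
```

Under these assumptions the first failing write starts 14 sectors before the 82 MiB mark, so a request longer than 14 sectors starting there would cross it.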

If adding/removing this specific patch shows/hides this problem then it could
also be a bug in how the ext4 extents or uninitialized extent zeroing at the
end of the device is handled, assuming that the /boot device is indeed 82MiB.

Have you tried doing a linear read/write of /dev/vda in 1KiB units to confirm
that all of the sectors can be read and written?
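A linear read pass of this kind could look like the Python sketch below. It is read-only on purpose (a write pass would destroy data on a device that is in use), the function name is hypothetical, and the reads go through the page cache unless the device is opened differently, so it is a rough check and not a substitute for a proper surface test.

```python
import os
import sys


def scan_readable(path, unit=1024):
    """Read `path` front to back in `unit`-byte steps.

    Returns (size_in_bytes, offsets_that_failed_or_came_back_short).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        bad = []
        for offset in range(0, size, unit):
            want = min(unit, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                bad.append(offset)      # the kernel reported an I/O error
                continue
            if len(data) != want:
                bad.append(offset)      # short read before the expected end
        return size, bad
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 scan.py /dev/vda   (the path is supplied by the caller)
    total, failures = scan_readable(sys.argv[1])
    print(f"{total} bytes scanned, {len(failures)} unreadable units")
```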

It might also be a mismatch between /dev/vda size vs. how many blocks the
filesystem is formatted to use?
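One way to check for such a mismatch is to read the block count from the ext4 superblock and compare it with the size of the partition. The Python sketch below assumes the standard on-disk superblock layout (superblock at byte 1024, little-endian fields); the function names are hypothetical and the path would be the partition in question, e.g. /dev/vda1. Tools such as dumpe2fs report the same fields.

```python
import struct

SUPERBLOCK_OFFSET = 1024     # the primary superblock starts 1024 bytes into the partition
EXT4_MAGIC = 0xEF53
INCOMPAT_64BIT = 0x80        # feature flag: block count has a high 32-bit half


def fs_size_bytes(sb):
    """Size the filesystem claims to occupy, from a raw superblock buffer."""
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (blocks_lo,) = struct.unpack_from("<I", sb, 0x04)
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    (feature_incompat,) = struct.unpack_from("<I", sb, 0x60)
    blocks_hi = 0
    if feature_incompat & INCOMPAT_64BIT:
        (blocks_hi,) = struct.unpack_from("<I", sb, 0x150)
    block_count = (blocks_hi << 32) | blocks_lo
    return block_count << (10 + log_block_size)   # block size is 2^(10 + log)


def compare_fs_and_device(path):
    """Return (filesystem_bytes, device_bytes) for a partition or image file."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        sb = dev.read(1024)
        device_bytes = dev.seek(0, 2)   # seek to end; returns the size in bytes
    return fs_size_bytes(sb), device_bytes
```

If the first value is larger than the second, the filesystem believes it owns blocks past the end of the partition.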

Cheers, Andreas
