* [Bug 220594] New: Online defragmentation has broken in 6.16
@ 2025-09-22 17:57 bugzilla-daemon
2025-09-22 18:04 ` [Bug 220594] " bugzilla-daemon
` (12 more replies)
0 siblings, 13 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-22 17:57 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=220594
Bug ID: 220594
Summary: Online defragmentation has broken in 6.16
Product: File System
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: low
Priority: P3
Component: ext4
Assignee: fs_ext4@kernel-bugs.osdl.org
Reporter: aros@gmx.com
Regression: No
I cannot defragment multiple files on ext4 in 6.16 with this error:
Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success [ NG ]
This wasn't the case with 6.15.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 18:04 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-22 18:04 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---

We are talking about over a hundred files as small as 100KB, despite ample
free space (yes, I have enough space to allocate the files properly) - over
16GB free.
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-25 12:14 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-25 12:14 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #2 from Artem S. Tashkinov (aros@gmx.com) ---

I see no patches in 6.16.9 or a discussion on LKML. The bug is still valid.
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-25 14:24 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-25 14:24 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |UNREPRODUCIBLE

--- Comment #3 from Theodore Tso (tytso@mit.edu) ---

I can't reproduce the bug, so it's not valid for me....

# uname -a
Linux kvm-xfstests 6.17.0-rc4-xfstests #245 SMP PREEMPT_DYNAMIC Thu Sep 25 10:02:23 EDT 2025 x86_64 GNU/Linux
root@kvm-xfstests:/vdc# seq 1 1024 | xargs -n 1 cp /etc/motd
root@kvm-xfstests:/vdc# seq 1 2 1024 | xargs rm
root@kvm-xfstests:/vdc# cp /bin/bash .
root@kvm-xfstests:/vdc# cp /bin/netstat .
root@kvm-xfstests:/vdc# cp /bin/uniq .
root@kvm-xfstests:/vdc# seq 2 2 1024 | xargs rm
root@kvm-xfstests:/vdc# filefrag *
bash: 317 extents found
lost+found: 1 extent found
netstat: 39 extents found
uniq: 1 extent found
root@kvm-xfstests:/vdc# e4defrag *
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for bash
[1/1]bash:	100%	[ OK ]
 Success:	[1/1]
ext4 defragmentation for directory(lost+found)
	Can not process "lost+found"
"lost+found"
ext4 defragmentation for netstat
[1/1]netstat:	100%	[ OK ]
 Success:	[1/1]
ext4 defragmentation for uniq
[1/1]uniq:	100%	[ OK ]
 Success:	[1/1]
root@kvm-xfstests:/vdc#
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-25 18:44 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-25 18:44 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|UNREPRODUCIBLE              |---

--- Comment #4 from Artem S. Tashkinov (aros@gmx.com) ---

I can upload a couple of ext4 images for you to repro the bug.
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-25 20:44 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-25 20:44 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #5 from Artem S. Tashkinov (aros@gmx.com) ---

I've emailed you privately/offlist.
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-26 21:29 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-26 21:29 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #6 from Theodore Tso (tytso@mit.edu) ---

The reproducer file system which Artem sent me doesn't fail for me on
either 6.16 or 6.17-rc4. So it may be something specific to Artem's kernel
or hardware configuration.
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-27 13:19 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-27 13:19 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #7 from Artem S. Tashkinov (aros@gmx.com) ---

Created attachment 308721
  --> https://bugzilla.kernel.org/attachment.cgi?id=308721&action=edit
My 6.16 configuration

I remember from the log that you sent that there were quite a lot of
failures, over 9000 of them. Are you sure your script handles this?

e4defrag -v uBlock0\@raymondhill.net.xpi; echo $?
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for uBlock0@raymondhill.net.xpi
[1/1]uBlock0@raymondhill.net.xpi:	0%
	Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success	[ NG ]
 Success:	[0/1]
1

So, it's "successful" except it's not.

> is specific to Artem's kernel or hardware configuration.

I thought the ext4 code was quite independent of the rest of the kernel,
and that no other kernel options could possibly affect it.

gzip -d < /proc/config.gz | grep -i ext4
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-28  2:17 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-09-28 2:17 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #8 from Theodore Tso (tytso@mit.edu) ---

e4defrag reports directories as failures. We should probably change it to
just skip directories so it doesn't confuse people:

root@kvm-xfstests:~# mke2fs -Fq -t ext4 /dev/vdc
/dev/vdc contains a ext4 file system
	last mounted on Sat Sep 27 22:14:06 2025
root@kvm-xfstests:~# mount /dev/vdc /vdc
[  118.187747] EXT4-fs (vdc): mounted filesystem c6ef56ee-8512-4408-8a58-2c83e3a36f91 r/w with ordered data mode. Quota mode: none.
root@kvm-xfstests:~# e4defrag /vdc
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for directory(/vdc)
 Success:	[ 0/2 ]
 Failure:	[ 2/2 ]
root@kvm-xfstests:~# find /vdc -type d
/vdc
/vdc/lost+found
root@kvm-xfstests:~#

I am not seeing the "Failed to defrag with EXT4_IOC_MOVE_EXT ioctl" message
which you cited. So I don't believe I'm able to reproduce whatever you're
seeing.
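Until e4defrag itself learns to skip directories, the confusing per-directory "failures" can be avoided from the caller's side by handing it regular files only. A minimal sketch (the wrapper name and the usage line are my own, not part of e2fsprogs):

```shell
#!/bin/sh
# List only the regular files under a directory, staying on a single
# filesystem (-xdev), so e4defrag is never asked to process a directory
# and therefore never reports one as a "failure".
regular_files_on_fs() {
    find "$1" -xdev -type f
}

# Intended usage (not run here; needs an ext4 mount and e4defrag):
#   regular_files_on_fs /vdc | while IFS= read -r f; do e4defrag "$f"; done
```

Reading paths line by line rather than via `xargs` also keeps file names with spaces (like the "Sync Data" path later in this thread) intact.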
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-20  6:48 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-11-20 6:48 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #9 from Artem S. Tashkinov (aros@gmx.com) ---

Sadly it's still broken in 6.17:

ext4 defragmentation for ./birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log
	Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success	[ NG ]

ls -la '/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log'
-rw-------. 1 birdie birdie 139150 Nov 19 10:22 '/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log'

A small file that is not getting defragmented, and I have literally five
dozen of them. And I have plenty of free space.
Filesystem volume name:
Last mounted on:          /
Filesystem UUID:
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1572864
Block count:              6287360
Reserved block count:     314367
Overhead clusters:        109963
Free blocks:              4126491
Free inodes:              1428171
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:
Last mount time:
Last write time:
Mount count:
Maximum mount count:      -1
Last checked:
Check interval:           0 (<none>)
Lifetime writes:
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:
Journal backup:           inode blocks
* Re: [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-24  5:15 ` Theodore Tso
From: Theodore Tso @ 2025-11-24 5:15 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-ext4

So it's not that all files can't be defragged; just *some* files.  Is
that correct?

And when I ask whether or not it's reproducible: can you take a
snapshot of your file system, then mount the snapshot, and does the
exact same file that failed before also fail on the snapshot?

And for the files that were failing, if you unmount the file system
and remount it, can you then defrag the file in question?  If the
answer is yes, this is why bug reports of the form "Online
defragmentation in 6.16 is broken" are not particularly useful.  And
it's why I've not spent a lot of time on this bug.  We have
defragmentation tests in fstests, and they are passing, and I've tried
running defrag on the snapshot that you sent me, And It Works For Me.
So a broad "it's broken" without any further data, when it most
manifestly is not broken in my tests, means that if you really want it
to be fixed, you're going to have to do more of the debugging.

But now that we know that it's an EBUSY error, it sounds like it's
some kind of transient thing, and that's why I'm not seeing it when I
tried running it on your snapshot.  For example, one of the places
where you can get EBUSY in the MOVE_EXT ioctl is here:

	if (!filemap_release_folio(folio[0], 0) ||
	    !filemap_release_folio(folio[1], 0)) {
		*err = -EBUSY;
		goto drop_data_sem;
	}

...
and this ultimately calls ext4_release_folio:

static bool ext4_release_folio(struct folio *folio, gfp_t wait)
{
	struct inode *inode = folio->mapping->host;
	journal_t *journal = EXT4_JOURNAL(inode);

	trace_ext4_release_folio(inode, folio);

	/* Page has dirty journalled data -> cannot release */
	if (folio_test_checked(folio))
		return false;
	if (journal)
		return jbd2_journal_try_to_free_buffers(journal, folio);
	else
		return try_to_free_buffers(folio);
}

What this means is that if the file has pages which need to be written
out to the final location on disk --- e.g., you are in data=journal
mode and the modified file has been written, or scheduled to be
written, to the journal but not *yet* to the final location on disk;
or delayed allocation is enabled and the file was recently written, so
blocks have been allocated but not written back yet --- then the
MOVE_EXT ioctl will return EBUSY.  This is not new behaviour; we've
always had this.

Now, 6.16 is when large folio support landed for ext4, and this can
result in some really wonderful performance improvements.  This may
have resulted in a change in how often recently written files end up
getting EBUSY when you try to defrag them --- but quite frankly, if
this is a very tiny fraction of the files in your file system, and a
subsequent defrag run will take care of them, I'd consider that a fair
tradeoff.

So... for the files that failed the MOVE_EXT call, can you take a look
at the timestamps and see if they are relatively recently written
files?

Also, for future reference, if you had disclosed that this was only
happening on a tiny percentage of all of the files in your file
system, and if you had checked whether the specific small number of
files (by percentage) that were failing could be defragged later, and
checked the timestamps, that would have been really useful data which
would have allowed you (and me) to waste a lot less time.
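If the EBUSY really is this kind of transient writeback state, it should clear once the dirty folios reach disk. A retry wrapper along these lines would test that theory (a sketch of mine, not an e2fsprogs facility; the retry count and sleep are arbitrary, and the second parameter only exists so the loop itself can be exercised without an ext4 mount):

```shell
#!/bin/sh
# Retry a per-file defrag a few times, flushing dirty data between
# attempts, on the theory that EXT4_IOC_MOVE_EXT's EBUSY comes from
# folios that have not been written back yet.
retry_defrag() {
    file=$1
    tool=${2:-e4defrag}        # real use: e4defrag; overridable for testing
    tries=0
    while [ "$tries" -lt 3 ]; do
        sync                   # push pending writeback, including $file
        if "$tool" "$file"; then
            return 0           # succeeded, presumably after writeback settled
        fi
        tries=$((tries + 1))
        sleep 1                # give writeback a moment before retrying
    done
    return 1                   # still failing; likely not transient
}
```

A file that keeps failing even after sync + retries (as the reporter later describes) would point away from the simple writeback explanation.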
Cheers,

						- Ted
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-20  6:52 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-11-20 6:52 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #10 from Artem S. Tashkinov (aros@gmx.com) ---

openat(AT_FDCWD, "/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log", O_RDWR) = 3
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=512} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=4, ...}) = 0
fstatfs(3, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=6177397, f_bfree=4170566, f_bavail=3852103, f_files=1572864, f_ffree=1433202, f_fsid={val=[0xbbd03838, 0xb062f9ed]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOATIME}) = 0
fcntl(3, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0, l_pid=0}) = 0
fsync(3) = 0
openat(AT_FDCWD, "/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log.defrag", O_WRONLY|O_CREAT|O_EXCL, 0400) = 4
unlink("/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log.defrag") = 0
fallocate(4, 0, 0, 139264) = 0
ioctl(4, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=512} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=1, ...}) = 0
[1/1]/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log:	0%) = 97
mmap(NULL, 139264, PROT_READ, MAP_SHARED, 3, 0) = 0x7f5541c99000
mincore(0x7f5541c99000, 139264, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...]) = 0
munmap(0x7f5541c99000, 139264) = 0
ioctl(3, EXT4_IOC_MOVE_EXT, 0x7ffd6bb8ac40) = -1 EBUSY (Device or resource busy)
sync_file_range(3, 0, 139264, SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER) = 0
fadvise64(3, 0, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 4096, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 8192, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 12288, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 16384, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 20480, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 24576, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 28672, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 32768, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 36864, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 40960, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 45056, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 49152, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 53248, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 57344, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 61440, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 65536, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 69632, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 73728, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 77824, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 81920, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 86016, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 90112, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 94208, 4096, POSIX_FADV_DONTNEED) = 0
write(1, "\n", 1
) = 1
write(2, "\tFailed to defrag with EXT4_IOC_"..., 62	Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success	[ NG ]
) = 62
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=0} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=4, ...}) = 0

Device or resource busy? Why? Chrome is not open:

ps ax | grep -i chrome
 967139 pts/1    S+     0:00 grep --color=auto -i chrome

Online defragmentation in both 6.16 and 6.17 is broken.
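A `ps` for the owning application doesn't fully rule out open file handles (some other process could still hold the file). One way to check without extra tools is to walk /proc directly; a sketch (the function name is mine), keeping in mind that MOVE_EXT's EBUSY can also come from kernel-internal writeback state, so an empty result here does not settle the question:

```shell
#!/bin/sh
# Print the PIDs of processes holding a given file open, by walking
# every /proc/<pid>/fd symlink. Read-only; needs no fuser/lsof.
holders_of() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}      # "/proc/123/fd/7" -> "123/fd/7"
            echo "${pid%%/*}"     # -> "123"
        fi
    done | sort -u
}
```

Example: `holders_of '/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log'` would list any PID still holding the LevelDB log open.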
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-24 16:13 ` bugzilla-daemon
From: bugzilla-daemon @ 2025-11-24 16:13 UTC (permalink / raw)
To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #12 from Artem S. Tashkinov (aros@gmx.com) ---

(In reply to Theodore Tso from comment #11)
> So it's not that all files can't be defragged; just *some* files. Is
> that correct?

That's correct.

> And when I ask whether or not it's reproducible, can you take a
> snapshot of your file system, and then remount the snapshot, and will
> the exact same file that failed before fails on the snapshot?

It still fails on the snapshot.

> And for the files that were failing, if you unmount the file system
> and remount it, can you then defrag the file in question?

No. Tried that thrice.

> We have defragmentation tests in fstests, and they are passing, and
> I've tried running defrag on the snapshot that you sent me, And It
> Works For Me.

It still doesn't work with Fedora's kernel (now running
6.17.8-200.fc42.x86_64).

> So a broad "it's broken" without any further data, when it most
> manifestly is not broken in my tests, means that if you really want it
> to be fixed, you're going to have to do more of the debugging.

I'd love to help however I can to get it fixed.

> What this means is that if the file has pages which need to be written
> out to the final location on disk (e.g., if you are in data=journal
> mode, [...])

Journalling is disabled on all my ext4 partitions.

> This may have resulted in a change in how often recently written files
> might end up getting EBUSY when you try to defrag them --- but quite
> frankly, if this is a very tiny fraction of the files in your file
> system, and a subsequent defrag run will take care of them --- I'd
> probably think that is a fair tradeoff.

6.15 didn't have the issue. Subsequent defrag runs don't help.

I've tried rebooting multiple times, and tried to defrag in single user
mode (booted with `1`), with only systemd running and journald disabled
altogether, so only ~/.bash_history is opened for writing, nothing else.
No dirty buffers to speak of; `sync` does nothing as there's nothing to
flush.

> So... if you take a look at the files that failed trying call MOVE_EXT
> --- can you take a look at the timestamps and see if they are
> relatively recently written files?

I'll check it.

Thanks!
* Re: [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-24 16:33 ` Theodore Tso
From: Theodore Tso @ 2025-11-24 16:33 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-ext4

On Mon, Nov 24, 2025 at 04:13:27PM +0000, bugzilla-daemon@kernel.org wrote:
> > And for the files that were failing, if you unmount the file system
> > and remount it, can you then defrag the file in question?  If the
>
> No. Tried that thrice.

Can you try that again, and verify using strace that you get the same
EBUSY error (as opposed to some other error) after unmounting and
remounting the file system?  At this point, I don't want to take
*anything* for granted.

Given that in past attempts, where you've sent me a metadata-only
e2image dump, I haven't been able to reproduce it, are you willing to
build an upstream kernel (as opposed to a Fedora kernel), and
demonstrate that it reproduces on an upstream kernel?  If so, would
you be willing to run an upstream kernel with some printk debugging
added so we can see what is going on --- since, again, I still haven't
been able to reproduce it on my systems.

> > What this means is that if the file has pages which need to be written
> > out to the final location on disk (e.g., if you are in data=journal
>
> Journalling is disabled on all my ext4 partitions.

So you are running a file system with ^has_journal?  Can you send me a
copy of dumpe2fs -h on that file system?

Something else to do: for those files for which e4defrag is failing
reliably after an unmount/remount, are you reproducing the failure by
running e4defrag on just that one file, or by iterating over the
entire file system?  If it reproduces reliably when you try defragging
just that one file, can you try using debugfs's "stat" command and see
what might be different on that file versus some file for which
e4defrag on just that one file *does* work?

e.g.:

debugfs /dev/hdXX
debugfs: stat groups
Inode: 177   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 43432
File ACL: 0
Links: 1   Blockcount: 88
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x6916c804:00000000 -- Fri Nov 14 01:11:16 2025
 atime: 0x6916c879:00000000 -- Fri Nov 14 01:13:13 2025
 mtime: 0x684062bd:00000000 -- Wed Jun  4 11:14:05 2025
crtime: 0x6924883d:00000000 -- Mon Nov 24 11:30:53 2025
Size of extra inode fields: 32
Inode checksum: 0x2e204798
EXTENTS:
(0-10):9368-9378

Finally, I'm curious --- if it's only just a few files out of hundreds
of thousands of files, why do you *care*?  You seem to be emphatic
about calling online defragmentation *broken*, and seem outraged that
no one else seems to be discussing or working on this issue.  Why is
this a high priority issue for you?

Thanks,

						- Ted
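The requests in this reply amount to a small diagnostic checklist per failing file. A sketch that bundles them (the function name, device, and file path are placeholders of mine; setting RUN=echo previews the commands without executing them, since most need root):

```shell
#!/bin/sh
# Gather the diagnostics requested in the thread for one failing file.
collect_defrag_info() {
    dev=$1            # e.g. the block device backing the filesystem
    file=$2           # the file that fails EXT4_IOC_MOVE_EXT
    run=${RUN:-}      # RUN=echo -> dry run; empty -> actually execute
    $run dumpe2fs -h "$dev"                      # superblock (has_journal?)
    $run filefrag -v "$file"                     # current extent layout
    $run debugfs -R "stat $file" "$dev"          # on-disk inode details
    $run strace -e trace=ioctl e4defrag "$file"  # exact errno from MOVE_EXT
}
```

Note that debugfs's `stat` wants the path relative to the filesystem root, not the mount point, so `$file` would need adjusting for a filesystem mounted anywhere other than `/`.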
* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (11 preceding siblings ...)
  2025-11-24 16:13 ` bugzilla-daemon
@ 2025-11-24 16:33 ` bugzilla-daemon
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-11-24 16:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #13 from Theodore Tso (tytso@mit.edu) ---
On Mon, Nov 24, 2025 at 04:13:27PM +0000, bugzilla-daemon@kernel.org wrote:
> > And for the files that were failing, if you unmount the file system
> > and remount it, can you then defrag the file in question?  If the
>
> No. Tried that thrice.

Can you try that again, and verify using strace that you get the same
EBUSY error (as opposed to some other error) after unmounting and
remounting the file system?  At this point, I don't want to take
*anything* for granted.

Given that in past attempts, where you've sent me a metadata-only
e2image dump, I haven't been able to reproduce it, are you willing to
build an upstream kernel (as opposed to a Fedora kernel), and
demonstrate that it reproduces on an upstream kernel?  If so, would you
be willing to run an upstream kernel with some printk debugging added so
we can see what is going on --- since, again, I still haven't been able
to reproduce it on my systems.

> > What this means is that if the file has pages which need to be written
> > out to the final location on disk (e.g., if you are in data=journal
>
> Journalling is disabled on all my ext4 partitions.

So you are running a file system with ^has_journal?  Can you send me
the output of dumpe2fs -h on that file system?

Something else to do.  For those files for which e4defrag is failing
reliably after an unmount/remount, are you reproducing the failure by
running e4defrag on just that one file, or by iterating over the entire
file system?
If it reproduces reliably when you try defragging just that one file,
can you try using debugfs's "stat" command and see what might be
different on that file versus some file for which e4defrag on just that
one file *does* work?

e.g.:

debugfs /dev/hdXX
debugfs:  stat groups

Inode: 177   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 43432
File ACL: 0
Links: 1   Blockcount: 88
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x6916c804:00000000 -- Fri Nov 14 01:11:16 2025
 atime: 0x6916c879:00000000 -- Fri Nov 14 01:13:13 2025
 mtime: 0x684062bd:00000000 -- Wed Jun  4 11:14:05 2025
crtime: 0x6924883d:00000000 -- Mon Nov 24 11:30:53 2025
Size of extra inode fields: 32
Inode checksum: 0x2e204798
EXTENTS:
(0-10):9368-9378

Finally, I'm curious --- if it's only just a few files out of hundreds
of thousands of files, why do you *care*?  You seem to be emphatic about
calling online defragmentation *broken* and seem outraged that no one
else seems to be discussing or working this issue.  Why is this a high
priority issue for you?

Thanks,

					- Ted

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-11-24 16:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
2025-09-22 18:04 ` [Bug 220594] " bugzilla-daemon
2025-09-25 12:14 ` bugzilla-daemon
2025-09-25 14:24 ` bugzilla-daemon
2025-09-25 18:44 ` bugzilla-daemon
2025-09-25 20:44 ` bugzilla-daemon
2025-09-26 21:29 ` bugzilla-daemon
2025-09-27 13:19 ` bugzilla-daemon
2025-09-28  2:17 ` bugzilla-daemon
2025-11-20  6:48 ` bugzilla-daemon
2025-11-24  5:15   ` Theodore Tso
2025-11-20  6:52 ` bugzilla-daemon
2025-11-24  5:15 ` bugzilla-daemon
2025-11-24 16:13 ` bugzilla-daemon
2025-11-24 16:33   ` Theodore Tso
2025-11-24 16:33 ` bugzilla-daemon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).