[Bug 220594] New: Online defragmentation has broken in 6.16

linux-ext4.vger.kernel.org archive mirror

* [Bug 220594] New: Online defragmentation has broken in 6.16
@ 2025-09-22 17:57 bugzilla-daemon
  2025-09-22 18:04 ` [Bug 220594] " bugzilla-daemon
                   ` (12 more replies)
  0 siblings, 13 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-22 17:57 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

            Bug ID: 220594
           Summary: Online defragmentation has broken in 6.16
           Product: File System
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: low
          Priority: P3
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: aros@gmx.com
        Regression: No

In 6.16, defragmenting multiple files on ext4 fails with this error:

        Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success   [ NG ]

This wasn't the case with 6.15.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
@ 2025-09-22 18:04 ` bugzilla-daemon
  2025-09-25 12:14 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-22 18:04 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---
We are talking about over a hundred files, some as small as 100KB, despite
ample free space: over 16GB, which is more than enough to allocate these
files properly.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
  2025-09-22 18:04 ` [Bug 220594] " bugzilla-daemon
@ 2025-09-25 12:14 ` bugzilla-daemon
  2025-09-25 14:24 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-25 12:14 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #2 from Artem S. Tashkinov (aros@gmx.com) ---
I see no patches in 6.16.9 or a discussion on LKML.

The bug is still valid.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
  2025-09-22 18:04 ` [Bug 220594] " bugzilla-daemon
  2025-09-25 12:14 ` bugzilla-daemon
@ 2025-09-25 14:24 ` bugzilla-daemon
  2025-09-25 18:44 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-25 14:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |UNREPRODUCIBLE

--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
I can't reproduce the bug, so it's not valid for me....

# uname -a
Linux kvm-xfstests 6.17.0-rc4-xfstests #245 SMP PREEMPT_DYNAMIC Thu Sep 25
10:02:23 EDT 2025 x86_64 GNU/Linux

root@kvm-xfstests:/vdc# seq 1 1024 | xargs -n 1 cp /etc/motd 
root@kvm-xfstests:/vdc# seq 1 2 1024 | xargs rm 
root@kvm-xfstests:/vdc# cp /bin/bash .
root@kvm-xfstests:/vdc# cp /bin/netstat .
root@kvm-xfstests:/vdc# cp /bin/uniq .
root@kvm-xfstests:/vdc# seq 2 2 1024 | xargs rm 
root@kvm-xfstests:/vdc# filefrag  *
bash: 317 extents found
lost+found: 1 extent found
netstat: 39 extents found
uniq: 1 extent found

root@kvm-xfstests:/vdc# e4defrag  *
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for bash
[1/1]bash:      100%    [ OK ]
 Success:                       [1/1]
ext4 defragmentation for directory(lost+found)
Can not process "lost+found"
 "lost+found"
ext4 defragmentation for netstat
[1/1]netstat:   100%    [ OK ]
 Success:                       [1/1]
ext4 defragmentation for uniq
[1/1]uniq:      100%    [ OK ]
 Success:                       [1/1]
root@kvm-xfstests:/vdc#


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (2 preceding siblings ...)
  2025-09-25 14:24 ` bugzilla-daemon
@ 2025-09-25 18:44 ` bugzilla-daemon
  2025-09-25 20:44 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-25 18:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|UNREPRODUCIBLE              |---

--- Comment #4 from Artem S. Tashkinov (aros@gmx.com) ---
I can upload a couple of ext4 images for you to repro the bug.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (3 preceding siblings ...)
  2025-09-25 18:44 ` bugzilla-daemon
@ 2025-09-25 20:44 ` bugzilla-daemon
  2025-09-26 21:29 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-25 20:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #5 from Artem S. Tashkinov (aros@gmx.com) ---
I've emailed you privately/offlist.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (4 preceding siblings ...)
  2025-09-25 20:44 ` bugzilla-daemon
@ 2025-09-26 21:29 ` bugzilla-daemon
  2025-09-27 13:19 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-26 21:29 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #6 from Theodore Tso (tytso@mit.edu) ---
The reproducer file system which Artem sent me doesn't fail for me on either
6.16 or 6.17-rc4.   So it may be something which is specific to Artem's kernel
or hardware configuration.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (5 preceding siblings ...)
  2025-09-26 21:29 ` bugzilla-daemon
@ 2025-09-27 13:19 ` bugzilla-daemon
  2025-09-28  2:17 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-27 13:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #7 from Artem S. Tashkinov (aros@gmx.com) ---
Created attachment 308721
  --> https://bugzilla.kernel.org/attachment.cgi?id=308721&action=edit
My 6.16 configuration

I remember from the log that you sent, there were quite a lot of failures, over
9000 of them.

Are you sure your scripts handle this?

e4defrag -v uBlock0\@raymondhill.net.xpi; echo $?
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for uBlock0@raymondhill.net.xpi
[1/1]uBlock0@raymondhill.net.xpi:         0%
        Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success   [ NG ]
 Success:                       [0/1]
1

So, it's "successful" except it's not.
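
Since the message text is misleading here (it prints "Success" while the exit status is 1), a script that batches e4defrag is probably better off keying on the exit status instead. A minimal sketch, assuming e4defrag exits non-zero whenever a single-file run fails, as it does in the output above; the `failed_defrags` helper and the `DEFRAG_CMD` override are hypothetical names used only for illustration:

```shell
# failed_defrags FILE...: run the defrag command on each file and print
# the files for which it exited with a non-zero status.
# DEFRAG_CMD is a hypothetical override (defaults to e4defrag) so the
# helper can be exercised with a stand-in command.
failed_defrags() {
    for f in "$@"; do
        if ! ${DEFRAG_CMD:-e4defrag} "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like `failed_defrags *.xpi > failed.txt` would then leave a list of the files worth looking at more closely.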

> is specific to Artem's kernel or hardware configuration.

I thought ext4 code is quite independent from the rest of the kernel and no
other kernel options can possibly affect it.

gzip -d < /proc/config.gz | grep -i ext4
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (6 preceding siblings ...)
  2025-09-27 13:19 ` bugzilla-daemon
@ 2025-09-28  2:17 ` bugzilla-daemon
  2025-11-20  6:48 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-09-28  2:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #8 from Theodore Tso (tytso@mit.edu) ---
e4defrag reports directories as failures.   We should probably change it to
just skip directories so it doesn't confuse people:

root@kvm-xfstests:~# mke2fs -Fq -t ext4 /dev/vdc
/dev/vdc contains a ext4 file system
        last mounted on Sat Sep 27 22:14:06 2025
root@kvm-xfstests:~# mount /dev/vdc /vdc
[  118.187747] EXT4-fs (vdc): mounted filesystem
c6ef56ee-8512-4408-8a58-2c83e3a36f91 r/w with ordered data mode. Quota mode:
none.
root@kvm-xfstests:~# e4defrag /vdc
e4defrag 1.47.2 (1-Jan-2025)
ext4 defragmentation for directory(/vdc)

        Success:                        [ 0/2 ]
        Failure:                        [ 2/2 ]
root@kvm-xfstests:~# find /vdc -type d
/vdc
/vdc/lost+found
root@kvm-xfstests:~# 

I am not seeing the "Failed to defrag with EXT4_IOC_MOVE_EXT ioctl" message
which you cited.   So I don't believe I'm able to reproduce whatever you're
seeing.


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (7 preceding siblings ...)
  2025-09-28  2:17 ` bugzilla-daemon
@ 2025-11-20  6:48 ` bugzilla-daemon
  2025-11-24  5:15   ` Theodore Tso
  2025-11-20  6:52 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 16+ messages in thread
From: bugzilla-daemon @ 2025-11-20  6:48 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #9 from Artem S. Tashkinov (aros@gmx.com) ---
Sadly it's still broken in 6.17:

ext4 defragmentation for ./birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log
        Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success   [ NG ]

ls -la '/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log'

-rw-------. 1 birdie birdie 139150 Nov 19 10:22 '/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log'

This is a small file that is not getting defragmented, and I have about five
dozen of them.

And I have plenty of free space.

Filesystem volume name:   
Last mounted on:          /
Filesystem UUID:          
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype extent
flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1572864
Block count:              6287360
Reserved block count:     314367
Overhead clusters:        109963
Free blocks:              4126491
Free inodes:              1428171
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       
Last mount time:          
Last write time:          
Mount count:              
Maximum mount count:      -1
Last checked:             
Check interval:           0 (<none>)
Lifetime writes:          
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      
Journal backup:           inode blocks


* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (8 preceding siblings ...)
  2025-11-20  6:48 ` bugzilla-daemon
@ 2025-11-20  6:52 ` bugzilla-daemon
  2025-11-24  5:15 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: bugzilla-daemon @ 2025-11-20  6:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #10 from Artem S. Tashkinov (aros@gmx.com) ---
openat(AT_FDCWD, "/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log", O_RDWR) = 3
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=512} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=4, ...}) = 0
fstatfs(3, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=6177397, f_bfree=4170566, f_bavail=3852103, f_files=1572864, f_ffree=1433202, f_fsid={val=[0xbbd03838, 0xb062f9ed]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOATIME}) = 0
fcntl(3, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0, l_pid=0}) = 0
fsync(3)                                = 0
openat(AT_FDCWD, "/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log.defrag", O_WRONLY|O_CREAT|O_EXCL, 0400) = 4
unlink("/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log.defrag") = 0
fallocate(4, 0, 0, 139264)              = 0
ioctl(4, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=512} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=1, ...}) = 0
[1/1]/home/birdie/.config/google-chrome/Default/Sync Data/LevelDB/000435.log:    0%) = 97
mmap(NULL, 139264, PROT_READ, MAP_SHARED, 3, 0) = 0x7f5541c99000
mincore(0x7f5541c99000, 139264, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...]) = 0
munmap(0x7f5541c99000, 139264)          = 0
ioctl(3, EXT4_IOC_MOVE_EXT, 0x7ffd6bb8ac40) = -1 EBUSY (Device or resource busy)
sync_file_range(3, 0, 139264, SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER) = 0
fadvise64(3, 0, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 4096, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 8192, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 12288, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 16384, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 20480, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 24576, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 28672, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 32768, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 36864, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 40960, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 45056, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 49152, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 53248, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 57344, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 61440, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 65536, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 69632, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 73728, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 77824, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 81920, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 86016, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 90112, 4096, POSIX_FADV_DONTNEED) = 0
fadvise64(3, 94208, 4096, POSIX_FADV_DONTNEED) = 0
write(1, "\n", 1
)                       = 1
write(2, "\tFailed to defrag with EXT4_IOC_"..., 62     Failed to defrag with EXT4_IOC_MOVE_EXT ioctl:Success   [ NG ]
) = 62
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=0} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=4, ...}) = 0



Device or resource busy? Why? Chrome is not open.

ps ax | grep -i chrome
 967139 pts/1    S+     0:00 grep --color=auto -i chrome


Online defragmentation in both 6.16 and 6.17 is broken.


* Re: [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-20  6:48 ` bugzilla-daemon
@ 2025-11-24  5:15   ` Theodore Tso
  0 siblings, 0 replies; 16+ messages in thread
From: Theodore Tso @ 2025-11-24  5:15 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4

So it's not that all files can't be defragged; just *some* files.  Is
that correct?

And when I ask whether or not it's reproducible: if you take a
snapshot of your file system and then mount the snapshot, does the
exact same file that failed before also fail on the snapshot?

And for the files that were failing, if you unmount the file system
and remount it, can you then defrag the file in question?  If the
answer is yes, this is why bug reports of the form "Online
defragmentation in 6.16 is broken" are not particularly useful.  And
it's why I've not spent a lot of time on this bug.  We have
defragmentation tests in fstests, and they are passing, and I've tried
running defrag on the snapshot that you sent me, And It Works For Me.
So a broad "it's broken" without any further data, when it most
manifestly is not broken in my tests, means that if you really want it
to be fixed, you're going to have to do more of the debugging.

But now that we know that it's an EBUSY error, it sounds like it's
some kind of transient thing, and that's why I'm not seeing it when I
tried running it on your snapshot.

For example, one of the places where you can get EBUSY in the MOVE_EXT
ioctl is here:

                if (!filemap_release_folio(folio[0], 0) ||
                    !filemap_release_folio(folio[1], 0)) {
                        *err = -EBUSY;
                        goto drop_data_sem;
                }

... and this ultimately calls ext4_release_folio:

static bool ext4_release_folio(struct folio *folio, gfp_t wait)
{
	struct inode *inode = folio->mapping->host;
	journal_t *journal = EXT4_JOURNAL(inode);

	trace_ext4_release_folio(inode, folio);

	/* Page has dirty journalled data -> cannot release */
	if (folio_test_checked(folio))
		return false;
	if (journal)
		return jbd2_journal_try_to_free_buffers(journal, folio);
	else
		return try_to_free_buffers(folio);
}

What this means is that if the file has pages which still need to be
written out to their final location on disk, then the MOVE_EXT ioctl
will return EBUSY.  This can happen, for example, if you are in
data=journal mode and the modified data has been written (or scheduled
to be written) to the journal but not *yet* to its final location on
disk, or if delayed allocation is enabled, the file was just recently
written, and blocks have been allocated but not yet written back.

This is not new behaviour; we've always had this.  Now, 6.16 is when
large folio support landed for ext4, and this can result in some
really wonderful performance improvements.  This may have resulted in
a change in how often recently written files might end up getting
EBUSY when you try to defrag them --- but quite frankly, if this is a
very tiny fraction of the files in your file system, and a subsequent
defrag run will take care of them --- I'd probably think that is a
fair tradeoff.
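
If the EBUSY really is transient, as suggested above, flushing the file and simply trying again later should be enough. A minimal sketch of such a retry loop; the `retry_defrag` helper and the `DEFRAG_CMD` and `RETRY_DELAY` knobs are hypothetical names used only for illustration, and `sync FILE` assumes a reasonably recent GNU coreutils:

```shell
# retry_defrag FILE [TRIES]: flush FILE's dirty data, then try to
# defragment it, up to TRIES times (default 3). Returns 0 on the first
# success and 1 if every attempt failed.
# DEFRAG_CMD and RETRY_DELAY are hypothetical knobs for illustration.
retry_defrag() {
    file=$1
    tries=${2:-3}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # Ask the kernel to write the file back before touching it.
        sync "$file" 2>/dev/null
        if ${DEFRAG_CMD:-e4defrag} "$file" >/dev/null 2>&1; then
            return 0
        fi
        sleep "${RETRY_DELAY:-5}"
        attempt=$((attempt + 1))
    done
    return 1
}
```

A file that still fails after several spaced-out attempts is unlikely to be hitting a purely transient condition, which would be useful information in itself.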

So... for the files where the MOVE_EXT call failed, can you take a
look at the timestamps and see if they are relatively recently
written files?
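
One way to do that check, sketched under the assumption that GNU find is available (`-mmin` and `-printf` are GNU extensions); the `recently_written` helper name is hypothetical:

```shell
# recently_written MINUTES FILE...: print each FILE whose data was last
# modified less than MINUTES minutes ago, prefixed with its mtime.
recently_written() {
    minutes=$1
    shift
    for f in "$@"; do
        # -maxdepth 0 restricts find to the named file itself.
        find "$f" -maxdepth 0 -mmin "-$minutes" \
            -printf '%TY-%Tm-%Td %TH:%TM %p\n'
    done
}
```

Feeding it the list of files that failed to defrag, for example `recently_written 60 file1 file2`, shows which of them were written within the last hour.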

Also, for future reference, if you had disclosed that this was only
happening on a tiny percentage of all of the files in your file
system, and if you checked to see if the specific small number of
files (by percentage) that were failing could be defragged later, and
checked the timestamps, that would have been really useful data which
would have allowed you (and me) to waste a lot less time.

Cheers,

					- Ted

* [Bug 220594] Online defragmentation has broken in 6.16
  2025-09-22 17:57 [Bug 220594] New: Online defragmentation has broken in 6.16 bugzilla-daemon
                   ` (10 preceding siblings ...)
  2025-11-24  5:15 ` bugzilla-daemon
@ 2025-11-24 16:13 ` bugzilla-daemon
  2025-11-24 16:33   ` Theodore Tso
  2025-11-24 16:33 ` bugzilla-daemon
  12 siblings, 1 reply; 16+ messages in thread
From: bugzilla-daemon @ 2025-11-24 16:13 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #12 from Artem S. Tashkinov (aros@gmx.com) ---
(In reply to Theodore Tso from comment #11)
> So it's not that all files can't be defragged; just *some* files.  Is
> that correct?

That's correct.

> 
> And when I ask whether or not it's reproducible, can you take a
> snapshot of your file system, and then remount the snapshot, and will
> the exact same file that failed before fails on the snapshot?

It still fails on the snapshot.

> 
> And for the files that were failing, if you unmount the file system
> and remount it, can you then defrag the file in question?  If the

No. Tried that thrice.

> answer is yes, this is why bug reports of the form "Online
> defragmentation in 6.16 is broken" is not particularly useful.  And
> it's why I've not spent a lot of time on this bug.  We have
> defragmentation tests in fstests, and they are passing, and I've tried
> running defrag on the snapshot that you sent me, And It Works For Me.

It still doesn't work with Fedora's kernel (now running 6.17.8-200.fc42.x86_64).

> So a broad "it's broken" without any further data, when it most
> manifestly is not broken in my tests, means that if you really want it
> to be fixed, you're going to have to do more of the debugging.

I'd love to help however I can to get it fixed.

> 
> But now that we know that it's an EBUSY error, it sounds like it's
> some kind of transient thing, and that's why I'm not seeing it when I
> tried running it on your snapshot.
> 
> For example, one of the places where you can get EBUSY in the MOVE_EXT
> ioctl is here:
> 
>                 if (!filemap_release_folio(folio[0], 0) ||
>                     !filemap_release_folio(folio[1], 0)) {
>                         *err = -EBUSY;
>                         goto drop_data_sem;
>                 }
> 
> ... and this ultimately calls ext4_release_folio:
> 
> static bool ext4_release_folio(struct folio *folio, gfp_t wait)
> {
>       struct inode *inode = folio->mapping->host;
>       journal_t *journal = EXT4_JOURNAL(inode);
> 
>       trace_ext4_release_folio(inode, folio);
> 
>       /* Page has dirty journalled data -> cannot release */
>       if (folio_test_checked(folio))
>               return false;
>       if (journal)
>               return jbd2_journal_try_to_free_buffers(journal, folio);
>       else
>               return try_to_free_buffers(folio);
> }
> 
> What this means is that if the file has pages which need to be written
> out to the final location on disk (e.g., if you are in data=journal

Journalling is disabled on all my ext4 partitions.

> mode, and the modified file may have been written or scheduled to be
> written to the journal, but not *yet* to the final location on disk,
> or you are using delayed allocation and the file was just recently
> written, delayed allocation is enabled, and blocks get allocated but
> they haven't been written back yet) --- then the MOVE_EXT ioctl will
> return EBUSY.
> 
> This is not new behaviour; we've always had this.  Now, 6.16 is when
> large folio support landed for ext4, and this can result in some
> really wonderful performance improvements.  This may have resulted in
> a change in how often recently written files might end up getting
> EBUSY when you try to defrag them --- but quite frankly, if this is a
> very tiny fraction of the files in your file system, and a subsequent
> defrag run will take care of them --- I'd probably think that is a
> fair tradeoff.

6.15 didn't have the issue.

Subsequent defrag runs don't help. I've tried rebooting multiple times, and
tried to defrag in single-user mode (booted with `1`), with only systemd
running and journald disabled altogether, so only ~/.bash_history is open for
writing, nothing else. There are no dirty buffers to speak of; `sync` does
nothing as there's nothing to flush.

> 
> So... if you take a look at the files that failed trying call MOVE_EXT
> --- can you take a look at the timestamps and see if they are
> relatively recently written files?

I'll check it.

> 
> Also, for future reference, if you had disclosed that this was only
> happening on a tiny percentage of all of the files in your file
> system, and if you checked to see if the specific small number of
> files (by percentage) that were failing could be defragged later, and
> checked the timestamps, that would have been really useful data which
> would have allowed you (and me) to waste a lot less time.
> 
> Cheers,
> 
>                                       - Ted

Thanks!


* Re: [Bug 220594] Online defragmentation has broken in 6.16
  2025-11-24 16:13 ` bugzilla-daemon
@ 2025-11-24 16:33   ` Theodore Tso
  0 siblings, 0 replies; 16+ messages in thread
From: Theodore Tso @ 2025-11-24 16:33 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4

On Mon, Nov 24, 2025 at 04:13:27PM +0000, bugzilla-daemon@kernel.org wrote:
> > And for the files that were failing, if you unmount the file system
> > and remount it, can you then defrag the file in question?  If the
> 
> No. Tried that thrice.

Can you try that again, and verify using strace that you get the same
EBUSY error (as opposed to some other error) after unmounting and
remounting the file system?  At this point, I don't want to take
*anything* for granted.
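
One possible way to do the strace check is sketched below. It runs e4defrag on a single file under `strace -f -e trace=ioctl` and lists every ioctl that failed, together with its errno. The helper names and the regular expression are assumptions; the sketch does not depend on how strace prints the MOVE_EXT request itself, only on the usual `= -1 ERRNO (message)` suffix of a failed call.

```python
import re
import subprocess
import sys
import tempfile

# Matches the tail of a failed ioctl line in strace output, e.g.
#   ioctl(5, <request>, <arg>) = -1 EBUSY (Device or resource busy)
FAILED_IOCTL = re.compile(
    r"ioctl\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)"
)


def failed_ioctls(strace_text):
    """Return a list of (errno_name, message) for every failed ioctl line."""
    failures = []
    for line in strace_text.splitlines():
        match = FAILED_IOCTL.search(line)
        if match:
            failures.append((match.group("errno"), match.group("msg")))
    return failures


def trace_e4defrag(path):
    """Run e4defrag on one file under strace and report failed ioctls."""
    with tempfile.NamedTemporaryFile("r", suffix=".strace") as log:
        # -o keeps the trace separate from e4defrag's own output.
        subprocess.run(
            ["strace", "-f", "-e", "trace=ioctl", "-o", log.name,
             "e4defrag", "-v", path],
            check=False,
        )
        return failed_ioctls(log.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    for errno_name, message in trace_e4defrag(sys.argv[1]):
        print(errno_name, message)
```

Anything other than EBUSY in the output would point away from the dirty-page explanation.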

Given that in past attempts, where you've sent me a metadata-only
e2image dump, I haven't been able to reproduce it, are you willing to
build an upstream kernel (as opposed to a Fedora kernel), and
demonstrate that it reproduces on an upstream kernel?  If so, would you
be willing to run an upstream kernel with some printk debugging added
so we can see what is going on --- since again, I still haven't been
able to reproduce it on my systems.

> > What this means is that if the file has pages which need to be written
> > out to the final location on disk (e.g., if you are in data=journal
> 
> Journalling is disabled on all my ext4 partitions.

So you are running a file system with ^has_journal?  Can you send me a
copy of the dumpe2fs -h output for that file system?
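
As an aside, whether a file system really lacks a journal can be read off the "Filesystem features:" line of the `dumpe2fs -h` output. The following is a small sketch under that assumption; the function names are invented for illustration, and the device path is whatever the reader's file system lives on.

```python
import subprocess


def filesystem_features(dumpe2fs_text):
    """Return the set of names on the 'Filesystem features:' line."""
    for line in dumpe2fs_text.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_journal(device):
    """Run `dumpe2fs -h` on `device` (usually needs root) and test for has_journal."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    )
    return "has_journal" in filesystem_features(result.stdout)
```

This only automates the yes/no question; the full `dumpe2fs -h` output is still what was asked for.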

Something else to do.  For those files for which e4defrag is failing
reliably after an unmount/remount, are you reproducing the failure by
running e4defrag on just that one file, or by iterating over the
entire file system?  If it reproduces reliably where you try
defragging just that one file, can you try using debugfs's "stat"
command and see what might be different on that file versus some file
for which e4defrag on just that one file *does* work?

e.g.:

debugfs /dev/hdXX
debugfs:  stat groups
Inode: 177   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 43432
File ACL: 0
Links: 1   Blockcount: 88
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x6916c804:00000000 -- Fri Nov 14 01:11:16 2025
 atime: 0x6916c879:00000000 -- Fri Nov 14 01:13:13 2025
 mtime: 0x684062bd:00000000 -- Wed Jun  4 11:14:05 2025
crtime: 0x6924883d:00000000 -- Mon Nov 24 11:30:53 2025
Size of extra inode fields: 32
Inode checksum: 0x2e204798
EXTENTS:
(0-10):9368-9378
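
To make the comparison between a failing and a working file less error-prone, the two "stat" outputs can be parsed and diffed. The sketch below is an illustration only: the chosen field list, the function names, and the use of `debugfs -R` from a script are assumptions, and it handles only the extent-mapped layout shown in the example above.

```python
import re
import subprocess

# Fields picked for comparison; this selection is an arbitrary assumption.
FIELDS = ("Type", "Mode", "Flags", "Size", "Links", "Blockcount")


def debugfs_stat(device, path):
    """Return debugfs "stat" output for `path` (relative to the fs root).

    Usually needs root; paths containing spaces would need extra quoting.
    """
    result = subprocess.run(
        ["debugfs", "-R", f"stat {path}", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_debugfs_stat(text):
    """Extract selected inode fields and the extent list from "stat" output."""
    head, _, extents = text.partition("EXTENTS:")
    info = {}
    for name in FIELDS:
        # First occurrence wins, so the inode Size is taken, not Fragment Size.
        match = re.search(rf"\b{name}:\s+(\S+)", head)
        if match:
            info[name] = match.group(1)
    info["Extents"] = re.findall(r"\(([^)]*)\):(\d+(?:-\d+)?)", extents)
    return info


def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A difference in Flags between the two files would be the first thing to look at; differences in Size or Extents are expected and say little on their own.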

Finally, I'm curious --- if it's only just a few files out of hundreds
of thousands of files, why do you *care*?  You seem to be emphatic
about calling online defragmentation *broken* and seem outraged that
no one else seems to be discussing or working on this issue.  Why is
this a high priority issue for you?

Thanks,

					- Ted

