* [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
@ 2025-05-25 8:32 Al Viro
2025-05-25 18:02 ` Al Viro
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Al Viro @ 2025-05-25 8:32 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
generic/127 with xfstests built on debian-testing (trixie) ends up with
assorted memory corruption; trace below is with CONFIG_DEBUG_PAGEALLOC and
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and it looks like a double free
somewhere in iomap. Unfortunately, commit in question is just making
xfs use the infrastructure built in earlier series - not that useful
for isolating the breakage.
[ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
[ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
[ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
[ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
[ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
[ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
[ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
[ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
[ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
[ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
[ 35.503279] Call Trace:
[ 35.503281] <TASK>
[ 35.503282] dump_stack_lvl+0x4f/0x60
[ 35.503296] bad_page+0x6f/0x100
[ 35.503300] free_frozen_pages+0x303/0x550
[ 35.503301] iomap_finish_ioend+0xf6/0x380
[ 35.503304] iomap_finish_ioends+0x83/0xc0
[ 35.503305] xfs_end_ioend+0x64/0x140 [xfs]
[ 35.503342] xfs_end_io+0x93/0xc0 [xfs]
[ 35.503378] process_one_work+0x153/0x390
[ 35.503382] worker_thread+0x2ab/0x3b0
It's 4:30am here, so I'm going to leave attempts to actually debug that
thing until tomorrow; I do have a kvm where it's reliably reproduced
within a few minutes, so if anyone comes up with patches, I'll be able
to test them.
Breakage is still present in the current mainline ;-/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 8:32 [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?) Al Viro
@ 2025-05-25 18:02 ` Al Viro
2025-05-25 18:06 ` Al Viro
2025-05-29 1:56 ` Darrick J. Wong
2 siblings, 0 replies; 24+ messages in thread
From: Al Viro @ 2025-05-25 18:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> generic/127 with xfstests built on debian-testing (trixie) ends up with
> assorted memory corruption; trace below is with CONFIG_DEBUG_PAGEALLOC and
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and it looks like a double free
> somewhere in iomap. Unfortunately, commit in question is just making
> xfs use the infrastructure built in earlier series - not that useful
> for isolating the breakage.
FWIW, the same breakage is reproduced within a couple of iterations of
./check generic/127 on debian-testing image with xfstests built fresh from
git and debian linux-image-6.15-rc7-amd64-unsigned_6.15~rc7-1~exp1_amd64.deb
IOW, it's not something exotic in .config here. KVM setup is also not
unusual -
kvm \
-boot order=c \
-m 16384 \
-netdev "tap,id=nic0,ifname=tap4,script=no,downscript=no" \
-device "e1000,netdev=nic0" \
-nographic \
-smp 4 \
-hdb /home/al/emu/ssd/image \
trixie.img
with image partitioned into two 6G xfs filesystems, with
export TEST_DEV=/dev/sdb1
export TEST_DIR=/home/test
export SCRATCH_DEV=/dev/sdb2
export SCRATCH_MNT=/home/scratch
for local.config. Bog-standard install, ext4 for everything on sda,
nothing fancy for storage setup - qemu defaults all the way through.
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 8:32 [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?) Al Viro
2025-05-25 18:02 ` Al Viro
@ 2025-05-25 18:06 ` Al Viro
2025-05-25 19:12 ` Vlastimil Babka
2025-05-29 1:56 ` Darrick J. Wong
2 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2025-05-25 18:06 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> Breakage is still present in the current mainline ;-/
With CONFIG_DEBUG_VM on top of pagealloc debugging:
[ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
[ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
[ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
[ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
[ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
[ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
[ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
[ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
[ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
[ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
[ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
[ 1448.960938] Call Trace:g
[ 1448.960939] <TASK>g
[ 1448.960940] dump_stack_lvl+0x4f/0x60g
[ 1448.960953] bad_page+0x6f/0x100g
[ 1448.960957] free_frozen_pages+0x471/0x640g
[ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
[ 1448.960963] iomap_finish_ioends+0x83/0xc0g
[ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
[ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
[ 1448.961036] process_one_work+0x153/0x390g
[ 1448.961044] worker_thread+0x2ab/0x3b0g
[ 1448.961045] ? rescuer_thread+0x470/0x470g
[ 1448.961047] kthread+0xf7/0x200g
[ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
[ 1448.961049] ret_from_fork+0x2d/0x50g
[ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
[ 1448.961054] ret_from_fork_asm+0x11/0x20g
[ 1448.961058] </TASK>g
[ 1448.961155] Disabling lock debugging due to kernel taintg
[ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
[ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
[ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
[ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
[ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
[ 1448.972490] ------------[ cut here ]------------g
[ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
[ 1448.973421] Oops: invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOCg
[ 1448.973853] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Tainted: G B 6.14.0-rc1+ #78g
[ 1448.974345] Tainted: [B]=BAD_PAGEg
[ 1448.974565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
[ 1448.975074] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
[ 1448.975428] RIP: 0010:folio_end_writeback+0x155/0x180g
[ 1448.975731] Code: 13 40 0f 92 c5 e9 23 ff ff ff 48 c7 c6 00 d5 e7 81 48 89 df e8 0c 8a 03 00 0f 0b 48 c7 c6 d0 38 e5 81 48 89 df e8 fb 89 03 00 <0f> 0b 48 c7 c6 40 5b e5 81 48 89 df e8 ea 89 03 00 0f 0b 48 c7 c6g
[ 1448.976655] RSP: 0018:ffffc90001a53d68 EFLAGS: 00010286g
[ 1448.976953] RAX: 000000000000005c RBX: ffffea00044b2c00 RCX: 0000000000000000g
[ 1448.977331] RDX: 0000000000000001 RSI: ffffffff81e74e9e RDI: 00000000ffffffffg
[ 1448.977711] RBP: ffffea00044b2c40 R08: 0000000000004ffb R09: 00000000ffffefffg
[ 1448.978089] R10: 00000000ffffefff R11: ffffffff82043bc0 R12: 0000000000001000g
[ 1448.978464] R13: ffff888101ecb840 R14: 0000000000000000 R15: ffffea00044b2c00g
[ 1448.978844] FS: 0000000000000000(0000) GS:ffff88842dd00000(0000) knlGS:0000000000000000g
[ 1448.979289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033g
[ 1448.979609] CR2: 00007fd3d42a2000 CR3: 0000000111543000 CR4: 00000000000006f0g
[ 1448.979989] Call Trace:g
[ 1448.980170] <TASK>g
[ 1448.980336] ? die+0x32/0x80g
[ 1448.980543] ? do_trap+0xd5/0x100g
[ 1448.980767] ? folio_end_writeback+0x155/0x180g
[ 1448.981033] ? do_error_trap+0x65/0x80g
[ 1448.981270] ? folio_end_writeback+0x155/0x180g
[ 1448.981536] ? exc_invalid_op+0x4c/0x60g
[ 1448.981790] ? folio_end_writeback+0x155/0x180g
[ 1448.982056] ? asm_exc_invalid_op+0x16/0x20g
[ 1448.982315] ? folio_end_writeback+0x155/0x180g
[ 1448.982580] ? folio_end_writeback+0x155/0x180g
[ 1448.982846] iomap_finish_ioend+0x196/0x3c0g
[ 1448.983108] iomap_finish_ioends+0x55/0xc0g
[ 1448.983363] xfs_end_ioend+0x64/0x140 [xfs]g
[ 1448.983663] xfs_end_io+0x93/0xc0 [xfs]g
[ 1448.983937] process_one_work+0x153/0x390g
[ 1448.984189] worker_thread+0x2ab/0x3b0g
[ 1448.984427] ? rescuer_thread+0x470/0x470g
[ 1448.984674] kthread+0xf7/0x200g
[ 1448.984887] ? kthread_use_mm+0xa0/0xa0g
[ 1448.985128] ret_from_fork+0x2d/0x50g
[ 1448.985362] ? kthread_use_mm+0xa0/0xa0g
[ 1448.985601] ret_from_fork_asm+0x11/0x20g
[ 1448.985846] </TASK>g
[ 1448.986017] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
[ 1448.987399] ---[ end trace 0000000000000000 ]---g
[ 1448.987896] RIP: 0010:folio_end_writeback+0x155/0x180g
[ 1448.988220] Code: 13 40 0f 92 c5 e9 23 ff ff ff 48 c7 c6 00 d5 e7 81 48 89 df e8 0c 8a 03 00 0f 0b 48 c7 c6 d0 38 e5 81 48 89 df e8 fb 89 03 00 <0f> 0b 48 c7 c6 40 5b e5 81 48 89 df e8 ea 89 03 00 0f 0b 48 c7 c6g
[ 1448.989246] RSP: 0018:ffffc90001a53d68 EFLAGS: 00010286g
[ 1448.992210] RAX: 000000000000005c RBX: ffffea00044b2c00 RCX: 0000000000000000g
[ 1448.992619] RDX: 0000000000000001 RSI: ffffffff81e74e9e RDI: 00000000ffffffffg
[ 1448.993010] RBP: ffffea00044b2c40 R08: 0000000000004ffb R09: 00000000ffffefffg
[ 1448.993577] R10: 00000000ffffefff R11: ffffffff82043bc0 R12: 0000000000001000g
[ 1448.994411] R13: ffff888101ecb840 R14: 0000000000000000 R15: ffffea00044b2c00g
[ 1448.994823] FS: 0000000000000000(0000) GS:ffff88842dd00000(0000) knlGS:0000000000000000g
[ 1448.995390] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033g
[ 1448.995916] CR2: 00007fd3d42a2000 CR3: 0000000111543000 CR4: 00000000000006f0g
kvm: terminating on signal 15 from pid 32057 (killall)
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 18:06 ` Al Viro
@ 2025-05-25 19:12 ` Vlastimil Babka
2025-05-25 20:32 ` Linus Torvalds
2025-05-26 13:05 ` Jens Axboe
0 siblings, 2 replies; 24+ messages in thread
From: Vlastimil Babka @ 2025-05-25 19:12 UTC (permalink / raw)
To: Al Viro, Jens Axboe, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/25/25 8:06 PM, Al Viro wrote:
> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>
>> Breakage is still present in the current mainline ;-/
>
> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>
> [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
> [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
> [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
> [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
It doesn't like the writeback flag.
> [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
> [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
> [ 1448.960938] Call Trace:g
> [ 1448.960939] <TASK>g
> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
> [ 1448.960953] bad_page+0x6f/0x100g
> [ 1448.960957] free_frozen_pages+0x471/0x640g
> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
> [ 1448.961036] process_one_work+0x153/0x390g
> [ 1448.961044] worker_thread+0x2ab/0x3b0g
> [ 1448.961045] ? rescuer_thread+0x470/0x470g
> [ 1448.961047] kthread+0xf7/0x200g
> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
> [ 1448.961049] ret_from_fork+0x2d/0x50g
> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
> [ 1448.961058] </TASK>g
> [ 1448.961155] Disabling lock debugging due to kernel taintg
> [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
same pfn, same struct page
> [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
> [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
> [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
> [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
> [ 1448.972490] ------------[ cut here ]------------g
> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
this is folio_get() noticing refcount is 0, so a use-after free, because
we already tried to free the page above.
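A note on the expression in the VM_BUG_ON_FOLIO line quoted above: casting the refcount to unsigned and adding 127 makes the comparison true for any value from -127 through 0, so a single test catches both a zero refcount and one that has wrapped slightly negative. A minimal userspace sketch of that arithmetic, assuming a 32-bit int; the function name is made up for illustration and operates on a plain int rather than a folio:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same arithmetic as the expression in the dump, applied to a plain int.
 * Hypothetical helper name; not a kernel function.
 */
static bool ref_is_zero_or_near_overflow(int refcount)
{
	/* Values -127..0 wrap into the range 0..127 after the addition. */
	return (unsigned int)refcount + 127u <= 127u;
}
```

With refcount:0, as printed in the page dump, the check is true, which matches the BUG firing.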
I'm not familiar with this code too much, but I suspect problem was
introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
pages when writeback completes") and only (more) exposed here.
so in folio_end_writeback() we have
if (__folio_end_writeback(folio))
folio_wake_bit(folio, PG_writeback);
but calling the folio_end_dropbehind_write() doesn't depend on the
result of __folio_end_writeback()
this seems rather suspicious
I think if __folio_end_writeback() was true then PG_writeback would be
cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure.
Instead we do a premature folio_end_dropbehind_write() dropping a page
ref and then the final folio_put() in folio_end_writeback() frees the
page and splats on the PG_writeback. Then the folio is processed again
in the following iteration of iomap_finish_ioend() and splats on the
refcount-already-zero.
So I think folio_end_dropbehind_write() should only be done when
__folio_end_writeback() was true. Most likely even the
folio_test_clear_dropbehind() should be tied to that, or we clear it too
early and then never act upon it later?
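The ordering suggested above can be illustrated with a toy userspace model. This is only a sketch of the proposal, not kernel code: the struct, the `still_busy` field and every `toy_*` function are hypothetical, and the model simply takes the reading given in this mail (the helper returns true only when it cleared the writeback flag) as its premise without verifying it against the real helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
	int  refcount;
	bool writeback;
	bool dropbehind;
	bool still_busy;	/* hypothetical: helper cannot clear the flag yet */
};

/* Toy helper: returns true only if this call cleared the writeback flag. */
static bool toy_end_writeback_bit(struct toy_folio *f)
{
	if (f->still_busy)
		return false;
	f->writeback = false;
	return true;
}

/* Drop one reference; report a free that happens with writeback still set. */
static bool toy_put(struct toy_folio *f)
{
	return --f->refcount == 0 && f->writeback;
}

/* Ordering described as suspicious: drop-behind ignores the helper's result. */
static bool toy_end_writeback_ungated(struct toy_folio *f)
{
	bool bad = false;

	f->refcount++;			/* temporary reference */
	toy_end_writeback_bit(f);
	if (f->dropbehind) {
		f->dropbehind = false;
		bad |= toy_put(f);	/* drops the cache's reference */
	}
	bad |= toy_put(f);		/* drops the temporary reference */
	return bad;
}

/* Suggested ordering: test-and-clear and drop only when the helper was true. */
static bool toy_end_writeback_gated(struct toy_folio *f)
{
	bool bad = false;

	f->refcount++;
	if (toy_end_writeback_bit(f) && f->dropbehind) {
		f->dropbehind = false;
		bad |= toy_put(f);
	}
	bad |= toy_put(f);
	return bad;
}
```

In this model the ungated variant frees a folio whose writeback flag is still set, while the gated variant leaves both the reference and the drop-behind flag in place until a later call actually clears writeback.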
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 19:12 ` Vlastimil Babka
@ 2025-05-25 20:32 ` Linus Torvalds
2025-05-25 20:48 ` Matthew Wilcox
2025-05-26 13:05 ` Jens Axboe
1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2025-05-25 20:32 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Al Viro, Jens Axboe, Matthew Wilcox, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel
Well, this isn't great timing, since I was going to do 6.15 within the hour.
On Sun, 25 May 2025 at 12:11, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> I'm not familiar with this code too much, but I suspect problem was
> introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
> pages when writeback completes") and only (more) exposed here.
That bug goes back to 6.13 if so.
But yeah, maybe the drop-behind case never triggers in practice, and I
should just revert commit 974c5e6139db ("xfs: flag as supporting
FOP_DONTCACHE") for now.
That's kind of sad too, but at least that's new to 6.15 and we
wouldn't have a kernel release that triggers this issue.
I realize that Vlastimil had a suggested possible fix, but doing
_that_ kind of surgery at this point in the release isn't an option,
I'm afraid. And delaying 6.15 for this also seems a bit excessive - if
it turns out to be easy to fix, we can always just backport the fix
and undo the revert.
Sounds like a plan?
I'm somewhat surprised that this was only noticed now if it triggers
so easily for Al with xfstests on xfs. But better late than never, I
guess..
Linus
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 20:32 ` Linus Torvalds
@ 2025-05-25 20:48 ` Matthew Wilcox
2025-05-25 20:54 ` Linus Torvalds
2025-05-25 21:49 ` Al Viro
0 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2025-05-25 20:48 UTC (permalink / raw)
To: Linus Torvalds
Cc: Vlastimil Babka, Al Viro, Jens Axboe, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel
On Sun, May 25, 2025 at 01:32:33PM -0700, Linus Torvalds wrote:
> But yeah, maybe the drop-behind case never triggers in practice, and I
> should just revert commit 974c5e6139db ("xfs: flag as supporting
> FOP_DONTCACHE") for now.
>
> That's kind of sad too, but at least that's new to 6.15 and we
> wouldn't have a kernel release that triggers this issue.
>
> I realize that Vlastimil had a suggested possible fix, but doing
> _that_ kind of surgery at this point in the release isn't an option,
> I'm afraid. And delaying 6.15 for this also seems a bit excessive - if
> it turns out to be easy to fix, we can always just backport the fix
> and undo the revert.
>
> Sounds like a plan?
>
> I'm somewhat surprised that this was only noticed now if it triggers
> so easily for Al with xfstests on xfs. But better late than never, I
> guess..
I wonder if we shouldn't do ...
+++ b/include/linux/fs.h
@@ -3725,6 +3725,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
return -EOPNOTSUPP;
}
if (flags & RWF_DONTCACHE) {
+ /* Houston, we have a problem */
+ return -EOPNOTSUPP;
/* file system must support it */
if (!(ki->ki_filp->f_op->fop_flags & FOP_DONTCACHE))
return -EOPNOTSUPP;
in case some other filesystem adds support for it? I don't see anything
in -next right now, but I see Darrick playing with it here for FUSE:
https://lore.kernel.org/all/174787195629.1483178.7917092102987513364.stgit@frogsfrogsfrogs/
Jeff playing with it for nfsd here:
https://lore.kernel.org/all/370dd4ae06d44f852342b7ee2b969fc544bd1213.camel@kernel.org/
Trond implementing it for NFS client here:
https://lore.kernel.org/all/cover.1745381692.git.trond.myklebust@hammerspace.com/
I thought I saw someone implement it for ext4, but perhaps I'm confused
with something else. Anyway, some kind of not-xfs-specific patch is
appropriate here, I think?
Oh, and we're only just seeing it, I think, because you need to recompile
xfstests to test this functionality ... and I certainly don't re-pull
and re-compile xfstests on a regular basis; I just use the one I pulled
and compiled, um, months ago.
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 20:48 ` Matthew Wilcox
@ 2025-05-25 20:54 ` Linus Torvalds
2025-05-25 21:49 ` Al Viro
1 sibling, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2025-05-25 20:54 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Vlastimil Babka, Al Viro, Jens Axboe, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel
On Sun, 25 May 2025 at 13:48, Matthew Wilcox <willy@infradead.org> wrote:
>
> I wonder if we shouldn't do ...
>
> +++ b/include/linux/fs.h
> @@ -3725,6 +3725,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
> return -EOPNOTSUPP;
> }
> if (flags & RWF_DONTCACHE) {
> + /* Houston, we have a problem */
> + return -EOPNOTSUPP;
Hmm. Your point about other filesystems is well taken.
I'd have preferred a revert as a "don't do anything new at this
point", but I guess disabling it at this point is probably the safer
option considering that this isn't a xfs issue.
> Oh, and we're only just seeing it, I think, because you need to recompile
> xfstests to test this functionality ...
Ahh, good. Well, not "good" exactly, but it certainly at least
explains the unlucky timing.
Thanks,
Linus
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 20:48 ` Matthew Wilcox
2025-05-25 20:54 ` Linus Torvalds
@ 2025-05-25 21:49 ` Al Viro
2025-05-25 22:05 ` Linus Torvalds
1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2025-05-25 21:49 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Linus Torvalds, Vlastimil Babka, Jens Axboe, Jan Kara,
Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel
On Sun, May 25, 2025 at 09:48:45PM +0100, Matthew Wilcox wrote:
> On Sun, May 25, 2025 at 01:32:33PM -0700, Linus Torvalds wrote:
> > But yeah, maybe the drop-behind case never triggers in practice, and I
> > should just revert commit 974c5e6139db ("xfs: flag as supporting
> > FOP_DONTCACHE") for now.
> >
> > That's kind of sad too, but at least that's new to 6.15 and we
> > wouldn't have a kernel release that triggers this issue.
> >
> > I realize that Vlastimil had a suggested possible fix, but doing
> > _that_ kind of surgery at this point in the release isn't an option,
> > I'm afraid. And delaying 6.15 for this also seems a bit excessive - if
> > it turns out to be easy to fix, we can always just backport the fix
> > and undo the revert.
> >
> > Sounds like a plan?
> >
> > I'm somewhat surprised that this was only noticed now if it triggers
> > so easily for Al with xfstests on xfs. But better late than never, I
> > guess..
>
> I wonder if we shouldn't do ...
>
> +++ b/include/linux/fs.h
> @@ -3725,6 +3725,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
> return -EOPNOTSUPP;
> }
> if (flags & RWF_DONTCACHE) {
> + /* Houston, we have a problem */
> + return -EOPNOTSUPP;
> /* file system must support it */
> if (!(ki->ki_filp->f_op->fop_flags & FOP_DONTCACHE))
> return -EOPNOTSUPP;
>
Perhaps
-#define FOP_DONTCACHE ((__force fop_flags_t)(1 << 7))
+#define FOP_DONTCACHE 0 // ((__force fop_flags_t)(1 << 7)) when shit gets fixed
instead?
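Why defining the flag as 0 disables the feature without touching the check itself: `flags & 0` is always 0, so the existing "file system must support it" test rejects every RWF_DONTCACHE request, while filesystems that OR the flag into their fop_flags keep compiling unchanged. A small userspace sketch of that effect, with all `toy_`/`TOY_` names being stand-ins rather than the real fs.h definitions:

```c
#include <assert.h>
#include <errno.h>

typedef unsigned int toy_fop_flags_t;	/* stand-in for fop_flags_t */

#define TOY_FOP_DONTCACHE_ENABLED	((toy_fop_flags_t)(1 << 7))
#define TOY_FOP_DONTCACHE_DISABLED	((toy_fop_flags_t)0)

/*
 * Same shape as the quoted check: reject the request unless the file's
 * flags contain the bit. The bit is passed in so both definitions can be
 * compared side by side.
 */
static int toy_check_dontcache(toy_fop_flags_t file_flags,
			       toy_fop_flags_t dontcache_bit)
{
	if (!(file_flags & dontcache_bit))
		return -EOPNOTSUPP;
	return 0;
}
```

With the bit defined as 0, even a file whose flags have bit 7 set gets -EOPNOTSUPP, which is the behaviour the one-line change is after.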
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 21:49 ` Al Viro
@ 2025-05-25 22:05 ` Linus Torvalds
0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2025-05-25 22:05 UTC (permalink / raw)
To: Al Viro
Cc: Matthew Wilcox, Vlastimil Babka, Jens Axboe, Jan Kara,
Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel
On Sun, 25 May 2025 at 14:49, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Perhaps
>
> -#define FOP_DONTCACHE ((__force fop_flags_t)(1 << 7))
> +#define FOP_DONTCACHE 0 // ((__force fop_flags_t)(1 << 7)) when shit gets fixed
>
> instead?
Yeah, I think that ends up being prettier than an extra error return
in the middle of code.
Will do. Thanks for noticing this, even if the timing is awkward.
Linus
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 19:12 ` Vlastimil Babka
2025-05-25 20:32 ` Linus Torvalds
@ 2025-05-26 13:05 ` Jens Axboe
2025-05-26 15:06 ` Jens Axboe
1 sibling, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2025-05-26 13:05 UTC (permalink / raw)
To: Vlastimil Babka, Al Viro, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/25/25 1:12 PM, Vlastimil Babka wrote:
> On 5/25/25 8:06 PM, Al Viro wrote:
>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>
>>> Breakage is still present in the current mainline ;-/
>>
>> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>>
>> [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
>> [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
>> [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>> [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>
> It doesn't like the writeback flag.
>
>> [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>> [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
>> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
>> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
>> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
>> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
>> [ 1448.960938] Call Trace:g
>> [ 1448.960939] <TASK>g
>> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
>> [ 1448.960953] bad_page+0x6f/0x100g
>> [ 1448.960957] free_frozen_pages+0x471/0x640g
>> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
>> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
>> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
>> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
>> [ 1448.961036] process_one_work+0x153/0x390g
>> [ 1448.961044] worker_thread+0x2ab/0x3b0g
>> [ 1448.961045] ? rescuer_thread+0x470/0x470g
>> [ 1448.961047] kthread+0xf7/0x200g
>> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
>> [ 1448.961049] ret_from_fork+0x2d/0x50g
>> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
>> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
>> [ 1448.961058] </TASK>g
>> [ 1448.961155] Disabling lock debugging due to kernel taintg
>> [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>
> same pfn, same struct page
>
>> [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>> [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>> [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>> [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
>> [ 1448.972490] ------------[ cut here ]------------g
>> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
>
> this is folio_get() noticing refcount is 0, so a use-after free, because
> we already tried to free the page above.
>
> I'm not familiar with this code too much, but I suspect problem was
> introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
> pages when writeback completes") and only (more) exposed here.
>
> so in folio_end_writeback() we have
> if (__folio_end_writeback(folio))
> folio_wake_bit(folio, PG_writeback);
>
> but calling the folio_end_dropbehind_write() doesn't depend on the
> result of __folio_end_writeback()
> this seems rather suspicious
>
> I think if __folio_end_writeback() was true then PG_writeback would be
> cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure.
> Instead we do a premature folio_end_dropbehind_write() dropping a page
> ref and then the final folio_put() in folio_end_writeback() frees the
> page and splats on the PG_writeback. Then the folio is processed again
> in the following iteration of iomap_finish_ioend() and splats on the
> refcount-already-zero.
>
> So I think folio_end_dropbehind_write() should only be done when
> __folio_end_writeback() was true. Most likely even the
> folio_test_clear_dropbehind() should be tied to that, or we clear it too
> early and then never act upon it later?
Thanks for taking a look at this! I tried to reproduce this this morning
and failed miserably. I then injected a delay for the above case, and it
does indeed then trigger for me. So far, so good.
I agree with your analysis, we should only be doing the dropbehind for a
non-zero return from __folio_end_writeback(), and that includes the
test_and_clear to avoid dropping the drop-behind state. But we also need
to check/clear this state pre __folio_end_writeback(), which then puts
us in a spot where it needs to potentially be re-set. Which feels pretty
racy...
I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
suspect this would've taken a while to run into.
--
Jens Axboe
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 13:05 ` Jens Axboe
@ 2025-05-26 15:06 ` Jens Axboe
2025-05-26 15:31 ` Vlastimil Babka
2025-05-26 17:38 ` Jens Axboe
0 siblings, 2 replies; 24+ messages in thread
From: Jens Axboe @ 2025-05-26 15:06 UTC (permalink / raw)
To: Vlastimil Babka, Al Viro, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/26/25 7:05 AM, Jens Axboe wrote:
> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>> On 5/25/25 8:06 PM, Al Viro wrote:
>>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>>
>>>> Breakage is still present in the current mainline ;-/
>>>
>>> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>>>
>>> [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
>>> [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
>>> [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>> [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>>
>> It doesn't like the writeback flag.
>>
>>> [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>>> [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>>> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
>>> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
>>> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
>>> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
>>> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
>>> [ 1448.960938] Call Trace:g
>>> [ 1448.960939] <TASK>g
>>> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
>>> [ 1448.960953] bad_page+0x6f/0x100g
>>> [ 1448.960957] free_frozen_pages+0x471/0x640g
>>> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
>>> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
>>> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
>>> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
>>> [ 1448.961036] process_one_work+0x153/0x390g
>>> [ 1448.961044] worker_thread+0x2ab/0x3b0g
>>> [ 1448.961045] ? rescuer_thread+0x470/0x470g
>>> [ 1448.961047] kthread+0xf7/0x200g
>>> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
>>> [ 1448.961049] ret_from_fork+0x2d/0x50g
>>> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
>>> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
>>> [ 1448.961058] </TASK>g
>>> [ 1448.961155] Disabling lock debugging due to kernel taintg
>>> [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>
>> same pfn, same struct page
>>
>>> [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>>> [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>>> [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>>> [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
>>> [ 1448.972490] ------------[ cut here ]------------g
>>> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
>>
>> this is folio_get() noticing refcount is 0, so a use-after free, because
>> we already tried to free the page above.
>>
>> I'm not familiar with this code too much, but I suspect problem was
>> introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
>> pages when writeback completes") and only (more) exposed here.
>>
>> so in folio_end_writeback() we have
>> if (__folio_end_writeback(folio))
>> folio_wake_bit(folio, PG_writeback);
>>
>> but calling the folio_end_dropbehind_write() doesn't depend on the
>> result of __folio_end_writeback()
>> this seems rather suspicious
>>
>> I think if __folio_end_writeback() was true then PG_writeback would be
>> cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure.
>> Instead we do a premature folio_end_dropbehind_write() dropping a page
>> ref and then the final folio_put() in folio_end_writeback() frees the
>> page and splats on the PG_writeback. Then the folio is processed again
>> in the following iteration of iomap_finish_ioend() and splats on the
>> refcount-already-zero.
>>
>> So I think folio_end_dropbehind_write() should only be done when
>> __folio_end_writeback() was true. Most likely even the
>> folio_test_clear_dropbehind() should be tied to that, or we clear it too
>> early and then never act upon it later?
>
> Thanks for taking a look at this! I tried to reproduce this this morning
> and failed miserably. I then injected a delay for the above case, and it
> does indeed then trigger for me. So far, so good.
>
> I agree with your analysis, we should only be doing the dropbehind for a
> non-zero return from __folio_end_writeback(), and that includes the
> test_and_clear to avoid dropping the drop-behind state. But we also need
> to check/clear this state pre __folio_end_writeback(), which then puts
> us in a spot where it needs to potentially be re-set. Which fails pretty
> racy...
>
> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
> suspect this would've taken a while to run into.
Took a closer look... I may be smoking something good here, but I don't
see what the __folio_end_writeback() return value has to do with this
at all. Regardless of what it returns, it should've cleared
PG_writeback, and in fact the only thing it returns is whether or not we
had anyone waiting on it. Which should have _zero_ bearing on whether or
not we can clear/invalidate the range.
To me, this smells more like a race of some sort, between dirty and
invalidation. fsx does a lot of sub-page sized operations.
I'll poke a bit more...
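
To make the point about the return value concrete, here is a minimal userspace sketch, not kernel code. The names model_folio, MODEL_WRITEBACK, MODEL_WAITERS and model_end_writeback() are hypothetical and the bit positions are arbitrary. It models only the claim above: the writeback bit is cleared unconditionally, and the return value reports nothing except whether a waiter was registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; not the kernel's page flag layout. */
#define MODEL_WRITEBACK (1u << 0)
#define MODEL_WAITERS   (1u << 1)

struct model_folio {
	unsigned int flags;
};

/*
 * Model of ending writeback: the writeback bit is always cleared, and the
 * return value says only whether somebody was waiting on the folio.
 */
static bool model_end_writeback(struct model_folio *folio)
{
	unsigned int old = folio->flags;

	/* Ending writeback on a folio that is not under writeback is a bug. */
	assert(old & MODEL_WRITEBACK);

	folio->flags = old & ~MODEL_WRITEBACK;
	return (old & MODEL_WAITERS) != 0;
}
```

In this model a caller that acts only on a true return would skip the follow-up work whenever nobody happened to be waiting, even though the writeback bit is already clear in both cases.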
--
Jens Axboe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 15:06 ` Jens Axboe
@ 2025-05-26 15:31 ` Vlastimil Babka
2025-05-26 15:58 ` Jens Axboe
2025-05-26 17:38 ` Jens Axboe
1 sibling, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2025-05-26 15:31 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/26/25 17:06, Jens Axboe wrote:
> On 5/26/25 7:05 AM, Jens Axboe wrote:
>> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>>
>> Thanks for taking a look at this! I tried to reproduce this this morning
>> and failed miserably. I then injected a delay for the above case, and it
>> does indeed then trigger for me. So far, so good.
>>
>> I agree with your analysis, we should only be doing the dropbehind for a
>> non-zero return from __folio_end_writeback(), and that includes the
>> test_and_clear to avoid dropping the drop-behind state. But we also need
>> to check/clear this state pre __folio_end_writeback(), which then puts
>> us in a spot where it needs to potentially be re-set. Which fails pretty
>> racy...
>>
>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
>> suspect this would've taken a while to run into.
>
> Took a closer look... I may be smoking something good here, but I don't
> see what the __folio_end_writeback() return value has to do with this
> at all. Regardless of what it returns, it should've cleared
> PG_writeback, and in fact the only thing it returns is whether or not we
> had anyone waiting on it. Which should have _zero_ bearing on whether or
> not we can clear/invalidate the range.
Yeah, it's very much possible that I was wrong; folio_xor_flags_has_waiters()
looked a bit impenetrable to me, and it seemed like a simple explanation for
the splats. But as you had to add delays, this indeed smells like a race.
> To me, this smells more like a race of some sort, between dirty and
> invalidation. fsx does a lot of sub-page sized operations.
>
> I'll poke a bit more...
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 15:31 ` Vlastimil Babka
@ 2025-05-26 15:58 ` Jens Axboe
0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2025-05-26 15:58 UTC (permalink / raw)
To: Vlastimil Babka, Al Viro, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/26/25 9:31 AM, Vlastimil Babka wrote:
> On 5/26/25 17:06, Jens Axboe wrote:
>> On 5/26/25 7:05 AM, Jens Axboe wrote:
>>> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>>>
>>> Thanks for taking a look at this! I tried to reproduce this this morning
>>> and failed miserably. I then injected a delay for the above case, and it
>>> does indeed then trigger for me. So far, so good.
>>>
>>> I agree with your analysis, we should only be doing the dropbehind for a
>>> non-zero return from __folio_end_writeback(), and that includes the
>>> test_and_clear to avoid dropping the drop-behind state. But we also need
>>> to check/clear this state pre __folio_end_writeback(), which then puts
>>> us in a spot where it needs to potentially be re-set. Which fails pretty
>>> racy...
>>>
>>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
>>> suspect this would've taken a while to run into.
>>
>> Took a closer look... I may be smoking something good here, but I don't
>> see what the __folio_end_writeback() return value has to do with this
>> at all. Regardless of what it returns, it should've cleared
>> PG_writeback, and in fact the only thing it returns is whether or not we
>> had anyone waiting on it. Which should have _zero_ bearing on whether or
>> not we can clear/invalidate the range.
>
> Yeah, it's very much possible that I was wrong; folio_xor_flags_has_waiters()
> looked a bit impenetrable to me, and it seemed like a simple explanation for
> the splats. But as you had to add delays, this indeed smells like a race.
Here's my delay trace fwiw, which is a bit different:
BUG: Bad page state in process fsx pfn:4866b
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x25 pfn:0x4866b
flags: 0x3ffe0000000000a(uptodate|writeback|node=0|zone=0|lastcpupid=0x1fff)
raw: 03ffe0000000000a dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000025 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Modules linked in:
CPU: 6 UID: 0 PID: 1853 Comm: fsx Not tainted 6.15.0-rc7-00144-gb1427432d3b6-dirty #1053 NONE
Hardware name: linux,dummy-virt (DT)
Call trace:
show_stack+0x1c/0x30 (C)
dump_stack_lvl+0x58/0x78
dump_stack+0x18/0x20
bad_page+0x1a4/0x228
free_unref_folios+0xc2c/0x1920
folios_put_refs+0x354/0x5f0
__folio_batch_release+0x98/0xd0
writeback_iter+0x8f8/0xd00
iomap_writepages+0x16e4/0x2090
xfs_vm_writepages+0x200/0x2c0
do_writepages+0x148/0x7c0
filemap_fdatawrite_wbc+0xe0/0x138
__filemap_fdatawrite_range+0xb0/0x100
filemap_write_and_wait_range+0x68/0x100
__generic_remap_file_range_prep+0x418/0x1090
generic_remap_file_range_prep+0x18/0x80
xfs_reflink_remap_prep+0x160/0x7d8
xfs_file_remap_range+0x164/0xa90
vfs_dedupe_file_range_one+0x398/0x4a0
vfs_dedupe_file_range+0x410/0x648
do_vfs_ioctl+0x13c4/0x1fc0
__arm64_sys_ioctl+0xd8/0x188
invoke_syscall.constprop.0+0x60/0x2a0
el0_svc_common.constprop.0+0x148/0x240
do_el0_svc+0x40/0x60
el0_svc+0x34/0x70
el0t_64_sync_handler+0x104/0x138
el0t_64_sync+0x170/0x178
Disabling lock debugging due to kernel taint
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x25 pfn:0x4866b
flags: 0x3ffe0000000000a(uptodate|writeback|node=0|zone=0|lastcpupid=0x1fff)
raw: 03ffe0000000000a dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000025 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
------------[ cut here ]------------
kernel BUG at ./include/linux/mm.h:1543!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 6 UID: 0 PID: 0 Comm: swapper/6 Tainted: G B 6.15.0-rc7-00144-gb1427432d3b6-dirty #1053 NONE
Tainted: [B]=BAD_PAGE
Hardware name: linux,dummy-virt (DT)
pstate: 614000c5 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : folio_end_writeback+0x470/0x560
lr : folio_end_writeback+0x470/0x560
sp : ffff8000859978f0
x29: ffff8000859978f0 x28: dfff800000000000 x27: fffffdffc0219ac0
x26: 0000000000000000 x25: ffff000005ed8138 x24: 0000000000000000
x23: 1fffffbff804335e x22: 0000000000000004 x21: 0000000000000001
x20: fffffdffc0219af4 x19: fffffdffc0219ac0 x18: 000000000000000f
x17: 635f6665725f6f69 x16: 6c6f662029746e69 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff60001b67150b
x11: 1fffe0001b67150a x10: ffff60001b67150a x9 : dfff800000000000
x8 : 00009fffe498eaf6 x7 : ffff0000db38a853 x6 : 0000000000000001
x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0000c1f98000 x0 : 000000000000005c
Call trace:
folio_end_writeback+0x470/0x560 (P)
iomap_finish_ioend_buffered+0x38c/0x9e0
iomap_writepage_end_bio+0x80/0xc0
bio_endio+0x4dc/0x678
blk_mq_end_request_batch+0x2b4/0x10c0
nvme_pci_complete_batch+0x338/0x518
nvme_irq+0xd8/0xf0
__handle_irq_event_percpu+0xdc/0x528
handle_irq_event+0x174/0x3d8
handle_fasteoi_irq+0x2cc/0xba0
handle_irq_desc+0xb8/0x120
generic_handle_domain_irq+0x20/0x30
gic_handle_irq+0x50/0x140
call_on_irq_stack+0x24/0x50
do_interrupt_handler+0xe0/0x148
el1_interrupt+0x30/0x50
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x6c/0x70
do_idle+0x244/0x4c8 (P)
cpu_startup_entry+0x64/0x80
secondary_start_kernel+0x1e4/0x240
__secondary_switched+0x74/0x78
Code: 91190021 91218021 aa1303e0 94039279 (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0000,000000e0,0109a650,834e7607
Memory Limit: none
---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
--
Jens Axboe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 15:06 ` Jens Axboe
2025-05-26 15:31 ` Vlastimil Babka
@ 2025-05-26 17:38 ` Jens Axboe
2025-05-26 23:56 ` Al Viro
2025-05-27 0:51 ` Trond Myklebust
1 sibling, 2 replies; 24+ messages in thread
From: Jens Axboe @ 2025-05-26 17:38 UTC (permalink / raw)
To: Vlastimil Babka, Al Viro, Matthew Wilcox, Jan Kara
Cc: Christoph Hellwig, Darrick J. Wong, Christian Brauner,
linux-fsdevel, Linus Torvalds
On 5/26/25 9:06 AM, Jens Axboe wrote:
> On 5/26/25 7:05 AM, Jens Axboe wrote:
>> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>>> On 5/25/25 8:06 PM, Al Viro wrote:
>>>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>>>
>>>>> Breakage is still present in the current mainline ;-/
>>>>
>>>> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>>>>
>>>> [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
>>>> [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
>>>> [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>>> [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>>>
>>> It doesn't like the writeback flag.
>>>
>>>> [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>>>> [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>>>> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
>>>> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
>>>> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
>>>> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
>>>> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
>>>> [ 1448.960938] Call Trace:g
>>>> [ 1448.960939] <TASK>g
>>>> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
>>>> [ 1448.960953] bad_page+0x6f/0x100g
>>>> [ 1448.960957] free_frozen_pages+0x471/0x640g
>>>> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
>>>> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
>>>> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
>>>> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
>>>> [ 1448.961036] process_one_work+0x153/0x390g
>>>> [ 1448.961044] worker_thread+0x2ab/0x3b0g
>>>> [ 1448.961045] ? rescuer_thread+0x470/0x470g
>>>> [ 1448.961047] kthread+0xf7/0x200g
>>>> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
>>>> [ 1448.961049] ret_from_fork+0x2d/0x50g
>>>> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
>>>> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
>>>> [ 1448.961058] </TASK>g
>>>> [ 1448.961155] Disabling lock debugging due to kernel taintg
>>>> [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>>
>>> same pfn, same struct page
>>>
>>>> [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>>>> [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>>>> [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>>>> [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
>>>> [ 1448.972490] ------------[ cut here ]------------g
>>>> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
>>>
>>> this is folio_get() noticing refcount is 0, so a use-after free, because
>>> we already tried to free the page above.
>>>
>>> I'm not familiar with this code too much, but I suspect problem was
>>> introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
>>> pages when writeback completes") and only (more) exposed here.
>>>
>>> so in folio_end_writeback() we have
>>> if (__folio_end_writeback(folio))
>>> folio_wake_bit(folio, PG_writeback);
>>>
>>> but calling the folio_end_dropbehind_write() doesn't depend on the
>>> result of __folio_end_writeback()
>>> this seems rather suspicious
>>>
>>> I think if __folio_end_writeback() was true then PG_writeback would be
>>> cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure.
>>> Instead we do a premature folio_end_dropbehind_write() dropping a page
>>> ref and then the final folio_put() in folio_end_writeback() frees the
>>> page and splats on the PG_writeback. Then the folio is processed again
>>> in the following iteration of iomap_finish_ioend() and splats on the
>>> refcount-already-zero.
>>>
>>> So I think folio_end_dropbehind_write() should only be done when
>>> __folio_end_writeback() was true. Most likely even the
>>> folio_test_clear_dropbehind() should be tied to that, or we clear it too
>>> early and then never act upon it later?
>>
>> Thanks for taking a look at this! I tried to reproduce this this morning
>> and failed miserably. I then injected a delay for the above case, and it
>> does indeed then trigger for me. So far, so good.
>>
>> I agree with your analysis, we should only be doing the dropbehind for a
>> non-zero return from __folio_end_writeback(), and that includes the
>> test_and_clear to avoid dropping the drop-behind state. But we also need
>> to check/clear this state pre __folio_end_writeback(), which then puts
>> us in a spot where it needs to potentially be re-set. Which fails pretty
>> racy...
>>
>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
>> suspect this would've taken a while to run into.
>
> Took a closer look... I may be smoking something good here, but I don't
> see what the __folio_end_writeback() return value has to do with this
> at all. Regardless of what it returns, it should've cleared
> PG_writeback, and in fact the only thing it returns is whether or not we
> had anyone waiting on it. Which should have _zero_ bearing on whether or
> not we can clear/invalidate the range.
>
> To me, this smells more like a race of some sort, between dirty and
> invalidation. fsx does a lot of sub-page sized operations.
>
> I'll poke a bit more...
I _think_ we're racing with the same folio being marked for writeback
again. Al, can you try the below?
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b90cbeb4a1a..e95b184a2459 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1604,7 +1604,7 @@ static void folio_end_dropbehind_write(struct folio *folio)
 	 * invalidation in that case.
 	 */
 	if (in_task() && folio_trylock(folio)) {
-		if (folio->mapping)
+		if (folio->mapping && !folio_test_writeback(folio))
 			folio_unmap_invalidate(folio->mapping, folio, 0);
 		folio_unlock(folio);
 	}
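
The effect of the extra check can be sketched in a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: model_folio, MODEL_WRITEBACK and model_dropbehind_locked() are hypothetical names, and the "invalidated" field merely records that the model dropped the folio. The function is assumed to run with the folio lock already held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's page flag layout. */
#define MODEL_WRITEBACK (1u << 0)

struct model_folio {
	unsigned int flags;
	void *mapping;      /* NULL once the folio has left the page cache */
	bool invalidated;   /* set when the model "drops" the folio */
};

/* Called with the (modelled) folio lock held. */
static void model_dropbehind_locked(struct model_folio *folio)
{
	/* Skip if already removed, or if writeback was started again meanwhile. */
	if (folio->mapping == NULL || (folio->flags & MODEL_WRITEBACK))
		return;

	folio->invalidated = true;
	folio->mapping = NULL;
}
```

A folio that has been marked for writeback again is left alone here, so the later writeback completion still finds it in the cache with its reference intact.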
--
Jens Axboe
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 17:38 ` Jens Axboe
@ 2025-05-26 23:56 ` Al Viro
2025-05-27 0:58 ` Jens Axboe
2025-05-27 0:51 ` Trond Myklebust
1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2025-05-26 23:56 UTC (permalink / raw)
To: Jens Axboe
Cc: Vlastimil Babka, Matthew Wilcox, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel, Linus Torvalds
On Mon, May 26, 2025 at 11:38:53AM -0600, Jens Axboe wrote:
> > I'll poke a bit more...
>
> I _think_ we're racing with the same folio being marked for writeback
> again. Al, can you try the below?
It seems to survive on top of v6.15^^
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 17:38 ` Jens Axboe
2025-05-26 23:56 ` Al Viro
@ 2025-05-27 0:51 ` Trond Myklebust
2025-05-27 0:56 ` Jens Axboe
1 sibling, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2025-05-27 0:51 UTC (permalink / raw)
To: willy@infradead.org, jack@suse.cz, axboe@kernel.dk,
viro@zeniv.linux.org.uk, vbabka@suse.cz
Cc: hch@lst.de, djwong@kernel.org, brauner@kernel.org,
linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org
On Mon, 2025-05-26 at 11:38 -0600, Jens Axboe wrote:
> On 5/26/25 9:06 AM, Jens Axboe wrote:
> > On 5/26/25 7:05 AM, Jens Axboe wrote:
> > > On 5/25/25 1:12 PM, Vlastimil Babka wrote:
> > > > On 5/25/25 8:06 PM, Al Viro wrote:
> > > > > On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> > > > >
> > > > > > Breakage is still present in the current mainline ;-/
> > > > >
> > > > > With CONFIG_DEBUG_VM on top of pagealloc debugging:
> > > > >
> > > > > [ 1434.992817] run fstests generic/127 at 2025-05-25
> > > > > 11:46:11g
> > > > > [ 1448.956242] BUG: Bad page state in process kworker/2:1
> > > > > pfn:112cb0g
> > > > > [ 1448.956846] page: refcount:0 mapcount:0
> > > > > mapping:0000000000000000 index:0x3e pfn:0x112cb0g
> > > > > [ 1448.957453] flags:
> > > > > 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
> > > >
> > > > It doesn't like the writeback flag.
> > > >
> > > > > [ 1448.957863] raw: 800000000000000e dead000000000100
> > > > > dead000000000122 0000000000000000g
> > > > > [ 1448.958303] raw: 000000000000003e 0000000000000000
> > > > > 00000000ffffffff 0000000000000000g
> > > > > [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE
> > > > > flag(s) setg
> > > > > [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd
> > > > > auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs
> > > > > 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2
> > > > > btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress
> > > > > sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4
> > > > > i2c_smbus libata e1000g
> > > > > [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not
> > > > > tainted 6.14.0-rc1+ #78g
> > > > > [ 1448.960878] Hardware name: QEMU Standard PC (i440FX +
> > > > > PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
> > > > > [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
> > > > > [ 1448.960938] Call Trace:g
> > > > > [ 1448.960939] <TASK>g
> > > > > [ 1448.960940] dump_stack_lvl+0x4f/0x60g
> > > > > [ 1448.960953] bad_page+0x6f/0x100g
> > > > > [ 1448.960957] free_frozen_pages+0x471/0x640g
> > > > > [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
> > > > > [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
> > > > > [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
> > > > > [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
> > > > > [ 1448.961036] process_one_work+0x153/0x390g
> > > > > [ 1448.961044] worker_thread+0x2ab/0x3b0g
> > > > > [ 1448.961045] ? rescuer_thread+0x470/0x470g
> > > > > [ 1448.961047] kthread+0xf7/0x200g
> > > > > [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
> > > > > [ 1448.961049] ret_from_fork+0x2d/0x50g
> > > > > [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
> > > > > [ 1448.961054] ret_from_fork_asm+0x11/0x20g
> > > > > [ 1448.961058] </TASK>g
> > > > > [ 1448.961155] Disabling lock debugging due to kernel taintg
> > > > > [ 1448.969569] page: refcount:0 mapcount:0
> > > > > mapping:0000000000000000 index:0x3e pfn:0x112cb0g
> > > >
> > > > same pfn, same struct page
> > > >
> > > > > [ 1448.970023] flags:
> > > > > 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
> > > > > [ 1448.970651] raw: 800000000000000e dead000000000100
> > > > > dead000000000122 0000000000000000g
> > > > > [ 1448.971222] raw: 000000000000003e 0000000000000000
> > > > > 00000000ffffffff 0000000000000000g
> > > > > [ 1448.971812] page dumped because:
> > > > > VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u
> > > > > <= 127u))g
> > > > > [ 1448.972490] ------------[ cut here ]------------g
> > > > > [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
> > > >
> > > > this is folio_get() noticing refcount is 0, so a use-after
> > > > free, because
> > > > we already tried to free the page above.
> > > >
> > > > I'm not familiar with this code too much, but I suspect problem
> > > > was
> > > > introduced by commit fb7d3bc414939 ("mm/filemap: drop
> > > > streaming/uncached
> > > > pages when writeback completes") and only (more) exposed here.
> > > >
> > > > so in folio_end_writeback() we have
> > > > if (__folio_end_writeback(folio))
> > > > folio_wake_bit(folio, PG_writeback);
> > > >
> > > > but calling the folio_end_dropbehind_write() doesn't depend on
> > > > the
> > > > result of __folio_end_writeback()
> > > > this seems rather suspicious
> > > >
> > > > I think if __folio_end_writeback() was true then PG_writeback
> > > > would be
> > > > cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE
> > > > failure.
> > > > Instead we do a premature folio_end_dropbehind_write() dropping
> > > > a page
> > > > ref and then the final folio_put() in folio_end_writeback()
> > > > frees the
> > > > page and splats on the PG_writeback. Then the folio is
> > > > processed again
> > > > in the following iteration of iomap_finish_ioend() and splats
> > > > on the
> > > > refcount-already-zero.
> > > >
> > > > So I think folio_end_dropbehind_write() should only be done
> > > > when
> > > > __folio_end_writeback() was true. Most likely even the
> > > > folio_test_clear_dropbehind() should be tied to that, or we
> > > > clear it too
> > > > early and then never act upon it later?
> > >
> > > Thanks for taking a look at this! I tried to reproduce this this
> > > morning
> > > and failed miserably. I then injected a delay for the above case,
> > > and it
> > > does indeed then trigger for me. So far, so good.
> > >
> > > I agree with your analysis, we should only be doing the
> > > dropbehind for a
> > > non-zero return from __folio_end_writeback(), and that includes
> > > the
> > > test_and_clear to avoid dropping the drop-behind state. But we
> > > also need
> > > to check/clear this state pre __folio_end_writeback(), which then
> > > puts
> > > us in a spot where it needs to potentially be re-set. Which fails
> > > pretty
> > > racy...
> > >
> > > I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support,
> > > or I
> > > suspect this would've taken a while to run into.
> >
> > Took a closer look... I may be smoking something good here, but I
> > don't
> > see what the __folio_end_writeback() return value has to do with
> > this
> > at all. Regardless of what it returns, it should've cleared
> > PG_writeback, and in fact the only thing it returns is whether or
> > not we
> > had anyone waiting on it. Which should have _zero_ bearing on
> > whether or
> > not we can clear/invalidate the range.
> >
> > To me, this smells more like a race of some sort, between dirty and
> > invalidation. fsx does a lot of sub-page sized operations.
> >
> > I'll poke a bit more...
>
> I _think_ we're racing with the same folio being marked for writeback
> again. Al, can you try the below?
>
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7b90cbeb4a1a..e95b184a2459 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1604,7 +1604,7 @@ static void folio_end_dropbehind_write(struct
> folio *folio)
> * invalidation in that case.
> */
> if (in_task() && folio_trylock(folio)) {
> - if (folio->mapping)
> + if (folio->mapping && !folio_test_writeback(folio))
> folio_unmap_invalidate(folio->mapping,
> folio, 0);
> folio_unlock(folio);
> }
>
I think we need to test for PG_dirty after retaking the folio lock as
well. Nothing stops a second thread from redirtying the page once the
folio lock is dropped, and while some filesystems may insist on waiting
for PG_writeback before allowing redirtying to complete, that still
ends up racing because folio_end_dropbehind_write() is called after the
call to __folio_end_writeback().
Note that the same set of races can happen in
filemap_end_dropbehind_read(), so we need the same set of checks after
taking the folio lock there too. The existing checks are insufficient,
since they only happen before taking the folio lock.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-27 0:51 ` Trond Myklebust
@ 2025-05-27 0:56 ` Jens Axboe
0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2025-05-27 0:56 UTC (permalink / raw)
To: Trond Myklebust, willy@infradead.org, jack@suse.cz,
viro@zeniv.linux.org.uk, vbabka@suse.cz
Cc: hch@lst.de, djwong@kernel.org, brauner@kernel.org,
linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org
On 5/26/25 6:51 PM, Trond Myklebust wrote:
> On Mon, 2025-05-26 at 11:38 -0600, Jens Axboe wrote:
>> On 5/26/25 9:06 AM, Jens Axboe wrote:
>>> On 5/26/25 7:05 AM, Jens Axboe wrote:
>>>> On 5/25/25 1:12 PM, Vlastimil Babka wrote:
>>>>> On 5/25/25 8:06 PM, Al Viro wrote:
>>>>>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>>>>>
>>>>>>> Breakage is still present in the current mainline ;-/
>>>>>>
>>>>>> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>>>>>>
>>>>>> [ 1434.992817] run fstests generic/127 at 2025-05-25
>>>>>> 11:46:11g
>>>>>> [ 1448.956242] BUG: Bad page state in process kworker/2:1
>>>>>> pfn:112cb0g
>>>>>> [ 1448.956846] page: refcount:0 mapcount:0
>>>>>> mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>>>>> [ 1448.957453] flags:
>>>>>> 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>>>>>
>>>>> It doesn't like the writeback flag.
>>>>>
>>>>>> [ 1448.957863] raw: 800000000000000e dead000000000100
>>>>>> dead000000000122 0000000000000000g
>>>>>> [ 1448.958303] raw: 000000000000003e 0000000000000000
>>>>>> 00000000ffffffff 0000000000000000g
>>>>>> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE
>>>>>> flag(s) setg
>>>>>> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd
>>>>>> auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs
>>>>>> 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2
>>>>>> btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress
>>>>>> sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4
>>>>>> i2c_smbus libata e1000g
>>>>>> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not
>>>>>> tainted 6.14.0-rc1+ #78g
>>>>>> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX +
>>>>>> PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
>>>>>> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
>>>>>> [ 1448.960938] Call Trace:g
>>>>>> [ 1448.960939] <TASK>g
>>>>>> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
>>>>>> [ 1448.960953] bad_page+0x6f/0x100g
>>>>>> [ 1448.960957] free_frozen_pages+0x471/0x640g
>>>>>> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
>>>>>> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
>>>>>> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
>>>>>> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
>>>>>> [ 1448.961036] process_one_work+0x153/0x390g
>>>>>> [ 1448.961044] worker_thread+0x2ab/0x3b0g
>>>>>> [ 1448.961045] ? rescuer_thread+0x470/0x470g
>>>>>> [ 1448.961047] kthread+0xf7/0x200g
>>>>>> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
>>>>>> [ 1448.961049] ret_from_fork+0x2d/0x50g
>>>>>> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
>>>>>> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
>>>>>> [ 1448.961058] </TASK>g
>>>>>> [ 1448.961155] Disabling lock debugging due to kernel taintg
>>>>>> [ 1448.969569] page: refcount:0 mapcount:0
>>>>>> mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>>>>>
>>>>> same pfn, same struct page
>>>>>
>>>>>> [ 1448.970023] flags:
>>>>>> 0x800000000000000e(referenced|uptodate|writeback|zone=2)
>>>>>> [ 1448.970651] raw: 800000000000000e dead000000000100
>>>>>> dead000000000122 0000000000000000
>>>>>> [ 1448.971222] raw: 000000000000003e 0000000000000000
>>>>>> 00000000ffffffff 0000000000000000
>>>>>> [ 1448.971812] page dumped because:
>>>>>> VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u
>>>>>> <= 127u))
>>>>>> [ 1448.972490] ------------[ cut here ]------------
>>>>>> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!
>>>>>
>>>>> this is folio_get() noticing refcount is 0, so a use-after
>>>>> free, because
>>>>> we already tried to free the page above.
>>>>>
>>>>> I'm not familiar with this code too much, but I suspect problem
>>>>> was
>>>>> introduced by commit fb7d3bc414939 ("mm/filemap: drop
>>>>> streaming/uncached
>>>>> pages when writeback completes") and only (more) exposed here.
>>>>>
>>>>> so in folio_end_writeback() we have
>>>>> if (__folio_end_writeback(folio))
>>>>> folio_wake_bit(folio, PG_writeback);
>>>>>
>>>>> but calling the folio_end_dropbehind_write() doesn't depend on
>>>>> the
>>>>> result of __folio_end_writeback()
>>>>> this seems rather suspicious
>>>>>
>>>>> I think if __folio_end_writeback() was true then PG_writeback
>>>>> would be
>>>>> cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE
>>>>> failure.
>>>>> Instead we do a premature folio_end_dropbehind_write() dropping
>>>>> a page
>>>>> ref and then the final folio_put() in folio_end_writeback()
>>>>> frees the
>>>>> page and splats on the PG_writeback. Then the folio is
>>>>> processed again
>>>>> in the following iteration of iomap_finish_ioend() and splats
>>>>> on the
>>>>> refcount-already-zero.
>>>>>
>>>>> So I think folio_end_dropbehind_write() should only be done
>>>>> when
>>>>> __folio_end_writeback() was true. Most likely even the
>>>>> folio_test_clear_dropbehind() should be tied to that, or we
>>>>> clear it too
>>>>> early and then never act upon it later?
>>>>
>>>> Thanks for taking a look at this! I tried to reproduce this this
>>>> morning
>>>> and failed miserably. I then injected a delay for the above case,
>>>> and it
>>>> does indeed then trigger for me. So far, so good.
>>>>
>>>> I agree with your analysis, we should only be doing the
>>>> dropbehind for a
>>>> non-zero return from __folio_end_writeback(), and that includes
>>>> the
>>>> test_and_clear to avoid dropping the drop-behind state. But we
>>>> also need
>>>> to check/clear this state pre __folio_end_writeback(), which then
>>>> puts
>>>> us in a spot where it needs to potentially be re-set. Which feels
>>>> pretty
>>>> racy...
>>>>
>>>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support,
>>>> or I
>>>> suspect this would've taken a while to run into.
>>>
>>> Took a closer look... I may be smoking something good here, but I
>>> don't
>>> see what the __folio_end_writeback() return value has to do with
>>> this
>>> at all. Regardless of what it returns, it should've cleared
>>> PG_writeback, and in fact the only thing it returns is whether or
>>> not we
>>> had anyone waiting on it. Which should have _zero_ bearing on
>>> whether or
>>> not we can clear/invalidate the range.
>>>
>>> To me, this smells more like a race of some sort, between dirty and
>>> invalidation. fsx does a lot of sub-page sized operations.
>>>
>>> I'll poke a bit more...
>>
>> I _think_ we're racing with the same folio being marked for writeback
>> again. Al, can you try the below?
>>
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7b90cbeb4a1a..e95b184a2459 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1604,7 +1604,7 @@ static void folio_end_dropbehind_write(struct
>> folio *folio)
>> * invalidation in that case.
>> */
>> if (in_task() && folio_trylock(folio)) {
>> - if (folio->mapping)
>> + if (folio->mapping && !folio_test_writeback(folio))
>> folio_unmap_invalidate(folio->mapping,
>> folio, 0);
>> folio_unlock(folio);
>> }
>>
>
> I think we need to test for PG_dirty after retaking the folio lock as
> well. Nothing stops a second thread from redirtying the page once the
> folio lock is dropped, and while some filesystems may insist on waiting
> for PG_writeback before allowing redirtying to complete, that still
> ends up racing because folio_end_dropbehind_write() is called after the
> call to __folio_end_writeback().
Agree, local version actually has both as well.
> Note that the same set of races can happen in
> filemap_end_dropbehind_read(), so we need the same set of checks after
> taking the folio lock there too. The existing checks are insufficient,
> since they only happen before taking the folio lock.
Ah good catch. I'll send out the patch tomorrow.
--
Jens Axboe
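For readers following along, here is a toy userspace sketch in C of the recheck-under-lock idea discussed above: only invalidate once the folio lock is held and neither the dirty nor the writeback state is set. It is not kernel code and not the patch that was eventually sent. struct toy_folio and the toy_* helpers are hypothetical names made up for illustration, and the lock is modeled as a plain flag with no real concurrency.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the folio state bits named in the thread. */
struct toy_folio {
	bool locked;    /* models the folio lock */
	bool dirty;     /* models PG_dirty */
	bool writeback; /* models PG_writeback */
	bool mapped;    /* models folio->mapping != NULL */
};

/* Models folio_trylock(): take the lock only if nobody holds it. */
bool toy_trylock(struct toy_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

/* Models folio_unlock(). */
void toy_unlock(struct toy_folio *f)
{
	f->locked = false;
}

/*
 * Assumed shape of the check: the state is tested only after the lock
 * has been taken, so a folio that was redirtied or put back under
 * writeback in the meantime is left alone.
 * Returns true if the folio was dropped from the (toy) page cache.
 */
bool toy_end_dropbehind(struct toy_folio *f)
{
	bool dropped = false;

	if (!toy_trylock(f))
		return false;

	if (f->mapped && !f->dirty && !f->writeback) {
		f->mapped = false; /* stands in for the invalidation */
		dropped = true;
	}

	toy_unlock(f);
	return dropped;
}
```

In this model, toy_end_dropbehind() leaves a folio in place if it is dirty, under writeback, or already locked by someone else; only a clean, idle, still-mapped folio is dropped.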
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-26 23:56 ` Al Viro
@ 2025-05-27 0:58 ` Jens Axboe
2025-05-27 1:24 ` Al Viro
0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2025-05-27 0:58 UTC (permalink / raw)
To: Al Viro
Cc: Vlastimil Babka, Matthew Wilcox, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel, Linus Torvalds
On 5/26/25 5:56 PM, Al Viro wrote:
> On Mon, May 26, 2025 at 11:38:53AM -0600, Jens Axboe wrote:
>>> I'll poke a bit more...
>>
>> I _think_ we're racing with the same folio being marked for writeback
>> again. Al, can you try the below?
>
> It seems to survive on top of v6.15^^
Thanks for testing, Al! Assuming it goes without saying, but that's 6.15
with 478ad02d6844 reverted, right?
--
Jens Axboe
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-27 0:58 ` Jens Axboe
@ 2025-05-27 1:24 ` Al Viro
2025-05-27 1:29 ` Jens Axboe
0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2025-05-27 1:24 UTC (permalink / raw)
To: Jens Axboe
Cc: Vlastimil Babka, Matthew Wilcox, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel, Linus Torvalds
On Mon, May 26, 2025 at 06:58:47PM -0600, Jens Axboe wrote:
> On 5/26/25 5:56 PM, Al Viro wrote:
> > On Mon, May 26, 2025 at 11:38:53AM -0600, Jens Axboe wrote:
> >>> I'll poke a bit more...
> >>
> >> I _think_ we're racing with the same folio being marked for writeback
> >> again. Al, can you try the below?
> >
> > It seems to survive on top of v6.15^^
>
> Thanks for testing, Al! Assuming it goes without saying, but that's 6.15
> with 478ad02d6844 reverted, right?
That's 6.15 without two last commits - 478ad02d6844 and the version bump ;-)
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-27 1:24 ` Al Viro
@ 2025-05-27 1:29 ` Jens Axboe
0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2025-05-27 1:29 UTC (permalink / raw)
To: Al Viro
Cc: Vlastimil Babka, Matthew Wilcox, Jan Kara, Christoph Hellwig,
Darrick J. Wong, Christian Brauner, linux-fsdevel, Linus Torvalds
On 5/26/25 7:24 PM, Al Viro wrote:
> On Mon, May 26, 2025 at 06:58:47PM -0600, Jens Axboe wrote:
>> On 5/26/25 5:56 PM, Al Viro wrote:
>>> On Mon, May 26, 2025 at 11:38:53AM -0600, Jens Axboe wrote:
>>>>> I'll poke a bit more...
>>>>
>>>> I _think_ we're racing with the same folio being marked for writeback
>>>> again. Al, can you try the below?
>>>
>>> It seems to survive on top of v6.15^^
>>
>> Thanks for testing, Al! Assuming it goes without saying, but that's 6.15
>> with 478ad02d6844 reverted, right?
>
> That's 6.15 without two last commits - 478ad02d6844 and the version bump ;-)
OK good, I would've been confused if not, but never hurts to confirm...
FWIW, have a branch here:
https://git.kernel.dk/cgit/linux/log/?h=dontcache
with the read/write side patches, and finally the revert as well.
There's a consolidation patch that can be done on top in terms of a
cleanup, but figured it was better to keep that separate from the actual
bug fix.
--
Jens Axboe
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-25 8:32 [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?) Al Viro
2025-05-25 18:02 ` Al Viro
2025-05-25 18:06 ` Al Viro
@ 2025-05-29 1:56 ` Darrick J. Wong
2025-05-31 1:10 ` Darrick J. Wong
2 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2025-05-29 1:56 UTC (permalink / raw)
To: Al Viro
Cc: Jens Axboe, Christoph Hellwig, Christian Brauner, linux-fsdevel,
Linus Torvalds
On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> generic/127 with xfstests built on debian-testing (trixie) ends up with
> assorted memory corruption; trace below is with CONFIG_DEBUG_PAGEALLOC and
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and it looks like a double free
> somewhere in iomap. Unfortunately, commit in question is just making
> xfs use the infrastructure built in earlier series - not that useful
> for isolating the breakage.
>
> [ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
> [ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
> [ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
> [ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
> [ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
> [ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
> [ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
> [ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
> [ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
> [ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
> [ 35.503279] Call Trace:
> [ 35.503281] <TASK>
> [ 35.503282] dump_stack_lvl+0x4f/0x60
> [ 35.503296] bad_page+0x6f/0x100
> [ 35.503300] free_frozen_pages+0x303/0x550
> [ 35.503301] iomap_finish_ioend+0xf6/0x380
> [ 35.503304] iomap_finish_ioends+0x83/0xc0
> [ 35.503305] xfs_end_ioend+0x64/0x140 [xfs]
> [ 35.503342] xfs_end_io+0x93/0xc0 [xfs]
> [ 35.503378] process_one_work+0x153/0x390
> [ 35.503382] worker_thread+0x2ab/0x3b0
>
> It's 4:30am here, so I'm going to leave attempts to actually debug that
> thing until tomorrow; I do have a kvm where it's reliably reproduced
> within a few minutes, so if anyone comes up with patches, I'll be able
> to test them.
>
> Breakage is still present in the current mainline ;-/
Hey Al,
Welll this certainly looks like the same report I made a month ago.
I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
confirm if that makes my problem go away. If these are one and the same
bug, then thank you for finding a better reproducer! :)
https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
--D
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-29 1:56 ` Darrick J. Wong
@ 2025-05-31 1:10 ` Darrick J. Wong
2025-05-31 21:00 ` Jens Axboe
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2025-05-31 1:10 UTC (permalink / raw)
To: Al Viro
Cc: Jens Axboe, Christoph Hellwig, Christian Brauner, linux-fsdevel,
Linus Torvalds
On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> > generic/127 with xfstests built on debian-testing (trixie) ends up with
> > assorted memory corruption; trace below is with CONFIG_DEBUG_PAGEALLOC and
> > CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and it looks like a double free
> > somewhere in iomap. Unfortunately, commit in question is just making
> > xfs use the infrastructure built in earlier series - not that useful
> > for isolating the breakage.
> >
> > [ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
> > [ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
> > [ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
> > [ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
> > [ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
> > [ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
> > [ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > [ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
> > [ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
> > [ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
> > [ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
> > [ 35.503279] Call Trace:
> > [ 35.503281] <TASK>
> > [ 35.503282] dump_stack_lvl+0x4f/0x60
> > [ 35.503296] bad_page+0x6f/0x100
> > [ 35.503300] free_frozen_pages+0x303/0x550
> > [ 35.503301] iomap_finish_ioend+0xf6/0x380
> > [ 35.503304] iomap_finish_ioends+0x83/0xc0
> > [ 35.503305] xfs_end_ioend+0x64/0x140 [xfs]
> > [ 35.503342] xfs_end_io+0x93/0xc0 [xfs]
> > [ 35.503378] process_one_work+0x153/0x390
> > [ 35.503382] worker_thread+0x2ab/0x3b0
> >
> > It's 4:30am here, so I'm going to leave attempts to actually debug that
> > thing until tomorrow; I do have a kvm where it's reliably reproduced
> > within a few minutes, so if anyone comes up with patches, I'll be able
> > to test them.
> >
> > Breakage is still present in the current mainline ;-/
>
> Hey Al,
>
> Welll this certainly looks like the same report I made a month ago.
> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
> confirm if that makes my problem go away. If these are one and the same
> bug, then thank you for finding a better reproducer! :)
>
> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
After a full QA run, 6.15 final passes fstests with flying colors. So I
guess we now know the culprit. Will test the new RWF_DONTCACHE fixes
whenever they appear in upstream.
--D
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-31 1:10 ` Darrick J. Wong
@ 2025-05-31 21:00 ` Jens Axboe
2025-06-02 9:04 ` Christian Brauner
0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2025-05-31 21:00 UTC (permalink / raw)
To: Darrick J. Wong, Al Viro
Cc: Christoph Hellwig, Christian Brauner, linux-fsdevel,
Linus Torvalds
On 5/30/25 7:10 PM, Darrick J. Wong wrote:
> On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>> generic/127 with xfstests built on debian-testing (trixie) ends up with
>>> assorted memory corruption; trace below is with CONFIG_DEBUG_PAGEALLOC and
>>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and it looks like a double free
>>> somewhere in iomap. Unfortunately, commit in question is just making
>>> xfs use the infrastructure built in earlier series - not that useful
>>> for isolating the breakage.
>>>
>>> [ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
>>> [ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
>>> [ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
>>> [ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
>>> [ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
>>> [ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
>>> [ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> [ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
>>> [ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
>>> [ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
>>> [ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
>>> [ 35.503279] Call Trace:
>>> [ 35.503281] <TASK>
>>> [ 35.503282] dump_stack_lvl+0x4f/0x60
>>> [ 35.503296] bad_page+0x6f/0x100
>>> [ 35.503300] free_frozen_pages+0x303/0x550
>>> [ 35.503301] iomap_finish_ioend+0xf6/0x380
>>> [ 35.503304] iomap_finish_ioends+0x83/0xc0
>>> [ 35.503305] xfs_end_ioend+0x64/0x140 [xfs]
>>> [ 35.503342] xfs_end_io+0x93/0xc0 [xfs]
>>> [ 35.503378] process_one_work+0x153/0x390
>>> [ 35.503382] worker_thread+0x2ab/0x3b0
>>>
>>> It's 4:30am here, so I'm going to leave attempts to actually debug that
>>> thing until tomorrow; I do have a kvm where it's reliably reproduced
>>> within a few minutes, so if anyone comes up with patches, I'll be able
>>> to test them.
>>>
>>> Breakage is still present in the current mainline ;-/
>>
>> Hey Al,
>>
>> Welll this certainly looks like the same report I made a month ago.
>> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
>> confirm if that makes my problem go away. If these are one and the same
>> bug, then thank you for finding a better reproducer! :)
>>
>> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
>
> After a full QA run, 6.15 final passes fstests with flying colors. So I
> guess we now know the culprit. Will test the new RWF_DONTCACHE fixes
> whenever they appear in upstream.
Please do! Unfortunately I never saw your original report as I wasn't
CC'ed on it, which I can't really fault anyone for as there was no
reason to suspect it so far.
--
Jens Axboe
* Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
2025-05-31 21:00 ` Jens Axboe
@ 2025-06-02 9:04 ` Christian Brauner
0 siblings, 0 replies; 24+ messages in thread
From: Christian Brauner @ 2025-06-02 9:04 UTC (permalink / raw)
To: Jens Axboe
Cc: Darrick J. Wong, Al Viro, Christoph Hellwig, linux-fsdevel,
Linus Torvalds
> >> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
> >
> > After a full QA run, 6.15 final passes fstests with flying colors. So I
> > guess we now know the culprit. Will test the new RWF_DONTCACHE fixes
> > whenever they appear in upstream.
>
> Please do! Unfortunately I never saw your original report as I wasn't
> CC'ed on it, which I can't really fault anyone for as there was no
> reason to suspect it so far.
I've just sent the pull request with the fixes a minute ago.
Thanks for testing!
end of thread
Thread overview: 24+ messages
2025-05-25 8:32 [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?) Al Viro
2025-05-25 18:02 ` Al Viro
2025-05-25 18:06 ` Al Viro
2025-05-25 19:12 ` Vlastimil Babka
2025-05-25 20:32 ` Linus Torvalds
2025-05-25 20:48 ` Matthew Wilcox
2025-05-25 20:54 ` Linus Torvalds
2025-05-25 21:49 ` Al Viro
2025-05-25 22:05 ` Linus Torvalds
2025-05-26 13:05 ` Jens Axboe
2025-05-26 15:06 ` Jens Axboe
2025-05-26 15:31 ` Vlastimil Babka
2025-05-26 15:58 ` Jens Axboe
2025-05-26 17:38 ` Jens Axboe
2025-05-26 23:56 ` Al Viro
2025-05-27 0:58 ` Jens Axboe
2025-05-27 1:24 ` Al Viro
2025-05-27 1:29 ` Jens Axboe
2025-05-27 0:51 ` Trond Myklebust
2025-05-27 0:56 ` Jens Axboe
2025-05-29 1:56 ` Darrick J. Wong
2025-05-31 1:10 ` Darrick J. Wong
2025-05-31 21:00 ` Jens Axboe
2025-06-02 9:04 ` Christian Brauner