From: Jens Axboe <axboe@kernel.dk>
To: Vlastimil Babka <vbabka@suse.cz>,
Al Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>,
"Darrick J. Wong" <djwong@kernel.org>,
Christian Brauner <brauner@kernel.org>,
linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?)
Date: Mon, 26 May 2025 07:05:16 -0600 [thread overview]
Message-ID: <431cb497-b1ad-40a0-86b1-228b0b7490b9@kernel.dk> (raw)
In-Reply-To: <40eeba97-a298-4ae1-9691-b5911ad00095@suse.cz>
On 5/25/25 1:12 PM, Vlastimil Babka wrote:
> On 5/25/25 8:06 PM, Al Viro wrote:
>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>
>>> Breakage is still present in the current mainline ;-/
>>
>> With CONFIG_DEBUG_VM on top of pagealloc debugging:
>>
>> [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g
>> [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g
>> [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>> [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>
> It doesn't like the writeback flag.
>
>> [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>> [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>> [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg
>> [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g
>> [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g
>> [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g
>> [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g
>> [ 1448.960938] Call Trace:g
>> [ 1448.960939] <TASK>g
>> [ 1448.960940] dump_stack_lvl+0x4f/0x60g
>> [ 1448.960953] bad_page+0x6f/0x100g
>> [ 1448.960957] free_frozen_pages+0x471/0x640g
>> [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g
>> [ 1448.960963] iomap_finish_ioends+0x83/0xc0g
>> [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g
>> [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g
>> [ 1448.961036] process_one_work+0x153/0x390g
>> [ 1448.961044] worker_thread+0x2ab/0x3b0g
>> [ 1448.961045] ? rescuer_thread+0x470/0x470g
>> [ 1448.961047] kthread+0xf7/0x200g
>> [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g
>> [ 1448.961049] ret_from_fork+0x2d/0x50g
>> [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g
>> [ 1448.961054] ret_from_fork_asm+0x11/0x20g
>> [ 1448.961058] </TASK>g
>> [ 1448.961155] Disabling lock debugging due to kernel taintg
>> [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g
>
> same pfn, same struct page
>
>> [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g
>> [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g
>> [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g
>> [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g
>> [ 1448.972490] ------------[ cut here ]------------g
>> [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g
>
> this is folio_get() noticing refcount is 0, so a use-after free, because
> we already tried to free the page above.
>
> I'm not familiar with this code too much, but I suspect problem was
> introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached
> pages when writeback completes") and only (more) exposed here.
>
> so in folio_end_writeback() we have
> if (__folio_end_writeback(folio))
> folio_wake_bit(folio, PG_writeback);
>
> but calling the folio_end_dropbehind_write() doesn't depend on the
> result of __folio_end_writeback()
> this seems rather suspicious
>
> I think if __folio_end_writeback() was true then PG_writeback would be
> cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure.
> Instead we do a premature folio_end_dropbehind_write() dropping a page
> ref and then the final folio_put() in folio_end_writeback() frees the
> page and splats on the PG_writeback. Then the folio is processed again
> in the following iteration of iomap_finish_ioend() and splats on the
> refcount-already-zero.
>
> So I think folio_end_dropbehind_write() should only be done when
> __folio_end_writeback() was true. Most likely even the
> folio_test_clear_dropbehind() should be tied to that, or we clear it too
> early and then never act upon it later?
Thanks for taking a look at this! I tried to reproduce this this morning
and failed miserably. I then injected a delay for the above case, and it
does indeed then trigger for me. So far, so good.
I agree with your analysis, we should only be doing the dropbehind for a
non-zero return from __folio_end_writeback(), and that includes the
test_and_clear to avoid dropping the drop-behind state. But we also need
to check/clear this state pre __folio_end_writeback(), which then puts
us in a spot where it needs to potentially be re-set. Which fails pretty
racy...
I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I
suspect this would've taken a while to run into.
--
Jens Axboe
next prev parent reply other threads:[~2025-05-26 13:05 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-25 8:32 [BUG] regression from 974c5e6139db "xfs: flag as supporting FOP_DONTCACHE" (double free on page?) Al Viro
2025-05-25 18:02 ` Al Viro
2025-05-25 18:06 ` Al Viro
2025-05-25 19:12 ` Vlastimil Babka
2025-05-25 20:32 ` Linus Torvalds
2025-05-25 20:48 ` Matthew Wilcox
2025-05-25 20:54 ` Linus Torvalds
2025-05-25 21:49 ` Al Viro
2025-05-25 22:05 ` Linus Torvalds
2025-05-26 13:05 ` Jens Axboe [this message]
2025-05-26 15:06 ` Jens Axboe
2025-05-26 15:31 ` Vlastimil Babka
2025-05-26 15:58 ` Jens Axboe
2025-05-26 17:38 ` Jens Axboe
2025-05-26 23:56 ` Al Viro
2025-05-27 0:58 ` Jens Axboe
2025-05-27 1:24 ` Al Viro
2025-05-27 1:29 ` Jens Axboe
2025-05-27 0:51 ` Trond Myklebust
2025-05-27 0:56 ` Jens Axboe
2025-05-29 1:56 ` Darrick J. Wong
2025-05-31 1:10 ` Darrick J. Wong
2025-05-31 21:00 ` Jens Axboe
2025-06-02 9:04 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=431cb497-b1ad-40a0-86b1-228b0b7490b9@kernel.dk \
--to=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).