From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
Date: Tue, 27 May 2025 15:17:02 -0600
Message-ID: <555564a3-cc29-4fc1-a708-9ef395469d90@kernel.dk>
In-Reply-To: <aDYqtuXdLvcSl78t@dread.disaster.area>

On 5/27/25 3:12 PM, Dave Chinner wrote:
> On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote:
>> DONTCACHE I/O must have the completion punted to a workqueue, just like
>> what is done for unwritten extents, as the completion needs task context
>> to perform the invalidation of the folio(s). However, if writeback is
>> started via filemap_fdatawrite_range() from generic_sync() and it's an
>> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
>> doesn't look at the folio being added and no further state is passed down
>> to help it know that this is a dropbehind/DONTCACHE write.
>>
>> Check if the folio being added is marked as dropbehind, and set
>> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
>> the decision making of completion context in xfs_submit_ioend().
>> Additionally include this ioend flag in the NOMERGE flags, to avoid
>> mixing it with unrelated IO.
>>
>> This fixes extra page cache being instantiated when the write performed
>> is an overwrite, rather than newly instantiated blocks.
>>
>> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> Found this one while testing the unrelated issue of invalidation being a
>> bit broken before 6.15 release. We need this to ensure that overwrites
>> also prune correctly, just like unwritten extents currently do.
>
> I wondered about the stack traces showing DONTCACHE writeback
> completion being handled from irq context[*] when I read the -fsdevel
> thread about broken DONTCACHE functionality yesterday.
>
> [*] second trace in the failure reported in this comment:
>
> https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@kernel.dk/

Indeed, though that could've been a "normal" write and not a DONTCACHE
one. But with the bug being fixed by this one, both would've gone that
path...

>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 233abf598f65..3729391a18f3 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>>  		ioend_flags |= IOMAP_IOEND_UNWRITTEN;
>>  	if (wpc->iomap.flags & IOMAP_F_SHARED)
>>  		ioend_flags |= IOMAP_IOEND_SHARED;
>> +	if (folio_test_dropbehind(folio))
>> +		ioend_flags |= IOMAP_IOEND_DONTCACHE;
>>  	if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
>>  		ioend_flags |= IOMAP_IOEND_BOUNDARY;
>>
>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>> index 26a04a783489..1b7a006402ea 100644
>> --- a/fs/xfs/xfs_aops.c
>> +++ b/fs/xfs/xfs_aops.c
>> @@ -436,6 +436,9 @@ xfs_map_blocks(
>>  	return 0;
>>  }
>>
>> +#define IOEND_WQ_FLAGS	(IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
>> +			 IOMAP_IOEND_DONTCACHE)
>> +
>>  static int
>>  xfs_submit_ioend(
>>  	struct iomap_writepage_ctx *wpc,
>> @@ -460,8 +463,7 @@ xfs_submit_ioend(
>>  	memalloc_nofs_restore(nofs_flag);
>>
>>  	/* send ioends that might require a transaction to the completion wq */
>> -	if (xfs_ioend_is_append(ioend) ||
>> -	    (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
>> +	if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
>>  		ioend->io_bio.bi_end_io = xfs_end_bio;
>>
>>  	if (status)
>
> IMO, this would be cleaner as a helper so that individual cases can
> be commented correctly, as page cache invalidation does not actually
> require a transaction...
>
> Something like:
>
> static bool
> xfs_ioend_needs_wq_completion(
> 	struct xfs_ioend *ioend)
> {
> 	/* Changing inode size requires a transaction. */
> 	if (xfs_ioend_is_append(ioend))
> 		return true;
>
> 	/* Extent manipulation requires a transaction. */
> 	if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
> 		return true;
>
> 	/* Page cache invalidation cannot be done in irq context. */
> 	if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
> 		return true;
>
> 	return false;
> }
>
> Otherwise seems fine.

Yeah I like that, gets rid of the need to add the mask as well. I'll
spin a v2 and add the helper.

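
For anyone who wants to poke at the decision table outside the kernel,
here is a minimal userspace sketch of the helper's logic. It is not the
v2 patch and not kernel code: the flag values, struct mock_ioend, its
is_append field (standing in for xfs_ioend_is_append()), and the
function name are all made up for illustration. Only the three checks
and their order follow the helper suggested above.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in bit values. The real IOMAP_IOEND_* flags are
 * defined by the kernel and may use different bits.
 */
#define IOMAP_IOEND_SHARED	(1U << 0)
#define IOMAP_IOEND_UNWRITTEN	(1U << 1)
#define IOMAP_IOEND_DONTCACHE	(1U << 2)

/* Mock ioend carrying only what the completion decision looks at. */
struct mock_ioend {
	unsigned int	io_flags;
	bool		is_append;	/* assumption: models xfs_ioend_is_append() */
};

/* Return true if completion should be punted to a workqueue. */
static bool
mock_ioend_needs_wq_completion(const struct mock_ioend *ioend)
{
	/* Changing inode size requires a transaction. */
	if (ioend->is_append)
		return true;

	/* Extent manipulation requires a transaction. */
	if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
		return true;

	/* Page cache invalidation cannot be done in irq context. */
	if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
		return true;

	return false;
}
```

The case this thread is about is the third check: a plain overwrite
(no append, no unwritten or shared extent) with only the DONTCACHE bit
set should now report true, where the pre-patch condition would have
left completion in irq context.
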
--
Jens Axboe