linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
Date: Tue, 27 May 2025 15:17:02 -0600
Message-ID: <555564a3-cc29-4fc1-a708-9ef395469d90@kernel.dk>
In-Reply-To: <aDYqtuXdLvcSl78t@dread.disaster.area>

On 5/27/25 3:12 PM, Dave Chinner wrote:
> On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote:
>> DONTCACHE I/O must have the completion punted to a workqueue, just like
>> what is done for unwritten extents, as the completion needs task context
>> to perform the invalidation of the folio(s). However, if writeback is
>> started from filemap_fdatawrite_range() via generic_sync() and it's an
>> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
>> doesn't look at the folio being added and no further state is passed down
>> to help it know that this is a dropbehind/DONTCACHE write.
>>
>> Check if the folio being added is marked as dropbehind, and set
>> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
>> the decision making of completion context in xfs_submit_ioend().
>> Additionally include this ioend flag in the NOMERGE flags, to avoid
>> mixing it with unrelated IO.
>>
>> This fixes extra page cache being instantiated when the write performed
>> is an overwrite, rather than newly instantiated blocks.
>>
>> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> Found this one while testing the unrelated issue of invalidation being a
>> bit broken before the 6.15 release. We need this to ensure that overwrites
>> also prune correctly, just like unwritten extents currently do.
> 
> I wondered about the stack traces showing DONTCACHE writeback
> completion being handled from irq context[*] when I read the -fsdevel
> thread about broken DONTCACHE functionality yesterday.
> 
> [*] second trace in the failure reported in this comment:
>
> https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@kernel.dk/

Indeed, though that could've been a "normal" write and not a DONTCACHE
one. But with the bug being fixed by this one, both would've gone that
path...
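
On the NOMERGE part of the commit message, here is a rough user-space
model of why the flag has to take part in the merge decision. Everything
in it is a stand-in: the IOEND_* values, ioend_flags_for() and
ioend_can_merge() are hypothetical names made up for illustration, and
the 'dropbehind' argument only models what folio_test_dropbehind() would
report. It is a sketch of the idea, not the iomap code.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define IOEND_SHARED		(1u << 0)
#define IOEND_UNWRITTEN		(1u << 1)
#define IOEND_DONTCACHE		(1u << 2)

/* Flags that must be identical before new I/O may join an ioend. */
#define IOEND_NOMERGE_FLAGS \
	(IOEND_SHARED | IOEND_UNWRITTEN | IOEND_DONTCACHE)

/*
 * Derive the ioend flags for a folio being added to writeback.
 * 'dropbehind' stands in for the folio's dropbehind state.
 */
static unsigned int
ioend_flags_for(bool unwritten, bool shared, bool dropbehind)
{
	unsigned int flags = 0;

	if (unwritten)
		flags |= IOEND_UNWRITTEN;
	if (shared)
		flags |= IOEND_SHARED;
	if (dropbehind)
		flags |= IOEND_DONTCACHE;
	return flags;
}

/* Only merge when both sides agree on every no-merge flag. */
static bool
ioend_can_merge(unsigned int cur_flags, unsigned int new_flags)
{
	return (cur_flags & IOEND_NOMERGE_FLAGS) ==
	       (new_flags & IOEND_NOMERGE_FLAGS);
}
```

With that rule, a dropbehind overwrite is never folded into an ioend
that was built for regular buffered writeback, so the completion context
picked for the ioend matches every folio it carries.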

 
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 233abf598f65..3729391a18f3 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>>  		ioend_flags |= IOMAP_IOEND_UNWRITTEN;
>>  	if (wpc->iomap.flags & IOMAP_F_SHARED)
>>  		ioend_flags |= IOMAP_IOEND_SHARED;
>> +	if (folio_test_dropbehind(folio))
>> +		ioend_flags |= IOMAP_IOEND_DONTCACHE;
>>  	if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
>>  		ioend_flags |= IOMAP_IOEND_BOUNDARY;
>>  
>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>> index 26a04a783489..1b7a006402ea 100644
>> --- a/fs/xfs/xfs_aops.c
>> +++ b/fs/xfs/xfs_aops.c
>> @@ -436,6 +436,9 @@ xfs_map_blocks(
>>  	return 0;
>>  }
>>  
>> +#define IOEND_WQ_FLAGS	(IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
>> +			 IOMAP_IOEND_DONTCACHE)
>> +
>>  static int
>>  xfs_submit_ioend(
>>  	struct iomap_writepage_ctx *wpc,
>> @@ -460,8 +463,7 @@ xfs_submit_ioend(
>>  	memalloc_nofs_restore(nofs_flag);
>>  
>>  	/* send ioends that might require a transaction to the completion wq */
>> -	if (xfs_ioend_is_append(ioend) ||
>> -	    (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
>> +	if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
>>  		ioend->io_bio.bi_end_io = xfs_end_bio;
>>  
>>  	if (status)
> 
> IMO, this would be cleaner as a helper so that individual cases can
> be commented correctly, as page cache invalidation does not actually
> require a transaction...
> 
> Something like:
> 
> static bool
> xfs_ioend_needs_wq_completion(
> 	struct iomap_ioend	*ioend)
> {
> 	/* Changing inode size requires a transaction. */
> 	if (xfs_ioend_is_append(ioend))
> 		return true;
> 
> 	/* Extent manipulation requires a transaction. */
> 	if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
> 		return true;
> 
> 	/* Page cache invalidation cannot be done in irq context. */
> 	if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
> 		return true;
> 
> 	return false;
> }
> 
> Otherwise seems fine.

Yeah I like that, gets rid of the need to add the mask as well. I'll
spin a v2 and add the helper.
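
Roughly how the call site would read with the helper, as a user-space
model rather than the actual v2. struct ioend_model, the IOEND_* values,
enum completion_ctx and pick_completion_ctx() are all hypothetical
stand-ins; 'is_append' models the result of xfs_ioend_is_append(), and
returning COMPLETE_IN_WQ models setting bi_end_io to xfs_end_bio.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define IOEND_SHARED		(1u << 0)
#define IOEND_UNWRITTEN		(1u << 1)
#define IOEND_DONTCACHE		(1u << 2)

/* Minimal stand-in for the ioend state the decision depends on. */
struct ioend_model {
	unsigned int	io_flags;
	bool		is_append;	/* models xfs_ioend_is_append() */
};

enum completion_ctx {
	COMPLETE_IN_IRQ,	/* default bio completion path */
	COMPLETE_IN_WQ,		/* punted to the completion workqueue */
};

static bool
ioend_needs_wq_completion(const struct ioend_model *ioend)
{
	/* Changing inode size requires a transaction. */
	if (ioend->is_append)
		return true;

	/* Extent manipulation requires a transaction. */
	if (ioend->io_flags & (IOEND_UNWRITTEN | IOEND_SHARED))
		return true;

	/* Page cache invalidation cannot be done in irq context. */
	if (ioend->io_flags & IOEND_DONTCACHE)
		return true;

	return false;
}

/* The submit path then reduces to a single check. */
static enum completion_ctx
pick_completion_ctx(const struct ioend_model *ioend)
{
	return ioend_needs_wq_completion(ioend) ?
		COMPLETE_IN_WQ : COMPLETE_IN_IRQ;
}
```

The point of the helper over the mask is that each reason for punting
gets its own comment, and the DONTCACHE case no longer sits under a
"might require a transaction" comment that doesn't apply to it.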

-- 
Jens Axboe

Thread overview: 5+ messages
2025-05-27 15:43 [PATCH] iomap: don't lose folio dropbehind state for overwrites Jens Axboe
2025-05-27 21:12 ` Dave Chinner
2025-05-27 21:17   ` Jens Axboe [this message]
2025-06-02 17:46 ` Ritesh Harjani
2025-06-02 18:04   ` Jens Axboe
