* [PATCH] iomap: don't lose folio dropbehind state for overwrites
@ 2025-05-27 15:43 Jens Axboe
2025-05-27 21:12 ` Dave Chinner
2025-06-02 17:46 ` Ritesh Harjani
0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2025-05-27 15:43 UTC (permalink / raw)
To: Christian Brauner, Darrick J. Wong
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
DONTCACHE I/O must have the completion punted to a workqueue, just like
what is done for unwritten extents, as the completion needs task context
to perform the invalidation of the folio(s). However, if writeback is
started off filemap_fdatawrite_range() off generic_sync() and it's an
overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
don't look at the folio being added and no further state is passed down
to help it know that this is a dropbehind/DONTCACHE write.
Check if the folio being added is marked as dropbehind, and set
IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
the decision making of completion context in xfs_submit_ioend().
Additionally include this ioend flag in the NOMERGE flags, to avoid
mixing it with unrelated IO.
This fixes extra page cache being instantiated when the write performed
is an overwrite, rather than newly instantiated blocks.
Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
Found this one while testing the unrelated issue of invalidation being a
bit broken before 6.15 release. We need this to ensure that overwrites
also prune correctly, just like unwritten extents currently do.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 233abf598f65..3729391a18f3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
ioend_flags |= IOMAP_IOEND_UNWRITTEN;
if (wpc->iomap.flags & IOMAP_F_SHARED)
ioend_flags |= IOMAP_IOEND_SHARED;
+ if (folio_test_dropbehind(folio))
+ ioend_flags |= IOMAP_IOEND_DONTCACHE;
if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
ioend_flags |= IOMAP_IOEND_BOUNDARY;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 26a04a783489..1b7a006402ea 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -436,6 +436,9 @@ xfs_map_blocks(
return 0;
}
+#define IOEND_WQ_FLAGS (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
+ IOMAP_IOEND_DONTCACHE)
+
static int
xfs_submit_ioend(
struct iomap_writepage_ctx *wpc,
@@ -460,8 +463,7 @@ xfs_submit_ioend(
memalloc_nofs_restore(nofs_flag);
/* send ioends that might require a transaction to the completion wq */
- if (xfs_ioend_is_append(ioend) ||
- (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
+ if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
ioend->io_bio.bi_end_io = xfs_end_bio;
if (status)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 68416b135151..522644d62f30 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -377,13 +377,16 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
#define IOMAP_IOEND_BOUNDARY (1U << 2)
/* is direct I/O */
#define IOMAP_IOEND_DIRECT (1U << 3)
+/* is DONTCACHE I/O */
+#define IOMAP_IOEND_DONTCACHE (1U << 4)
/*
* Flags that if set on either ioend prevent the merge of two ioends.
* (IOMAP_IOEND_BOUNDARY also prevents merges, but only one-way)
*/
#define IOMAP_IOEND_NOMERGE_FLAGS \
- (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT)
+ (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT | \
+ IOMAP_IOEND_DONTCACHE)
/*
* Structure for writeback I/O completions.
--
Jens Axboe
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
2025-05-27 15:43 [PATCH] iomap: don't lose folio dropbehind state for overwrites Jens Axboe
@ 2025-05-27 21:12 ` Dave Chinner
2025-05-27 21:17 ` Jens Axboe
2025-06-02 17:46 ` Ritesh Harjani
1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2025-05-27 21:12 UTC (permalink / raw)
To: Jens Axboe
Cc: Christian Brauner, Darrick J. Wong, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote:
> DONTCACHE I/O must have the completion punted to a workqueue, just like
> what is done for unwritten extents, as the completion needs task context
> to perform the invalidation of the folio(s). However, if writeback is
> started off filemap_fdatawrite_range() off generic_sync() and it's an
> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
> don't look at the folio being added and no further state is passed down
> to help it know that this is a dropbehind/DONTCACHE write.
>
> Check if the folio being added is marked as dropbehind, and set
> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
> the decision making of completion context in xfs_submit_ioend().
> Additionally include this ioend flag in the NOMERGE flags, to avoid
> mixing it with unrelated IO.
>
> This fixes extra page cache being instantiated when the write performed
> is an overwrite, rather than newly instantiated blocks.
>
> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> Found this one while testing the unrelated issue of invalidation being a
> bit broken before 6.15 release. We need this to ensure that overwrites
> also prune correctly, just like unwritten extents currently do.
I wondered about the stack traces showing DONTCACHE writeback
completion being handled from irq context[*] when I read the -fsdevel
thread about broken DONTCACHE functionality yesterday.
[*] second trace in the failure reported in this comment:
https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@kernel.dk/
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 233abf598f65..3729391a18f3 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> ioend_flags |= IOMAP_IOEND_UNWRITTEN;
> if (wpc->iomap.flags & IOMAP_F_SHARED)
> ioend_flags |= IOMAP_IOEND_SHARED;
> + if (folio_test_dropbehind(folio))
> + ioend_flags |= IOMAP_IOEND_DONTCACHE;
> if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
> ioend_flags |= IOMAP_IOEND_BOUNDARY;
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 26a04a783489..1b7a006402ea 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -436,6 +436,9 @@ xfs_map_blocks(
> return 0;
> }
>
> +#define IOEND_WQ_FLAGS (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
> + IOMAP_IOEND_DONTCACHE)
> +
> static int
> xfs_submit_ioend(
> struct iomap_writepage_ctx *wpc,
> @@ -460,8 +463,7 @@ xfs_submit_ioend(
> memalloc_nofs_restore(nofs_flag);
>
> /* send ioends that might require a transaction to the completion wq */
> - if (xfs_ioend_is_append(ioend) ||
> - (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
> + if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
> ioend->io_bio.bi_end_io = xfs_end_bio;
>
> if (status)
IMO, this would be cleaner as a helper so that individual cases can
be commented correctly, as page cache invalidation does not actually
require a transaction...
Something like:
static bool
xfs_ioend_needs_wq_completion(
struct xfs_ioend *ioend)
{
/* Changing inode size requires a transaction. */
if (xfs_ioend_is_append(ioend))
return true;
/* Extent manipulation requires a transaction. */
if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
return true;
/* Page cache invalidation cannot be done in irq context. */
if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
return true;
return false;
}
Otherwise seems fine.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
2025-05-27 21:12 ` Dave Chinner
@ 2025-05-27 21:17 ` Jens Axboe
0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2025-05-27 21:17 UTC (permalink / raw)
To: Dave Chinner
Cc: Christian Brauner, Darrick J. Wong, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org
On 5/27/25 3:12 PM, Dave Chinner wrote:
> On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote:
>> DONTCACHE I/O must have the completion punted to a workqueue, just like
>> what is done for unwritten extents, as the completion needs task context
>> to perform the invalidation of the folio(s). However, if writeback is
>> started off filemap_fdatawrite_range() off generic_sync() and it's an
>> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
>> don't look at the folio being added and no further state is passed down
>> to help it know that this is a dropbehind/DONTCACHE write.
>>
>> Check if the folio being added is marked as dropbehind, and set
>> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
>> the decision making of completion context in xfs_submit_ioend().
>> Additionally include this ioend flag in the NOMERGE flags, to avoid
>> mixing it with unrelated IO.
>>
>> This fixes extra page cache being instantiated when the write performed
>> is an overwrite, rather than newly instantiated blocks.
>>
>> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> Found this one while testing the unrelated issue of invalidation being a
>> bit broken before 6.15 release. We need this to ensure that overwrites
>> also prune correctly, just like unwritten extents currently do.
>
> I wondered about the stack traces showing DONTCACHE writeback
> completion being handled from irq context[*] when I read the -fsdevel
> thread about broken DONTCACHE functionality yesterday.
>
> [*] second trace in the failure reported in this comment:
>
> https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@kernel.dk/
Indeed, though that could've been a "normal" write and not a DONTCACHE
one. But with the bug being fixed by this one, both would've gone that
path...
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 233abf598f65..3729391a18f3 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>> ioend_flags |= IOMAP_IOEND_UNWRITTEN;
>> if (wpc->iomap.flags & IOMAP_F_SHARED)
>> ioend_flags |= IOMAP_IOEND_SHARED;
>> + if (folio_test_dropbehind(folio))
>> + ioend_flags |= IOMAP_IOEND_DONTCACHE;
>> if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
>> ioend_flags |= IOMAP_IOEND_BOUNDARY;
>>
>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>> index 26a04a783489..1b7a006402ea 100644
>> --- a/fs/xfs/xfs_aops.c
>> +++ b/fs/xfs/xfs_aops.c
>> @@ -436,6 +436,9 @@ xfs_map_blocks(
>> return 0;
>> }
>>
>> +#define IOEND_WQ_FLAGS (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
>> + IOMAP_IOEND_DONTCACHE)
>> +
>> static int
>> xfs_submit_ioend(
>> struct iomap_writepage_ctx *wpc,
>> @@ -460,8 +463,7 @@ xfs_submit_ioend(
>> memalloc_nofs_restore(nofs_flag);
>>
>> /* send ioends that might require a transaction to the completion wq */
>> - if (xfs_ioend_is_append(ioend) ||
>> - (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
>> + if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
>> ioend->io_bio.bi_end_io = xfs_end_bio;
>>
>> if (status)
>
> IMO, this would be cleaner as a helper so that individual cases can
> be commented correctly, as page cache invalidation does not actually
> require a transaction...
>
> Something like:
>
> static bool
> xfs_ioend_needs_wq_completion(
> struct xfs_ioend *ioend)
> {
> /* Changing inode size requires a transaction. */
> if (xfs_ioend_is_append(ioend))
> return true;
>
> /* Extent manipulation requires a transaction. */
> if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
> return true;
>
> /* Page cache invalidation cannot be done in irq context. */
> if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
> return true;
>
> return false;
> }
>
> Otherwise seems fine.
Yeah I like that, gets rid of the need to add the mask as well. I'll
spin a v2 and add the helper.
--
Jens Axboe
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
2025-05-27 15:43 [PATCH] iomap: don't lose folio dropbehind state for overwrites Jens Axboe
2025-05-27 21:12 ` Dave Chinner
@ 2025-06-02 17:46 ` Ritesh Harjani
2025-06-02 18:04 ` Jens Axboe
1 sibling, 1 reply; 5+ messages in thread
From: Ritesh Harjani @ 2025-06-02 17:46 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Christian Brauner, Darrick J. Wong
Jens Axboe <axboe@kernel.dk> writes:
> DONTCACHE I/O must have the completion punted to a workqueue, just like
> what is done for unwritten extents, as the completion needs task context
> to perform the invalidation of the folio(s). However, if writeback is
> started off filemap_fdatawrite_range() off generic_sync() and it's an
> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
> don't look at the folio being added and no further state is passed down
> to help it know that this is a dropbehind/DONTCACHE write.
>
> Check if the folio being added is marked as dropbehind, and set
> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
> the decision making of completion context in xfs_submit_ioend().
> Additionally include this ioend flag in the NOMERGE flags, to avoid
> mixing it with unrelated IO.
>
> This fixes extra page cache being instantiated when the write performed
> is an overwrite, rather than newly instantiated blocks.
>
> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> Found this one while testing the unrelated issue of invalidation being a
> bit broken before 6.15 release. We need this to ensure that overwrites
> also prune correctly, just like unwritten extents currently do.
I guess I did report this to you a while ago when I was adding support
for uncahed buffered-io to xfs_io. But I never heard back from you :(
https://lore.kernel.org/all/87h649trof.fsf@gmail.com/
No worries, good that we finally have this fixed.
-ritesh
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] iomap: don't lose folio dropbehind state for overwrites
2025-06-02 17:46 ` Ritesh Harjani
@ 2025-06-02 18:04 ` Jens Axboe
0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2025-06-02 18:04 UTC (permalink / raw)
To: Ritesh Harjani (IBM)
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Christian Brauner, Darrick J. Wong
On 6/2/25 11:46 AM, Ritesh Harjani (IBM) wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> DONTCACHE I/O must have the completion punted to a workqueue, just like
>> what is done for unwritten extents, as the completion needs task context
>> to perform the invalidation of the folio(s). However, if writeback is
>> started off filemap_fdatawrite_range() off generic_sync() and it's an
>> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
>> don't look at the folio being added and no further state is passed down
>> to help it know that this is a dropbehind/DONTCACHE write.
>>
>> Check if the folio being added is marked as dropbehind, and set
>> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
>> the decision making of completion context in xfs_submit_ioend().
>> Additionally include this ioend flag in the NOMERGE flags, to avoid
>> mixing it with unrelated IO.
>>
>> This fixes extra page cache being instantiated when the write performed
>> is an overwrite, rather than newly instantiated blocks.
>>
>> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> Found this one while testing the unrelated issue of invalidation being a
>> bit broken before 6.15 release. We need this to ensure that overwrites
>> also prune correctly, just like unwritten extents currently do.
>
> I guess I did report this to you a while ago when I was adding support
> for uncahed buffered-io to xfs_io. But I never heard back from you :(
>
> https://lore.kernel.org/all/87h649trof.fsf@gmail.com/
>
> No worries, good that we finally have this fixed.
Sorry not sure how I missed that, to be fair there were a lot of emails
in various threads on this topic back at that time. Not trying to make
excuses... Yes, at least the "would otherwise have been !task context"
issue is resolved.
--
Jens Axboe
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-06-02 18:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-27 15:43 [PATCH] iomap: don't lose folio dropbehind state for overwrites Jens Axboe
2025-05-27 21:12 ` Dave Chinner
2025-05-27 21:17 ` Jens Axboe
2025-06-02 17:46 ` Ritesh Harjani
2025-06-02 18:04 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).