Linux filesystem development
* [PATCH] mm: fix pagecache_isize_extended() early-return bypass for large folio mappings
       [not found] <20240919160741.208162-3-bfoster@redhat.com>
@ 2026-02-26 13:31 ` Morduan Zang
  2026-02-26 20:22   ` Brian Foster
  0 siblings, 1 reply; 2+ messages in thread
From: Morduan Zang @ 2026-02-26 13:31 UTC (permalink / raw)
  To: bfoster; +Cc: linux-ext4, linux-fsdevel, linux-mm, willy, Morduan Zang

pagecache_isize_extended() has two early-return guards that were designed
for the traditional sub-page block-size case:

  Guard 1:  if (from >= to || bsize >= PAGE_SIZE)
                return;

  Guard 2:  rounded_from = round_up(from, bsize);
            if (to <= rounded_from || !(rounded_from & (PAGE_SIZE - 1)))
                return;

Guard 1 was originally "bsize == PAGE_SIZE" and was widened to
"bsize >= PAGE_SIZE" by commit 2ebe90dab980 ("mm: convert
pagecache_isize_extended to use a folio").  The rationale is correct
for the traditional buffer_head path: when the block size equals the page
size, every folio covers exactly one block, so writeback's EOF handling
(e.g. iomap_writepage_handle_eof()) zeros the post-EOF tail of the folio
before writing it out, and no action is needed here.

Guard 2 covers the case where @from rounded up to the next block boundary
is already PAGE_SIZE-aligned, meaning no hole block straddles a page
boundary.

Both guards are correct for the traditional case.  However, commit
52aecaee1c26 ("mm: zero range of eof folio exposed by inode size extension")
added post-EOF zeroing inside pagecache_isize_extended() to
handle dirty folios that will not go through writeback before the new
i_size becomes visible.  That zeroing code is placed after both guards,
so it is unreachable whenever either guard fires.

The same stale-data window is also covered by xfstests generic/363,
which uses fsx with "-e 1" (EOF pollution mode) and exercises a broad
range of size-changing operations.

Fixes: 52aecaee1c26 ("mm: zero range of eof folio exposed by inode size extension")
Fixes: 2ebe90dab980 ("mm: convert pagecache_isize_extended to use a folio")
Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
---
 mm/truncate.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 12467c1bd711..d3e473a206b3 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -847,13 +847,32 @@ void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
 
 	WARN_ON(to > inode->i_size);
 
-	if (from >= to || bsize >= PAGE_SIZE)
+	if (from >= to)
 		return;
+
+	/*
+	 * For filesystems with bsize >= PAGE_SIZE, the traditional buffer_head
+	 * path handles post-EOF zeroing correctly at writeback time. However,
+	 * with large folios enabled, a single folio can span multiple PAGE_SIZE
+	 * blocks, so mmap writes beyond EOF within the same folio are not zeroed
+	 * at writeback time before i_size is extended. We must handle this here.
+	 */
+	if (bsize >= PAGE_SIZE) {
+		/*
+		 * Only needed if the mapping supports large folios, since otherwise
+		 * each folio is exactly one page and writeback handles EOF zeroing.
+		 */
+		if (!mapping_large_folio_support(inode->i_mapping))
+			return;
+		goto find_folio;
+	}
+
 	/* Page straddling @from will not have any hole block created? */
 	rounded_from = round_up(from, bsize);
 	if (to <= rounded_from || !(rounded_from & (PAGE_SIZE - 1)))
 		return;
 
+find_folio:
 	folio = filemap_lock_folio(inode->i_mapping, from / PAGE_SIZE);
 	/* Folio not cached? Nothing to do */
 	if (IS_ERR(folio))
-- 
2.50.1



* Re: [PATCH] mm: fix pagecache_isize_extended() early-return bypass for large folio mappings
  2026-02-26 13:31 ` [PATCH] mm: fix pagecache_isize_extended() early-return bypass for large folio mappings Morduan Zang
@ 2026-02-26 20:22   ` Brian Foster
  0 siblings, 0 replies; 2+ messages in thread
From: Brian Foster @ 2026-02-26 20:22 UTC (permalink / raw)
  To: Morduan Zang; +Cc: linux-ext4, linux-fsdevel, linux-mm, willy

On Thu, Feb 26, 2026 at 09:31:49PM +0800, Morduan Zang wrote:
> pagecache_isize_extended() has two early-return guards that were designed
> for the traditional sub-page block-size case:
> 
>   Guard 1:  if (from >= to || bsize >= PAGE_SIZE)
>                 return;
> 
>   Guard 2:  rounded_from = round_up(from, bsize);
>             if (to <= rounded_from || !(rounded_from & (PAGE_SIZE - 1)))
>                 return;
> 
> Guard 1 was originally "bsize == PAGE_SIZE" and was widened to
> "bsize >= PAGE_SIZE" by commit 2ebe90dab980 ("mm: convert
> pagecache_isize_extended to use a folio").  The rationale is correct
> for the traditional buffer_head path: when the block size equals the page
> size, every folio covers exactly one block, so writeback's EOF handling
> (e.g. iomap_writepage_handle_eof()) zeros the post-EOF tail of the folio
> before writing it out, and no action is needed here.
> 
> Guard 2 covers the case where @from rounded up to the next block boundary
> is already PAGE_SIZE-aligned, meaning no hole block straddles a page
> boundary.
> 
> Both guards are correct for the traditional case.  However, commit
> 52aecaee1c26 ("mm: zero range of eof folio exposed by inode size extension")
> added post-EOF zeroing inside pagecache_isize_extended() to
> handle dirty folios that will not go through writeback before the new
> i_size becomes visible.  That zeroing code is placed after both guards,
> so it is unreachable whenever either guard fires.
> 
> The same stale-data window is also covered by xfstests generic/363,
> which uses fsx with "-e 1" (EOF pollution mode) and exercises a broad
> range of size-changing operations.
> 

Hi Morduan,

So looking back at the original cover letter for this, this bit was for
the case where we had a dirty folio in pagecache that might be partially
hole backed due to eof, therefore fs zeroing might not occur.  Hence we
do the page zeroing here before exposing this range to the file (i.e.
that writeback would have done if the folio were clean).

I thought at the time this plus the ext4 patch covered the bases for
generic/363 on ext4. You refer to this test above but don't mention if
it fails. Do you reproduce a failure with that test, or is this
something discovered by inspection?

> Fixes: 52aecaee1c26 ("mm: zero range of eof folio exposed by inode size extension")
> Fixes: 2ebe90dab980 ("mm: convert pagecache_isize_extended to use a folio")
> Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
> ---
>  mm/truncate.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 12467c1bd711..d3e473a206b3 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -847,13 +847,32 @@ void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
>  
>  	WARN_ON(to > inode->i_size);
>  
> -	if (from >= to || bsize >= PAGE_SIZE)
> +	if (from >= to)
>  		return;
> +
> +	/*
> +	 * For filesystems with bsize >= PAGE_SIZE, the traditional buffer_head
> +	 * path handles post-EOF zeroing correctly at writeback time. However,
> +	 * with large folios enabled, a single folio can span multiple PAGE_SIZE
> +	 * blocks, so mmap writes beyond EOF within the same folio are not zeroed
> +	 * at writeback time before i_size is extended. We must handle this here.
> +	 */
> +	if (bsize >= PAGE_SIZE) {
> +		/*
> +		 * Only needed if the mapping supports large folios, since otherwise
> +		 * each folio is exactly one page and writeback handles EOF zeroing.
> +		 */
> +		if (!mapping_large_folio_support(inode->i_mapping))
> +			return;

Is there currently a case for bsize >= PAGE_SIZE &&
!mapping_large_folio_support()? I thought there was a WIP for
multi-block folios, but I wasn't sure if that actually worked anywhere.

> +		goto find_folio;
> +	}
> +
>  	/* Page straddling @from will not have any hole block created? */
>  	rounded_from = round_up(from, bsize);
>  	if (to <= rounded_from || !(rounded_from & (PAGE_SIZE - 1)))
>  		return;
>  

If I understood this code correctly (and I very well may not), the
purpose of this is to basically filter out cases where a dirty eof folio
doesn't require a refault after the size update for the fs to fully
populate it with blocks. If that is the case, this makes me wonder if
perhaps this check should remain, but instead use folio_size() of the
eof folio (if one exists)..?

My understanding at one point was that we wouldn't have large eof folios
that included a page aligned offset beyond eof, but I also feel like
I've run into that once or twice when dealing with some other oddball fs
related issues, so I'm not really clear on what the expected behavior is
supposed to be there. Maybe it's a corner case (i.e. related to split
failure or some such)..? That is probably a question for Willy..

Brian

> +find_folio:
>  	folio = filemap_lock_folio(inode->i_mapping, from / PAGE_SIZE);
>  	/* Folio not cached? Nothing to do */
>  	if (IS_ERR(folio))
> -- 
> 2.50.1
> 
> 


