public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Patch "btrfs: allow buffered write to avoid full page read if it's block aligned" has been added to the 6.14-stable tree
       [not found] <20250522210549.3118269-1-sashal@kernel.org>
@ 2025-05-22 21:21 ` Qu Wenruo
  2025-05-23  4:52   ` Greg KH
  0 siblings, 1 reply; 2+ messages in thread
From: Qu Wenruo @ 2025-05-22 21:21 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Chris Mason, Josef Bacik, David Sterba



在 2025/5/23 06:35, Sasha Levin 写道:
> This is a note to let you know that I've just added the patch titled
> 
>      btrfs: allow buffered write to avoid full page read if it's block aligned
> 
> to the 6.14-stable tree which can be found at:
>      http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>       btrfs-allow-buffered-write-to-avoid-full-page-read-i.patch
> and it can be found in the queue-6.14 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.

Please drop this patch from all stable branches.

Although this patch mentions a failure in fstests, it acts more like an 
optimization for btrfs.

Furthermore it relies quite some patches that may not be in stable kernels.

Without all the dependency, this can lead to data corruption.

Please drop this one from all stable kernels.

Thanks,
Qu

> 
> 
> 
> commit de0860d610aaaee77a8c5c713c41fea584ac83b3
> Author: Qu Wenruo <wqu@suse.com>
> Date:   Wed Oct 30 17:04:02 2024 +1030
> 
>      btrfs: allow buffered write to avoid full page read if it's block aligned
>      
>      [ Upstream commit 0d31ca6584f21821c708752d379871b9fce2dc48 ]
>      
>      [BUG]
>      Since the support of block size (sector size) < page size for btrfs,
>      test case generic/563 fails with 4K block size and 64K page size:
>      
>        --- tests/generic/563.out     2024-04-25 18:13:45.178550333 +0930
>        +++ /home/adam/xfstests-dev/results//generic/563.out.bad      2024-09-30 09:09:16.155312379 +0930
>        @@ -3,7 +3,8 @@
>         read is in range
>         write is in range
>         write -> read/write
>        -read is in range
>        +read has value of 8388608
>        +read is NOT in range -33792 .. 33792
>         write is in range
>        ...
>      
>      [CAUSE]
>      The test case creates a 8MiB file, then does buffered write into the 8MiB
>      using 4K block size, to overwrite the whole file.
>      
>      On 4K page sized systems, since the write range covers the full block and
>      page, btrfs will not bother reading the page, just like what XFS and EXT4
>      do.
>      
>      But on 64K page sized systems, although the 4K sized write is still block
>      aligned, it's not page aligned anymore, thus btrfs will read the full
>      page, which will be accounted by cgroup and fail the test.
>      
>      As the test case itself expects such 4K block aligned write should not
>      trigger any read.
>      
>      Such expected behavior is an optimization to reduce folio reads when
>      possible, and unfortunately btrfs does not implement such optimization.
>      
>      [FIX]
>      To skip the full page read, we need to do the following modification:
>      
>      - Do not trigger full page read as long as the buffered write is block
>        aligned
>        This is pretty simple by modifying the check inside
>        prepare_uptodate_page().
>      
>      - Skip already uptodate blocks during full page read
>        Or we can lead to the following data corruption:
>      
>        0       32K        64K
>        |///////|          |
>      
>        Where the file range [0, 32K) is dirtied by buffered write, the
>        remaining range [32K, 64K) is not.
>      
>        When reading the full page, since [0,32K) is only dirtied but not
>        written back, there is no data extent map for it, but a hole covering
>        [0, 64k).
>      
>        If we continue reading the full page range [0, 64K), the dirtied range
>        will be filled with 0 (since there is only a hole covering the whole
>        range).
>        This causes the dirtied range to get lost.
>      
>      With this optimization, btrfs can pass generic/563 even if the page size
>      is larger than fs block size.
>      
>      Reviewed-by: Filipe Manana <fdmanana@suse.com>
>      Signed-off-by: Qu Wenruo <wqu@suse.com>
>      Signed-off-by: David Sterba <dsterba@suse.com>
>      Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 06922529f19dc..13b5359ea1b77 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -974,6 +974,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
>   			end_folio_read(folio, true, cur, iosize);
>   			break;
>   		}
> +		if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) {
> +			end_folio_read(folio, true, cur, blocksize);
> +			continue;
> +		}
>   		em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
>   		if (IS_ERR(em)) {
>   			end_folio_read(folio, false, cur, end + 1 - cur);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cd4e40a719186..61ad1a79e5698 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -804,14 +804,15 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
>   {
>   	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
>   	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
> +	const u32 blocksize = inode_to_fs_info(inode)->sectorsize;
>   	int ret = 0;
>   
>   	if (folio_test_uptodate(folio))
>   		return 0;
>   
>   	if (!force_uptodate &&
> -	    IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> -	    IS_ALIGNED(clamp_end, PAGE_SIZE))
> +	    IS_ALIGNED(clamp_start, blocksize) &&
> +	    IS_ALIGNED(clamp_end, blocksize))
>   		return 0;
>   
>   	ret = btrfs_read_folio(NULL, folio);


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Patch "btrfs: allow buffered write to avoid full page read if it's block aligned" has been added to the 6.14-stable tree
  2025-05-22 21:21 ` Patch "btrfs: allow buffered write to avoid full page read if it's block aligned" has been added to the 6.14-stable tree Qu Wenruo
@ 2025-05-23  4:52   ` Greg KH
  0 siblings, 0 replies; 2+ messages in thread
From: Greg KH @ 2025-05-23  4:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: stable, stable-commits, Chris Mason, Josef Bacik, David Sterba

On Fri, May 23, 2025 at 06:51:18AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2025/5/23 06:35, Sasha Levin 写道:
> > This is a note to let you know that I've just added the patch titled
> > 
> >      btrfs: allow buffered write to avoid full page read if it's block aligned
> > 
> > to the 6.14-stable tree which can be found at:
> >      http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> > 
> > The filename of the patch is:
> >       btrfs-allow-buffered-write-to-avoid-full-page-read-i.patch
> > and it can be found in the queue-6.14 subdirectory.
> > 
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let <stable@vger.kernel.org> know about it.
> 
> Please drop this patch from all stable branches.
> 
> Although this patch mentions a failure in fstests, it acts more like an
> optimization for btrfs.
> 
> Furthermore it relies quite some patches that may not be in stable kernels.
> 
> Without all the dependency, this can lead to data corruption.
> 
> Please drop this one from all stable kernels.

Now dropped, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-05-23  4:52 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250522210549.3118269-1-sashal@kernel.org>
2025-05-22 21:21 ` Patch "btrfs: allow buffered write to avoid full page read if it's block aligned" has been added to the 6.14-stable tree Qu Wenruo
2025-05-23  4:52   ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox