public inbox for linux-btrfs@vger.kernel.org
* [PATCH v2 0/6] btrfs: prepare for larger folios support
@ 2025-03-10  7:35 Qu Wenruo
  2025-03-10  7:35 ` [PATCH 1/6] btrfs: subpage: make btrfs_is_subpage() check against a folio Qu Wenruo
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:35 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Split the subpage.[ch] modification into 3 patches
- Rebased onto the latest for-next branch
  Now all dependencies are in for-next.

Supporting larger folios means:

- Our subpage routines should check against the folio size rather than
  PAGE_SIZE

- Make functions handling filemap folios use folio_size() rather than
  PAGE_SIZE

  The most common paths are:
  * Buffered reads/writes
  * Uncompressed folio writeback
    These are already handled pretty well.

  * Compressed read
  * Compressed write
    To take full advantage of larger folios, we should use folio_iter
    rather than bvec_iter.
    This will be a dedicated patchset; the existing bvec_iter can
    still handle larger folios.

  Internal usages can still use page sized folios, or even pages,
  including:
  * Encoded reads/writes
  * Compressed folios
  * RAID56 internal pages
  * Scrub internal pages

This patchset handles the above-mentioned points by:

- Preparing the subpage routines to handle larger folios
  This introduces a small overhead, as all checks are now against the
  folio size; even on x86_64 we can no longer skip the subpage checks
  completely.

  This is done in the first patch.

- Converting straightforward PAGE_SIZE users to folio_size()
  This is done in the remaining patches.

Currently this patchset is not an exhaustive conversion; I'm pretty sure
there are other complex situations that can cause problems.
Those problems can only be exposed and fixed after the experimental
larger folios support is switched on later.
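
The core idea can be illustrated with a standalone userspace sketch
(this is not kernel code; `old_is_subpage()`, `new_is_subpage()` and
`MODEL_PAGE_SIZE` are hypothetical models of the btrfs helpers, not
their real signatures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model page size for the sketch; the kernel uses the arch's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Old-style check: one answer per mapping, derived only from PAGE_SIZE.
 * With 4K blocks on 4K pages this always says "not subpage".
 */
static bool old_is_subpage(uint32_t sectorsize)
{
	return sectorsize < MODEL_PAGE_SIZE;
}

/*
 * New-style check: one answer per folio, derived from that folio's size.
 * A 16K larger folio needs the subpage bitmaps even though a page sized
 * folio of the same file does not.
 */
static bool new_is_subpage(uint32_t sectorsize, size_t folio_size)
{
	return sectorsize < folio_size;
}
```

With larger folios enabled, a single mapping can contain folios of
different orders, which is exactly why the answer has to move from the
mapping to the individual folio.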

Qu Wenruo (6):
  btrfs: subpage: make btrfs_is_subpage() check against a folio
  btrfs: add a @fsize parameter to btrfs_alloc_subpage()
  btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
  btrfs: prepare btrfs_launder_folio() for larger folios support
  btrfs: prepare extent_io.c for future larger folio support
  btrfs: prepare btrfs_page_mkwrite() for larger folios

 fs/btrfs/extent_io.c | 49 ++++++++++++++++++++++++--------------------
 fs/btrfs/file.c      | 19 +++++++++--------
 fs/btrfs/inode.c     |  4 ++--
 fs/btrfs/subpage.c   | 38 +++++++++++++++++-----------------
 fs/btrfs/subpage.h   | 16 +++++++--------
 5 files changed, 66 insertions(+), 60 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] btrfs: subpage: make btrfs_is_subpage() check against a folio
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
@ 2025-03-10  7:35 ` Qu Wenruo
  2025-03-10  7:35 ` [PATCH 2/6] btrfs: add a @fsize parameter to btrfs_alloc_subpage() Qu Wenruo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:35 UTC (permalink / raw)
  To: linux-btrfs

To support larger data folios, we can no longer assume every filemap
folio is page sized.

So the btrfs_is_subpage() check must be done against a folio.

Thankfully for metadata folios we have full control and ensure a
metadata folio will never be larger than nodesize, so
btrfs_meta_is_subpage() doesn't need this change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 14 +++++++-------
 fs/btrfs/inode.c     |  2 +-
 fs/btrfs/subpage.c   | 24 ++++++++++++------------
 fs/btrfs/subpage.h   | 12 ++++++------
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a78ff093ea37..d2a7472f28b6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -432,7 +432,7 @@ static void end_folio_read(struct folio *folio, bool uptodate, u64 start, u32 le
 	else
 		btrfs_folio_clear_uptodate(fs_info, folio, start, len);
 
-	if (!btrfs_is_subpage(fs_info, folio->mapping))
+	if (!btrfs_is_subpage(fs_info, folio))
 		folio_unlock(folio);
 	else
 		btrfs_folio_end_lock(fs_info, folio, start, len);
@@ -488,7 +488,7 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
 static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio)
 {
 	ASSERT(folio_test_locked(folio));
-	if (!btrfs_is_subpage(fs_info, folio->mapping))
+	if (!btrfs_is_subpage(fs_info, folio))
 		return;
 
 	ASSERT(folio_test_private(folio));
@@ -870,7 +870,7 @@ int set_folio_extent_mapped(struct folio *folio)
 
 	fs_info = folio_to_fs_info(folio);
 
-	if (btrfs_is_subpage(fs_info, folio->mapping))
+	if (btrfs_is_subpage(fs_info, folio))
 		return btrfs_attach_subpage(fs_info, folio, BTRFS_SUBPAGE_DATA);
 
 	folio_attach_private(folio, (void *)EXTENT_FOLIO_PRIVATE);
@@ -887,7 +887,7 @@ void clear_folio_extent_mapped(struct folio *folio)
 		return;
 
 	fs_info = folio_to_fs_info(folio);
-	if (btrfs_is_subpage(fs_info, folio->mapping))
+	if (btrfs_is_subpage(fs_info, folio))
 		return btrfs_detach_subpage(fs_info, folio, BTRFS_SUBPAGE_DATA);
 
 	folio_detach_private(folio);
@@ -1331,7 +1331,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 {
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(&inode->vfs_inode);
 	struct writeback_control *wbc = bio_ctrl->wbc;
-	const bool is_subpage = btrfs_is_subpage(fs_info, folio->mapping);
+	const bool is_subpage = btrfs_is_subpage(fs_info, folio);
 	const u64 page_start = folio_pos(folio);
 	const u64 page_end = page_start + folio_size(folio) - 1;
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
@@ -1359,7 +1359,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	int bit;
 
 	/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
-	if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) {
+	if (btrfs_is_subpage(fs_info, folio)) {
 		ASSERT(blocks_per_folio > 1);
 		btrfs_get_subpage_dirty_bitmap(fs_info, folio, &bio_ctrl->submit_bitmap);
 	} else {
@@ -2411,7 +2411,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * regular submission.
 			 */
 			if (wbc->sync_mode != WB_SYNC_NONE ||
-			    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
+			    btrfs_is_subpage(inode_to_fs_info(inode), folio)) {
 				if (folio_test_writeback(folio))
 					submit_write_bio(bio_ctrl, 0);
 				folio_wait_writeback(folio);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e7e6accbaf6c..1af72f77f820 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7283,7 +7283,7 @@ static void wait_subpage_spinlock(struct folio *folio)
 	struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
 	struct btrfs_subpage *subpage;
 
-	if (!btrfs_is_subpage(fs_info, folio->mapping))
+	if (!btrfs_is_subpage(fs_info, folio))
 		return;
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index d7e70525f4fb..1392b4eaa1f9 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -83,7 +83,7 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 		return 0;
 	if (type == BTRFS_SUBPAGE_METADATA && !btrfs_meta_is_subpage(fs_info))
 		return 0;
-	if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio->mapping))
+	if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio))
 		return 0;
 
 	subpage = btrfs_alloc_subpage(fs_info, type);
@@ -104,7 +104,7 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info, struct folio *fol
 		return;
 	if (type == BTRFS_SUBPAGE_METADATA && !btrfs_meta_is_subpage(fs_info))
 		return;
-	if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio->mapping))
+	if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio))
 		return;
 
 	subpage = folio_detach_private(folio);
@@ -286,7 +286,7 @@ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
 
 	ASSERT(folio_test_locked(folio));
 
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping)) {
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio)) {
 		folio_unlock(folio);
 		return;
 	}
@@ -320,7 +320,7 @@ void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
 	int cleared = 0;
 	int bit;
 
-	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+	if (!btrfs_is_subpage(fs_info, folio)) {
 		folio_unlock(folio);
 		return;
 	}
@@ -572,7 +572,7 @@ void btrfs_folio_set_##name(const struct btrfs_fs_info *fs_info,	\
 			    struct folio *folio, u64 start, u32 len)	\
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping)) {		\
+	    !btrfs_is_subpage(fs_info, folio)) {			\
 		folio_set_func(folio);					\
 		return;							\
 	}								\
@@ -582,7 +582,7 @@ void btrfs_folio_clear_##name(const struct btrfs_fs_info *fs_info,	\
 			      struct folio *folio, u64 start, u32 len)	\
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping)) {		\
+	    !btrfs_is_subpage(fs_info, folio)) {			\
 		folio_clear_func(folio);				\
 		return;							\
 	}								\
@@ -592,7 +592,7 @@ bool btrfs_folio_test_##name(const struct btrfs_fs_info *fs_info,	\
 			     struct folio *folio, u64 start, u32 len)	\
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping))			\
+	    !btrfs_is_subpage(fs_info, folio))				\
 		return folio_test_func(folio);				\
 	return btrfs_subpage_test_##name(fs_info, folio, start, len);	\
 }									\
@@ -600,7 +600,7 @@ void btrfs_folio_clamp_set_##name(const struct btrfs_fs_info *fs_info,	\
 				  struct folio *folio, u64 start, u32 len) \
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping)) {		\
+	    !btrfs_is_subpage(fs_info, folio)) {			\
 		folio_set_func(folio);					\
 		return;							\
 	}								\
@@ -611,7 +611,7 @@ void btrfs_folio_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
 				    struct folio *folio, u64 start, u32 len) \
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping)) {		\
+	    !btrfs_is_subpage(fs_info, folio)) {			\
 		folio_clear_func(folio);				\
 		return;							\
 	}								\
@@ -622,7 +622,7 @@ bool btrfs_folio_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 				   struct folio *folio, u64 start, u32 len) \
 {									\
 	if (unlikely(!fs_info) ||					\
-	    !btrfs_is_subpage(fs_info, folio->mapping))			\
+	    !btrfs_is_subpage(fs_info, folio))				\
 		return folio_test_func(folio);				\
 	btrfs_subpage_clamp_range(folio, &start, &len);			\
 	return btrfs_subpage_test_##name(fs_info, folio, start, len);	\
@@ -700,7 +700,7 @@ void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 	if (!IS_ENABLED(CONFIG_BTRFS_ASSERT))
 		return;
 
-	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+	if (!btrfs_is_subpage(fs_info, folio)) {
 		ASSERT(!folio_test_dirty(folio));
 		return;
 	}
@@ -735,7 +735,7 @@ void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info,
 	int ret;
 
 	ASSERT(folio_test_locked(folio));
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping))
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio))
 		return;
 
 	subpage = folio_get_private(folio);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 9d1ad6c7c6bd..baa23258e7fa 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -85,10 +85,10 @@ static inline bool btrfs_meta_is_subpage(const struct btrfs_fs_info *fs_info)
 	return fs_info->nodesize < PAGE_SIZE;
 }
 static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
-				    struct address_space *mapping)
+				    struct folio *folio)
 {
-	if (mapping && mapping->host)
-		ASSERT(is_data_inode(BTRFS_I(mapping->host)));
+	if (folio->mapping && folio->mapping->host)
+		ASSERT(is_data_inode(BTRFS_I(folio->mapping->host)));
 	return fs_info->sectorsize < PAGE_SIZE;
 }
 #else
@@ -97,10 +97,10 @@ static inline bool btrfs_meta_is_subpage(const struct btrfs_fs_info *fs_info)
 	return false;
 }
 static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
-				    struct address_space *mapping)
+				    struct folio *folio)
 {
-	if (mapping && mapping->host)
-		ASSERT(is_data_inode(BTRFS_I(mapping->host)));
+	if (folio->mapping && folio->mapping->host)
+		ASSERT(is_data_inode(BTRFS_I(folio->mapping->host)));
 	return false;
 }
 #endif
-- 
2.48.1



* [PATCH 2/6] btrfs: add a @fsize parameter to btrfs_alloc_subpage()
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
  2025-03-10  7:35 ` [PATCH 1/6] btrfs: subpage: make btrfs_is_subpage() check against a folio Qu Wenruo
@ 2025-03-10  7:35 ` Qu Wenruo
  2025-03-10  7:35 ` [PATCH 3/6] btrfs: replace PAGE_SIZE with folio_size for subpage.[ch] Qu Wenruo
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:35 UTC (permalink / raw)
  To: linux-btrfs

Since we can no longer assume page sized folios for data filemap folios,
allow btrfs_alloc_subpage() to accept a new parameter, @fsize,
indicating the folio size.

This doesn't follow the regular pattern of passing a folio directly,
because this function is shared by both data and metadata folios, and
for metadata folios we have an extra allocation policy ensuring no folio
is larger than nodesize (unless it's page sized).
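
A rough userspace model of why @fsize matters for the allocation (the
helper names and the bitmap count of 6 below are hypothetical; the real
code sizes the allocation with struct_size() and GFP_NOFS):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BITS_PER_LONG (8 * sizeof(unsigned long))
#define MODEL_BITS_TO_LONGS(nr) \
	(((nr) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* One bit per fs block inside the folio. */
static size_t blocks_per_folio(size_t fsize, uint32_t sectorsize_bits)
{
	return fsize >> sectorsize_bits;
}

/*
 * Longs needed for the per-folio state bitmaps: @nr_bitmaps tracked
 * states (uptodate, dirty, writeback, ...), each with one bit per
 * block.  The old code hardcoded PAGE_SIZE where @fsize now appears,
 * so a 16K folio simply gets a proportionally larger bitmap.
 */
static size_t subpage_bitmap_longs(size_t fsize, uint32_t sectorsize_bits,
				   unsigned int nr_bitmaps)
{
	return MODEL_BITS_TO_LONGS(nr_bitmaps *
				   blocks_per_folio(fsize, sectorsize_bits));
}
```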

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 2 +-
 fs/btrfs/subpage.c   | 8 ++++----
 fs/btrfs/subpage.h   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d2a7472f28b6..337d2bed98d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3257,7 +3257,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	 * manually if we exit earlier.
 	 */
 	if (btrfs_meta_is_subpage(fs_info)) {
-		prealloc = btrfs_alloc_subpage(fs_info, BTRFS_SUBPAGE_METADATA);
+		prealloc = btrfs_alloc_subpage(fs_info, PAGE_SIZE, BTRFS_SUBPAGE_METADATA);
 		if (IS_ERR(prealloc)) {
 			ret = PTR_ERR(prealloc);
 			goto out;
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 1392b4eaa1f9..6e776c3bd873 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -86,7 +86,7 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 	if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio))
 		return 0;
 
-	subpage = btrfs_alloc_subpage(fs_info, type);
+	subpage = btrfs_alloc_subpage(fs_info, folio_size(folio), type);
 	if (IS_ERR(subpage))
 		return  PTR_ERR(subpage);
 
@@ -113,16 +113,16 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info, struct folio *fol
 }
 
 struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
-					  enum btrfs_subpage_type type)
+				size_t fsize, enum btrfs_subpage_type type)
 {
 	struct btrfs_subpage *ret;
 	unsigned int real_size;
 
-	ASSERT(fs_info->sectorsize < PAGE_SIZE);
+	ASSERT(fs_info->sectorsize < fsize);
 
 	real_size = struct_size(ret, bitmaps,
 			BITS_TO_LONGS(btrfs_bitmap_nr_max *
-				      (PAGE_SIZE >> fs_info->sectorsize_bits)));
+				      (fsize >> fs_info->sectorsize_bits)));
 	ret = kzalloc(real_size, GFP_NOFS);
 	if (!ret)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index baa23258e7fa..083390e76d13 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -112,7 +112,7 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info, struct folio *fol
 
 /* Allocate additional data where page represents more than one sector */
 struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
-					  enum btrfs_subpage_type type);
+				size_t fsize, enum btrfs_subpage_type type);
 void btrfs_free_subpage(struct btrfs_subpage *subpage);
 
 void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio);
-- 
2.48.1



* [PATCH 3/6] btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
  2025-03-10  7:35 ` [PATCH 1/6] btrfs: subpage: make btrfs_is_subpage() check against a folio Qu Wenruo
  2025-03-10  7:35 ` [PATCH 2/6] btrfs: add a @fsize parameter to btrfs_alloc_subpage() Qu Wenruo
@ 2025-03-10  7:35 ` Qu Wenruo
  2025-03-10  7:36 ` [PATCH 4/6] btrfs: prepare btrfs_launder_folio() for larger folios support Qu Wenruo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:35 UTC (permalink / raw)
  To: linux-btrfs

Since we can no longer assume all data filemap folios are page sized,
use proper folio_size() calls to determine the folio size, in
preparation for future larger data filemap folios.
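
The clamping logic this patch converts can be sketched in userspace as
follows (`clamp_range()` is a hypothetical model of
btrfs_subpage_clamp_range(); callers, as in the kernel, only pass ranges
that start at or before the folio):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clamp [*start, *start + *len) to the folio's own byte range
 * [folio_pos, folio_pos + folio_size), using the folio's real size
 * rather than the old fixed PAGE_SIZE.
 */
static void clamp_range(uint64_t folio_pos, size_t folio_size,
			uint64_t *start, uint32_t *len)
{
	uint64_t orig_start = *start;
	uint32_t orig_len = *len;

	if (*start < folio_pos)
		*start = folio_pos;
	if (folio_pos >= orig_start + orig_len) {
		/* The range ends before this folio: nothing to cover. */
		*len = 0;
	} else {
		uint64_t folio_end = folio_pos + folio_size;
		uint64_t range_end = orig_start + orig_len;
		uint64_t end = folio_end < range_end ? folio_end : range_end;

		*len = (uint32_t)(end - *start);
	}
}
```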

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 6 +++---
 fs/btrfs/subpage.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 6e776c3bd873..834161f35a00 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -6,7 +6,7 @@
 #include "btrfs_inode.h"
 
 /*
- * Subpage (sectorsize < PAGE_SIZE) support overview:
+ * Subpage (block size < folio size) support overview:
  *
  * Limitations:
  *
@@ -194,7 +194,7 @@ static void btrfs_subpage_assert(const struct btrfs_fs_info *fs_info,
 	 */
 	if (folio->mapping)
 		ASSERT(folio_pos(folio) <= start &&
-		       start + len <= folio_pos(folio) + PAGE_SIZE);
+		       start + len <= folio_pos(folio) + folio_size(folio));
 }
 
 #define subpage_calc_start_bit(fs_info, folio, name, start, len)	\
@@ -223,7 +223,7 @@ static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len)
 	if (folio_pos(folio) >= orig_start + orig_len)
 		*len = 0;
 	else
-		*len = min_t(u64, folio_pos(folio) + PAGE_SIZE,
+		*len = min_t(u64, folio_pos(folio) + folio_size(folio),
 			     orig_start + orig_len) - *start;
 }
 
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 083390e76d13..3042c5ea840a 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -89,7 +89,7 @@ static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
 {
 	if (folio->mapping && folio->mapping->host)
 		ASSERT(is_data_inode(BTRFS_I(folio->mapping->host)));
-	return fs_info->sectorsize < PAGE_SIZE;
+	return fs_info->sectorsize < folio_size(folio);
 }
 #else
 static inline bool btrfs_meta_is_subpage(const struct btrfs_fs_info *fs_info)
-- 
2.48.1



* [PATCH 4/6] btrfs: prepare btrfs_launder_folio() for larger folios support
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (2 preceding siblings ...)
  2025-03-10  7:35 ` [PATCH 3/6] btrfs: replace PAGE_SIZE with folio_size for subpage.[ch] Qu Wenruo
@ 2025-03-10  7:36 ` Qu Wenruo
  2025-03-10  7:36 ` [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support Qu Wenruo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

btrfs_launder_folio() only calls btrfs_qgroup_free_data(), which
doesn't care about the size of the folio.

Just replace the fixed PAGE_SIZE with folio_size().

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af72f77f820..f2ced3f7b112 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7307,7 +7307,7 @@ static void wait_subpage_spinlock(struct folio *folio)
 static int btrfs_launder_folio(struct folio *folio)
 {
 	return btrfs_qgroup_free_data(folio_to_inode(folio), NULL, folio_pos(folio),
-				      PAGE_SIZE, NULL);
+				      folio_size(folio), NULL);
 }
 
 static bool __btrfs_release_folio(struct folio *folio, gfp_t gfp_flags)
-- 
2.48.1



* [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (3 preceding siblings ...)
  2025-03-10  7:36 ` [PATCH 4/6] btrfs: prepare btrfs_launder_folio() for larger folios support Qu Wenruo
@ 2025-03-10  7:36 ` Qu Wenruo
  2025-03-10 23:27   ` Boris Burkov
  2025-03-10  7:36 ` [PATCH 6/6] btrfs: prepare btrfs_page_mkwrite() for larger folios Qu Wenruo
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:36 UTC (permalink / raw)
  To: linux-btrfs

When we're handling folios from filemap, we can no longer assume all
folios are page sized.

Thus for call sites assuming the folio is page sized, change the
PAGE_SIZE usage to folio_size() instead.
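
One recurring pattern in this patch is mapping a byte range inside a
folio to bits in the per-folio delalloc bitmap. A hedged userspace
sketch of that arithmetic (the helper names are made up; the kernel does
this inline in set_delalloc_bitmap() and friends):

```c
#include <assert.h>
#include <stdint.h>

/*
 * One bit per fs block: the bit index of @start within the folio that
 * begins at @folio_start.  With larger folios the valid bit range is
 * bounded by folio_size(), not PAGE_SIZE.
 */
static unsigned int range_start_bit(uint64_t folio_start, uint64_t start,
				    uint32_t sectorsize_bits)
{
	return (unsigned int)((start - folio_start) >> sectorsize_bits);
}

/* Number of bits covered by a @len byte range (block-aligned). */
static unsigned int range_nbits(uint32_t len, uint32_t sectorsize_bits)
{
	return len >> sectorsize_bits;
}
```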

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 337d2bed98d9..337908f09b88 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -425,7 +425,7 @@ static void end_folio_read(struct folio *folio, bool uptodate, u64 start, u32 le
 	struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
 
 	ASSERT(folio_pos(folio) <= start &&
-	       start + len <= folio_pos(folio) + PAGE_SIZE);
+	       start + len <= folio_pos(folio) + folio_size(folio));
 
 	if (uptodate && btrfs_verify_folio(folio, start, len))
 		btrfs_folio_set_uptodate(fs_info, folio, start, len);
@@ -492,7 +492,7 @@ static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio)
 		return;
 
 	ASSERT(folio_test_private(folio));
-	btrfs_folio_set_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE);
+	btrfs_folio_set_lock(fs_info, folio, folio_pos(folio), folio_size(folio));
 }
 
 /*
@@ -753,7 +753,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
 {
 	struct btrfs_inode *inode = folio_to_inode(folio);
 
-	ASSERT(pg_offset + size <= PAGE_SIZE);
+	ASSERT(pg_offset + size <= folio_size(folio));
 	ASSERT(bio_ctrl->end_io_func);
 
 	if (bio_ctrl->bbio &&
@@ -935,7 +935,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 	struct inode *inode = folio->mapping->host;
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
 	u64 start = folio_pos(folio);
-	const u64 end = start + PAGE_SIZE - 1;
+	const u64 end = start + folio_size(folio) - 1;
 	u64 extent_offset;
 	u64 last_byte = i_size_read(inode);
 	struct extent_map *em;
@@ -1279,7 +1279,7 @@ static void set_delalloc_bitmap(struct folio *folio, unsigned long *delalloc_bit
 	unsigned int start_bit;
 	unsigned int nbits;
 
-	ASSERT(start >= folio_start && start + len <= folio_start + PAGE_SIZE);
+	ASSERT(start >= folio_start && start + len <= folio_start + folio_size(folio));
 	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
 	nbits = len >> fs_info->sectorsize_bits;
 	ASSERT(bitmap_test_range_all_zero(delalloc_bitmap, start_bit, nbits));
@@ -1297,7 +1297,7 @@ static bool find_next_delalloc_bitmap(struct folio *folio,
 	unsigned int first_zero;
 	unsigned int first_set;
 
-	ASSERT(start >= folio_start && start < folio_start + PAGE_SIZE);
+	ASSERT(start >= folio_start && start < folio_start + folio_size(folio));
 
 	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
 	first_set = find_next_bit(delalloc_bitmap, bitmap_size, start_bit);
@@ -1499,10 +1499,10 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		delalloc_end = page_end;
 	/*
 	 * delalloc_end is already one less than the total length, so
-	 * we don't subtract one from PAGE_SIZE
+	 * we don't subtract one from folio_size().
 	 */
 	delalloc_to_write +=
-		DIV_ROUND_UP(delalloc_end + 1 - page_start, PAGE_SIZE);
+		DIV_ROUND_UP(delalloc_end + 1 - page_start, folio_size(folio));
 
 	/*
 	 * If all ranges are submitted asynchronously, we just need to account
@@ -1765,7 +1765,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 		goto done;
 
 	ret = extent_writepage_io(inode, folio, folio_pos(folio),
-				  PAGE_SIZE, bio_ctrl, i_size);
+				  folio_size(folio), bio_ctrl, i_size);
 	if (ret == 1)
 		return 0;
 	if (ret < 0)
@@ -2492,8 +2492,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(end + 1, sectorsize));
 
 	while (cur <= end) {
-		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
-		u32 cur_len = cur_end + 1 - cur;
+		u64 cur_end;
+		u32 cur_len;
 		struct folio *folio;
 
 		folio = filemap_get_folio(mapping, cur >> PAGE_SHIFT);
@@ -2503,13 +2503,18 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 		 * code is just in case, but shouldn't actually be run.
 		 */
 		if (IS_ERR(folio)) {
+			cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
+			cur_len = cur_end + 1 - cur;
 			btrfs_mark_ordered_io_finished(BTRFS_I(inode), NULL,
 						       cur, cur_len, false);
 			mapping_set_error(mapping, PTR_ERR(folio));
-			cur = cur_end + 1;
+			cur = cur_end;
 			continue;
 		}
 
+		cur_end = min(folio_pos(folio) + folio_size(folio) - 1, end);
+		cur_len = cur_end + 1 - cur;
+
 		ASSERT(folio_test_locked(folio));
 		if (pages_dirty && folio != locked_folio)
 			ASSERT(folio_test_dirty(folio));
@@ -2621,7 +2626,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
 				     struct folio *folio)
 {
 	u64 start = folio_pos(folio);
-	u64 end = start + PAGE_SIZE - 1;
+	u64 end = start + folio_size(folio) - 1;
 	bool ret;
 
 	if (test_range_bit_exists(tree, start, end, EXTENT_LOCKED)) {
@@ -2659,7 +2664,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
 bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
 {
 	u64 start = folio_pos(folio);
-	u64 end = start + PAGE_SIZE - 1;
+	u64 end = start + folio_size(folio) - 1;
 	struct btrfs_inode *inode = folio_to_inode(folio);
 	struct extent_io_tree *io_tree = &inode->io_tree;
 
-- 
2.48.1



* [PATCH 6/6] btrfs: prepare btrfs_page_mkwrite() for larger folios
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (4 preceding siblings ...)
  2025-03-10  7:36 ` [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support Qu Wenruo
@ 2025-03-10  7:36 ` Qu Wenruo
  2025-03-10 23:32 ` [PATCH v2 0/6] btrfs: prepare for larger folios support Boris Burkov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10  7:36 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

This removes the assumption that the folio is always page sized
(although the ASSERT() for folio order is still kept as-is for now).

Just replace PAGE_SIZE with folio_size().
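
The interesting case is a folio straddling i_size: only the part up to
i_size (rounded up to sectorsize) keeps its delalloc reservation, and
the rest is released. A userspace sketch of that calculation, assuming
(as mkwrite does) that the folio starts below i_size; the helper name is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * How much of an @fsize folio starting at @folio_start actually needs
 * delalloc reservation; btrfs_page_mkwrite() releases the remainder
 * (fsize - reserved).  @fsize used to be hardcoded as PAGE_SIZE.
 */
static uint64_t mkwrite_reserved_space(uint64_t i_size, uint64_t folio_start,
				       size_t fsize, uint32_t sectorsize)
{
	if (i_size >= folio_start + fsize)
		return fsize;	/* Folio entirely below i_size. */
	return round_up_u64(i_size - folio_start, sectorsize);
}
```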

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 643f101c7340..262a707d8990 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1782,6 +1782,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	struct extent_changeset *data_reserved = NULL;
 	unsigned long zero_start;
 	loff_t size;
+	size_t fsize = folio_size(folio);
 	vm_fault_t ret;
 	int ret2;
 	int reserved = 0;
@@ -1792,7 +1793,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
 	ASSERT(folio_order(folio) == 0);
 
-	reserved_space = PAGE_SIZE;
+	reserved_space = fsize;
 
 	sb_start_pagefault(inode->i_sb);
 	page_start = folio_pos(folio);
@@ -1846,7 +1847,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	 * We can't set the delalloc bits if there are pending ordered
 	 * extents.  Drop our locks and wait for them to finish.
 	 */
-	ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start, PAGE_SIZE);
+	ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start, fsize);
 	if (ordered) {
 		unlock_extent(io_tree, page_start, page_end, &cached_state);
 		folio_unlock(folio);
@@ -1858,11 +1859,11 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
 	if (folio->index == ((size - 1) >> PAGE_SHIFT)) {
 		reserved_space = round_up(size - page_start, fs_info->sectorsize);
-		if (reserved_space < PAGE_SIZE) {
+		if (reserved_space < fsize) {
 			end = page_start + reserved_space - 1;
 			btrfs_delalloc_release_space(BTRFS_I(inode),
 					data_reserved, page_start,
-					PAGE_SIZE - reserved_space, true);
+					fsize - reserved_space, true);
 		}
 	}
 
@@ -1889,12 +1890,12 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	if (page_start + folio_size(folio) > size)
 		zero_start = offset_in_folio(folio, size);
 	else
-		zero_start = PAGE_SIZE;
+		zero_start = fsize;
 
-	if (zero_start != PAGE_SIZE)
+	if (zero_start != fsize)
 		folio_zero_range(folio, zero_start, folio_size(folio) - zero_start);
 
-	btrfs_folio_clear_checked(fs_info, folio, page_start, PAGE_SIZE);
+	btrfs_folio_clear_checked(fs_info, folio, page_start, fsize);
 	btrfs_folio_set_dirty(fs_info, folio, page_start, end + 1 - page_start);
 	btrfs_folio_set_uptodate(fs_info, folio, page_start, end + 1 - page_start);
 
@@ -1903,7 +1904,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	unlock_extent(io_tree, page_start, page_end, &cached_state);
 	up_read(&BTRFS_I(inode)->i_mmap_lock);
 
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
 	return VM_FAULT_LOCKED;
@@ -1912,7 +1913,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	folio_unlock(folio);
 	up_read(&BTRFS_I(inode)->i_mmap_lock);
 out:
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
 	btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
 				     reserved_space, (ret != 0));
 out_noreserve:
-- 
2.48.1



* Re: [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support
  2025-03-10  7:36 ` [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support Qu Wenruo
@ 2025-03-10 23:27   ` Boris Burkov
  2025-03-10 23:51     ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2025-03-10 23:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 10, 2025 at 06:06:01PM +1030, Qu Wenruo wrote:
> When we're handling folios from filemap, we can no longer assume all
> folios are page sized.
> 
> Thus for call sites assuming the folio is page sized, change the
> PAGE_SIZE usage to folio_size() instead.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 337d2bed98d9..337908f09b88 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -425,7 +425,7 @@ static void end_folio_read(struct folio *folio, bool uptodate, u64 start, u32 le
>  	struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
>  
>  	ASSERT(folio_pos(folio) <= start &&
> -	       start + len <= folio_pos(folio) + PAGE_SIZE);
> +	       start + len <= folio_pos(folio) + folio_size(folio));
>  
>  	if (uptodate && btrfs_verify_folio(folio, start, len))
>  		btrfs_folio_set_uptodate(fs_info, folio, start, len);
> @@ -492,7 +492,7 @@ static void begin_folio_read(struct btrfs_fs_info *fs_info, struct folio *folio)
>  		return;
>  
>  	ASSERT(folio_test_private(folio));
> -	btrfs_folio_set_lock(fs_info, folio, folio_pos(folio), PAGE_SIZE);
> +	btrfs_folio_set_lock(fs_info, folio, folio_pos(folio), folio_size(folio));
>  }
>  
>  /*
> @@ -753,7 +753,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
>  {
>  	struct btrfs_inode *inode = folio_to_inode(folio);
>  
> -	ASSERT(pg_offset + size <= PAGE_SIZE);
> +	ASSERT(pg_offset + size <= folio_size(folio));
>  	ASSERT(bio_ctrl->end_io_func);
>  
>  	if (bio_ctrl->bbio &&
> @@ -935,7 +935,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
>  	struct inode *inode = folio->mapping->host;
>  	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
>  	u64 start = folio_pos(folio);
> -	const u64 end = start + PAGE_SIZE - 1;
> +	const u64 end = start + folio_size(folio) - 1;
>  	u64 extent_offset;
>  	u64 last_byte = i_size_read(inode);
>  	struct extent_map *em;
> @@ -1279,7 +1279,7 @@ static void set_delalloc_bitmap(struct folio *folio, unsigned long *delalloc_bit
>  	unsigned int start_bit;
>  	unsigned int nbits;
>  
> -	ASSERT(start >= folio_start && start + len <= folio_start + PAGE_SIZE);
> +	ASSERT(start >= folio_start && start + len <= folio_start + folio_size(folio));
>  	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
>  	nbits = len >> fs_info->sectorsize_bits;
>  	ASSERT(bitmap_test_range_all_zero(delalloc_bitmap, start_bit, nbits));
> @@ -1297,7 +1297,7 @@ static bool find_next_delalloc_bitmap(struct folio *folio,
>  	unsigned int first_zero;
>  	unsigned int first_set;
>  
> -	ASSERT(start >= folio_start && start < folio_start + PAGE_SIZE);
> +	ASSERT(start >= folio_start && start < folio_start + folio_size(folio));
>  
>  	start_bit = (start - folio_start) >> fs_info->sectorsize_bits;
>  	first_set = find_next_bit(delalloc_bitmap, bitmap_size, start_bit);
> @@ -1499,10 +1499,10 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  		delalloc_end = page_end;
>  	/*
>  	 * delalloc_end is already one less than the total length, so
> -	 * we don't subtract one from PAGE_SIZE
> +	 * we don't subtract one from folio_size().
>  	 */
>  	delalloc_to_write +=
> -		DIV_ROUND_UP(delalloc_end + 1 - page_start, PAGE_SIZE);
> +		DIV_ROUND_UP(delalloc_end + 1 - page_start, folio_size(folio));
>  
>  	/*
>  	 * If all ranges are submitted asynchronously, we just need to account
> @@ -1765,7 +1765,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
>  		goto done;
>  
>  	ret = extent_writepage_io(inode, folio, folio_pos(folio),
> -				  PAGE_SIZE, bio_ctrl, i_size);
> +				  folio_size(folio), bio_ctrl, i_size);
>  	if (ret == 1)
>  		return 0;
>  	if (ret < 0)
> @@ -2492,8 +2492,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
>  	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(end + 1, sectorsize));
>  
>  	while (cur <= end) {
> -		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
> -		u32 cur_len = cur_end + 1 - cur;
> +		u64 cur_end;
> +		u32 cur_len;
>  		struct folio *folio;
>  
>  		folio = filemap_get_folio(mapping, cur >> PAGE_SHIFT);
> @@ -2503,13 +2503,18 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
>  		 * code is just in case, but shouldn't actually be run.
>  		 */
>  		if (IS_ERR(folio)) {
> +			cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
> +			cur_len = cur_end + 1 - cur;

Curious: is this guess based on PAGE_SIZE more correct than using 1 as
num_bytes? How much more correct? :)

Probably better question: what is the behavior for the range
[PAGE_SIZE, folio_size(folio)] should that filemap_get_folio have
returned properly? If it truly can never happen AND we can't handle
it correctly, is handling it "sort of correctly" a good idea?

>  			btrfs_mark_ordered_io_finished(BTRFS_I(inode), NULL,
>  						       cur, cur_len, false);
>  			mapping_set_error(mapping, PTR_ERR(folio));
> -			cur = cur_end + 1;
> +			cur = cur_end;
>  			continue;
>  		}
>  
> +		cur_end = min(folio_pos(folio) + folio_size(folio) - 1, end);
> +		cur_len = cur_end + 1 - cur;
> +
>  		ASSERT(folio_test_locked(folio));
>  		if (pages_dirty && folio != locked_folio)
>  			ASSERT(folio_test_dirty(folio));
> @@ -2621,7 +2626,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
>  				     struct folio *folio)
>  {
>  	u64 start = folio_pos(folio);
> -	u64 end = start + PAGE_SIZE - 1;
> +	u64 end = start + folio_size(folio) - 1;
>  	bool ret;
>  
>  	if (test_range_bit_exists(tree, start, end, EXTENT_LOCKED)) {
> @@ -2659,7 +2664,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
>  bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
>  {
>  	u64 start = folio_pos(folio);
> -	u64 end = start + PAGE_SIZE - 1;
> +	u64 end = start + folio_size(folio) - 1;
>  	struct btrfs_inode *inode = folio_to_inode(folio);
>  	struct extent_io_tree *io_tree = &inode->io_tree;
>  
> -- 
> 2.48.1
> 


* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (5 preceding siblings ...)
  2025-03-10  7:36 ` [PATCH 6/6] btrfs: prepare btrfs_page_mkwrite() for larger folios Qu Wenruo
@ 2025-03-10 23:32 ` Boris Burkov
  2025-03-10 23:56   ` Qu Wenruo
  2025-03-11 10:25 ` Johannes Thumshirn
  2025-03-12 13:44 ` David Sterba
  8 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2025-03-10 23:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 10, 2025 at 06:05:56PM +1030, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Split the subpage.[ch] modification into 3 patches

I found the way you split it to be a little confusing, honestly. I would
have preferred splitting it by operation (alloc_subpage, is_subpage,
etc..), rather than a basically incorrect prep patch and then a sed patch
on the subpage file.

You already iterated on this, so I absolutely don't want to hold
anything up on this observation, haha. If it's something others agree
on, I wouldn't mind seeing it before the patches go in.

I asked a question on patch 5 that I think is interesting, but please
feel free to add:
Reviewed-by: Boris Burkov <boris@bur.io>

> - Rebased the latest for-next branch
>   Now all dependency are in for-next.
> 
> This means:
> 
> - Our subpage routine should check against the folio size other than
>   PAGE_SIZE
> 
> - Make functions handling filemap folios to use folio_size() other than
>   PAGE_SIZE
> 
>   The most common paths are:
>   * Buffered reads/writes
>   * Uncompressed folio writeback
>     Already handled pretty well
> 
>   * Compressed read
>   * Compressed write
>     To take full advantage of larger folios, we should use folio_iter
>     other than bvec_iter.
>     This will be a dedicated patchset, and the existing bvec_iter can
>     still handle larger folios.
> 
>   Internal usages can still use page sized folios, or even pages,
>   including:
>   * Encoded reads/writes
>   * Compressed folios
>   * RAID56 internal pages
>   * Scrub internal pages
> 
> This patchset will handle the above mentioned points by:
> 
> - Prepare the subpage routine to handle larger folios
>   This will introduce a small overhead, as all checks are against folio
>   sizes, even on x86_64 we can no longer skip subpage completely.
> 
>   This is done in the first patch.
> 
> - Convert straightforward PAGE_SIZE users to use folio_size()
>   This is done in the remaining patches.
> 
> Currently this patchset is not a exhaustive conversion, I'm pretty sure
> there are other complex situations which can cause problems.
> Those problems can only be exposed and fixed after switching on the
> experimental larger folios support later.
> 
> Qu Wenruo (6):
>   btrfs: subpage: make btrfs_is_subpage() check against a folio
>   btrfs: add a @fsize parameter to btrfs_alloc_subpage()
>   btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
>   btrfs: prepare btrfs_launder_folio() for larger folios support
>   btrfs: prepare extent_io.c for future larger folio support
>   btrfs: prepare btrfs_page_mkwrite() for larger folios
> 
>  fs/btrfs/extent_io.c | 49 ++++++++++++++++++++++++--------------------
>  fs/btrfs/file.c      | 19 +++++++++--------
>  fs/btrfs/inode.c     |  4 ++--
>  fs/btrfs/subpage.c   | 38 +++++++++++++++++-----------------
>  fs/btrfs/subpage.h   | 16 +++++++--------
>  5 files changed, 66 insertions(+), 60 deletions(-)
> 
> -- 
> 2.48.1
> 


* Re: [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support
  2025-03-10 23:27   ` Boris Burkov
@ 2025-03-10 23:51     ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10 23:51 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs



On 2025/3/11 09:57, Boris Burkov wrote:
[...]
>> @@ -2503,13 +2503,18 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
>>   		 * code is just in case, but shouldn't actually be run.
>>   		 */
>>   		if (IS_ERR(folio)) {
>> +			cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
>> +			cur_len = cur_end + 1 - cur;
> 
> Curious: is this guess based on PAGE_SIZE more correct than using 1 as
> num_bytes? How much more correct? :)

Filemap (page cache) itself is mostly an xarray using page index.

That's why filemap_get_folio() only accepts a page index, not a bytenr.

Using length 1 will just waste a lot of CPU (PAGE_SIZE times) trying to 
get the same missing folio at the same index.

> 
> Probably better question: what is the behavior for the range
> [PAGE_SIZE, folio_size(folio)] should that filemap_get_folio have
> returned properly?

Depends on the range of extent_write_locked_range().

If the range covers [PAGE_SIZE, folio_size(folio)], we just submit the 
write in one go, thus improving the performance.


> If it truly can never happen AND we can't handle
> it correctly, is handling it "sort of correctly" a good idea?

So far I think it should not happen, but this is only called by the 
compressed write path when compression fails, which is not a super hot 
path.
Thus I tend to keep it as-is for now.

The future plan is to change how we submit compressed writes.
Instead of the more-or-less cursed async extent (just check how many 
rabbits I have to pull out of the hat for subpage block-perfect 
compression), we will submit the write no differently than any other write.

And the real compression is handled with one extra layer (like RAID56, 
but not exactly the same) on that bbio.

Then we can get rid of extent_write_locked_range() completely, with no 
more such weird handling.

Thanks,
Qu

> 
>>   			btrfs_mark_ordered_io_finished(BTRFS_I(inode), NULL,
>>   						       cur, cur_len, false);
>>   			mapping_set_error(mapping, PTR_ERR(folio));
>> -			cur = cur_end + 1;
>> +			cur = cur_end;
>>   			continue;
>>   		}
>>   
>> +		cur_end = min(folio_pos(folio) + folio_size(folio) - 1, end);
>> +		cur_len = cur_end + 1 - cur;
>> +
>>   		ASSERT(folio_test_locked(folio));
>>   		if (pages_dirty && folio != locked_folio)
>>   			ASSERT(folio_test_dirty(folio));
>> @@ -2621,7 +2626,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
>>   				     struct folio *folio)
>>   {
>>   	u64 start = folio_pos(folio);
>> -	u64 end = start + PAGE_SIZE - 1;
>> +	u64 end = start + folio_size(folio) - 1;
>>   	bool ret;
>>   
>>   	if (test_range_bit_exists(tree, start, end, EXTENT_LOCKED)) {
>> @@ -2659,7 +2664,7 @@ static bool try_release_extent_state(struct extent_io_tree *tree,
>>   bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
>>   {
>>   	u64 start = folio_pos(folio);
>> -	u64 end = start + PAGE_SIZE - 1;
>> +	u64 end = start + folio_size(folio) - 1;
>>   	struct btrfs_inode *inode = folio_to_inode(folio);
>>   	struct extent_io_tree *io_tree = &inode->io_tree;
>>   
>> -- 
>> 2.48.1
>>



* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-10 23:32 ` [PATCH v2 0/6] btrfs: prepare for larger folios support Boris Burkov
@ 2025-03-10 23:56   ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-10 23:56 UTC (permalink / raw)
  To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs



On 2025/3/11 10:02, Boris Burkov wrote:
> On Mon, Mar 10, 2025 at 06:05:56PM +1030, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Split the subpage.[ch] modification into 3 patches
>
> I found the way you split it to be a little confusing, honestly. I would
> have preferred splitting it by operation (alloc_subpage, is_subpage,
> etc..), rather than a basically incorrect prep patch and then a sed patch
> on the subpage file.

That's why I initially merged the first 3 patches into one.

If anyone else has a preference on the split, I'm happy to follow it.

>
> You already iterated on this, so I absolutely don't want to hold
> anything up on this observation, haha. If it's something others agree
> on, I wouldn't mind seeing it before the patches go in.
>
> I asked a question on patch 5 that I think is interesting, but please
> feel free to add:
> Reviewed-by: Boris Burkov <boris@bur.io>

Thanks for the review,
Qu

>
>> - Rebased the latest for-next branch
>>    Now all dependency are in for-next.
>>
>> This means:
>>
>> - Our subpage routine should check against the folio size other than
>>    PAGE_SIZE
>>
>> - Make functions handling filemap folios to use folio_size() other than
>>    PAGE_SIZE
>>
>>    The most common paths are:
>>    * Buffered reads/writes
>>    * Uncompressed folio writeback
>>      Already handled pretty well
>>
>>    * Compressed read
>>    * Compressed write
>>      To take full advantage of larger folios, we should use folio_iter
>>      other than bvec_iter.
>>      This will be a dedicated patchset, and the existing bvec_iter can
>>      still handle larger folios.
>>
>>    Internal usages can still use page sized folios, or even pages,
>>    including:
>>    * Encoded reads/writes
>>    * Compressed folios
>>    * RAID56 internal pages
>>    * Scrub internal pages
>>
>> This patchset will handle the above mentioned points by:
>>
>> - Prepare the subpage routine to handle larger folios
>>    This will introduce a small overhead, as all checks are against folio
>>    sizes, even on x86_64 we can no longer skip subpage completely.
>>
>>    This is done in the first patch.
>>
>> - Convert straightforward PAGE_SIZE users to use folio_size()
>>    This is done in the remaining patches.
>>
>> Currently this patchset is not a exhaustive conversion, I'm pretty sure
>> there are other complex situations which can cause problems.
>> Those problems can only be exposed and fixed after switching on the
>> experimental larger folios support later.
>>
>> Qu Wenruo (6):
>>    btrfs: subpage: make btrfs_is_subpage() check against a folio
>>    btrfs: add a @fsize parameter to btrfs_alloc_subpage()
>>    btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
>>    btrfs: prepare btrfs_launder_folio() for larger folios support
>>    btrfs: prepare extent_io.c for future larger folio support
>>    btrfs: prepare btrfs_page_mkwrite() for larger folios
>>
>>   fs/btrfs/extent_io.c | 49 ++++++++++++++++++++++++--------------------
>>   fs/btrfs/file.c      | 19 +++++++++--------
>>   fs/btrfs/inode.c     |  4 ++--
>>   fs/btrfs/subpage.c   | 38 +++++++++++++++++-----------------
>>   fs/btrfs/subpage.h   | 16 +++++++--------
>>   5 files changed, 66 insertions(+), 60 deletions(-)
>>
>> --
>> 2.48.1
>>
>



* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (6 preceding siblings ...)
  2025-03-10 23:32 ` [PATCH v2 0/6] btrfs: prepare for larger folios support Boris Burkov
@ 2025-03-11 10:25 ` Johannes Thumshirn
  2025-03-12 13:44 ` David Sterba
  8 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2025-03-11 10:25 UTC (permalink / raw)
  To: WenRuo Qu, linux-btrfs@vger.kernel.org

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
                   ` (7 preceding siblings ...)
  2025-03-11 10:25 ` Johannes Thumshirn
@ 2025-03-12 13:44 ` David Sterba
  2025-03-13 15:21   ` Nathan Chancellor
  8 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2025-03-12 13:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 10, 2025 at 06:05:56PM +1030, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Split the subpage.[ch] modification into 3 patches
> - Rebased the latest for-next branch
>   Now all dependency are in for-next.

Please add the series to for-next, I haven't found anything that would
need fixups or another resend, so we can get it into the 6.15 queue. Thanks.


* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-12 13:44 ` David Sterba
@ 2025-03-13 15:21   ` Nathan Chancellor
  2025-03-13 20:50     ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Nathan Chancellor @ 2025-03-13 15:21 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

On Wed, Mar 12, 2025 at 02:44:55PM +0100, David Sterba wrote:
> On Mon, Mar 10, 2025 at 06:05:56PM +1030, Qu Wenruo wrote:
> > [CHANGELOG]
> > v2:
> > - Split the subpage.[ch] modification into 3 patches
> > - Rebased the latest for-next branch
> >   Now all dependency are in for-next.
> 
> Please add the series to for-next, I haven't found anything that would
> need fixups or another resend so we cant get it to 6.15 queue. Thanks.

This series is still broken for 32-bit targets as reported two weeks ago:

https://lore.kernel.org/202502211908.aCcQQyEY-lkp@intel.com/
https://lore.kernel.org/20250225184136.GA1679809@ax162/

$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper allmodconfig fs/btrfs/extent_io.o
In file included from <command-line>:
fs/btrfs/extent_io.c: In function 'extent_write_locked_range':
include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_802' declared with attribute error: min(folio_pos(folio) + folio_size(folio) - 1, end) signedness error
  557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^
include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert'
  538 |                         prefix ## suffix();                             \
      |                         ^~~~~~
include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert'
  557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |         ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:93:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
   93 |         BUILD_BUG_ON_MSG(!__types_ok(ux, uy),           \
      |         ^~~~~~~~~~~~~~~~
include/linux/minmax.h:98:9: note: in expansion of macro '__careful_cmp_once'
   98 |         __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
      |         ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:105:25: note: in expansion of macro '__careful_cmp'
  105 | #define min(x, y)       __careful_cmp(min, x, y)
      |                         ^~~~~~~~~~~~~
fs/btrfs/extent_io.c:2472:27: note: in expansion of macro 'min'
 2472 |                 cur_end = min(folio_pos(folio) + folio_size(folio) - 1, end);
      |                           ^~~

Cheers,
Nathan


* Re: [PATCH v2 0/6] btrfs: prepare for larger folios support
  2025-03-13 15:21   ` Nathan Chancellor
@ 2025-03-13 20:50     ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2025-03-13 20:50 UTC (permalink / raw)
  To: Nathan Chancellor, David Sterba; +Cc: Qu Wenruo, linux-btrfs



On 2025/3/14 01:51, Nathan Chancellor wrote:
> On Wed, Mar 12, 2025 at 02:44:55PM +0100, David Sterba wrote:
>> On Mon, Mar 10, 2025 at 06:05:56PM +1030, Qu Wenruo wrote:
>>> [CHANGELOG]
>>> v2:
>>> - Split the subpage.[ch] modification into 3 patches
>>> - Rebased the latest for-next branch
>>>    Now all dependency are in for-next.
>>
>> Please add the series to for-next, I haven't found anything that would
>> need fixups or another resend so we cant get it to 6.15 queue. Thanks.
>
> This series is still broken for 32-bit targets as reported two weeks ago:

Now fixed in for-next branch.

Thanks,
Qu
>
> https://lore.kernel.org/202502211908.aCcQQyEY-lkp@intel.com/
> https://lore.kernel.org/20250225184136.GA1679809@ax162/
>
> $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper allmodconfig fs/btrfs/extent_io.o
> In file included from <command-line>:
> fs/btrfs/extent_io.c: In function 'extent_write_locked_range':
> include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_802' declared with attribute error: min(folio_pos(folio) + folio_size(folio) - 1, end) signedness error
>    557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>        |                                             ^
> include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert'
>    538 |                         prefix ## suffix();                             \
>        |                         ^~~~~~
> include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert'
>    557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>        |         ^~~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
>     39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>        |                                     ^~~~~~~~~~~~~~~~~~
> include/linux/minmax.h:93:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
>     93 |         BUILD_BUG_ON_MSG(!__types_ok(ux, uy),           \
>        |         ^~~~~~~~~~~~~~~~
> include/linux/minmax.h:98:9: note: in expansion of macro '__careful_cmp_once'
>     98 |         __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
>        |         ^~~~~~~~~~~~~~~~~~
> include/linux/minmax.h:105:25: note: in expansion of macro '__careful_cmp'
>    105 | #define min(x, y)       __careful_cmp(min, x, y)
>        |                         ^~~~~~~~~~~~~
> fs/btrfs/extent_io.c:2472:27: note: in expansion of macro 'min'
>   2472 |                 cur_end = min(folio_pos(folio) + folio_size(folio) - 1, end);
>        |                           ^~~
>
> Cheers,
> Nathan
>



end of thread, other threads:[~2025-03-13 20:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-10  7:35 [PATCH v2 0/6] btrfs: prepare for larger folios support Qu Wenruo
2025-03-10  7:35 ` [PATCH 1/6] btrfs: subpage: make btrfs_is_subpage() check against a folio Qu Wenruo
2025-03-10  7:35 ` [PATCH 2/6] btrfs: add a @fsize parameter to btrfs_alloc_subpage() Qu Wenruo
2025-03-10  7:35 ` [PATCH 3/6] btrfs: replace PAGE_SIZE with folio_size for subpage.[ch] Qu Wenruo
2025-03-10  7:36 ` [PATCH 4/6] btrfs: prepare btrfs_launder_folio() for larger folios support Qu Wenruo
2025-03-10  7:36 ` [PATCH 5/6] btrfs: prepare extent_io.c for future larger folio support Qu Wenruo
2025-03-10 23:27   ` Boris Burkov
2025-03-10 23:51     ` Qu Wenruo
2025-03-10  7:36 ` [PATCH 6/6] btrfs: prepare btrfs_page_mkwrite() for larger folios Qu Wenruo
2025-03-10 23:32 ` [PATCH v2 0/6] btrfs: prepare for larger folios support Boris Burkov
2025-03-10 23:56   ` Qu Wenruo
2025-03-11 10:25 ` Johannes Thumshirn
2025-03-12 13:44 ` David Sterba
2025-03-13 15:21   ` Nathan Chancellor
2025-03-13 20:50     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox