Linux Btrfs filesystem development
* [PATCH 0/4] btrfs: experimental support for huge data folios
@ 2026-04-29  5:03 Qu Wenruo
  2026-04-29  5:03 ` [PATCH 1/4] btrfs: update the out-of-date comments on subpage Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Qu Wenruo @ 2026-04-29  5:03 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
RFC->v1:
- Rebase to the latest for-next branch
  This provides a stable baseline that passes the usual fstests runs,
  and no longer carries the 2K fs block size support.

- Mark the new huge folio support as experimental
  Since the large folio support itself has been moved out of experimental
  features, the huge folio support needs to be hidden behind
  CONFIG_BTRFS_EXPERIMENTAL.

- Rework the blocks-per-folio limit
  Previously the blocks-per-folio limit was always calculated as
  BTRFS_MAX_FOLIO_SIZE / BTRFS_MIN_BLOCK_SIZE, but the real blocks per
  folio also depends on the fs block size.

  Now introduce a new BTRFS_MAX_BLOCKS_PER_FOLIO macro, which is either
  BITS_PER_LONG (the old limit) or 512 (the new experimental one).

  This allows non-experimental builds to get rid of the enlarged
  bitmap, thus lowering the on-stack memory usage for such builds.

Currently btrfs only supports folios of at most BITS_PER_LONG blocks.
This is an artificial limit, introduced to make bitmap operations easier.

Btrfs has two extra bitmaps that live outside the btrfs_folio_state
structure: btrfs_bio_ctrl->submit_bitmap and the local @delalloc_bitmap
inside writepage_delalloc().

Limiting the bitmap size to BITS_PER_LONG makes those two bitmaps very
easy to handle; we can just use a local unsigned long, with no need for
any memory allocation.

On the other hand, those two external bitmaps are the only things still
blocking huge folios.

The 1st patch will update the comments related to subpage implementation
first.
The 2nd patch will handle the subpage internal operations, mostly to
handle bitmap dumping.
The 3rd patch will prepare btrfs_bio_ctrl::submit_bitmap to be a proper
pointer for the incoming huge folios support.

The final patch will enable the huge folio support, using an on-stack
bitmap that can hold 512 bits.
That covers a 2MiB folio, which is order 9 on systems with a 4K page
size.

Qu Wenruo (4):
  btrfs: update the out-of-date comments on subpage
  btrfs: prepare subpage operations to support >= BITS_PER_LONG
    sub-bitmaps
  btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
  btrfs: introduce support for huge folios

 fs/btrfs/disk-io.c   |  11 ++-
 fs/btrfs/extent_io.c |  71 +++++++++-------
 fs/btrfs/fs.h        |  16 ++++
 fs/btrfs/subpage.c   | 191 ++++++++++++++++++++++++++-----------------
 fs/btrfs/subpage.h   |   8 +-
 5 files changed, 186 insertions(+), 111 deletions(-)

-- 
2.53.0



* [PATCH 1/4] btrfs: update the out-of-date comments on subpage
  2026-04-29  5:03 [PATCH 0/4] btrfs: experimental support for huge data folios Qu Wenruo
@ 2026-04-29  5:03 ` Qu Wenruo
  2026-04-29  5:03 ` [PATCH 2/4] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2026-04-29  5:03 UTC (permalink / raw)
  To: linux-btrfs

The comments at the beginning of subpage.c are out of date; many of the
listed limitations have already been resolved.

Update them to reflect the current status.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 39 +++++----------------------------------
 1 file changed, 5 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 8a09f34ea31e..10220453b6fa 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -10,41 +10,12 @@
  *
  * Limitations:
  *
- * - Only support 64K page size for now
- *   This is to make metadata handling easier, as 64K page would ensure
- *   all nodesize would fit inside one page, thus we don't need to handle
- *   cases where a tree block crosses several pages.
+ * - Metadata must be fully aligned to node size
+ *   So when nodesize <= page size, the metadata can never cross folio boundaries.
  *
- * - Only metadata read-write for now
- *   The data read-write part is in development.
- *
- * - Metadata can't cross 64K page boundary
- *   btrfs-progs and kernel have done that for a while, thus only ancient
- *   filesystems could have such problem.  For such case, do a graceful
- *   rejection.
- *
- * Special behavior:
- *
- * - Metadata
- *   Metadata read is fully supported.
- *   Meaning when reading one tree block will only trigger the read for the
- *   needed range, other unrelated range in the same page will not be touched.
- *
- *   Metadata write support is partial.
- *   The writeback is still for the full page, but we will only submit
- *   the dirty extent buffers in the page.
- *
- *   This means, if we have a metadata page like this:
- *
- *   Page offset
- *   0         16K         32K         48K        64K
- *   |/////////|           |///////////|
- *        \- Tree block A        \- Tree block B
- *
- *   Even if we just want to writeback tree block A, we will also writeback
- *   tree block B if it's also dirty.
- *
- *   This may cause extra metadata writeback which results more COW.
+ * - Only support blocks per folio <= BITS_PER_LONG
+ *   This makes bitmap copying much easier, as a single unsigned long can
+ *   hold one bitmap.
  *
  * Implementation:
  *
-- 
2.53.0



* [PATCH 2/4] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps
  2026-04-29  5:03 [PATCH 0/4] btrfs: experimental support for huge data folios Qu Wenruo
  2026-04-29  5:03 ` [PATCH 1/4] btrfs: update the out-of-date comments on subpage Qu Wenruo
@ 2026-04-29  5:03 ` Qu Wenruo
  2026-04-29  5:03 ` [PATCH 3/4] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps Qu Wenruo
  2026-04-29  5:03 ` [PATCH 4/4] btrfs: introduce support for huge folios Qu Wenruo
  3 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2026-04-29  5:03 UTC (permalink / raw)
  To: linux-btrfs

[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.

That limit allows us to easily grab an unsigned long without the need to
properly allocate memory for a larger bitmap.

Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.

[ENHANCEMENT]
To allow direct bitmap operations without allocating new memory,
introduce two different ways to access the subpage bitmaps:

- Return an unsigned long value
  This only happens if blocks_per_folio <= BITS_PER_LONG.

  We read out the sub-bitmap into an unsigned long, and return the
  value.
  This is the old existing method.

  This involves the get_bitmap_value_##name() helper functions.
  This time the helpers are defined as inline functions instead of
  macros, to provide better type checking.

- Return a pointer where the sub-bitmap starts
  This only happens if blocks_per_folio >= BITS_PER_LONG.

  This is the new method for sub-bitmaps larger than BITS_PER_LONG.
  Since the sizes of sub-bitmaps are all aligned to BITS_PER_LONG, we
  can directly access the start byte of the sub-bitmap.

  This involves get_bitmap_pointer_##name() helper functions.

Then change the existing sub-bitmaps users to use the new helpers:

- Bitmap dumping
  Switch between get_bitmap_value_##name() and
  get_bitmap_pointer_##name() depending on the sub-bitmap size.

- btrfs_get_subpage_dirty_bitmap()
  Rename it to btrfs_get_subpage_dirty_bitmap_value() to follow the new
  value/pointer naming.
  Since we do not support huge folios at this point, there is no pointer
  version of it.

  Furthermore, add support for the bs == ps case to
  btrfs_get_subpage_dirty_bitmap_value(), so that the caller no longer
  needs to check whether the folio needs subpage handling.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c |   7 +--
 fs/btrfs/subpage.c   | 136 ++++++++++++++++++++++++++++++-------------
 fs/btrfs/subpage.h   |   5 +-
 3 files changed, 98 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8800faa8b4be..3802e82430f5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1457,12 +1457,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	int ret = 0;
 
 	/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
-	if (btrfs_is_subpage(fs_info, folio)) {
-		ASSERT(blocks_per_folio > 1);
-		btrfs_get_subpage_dirty_bitmap(fs_info, folio, &bio_ctrl->submit_bitmap);
-	} else {
-		bio_ctrl->submit_bitmap = 1;
-	}
+	bio_ctrl->submit_bitmap = btrfs_get_subpage_dirty_bitmap_value(fs_info, folio);
 
 	for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
 			      blocks_per_folio) {
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 10220453b6fa..3e04ec6b3f52 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -601,25 +601,56 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, folio_start_writeback, folio_end_writeback,
 IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered,
 			 folio_test_ordered);
 
-#define GET_SUBPAGE_BITMAP(fs_info, folio, name, dst)			\
-{									\
-	const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
-	const struct btrfs_folio_state *__bfs = folio_get_private(folio); \
-									\
-	ASSERT(__bpf <= BITS_PER_LONG);					\
-	*dst = bitmap_read(__bfs->bitmaps,				\
-			   __bpf * btrfs_bitmap_nr_##name, __bpf);	\
+#define DEFINE_GET_SUBPAGE_BITMAP(name)						\
+static inline unsigned long get_bitmap_value_##name(				\
+					const struct btrfs_fs_info *fs_info,	\
+					struct folio *folio)			\
+{										\
+	const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio);	\
+	const struct btrfs_folio_state *__bfs = folio_get_private(folio);	\
+	unsigned long value;							\
+										\
+	ASSERT(__bpf <= BITS_PER_LONG);						\
+	value = bitmap_read(__bfs->bitmaps, __bpf * btrfs_bitmap_nr_##name,	\
+			     __bpf);						\
+	return value;								\
+}										\
+static inline unsigned long *get_bitmap_pointer_##name(				\
+					const struct btrfs_fs_info *fs_info,	\
+					struct folio *folio)			\
+{										\
+	const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio);	\
+	struct btrfs_folio_state *__bfs = folio_get_private(folio);		\
+	unsigned long *pointer;							\
+										\
+	ASSERT(__bpf >= BITS_PER_LONG);						\
+	ASSERT(IS_ALIGNED(__bpf, BITS_PER_LONG));				\
+	pointer = __bfs->bitmaps + (BIT_WORD(__bpf) * btrfs_bitmap_nr_##name);	\
+	return pointer;								\
 }
 
-#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len)		\
-{									\
-	unsigned long bitmap;						\
-	const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
-									\
-	GET_SUBPAGE_BITMAP(fs_info, folio, name, &bitmap);		\
-	btrfs_warn(fs_info,						\
-	"dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \
-		   start, len, folio_pos(folio), __bpf, &bitmap);	\
+DEFINE_GET_SUBPAGE_BITMAP(uptodate);
+DEFINE_GET_SUBPAGE_BITMAP(dirty);
+DEFINE_GET_SUBPAGE_BITMAP(writeback);
+DEFINE_GET_SUBPAGE_BITMAP(ordered);
+DEFINE_GET_SUBPAGE_BITMAP(locked);
+
+#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len)			\
+{										\
+	const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio);	\
+										\
+	if (__bpf <= BITS_PER_LONG) {						\
+		unsigned long bitmap = get_bitmap_value_##name(fs_info, folio);	\
+										\
+		btrfs_warn(fs_info,						\
+	"dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl",	\
+		   start, len, folio_pos(folio), __bpf, &bitmap);		\
+	} else {								\
+		btrfs_warn(fs_info,						\
+	"dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl",	\
+		   start, len, folio_pos(folio), __bpf,				\
+		   get_bitmap_pointer_##name(fs_info, folio));			\
+	}									\
 }
 
 /*
@@ -717,48 +748,71 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_folio_state *bfs;
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
-	unsigned long uptodate_bitmap;
-	unsigned long dirty_bitmap;
-	unsigned long writeback_bitmap;
-	unsigned long ordered_bitmap;
-	unsigned long locked_bitmap;
 	unsigned long flags;
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
 	ASSERT(blocks_per_folio > 1);
 	bfs = folio_get_private(folio);
 
-	spin_lock_irqsave(&bfs->lock, flags);
-	GET_SUBPAGE_BITMAP(fs_info, folio, uptodate, &uptodate_bitmap);
-	GET_SUBPAGE_BITMAP(fs_info, folio, dirty, &dirty_bitmap);
-	GET_SUBPAGE_BITMAP(fs_info, folio, writeback, &writeback_bitmap);
-	GET_SUBPAGE_BITMAP(fs_info, folio, ordered, &ordered_bitmap);
-	GET_SUBPAGE_BITMAP(fs_info, folio, locked, &locked_bitmap);
-	spin_unlock_irqrestore(&bfs->lock, flags);
-
 	dump_page(folio_page(folio, 0), "btrfs folio state dump");
+
+	if (blocks_per_folio <= BITS_PER_LONG) {
+		unsigned long uptodate;
+		unsigned long dirty;
+		unsigned long writeback;
+		unsigned long ordered;
+		unsigned long locked;
+
+		spin_lock_irqsave(&bfs->lock, flags);
+		uptodate = get_bitmap_value_uptodate(fs_info, folio);
+		dirty = get_bitmap_value_dirty(fs_info, folio);
+		writeback = get_bitmap_value_writeback(fs_info, folio);
+		ordered = get_bitmap_value_ordered(fs_info, folio);
+		locked = get_bitmap_value_locked(fs_info, folio);
+
+		spin_unlock_irqrestore(&bfs->lock, flags);
+
+		btrfs_warn(fs_info,
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+			    start, len, folio_pos(folio),
+			    blocks_per_folio, &uptodate,
+			    blocks_per_folio, &dirty,
+			    blocks_per_folio, &writeback,
+			    blocks_per_folio, &ordered,
+			    blocks_per_folio, &locked);
+		return;
+	}
+
+	spin_lock_irqsave(&bfs->lock, flags);
 	btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl locked=%*pbl writeback=%*pbl ordered=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
 		    start, len, folio_pos(folio),
-		    blocks_per_folio, &uptodate_bitmap,
-		    blocks_per_folio, &dirty_bitmap,
-		    blocks_per_folio, &locked_bitmap,
-		    blocks_per_folio, &writeback_bitmap,
-		    blocks_per_folio, &ordered_bitmap);
+		    blocks_per_folio, get_bitmap_pointer_uptodate(fs_info, folio),
+		    blocks_per_folio, get_bitmap_pointer_dirty(fs_info, folio),
+		    blocks_per_folio, get_bitmap_pointer_writeback(fs_info, folio),
+		    blocks_per_folio, get_bitmap_pointer_ordered(fs_info, folio),
+		    blocks_per_folio, get_bitmap_pointer_locked(fs_info, folio));
+	spin_unlock_irqrestore(&bfs->lock, flags);
 }
 
-void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
-				    struct folio *folio,
-				    unsigned long *ret_bitmap)
+unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
+						   struct folio *folio)
 {
 	struct btrfs_folio_state *bfs;
+	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
 	unsigned long flags;
+	unsigned long value;
+
+	if (blocks_per_folio == 1)
+		return 1;
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
-	ASSERT(btrfs_blocks_per_folio(fs_info, folio) > 1);
+	ASSERT(blocks_per_folio > 1);
+	ASSERT(blocks_per_folio <= BITS_PER_LONG);
 	bfs = folio_get_private(folio);
 
 	spin_lock_irqsave(&bfs->lock, flags);
-	GET_SUBPAGE_BITMAP(fs_info, folio, dirty, ret_bitmap);
+	value = get_bitmap_value_dirty(fs_info, folio);
 	spin_unlock_irqrestore(&bfs->lock, flags);
+	return value;
 }
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index fdea0b605bfc..9e92877e7251 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -200,9 +200,8 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
 void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 				  struct folio *folio, u64 start, u32 len);
 bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb);
-void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
-				    struct folio *folio,
-				    unsigned long *ret_bitmap);
+unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
+						   struct folio *folio);
 void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
 				      struct folio *folio, u64 start, u32 len);
 
-- 
2.53.0



* [PATCH 3/4] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
  2026-04-29  5:03 [PATCH 0/4] btrfs: experimental support for huge data folios Qu Wenruo
  2026-04-29  5:03 ` [PATCH 1/4] btrfs: update the out-of-date comments on subpage Qu Wenruo
  2026-04-29  5:03 ` [PATCH 2/4] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps Qu Wenruo
@ 2026-04-29  5:03 ` Qu Wenruo
  2026-04-29  5:03 ` [PATCH 4/4] btrfs: introduce support for huge folios Qu Wenruo
  3 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2026-04-29  5:03 UTC (permalink / raw)
  To: linux-btrfs

[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.

One call site that relies on this limit is btrfs_bio_ctrl::submit_bitmap,
where it is very simple and straightforward to just grab an unsigned
long value and assign it to submit_bitmap.

Unfortunately that limit prevents us from supporting huge folios.
For 4K page size and block size, a huge folio (order 9) means 512 blocks
inside a 2M folio.

[ENHANCEMENT]
Instead of using a plain unsigned long value, change
btrfs_bio_ctrl::submit_bitmap into an unsigned long pointer.

For cases where a single unsigned long can hold the whole bitmap,
introduce @submit_bitmap_value, and simply point submit_bitmap at that
unsigned long.

Then update all direct users of bio_ctrl->submit_bitmap to use the
pointer version.

There are several call sites that get extra changes:

- @range_bitmap inside extent_writepage_io()
  This was only used to truncate the bitmap.
  Since we do not want to allocate new memory just for such temporary
  usage, replace the original bitmap_set() and bitmap_and() with
  bitmap_clear() calls for the ranges outside the folio.

- Getting the dirty subpage bitmap inside writepage_delalloc()
  Since we are passing an unsigned long pointer now, we need to handle
  three different cases (bs == ps, blocks_per_folio <= BITS_PER_LONG,
  and blocks_per_folio > BITS_PER_LONG).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 82 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/subpage.c   | 29 +++++++++++-----
 fs/btrfs/subpage.h   |  7 ++--
 3 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3802e82430f5..71593d19c0a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -130,7 +130,13 @@ struct btrfs_bio_ctrl {
 	 * extent_writepage_io().
 	 * This is to avoid touching ranges covered by compression/inline.
 	 */
-	unsigned long submit_bitmap;
+	unsigned long *submit_bitmap;
+	/*
+	 * When blocks_per_folio <= BITS_PER_LONG, we can use the inline
+	 * one without allocating memory.
+	 */
+	unsigned long submit_bitmap_value;
+
 	struct readahead_control *ractl;
 
 	/*
@@ -1457,9 +1463,9 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	int ret = 0;
 
 	/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
-	bio_ctrl->submit_bitmap = btrfs_get_subpage_dirty_bitmap_value(fs_info, folio);
+	btrfs_copy_subpage_dirty_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
 
-	for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+	for_each_set_bitrange(start_bit, end_bit, bio_ctrl->submit_bitmap,
 			      blocks_per_folio) {
 		u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
 		u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
@@ -1535,7 +1541,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 					     btrfs_ino(inode),
 					     folio_pos(folio),
 					     blocks_per_folio,
-					     &bio_ctrl->submit_bitmap,
+					     bio_ctrl->submit_bitmap,
 					     found_start, found_len, ret);
 		} else {
 			/*
@@ -1560,7 +1566,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 						 fs_info->sectorsize_bits;
 			unsigned int end_bit = (min(page_end + 1, found_start + found_len) -
 						page_start) >> fs_info->sectorsize_bits;
-			bitmap_clear(&bio_ctrl->submit_bitmap, start_bit, end_bit - start_bit);
+			bitmap_clear(bio_ctrl->submit_bitmap, start_bit, end_bit - start_bit);
 		}
 		/*
 		 * Above btrfs_run_delalloc_range() may have unlocked the folio,
@@ -1581,7 +1587,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 				fs_info->sectorsize_bits,
 				blocks_per_folio);
 
-		for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+		for_each_set_bitrange(start_bit, end_bit, bio_ctrl->submit_bitmap,
 				      bitmap_size) {
 			u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
 			u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
@@ -1607,7 +1613,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	 * If all ranges are submitted asynchronously, we just need to account
 	 * for them here.
 	 */
-	if (bitmap_empty(&bio_ctrl->submit_bitmap, blocks_per_folio)) {
+	if (bitmap_empty(bio_ctrl->submit_bitmap, blocks_per_folio)) {
 		wbc->nr_to_write -= delalloc_to_write;
 		return 1;
 	}
@@ -1728,7 +1734,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 						  loff_t i_size)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	unsigned long range_bitmap = 0;
 	bool submitted_io = false;
 	int found_error = 0;
 	const u64 end = start + len;
@@ -1756,14 +1761,18 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 		return -EUCLEAN;
 	}
 
-	bitmap_set(&range_bitmap, (start - folio_pos(folio)) >> fs_info->sectorsize_bits,
-		   len >> fs_info->sectorsize_bits);
-	bitmap_and(&bio_ctrl->submit_bitmap, &bio_ctrl->submit_bitmap, &range_bitmap,
-		   blocks_per_folio);
+	/* Truncate the submit bitmap to the current range. */
+	if (start > folio_start)
+		bitmap_clear(bio_ctrl->submit_bitmap, 0,
+			     (start - folio_start) >> fs_info->sectorsize_bits);
+	if (start + len < folio_end)
+		bitmap_clear(bio_ctrl->submit_bitmap,
+			     (end - folio_start) >> fs_info->sectorsize_bits,
+			     (folio_end - end) >> fs_info->sectorsize_bits);
 
 	bio_ctrl->end_io_func = end_bbio_data_write;
 
-	for_each_set_bit(bit, &bio_ctrl->submit_bitmap, blocks_per_folio) {
+	for_each_set_bit(bit, bio_ctrl->submit_bitmap, blocks_per_folio) {
 		cur = folio_pos(folio) + (bit << fs_info->sectorsize_bits);
 
 		if (cur >= i_size) {
@@ -1823,6 +1832,32 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 	return found_error;
 }
 
+static void bio_ctrl_init_submit_bitmap(struct btrfs_fs_info *fs_info,
+					struct folio *folio,
+					struct btrfs_bio_ctrl *bio_ctrl)
+{
+	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
+
+	/* Only supported for blocks per folio <= BITS_PER_LONG for now. */
+	ASSERT(blocks_per_folio <= BITS_PER_LONG);
+	bio_ctrl->submit_bitmap_value = 0;
+	bio_ctrl->submit_bitmap = &bio_ctrl->submit_bitmap_value;
+	/*
+	 * Default to unlock the whole folio.
+	 * The proper bitmap is not initialized until writepage_delalloc().
+	 */
+	bitmap_set(bio_ctrl->submit_bitmap, 0, blocks_per_folio);
+}
+
+static void bio_ctrl_release_submit_bitmap(struct btrfs_fs_info *fs_info,
+					   struct folio *folio,
+					   struct btrfs_bio_ctrl *bio_ctrl)
+{
+	ASSERT(btrfs_blocks_per_folio(fs_info, folio) <= BITS_PER_LONG);
+
+	bio_ctrl->submit_bitmap = NULL;
+}
+
 /*
  * the writepage semantics are similar to regular writepage.  extent
  * records are inserted to lock ranges in the tree, and as dirty areas
@@ -1857,12 +1892,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 	if (folio_contains(folio, end_index))
 		folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset);
 
-	/*
-	 * Default to unlock the whole folio.
-	 * The proper bitmap can only be initialized until writepage_delalloc().
-	 */
-	bio_ctrl->submit_bitmap = (unsigned long)-1;
-
+	bio_ctrl_init_submit_bitmap(fs_info, folio, bio_ctrl);
 	/*
 	 * If the page is dirty but without private set, it's marked dirty
 	 * without informing the fs.
@@ -1887,21 +1917,25 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 		goto done;
 
 	ret = writepage_delalloc(inode, folio, bio_ctrl);
-	if (ret == 1)
+	if (ret == 1) {
+		bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
 		return 0;
+	}
 	if (ret)
 		goto done;
 
 	ret = extent_writepage_io(inode, folio, folio_pos(folio),
 				  folio_size(folio), bio_ctrl, i_size);
-	if (ret == 1)
+	if (ret == 1) {
+		bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
 		return 0;
+	}
 	if (unlikely(ret < 0))
 		btrfs_err_rl(fs_info,
 "failed to submit blocks, root=%lld inode=%llu folio=%llu submit_bitmap=%*pbl: %d",
 			     btrfs_root_id(inode->root), btrfs_ino(inode),
 			     folio_pos(folio), blocks_per_folio,
-			     &bio_ctrl->submit_bitmap, ret);
+			     bio_ctrl->submit_bitmap, ret);
 
 	bio_ctrl->wbc->nr_to_write--;
 
@@ -1913,6 +1947,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 	 * submitted ranges inside the folio.
 	 */
 	btrfs_folio_end_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
+	bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
 	ASSERT(ret <= 0);
 	return ret;
 }
@@ -2648,7 +2683,7 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 		 * Set the submission bitmap to submit all sectors.
 		 * extent_writepage_io() will do the truncation correctly.
 		 */
-		bio_ctrl.submit_bitmap = (unsigned long)-1;
+		bio_ctrl_init_submit_bitmap(fs_info, folio, &bio_ctrl);
 		ret = extent_writepage_io(BTRFS_I(inode), folio, cur, cur_len,
 					  &bio_ctrl, i_size);
 		if (ret == 1)
@@ -2660,6 +2695,7 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 		if (ret < 0)
 			found_error = true;
 next_page:
+		bio_ctrl_release_submit_bitmap(fs_info, folio, &bio_ctrl);
 		folio_put(folio);
 		cur = cur_end + 1;
 	}
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 3e04ec6b3f52..0bad087c445c 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -276,7 +276,7 @@ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
 }
 
 void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
-				 struct folio *folio, unsigned long bitmap)
+				 struct folio *folio, unsigned long *bitmap)
 {
 	struct btrfs_folio_state *bfs = folio_get_private(folio);
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
@@ -298,7 +298,7 @@ void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
 	}
 
 	spin_lock_irqsave(&bfs->lock, flags);
-	for_each_set_bit(bit, &bitmap, blocks_per_folio) {
+	for_each_set_bit(bit, bitmap, blocks_per_folio) {
 		if (test_and_clear_bit(bit + start_bit, bfs->bitmaps))
 			cleared++;
 	}
@@ -795,24 +795,35 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&bfs->lock, flags);
 }
 
-unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
-						   struct folio *folio)
+void btrfs_copy_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
+				     struct folio *folio,
+				     unsigned long *dst)
 {
 	struct btrfs_folio_state *bfs;
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
 	unsigned long flags;
 	unsigned long value;
 
-	if (blocks_per_folio == 1)
-		return 1;
+	if (blocks_per_folio == 1) {
+		value = 1;
+		bitmap_copy(dst, &value, 1);
+		return;
+	}
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
 	ASSERT(blocks_per_folio > 1);
-	ASSERT(blocks_per_folio <= BITS_PER_LONG);
 	bfs = folio_get_private(folio);
 
+	if (blocks_per_folio <= BITS_PER_LONG) {
+		spin_lock_irqsave(&bfs->lock, flags);
+		value = bitmap_read(bfs->bitmaps, btrfs_bitmap_nr_dirty * blocks_per_folio,
+				    blocks_per_folio);
+		spin_unlock_irqrestore(&bfs->lock, flags);
+		bitmap_copy(dst, &value, blocks_per_folio);
+		return;
+	}
 	spin_lock_irqsave(&bfs->lock, flags);
-	value = get_bitmap_value_dirty(fs_info, folio);
+	bitmap_copy(dst, get_bitmap_pointer_dirty(fs_info, folio),
+		    blocks_per_folio);
 	spin_unlock_irqrestore(&bfs->lock, flags);
-	return value;
 }
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 9e92877e7251..b45694eecb41 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -131,7 +131,7 @@ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
 void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info,
 			  struct folio *folio, u64 start, u32 len);
 void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
-				 struct folio *folio, unsigned long bitmap);
+				 struct folio *folio, unsigned long *bitmap);
 /*
  * Template for subpage related operations.
  *
@@ -200,8 +200,9 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
 void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 				  struct folio *folio, u64 start, u32 len);
 bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb);
-unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
-						   struct folio *folio);
+void btrfs_copy_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
+				     struct folio *folio,
+				     unsigned long *dst);
 void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
 				      struct folio *folio, u64 start, u32 len);
 
-- 
2.53.0



* [PATCH 4/4] btrfs: introduce support for huge folios
  2026-04-29  5:03 [PATCH 0/4] btrfs: experimental support for huge data folios Qu Wenruo
                   ` (2 preceding siblings ...)
  2026-04-29  5:03 ` [PATCH 3/4] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps Qu Wenruo
@ 2026-04-29  5:03 ` Qu Wenruo
  3 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2026-04-29  5:03 UTC (permalink / raw)
  To: linux-btrfs

With all the previous preparations done, it's finally time to enable
huge folio support.

- The max folio size
  Here we define BTRFS_MAX_FOLIO_SIZE, which is fixed to 2MiB.

  This ensures the folio is large enough for btrfs without being
  excessively large.
  This limit applies to all systems, regardless of page size.

  Then we also define BTRFS_MAX_BLOCKS_PER_FOLIO, which depends on
  CONFIG_BTRFS_EXPERIMENTAL.

  If it's an experimental build, BTRFS_MAX_BLOCKS_PER_FOLIO is 512,
  otherwise it's BITS_PER_LONG.

  The filemap max order will be calculated using both
  BTRFS_MAX_FOLIO_SIZE and BTRFS_MAX_BLOCKS_PER_FOLIO.

  E.g. for 64K page size with 64K fs block size, the limit will be
  BTRFS_MAX_FOLIO_SIZE (2M), which limits the filemap max order to 5.
  This will be lower than the old order (6), but folios larger than 2M
  are rarely any better for IO performance. Meanwhile excessively large
  folios can cause other problems like stalling the IO pipeline for too
  long.

  For 4K page size and 4K fs block size, the limit will be increased to
  2M from the old 256K.
  This new size is constrained by both BTRFS_MAX_FOLIO_SIZE (2M) and
  BTRFS_MAX_BLOCKS_PER_FOLIO (512 * 4K), allowing x86_64 to reach huge
  folio support, and the filemap max order will be 9.
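
  The order math above can be sketched as a standalone C program. This
  is only an illustrative sketch: ilog2_ul() and block_max_order() are
  stand-ins for the kernel's ilog2() and the calc_block_max_order()
  helper added by this patch, and the round_up step is folded away
  because both inputs here are already page-aligned.

  ```c
  #include <stdio.h>

  /* Mirrors BTRFS_MAX_FOLIO_SIZE (2M) and the experimental
   * BTRFS_MAX_BLOCKS_PER_FOLIO (512) from fs.h. */
  #define MAX_FOLIO_SIZE        (2UL * 1024 * 1024)
  #define MAX_BLOCKS_PER_FOLIO  512UL

  /* Stand-in for the kernel's ilog2() on a power of two. */
  static unsigned int ilog2_ul(unsigned long v)
  {
          unsigned int r = 0;

          while (v >>= 1)
                  r++;
          return r;
  }

  /* Stand-in for calc_block_max_order(): the filemap max order is
   * taken from the smaller of the two limits, in units of pages. */
  static unsigned int block_max_order(unsigned long block_size,
                                      unsigned long page_size)
  {
          unsigned long max_size = MAX_BLOCKS_PER_FOLIO * block_size;

          if (max_size > MAX_FOLIO_SIZE)
                  max_size = MAX_FOLIO_SIZE;
          return ilog2_ul(max_size / page_size);
  }

  int main(void)
  {
          /* 4K page, 4K block: both limits meet at 2M -> order 9. */
          printf("4K/4K   -> order %u\n", block_max_order(4096, 4096));
          /* 64K page, 64K block: capped by the 2M folio size -> order 5. */
          printf("64K/64K -> order %u\n", block_max_order(65536, 65536));
          return 0;
  }
  ```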

- btrfs_bio_ctrl::submit_bitmap
  This will be enlarged to contain BTRFS_MAX_BLOCKS_PER_FOLIO bits, and
  this will be on-stack memory.
  This will increase on-stack memory usage by 56 bytes compared to the
  baseline (before any patch in the series).

- Local @delalloc_bitmap inside writepage_delalloc()
  Unfortunately we cannot afford to handle an allocation failure here,
  so again we use on-stack memory, which increases on-stack usage by
  another 56 bytes.

So unfortunately, during the delalloc window the writeback path will
use +112 bytes of on-stack memory, while in all other cases it will
use +56 bytes.

The +56 bytes (btrfs_bio_ctrl::submit_bitmap) can be removed
after we have reworked the compression submission, so the current
on-stack submit_bitmap is mostly a workaround until then.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 11 ++++++++++-
 fs/btrfs/extent_io.c | 42 ++++++++++++------------------------------
 fs/btrfs/fs.h        | 16 ++++++++++++++++
 fs/btrfs/subpage.c   |  7 ++++---
 4 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 308955f0592a..48ddbeb18e3c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3272,6 +3272,15 @@ static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
 	return false;
 }
 
+static u32 calc_block_max_order(u32 sectorsize_bits)
+{
+	u32 max_size;
+
+	max_size = min(BTRFS_MAX_BLOCKS_PER_FOLIO << sectorsize_bits,
+		       BTRFS_MAX_FOLIO_SIZE);
+	return ilog2(round_up(max_size, PAGE_SIZE) >> PAGE_SHIFT);
+}
+
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
 {
 	u32 sectorsize;
@@ -3394,7 +3403,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	fs_info->sectorsize = sectorsize;
 	fs_info->sectorsize_bits = ilog2(sectorsize);
 	fs_info->block_min_order = ilog2(round_up(sectorsize, PAGE_SIZE) >> PAGE_SHIFT);
-	fs_info->block_max_order = ilog2((BITS_PER_LONG << fs_info->sectorsize_bits) >> PAGE_SHIFT);
+	fs_info->block_max_order = calc_block_max_order(fs_info->sectorsize_bits);
 	fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
 	fs_info->stripesize = stripesize;
 	fs_info->fs_devices->fs_info = fs_info;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 71593d19c0a3..31a65c662b65 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -130,12 +130,7 @@ struct btrfs_bio_ctrl {
 	 * extent_writepage_io().
 	 * This is to avoid touching ranges covered by compression/inline.
 	 */
-	unsigned long *submit_bitmap;
-	/*
-	 * When blocks_per_folio <= BITS_PER_LONG, we can use the inline
-	 * one without allocating memory.
-	 */
-	unsigned long submit_bitmap_value;
+	unsigned long submit_bitmap[BITS_TO_LONGS(BTRFS_MAX_BLOCKS_PER_FOLIO)];
 
 	struct readahead_control *ractl;
 
@@ -1438,7 +1433,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 	const u64 page_start = folio_pos(folio);
 	const u64 page_end = page_start + folio_size(folio) - 1;
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
-	unsigned long delalloc_bitmap = 0;
+	unsigned long delalloc_bitmap[BITS_TO_LONGS(BTRFS_MAX_BLOCKS_PER_FOLIO)] = { 0 };
 	/*
 	 * Save the last found delalloc end. As the delalloc end can go beyond
 	 * page boundary, thus we cannot rely on subpage bitmap to locate the
@@ -1481,7 +1476,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			delalloc_start = delalloc_end + 1;
 			continue;
 		}
-		set_delalloc_bitmap(folio, &delalloc_bitmap, delalloc_start,
+		set_delalloc_bitmap(folio, delalloc_bitmap, delalloc_start,
 				    min(delalloc_end, page_end) + 1 - delalloc_start);
 		last_delalloc_end = delalloc_end;
 		delalloc_start = delalloc_end + 1;
@@ -1507,7 +1502,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 			found_len = last_delalloc_end + 1 - found_start;
 			found = true;
 		} else {
-			found = find_next_delalloc_bitmap(folio, &delalloc_bitmap,
+			found = find_next_delalloc_bitmap(folio, delalloc_bitmap,
 					delalloc_start, &found_start, &found_len);
 		}
 		if (!found)
@@ -1838,26 +1833,19 @@ static void bio_ctrl_init_submit_bitmap(struct btrfs_fs_info *fs_info,
 {
 	const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
 
-	/* Only supported for blocks per folio <= BITS_PER_LONG for now. */
-	ASSERT(blocks_per_folio <= BITS_PER_LONG);
-	bio_ctrl->submit_bitmap_value = 0;
-	bio_ctrl->submit_bitmap = &bio_ctrl->submit_bitmap_value;
+	ASSERT(blocks_per_folio <= BTRFS_MAX_BLOCKS_PER_FOLIO);
+
 	/*
 	 * Default to unlock the whole folio.
 	 * The proper bitmap is not initialized until writepage_delalloc().
+	 *
+	 * We're safe just to set the bitmap range [0, blocks_per_folio), as
+	 * all later usage of the bitmap will follow the same range limit.
+	 * Any bits beyond blocks_per_folio will be ignored.
 	 */
 	bitmap_set(bio_ctrl->submit_bitmap, 0, blocks_per_folio);
 }
 
-static void bio_ctrl_release_submit_bitmap(struct btrfs_fs_info *fs_info,
-					   struct folio *folio,
-					   struct btrfs_bio_ctrl *bio_ctrl)
-{
-	ASSERT(btrfs_blocks_per_folio(fs_info, folio) <= BITS_PER_LONG);
-
-	bio_ctrl->submit_bitmap = NULL;
-}
-
 /*
  * the writepage semantics are similar to regular writepage.  extent
  * records are inserted to lock ranges in the tree, and as dirty areas
@@ -1917,19 +1905,15 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 		goto done;
 
 	ret = writepage_delalloc(inode, folio, bio_ctrl);
-	if (ret == 1) {
-		bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
+	if (ret == 1)
 		return 0;
-	}
 	if (ret)
 		goto done;
 
 	ret = extent_writepage_io(inode, folio, folio_pos(folio),
 				  folio_size(folio), bio_ctrl, i_size);
-	if (ret == 1) {
-		bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
+	if (ret == 1)
 		return 0;
-	}
 	if (unlikely(ret < 0))
 		btrfs_err_rl(fs_info,
 "failed to submit blocks, root=%lld inode=%llu folio=%llu submit_bitmap=%*pbl: %d",
@@ -1947,7 +1931,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 	 * submitted ranges inside the folio.
 	 */
 	btrfs_folio_end_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
-	bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
 	ASSERT(ret <= 0);
 	return ret;
 }
@@ -2695,7 +2678,6 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
 		if (ret < 0)
 			found_error = true;
 next_page:
-		bio_ctrl_release_submit_bitmap(fs_info, folio, &bio_ctrl);
 		folio_put(folio);
 		cur = cur_end + 1;
 	}
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index f821aa762613..9997bbc1d1e5 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -54,6 +54,22 @@ struct btrfs_space_info;
 #define BTRFS_MIN_BLOCKSIZE	(SZ_4K)
 #define BTRFS_MAX_BLOCKSIZE	(SZ_64K)
 
+/* The max folio size btrfs supports. */
+#define BTRFS_MAX_FOLIO_SIZE	(SZ_2M)
+static_assert(BTRFS_MAX_FOLIO_SIZE > PAGE_SIZE);
+
+/*
+ * The max number of blocks a huge folio can support.
+ *
+ * Depending on the fs block size, the real max blocks per folio
+ * may also be limited by the above BTRFS_MAX_FOLIO_SIZE.
+ */
+#ifdef CONFIG_BTRFS_EXPERIMENTAL
+#define BTRFS_MAX_BLOCKS_PER_FOLIO	(512)
+#else
+#define BTRFS_MAX_BLOCKS_PER_FOLIO	(BITS_PER_LONG)
+#endif
+
 #define BTRFS_MAX_EXTENT_SIZE SZ_128M
 
 /*
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 0bad087c445c..ea202698fa10 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -13,9 +13,10 @@
  * - Metadata must be fully aligned to node size
  *   So when nodesize <= page size, the metadata can never cross folio boundaries.
  *
- * - Only support blocks per folio <= BITS_PER_LONG
- *   This is to make bitmap copying much easier, a single unsigned long can handle
- *   one bitmap.
+ * - Only support blocks per folio <= min(BTRFS_MAX_FOLIO_SIZE / fs block size,
+ *					  BTRFS_MAX_BLOCKS_PER_FOLIO)
+ *   This is to ensure we can afford an on-stack bitmap, without the need to allocate
+ *   bitmap memory at runtime.
  *
  * Implementation:
  *
-- 
2.53.0


