* [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size
@ 2026-04-24 8:51 Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 1/5] btrfs: update the out-of-date comments on subpage Qu Wenruo
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
[REASON FOR RFC]
- Unstable baseline for-next branch
The current for-next branch has a known regression where
btrfs_async_reclaim_metadata_space() kworker can fall into a dead loop
and fail several ENOSPC test cases.
Although the regression is pinned down, I'm still waiting for a good
baseline to rebase the series.
- Extra on-stack memory usage
Currently we have two bitmaps that use extra on-stack memory.
Each of them uses 64 bytes of on-stack memory, so in total that is 128
bytes just for the bitmaps.
Not sure if it's a concern.
- Uncertainty related to v7.2 large folio support
I'm planning to move large data folio support out of experimental.
So when that support is no longer experimental, this huge folio support
would need to start as experimental.
Depending on which patch(set) is merged first, there needs to be some
modification to ensure end users won't suddenly get not only large but
also huge folios.
Currently btrfs only supports folios as large as BITS_PER_LONG * block
size.
This is an artificial limit introduced to make bitmap operations easier.
Btrfs has two extra bitmaps that live outside the btrfs_folio_state
structure: btrfs_bio_ctrl->submit_bitmap and @delalloc_bitmap inside
writepage_delalloc().
Limiting the bitmap size to BITS_PER_LONG makes it very easy to handle
those two bitmaps; we can just use a local unsigned long, with no need
for any memory allocation.
On the other hand, those two external bitmaps are the only thing
limiting huge folio support.
The 1st patch will update the comments related to subpage implementation
first.
The 2nd patch will handle the subpage internal operations, mostly to
handle bitmap dumping.
The 3rd patch will prepare btrfs_bio_ctrl::submit_bitmap to be a proper
pointer for the incoming huge folios support.
The 4th patch will remove the 2K block size support, as it would double
the bitmap size for no obvious benefit, since it no longer even improves
the coverage on 4K page sized systems.
The final patch will enable huge folio support, by using an on-stack
bitmap that can contain 512 bits.
That will ensure a 2MiB max folio size, which is order 9 on 4K page
sized systems.
Qu Wenruo (5):
btrfs: update the out-of-date comments on subpage
btrfs: prepare subpage operations to support >= BITS_PER_LONG
sub-bitmaps
btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
btrfs: remove 2K block size support
btrfs: introduce support for huge folios
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/extent_io.c | 73 +++++++++------
fs/btrfs/fs.h | 16 ++--
fs/btrfs/subpage.c | 209 +++++++++++++++++++++++++------------------
fs/btrfs/subpage.h | 8 +-
5 files changed, 176 insertions(+), 132 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH RFC 1/5] btrfs: update the out-of-date comments on subpage
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
@ 2026-04-24 8:51 ` Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 2/5] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps Qu Wenruo
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
The comments at the beginning of subpage.c are out-of-date; a lot of the
listed limitations have already been resolved.
Update them to reflect the latest status.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/subpage.c | 39 +++++----------------------------------
1 file changed, 5 insertions(+), 34 deletions(-)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 8a09f34ea31e..becaa1dc0001 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -10,41 +10,12 @@
*
* Limitations:
*
- * - Only support 64K page size for now
- * This is to make metadata handling easier, as 64K page would ensure
- * all nodesize would fit inside one page, thus we don't need to handle
- * cases where a tree block crosses several pages.
+ * - Metadata must be fully aligned to node size
+ * So when nodesize <= page size, the metadata can never cross folio boundaries.
*
- * - Only metadata read-write for now
- * The data read-write part is in development.
- *
- * - Metadata can't cross 64K page boundary
- * btrfs-progs and kernel have done that for a while, thus only ancient
- * filesystems could have such problem. For such case, do a graceful
- * rejection.
- *
- * Special behavior:
- *
- * - Metadata
- * Metadata read is fully supported.
- * Meaning when reading one tree block will only trigger the read for the
- * needed range, other unrelated range in the same page will not be touched.
- *
- * Metadata write support is partial.
- * The writeback is still for the full page, but we will only submit
- * the dirty extent buffers in the page.
- *
- * This means, if we have a metadata page like this:
- *
- * Page offset
- * 0 16K 32K 48K 64K
- * |/////////| |///////////|
- * \- Tree block A \- Tree block B
- *
- * Even if we just want to writeback tree block A, we will also writeback
- * tree block B if it's also dirty.
- *
- * This may cause extra metadata writeback which results more COW.
+ * - Only support block per folio <= BITS_PER_LONG
+ * This is to make bitmap copying much easier, a single unsigned long can handle
+ * one bitmap.
*
* Implementation:
*
--
2.53.0
* [PATCH RFC 2/5] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 1/5] btrfs: update the out-of-date comments on subpage Qu Wenruo
@ 2026-04-24 8:51 ` Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 3/5] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps Qu Wenruo
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
That limit allows us to easily grab an unsigned long without the need to
properly allocate memory for a larger bitmap.
Unfortunately that limit prevents us from supporting huge folios, which
means order 9; with 4K page size and block size, that is 512 blocks
inside a 2M folio.
[ENHANCEMENT]
To allow direct bitmap operations without allocating new memory,
introduce two different ways to access the subpage bitmaps:
- Return an unsigned long value
This only happens if blocks_per_folio <= BITS_PER_LONG.
We read out the sub-bitmap into an unsigned long, and return the
value.
This is the existing method.
It involves the get_bitmap_value_##name() helper functions, which are
now defined as inline functions instead of macros to provide better
type checking.
- Return a pointer where the sub-bitmap starts
This only happens if blocks_per_folio >= BITS_PER_LONG.
This is the new method for sub-bitmaps larger than BITS_PER_LONG.
Since the sizes of sub-bitmaps are all aligned to BITS_PER_LONG, we
can directly access the start byte of the sub-bitmap.
It involves the get_bitmap_pointer_##name() helper functions, likewise
defined as inline functions instead of macros to provide better type
checking.
Then change the existing sub-bitmaps users to use the new helpers:
- Bitmap dumping
Switch between get_bitmap_value_##name() and
get_bitmap_pointer_##name() depending on the sub-bitmap size.
- btrfs_get_subpage_dirty_bitmap()
Rename it to btrfs_get_subpage_dirty_bitmap_value() to follow the new
value/pointer naming.
Since we do not support huge folios yet, there is no pointer version
for it yet.
Furthermore, add support for the bs == ps case to
btrfs_get_subpage_dirty_bitmap_value(), so that the caller no longer
needs to check whether the folio needs subpage handling.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 7 +--
fs/btrfs/subpage.c | 133 ++++++++++++++++++++++++++++++-------------
fs/btrfs/subpage.h | 5 +-
3 files changed, 96 insertions(+), 49 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6a6d98eec787..13b726831aa0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1447,12 +1447,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
int ret = 0;
/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
- if (btrfs_is_subpage(fs_info, folio)) {
- ASSERT(blocks_per_folio > 1);
- btrfs_get_subpage_dirty_bitmap(fs_info, folio, &bio_ctrl->submit_bitmap);
- } else {
- bio_ctrl->submit_bitmap = 1;
- }
+ bio_ctrl->submit_bitmap = btrfs_get_subpage_dirty_bitmap_value(fs_info, folio);
for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
blocks_per_folio) {
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index becaa1dc0001..2116a6b4a476 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -601,25 +601,56 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, folio_start_writeback, folio_end_writeback,
IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered,
folio_test_ordered);
-#define GET_SUBPAGE_BITMAP(fs_info, folio, name, dst) \
-{ \
- const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
- const struct btrfs_folio_state *__bfs = folio_get_private(folio); \
- \
- ASSERT(__bpf <= BITS_PER_LONG); \
- *dst = bitmap_read(__bfs->bitmaps, \
- __bpf * btrfs_bitmap_nr_##name, __bpf); \
+#define DEFINE_GET_SUBPAGE_BITMAP(name) \
+static inline unsigned long get_bitmap_value_##name( \
+ const struct btrfs_fs_info *fs_info, \
+ struct folio *folio) \
+{ \
+ const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
+ const struct btrfs_folio_state *__bfs = folio_get_private(folio); \
+ unsigned long value; \
+ \
+ ASSERT(__bpf <= BITS_PER_LONG); \
+ value = bitmap_read(__bfs->bitmaps, __bpf * btrfs_bitmap_nr_##name, \
+ __bpf); \
+ return value; \
+} \
+static inline unsigned long *get_bitmap_pointer_##name( \
+ const struct btrfs_fs_info *fs_info, \
+ struct folio *folio) \
+{ \
+ const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
+ struct btrfs_folio_state *__bfs = folio_get_private(folio); \
+ unsigned long *pointer; \
+ \
+ ASSERT(__bpf >= BITS_PER_LONG); \
+ ASSERT(IS_ALIGNED(__bpf, BITS_PER_LONG)); \
+ pointer = __bfs->bitmaps + (BIT_WORD(__bpf) * btrfs_bitmap_nr_##name); \
+ return pointer; \
}
-#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len) \
-{ \
- unsigned long bitmap; \
- const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
- \
- GET_SUBPAGE_BITMAP(fs_info, folio, name, &bitmap); \
- btrfs_warn(fs_info, \
- "dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \
- start, len, folio_pos(folio), __bpf, &bitmap); \
+DEFINE_GET_SUBPAGE_BITMAP(uptodate);
+DEFINE_GET_SUBPAGE_BITMAP(dirty);
+DEFINE_GET_SUBPAGE_BITMAP(writeback);
+DEFINE_GET_SUBPAGE_BITMAP(ordered);
+DEFINE_GET_SUBPAGE_BITMAP(locked);
+
+#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len) \
+{ \
+ const unsigned int __bpf = btrfs_blocks_per_folio(fs_info, folio); \
+ \
+ if (__bpf <= BITS_PER_LONG) { \
+ unsigned long bitmap = get_bitmap_value_##name(fs_info, folio); \
+ \
+ btrfs_warn(fs_info, \
+ "dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \
+ start, len, folio_pos(folio), __bpf, &bitmap); \
+ } else { \
+ btrfs_warn(fs_info, \
+ "dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \
+ start, len, folio_pos(folio), __bpf, \
+ get_bitmap_pointer_##name(fs_info, folio)); \
+ } \
}
/*
@@ -717,48 +748,70 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
{
struct btrfs_folio_state *bfs;
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
- unsigned long uptodate_bitmap;
- unsigned long dirty_bitmap;
- unsigned long writeback_bitmap;
- unsigned long ordered_bitmap;
- unsigned long locked_bitmap;
unsigned long flags;
ASSERT(folio_test_private(folio) && folio_get_private(folio));
ASSERT(blocks_per_folio > 1);
bfs = folio_get_private(folio);
- spin_lock_irqsave(&bfs->lock, flags);
- GET_SUBPAGE_BITMAP(fs_info, folio, uptodate, &uptodate_bitmap);
- GET_SUBPAGE_BITMAP(fs_info, folio, dirty, &dirty_bitmap);
- GET_SUBPAGE_BITMAP(fs_info, folio, writeback, &writeback_bitmap);
- GET_SUBPAGE_BITMAP(fs_info, folio, ordered, &ordered_bitmap);
- GET_SUBPAGE_BITMAP(fs_info, folio, locked, &locked_bitmap);
- spin_unlock_irqrestore(&bfs->lock, flags);
+ if (blocks_per_folio <= BITS_PER_LONG) {
+ unsigned long uptodate;
+ unsigned long dirty;
+ unsigned long writeback;
+ unsigned long ordered;
+ unsigned long locked;
+
+ spin_lock_irqsave(&bfs->lock, flags);
+ uptodate = get_bitmap_value_uptodate(fs_info, folio);
+ dirty = get_bitmap_value_dirty(fs_info, folio);
+ writeback = get_bitmap_value_writeback(fs_info, folio);
+ ordered = get_bitmap_value_ordered(fs_info, folio);
+ locked = get_bitmap_value_locked(fs_info, folio);
+
+ spin_unlock_irqrestore(&bfs->lock, flags);
+
+ btrfs_warn(fs_info,
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+ start, len, folio_pos(folio),
+ blocks_per_folio, &uptodate,
+ blocks_per_folio, &dirty,
+ blocks_per_folio, &writeback,
+ blocks_per_folio, &ordered,
+ blocks_per_folio, &locked);
+ return;
+ }
dump_page(folio_page(folio, 0), "btrfs folio state dump");
+ spin_lock_irqsave(&bfs->lock, flags);
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl locked=%*pbl writeback=%*pbl ordered=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
start, len, folio_pos(folio),
- blocks_per_folio, &uptodate_bitmap,
- blocks_per_folio, &dirty_bitmap,
- blocks_per_folio, &locked_bitmap,
- blocks_per_folio, &writeback_bitmap,
- blocks_per_folio, &ordered_bitmap);
+ blocks_per_folio, get_bitmap_pointer_uptodate(fs_info, folio),
+ blocks_per_folio, get_bitmap_pointer_dirty(fs_info, folio),
+ blocks_per_folio, get_bitmap_pointer_writeback(fs_info, folio),
+ blocks_per_folio, get_bitmap_pointer_ordered(fs_info, folio),
+ blocks_per_folio, get_bitmap_pointer_locked(fs_info, folio));
+ spin_unlock_irqrestore(&bfs->lock, flags);
}
-void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
- struct folio *folio,
- unsigned long *ret_bitmap)
+unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
+ struct folio *folio)
{
struct btrfs_folio_state *bfs;
+ const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
unsigned long flags;
+ unsigned long value;
+
+ if (blocks_per_folio == 1)
+ return 1;
ASSERT(folio_test_private(folio) && folio_get_private(folio));
- ASSERT(btrfs_blocks_per_folio(fs_info, folio) > 1);
+ ASSERT(blocks_per_folio > 1);
+ ASSERT(blocks_per_folio <= BITS_PER_LONG);
bfs = folio_get_private(folio);
spin_lock_irqsave(&bfs->lock, flags);
- GET_SUBPAGE_BITMAP(fs_info, folio, dirty, ret_bitmap);
+ value = get_bitmap_value_dirty(fs_info, folio);
spin_unlock_irqrestore(&bfs->lock, flags);
+ return value;
}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index fdea0b605bfc..9e92877e7251 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -200,9 +200,8 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb);
-void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
- struct folio *folio,
- unsigned long *ret_bitmap);
+unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
+ struct folio *folio);
void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
--
2.53.0
* [PATCH RFC 3/5] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 1/5] btrfs: update the out-of-date comments on subpage Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 2/5] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps Qu Wenruo
@ 2026-04-24 8:51 ` Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 4/5] btrfs: remove 2K block size support Qu Wenruo
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
[CURRENT LIMIT]
Btrfs currently only supports sub-bitmaps (e.g. dirty bitmap) no larger
than BITS_PER_LONG.
One call site that relies on this limit is btrfs_bio_ctrl::submit_bitmap,
where it is very simple and straightforward to just grab an unsigned
long value and assign it to submit_bitmap.
Unfortunately that limit prevents us from supporting huge folios, which
means order 9; with 4K page size and block size, that is 512 blocks
inside a 2M folio.
[ENHANCEMENT]
Instead of using a fixed unsigned long value, change
btrfs_bio_ctrl::submit_bitmap to an unsigned long pointer.
And for cases where an unsigned long can hold the whole bitmap,
introduce @submit_bitmap_value, and simply point that pointer at that
unsigned long.
Then update all direct users of bio_ctrl->submit_bitmap to use the
pointer version.
There are several call sites that get extra changes:
- @range_bitmap inside extent_writepage_io()
It was only used to truncate the submit bitmap to the current range.
Since we do not want to allocate new memory just for such temporary
usage, change the original bitmap_set() and bitmap_and() into
bitmap_clear() calls that clear the unrelated bits from the submit
bitmap.
- Getting the dirty subpage bitmap inside writepage_delalloc()
Since we're passing an unsigned long pointer now, we need different
handling for the three cases (bs == ps, blocks_per_folio <=
BITS_PER_LONG, blocks_per_folio > BITS_PER_LONG).
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 82 +++++++++++++++++++++++++++++++-------------
fs/btrfs/subpage.c | 29 +++++++++++-----
fs/btrfs/subpage.h | 7 ++--
3 files changed, 83 insertions(+), 35 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 13b726831aa0..b8dfd8770365 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -130,7 +130,13 @@ struct btrfs_bio_ctrl {
* extent_writepage_io().
* This is to avoid touching ranges covered by compression/inline.
*/
- unsigned long submit_bitmap;
+ unsigned long *submit_bitmap;
+ /*
+ * When blocks_per_folio <= BITS_PER_LONG, we can use the inline
+ * one without allocating memory.
+ */
+ unsigned long submit_bitmap_value;
+
struct readahead_control *ractl;
/*
@@ -1447,9 +1453,9 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
int ret = 0;
/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
- bio_ctrl->submit_bitmap = btrfs_get_subpage_dirty_bitmap_value(fs_info, folio);
+ btrfs_copy_subpage_dirty_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
- for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+ for_each_set_bitrange(start_bit, end_bit, bio_ctrl->submit_bitmap,
blocks_per_folio) {
u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
@@ -1525,7 +1531,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
btrfs_ino(inode),
folio_pos(folio),
blocks_per_folio,
- &bio_ctrl->submit_bitmap,
+ bio_ctrl->submit_bitmap,
found_start, found_len, ret);
} else {
/*
@@ -1550,7 +1556,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
fs_info->sectorsize_bits;
unsigned int end_bit = (min(page_end + 1, found_start + found_len) -
page_start) >> fs_info->sectorsize_bits;
- bitmap_clear(&bio_ctrl->submit_bitmap, start_bit, end_bit - start_bit);
+ bitmap_clear(bio_ctrl->submit_bitmap, start_bit, end_bit - start_bit);
}
/*
* Above btrfs_run_delalloc_range() may have unlocked the folio,
@@ -1571,7 +1577,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
fs_info->sectorsize_bits,
blocks_per_folio);
- for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+ for_each_set_bitrange(start_bit, end_bit, bio_ctrl->submit_bitmap,
bitmap_size) {
u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
@@ -1597,7 +1603,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
* If all ranges are submitted asynchronously, we just need to account
* for them here.
*/
- if (bitmap_empty(&bio_ctrl->submit_bitmap, blocks_per_folio)) {
+ if (bitmap_empty(bio_ctrl->submit_bitmap, blocks_per_folio)) {
wbc->nr_to_write -= delalloc_to_write;
return 1;
}
@@ -1718,7 +1724,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
loff_t i_size)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- unsigned long range_bitmap = 0;
bool submitted_io = false;
int found_error = 0;
const u64 end = start + len;
@@ -1746,14 +1751,18 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
return -EUCLEAN;
}
- bitmap_set(&range_bitmap, (start - folio_pos(folio)) >> fs_info->sectorsize_bits,
- len >> fs_info->sectorsize_bits);
- bitmap_and(&bio_ctrl->submit_bitmap, &bio_ctrl->submit_bitmap, &range_bitmap,
- blocks_per_folio);
+ /* Truncate the submit bitmap to the current range. */
+ if (start > folio_start)
+ bitmap_clear(bio_ctrl->submit_bitmap, 0,
+ (start - folio_start) >> fs_info->sectorsize_bits);
+ if (start + len < folio_end)
+ bitmap_clear(bio_ctrl->submit_bitmap,
+ (end - folio_start) >> fs_info->sectorsize_bits,
+ (folio_end - end) >> fs_info->sectorsize_bits);
bio_ctrl->end_io_func = end_bbio_data_write;
- for_each_set_bit(bit, &bio_ctrl->submit_bitmap, blocks_per_folio) {
+ for_each_set_bit(bit, bio_ctrl->submit_bitmap, blocks_per_folio) {
cur = folio_pos(folio) + (bit << fs_info->sectorsize_bits);
if (cur >= i_size) {
@@ -1813,6 +1822,32 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
return found_error;
}
+static void bio_ctrl_init_submit_bitmap(struct btrfs_fs_info *fs_info,
+ struct folio *folio,
+ struct btrfs_bio_ctrl *bio_ctrl)
+{
+ const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
+
+ /* Only supported for blocks per folio <= BITS_PER_LONG for now. */
+ ASSERT(blocks_per_folio <= BITS_PER_LONG);
+ bio_ctrl->submit_bitmap_value = 0;
+ bio_ctrl->submit_bitmap = &bio_ctrl->submit_bitmap_value;
+ /*
+ * Default to unlock the whole folio.
+ * The proper bitmap is not initialized until writepage_delalloc().
+ */
+ bitmap_set(bio_ctrl->submit_bitmap, 0, blocks_per_folio);
+}
+
+static void bio_ctrl_release_submit_bitmap(struct btrfs_fs_info *fs_info,
+ struct folio *folio,
+ struct btrfs_bio_ctrl *bio_ctrl)
+{
+ ASSERT(btrfs_blocks_per_folio(fs_info, folio) <= BITS_PER_LONG);
+
+ bio_ctrl->submit_bitmap = NULL;
+}
+
/*
* the writepage semantics are similar to regular writepage. extent
* records are inserted to lock ranges in the tree, and as dirty areas
@@ -1847,12 +1882,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
if (folio_contains(folio, end_index))
folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset);
- /*
- * Default to unlock the whole folio.
- * The proper bitmap can only be initialized until writepage_delalloc().
- */
- bio_ctrl->submit_bitmap = (unsigned long)-1;
-
+ bio_ctrl_init_submit_bitmap(fs_info, folio, bio_ctrl);
/*
* If the page is dirty but without private set, it's marked dirty
* without informing the fs.
@@ -1877,21 +1907,25 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
goto done;
ret = writepage_delalloc(inode, folio, bio_ctrl);
- if (ret == 1)
+ if (ret == 1) {
+ bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
return 0;
+ }
if (ret)
goto done;
ret = extent_writepage_io(inode, folio, folio_pos(folio),
folio_size(folio), bio_ctrl, i_size);
- if (ret == 1)
+ if (ret == 1) {
+ bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
return 0;
+ }
if (unlikely(ret < 0))
btrfs_err_rl(fs_info,
"failed to submit blocks, root=%lld inode=%llu folio=%llu submit_bitmap=%*pbl: %d",
btrfs_root_id(inode->root), btrfs_ino(inode),
folio_pos(folio), blocks_per_folio,
- &bio_ctrl->submit_bitmap, ret);
+ bio_ctrl->submit_bitmap, ret);
bio_ctrl->wbc->nr_to_write--;
@@ -1903,6 +1937,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
* submitted ranges inside the folio.
*/
btrfs_folio_end_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
+ bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
ASSERT(ret <= 0);
return ret;
}
@@ -2638,7 +2673,7 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
* Set the submission bitmap to submit all sectors.
* extent_writepage_io() will do the truncation correctly.
*/
- bio_ctrl.submit_bitmap = (unsigned long)-1;
+ bio_ctrl_init_submit_bitmap(fs_info, folio, &bio_ctrl);
ret = extent_writepage_io(BTRFS_I(inode), folio, cur, cur_len,
&bio_ctrl, i_size);
if (ret == 1)
@@ -2650,6 +2685,7 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
if (ret < 0)
found_error = true;
next_page:
+ bio_ctrl_release_submit_bitmap(fs_info, folio, &bio_ctrl);
folio_put(folio);
cur = cur_end + 1;
}
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 2116a6b4a476..41b369db0839 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -276,7 +276,7 @@ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
}
void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
- struct folio *folio, unsigned long bitmap)
+ struct folio *folio, unsigned long *bitmap)
{
struct btrfs_folio_state *bfs = folio_get_private(folio);
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
@@ -298,7 +298,7 @@ void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
}
spin_lock_irqsave(&bfs->lock, flags);
- for_each_set_bit(bit, &bitmap, blocks_per_folio) {
+ for_each_set_bit(bit, bitmap, blocks_per_folio) {
if (test_and_clear_bit(bit + start_bit, bfs->bitmaps))
cleared++;
}
@@ -794,24 +794,35 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
spin_unlock_irqrestore(&bfs->lock, flags);
}
-unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
- struct folio *folio)
+void btrfs_copy_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
+ struct folio *folio,
+ unsigned long *dst)
{
struct btrfs_folio_state *bfs;
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
unsigned long flags;
unsigned long value;
- if (blocks_per_folio == 1)
- return 1;
+ if (blocks_per_folio == 1) {
+ value = 1;
+ bitmap_copy(dst, &value, 1);
+ return;
+ }
ASSERT(folio_test_private(folio) && folio_get_private(folio));
ASSERT(blocks_per_folio > 1);
- ASSERT(blocks_per_folio <= BITS_PER_LONG);
bfs = folio_get_private(folio);
+ if (blocks_per_folio <= BITS_PER_LONG) {
+ spin_lock_irqsave(&bfs->lock, flags);
+ value = bitmap_read(bfs->bitmaps, btrfs_bitmap_nr_dirty * blocks_per_folio,
+ blocks_per_folio);
+ spin_unlock_irqrestore(&bfs->lock, flags);
+ bitmap_copy(dst, &value, blocks_per_folio);
+ return;
+ }
spin_lock_irqsave(&bfs->lock, flags);
- value = get_bitmap_value_dirty(fs_info, folio);
+ bitmap_copy(dst, get_bitmap_pointer_dirty(fs_info, folio),
+ blocks_per_folio);
spin_unlock_irqrestore(&bfs->lock, flags);
- return value;
}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 9e92877e7251..b45694eecb41 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -131,7 +131,7 @@ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info,
void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info,
- struct folio *folio, unsigned long bitmap);
+ struct folio *folio, unsigned long *bitmap);
/*
* Template for subpage related operations.
*
@@ -200,8 +200,9 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb);
-unsigned long btrfs_get_subpage_dirty_bitmap_value(struct btrfs_fs_info *fs_info,
- struct folio *folio);
+void btrfs_copy_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info,
+ struct folio *folio,
+ unsigned long *dst);
void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
--
2.53.0
* [PATCH RFC 4/5] btrfs: remove 2K block size support
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
` (2 preceding siblings ...)
2026-04-24 8:51 ` [PATCH RFC 3/5] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps Qu Wenruo
@ 2026-04-24 8:51 ` Qu Wenruo
2026-04-24 10:13 ` David Sterba
2026-04-24 8:51 ` [PATCH RFC 5/5] btrfs: introduce support for huge folios Qu Wenruo
2026-04-27 14:10 ` [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size David Sterba
5 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
Originally 2K block size support was introduced to test subpage (block
size < page size) routines on x86_64, whose 4K page size is exactly the
original minimal block size.
However that 2K block size support has some problems:
- No 2K nodesize support
This is critical, as there is still no way to exercise the subpage
metadata routines.
- Very easy to test subpage data path now
With the current experimental large folio support, it's already very
easy to test the subpage data folio path: whenever a folio larger than
4K is encountered on x86_64, all the subpage folio states and bitmaps
are needed.
So there is no need to use a 2K block size just to verify the subpage
data path, even on x86_64.
Furthermore, with the incoming huge folio (2M on x86_64) support, the
2K block size would easily double the bitmap size.
Considering the maintenance burden and the limited extra coverage, I
believe it's time to remove it before the huge folio support lands.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/fs.h | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index d69b8ce41912..e99db75dede6 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -50,18 +50,8 @@ struct btrfs_subpage_info;
struct btrfs_stripe_hash_table;
struct btrfs_space_info;
-/*
- * Minimum data and metadata block size.
- *
- * Normally it's 4K, but for testing subpage block size on 4K page systems, we
- * allow DEBUG builds to accept 2K page size.
- */
-#ifdef CONFIG_BTRFS_DEBUG
-#define BTRFS_MIN_BLOCKSIZE (SZ_2K)
-#else
+/* Minimum data and metadata block size. */
#define BTRFS_MIN_BLOCKSIZE (SZ_4K)
-#endif
-
#define BTRFS_MAX_BLOCKSIZE (SZ_64K)
#define BTRFS_MAX_EXTENT_SIZE SZ_128M
--
2.53.0
* [PATCH RFC 5/5] btrfs: introduce support for huge folios
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
` (3 preceding siblings ...)
2026-04-24 8:51 ` [PATCH RFC 4/5] btrfs: remove 2K block size support Qu Wenruo
@ 2026-04-24 8:51 ` Qu Wenruo
2026-04-27 14:10 ` [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size David Sterba
5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2026-04-24 8:51 UTC (permalink / raw)
To: linux-btrfs
With all the previous preparations, it's finally time to enable huge
folio support.
This will enlarge each sub-bitmap inside btrfs_folio_state, and there
are some special call sites that deserve extra mention:
- The max folio size
Here we define a local MAX_BLOCKS_PER_FOLIO using BTRFS_MAX_FOLIO_SIZE
(2M) and the minimum and most common fs block size
BTRFS_MIN_BLOCKSIZE (4K).
This will ensure we have a large enough but not too large folio for
btrfs.
This limit applies to all systems regardless of page size.
For 64K page size, this will slightly reduce the max folio size from
the original 4MiB to 2MiB, which is still well below the huge folio
threshold (order 9, i.e. 32MiB with 64K pages).
However, in reality folios larger than 2MiB rarely improve IO
performance, while excessively large folios can cause other problems,
like stalling the IO pipeline for too long.
So although the max folio size is reduced on 64K page size systems,
it shouldn't cause any real problems.
For 4K page size, this will increase the max folio size from 256K to
2MiB, which means finally there is huge folio support.
- btrfs_bio_ctrl::submit_bitmap
This will be enlarged to contain MAX_BLOCKS_PER_FOLIO, and this will
be on-stack memory.
This will increase on-stack memory usage by 48 bytes.
- Local @delalloc_bitmap inside writepage_delalloc()
Unfortunately we cannot afford to handle an allocation failure here,
so again we use on-stack memory.
This will increase on-stack memory usage by another 56 bytes.
So unfortunately, during the delalloc window the writeback path will
use an extra 104 bytes of on-stack memory, and in all other cases an
extra 48 bytes.
The +48 bytes (btrfs_bio_ctrl::submit_bitmap) can be removed
after we have reworked the compression submission, so the current
on-stack submit_bitmap is mostly a workaround until then.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/extent_io.c | 44 ++++++++++++++------------------------------
fs/btrfs/fs.h | 4 ++++
fs/btrfs/subpage.c | 6 +++---
4 files changed, 22 insertions(+), 34 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c67faf61ed9a..9b72878c08cf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3394,7 +3394,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
fs_info->sectorsize = sectorsize;
fs_info->sectorsize_bits = ilog2(sectorsize);
fs_info->block_min_order = ilog2(round_up(sectorsize, PAGE_SIZE) >> PAGE_SHIFT);
- fs_info->block_max_order = ilog2((BITS_PER_LONG << fs_info->sectorsize_bits) >> PAGE_SHIFT);
+ fs_info->block_max_order = ilog2(BTRFS_MAX_FOLIO_SIZE >> PAGE_SHIFT);
fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
fs_info->stripesize = stripesize;
fs_info->fs_devices->fs_info = fs_info;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b8dfd8770365..afbf40382508 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -91,6 +91,8 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
#define btrfs_leak_debug_del_eb(eb) do {} while (0)
#endif
+#define MAX_BLOCKS_PER_FOLIO (BTRFS_MAX_FOLIO_SIZE / BTRFS_MIN_BLOCKSIZE)
+
/*
* Structure to record info about the bio being assembled, and other info like
* how many bytes are there before stripe/ordered extent boundary.
@@ -130,12 +132,7 @@ struct btrfs_bio_ctrl {
* extent_writepage_io().
* This is to avoid touching ranges covered by compression/inline.
*/
- unsigned long *submit_bitmap;
- /*
- * When blocks_per_folio <= BITS_PER_LONG, we can use the inline
- * one without allocating memory.
- */
- unsigned long submit_bitmap_value;
+ unsigned long submit_bitmap[BITS_TO_LONGS(MAX_BLOCKS_PER_FOLIO)];
struct readahead_control *ractl;
@@ -1428,7 +1425,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
const u64 page_start = folio_pos(folio);
const u64 page_end = page_start + folio_size(folio) - 1;
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
- unsigned long delalloc_bitmap = 0;
+ unsigned long delalloc_bitmap[BITS_TO_LONGS(MAX_BLOCKS_PER_FOLIO)] = { 0 };
/*
* Save the last found delalloc end. As the delalloc end can go beyond
* page boundary, thus we cannot rely on subpage bitmap to locate the
@@ -1471,7 +1468,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = delalloc_end + 1;
continue;
}
- set_delalloc_bitmap(folio, &delalloc_bitmap, delalloc_start,
+ set_delalloc_bitmap(folio, delalloc_bitmap, delalloc_start,
min(delalloc_end, page_end) + 1 - delalloc_start);
last_delalloc_end = delalloc_end;
delalloc_start = delalloc_end + 1;
@@ -1497,7 +1494,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_len = last_delalloc_end + 1 - found_start;
found = true;
} else {
- found = find_next_delalloc_bitmap(folio, &delalloc_bitmap,
+ found = find_next_delalloc_bitmap(folio, delalloc_bitmap,
delalloc_start, &found_start, &found_len);
}
if (!found)
@@ -1828,26 +1825,19 @@ static void bio_ctrl_init_submit_bitmap(struct btrfs_fs_info *fs_info,
{
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
- /* Only supported for blocks per folio <= BITS_PER_LONG for now. */
- ASSERT(blocks_per_folio <= BITS_PER_LONG);
- bio_ctrl->submit_bitmap_value = 0;
- bio_ctrl->submit_bitmap = &bio_ctrl->submit_bitmap_value;
+ ASSERT(blocks_per_folio <= MAX_BLOCKS_PER_FOLIO);
+
/*
* Default to unlock the whole folio.
* The proper bitmap is not initialized until writepage_delalloc().
+ *
+ * We're safe just to set the bitmap range [0, blocks_per_folio), as
+ * all later usage of the bitmap will follow the same range limit.
+ * Any bits beyond blocks_per_folio will be ignored.
*/
bitmap_set(bio_ctrl->submit_bitmap, 0, blocks_per_folio);
}
-static void bio_ctrl_release_submit_bitmap(struct btrfs_fs_info *fs_info,
- struct folio *folio,
- struct btrfs_bio_ctrl *bio_ctrl)
-{
- ASSERT(btrfs_blocks_per_folio(fs_info, folio) <= BITS_PER_LONG);
-
- bio_ctrl->submit_bitmap = NULL;
-}
-
/*
* the writepage semantics are similar to regular writepage. extent
* records are inserted to lock ranges in the tree, and as dirty areas
@@ -1907,19 +1897,15 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
goto done;
ret = writepage_delalloc(inode, folio, bio_ctrl);
- if (ret == 1) {
- bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
+ if (ret == 1)
return 0;
- }
if (ret)
goto done;
ret = extent_writepage_io(inode, folio, folio_pos(folio),
folio_size(folio), bio_ctrl, i_size);
- if (ret == 1) {
- bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
+ if (ret == 1)
return 0;
- }
if (unlikely(ret < 0))
btrfs_err_rl(fs_info,
"failed to submit blocks, root=%lld inode=%llu folio=%llu submit_bitmap=%*pbl: %d",
@@ -1937,7 +1923,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
* submitted ranges inside the folio.
*/
btrfs_folio_end_lock_bitmap(fs_info, folio, bio_ctrl->submit_bitmap);
- bio_ctrl_release_submit_bitmap(fs_info, folio, bio_ctrl);
ASSERT(ret <= 0);
return ret;
}
@@ -2685,7 +2670,6 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
if (ret < 0)
found_error = true;
next_page:
- bio_ctrl_release_submit_bitmap(fs_info, folio, &bio_ctrl);
folio_put(folio);
cur = cur_end + 1;
}
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index e99db75dede6..5c152acf61d3 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -54,6 +54,10 @@ struct btrfs_space_info;
#define BTRFS_MIN_BLOCKSIZE (SZ_4K)
#define BTRFS_MAX_BLOCKSIZE (SZ_64K)
+/* The max folio size btrfs supports. */
+#define BTRFS_MAX_FOLIO_SIZE (SZ_2M)
+static_assert(BTRFS_MAX_FOLIO_SIZE >= PAGE_SIZE);
+
#define BTRFS_MAX_EXTENT_SIZE SZ_128M
/*
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 41b369db0839..00b0a17445a5 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -13,9 +13,9 @@
* - Metadata must be fully aligned to node size
* So when nodesize <= page size, the metadata can never cross folio boundaries.
*
- * - Only support block per folio <= BITS_PER_LONG
- * This is to make bitmap copying much easier, a single unsigned long can handle
- * one bitmap.
+ * - Only support block per folio <= (BTRFS_MAX_FOLIO_SIZE / BTRFS_MIN_BLOCKSIZE)
+ * This is to ensure we can afford an on-stack bitmap, without the need to allocate
+ * bitmap memory at runtime.
*
* Implementation:
*
--
2.53.0
* Re: [PATCH RFC 4/5] btrfs: remove 2K block size support
2026-04-24 8:51 ` [PATCH RFC 4/5] btrfs: remove 2K block size support Qu Wenruo
@ 2026-04-24 10:13 ` David Sterba
0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2026-04-24 10:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Apr 24, 2026 at 06:21:33PM +0930, Qu Wenruo wrote:
> Originally 2K block size support was introduced to test subpage (block
> size < page size) on x86_64 where the page size is exactly the original
> minimal block size.
>
> However that 2K block size support has some problems:
>
> - No 2K nodesize support
> This is critical, as there is still no way to exercise the subpage
> metadata routine.
>
> - Very easy to test subpage data path now
> With the currently experimental large folio support, it's very easy to
> test the subpage data folio path already, as when a folio larger than
> 4K is encountered on x86_64, we will need all the subpage folio states
> and bitmaps.
>
> So there is no need to use 2K block size just to verify subpage data
> path even on x86_64.
>
> And with the incoming huge folio (2M on x86_64) support, the 2K block
> size will easily double the bitmap size, considering the burden to
> maintain and the limited extra coverage, I believe it's time to remove
> it for the incoming huge folio support.
Ok then, the 2K support is inferior to the folio updates, so off it
goes. You can add it now so you don't have to keep it in the series.
* Re: [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
` (4 preceding siblings ...)
2026-04-24 8:51 ` [PATCH RFC 5/5] btrfs: introduce support for huge folios Qu Wenruo
@ 2026-04-27 14:10 ` David Sterba
5 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2026-04-27 14:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Apr 24, 2026 at 06:21:29PM +0930, Qu Wenruo wrote:
> [REASON FOR RFC]
>
> - Unstable baseline for-next branch
> The current for-next branch has a known regression where
> btrfs_async_reclaim_metadata_space() kworker can fall into a dead loop
> and fail several ENOSPC test cases.
> Although the regression is pinned down, I'm still waiting for a good
> baseline to rebase the series.
This has been resolved, patch removed from for-next.
>
> - Extra on-stack memory usage
> Currently we have two bitmaps that are using extra on-stack memory.
> Each of them uses 64 bytes on-stack memory, so in total we can have
> 128 bytes just for the bitmaps.
>
> Not sure if it's a concern.
This will probably end up in the IO path so it can be noticeable, but
as we've been saving stack bytes for some time I hope we can afford to
use as much as 128 bytes.
Allocation on the IO path is a worse problem, in case we have to write
data to free more memory.
> - Uncertainty related to v7.2 large folio support
> I'm planning to move large data folio support out of experimental.
> So when that support is no longer experimental, this huge folio support
> would need to start as experimental
I'd rather enable them separately. The large folios will affect
everything so we should minimize potential problems with another
feature.
> Depends on which patch(set) is merged first, there needs to be some
> modification to ensure end users won't suddenly get not only large but
> also huge folios.
As rc1 is out we can start enabling features for 7.2, so please add the
large folios by default and the THP support as experimental.
end of thread, other threads:[~2026-04-27 14:10 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
2026-04-24 8:51 [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 1/5] btrfs: update the out-of-date comments on subpage Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 2/5] btrfs: prepare subpage operations to support >= BITS_PER_LONG sub-bitmaps Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 3/5] btrfs: migrate btrfs_bio_ctrl::submit_bitmap to support larger bitmaps Qu Wenruo
2026-04-24 8:51 ` [PATCH RFC 4/5] btrfs: remove 2K block size support Qu Wenruo
2026-04-24 10:13 ` David Sterba
2026-04-24 8:51 ` [PATCH RFC 5/5] btrfs: introduce support for huge folios Qu Wenruo
2026-04-27 14:10 ` [PATCH RFC 0/5] btrfs: support huge data folios for 4K page size David Sterba