* [PATCH 0/5] btrfs: remove folio ordered flag
@ 2026-05-07 5:29 Qu Wenruo
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
[CHANGELOG]
RFC->v1:
- Replace the folio_test_ordered() inside extent_writepage_io()
Now the check is done inside alloc_new_bio(), which is already doing
the OE search.
This detects dirty blocks without an OE at a per-block level, much
better than the previous per-folio level check, and without any extra
overhead.
Btrfs has a long history of using an internal folio flag called ordered,
which indicates whether an fs block is covered by an ordered extent.
However this means we need to keep the folio flag/subpage bitmap
synchronized with the ordered extents, which are managed by a per-inode
ordered tree.
Furthermore with huge folio support, the ordered bitmap can be as large
as 64 bytes (512 bits), which is not a small amount.
This series removes the folio ordered flag completely, along with the
ordered subpage bitmap.
Most call sites of folio_test_ordered() are just inside ASSERT()s, so
it's not too hard to remove them.
There are two call sites utilizing *_folio_test_ordered():
- Inside extent_writepage_io()
The warning left by the legacy COW fixup mechanism.
The 1st patch introduces a more reliable way to detect dirty blocks
without an OE: instead of checking the folio ordered flag, it does a
per-block level check without introducing new overhead.
- Inside btrfs_invalidate_folio()
We use the ordered flag to check if we can skip an ordered extent.
This is worked around by using the fact that we have waited for
writeback of the folio, so endio should have already finished for the
writeback range. Then the dirty flag is checked to determine whether we
can skip the OE range.
To get a reliable dirty flag for both sub-folio and full-folio cases, we
cannot clear the folio dirty flag early, so the 2nd patch changes the
folio dirty flag clearing timing, then the 3rd patch can remove the
folio_test_ordered() usage.
Then the 4th patch is to remove the remaining folio_test_ordered()
usage, and finally we can remove the whole ordered flag/subpage bitmap
completely.
I tried to hide the ordered flag/bitmap behind DEBUG, but unfortunately
the subpage bitmap macros are not easy to tweak to handle a conditional
ordered flag.
So the ordered flag/bitmap must be either there, or be completely gone.
I hope enough test runs will cover the removed ASSERT()s.
Qu Wenruo (5):
btrfs: detect dirty blocks without an ordered extent more reliably
btrfs: unify folio dirty flag clearing
btrfs: use dirty flag to check if an ordered extent needs to be
truncated
btrfs: remove folio_test_ordered() usage
btrfs: remove folio ordered flag and subpage bitmap
fs/btrfs/extent_io.c | 117 ++++++++++++++++++++++++-------------------
fs/btrfs/extent_io.h | 12 ++++-
fs/btrfs/fs.h | 8 ---
fs/btrfs/inode.c | 60 ++++++----------------
fs/btrfs/subpage.c | 41 +--------------
fs/btrfs/subpage.h | 19 +++----
6 files changed, 100 insertions(+), 157 deletions(-)
--
2.54.0
* [PATCH 1/5] btrfs: detect dirty blocks without an ordered extent more reliably
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
Currently btrfs detects a dirty folio which doesn't have an ordered
extent in extent_writepage_io(), but that is not ideal:
- The check does not handle all dirty blocks
We can have multiple blocks inside a large folio, but the whole folio
is marked ordered as long as there is one ordered extent in the range.
We can still hit cases where some dirty blocks do not have
corresponding ordered extents.
Instead of checking the folio ordered flag, do the check at
alloc_new_bio(), where we're already searching for the ordered extent
for writeback.
If no ordered extent is found, give an error message and notify the
caller that something is wrong.
This allows us to check every block that goes through
submit_extent_folio().
With this new and more reliable detection, we can remove the old check.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 85 ++++++++++++++++++++++++++++----------------
1 file changed, 54 insertions(+), 31 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ebf9a63946e5..3550ae40255c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -730,9 +730,9 @@ static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
bio_end_sector(bio) == sector;
}
-static void alloc_new_bio(struct btrfs_inode *inode,
- struct btrfs_bio_ctrl *bio_ctrl,
- u64 disk_bytenr, u64 file_offset)
+static int alloc_new_bio(struct btrfs_inode *inode,
+ struct btrfs_bio_ctrl *bio_ctrl,
+ u64 disk_bytenr, u64 file_offset)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_bio *bbio;
@@ -749,13 +749,25 @@ static void alloc_new_bio(struct btrfs_inode *inode,
if (bio_ctrl->wbc) {
struct btrfs_ordered_extent *ordered;
+ /* This must be a write for data inodes. */
+ ASSERT(btrfs_op(&bio_ctrl->bbio->bio) == BTRFS_MAP_WRITE);
+ ASSERT(is_data_inode(inode));
+
ordered = btrfs_lookup_ordered_extent(inode, file_offset);
- if (ordered) {
- bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
- ordered->file_offset +
- ordered->disk_num_bytes - file_offset);
- bbio->ordered = ordered;
+ if (unlikely(!ordered)) {
+ bio_ctrl->bbio = NULL;
+ bio_ctrl->next_file_offset = 0;
+ bio_put(&bbio->bio);
+ btrfs_err_rl(fs_info,
+ "root %lld ino %llu file offset %llu is marked dirty without notifying the fs",
+ btrfs_root_id(inode->root), btrfs_ino(inode),
+ file_offset);
+ return -EUCLEAN;
}
+ bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
+ ordered->file_offset +
+ ordered->disk_num_bytes - file_offset);
+ bbio->ordered = ordered;
/*
* Pick the last added device to support cgroup writeback. For
@@ -766,6 +778,7 @@ static void alloc_new_bio(struct btrfs_inode *inode,
bio_set_dev(&bbio->bio, fs_info->fs_devices->latest_dev->bdev);
wbc_init_bio(bio_ctrl->wbc, &bbio->bio);
}
+ return 0;
}
/*
@@ -781,14 +794,19 @@ static void alloc_new_bio(struct btrfs_inode *inode,
* new one in @bio_ctrl->bbio.
* The mirror number for this IO should already be initialized in
* @bio_ctrl->mirror_num.
+ *
+ * Return the number of bytes that are queued into a bio.
+ * If the returned bytes is smaller than @size, it means we hit a critical error
+ * for data write, where there is no ordered extent for the range.
*/
-static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
- u64 disk_bytenr, struct folio *folio,
- size_t size, unsigned long pg_offset,
- u64 read_em_generation)
+static unsigned int submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
+ u64 disk_bytenr, struct folio *folio,
+ size_t size, unsigned long pg_offset,
+ u64 read_em_generation)
{
struct btrfs_inode *inode = folio_to_inode(folio);
loff_t file_offset = folio_pos(folio) + pg_offset;
+ unsigned int queued = 0;
ASSERT(pg_offset + size <= folio_size(folio));
ASSERT(bio_ctrl->end_io_func);
@@ -801,8 +819,13 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
u32 len = size;
/* Allocate new bio if needed */
- if (!bio_ctrl->bbio)
- alloc_new_bio(inode, bio_ctrl, disk_bytenr, file_offset);
+ if (!bio_ctrl->bbio) {
+ int ret;
+
+ ret = alloc_new_bio(inode, bio_ctrl, disk_bytenr, file_offset);
+ if (ret < 0)
+ break;
+ }
/* Cap to the current ordered extent boundary if there is one. */
if (len > bio_ctrl->len_to_oe_boundary) {
@@ -830,6 +853,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
pg_offset += len;
disk_bytenr += len;
file_offset += len;
+ queued += len;
/*
* len_to_oe_boundary defaults to U32_MAX, which isn't folio or
@@ -869,6 +893,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
submit_one_bio(bio_ctrl);
} while (size);
+ return queued;
}
static int attach_extent_buffer_folio(struct extent_buffer *eb,
@@ -1041,6 +1066,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
u64 disk_bytenr;
u64 block_start;
u64 em_gen;
+ unsigned int queued;
ASSERT(IS_ALIGNED(cur, fs_info->sectorsize));
if (cur >= last_byte) {
@@ -1154,8 +1180,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
if (force_bio_submit)
submit_one_bio(bio_ctrl);
- submit_extent_folio(bio_ctrl, disk_bytenr, folio, blocksize,
- pg_offset, em_gen);
+ queued = submit_extent_folio(bio_ctrl, disk_bytenr, folio, blocksize,
+ pg_offset, em_gen);
+ /* Read submission should not fail. */
+ ASSERT(queued == blocksize);
}
return 0;
}
@@ -1643,6 +1671,7 @@ static int submit_one_sector(struct btrfs_inode *inode,
u64 extent_offset;
u64 em_end;
const u32 sectorsize = fs_info->sectorsize;
+ unsigned int queued;
ASSERT(IS_ALIGNED(filepos, sectorsize));
@@ -1709,8 +1738,15 @@ static int submit_one_sector(struct btrfs_inode *inode,
*/
ASSERT(folio_test_writeback(folio));
- submit_extent_folio(bio_ctrl, disk_bytenr, folio,
- sectorsize, filepos - folio_pos(folio), 0);
+ queued = submit_extent_folio(bio_ctrl, disk_bytenr, folio,
+ sectorsize, filepos - folio_pos(folio), 0);
+ if (unlikely(queued < sectorsize)) {
+ btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
+ btrfs_folio_clear_ordered(fs_info, folio, filepos, sectorsize);
+ btrfs_mark_ordered_io_finished(inode, filepos, fs_info->sectorsize,
+ false);
+ return -EUCLEAN;
+ }
return 0;
}
@@ -1743,19 +1779,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
ASSERT(end <= folio_end, "start=%llu len=%u folio_start=%llu folio_size=%zu",
start, len, folio_start, folio_size(folio));
- if (unlikely(!folio_test_ordered(folio))) {
- DEBUG_WARN();
- btrfs_err_rl(fs_info,
- "root %lld ino %llu folio %llu is marked dirty without notifying the fs",
- btrfs_root_id(inode->root),
- btrfs_ino(inode),
- folio_pos(folio));
- btrfs_folio_clear_dirty(fs_info, folio, start, len);
- btrfs_folio_set_writeback(fs_info, folio, start, len);
- btrfs_folio_clear_writeback(fs_info, folio, start, len);
- return -EUCLEAN;
- }
-
/* Truncate the submit bitmap to the current range. */
if (start > folio_start)
bitmap_clear(bio_ctrl->submit_bitmap, 0,
--
2.54.0
* [PATCH 2/5] btrfs: unify folio dirty flag clearing
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
Currently during folio writeback, we call folio_clear_dirty_for_io()
before extent_writepage(), which causes the folio dirty flag to be
cleared, but without touching the subpage bitmaps.
This works fine for the bio submission path, as we always call
btrfs_folio_clear_dirty() to clear the subpage bitmap.
But this is far from consistent, thus this patch unifies the behavior to
always use the btrfs_folio_clear_dirty() helper to clear both the folio
flag and the subpage bitmap.
This involves:
- Replace folio_clear_dirty_for_io() with folio_test_dirty()
There is only one call site calling folio_clear_dirty_for_io() outside
of subpage.c, that's inside extent_write_cache_pages() just before
extent_writepage().
- Make btrfs_invalidate_folio() clear dirty range for the whole folio
The function btrfs_invalidate_folio() is also called during
extent_writepage().
If a folio is completely beyond i_size, we call
folio_invalidate() -> btrfs_invalidate_folio() to free the folio.
Since we no longer have folio_clear_dirty_for_io() to clear the folio
dirty flag, we must manually clear the folio dirty flag for the
to-be-invalidated folio, and also clear the PAGECACHE_TAG_DIRTY tag.
The tag clearing is done using a new helper,
btrfs_clear_folio_dirty_tag(), which is almost the same as the old
btree_clear_folio_dirty_tag(), but with minor improvements including:
* Remove the folio_test_dirty() check
We have already done an ASSERT().
* Add an ASSERT() to make sure folio is mapped
- Add extra ASSERT()s before clearing folio private
During development I hit dirty folios without the private flag set,
and that triggered a lot of ASSERT() failures.
The reason is that btrfs_invalidate_folio() relies on the dirty flag
being cleared when it's called from extent_writepage().
Add extra ASSERT()s inside clear_folio_extent_mapped() to catch
wild dirty/writeback flags.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 26 +++++++++++++-------------
fs/btrfs/extent_io.h | 11 +++++++++++
fs/btrfs/inode.c | 2 ++
3 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3550ae40255c..b307b26014c6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -958,6 +958,17 @@ void clear_folio_extent_mapped(struct folio *folio)
struct btrfs_fs_info *fs_info;
ASSERT(folio->mapping);
+ /*
+ * The folio should not have writeback nor dirty flag set.
+ *
+ * If dirty flag is set, the folio can be written back again and we
+ * expect the private flag set for the folio.
+ *
+ * If writeback flag is set, the endio may need to utilize the
+ * private for btrfs_folio_state.
+ */
+ ASSERT(!folio_test_dirty(folio));
+ ASSERT(!folio_test_writeback(folio));
if (!folio_test_private(folio))
return;
@@ -2585,7 +2596,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
}
if (folio_test_writeback(folio) ||
- !folio_clear_dirty_for_io(folio)) {
+ !folio_test_dirty(folio)) {
folio_unlock(folio);
continue;
}
@@ -3748,17 +3759,6 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
release_extent_buffer(eb);
}
-static void btree_clear_folio_dirty_tag(struct folio *folio)
-{
- ASSERT(!folio_test_dirty(folio));
- ASSERT(folio_test_locked(folio));
- xa_lock_irq(&folio->mapping->i_pages);
- if (!folio_test_dirty(folio))
- __xa_clear_mark(&folio->mapping->i_pages, folio->index,
- PAGECACHE_TAG_DIRTY);
- xa_unlock_irq(&folio->mapping->i_pages);
-}
-
void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
struct extent_buffer *eb)
{
@@ -3799,7 +3799,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
folio_lock(folio);
last = btrfs_meta_folio_clear_and_test_dirty(folio, eb);
if (last)
- btree_clear_folio_dirty_tag(folio);
+ btrfs_clear_folio_dirty_tag(folio);
folio_unlock(folio);
}
WARN_ON(refcount_read(&eb->refs) == 0);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ede7abbe4031..29c57623385d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -384,6 +384,17 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
struct extent_buffer *buf);
+static inline void btrfs_clear_folio_dirty_tag(struct folio *folio)
+{
+ ASSERT(!folio_test_dirty(folio));
+ ASSERT(folio_test_locked(folio));
+ ASSERT(folio->mapping);
+ xa_lock_irq(&folio->mapping->i_pages);
+ __xa_clear_mark(&folio->mapping->i_pages, folio->index,
+ PAGECACHE_TAG_DIRTY);
+ xa_unlock_irq(&folio->mapping->i_pages);
+}
+
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
bool nofail);
int btrfs_alloc_folio_array(unsigned int nr_folios, unsigned int order,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8fb00d22a924..4cc4643af7b4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7665,6 +7665,8 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
&cached_state);
cur = range_end + 1;
}
+ btrfs_folio_clear_dirty(fs_info, folio, page_start, folio_size(folio));
+ btrfs_clear_folio_dirty_tag(folio);
/*
* We have iterated through all ordered extents of the page, the page
* should not have Ordered anymore, or the above iteration
--
2.54.0
* [PATCH 3/5] btrfs: use dirty flag to check if an ordered extent needs to be truncated
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
Currently there are only two folio ordered flag users:
- extent_writepage_io()
To ensure the folio range has an ordered extent covering it.
This is from the legacy COW fixup mechanism, which has already been
removed; only a simple check is left.
- btrfs_invalidate_folio()
This is to avoid a race with end_bbio_data_write(), where
btrfs_finish_ordered_extent() will be called to handle the OE
finishing.
But for btrfs_invalidate_folio() we have already waited for the folio
writeback to finish, and locked the folio.
This means we can use the dirty flag to check if a range is already
submitted or not.
If the OE range is not dirty, it means the range has been submitted and
its dirty flag was cleared. Since we have already waited for writeback,
the endio function will handle the OE finishing, thus we must skip such
a range.
If the OE range is dirty, it means we have allocated an ordered extent but
have not yet submitted the range. And that's exactly the case where we need
to truncate the ordered extent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4cc4643af7b4..eaae344804f2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7593,15 +7593,20 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
page_end);
ASSERT(range_end + 1 - cur < U32_MAX);
range_len = range_end + 1 - cur;
- if (!btrfs_folio_test_ordered(fs_info, folio, cur, range_len)) {
- /*
- * If Ordered is cleared, it means endio has
- * already been executed for the range.
- * We can't delete the extent states as
- * btrfs_finish_ordered_io() may still use some of them.
- */
+ /*
+ * If the range is not dirty, the range has been submitted and
+ * since we have waited for the writeback, endio has been
+ * executed, thus we must skip the range to avoid double
+ * accounting for the ordered extent.
+ */
+ if (!btrfs_folio_test_dirty(fs_info, folio, cur, range_len))
goto next;
- }
+
+ /*
+ * The range is dirty meaning it has not been submitted.
+ * Here we need to truncate the OE range as the range will never
+ * be submitted.
+ */
btrfs_folio_clear_ordered(fs_info, folio, cur, range_len);
/*
--
2.54.0
* [PATCH 4/5] btrfs: remove folio_test_ordered() usage
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
This involves:
- The ASSERT() inside end_bbio_data_write()
It's only an ASSERT() and it has never been triggered as far as I
know.
- btrfs_migrate_folio()
Since all folio_test_ordered() usage will be removed, there is no need to
copy the folio ordered flag.
- The ASSERT() inside btrfs_invalidate_folio()
This one has been useful, as it indeed caught some bugs during
development.
But it's the last user, and keeping it is not worth the folio flag or
the subpage bitmap.
This will allow btrfs to finally remove the ordered flag.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 1 -
fs/btrfs/inode.c | 12 ------------
2 files changed, 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b307b26014c6..fdd94a5244e2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -531,7 +531,6 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
u32 len = fi.length;
bio_size += len;
- ASSERT(btrfs_folio_test_ordered(fs_info, folio, start, len));
btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eaae344804f2..61cec1a66baf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7499,12 +7499,6 @@ static int btrfs_migrate_folio(struct address_space *mapping,
if (ret)
return ret;
-
- if (folio_test_ordered(src)) {
- folio_clear_ordered(src);
- folio_set_ordered(dst);
- }
-
return 0;
}
#else
@@ -7672,12 +7666,6 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
}
btrfs_folio_clear_dirty(fs_info, folio, page_start, folio_size(folio));
btrfs_clear_folio_dirty_tag(folio);
- /*
- * We have iterated through all ordered extents of the page, the page
- * should not have Ordered anymore, or the above iteration
- * did something wrong.
- */
- ASSERT(!folio_test_ordered(folio));
if (!inode_evicting)
__btrfs_release_folio(folio, GFP_NOFS);
clear_folio_extent_mapped(folio);
--
2.54.0
* [PATCH 5/5] btrfs: remove folio ordered flag and subpage bitmap
From: Qu Wenruo @ 2026-05-07 5:29 UTC
To: linux-btrfs
Btrfs has an internal flag/subpage bitmap called ordered, which
indicates that a block has a corresponding ordered extent covering it.
However this requires extra synchronization between the inode ordered
tree and the folio flag/subpage bitmap, not to mention maintaining the
extra folio flag along with the subpage bitmap.
As a step to align btrfs_folio_state more closely to iomap_folio_state,
remove the btrfs specific ordered flag/bitmap.
This will also save us 64 bytes for the bitmap of a huge folio.
While we're here, also update the ASCII graph of the bitmap: since
there are only 4 sub-bitmaps left, show all of them directly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 7 -------
fs/btrfs/extent_io.h | 1 -
fs/btrfs/fs.h | 8 --------
fs/btrfs/inode.c | 31 ++-----------------------------
fs/btrfs/subpage.c | 41 ++---------------------------------------
fs/btrfs/subpage.h | 19 +++++++------------
6 files changed, 11 insertions(+), 96 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fdd94a5244e2..ac37ff864a2a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -251,8 +251,6 @@ static void process_one_folio(struct btrfs_fs_info *fs_info,
ASSERT(end + 1 - start != 0 && end + 1 - start < U32_MAX);
len = end + 1 - start;
- if (page_ops & PAGE_SET_ORDERED)
- btrfs_folio_clamp_set_ordered(fs_info, folio, start, len);
if (page_ops & PAGE_START_WRITEBACK) {
btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len);
btrfs_folio_clamp_set_writeback(fs_info, folio, start, len);
@@ -531,7 +529,6 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
u32 len = fi.length;
bio_size += len;
- btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1625,7 +1622,6 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
- btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_mark_ordered_io_finished(inode, start, len, false);
}
return ret;
@@ -1703,7 +1699,6 @@ static int submit_one_sector(struct btrfs_inode *inode,
* ordered extent.
*/
btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
- btrfs_folio_clear_ordered(fs_info, folio, filepos, sectorsize);
btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
@@ -1752,7 +1747,6 @@ static int submit_one_sector(struct btrfs_inode *inode,
sectorsize, filepos - folio_pos(folio), 0);
if (unlikely(queued < sectorsize)) {
btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
- btrfs_folio_clear_ordered(fs_info, folio, filepos, sectorsize);
btrfs_mark_ordered_io_finished(inode, filepos, fs_info->sectorsize,
false);
return -EUCLEAN;
@@ -1820,7 +1814,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
spin_unlock(&inode->ordered_tree_lock);
btrfs_put_ordered_extent(ordered);
- btrfs_folio_clear_ordered(fs_info, folio, cur, fs_info->sectorsize);
btrfs_mark_ordered_io_finished(inode, cur, fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 29c57623385d..2324c14a5ecd 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -55,7 +55,6 @@ enum {
/* Page starts writeback, clear dirty bit and set writeback bit */
ENUM_BIT(PAGE_START_WRITEBACK),
ENUM_BIT(PAGE_END_WRITEBACK),
- ENUM_BIT(PAGE_SET_ORDERED),
};
/*
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 9997bbc1d1e5..e18607170e01 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -1213,14 +1213,6 @@ static inline void btrfs_force_shutdown(struct btrfs_fs_info *fs_info)
}
}
-/*
- * We use folio flag owner_2 to indicate there is an ordered extent with
- * unfinished IO.
- */
-#define folio_test_ordered(folio) folio_test_owner_2(folio)
-#define folio_set_ordered(folio) folio_set_owner_2(folio)
-#define folio_clear_ordered(folio) folio_clear_owner_2(folio)
-
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
#define EXPORT_FOR_TESTS
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61cec1a66baf..8c4d6e427faa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -401,28 +401,6 @@ void btrfs_inode_unlock(struct btrfs_inode *inode, unsigned int ilock_flags)
static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode,
u64 offset, u64 bytes)
{
- pgoff_t index = offset >> PAGE_SHIFT;
- const pgoff_t end_index = (offset + bytes - 1) >> PAGE_SHIFT;
- struct folio *folio;
-
- while (index <= end_index) {
- folio = filemap_get_folio(inode->vfs_inode.i_mapping, index);
- if (IS_ERR(folio)) {
- index++;
- continue;
- }
-
- index = folio_next_index(folio);
- /*
- * Here we just clear all Ordered bits for every page in the
- * range, then btrfs_mark_ordered_io_finished() will handle
- * the ordered extent accounting for the range.
- */
- btrfs_folio_clamp_clear_ordered(inode->root->fs_info, folio,
- offset, bytes);
- folio_put(folio);
- }
-
return btrfs_mark_ordered_io_finished(inode, offset, bytes, false);
}
@@ -1406,7 +1384,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
* setup for writepage.
*/
page_ops = ((flags & COW_FILE_RANGE_KEEP_LOCKED) ? 0 : PAGE_UNLOCK);
- page_ops |= PAGE_SET_ORDERED;
/*
* Relocation relies on the relocated extents to have exactly the same
@@ -1972,8 +1949,7 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
goto error;
extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
EXTENT_LOCKED | EXTENT_DELALLOC |
- EXTENT_CLEAR_DATA_RESV,
- PAGE_SET_ORDERED);
+ EXTENT_CLEAR_DATA_RESV, 0);
return ret;
error:
@@ -7600,10 +7576,7 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
* The range is dirty meaning it has not been submitted.
* Here we need to truncate the OE range as the range will never
* be submitted.
- */
- btrfs_folio_clear_ordered(fs_info, folio, cur, range_len);
-
- /*
+ *
* IO on this page will never be started, so we need to account
* for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
* here, must leave that up for the ordered extent completion.
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index ea202698fa10..29b34ec31d18 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -451,35 +451,6 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info,
spin_unlock_irqrestore(&bfs->lock, flags);
}
-void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info,
- struct folio *folio, u64 start, u32 len)
-{
- struct btrfs_folio_state *bfs = folio_get_private(folio);
- unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
- ordered, start, len);
- unsigned long flags;
-
- spin_lock_irqsave(&bfs->lock, flags);
- bitmap_set(bfs->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
- folio_set_ordered(folio);
- spin_unlock_irqrestore(&bfs->lock, flags);
-}
-
-void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info,
- struct folio *folio, u64 start, u32 len)
-{
- struct btrfs_folio_state *bfs = folio_get_private(folio);
- unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
- ordered, start, len);
- unsigned long flags;
-
- spin_lock_irqsave(&bfs->lock, flags);
- bitmap_clear(bfs->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
- if (subpage_test_bitmap_all_zero(fs_info, folio, ordered))
- folio_clear_ordered(folio);
- spin_unlock_irqrestore(&bfs->lock, flags);
-}
-
/*
* Unlike set/clear which is dependent on each page status, for test all bits
* are tested in the same way.
@@ -503,7 +474,6 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info, \
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
-IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
/*
* Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -599,8 +569,6 @@ IMPLEMENT_BTRFS_PAGE_OPS(dirty, folio_mark_dirty, folio_clear_dirty_for_io,
folio_test_dirty);
IMPLEMENT_BTRFS_PAGE_OPS(writeback, folio_start_writeback, folio_end_writeback,
folio_test_writeback);
-IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered,
- folio_test_ordered);
#define DEFINE_GET_SUBPAGE_BITMAP(name) \
static inline unsigned long get_bitmap_value_##name( \
@@ -633,7 +601,6 @@ static inline unsigned long *get_bitmap_pointer_##name( \
DEFINE_GET_SUBPAGE_BITMAP(uptodate);
DEFINE_GET_SUBPAGE_BITMAP(dirty);
DEFINE_GET_SUBPAGE_BITMAP(writeback);
-DEFINE_GET_SUBPAGE_BITMAP(ordered);
DEFINE_GET_SUBPAGE_BITMAP(locked);
#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len) \
@@ -761,37 +728,33 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
unsigned long uptodate;
unsigned long dirty;
unsigned long writeback;
- unsigned long ordered;
unsigned long locked;
spin_lock_irqsave(&bfs->lock, flags);
uptodate = get_bitmap_value_uptodate(fs_info, folio);
dirty = get_bitmap_value_dirty(fs_info, folio);
writeback = get_bitmap_value_writeback(fs_info, folio);
- ordered = get_bitmap_value_ordered(fs_info, folio);
locked = get_bitmap_value_locked(fs_info, folio);
spin_unlock_irqrestore(&bfs->lock, flags);
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl locked=%*pbl",
start, len, folio_pos(folio),
blocks_per_folio, &uptodate,
blocks_per_folio, &dirty,
blocks_per_folio, &writeback,
- blocks_per_folio, &ordered,
blocks_per_folio, &locked);
return;
}
spin_lock_irqsave(&bfs->lock, flags);
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl locked=%*pbl",
start, len, folio_pos(folio),
blocks_per_folio, get_bitmap_pointer_uptodate(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_dirty(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_writeback(fs_info, folio),
- blocks_per_folio, get_bitmap_pointer_ordered(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_locked(fs_info, folio));
spin_unlock_irqrestore(&bfs->lock, flags);
}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b45694eecb41..c4f569a4027a 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -14,16 +14,17 @@ struct folio;
/*
* Extra info for subpage bitmap.
*
- * For subpage we pack all uptodate/dirty/writeback/ordered bitmaps into
+ * For subpage we pack all uptodate/dirty/writeback/locked bitmaps into
* one larger bitmap.
*
* This structure records how they are organized in the bitmap:
*
- * /- uptodate /- dirty /- ordered
- * | | |
- * v v v
- * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
- * |< sectors_per_page >|
+ * /- uptodate /- dirty /- writeback /- locked
+ * | | | |
+ * v v v v
+ * |u|u|....|u|u|d|d|..|d|d|w|w|.....|w|w|l|l|...|l|l|
+ * \ /
+ * blocks_per_folio
*
* Unlike regular macro-like enums, here we do not go upper-case names, as
* these names will be utilized in various macros to define function names.
@@ -40,11 +41,6 @@ enum {
*/
btrfs_bitmap_nr_writeback,
- /*
- * The ordered flags shows if the range has an ordered extent.
- */
- btrfs_bitmap_nr_ordered,
-
/*
* The locked bit is for async delalloc range (compression), currently
* async extent is queued with the range locked, until the compression
@@ -179,7 +175,6 @@ bool btrfs_meta_folio_test_##name(struct folio *folio, const struct extent_buffe
DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
DECLARE_BTRFS_SUBPAGE_OPS(dirty);
DECLARE_BTRFS_SUBPAGE_OPS(writeback);
-DECLARE_BTRFS_SUBPAGE_OPS(ordered);
/*
* Helper for error cleanup, where a folio will have its dirty flag cleared,
--
2.54.0