* [PATCH RFC 0/4] btrfs: remove folio ordered flag
@ 2026-05-04 23:49 Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 1/4] btrfs: unify folio dirty flag clearing Qu Wenruo
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-04 23:49 UTC (permalink / raw)
To: linux-btrfs
Btrfs has a long history of using an internal folio flag called ordered,
which indicates whether an fs block is covered by an ordered extent.
However, this means we need to keep the folio flag/subpage bitmap
synchronized with the ordered extents, which are managed by a per-inode
ordered tree.
Furthermore, with huge folio support the ordered bitmap can be as large
as 64 bytes (512 bits), which is not a small amount.
This series removes the folio ordered flag completely, along with the
ordered subpage bitmap.
Most call sites of folio_test_ordered() are just inside ASSERT()s, so
it's not too hard to remove them.
But there is one special call site inside btrfs_invalidate_folio(),
where we use the ordered flag to check whether we can skip an ordered
extent.
This is worked around by relying on the fact that we have already waited
for the folio's writeback, so endio must have finished for the writeback
range. We can then check the dirty flag to determine whether the OE
range can be skipped.
To get a reliable dirty flag for both sub-folio and full-folio cases, we
cannot clear the folio dirty flag early. So the first patch changes the
timing of folio dirty flag clearing, allowing the second patch to remove
that folio_test_ordered() usage.
The third patch removes the remaining folio_test_ordered() usage, and
finally the fourth patch removes the whole ordered flag/subpage bitmap.
[REASON FOR RFC]
I'm not sure if we should remove the folio ordered flag completely, or
keep it as an internal debug feature for a while.
The main concern is that we're removing quite a few ASSERT()s. Some have
never been hit, but at least one is very useful and has triggered
several times during development, exposing bugs.
In the long run we will eventually remove the folio ordered
flag/subpage bitmap so that we can align btrfs_folio_state with
iomap_folio_state, so the ordered flag should still go away eventually.
Another point of concern is the new btrfs_ordered_extent_in_range()
helper for extent_writepage_io().
Previously we did just a folio flag check; now we have to do an rbtree
search.
I hope the overhead is not too large.
Qu Wenruo (4):
btrfs: unify folio dirty flag clearing
btrfs: use dirty flag to check if an ordered extent needs to be
truncated
btrfs: remove folio_test_ordered() usage
btrfs: remove folio ordered flag and subpage bitmap
fs/btrfs/extent_io.c | 35 ++++++++++--------------
fs/btrfs/extent_io.h | 12 ++++++++-
fs/btrfs/fs.h | 8 ------
fs/btrfs/inode.c | 60 ++++++++++-------------------------------
fs/btrfs/ordered-data.h | 16 +++++++++++
fs/btrfs/subpage.c | 41 ++--------------------------
fs/btrfs/subpage.h | 12 +++------
7 files changed, 60 insertions(+), 124 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH RFC 1/4] btrfs: unify folio dirty flag clearing
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
@ 2026-05-04 23:49 ` Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 2/4] btrfs: use dirty flag to check if an ordered extent needs to be truncated Qu Wenruo
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-04 23:49 UTC (permalink / raw)
To: linux-btrfs
Currently during folio writeback we call folio_clear_dirty_for_io()
before extent_writepage(), which clears the folio dirty flag without
touching the subpage bitmaps.
This works fine for the bio submission path, as we always call
btrfs_folio_clear_dirty() to clear the subpage bitmap.
But this is far from consistent, so this patch unifies the behavior to
always use the btrfs_folio_clear_dirty() helper to clear both the folio
flag and the subpage bitmap.
This involves:
- Replace folio_clear_dirty_for_io() with folio_test_dirty()
There is only one call site of folio_clear_dirty_for_io() outside of
subpage.c, inside extent_write_cache_pages() just before
extent_writepage().
- Make btrfs_invalidate_folio() to clear dirty range for the whole folio
The function btrfs_invalidate_folio() is also called during
extent_writepage().
If a folio is completely beyond isize, we call
folio_invalidate() -> btrfs_invalidate_folio() to free the folio.
Since we no longer have folio_clear_dirty_for_io() to clear the folio
dirty flag, we must manually clear the folio dirty flag for the
to-be-invalidated folio, and also clear the PAGECACHE_TAG_DIRTY tag.
The tag clearing is done using a new helper,
btrfs_clear_folio_dirty_tag(), which is almost the same as the old
btree_clear_folio_dirty_tag(), but with minor improvements:
* Remove the folio_test_dirty() re-check under the xarray lock, as we
have already done an ASSERT() on the flag
* Add an ASSERT() to make sure the folio is mapped
- Add extra ASSERT()s before clearing folio private
During development I hit dirty folios without the private flag set,
which triggered a lot of ASSERT() failures.
The reason is that btrfs_invalidate_folio() relies on the dirty flag
not being set when it's called from extent_writepage().
Add extra ASSERT()s inside clear_folio_extent_mapped() to catch
wild dirty/writeback flags.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 26 +++++++++++++-------------
fs/btrfs/extent_io.h | 11 +++++++++++
fs/btrfs/inode.c | 2 ++
3 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ebf9a63946e5..5cab9e7a5762 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -933,6 +933,17 @@ void clear_folio_extent_mapped(struct folio *folio)
struct btrfs_fs_info *fs_info;
ASSERT(folio->mapping);
+ /*
+ * The folio should not have writeback nor dirty flag set.
+ *
+ * If dirty flag is set, the folio can be written back again and we
+ * expect the private flag set for the folio.
+ *
+ * If writeback flag is set, the endio may need to utilize the
+ * private for btrfs_folio_state.
+ */
+ ASSERT(!folio_test_dirty(folio));
+ ASSERT(!folio_test_writeback(folio));
if (!folio_test_private(folio))
return;
@@ -2562,7 +2573,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
}
if (folio_test_writeback(folio) ||
- !folio_clear_dirty_for_io(folio)) {
+ !folio_test_dirty(folio)) {
folio_unlock(folio);
continue;
}
@@ -3725,17 +3736,6 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
release_extent_buffer(eb);
}
-static void btree_clear_folio_dirty_tag(struct folio *folio)
-{
- ASSERT(!folio_test_dirty(folio));
- ASSERT(folio_test_locked(folio));
- xa_lock_irq(&folio->mapping->i_pages);
- if (!folio_test_dirty(folio))
- __xa_clear_mark(&folio->mapping->i_pages, folio->index,
- PAGECACHE_TAG_DIRTY);
- xa_unlock_irq(&folio->mapping->i_pages);
-}
-
void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
struct extent_buffer *eb)
{
@@ -3776,7 +3776,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
folio_lock(folio);
last = btrfs_meta_folio_clear_and_test_dirty(folio, eb);
if (last)
- btree_clear_folio_dirty_tag(folio);
+ btrfs_clear_folio_dirty_tag(folio);
folio_unlock(folio);
}
WARN_ON(refcount_read(&eb->refs) == 0);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ede7abbe4031..29c57623385d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -384,6 +384,17 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
struct extent_buffer *buf);
+static inline void btrfs_clear_folio_dirty_tag(struct folio *folio)
+{
+ ASSERT(!folio_test_dirty(folio));
+ ASSERT(folio_test_locked(folio));
+ ASSERT(folio->mapping);
+ xa_lock_irq(&folio->mapping->i_pages);
+ __xa_clear_mark(&folio->mapping->i_pages, folio->index,
+ PAGECACHE_TAG_DIRTY);
+ xa_unlock_irq(&folio->mapping->i_pages);
+}
+
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
bool nofail);
int btrfs_alloc_folio_array(unsigned int nr_folios, unsigned int order,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8fb00d22a924..4cc4643af7b4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7665,6 +7665,8 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
&cached_state);
cur = range_end + 1;
}
+ btrfs_folio_clear_dirty(fs_info, folio, page_start, folio_size(folio));
+ btrfs_clear_folio_dirty_tag(folio);
/*
* We have iterated through all ordered extents of the page, the page
* should not have Ordered anymore, or the above iteration
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH RFC 2/4] btrfs: use dirty flag to check if an ordered extent needs to be truncated
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 1/4] btrfs: unify folio dirty flag clearing Qu Wenruo
@ 2026-05-04 23:49 ` Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 3/4] btrfs: remove folio_test_ordered() usage Qu Wenruo
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-04 23:49 UTC (permalink / raw)
To: linux-btrfs
Currently there are only two folio ordered flag users:
- extent_writepage_io()
To ensure the folio range has an ordered extent covering it.
This is from the legacy COW fixup mechanism, which has already been
removed, leaving only a simple check.
- btrfs_invalidate_folio()
This is to avoid a race with end_bbio_data_write(), where
btrfs_finish_ordered_extent() will be called to handle the OE
finishing.
But for btrfs_invalidate_folio() we have already waited for the folio
writeback to finish, and we hold the folio lock.
This means we can use the dirty flag to check whether a range has
already been submitted.
If the OE range is not dirty, it means the range has been submitted and
its dirty flag was cleared. And since we have already waited for
writeback, the endio function will handle the OE finishing.
Thus if the range is not dirty, we must skip the range.
If the OE range is dirty, it means we have allocated an ordered extent but
have not yet submitted the range. And that's exactly the case where we need
to truncate the ordered extent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4cc4643af7b4..eaae344804f2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7593,15 +7593,20 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
page_end);
ASSERT(range_end + 1 - cur < U32_MAX);
range_len = range_end + 1 - cur;
- if (!btrfs_folio_test_ordered(fs_info, folio, cur, range_len)) {
- /*
- * If Ordered is cleared, it means endio has
- * already been executed for the range.
- * We can't delete the extent states as
- * btrfs_finish_ordered_io() may still use some of them.
- */
+ /*
+ * If the range is not dirty, the range has been submitted and
+ * since we have waited for the writeback, endio has been
+ * executed, thus we must skip the range to avoid double
+ * accounting for the ordered extent.
+ */
+ if (!btrfs_folio_test_dirty(fs_info, folio, cur, range_len))
goto next;
- }
+
+ /*
+ * The range is dirty meaning it has not been submitted.
+ * Here we need to truncate the OE range as the range will never
+ * be submitted.
+ */
btrfs_folio_clear_ordered(fs_info, folio, cur, range_len);
/*
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH RFC 3/4] btrfs: remove folio_test_ordered() usage
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 1/4] btrfs: unify folio dirty flag clearing Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 2/4] btrfs: use dirty flag to check if an ordered extent needs to be truncated Qu Wenruo
@ 2026-05-04 23:49 ` Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 4/4] btrfs: remove folio ordered flag and subpage bitmap Qu Wenruo
2026-05-06 13:43 ` [PATCH RFC 0/4] btrfs: remove folio ordered flag David Sterba
4 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-04 23:49 UTC (permalink / raw)
To: linux-btrfs
Remove the remaining folio_test_ordered() call sites. This involves:
- The ASSERT() inside end_bbio_data_write()
It's only an ASSERT() and it has never been triggered as far as I
know.
- The unlikely() check inside extent_writepage_io()
Introduce a helper, btrfs_ordered_extent_in_range(), to replace the
folio_test_ordered().
- btrfs_migrate_folio()
Since all folio_test_ordered() usage will be removed, there is no need
to copy the folio ordered flag.
- The ASSERT() inside btrfs_invalidate_folio()
This one has proven its usefulness, as it indeed caught some bugs during
development.
But it is the last user, and keeping it is not worth the folio flag nor
the subpage bitmap.
This will allow btrfs to finally remove the ordered flag.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 3 +--
fs/btrfs/inode.c | 12 ------------
fs/btrfs/ordered-data.h | 16 ++++++++++++++++
3 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5cab9e7a5762..d178f48ee5f0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -531,7 +531,6 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
u32 len = fi.length;
bio_size += len;
- ASSERT(btrfs_folio_test_ordered(fs_info, folio, start, len));
btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1754,7 +1753,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
ASSERT(end <= folio_end, "start=%llu len=%u folio_start=%llu folio_size=%zu",
start, len, folio_start, folio_size(folio));
- if (unlikely(!folio_test_ordered(folio))) {
+ if (unlikely(!btrfs_ordered_extent_in_range(inode, start, len))) {
DEBUG_WARN();
btrfs_err_rl(fs_info,
"root %lld ino %llu folio %llu is marked dirty without notifying the fs",
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eaae344804f2..61cec1a66baf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7499,12 +7499,6 @@ static int btrfs_migrate_folio(struct address_space *mapping,
if (ret)
return ret;
-
- if (folio_test_ordered(src)) {
- folio_clear_ordered(src);
- folio_set_ordered(dst);
- }
-
return 0;
}
#else
@@ -7672,12 +7666,6 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
}
btrfs_folio_clear_dirty(fs_info, folio, page_start, folio_size(folio));
btrfs_clear_folio_dirty_tag(folio);
- /*
- * We have iterated through all ordered extents of the page, the page
- * should not have Ordered anymore, or the above iteration
- * did something wrong.
- */
- ASSERT(!folio_test_ordered(folio));
if (!inode_evicting)
__btrfs_release_folio(folio, GFP_NOFS);
clear_folio_extent_mapped(folio);
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 03e12380a2fd..3a4ed8d59aca 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -208,6 +208,22 @@ struct btrfs_ordered_extent *
btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset);
struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
struct btrfs_inode *inode, u64 file_offset, u64 len);
+
+/* Check if there is an ordered extent in range. */
+static inline bool btrfs_ordered_extent_in_range(struct btrfs_inode *inode,
+ u64 file_offset, u64 len)
+{
+ struct btrfs_ordered_extent *ordered;
+ bool ret = false;
+
+ ordered = btrfs_lookup_first_ordered_range(inode, file_offset, len);
+ if (ordered) {
+ ret = true;
+ btrfs_put_ordered_extent(ordered);
+ }
+ return ret;
+}
+
struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
struct btrfs_inode *inode,
u64 file_offset,
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH RFC 4/4] btrfs: remove folio ordered flag and subpage bitmap
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
` (2 preceding siblings ...)
2026-05-04 23:49 ` [PATCH RFC 3/4] btrfs: remove folio_test_ordered() usage Qu Wenruo
@ 2026-05-04 23:49 ` Qu Wenruo
2026-05-06 13:43 ` [PATCH RFC 0/4] btrfs: remove folio ordered flag David Sterba
4 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-04 23:49 UTC (permalink / raw)
To: linux-btrfs
Btrfs has an internal flag/subpage bitmap called ordered, which
indicates that a block has a corresponding ordered extent covering it.
However this requires extra synchronization between the inode ordered
tree and the folio flag/subpage bitmap, not to mention that we need to
maintain the extra folio flag along with the subpage bitmap.
As a step to align btrfs_folio_state more closely with
iomap_folio_state, remove the btrfs-specific ordered flag/bitmap.
This will also save 64 bytes of bitmap space for a huge folio.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 6 ------
fs/btrfs/extent_io.h | 1 -
fs/btrfs/fs.h | 8 --------
fs/btrfs/inode.c | 31 ++-----------------------------
fs/btrfs/subpage.c | 41 ++---------------------------------------
fs/btrfs/subpage.h | 12 +++---------
6 files changed, 7 insertions(+), 92 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d178f48ee5f0..8a0da0b1aaf2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -251,8 +251,6 @@ static void process_one_folio(struct btrfs_fs_info *fs_info,
ASSERT(end + 1 - start != 0 && end + 1 - start < U32_MAX);
len = end + 1 - start;
- if (page_ops & PAGE_SET_ORDERED)
- btrfs_folio_clamp_set_ordered(fs_info, folio, start, len);
if (page_ops & PAGE_START_WRITEBACK) {
btrfs_folio_clamp_clear_dirty(fs_info, folio, start, len);
btrfs_folio_clamp_set_writeback(fs_info, folio, start, len);
@@ -531,7 +529,6 @@ static void end_bbio_data_write(struct btrfs_bio *bbio)
u32 len = fi.length;
bio_size += len;
- btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1597,7 +1594,6 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
- btrfs_folio_clear_ordered(fs_info, folio, start, len);
btrfs_mark_ordered_io_finished(inode, start, len, false);
}
return ret;
@@ -1674,7 +1670,6 @@ static int submit_one_sector(struct btrfs_inode *inode,
* ordered extent.
*/
btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
- btrfs_folio_clear_ordered(fs_info, folio, filepos, sectorsize);
btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
@@ -1797,7 +1792,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
spin_unlock(&inode->ordered_tree_lock);
btrfs_put_ordered_extent(ordered);
- btrfs_folio_clear_ordered(fs_info, folio, cur, fs_info->sectorsize);
btrfs_mark_ordered_io_finished(inode, cur, fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 29c57623385d..2324c14a5ecd 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -55,7 +55,6 @@ enum {
/* Page starts writeback, clear dirty bit and set writeback bit */
ENUM_BIT(PAGE_START_WRITEBACK),
ENUM_BIT(PAGE_END_WRITEBACK),
- ENUM_BIT(PAGE_SET_ORDERED),
};
/*
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 9997bbc1d1e5..e18607170e01 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -1213,14 +1213,6 @@ static inline void btrfs_force_shutdown(struct btrfs_fs_info *fs_info)
}
}
-/*
- * We use folio flag owner_2 to indicate there is an ordered extent with
- * unfinished IO.
- */
-#define folio_test_ordered(folio) folio_test_owner_2(folio)
-#define folio_set_ordered(folio) folio_set_owner_2(folio)
-#define folio_clear_ordered(folio) folio_clear_owner_2(folio)
-
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
#define EXPORT_FOR_TESTS
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61cec1a66baf..8c4d6e427faa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -401,28 +401,6 @@ void btrfs_inode_unlock(struct btrfs_inode *inode, unsigned int ilock_flags)
static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode,
u64 offset, u64 bytes)
{
- pgoff_t index = offset >> PAGE_SHIFT;
- const pgoff_t end_index = (offset + bytes - 1) >> PAGE_SHIFT;
- struct folio *folio;
-
- while (index <= end_index) {
- folio = filemap_get_folio(inode->vfs_inode.i_mapping, index);
- if (IS_ERR(folio)) {
- index++;
- continue;
- }
-
- index = folio_next_index(folio);
- /*
- * Here we just clear all Ordered bits for every page in the
- * range, then btrfs_mark_ordered_io_finished() will handle
- * the ordered extent accounting for the range.
- */
- btrfs_folio_clamp_clear_ordered(inode->root->fs_info, folio,
- offset, bytes);
- folio_put(folio);
- }
-
return btrfs_mark_ordered_io_finished(inode, offset, bytes, false);
}
@@ -1406,7 +1384,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
* setup for writepage.
*/
page_ops = ((flags & COW_FILE_RANGE_KEEP_LOCKED) ? 0 : PAGE_UNLOCK);
- page_ops |= PAGE_SET_ORDERED;
/*
* Relocation relies on the relocated extents to have exactly the same
@@ -1972,8 +1949,7 @@ static int nocow_one_range(struct btrfs_inode *inode, struct folio *locked_folio
goto error;
extent_clear_unlock_delalloc(inode, file_pos, end, locked_folio, cached,
EXTENT_LOCKED | EXTENT_DELALLOC |
- EXTENT_CLEAR_DATA_RESV,
- PAGE_SET_ORDERED);
+ EXTENT_CLEAR_DATA_RESV, 0);
return ret;
error:
@@ -7600,10 +7576,7 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
* The range is dirty meaning it has not been submitted.
* Here we need to truncate the OE range as the range will never
* be submitted.
- */
- btrfs_folio_clear_ordered(fs_info, folio, cur, range_len);
-
- /*
+ *
* IO on this page will never be started, so we need to account
* for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
* here, must leave that up for the ordered extent completion.
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index ea202698fa10..29b34ec31d18 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -451,35 +451,6 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info,
spin_unlock_irqrestore(&bfs->lock, flags);
}
-void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info,
- struct folio *folio, u64 start, u32 len)
-{
- struct btrfs_folio_state *bfs = folio_get_private(folio);
- unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
- ordered, start, len);
- unsigned long flags;
-
- spin_lock_irqsave(&bfs->lock, flags);
- bitmap_set(bfs->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
- folio_set_ordered(folio);
- spin_unlock_irqrestore(&bfs->lock, flags);
-}
-
-void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info,
- struct folio *folio, u64 start, u32 len)
-{
- struct btrfs_folio_state *bfs = folio_get_private(folio);
- unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
- ordered, start, len);
- unsigned long flags;
-
- spin_lock_irqsave(&bfs->lock, flags);
- bitmap_clear(bfs->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
- if (subpage_test_bitmap_all_zero(fs_info, folio, ordered))
- folio_clear_ordered(folio);
- spin_unlock_irqrestore(&bfs->lock, flags);
-}
-
/*
* Unlike set/clear which is dependent on each page status, for test all bits
* are tested in the same way.
@@ -503,7 +474,6 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info, \
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
-IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
/*
* Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -599,8 +569,6 @@ IMPLEMENT_BTRFS_PAGE_OPS(dirty, folio_mark_dirty, folio_clear_dirty_for_io,
folio_test_dirty);
IMPLEMENT_BTRFS_PAGE_OPS(writeback, folio_start_writeback, folio_end_writeback,
folio_test_writeback);
-IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered,
- folio_test_ordered);
#define DEFINE_GET_SUBPAGE_BITMAP(name) \
static inline unsigned long get_bitmap_value_##name( \
@@ -633,7 +601,6 @@ static inline unsigned long *get_bitmap_pointer_##name( \
DEFINE_GET_SUBPAGE_BITMAP(uptodate);
DEFINE_GET_SUBPAGE_BITMAP(dirty);
DEFINE_GET_SUBPAGE_BITMAP(writeback);
-DEFINE_GET_SUBPAGE_BITMAP(ordered);
DEFINE_GET_SUBPAGE_BITMAP(locked);
#define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len) \
@@ -761,37 +728,33 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
unsigned long uptodate;
unsigned long dirty;
unsigned long writeback;
- unsigned long ordered;
unsigned long locked;
spin_lock_irqsave(&bfs->lock, flags);
uptodate = get_bitmap_value_uptodate(fs_info, folio);
dirty = get_bitmap_value_dirty(fs_info, folio);
writeback = get_bitmap_value_writeback(fs_info, folio);
- ordered = get_bitmap_value_ordered(fs_info, folio);
locked = get_bitmap_value_locked(fs_info, folio);
spin_unlock_irqrestore(&bfs->lock, flags);
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl locked=%*pbl",
start, len, folio_pos(folio),
blocks_per_folio, &uptodate,
blocks_per_folio, &dirty,
blocks_per_folio, &writeback,
- blocks_per_folio, &ordered,
blocks_per_folio, &locked);
return;
}
spin_lock_irqsave(&bfs->lock, flags);
btrfs_warn(fs_info,
-"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl ordered=%*pbl locked=%*pbl",
+"start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl writeback=%*pbl locked=%*pbl",
start, len, folio_pos(folio),
blocks_per_folio, get_bitmap_pointer_uptodate(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_dirty(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_writeback(fs_info, folio),
- blocks_per_folio, get_bitmap_pointer_ordered(fs_info, folio),
blocks_per_folio, get_bitmap_pointer_locked(fs_info, folio));
spin_unlock_irqrestore(&bfs->lock, flags);
}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b45694eecb41..dd3ece30ba5f 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -14,15 +14,15 @@ struct folio;
/*
* Extra info for subpage bitmap.
*
- * For subpage we pack all uptodate/dirty/writeback/ordered bitmaps into
+ * For subpage we pack all uptodate/dirty/writeback/locked bitmaps into
* one larger bitmap.
*
* This structure records how they are organized in the bitmap:
*
- * /- uptodate /- dirty /- ordered
+ * /- uptodate /- dirty /- locked
* | | |
* v v v
- * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
+ * |u|u|u|u|........|u|u|d|d|.......|d|d|l|l|.......|l|l|
* |< sectors_per_page >|
*
* Unlike regular macro-like enums, here we do not go upper-case names, as
@@ -40,11 +40,6 @@ enum {
*/
btrfs_bitmap_nr_writeback,
- /*
- * The ordered flags shows if the range has an ordered extent.
- */
- btrfs_bitmap_nr_ordered,
-
/*
* The locked bit is for async delalloc range (compression), currently
* async extent is queued with the range locked, until the compression
@@ -179,7 +174,6 @@ bool btrfs_meta_folio_test_##name(struct folio *folio, const struct extent_buffe
DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
DECLARE_BTRFS_SUBPAGE_OPS(dirty);
DECLARE_BTRFS_SUBPAGE_OPS(writeback);
-DECLARE_BTRFS_SUBPAGE_OPS(ordered);
/*
* Helper for error cleanup, where a folio will have its dirty flag cleared,
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH RFC 0/4] btrfs: remove folio ordered flag
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
` (3 preceding siblings ...)
2026-05-04 23:49 ` [PATCH RFC 4/4] btrfs: remove folio ordered flag and subpage bitmap Qu Wenruo
@ 2026-05-06 13:43 ` David Sterba
2026-05-06 21:27 ` Qu Wenruo
4 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2026-05-06 13:43 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, May 05, 2026 at 09:19:20AM +0930, Qu Wenruo wrote:
> Btrfs has a long history using an internal folio flag called ordered,
> which is to indicate if an fs block is covered by an ordered extent.
>
> However this means we need to synchronize between ordered extents, which
> are managed by a per-inode ordered tree, and folio flag/subpage bitmap.
>
> Furthermore with huge folio support, the ordered bitmap can be as large
> as 64 bytes (512 bits), which is not a small amount.
>
> The series is going to remove folio ordered flag completely, along with
> the ordered subpage bitmap.
>
> Most call sites of folio_test_ordered() are just inside ASSERT()s, so
> it's not too hard to remove them.
>
> But there is a special call site inside btrfs_invalidate_folio() where
> we use ordered flag to check if we can skip an ordered extent.
> This is worked around by using the fact that we have waited for
> writeback of the folio, so that endio should have already finished for
> the writeback range. Then check dirty flags to determine if we can skip
> the OE range.
>
> To get a reliable dirty flag for both sub-folio and full-folio cases, we
> can not clear the folio dirty flag early, so the first patch is
> introduced to change the folio dirty flag clearing timing, then the
> second patch can remove the folio_test_ordered() usage.
>
> Then the third patch is to remove the remaining folio_test_ordered()
> usage, and finally we can remove the whole ordered flag/subpage bitmap
> completely.
>
> [REASON FOR RFC]
> I'm not sure if we should remove the folio ordered flag completely, or
> keep it an internal debug feature for a while.
For debugging and additional verification we can keep it as long as it's
practical.
> The main concern is that we're removing quite some ASSERT()s, some are
> never hit, but at least one is very useful and had triggered several
> times during development, exposing bugs.
>
> In the long run, we will eventually remove the folio ordered
> flag/subpage bitmap so that we can align btrfs_folio_state with
> iomap_folio_state, so ordered flags should still be gone eventually.
>
> Another point of concern is the new btrfs_ordered_extent_in_range()
> helper for extent_writepage_io().
> Previously we're just doing a folio flag check, now we have to do an
> rbtree search.
> I hope the overhead is not that huge.
This seems to be a concern for removing the ordered bit; it will have
some performance impact. Searching an rb-tree is not cheap compared to
a single bit check. This kind of optimization could remain even after
we switch to iomap.
Otherwise, reducing the size of the bitmaps makes sense. We could live
for a release with less effective storage just to make sure things work,
and then remove the ordered bitmap or apply other optimizations.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH RFC 0/4] btrfs: remove folio ordered flag
2026-05-06 13:43 ` [PATCH RFC 0/4] btrfs: remove folio ordered flag David Sterba
@ 2026-05-06 21:27 ` Qu Wenruo
0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2026-05-06 21:27 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs
On 2026/5/6 23:13, David Sterba wrote:
> On Tue, May 05, 2026 at 09:19:20AM +0930, Qu Wenruo wrote:
>> Btrfs has a long history using an internal folio flag called ordered,
>> which is to indicate if an fs block is covered by an ordered extent.
>>
>> However this means we need to synchronize between ordered extents, which
>> are managed by a per-inode ordered tree, and folio flag/subpage bitmap.
>>
>> Furthermore with huge folio support, the ordered bitmap can be as large
>> as 64 bytes (512 bits), which is not a small amount.
>>
>> The series is going to remove folio ordered flag completely, along with
>> the ordered subpage bitmap.
>>
>> Most call sites of folio_test_ordered() are just inside ASSERT()s, so
>> it's not too hard to remove them.
>>
>> But there is a special call site inside btrfs_invalidate_folio() where
>> we use ordered flag to check if we can skip an ordered extent.
>> This is worked around by using the fact that we have waited for
>> writeback of the folio, so that endio should have already finished for
>> the writeback range. Then check dirty flags to determine if we can skip
>> the OE range.
>>
>> To get a reliable dirty flag for both sub-folio and full-folio cases, we
>> can not clear the folio dirty flag early, so the first patch is
>> introduced to change the folio dirty flag clearing timing, then the
>> second patch can remove the folio_test_ordered() usage.
>>
>> Then the third patch is to remove the remaining folio_test_ordered()
>> usage, and finally we can remove the whole ordered flag/subpage bitmap
>> completely.
>>
>> [REASON FOR RFC]
>> I'm not sure if we should remove the folio ordered flag completely, or
>> keep it an internal debug feature for a while.
>
> For debugging and additional verification we can keep it as long as it's
> practical.
>
>> The main concern is that we're removing quite some ASSERT()s, some are
>> never hit, but at least one is very useful and had triggered several
>> times during development, exposing bugs.
>>
>> In the long run, we will eventually remove the folio ordered
>> flag/subpage bitmap so that we can align btrfs_folio_state with
>> iomap_folio_state, so ordered flags should still be gone eventually.
>>
>> Another point of concern is the new btrfs_ordered_extent_in_range()
>> helper for extent_writepage_io().
>> Previously we're just doing a folio flag check, now we have to do an
>> rbtree search.
>> I hope the overhead is not that huge.
>
> This seems to be a concern for removing the ordered bit, it will have
> some performance impact. Searching in rb-tree is not cheap, compared to
> a single bit check. This kind of optimization could be there even after
> we switch to iomap.
I have a better idea now: we can move the ordered extent check into
alloc_new_bio(), where we are already doing an ordered extent search.
For now, if we fail to find an OE, we just continue.
This can be changed to output an error, but that will need extra error
handling for the case where no OE is found.
This will not introduce a new OE search, and it has better accuracy than
the per-folio checks.
Thanks,
Qu
>
> Otherwise, reducing the size of bitmaps makes sense. We could live for a
> release with less effective storage just to make sure things work and
> then remove ordered bitmap or other optimizations.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-06 21:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-04 23:49 [PATCH RFC 0/4] btrfs: remove folio ordered flag Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 1/4] btrfs: unify folio dirty flag clearing Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 2/4] btrfs: use dirty flag to check if an ordered extent needs to be truncated Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 3/4] btrfs: remove folio_test_ordered() usage Qu Wenruo
2026-05-04 23:49 ` [PATCH RFC 4/4] btrfs: remove folio ordered flag and subpage bitmap Qu Wenruo
2026-05-06 13:43 ` [PATCH RFC 0/4] btrfs: remove folio ordered flag David Sterba
2026-05-06 21:27 ` Qu Wenruo