* [PATCH v2 1/4] btrfs: make sure all ordered extents beyond EOF are properly truncated
2025-11-20 7:51 [PATCH v2 0/4] btrfs: reduce btrfs_get_extent() calls for buffered write path Qu Wenruo
@ 2025-11-20 7:51 ` Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 2/4] btrfs: integrate the error handling of submit_one_sector() Qu Wenruo
From: Qu Wenruo @ 2025-11-20 7:51 UTC (permalink / raw)
To: linux-btrfs
[POSSIBLE BUG]
If there are multiple ordered extents beyond EOF, at folio writeback
time we may only truncate the first ordered extent, leaving the
remaining ones to finish normally without being marked as truncated.
Since those OEs are not marked as truncated, each of them will still
insert a file extent item, which may lead to false missing csum errors
from "btrfs check".
[CAUSE]
We have had bs < ps support for a while, and large data folios are
about to graduate from experimental status soon, so the following
folio layout can reach writeback:
fs block size 4K, page size 4K, folio size 64K:

0        16K      32K                   64K
|<---------------- Dirty -------------->|
|<-OE A->|<-OE B->|<----- OE C -------->|
  ^
  i_size at 4K
In the above case we need to submit writeback only for the range
[0, 4K). For the range [4K, 64K) there is no need to submit any IO,
but all the involved OEs (A, B and C) must be marked as truncated.
However during the EOF handling we only call
btrfs_lookup_first_ordered_range() once, thus we only get OE A and
mark it as truncated.
But OE B and C are not marked as truncated, so they finish as usual,
each leaving a regular file extent item inserted beyond EOF without
any data checksum.
[FIX]
Introduce a new helper, btrfs_mark_ordered_io_truncated(), to walk all
OEs inside a range and mark each of them as truncated.
With that helper, all the OEs (A, B and C) will be marked as truncated.
OE B and C will have a truncated_len of 0, preventing any file extent
item from being inserted for them.
Fixes: f0a0f5bfe04e ("btrfs: truncate ordered extent when skipping writeback past i_size")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 19 +------------------
fs/btrfs/ordered-data.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/btrfs/ordered-data.h | 2 ++
3 files changed, 41 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..2044b889c887 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1725,24 +1725,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
cur = folio_pos(folio) + (bit << fs_info->sectorsize_bits);
if (cur >= i_size) {
- struct btrfs_ordered_extent *ordered;
-
- ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
- /*
- * We have just run delalloc before getting here, so
- * there must be an ordered extent.
- */
- ASSERT(ordered != NULL);
- spin_lock(&inode->ordered_tree_lock);
- set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
- ordered->truncated_len = min(ordered->truncated_len,
- cur - ordered->file_offset);
- spin_unlock(&inode->ordered_tree_lock);
- btrfs_put_ordered_extent(ordered);
-
- btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ btrfs_mark_ordered_io_truncated(inode, folio, cur, end - cur);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a421f7db9eec..9a0638d13225 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -546,6 +546,44 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
spin_unlock(&inode->ordered_tree_lock);
}
+/*
+ * Mark all ordered extents io inside the specified range as truncated.
+ *
+ * This is utilized by writeback path, thus there must be an ordered extent
+ * for the range.
+ */
+void btrfs_mark_ordered_io_truncated(struct btrfs_inode *inode, struct folio *folio,
+ u64 file_offset, u32 len)
+{
+ u64 cur = file_offset;
+
+ ASSERT(file_offset >= folio_pos(folio));
+ ASSERT(file_offset + len <= folio_end(folio));
+
+ while (cur < file_offset + len) {
+ u32 cur_len = file_offset + len - cur;
+ struct btrfs_ordered_extent *ordered;
+
+ ordered = btrfs_lookup_first_ordered_range(inode, cur, cur_len);
+
+ /*
+ * We have just run delalloc before getting here, so there must
+ * be an ordered extent.
+ */
+ ASSERT(ordered != NULL);
+ scoped_guard(spinlock, &inode->ordered_tree_lock) {
+ set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
+ ordered->truncated_len = min(ordered->truncated_len,
+ cur - ordered->file_offset);
+ }
+ cur_len = min(cur_len, ordered->file_offset + ordered->num_bytes - cur);
+ btrfs_put_ordered_extent(ordered);
+
+ cur += cur_len;
+ }
+ btrfs_mark_ordered_io_finished(inode, folio, file_offset, len, true);
+}
+
/*
* Finish IO for one ordered extent across a given range. The range can only
* contain one ordered extent.
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 1e6b0b182b29..dd4cdc1a8b78 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -169,6 +169,8 @@ void btrfs_finish_ordered_extent(struct btrfs_ordered_extent *ordered,
void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
struct folio *folio, u64 file_offset,
u64 num_bytes, bool uptodate);
+void btrfs_mark_ordered_io_truncated(struct btrfs_inode *inode, struct folio *folio,
+ u64 file_offset, u32 len);
bool btrfs_dec_test_ordered_pending(struct btrfs_inode *inode,
struct btrfs_ordered_extent **cached,
u64 file_offset, u64 io_size);
--
2.52.0
* [PATCH v2 2/4] btrfs: integrate the error handling of submit_one_sector()
2025-11-20 7:51 [PATCH v2 0/4] btrfs: reduce btrfs_get_extent() calls for buffered write path Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 1/4] btrfs: make sure all ordered extents beyond EOF are properly truncated Qu Wenruo
@ 2025-11-20 7:51 ` Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 3/4] btrfs: replace for_each_set_bit() with for_each_set_bitrange() Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 4/4] btrfs: reduce extent map lookup during writes Qu Wenruo
From: Qu Wenruo @ 2025-11-20 7:51 UTC (permalink / raw)
To: linux-btrfs
Currently submit_one_sector() has only one failure pattern, from
btrfs_get_extent().
However the error handling is split into two parts. One part is inside
submit_one_sector(), which clears the dirty flag and finishes the
writeback for the fs block.
The other part is in the caller, which submits any remaining bio inside
bio_ctrl and marks the ordered extent for the fs block as finished.
There is no special reason to split the error handling; let's
concentrate all of it inside submit_one_sector().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2044b889c887..0f1bf9a4f0ae 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1598,7 +1598,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
/*
* Return 0 if we have submitted or queued the sector for submission.
- * Return <0 for critical errors, and the sector will have its dirty flag cleared.
+ * Return <0 for critical errors, and the involved sector will be cleaned up.
*
* Caller should make sure filepos < i_size and handle filepos >= i_size case.
*/
@@ -1622,6 +1622,13 @@ static int submit_one_sector(struct btrfs_inode *inode,
em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
if (IS_ERR(em)) {
+ /*
+ * bio_ctrl may contain a bio crossing several folios.
+ * Submit it immediately so that the bio has a chance
+ * to finish normally, other than marked as error.
+ */
+ submit_one_bio(bio_ctrl);
+
/*
* When submission failed, we should still clear the folio dirty.
* Or the folio will be written back again but without any
@@ -1630,6 +1637,13 @@ static int submit_one_sector(struct btrfs_inode *inode,
btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
+
+ /*
+ * Since there is no bio submitted to finish the ordered
+ * extent, we have to manually finish this sector.
+ */
+ btrfs_mark_ordered_io_finished(inode, folio, filepos,
+ fs_info->sectorsize, false);
return PTR_ERR(em);
}
@@ -1739,19 +1753,6 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
- /*
- * bio_ctrl may contain a bio crossing several folios.
- * Submit it immediately so that the bio has a chance
- * to finish normally, other than marked as error.
- */
- submit_one_bio(bio_ctrl);
- /*
- * Failed to grab the extent map which should be very rare.
- * Since there is no bio submitted to finish the ordered
- * extent, we have to manually finish this sector.
- */
- btrfs_mark_ordered_io_finished(inode, folio, cur,
- fs_info->sectorsize, false);
if (!found_error)
found_error = ret;
continue;
--
2.52.0
* [PATCH v2 3/4] btrfs: replace for_each_set_bit() with for_each_set_bitrange()
2025-11-20 7:51 [PATCH v2 0/4] btrfs: reduce btrfs_get_extent() calls for buffered write path Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 1/4] btrfs: make sure all ordered extents beyond EOF are properly truncated Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 2/4] btrfs: integrate the error handling of submit_one_sector() Qu Wenruo
@ 2025-11-20 7:51 ` Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 4/4] btrfs: reduce extent map lookup during writes Qu Wenruo
From: Qu Wenruo @ 2025-11-20 7:51 UTC (permalink / raw)
To: linux-btrfs
Inside extent_io.c there are several simple call sites doing things
like:

	for_each_set_bit(bit, bitmap, bitmap_size) {
		/* handle one fs block */
	}

The affected call sites include:

- set_bit()
  Inside extent_writepage_io().
  This can be replaced with a single bitmap_set().

- btrfs_folio_set_lock()
- btrfs_mark_ordered_io_finished()
  Inside writepage_delalloc().
  Instead of calling them once per block, we can pass a whole range
  into the function with one call.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0f1bf9a4f0ae..87bf5ce17264 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1426,8 +1426,9 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
+ unsigned int start_bit;
+ unsigned int end_bit;
int ret = 0;
- int bit;
/* Save the dirty bitmap as our submission bitmap will be a subset of it. */
if (btrfs_is_subpage(fs_info, folio)) {
@@ -1437,10 +1438,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
bio_ctrl->submit_bitmap = 1;
}
- for_each_set_bit(bit, &bio_ctrl->submit_bitmap, blocks_per_folio) {
- u64 start = page_start + (bit << fs_info->sectorsize_bits);
+ for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+ blocks_per_folio) {
+ u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
+ u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
- btrfs_folio_set_lock(fs_info, folio, start, fs_info->sectorsize);
+ btrfs_folio_set_lock(fs_info, folio, start, len);
}
/* Lock all (subpage) delalloc ranges inside the folio first. */
@@ -1557,10 +1560,13 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
fs_info->sectorsize_bits,
blocks_per_folio);
- for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
- btrfs_mark_ordered_io_finished(inode, folio,
- page_start + (bit << fs_info->sectorsize_bits),
- fs_info->sectorsize, false);
+ for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap,
+ bitmap_size) {
+ u64 start = page_start + (start_bit << fs_info->sectorsize_bits);
+ u32 len = (end_bit - start_bit) << fs_info->sectorsize_bits;
+
+ btrfs_mark_ordered_io_finished(inode, folio, start, len, false);
+ }
return ret;
}
out:
@@ -1728,8 +1734,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
return ret;
}
- for (cur = start; cur < end; cur += fs_info->sectorsize)
- set_bit((cur - folio_start) >> fs_info->sectorsize_bits, &range_bitmap);
+ bitmap_set(&range_bitmap, (start - folio_pos(folio)) >> fs_info->sectorsize_bits,
+ len >> fs_info->sectorsize_bits);
bitmap_and(&bio_ctrl->submit_bitmap, &bio_ctrl->submit_bitmap, &range_bitmap,
blocks_per_folio);
--
2.52.0
* [PATCH v2 4/4] btrfs: reduce extent map lookup during writes
2025-11-20 7:51 [PATCH v2 0/4] btrfs: reduce btrfs_get_extent() calls for buffered write path Qu Wenruo
2025-11-20 7:51 ` [PATCH v2 3/4] btrfs: replace for_each_set_bit() with for_each_set_bitrange() Qu Wenruo
@ 2025-11-20 7:51 ` Qu Wenruo
From: Qu Wenruo @ 2025-11-20 7:51 UTC (permalink / raw)
To: linux-btrfs
With large data folio support, even on x86_64 we can hit a folio that
contains several fs blocks.
In that case we still call btrfs_get_extent() for each block, as our
submission path iterates the fs blocks and submits them one by one.
This reduces the benefit of large folios.

Change the behavior to submit the whole range when possible. This is
done by:

- Use for_each_set_bitrange() instead of for_each_set_bit()
  Now we get a contiguous range to submit instead of a single fs
  block.

- Handle blocks beyond EOF in one go
  This is pretty much the same as the old behavior, but for a range
  crossing i_size we finish the part beyond i_size first, then submit
  the remaining part.

- Submit the contiguous range in one go
  Although we still need to honor the extent map boundaries.

- Remove submit_one_sector()
  As it is no longer utilized.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 188 ++++++++++++++++++++++++-------------------
1 file changed, 103 insertions(+), 85 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 87bf5ce17264..99cda33b7c43 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1602,91 +1602,100 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
return 0;
}
+
/*
- * Return 0 if we have submitted or queued the sector for submission.
- * Return <0 for critical errors, and the involved sector will be cleaned up.
+ * Return 0 if we have submitted or queued the range for submission.
+ * Return <0 for critical errors, and the involved blocks will be cleaned up.
*
- * Caller should make sure filepos < i_size and handle filepos >= i_size case.
+ * Caller should make sure the range doesn't go beyond the last block of the inode.
*/
-static int submit_one_sector(struct btrfs_inode *inode,
- struct folio *folio,
- u64 filepos, struct btrfs_bio_ctrl *bio_ctrl,
- loff_t i_size)
+static int submit_range(struct btrfs_inode *inode, struct folio *folio,
+ u64 start, u32 len, struct btrfs_bio_ctrl *bio_ctrl)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct extent_map *em;
- u64 block_start;
- u64 disk_bytenr;
- u64 extent_offset;
- u64 em_end;
const u32 sectorsize = fs_info->sectorsize;
+ u64 cur = start;
- ASSERT(IS_ALIGNED(filepos, sectorsize));
+ ASSERT(IS_ALIGNED(start, sectorsize));
+ ASSERT(IS_ALIGNED(len, sectorsize));
+ ASSERT(start + len <= folio_end(folio));
- /* @filepos >= i_size case should be handled by the caller. */
- ASSERT(filepos < i_size);
-
- em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
- if (IS_ERR(em)) {
- /*
- * bio_ctrl may contain a bio crossing several folios.
- * Submit it immediately so that the bio has a chance
- * to finish normally, other than marked as error.
- */
- submit_one_bio(bio_ctrl);
+ while (cur < start + len) {
+ struct extent_map *em;
+ u64 block_start;
+ u64 disk_bytenr;
+ u64 extent_offset;
+ u64 em_end;
+ u32 cur_len = start + len - cur;
/*
- * When submission failed, we should still clear the folio dirty.
- * Or the folio will be written back again but without any
- * ordered extent.
+ * Here we call btrfs_get_extent() with sectorsize as @len
+ * intentionally.
+ * This allows us to catch any unexpected HOLE extent map
+ * inside the range.
*/
- btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
- btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
- btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
+ em = btrfs_get_extent(inode, NULL, cur, sectorsize);
+ if (IS_ERR(em)) {
+ /*
+ * bio_ctrl may contain a bio crossing several folios.
+ * Submit it immediately so that the bio has a chance
+ * to finish normally, other than marked as error.
+ */
+ submit_one_bio(bio_ctrl);
+
+ /*
+ * When submission failed, we should still clear the folio dirty.
+ * Or the folio will be written back again but without any
+ * ordered extent.
+ */
+ btrfs_folio_clear_dirty(fs_info, folio, cur, cur_len);
+ btrfs_folio_set_writeback(fs_info, folio, cur, cur_len);
+ btrfs_folio_clear_writeback(fs_info, folio, cur, cur_len);
+
+ /*
+ * Since there is no bio submitted to finish the ordered
+ * extent, we have to manually finish this range.
+ */
+ btrfs_mark_ordered_io_finished(inode, folio, cur, cur_len, false);
+ return PTR_ERR(em);
+ }
+ extent_offset = cur - em->start;
+ em_end = btrfs_extent_map_end(em);
+ ASSERT(cur <= em_end);
+ ASSERT(IS_ALIGNED(em->start, sectorsize));
+ ASSERT(IS_ALIGNED(em->len, sectorsize));
+
+ block_start = btrfs_extent_map_block_start(em);
+ disk_bytenr = btrfs_extent_map_block_start(em) + extent_offset;
+
+ ASSERT(!btrfs_extent_map_is_compressed(em));
+ ASSERT(block_start != EXTENT_MAP_HOLE);
+ ASSERT(block_start != EXTENT_MAP_INLINE);
+
+ cur_len = min(cur_len, em_end - cur);
+ btrfs_free_extent_map(em);
+ em = NULL;
/*
- * Since there is no bio submitted to finish the ordered
- * extent, we have to manually finish this sector.
+ * Although the PageDirty bit is cleared before entering this
+ * function, subpage dirty bit is not cleared.
+ * So clear subpage dirty bit here so next time we won't submit
+ * a folio for a range already written to disk.
*/
- btrfs_mark_ordered_io_finished(inode, folio, filepos,
- fs_info->sectorsize, false);
- return PTR_ERR(em);
+ btrfs_folio_clear_dirty(fs_info, folio, cur, cur_len);
+ btrfs_folio_set_writeback(fs_info, folio, cur, cur_len);
+ /*
+ * Above call should set the whole folio with writeback flag, even
+ * just for a single subpage block.
+ * As long as the folio is properly locked and the range is correct,
+ * we should always get the folio with writeback flag.
+ */
+ ASSERT(folio_test_writeback(folio));
+
+ submit_extent_folio(bio_ctrl, disk_bytenr, folio,
+ cur_len, cur - folio_pos(folio), 0);
+ cur += cur_len;
}
-
- extent_offset = filepos - em->start;
- em_end = btrfs_extent_map_end(em);
- ASSERT(filepos <= em_end);
- ASSERT(IS_ALIGNED(em->start, sectorsize));
- ASSERT(IS_ALIGNED(em->len, sectorsize));
-
- block_start = btrfs_extent_map_block_start(em);
- disk_bytenr = btrfs_extent_map_block_start(em) + extent_offset;
-
- ASSERT(!btrfs_extent_map_is_compressed(em));
- ASSERT(block_start != EXTENT_MAP_HOLE);
- ASSERT(block_start != EXTENT_MAP_INLINE);
-
- btrfs_free_extent_map(em);
- em = NULL;
-
- /*
- * Although the PageDirty bit is cleared before entering this
- * function, subpage dirty bit is not cleared.
- * So clear subpage dirty bit here so next time we won't submit
- * a folio for a range already written to disk.
- */
- btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
- btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
- /*
- * Above call should set the whole folio with writeback flag, even
- * just for a single subpage sector.
- * As long as the folio is properly locked and the range is correct,
- * we should always get the folio with writeback flag.
- */
- ASSERT(folio_test_writeback(folio));
-
- submit_extent_folio(bio_ctrl, disk_bytenr, folio,
- sectorsize, filepos - folio_pos(folio), 0);
return 0;
}
@@ -1712,8 +1721,9 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
const u64 folio_start = folio_pos(folio);
const u64 folio_end = folio_start + folio_size(folio);
const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio);
- u64 cur;
- int bit;
+ unsigned int start_bit;
+ unsigned int end_bit;
+ const u64 rounded_isize = round_up(i_size, fs_info->sectorsize);
int ret = 0;
ASSERT(start >= folio_start, "start=%llu folio_start=%llu", start, folio_start);
@@ -1741,23 +1751,31 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
bio_ctrl->end_io_func = end_bbio_data_write;
- for_each_set_bit(bit, &bio_ctrl->submit_bitmap, blocks_per_folio) {
- cur = folio_pos(folio) + (bit << fs_info->sectorsize_bits);
+ for_each_set_bitrange(start_bit, end_bit, &bio_ctrl->submit_bitmap, blocks_per_folio) {
+ const u64 cur_start = folio_pos(folio) + (start_bit << fs_info->sectorsize_bits);
+ u32 cur_len = (end_bit - start_bit) << fs_info->sectorsize_bits;
- if (cur >= i_size) {
- btrfs_mark_ordered_io_truncated(inode, folio, cur, end - cur);
+ if (cur_start > rounded_isize) {
/*
- * This range is beyond i_size, thus we don't need to
- * bother writing back.
- * But we still need to clear the dirty subpage bit, or
- * the next time the folio gets dirtied, we will try to
- * writeback the sectors with subpage dirty bits,
- * causing writeback without ordered extent.
+ * The whole range is beyond EOF.
+ *
+ * Just finish the IO and skip to the next range.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_mark_ordered_io_truncated(inode, folio, cur_start, cur_len);
+ btrfs_folio_clear_dirty(fs_info, folio, cur_start, cur_len);
+ continue;
}
- ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
+ if (cur_start + cur_len > rounded_isize) {
+ u32 truncate_len = cur_start + cur_len - rounded_isize;
+
+ /* The tailing part of the range is beyond EOF. */
+ btrfs_mark_ordered_io_truncated(inode, folio, rounded_isize, truncate_len);
+ btrfs_folio_clear_dirty(fs_info, folio, rounded_isize, truncate_len);
+ /* Shrink the range inside the EOF. */
+ cur_len = rounded_isize - cur_start;
+ }
+
+ ret = submit_range(inode, folio, cur_start, cur_len, bio_ctrl);
if (unlikely(ret < 0)) {
if (!found_error)
found_error = ret;
--
2.52.0