[PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items

Linux Btrfs filesystem development

* [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items
@ 2026-04-29  2:13 Dave Chen
  2026-04-29 11:23 ` Filipe Manana
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dave Chen @ 2026-04-29  2:13 UTC
  To: linux-btrfs, dsterba; +Cc: cccheng, Dave Chen

fill_holes() currently merges a punched hole with either the previous
or the next file extent item, but never both in the same call.  When
holes are punched in a non-sequential order this leaves consecutive
hole items in the file extent tree that should have been collapsed
into a single one.

For example:

  fallocate -p -o 4K  -l 4K ${FILE}
  fallocate -p -o 12K -l 4K ${FILE}
  fallocate -p -o 8K  -l 4K ${FILE}

After the third punch the [4K, 8K) and [12K, 16K) holes become
adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
instead of the expected single [4K, 16K) hole item.
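Without fallocate(1) at hand, the same three punches can be issued from C with fallocate(2). This is a minimal sketch, not part of the patch: the helper name punch_example() is hypothetical, it assumes Linux with glibc, and it first writes 16 KiB of data so there is something to punch. Whether any hole items are created at all depends on the filesystem (btrfs without NO_HOLES); on btrfs the resulting items can be inspected afterwards with btrfs inspect-internal dump-tree.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_* via <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

/*
 * Hypothetical helper: write 16 KiB of data to 'path', then punch the
 * three 4 KiB holes from the example in the same non-sequential order
 * as the fallocate(1) commands (4K, 12K, then 8K).
 * Returns 0 on success or a negative errno value.
 */
static int punch_example(const char *path)
{
	static const off_t offsets[] = { 1 * BLK, 3 * BLK, 2 * BLK };
	char buf[4 * BLK];
	ssize_t n;
	int fd, ret = 0;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	memset(buf, 0xab, sizeof(buf));
	n = write(fd, buf, sizeof(buf));
	if (n != (ssize_t)sizeof(buf)) {
		ret = n < 0 ? -errno : -EIO;
		goto out;
	}

	for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
		/* fallocate -p implies keep-size; PUNCH_HOLE requires KEEP_SIZE. */
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      offsets[i], BLK) < 0) {
			ret = -errno;
			goto out;
		}
	}
out:
	close(fd);
	return ret;
}
```

The order matters: punching 4K, 8K, 12K sequentially only ever needs a one-sided merge, while 4K, 12K, 8K leaves the last hole with a mergeable neighbor on each side.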

Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
pass:

  - If only the previous slot is mergeable, extend it forward as
    before.
  - If only the next slot is mergeable, extend it backward and update
    its key offset as before.
  - If both are mergeable, extend the previous item to cover the new
    hole plus the next item, and remove the redundant next item with
    btrfs_del_items().
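The three cases can be illustrated with a toy userspace model. This is a simplified sketch, not btrfs code: it assumes a sorted array that holds only hole items, reduces hole_mergeable() to a pure adjacency check, and the names struct hole, model_fill_hole() and run_example() are hypothetical. Passing merge_both = false mimics the old "previous first, otherwise next" behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a hole file extent item: [start, start + len). */
struct hole {
	uint64_t start;
	uint64_t len;
};

#define MAX_HOLES 16

/*
 * Toy model of fill_holes() for the new hole [offset, end).  Real leaves
 * also hold data extents, and hole_mergeable() checks more than adjacency.
 * Returns the new number of items in 'h'.
 */
static size_t model_fill_hole(struct hole *h, size_t n, uint64_t offset,
			      uint64_t end, bool merge_both)
{
	size_t slot = 0; /* like path->slots[0]: first item starting >= offset */
	bool prev_ok, next_ok;

	while (slot < n && h[slot].start < offset)
		slot++;

	prev_ok = slot > 0 && h[slot - 1].start + h[slot - 1].len == offset;
	next_ok = slot < n && h[slot].start == end;
	if (!merge_both && prev_ok)
		next_ok = false; /* old code: goto out after extending prev */

	if (prev_ok && next_ok) {
		/* Extend prev over the new hole and next, then delete next. */
		h[slot - 1].len += (end - offset) + h[slot].len;
		for (size_t i = slot; i + 1 < n; i++)
			h[i] = h[i + 1];
		return n - 1;
	}
	if (prev_ok) {
		/* Extend prev forward. */
		h[slot - 1].len += end - offset;
		return n;
	}
	if (next_ok) {
		/* Extend next backward: its key offset becomes 'offset'. */
		h[slot].len += end - offset;
		h[slot].start = offset;
		return n;
	}
	/* No mergeable neighbor: insert a new hole item at 'slot'. */
	assert(n < MAX_HOLES);
	for (size_t i = n; i > slot; i--)
		h[i] = h[i - 1];
	h[slot] = (struct hole){ .start = offset, .len = end - offset };
	return n + 1;
}

/* Replays the three punches from the example: 4K, 12K, then 8K. */
static size_t run_example(struct hole *h, bool merge_both)
{
	static const uint64_t offs[] = { 4096, 12288, 8192 };
	size_t n = 0;

	for (size_t i = 0; i < 3; i++)
		n = model_fill_hole(h, n, offs[i], offs[i] + 4096, merge_both);
	return n;
}
```

Replaying the example, the old mode ends with [4K, 12K) and [12K, 16K), while merge_both ends with a single [4K, 16K) item, matching the description above.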

Because the merge path may now delete an item, switch the initial
btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
search-for-deletion (ins_len = -1), so the leaf is prepared for a
possible item removal.

Fixes: 2aaa66558172 ("Btrfs: add hole punching")
Signed-off-by: Dave Chen <davechen@synology.com>
---
 fs/btrfs/file.c | 48 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cf1cb5c4db757..84450452d2347 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	struct btrfs_file_extent_item *fi;
 	struct extent_map *hole_em;
 	struct btrfs_key key;
+	int modify_slot = -1;
+	int del_slot = -1;
+	bool update_offset = false;
+	u64 num_bytes = 0;
 	int ret;
 
 	if (btrfs_fs_incompat(fs_info, NO_HOLES))
@@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = offset;
 
-	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret <= 0) {
 		/*
 		 * We should have dropped this offset, so if we find it then
@@ -2115,33 +2119,43 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 
 	leaf = path->nodes[0];
 	if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
-		u64 num_bytes;
-
-		path->slots[0]--;
-		fi = btrfs_item_ptr(leaf, path->slots[0],
+		fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
 				    struct btrfs_file_extent_item);
 		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
 			end - offset;
-		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_offset(leaf, fi, 0);
-		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
-		goto out;
+		modify_slot = path->slots[0] - 1;
 	}
-
 	if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
-		u64 num_bytes;
-
-		key.offset = offset;
-		btrfs_set_item_key_safe(trans, path, &key);
 		fi = btrfs_item_ptr(leaf, path->slots[0],
 				    struct btrfs_file_extent_item);
-		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
-			offset;
+		if (modify_slot != -1) {
+			num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
+			del_slot = path->slots[0];
+		} else {
+			num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
+				end - offset;
+			modify_slot = path->slots[0];
+			update_offset = true;
+		}
+	}
+	if (modify_slot >= 0) {
+		fi = btrfs_item_ptr(leaf, modify_slot,
+				    struct btrfs_file_extent_item);
 		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
 		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
+		if (update_offset) {
+			key.offset = offset;
+			btrfs_set_item_key_safe(trans, path, &key);
+		}
 		btrfs_set_file_extent_offset(leaf, fi, 0);
 		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
+		if (del_slot >= 0) {
+			ret = btrfs_del_items(trans, root, path, del_slot, 1);
+			if (ret) {
+				btrfs_release_path(path);
+				return ret;
+			}
+		}
 		goto out;
 	}
 	btrfs_release_path(path);
-- 
2.43.0


Disclaimer: The contents of this e-mail message and any attachments are confidential and are intended solely for addressee. The information may also be legally privileged. This transmission is sent in trust, for the sole purpose of delivery to the intended recipient. If you have received this transmission in error, any use, reproduction or dissemination of this transmission is strictly prohibited. If you are not the intended recipient, please immediately notify the sender by reply e-mail or phone and delete this message and its attachments, if any.


* Re: [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items
  2026-04-29  2:13 [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items Dave Chen
@ 2026-04-29 11:23 ` Filipe Manana
  2026-04-30  2:27 ` [PATCH v2] btrfs: optimize " Dave Chen
  2026-05-04  1:43 ` [PATCH v3] " Dave Chen
  2 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2026-04-29 11:23 UTC
  To: Dave Chen; +Cc: linux-btrfs, dsterba, cccheng

On Wed, Apr 29, 2026 at 3:16 AM Dave Chen <davechen@synology.com> wrote:
>
> fill_holes() currently merges a punched hole with either the previous
> or the next file extent item, but never both in the same call.  When
> holes are punched in a non-sequential order this leaves consecutive
> hole items in the file extent tree that should have been collapsed

file extent tree -> inode's subvolume tree

Using the term "file extent tree" is confusing because we have a
global extent tree for all data (and metadata) extents, and file
extent items are stored in a subvolume/fs tree.

> into a single one.

Ok, but that's harmless: we just use one more metadata item (a file extent item).

>
> For example:
>
>   fallocate -p -o 4K  -l 4K ${FILE}
>   fallocate -p -o 12K -l 4K ${FILE}
>   fallocate -p -o 8K  -l 4K ${FILE}
>
> After the third punch the [4K, 8K) and [12K, 16K) holes become
> adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
> side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
> instead of the expected single [4K, 16K) hole item.
>
> Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
> pass:
>
>   - If only the previous slot is mergeable, extend it forward as
>     before.
>   - If only the next slot is mergeable, extend it backward and update
>     its key offset as before.
>   - If both are mergeable, extend the previous item to cover the new
>     hole plus the next item, and remove the redundant next item with
>     btrfs_del_items().
>
> Because the merge path may now delete an item, switch the initial
> btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
> search-for-deletion (ins_len = -1), so the leaf is prepared for a
> possible item removal.
>
> Fixes: 2aaa66558172 ("Btrfs: add hole punching")

I pointed this out to you in a previous patch: the Fixes tag is meant
only for bug fixes or serious performance regressions that impact
users.
This is a small inefficiency: having an extra file extent item is
harmless, and it matters even less today since no-holes has been the
default for some years now.

You're making it sound like a functional bug or a serious performance issue.
Please make it explicit in the changelog that this is only a small
inefficiency.

Thanks.

> Signed-off-by: Dave Chen <davechen@synology.com>
> ---
>  fs/btrfs/file.c | 48 +++++++++++++++++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cf1cb5c4db757..84450452d2347 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         struct btrfs_file_extent_item *fi;
>         struct extent_map *hole_em;
>         struct btrfs_key key;
> +       int modify_slot = -1;
> +       int del_slot = -1;
> +       bool update_offset = false;
> +       u64 num_bytes = 0;
>         int ret;
>
>         if (btrfs_fs_incompat(fs_info, NO_HOLES))
> @@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         key.type = BTRFS_EXTENT_DATA_KEY;
>         key.offset = offset;
>
> -       ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +       ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>         if (ret <= 0) {
>                 /*
>                  * We should have dropped this offset, so if we find it then
> @@ -2115,33 +2119,43 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>
>         leaf = path->nodes[0];
>         if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
> -               u64 num_bytes;
> -
> -               path->slots[0]--;
> -               fi = btrfs_item_ptr(leaf, path->slots[0],
> +               fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
>                                     struct btrfs_file_extent_item);
>                 num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
>                         end - offset;
> -               btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_offset(leaf, fi, 0);
> -               btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> -               goto out;
> +               modify_slot = path->slots[0] - 1;
>         }
> -
>         if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
> -               u64 num_bytes;
> -
> -               key.offset = offset;
> -               btrfs_set_item_key_safe(trans, path, &key);
>                 fi = btrfs_item_ptr(leaf, path->slots[0],
>                                     struct btrfs_file_extent_item);
> -               num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
> -                       offset;
> +               if (modify_slot != -1) {
> +                       num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
> +                       del_slot = path->slots[0];
> +               } else {
> +                       num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
> +                               end - offset;
> +                       modify_slot = path->slots[0];
> +                       update_offset = true;
> +               }
> +       }
> +       if (modify_slot >= 0) {
> +               fi = btrfs_item_ptr(leaf, modify_slot,
> +                                   struct btrfs_file_extent_item);
>                 btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
>                 btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> +               if (update_offset) {
> +                       key.offset = offset;
> +                       btrfs_set_item_key_safe(trans, path, &key);
> +               }
>                 btrfs_set_file_extent_offset(leaf, fi, 0);
>                 btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> +               if (del_slot >= 0) {
> +                       ret = btrfs_del_items(trans, root, path, del_slot, 1);
> +                       if (ret) {
> +                               btrfs_release_path(path);
> +                               return ret;
> +                       }
> +               }
>                 goto out;
>         }
>         btrfs_release_path(path);
> --
> 2.43.0
>

* [PATCH v2] btrfs: optimize fill_holes() to merge a new hole with both adjacent items
  2026-04-29  2:13 [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items Dave Chen
  2026-04-29 11:23 ` Filipe Manana
@ 2026-04-30  2:27 ` Dave Chen
  2026-04-30 11:21   ` Filipe Manana
  2026-05-04  1:43 ` [PATCH v3] " Dave Chen
  2 siblings, 1 reply; 6+ messages in thread
From: Dave Chen @ 2026-04-30  2:27 UTC
  To: fdmanana; +Cc: cccheng, davechen, linux-btrfs

fill_holes() currently merges a punched hole with either the previous
or the next file extent item, but never both in the same call.  When
holes are punched in a non-sequential order this leaves consecutive
hole items in the inode's subvolume tree that should have been collapsed
into a single one.

This is a minor metadata optimization that reduces the number of file
extent items when holes are punched in non-sequential order. While
having extra file extent items is harmless and has no functional
impact, reducing metadata overhead can benefit workloads with heavily
fragmented hole patterns.

For example:

  fallocate -p -o 4K  -l 4K ${FILE}
  fallocate -p -o 12K -l 4K ${FILE}
  fallocate -p -o 8K  -l 4K ${FILE}

After the third punch the [4K, 8K) and [12K, 16K) holes become
adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
instead of the expected single [4K, 16K) hole item.

Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
pass:

  - If only the previous slot is mergeable, extend it forward as
    before.
  - If only the next slot is mergeable, extend it backward and update
    its key offset as before.
  - If both are mergeable, extend the previous item to cover the new
    hole plus the next item, and remove the redundant next item with
    btrfs_del_items().

Because the merge path may now delete an item, switch the initial
btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
search-for-deletion (ins_len = -1), so the leaf is prepared for a
possible item removal.

Note: This optimization only applies to filesystems without the
NO_HOLES feature enabled. Since NO_HOLES is now the default, this
primarily benefits older filesystems or those explicitly created with
NO_HOLES disabled.
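The applicability condition boils down to a single incompat bit, mirroring the early return at the top of fill_holes(). A minimal sketch, assuming the flags word was obtained elsewhere (for example from the incompat_flags field filled in by the BTRFS_IOC_GET_FEATURES ioctl, or read off btrfs inspect-internal dump-super); NO_HOLES_FLAG and fill_holes_inserts_items() are hypothetical names, and the bit value mirrors BTRFS_FEATURE_INCOMPAT_NO_HOLES in the btrfs UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same value as BTRFS_FEATURE_INCOMPAT_NO_HOLES in
 * include/uapi/linux/btrfs.h, redefined so the sketch builds
 * without kernel headers.
 */
#define NO_HOLES_FLAG (1ULL << 9)

/*
 * Hypothetical helper: explicit hole items (and therefore the merge
 * logic in this patch) only exist when the NO_HOLES bit is clear.
 */
static bool fill_holes_inserts_items(uint64_t incompat_flags)
{
	return (incompat_flags & NO_HOLES_FLAG) == 0;
}
```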

Signed-off-by: Dave Chen <davechen@synology.com>
---
Changes in v2:
- Replace "file extent tree" with "inode's subvolume tree" to avoid
  confusion with the global extent tree (Filipe)
- Remove the Fixes: tag as this is a minor metadata optimization
  rather than a functional bug fix (Filipe)
- Reframe commit message to explicitly characterize this as a small
  optimization with no functional impact (Filipe)
- Add note about NO_HOLES default status and applicability scope

 fs/btrfs/file.c | 48 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cf1cb5c4db757..84450452d2347 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	struct btrfs_file_extent_item *fi;
 	struct extent_map *hole_em;
 	struct btrfs_key key;
+	int modify_slot = -1;
+	int del_slot = -1;
+	bool update_offset = false;
+	u64 num_bytes = 0;
 	int ret;

 	if (btrfs_fs_incompat(fs_info, NO_HOLES))
@@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = offset;

-	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret <= 0) {
 		/*
 		 * We should have dropped this offset, so if we find it then
@@ -2115,33 +2119,43 @@ static int fill_holes(struct btrfs_trans_handle *trans,

 	leaf = path->nodes[0];
 	if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
-		u64 num_bytes;
-
-		path->slots[0]--;
-		fi = btrfs_item_ptr(leaf, path->slots[0],
+		fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
 				    struct btrfs_file_extent_item);
 		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
 			end - offset;
-		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_offset(leaf, fi, 0);
-		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
-		goto out;
+		modify_slot = path->slots[0] - 1;
 	}
-
 	if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
-		u64 num_bytes;
-
-		key.offset = offset;
-		btrfs_set_item_key_safe(trans, path, &key);
 		fi = btrfs_item_ptr(leaf, path->slots[0],
 				    struct btrfs_file_extent_item);
-		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
-			offset;
+		if (modify_slot != -1) {
+			num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
+			del_slot = path->slots[0];
+		} else {
+			num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
+				end - offset;
+			modify_slot = path->slots[0];
+			update_offset = true;
+		}
+	}
+	if (modify_slot >= 0) {
+		fi = btrfs_item_ptr(leaf, modify_slot,
+				    struct btrfs_file_extent_item);
 		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
 		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
+		if (update_offset) {
+			key.offset = offset;
+			btrfs_set_item_key_safe(trans, path, &key);
+		}
 		btrfs_set_file_extent_offset(leaf, fi, 0);
 		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
+		if (del_slot >= 0) {
+			ret = btrfs_del_items(trans, root, path, del_slot, 1);
+			if (ret) {
+				btrfs_release_path(path);
+				return ret;
+			}
+		}
 		goto out;
 	}
 	btrfs_release_path(path);
-- 
2.43.0


* Re: [PATCH v2] btrfs: optimize fill_holes() to merge a new hole with both adjacent items
  2026-04-30  2:27 ` [PATCH v2] btrfs: optimize " Dave Chen
@ 2026-04-30 11:21   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2026-04-30 11:21 UTC
  To: Dave Chen; +Cc: fdmanana, cccheng, linux-btrfs

On Thu, Apr 30, 2026 at 3:31 AM Dave Chen <davechen@synology.com> wrote:
>
> fill_holes() currently merges a punched hole with either the previous
> or the next file extent item, but never both in the same call.  When
> holes are punched in a non-sequential order this leaves consecutive
> hole items in the inode's subvolume tree that should have been collapsed
> into a single one.
>
> This is a minor metadata optimization that reduces the number of file
> extent items when holes are punched in non-sequential order. While
> having extra file extent items is harmless and has no functional
> impact, reducing metadata overhead can benefit workloads with heavily
> fragmented hole patterns.
>
> For example:
>
>   fallocate -p -o 4K  -l 4K ${FILE}
>   fallocate -p -o 12K -l 4K ${FILE}
>   fallocate -p -o 8K  -l 4K ${FILE}
>
> After the third punch the [4K, 8K) and [12K, 16K) holes become
> adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
> side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
> instead of the expected single [4K, 16K) hole item.
>
> Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
> pass:
>
>   - If only the previous slot is mergeable, extend it forward as
>     before.
>   - If only the next slot is mergeable, extend it backward and update
>     its key offset as before.
>   - If both are mergeable, extend the previous item to cover the new
>     hole plus the next item, and remove the redundant next item with
>     btrfs_del_items().
>
> Because the merge path may now delete an item, switch the initial
> btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
> search-for-deletion (ins_len = -1), so the leaf is prepared for a
> possible item removal.
>
> Note: This optimization only applies to filesystems without the
> NO_HOLES feature enabled. Since NO_HOLES is now the default, this
> primarily benefits older filesystems or those explicitly created with
> NO_HOLES disabled.
>
> Signed-off-by: Dave Chen <davechen@synology.com>
> ---
> Changes in v2:
> - Replace "file extent tree" with "inode's subvolume tree" to avoid
>   confusion with the global extent tree (Filipe)
> - Remove the Fixes: tag as this is a minor metadata optimization
>   rather than a functional bug fix (Filipe)
> - Reframe commit message to explicitly characterize this as a small
>   optimization with no functional impact (Filipe)
> - Add note about NO_HOLES default status and applicability scope
>
>  fs/btrfs/file.c | 48 +++++++++++++++++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cf1cb5c4db757..84450452d2347 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         struct btrfs_file_extent_item *fi;
>         struct extent_map *hole_em;
>         struct btrfs_key key;
> +       int modify_slot = -1;
> +       int del_slot = -1;
> +       bool update_offset = false;
> +       u64 num_bytes = 0;
>         int ret;
>
>         if (btrfs_fs_incompat(fs_info, NO_HOLES))
> @@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         key.type = BTRFS_EXTENT_DATA_KEY;
>         key.offset = offset;
>
> -       ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +       ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>         if (ret <= 0) {
>                 /*
>                  * We should have dropped this offset, so if we find it then
> @@ -2115,33 +2119,43 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>
>         leaf = path->nodes[0];
>         if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
> -               u64 num_bytes;
> -
> -               path->slots[0]--;
> -               fi = btrfs_item_ptr(leaf, path->slots[0],
> +               fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
>                                     struct btrfs_file_extent_item);
>                 num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
>                         end - offset;
> -               btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_offset(leaf, fi, 0);
> -               btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> -               goto out;
> +               modify_slot = path->slots[0] - 1;
>         }
> -
>         if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
> -               u64 num_bytes;
> -
> -               key.offset = offset;
> -               btrfs_set_item_key_safe(trans, path, &key);
>                 fi = btrfs_item_ptr(leaf, path->slots[0],
>                                     struct btrfs_file_extent_item);
> -               num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
> -                       offset;
> +               if (modify_slot != -1) {
> +                       num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
> +                       del_slot = path->slots[0];
> +               } else {
> +                       num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
> +                               end - offset;
> +                       modify_slot = path->slots[0];
> +                       update_offset = true;
> +               }
> +       }
> +       if (modify_slot >= 0) {
> +               fi = btrfs_item_ptr(leaf, modify_slot,
> +                                   struct btrfs_file_extent_item);
>                 btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
>                 btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> +               if (update_offset) {
> +                       key.offset = offset;
> +                       btrfs_set_item_key_safe(trans, path, &key);
> +               }
>                 btrfs_set_file_extent_offset(leaf, fi, 0);
>                 btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> +               if (del_slot >= 0) {
> +                       ret = btrfs_del_items(trans, root, path, del_slot, 1);
> +                       if (ret) {

Here, upon error, we must abort the transaction; otherwise metadata
will be inconsistent, because we have already modified the item at
modify_slot.

Otherwise it looks fine, thanks.
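Why the abort is needed can be shown with a small fault-injection model. This is a hypothetical sketch, not btrfs code: struct toy_trans, struct toy_leaf, toy_del_item() and toy_merge_both() stand in for the transaction handle, the leaf, btrfs_del_items() and the two-sided merge. Because the previous item is widened before the next item is deleted, a failed delete leaves two overlapping hole items, and in this model the only consistent way out is to mark the transaction aborted, which is what v3 does with btrfs_abort_transaction().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction handle. */
struct toy_trans {
	bool aborted;
	int abort_err;
};

/* Hypothetical stand-in for a leaf holding two adjacent hole items. */
struct toy_leaf {
	uint64_t len[2];  /* slot 0 = previous hole, slot 1 = next hole */
	int nritems;
	bool fail_delete; /* fault injection for the delete step */
};

static int toy_del_item(struct toy_leaf *leaf, int slot)
{
	if (leaf->fail_delete)
		return -ENOMEM;
	/* Only the last slot is ever deleted in this toy. */
	assert(slot == leaf->nritems - 1);
	leaf->nritems--;
	return 0;
}

/*
 * Two-sided merge: widen the previous item first, then delete the next
 * one.  If the delete fails, the widened item already overlaps the
 * surviving next item, so the transaction must not be allowed to commit.
 */
static int toy_merge_both(struct toy_trans *trans, struct toy_leaf *leaf,
			  uint64_t new_hole_len)
{
	int ret;

	leaf->len[0] += new_hole_len + leaf->len[1];
	ret = toy_del_item(leaf, 1);
	if (ret) {
		trans->aborted = true;
		trans->abort_err = ret;
	}
	return ret;
}
```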

> +                               btrfs_release_path(path);
> +                               return ret;
> +                       }
> +               }
>                 goto out;
>         }
>         btrfs_release_path(path);
> --
> 2.43.0
>

* [PATCH v3] btrfs: optimize fill_holes() to merge a new hole with both adjacent items
  2026-04-29  2:13 [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items Dave Chen
  2026-04-29 11:23 ` Filipe Manana
  2026-04-30  2:27 ` [PATCH v2] btrfs: optimize " Dave Chen
@ 2026-05-04  1:43 ` Dave Chen
  2026-05-04 15:47   ` Filipe Manana
  2 siblings, 1 reply; 6+ messages in thread
From: Dave Chen @ 2026-05-04  1:43 UTC
  To: fdmanana; +Cc: cccheng, davechen, linux-btrfs

fill_holes() currently merges a punched hole with either the previous
or the next file extent item, but never both in the same call.  When
holes are punched in a non-sequential order this leaves consecutive
hole items in the inode's subvolume tree that should have been collapsed
into a single one.

This is a minor metadata optimization that reduces the number of file
extent items when holes are punched in non-sequential order. While
having extra file extent items is harmless and has no functional
impact, reducing metadata overhead can benefit workloads with heavily
fragmented hole patterns.

For example:

  fallocate -p -o 4K  -l 4K ${FILE}
  fallocate -p -o 12K -l 4K ${FILE}
  fallocate -p -o 8K  -l 4K ${FILE}

After the third punch the [4K, 8K) and [12K, 16K) holes become
adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
instead of the expected single [4K, 16K) hole item.

Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
pass:

  - If only the previous slot is mergeable, extend it forward as
    before.
  - If only the next slot is mergeable, extend it backward and update
    its key offset as before.
  - If both are mergeable, extend the previous item to cover the new
    hole plus the next item, and remove the redundant next item with
    btrfs_del_items().

Because the merge path may now delete an item, switch the initial
btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
search-for-deletion (ins_len = -1), so the leaf is prepared for a
possible item removal.

Note: This optimization only applies to filesystems without the
NO_HOLES feature enabled. Since NO_HOLES is now the default, this
primarily benefits older filesystems or those explicitly created with
NO_HOLES disabled.

Signed-off-by: Dave Chen <davechen@synology.com>
---
Changes in v3:
- Add btrfs_abort_transaction() on btrfs_del_items() failure to
  prevent metadata inconsistency (Filipe)

Changes in v2:
- Replace "file extent tree" with "inode's subvolume tree" to avoid
  confusion with the global extent tree (Filipe)
- Remove the Fixes: tag as this is a minor metadata optimization
  rather than a functional bug fix (Filipe)
- Reframe commit message to explicitly characterize this as a small
  optimization with no functional impact (Filipe)
- Add note about NO_HOLES default status and applicability scope

 fs/btrfs/file.c | 49 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cf1cb5c4db757..44ed7ddecd451 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	struct btrfs_file_extent_item *fi;
 	struct extent_map *hole_em;
 	struct btrfs_key key;
+	int modify_slot = -1;
+	int del_slot = -1;
+	bool update_offset = false;
+	u64 num_bytes = 0;
 	int ret;
 
 	if (btrfs_fs_incompat(fs_info, NO_HOLES))
@@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = offset;
 
-	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret <= 0) {
 		/*
 		 * We should have dropped this offset, so if we find it then
@@ -2115,33 +2119,44 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 
 	leaf = path->nodes[0];
 	if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
-		u64 num_bytes;
-
-		path->slots[0]--;
-		fi = btrfs_item_ptr(leaf, path->slots[0],
+		fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
 				    struct btrfs_file_extent_item);
 		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
 			end - offset;
-		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
-		btrfs_set_file_extent_offset(leaf, fi, 0);
-		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
-		goto out;
+		modify_slot = path->slots[0] - 1;
 	}
-
 	if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
-		u64 num_bytes;
-
-		key.offset = offset;
-		btrfs_set_item_key_safe(trans, path, &key);
 		fi = btrfs_item_ptr(leaf, path->slots[0],
 				    struct btrfs_file_extent_item);
-		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
-			offset;
+		if (modify_slot != -1) {
+			num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
+			del_slot = path->slots[0];
+		} else {
+			num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
+				end - offset;
+			modify_slot = path->slots[0];
+			update_offset = true;
+		}
+	}
+	if (modify_slot >= 0) {
+		fi = btrfs_item_ptr(leaf, modify_slot,
+				    struct btrfs_file_extent_item);
 		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
 		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
+		if (update_offset) {
+			key.offset = offset;
+			btrfs_set_item_key_safe(trans, path, &key);
+		}
 		btrfs_set_file_extent_offset(leaf, fi, 0);
 		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
+		if (del_slot >= 0) {
+			ret = btrfs_del_items(trans, root, path, del_slot, 1);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				btrfs_release_path(path);
+				return ret;
+			}
+		}
 		goto out;
 	}
 	btrfs_release_path(path);
-- 
2.43.0


Disclaimer: The contents of this e-mail message and any attachments are confidential and are intended solely for addressee. The information may also be legally privileged. This transmission is sent in trust, for the sole purpose of delivery to the intended recipient. If you have received this transmission in error, any use, reproduction or dissemination of this transmission is strictly prohibited. If you are not the intended recipient, please immediately notify the sender by reply e-mail or phone and delete this message and its attachments, if any.

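The three `fallocate -p` commands from the commit message can be reproduced from C with fallocate(2). The sketch below makes some assumptions: `make_scratch()` and `punch_example()` are hypothetical helper names, and the 4 KiB block size matches the example. `fallocate -p` implies `--keep-size`, so `FALLOC_FL_PUNCH_HOLE` is paired with `FALLOC_FL_KEEP_SIZE`, which the kernel requires for hole punching.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 4096 /* assumed block size, matching the 4K example */

/* Hypothetical helper: create a 5-block scratch file filled with 0xAA so
 * the holes punched in [4K, 16K) have data on both sides. */
static int make_scratch(char *path_template)
{
	char buf[BLK];
	int fd = mkstemp(path_template);

	if (fd < 0)
		return -1;
	memset(buf, 0xAA, sizeof(buf));
	for (int i = 0; i < 5; i++) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			close(fd);
			unlink(path_template);
			return -1;
		}
	}
	return fd;
}

/* Hypothetical helper: punch [4K, 8K), then [12K, 16K), then [8K, 12K),
 * in the same non-sequential order as the commit message.
 * Returns 0 on success or a negative errno. */
static int punch_example(int fd)
{
	static const off_t offsets[] = { 1 * BLK, 3 * BLK, 2 * BLK };

	for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
		/* PUNCH_HOLE must be combined with KEEP_SIZE. */
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      offsets[i], BLK) != 0)
			return -errno;
	}
	return 0;
}
```

On a btrfs filesystem without NO_HOLES, dumping the inode's subvolume tree after a sync (for example with `btrfs inspect-internal dump-tree`) should show whether [4K, 16K) is covered by one hole item or by the two items the commit message describes. On filesystems that do not support hole punching, `punch_example()` returns `-EOPNOTSUPP`.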

* Re: [PATCH v3] btrfs: optimize fill_holes() to merge a new hole with both adjacent items
  2026-05-04  1:43 ` [PATCH v3] " Dave Chen
@ 2026-05-04 15:47   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2026-05-04 15:47 UTC (permalink / raw)
  To: Dave Chen; +Cc: fdmanana, cccheng, linux-btrfs

On Mon, May 4, 2026 at 2:44 AM Dave Chen <davechen@synology.com> wrote:
>
> fill_holes() currently merges a punched hole with either the previous
> or the next file extent item, but never both in the same call.  When
> holes are punched in a non-sequential order this leaves consecutive
> hole items in the inode's subvolume tree that should have been collapsed
> into a single one.
>
> This is a minor metadata optimization that reduces the number of file
> extent items when holes are punched in non-sequential order. While
> having extra file extent items is harmless and has no functional
> impact, reducing metadata overhead can benefit workloads with heavily
> fragmented hole patterns.
>
> For example:
>
>   fallocate -p -o 4K  -l 4K ${FILE}
>   fallocate -p -o 12K -l 4K ${FILE}
>   fallocate -p -o 8K  -l 4K ${FILE}
>
> After the third punch the [4K, 8K) and [12K, 16K) holes become
> adjacent to the new [8K, 12K) hole, but fill_holes() merges only one
> side and leaves two separate hole items ([4K, 12K) and [12K, 16K))
> instead of the expected single [4K, 16K) hole item.
>
> Fix this by checking both path->slots[0] - 1 and path->slots[0] in one
> pass:
>
>   - If only the previous slot is mergeable, extend it forward as
>     before.
>   - If only the next slot is mergeable, extend it backward and update
>     its key offset as before.
>   - If both are mergeable, extend the previous item to cover the new
>     hole plus the next item, and remove the redundant next item with
>     btrfs_del_items().
>
> Because the merge path may now delete an item, switch the initial
> btrfs_search_slot() call from a plain lookup (ins_len = 0) to a
> search-for-deletion (ins_len = -1), so the leaf is prepared for a
> possible item removal.
>
> Note: This optimization only applies to filesystems without the
> NO_HOLES feature enabled. Since NO_HOLES is now the default, this
> primarily benefits older filesystems or those explicitly created with
> NO_HOLES disabled.
>
> Signed-off-by: Dave Chen <davechen@synology.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks, pushed it to the for-next branch in GitHub.

> ---
> Changes in v3:
> - Add btrfs_abort_transaction() on btrfs_del_items() failure to
>   prevent metadata inconsistency (Filipe)
>
> Changes in v2:
> - Replace "file extent tree" with "inode's subvolume tree" to avoid
>   confusion with the global extent tree (Filipe)
> - Remove the Fixes: tag as this is a minor metadata optimization
>   rather than a functional bug fix (Filipe)
> - Reframe commit message to explicitly characterize this as a small
>   optimization with no functional impact (Filipe)
> - Add note about NO_HOLES default status and applicability scope
>
>  fs/btrfs/file.c | 49 ++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 32 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cf1cb5c4db757..44ed7ddecd451 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2093,6 +2093,10 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         struct btrfs_file_extent_item *fi;
>         struct extent_map *hole_em;
>         struct btrfs_key key;
> +       int modify_slot = -1;
> +       int del_slot = -1;
> +       bool update_offset = false;
> +       u64 num_bytes = 0;
>         int ret;
>
>         if (btrfs_fs_incompat(fs_info, NO_HOLES))
> @@ -2102,7 +2106,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>         key.type = BTRFS_EXTENT_DATA_KEY;
>         key.offset = offset;
>
> -       ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
> +       ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>         if (ret <= 0) {
>                 /*
>                  * We should have dropped this offset, so if we find it then
> @@ -2115,33 +2119,44 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>
>         leaf = path->nodes[0];
>         if (hole_mergeable(inode, leaf, path->slots[0] - 1, offset, end)) {
> -               u64 num_bytes;
> -
> -               path->slots[0]--;
> -               fi = btrfs_item_ptr(leaf, path->slots[0],
> +               fi = btrfs_item_ptr(leaf, path->slots[0] - 1,
>                                     struct btrfs_file_extent_item);
>                 num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
>                         end - offset;
> -               btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> -               btrfs_set_file_extent_offset(leaf, fi, 0);
> -               btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> -               goto out;
> +               modify_slot = path->slots[0] - 1;
>         }
> -
>         if (hole_mergeable(inode, leaf, path->slots[0], offset, end)) {
> -               u64 num_bytes;
> -
> -               key.offset = offset;
> -               btrfs_set_item_key_safe(trans, path, &key);
>                 fi = btrfs_item_ptr(leaf, path->slots[0],
>                                     struct btrfs_file_extent_item);
> -               num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
> -                       offset;
> +               if (modify_slot != -1) {
> +                       num_bytes += btrfs_file_extent_num_bytes(leaf, fi);
> +                       del_slot = path->slots[0];
> +               } else {
> +                       num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
> +                               end - offset;
> +                       modify_slot = path->slots[0];
> +                       update_offset = true;
> +               }
> +       }
> +       if (modify_slot >= 0) {
> +               fi = btrfs_item_ptr(leaf, modify_slot,
> +                                   struct btrfs_file_extent_item);
>                 btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
>                 btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
> +               if (update_offset) {
> +                       key.offset = offset;
> +                       btrfs_set_item_key_safe(trans, path, &key);
> +               }
>                 btrfs_set_file_extent_offset(leaf, fi, 0);
>                 btrfs_set_file_extent_generation(leaf, fi, trans->transid);
> +               if (del_slot >= 0) {
> +                       ret = btrfs_del_items(trans, root, path, del_slot, 1);
> +                       if (ret) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               btrfs_release_path(path);
> +                               return ret;
> +                       }
> +               }
>                 goto out;
>         }
>         btrfs_release_path(path);
> --
> 2.43.0

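The three-way decision the patch introduces (previous only, next only, or both) can be modeled in userspace. This is a sketch, not kernel code: `struct hole_item`, `enum merge_action` and `merge_hole()` are hypothetical names, and `mergeable()` assumes that hole_mergeable() accepts an item that either starts at the new hole's end or ends at its start. It omits the inode, key-type, extent-type and disk_bytenr checks the real helper performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a hole file extent item: only the key offset
 * and num_bytes matter for the merge decision. */
struct hole_item {
	bool valid;          /* false models "no item in this slot" */
	uint64_t offset;     /* key.offset: file offset where the hole starts */
	uint64_t num_bytes;  /* length of the hole */
};

enum merge_action {
	MERGE_NONE,  /* nothing mergeable: caller inserts a new hole item */
	MERGE_PREV,  /* extend the previous item forward */
	MERGE_NEXT,  /* extend the next item backward, rewrite its key offset */
	MERGE_BOTH,  /* extend the previous item over the next one, delete next */
};

/* Simplified hole_mergeable(): adjacency check only. */
static bool mergeable(const struct hole_item *it, uint64_t start, uint64_t end)
{
	return it->valid &&
	       (it->offset == end || it->offset + it->num_bytes == start);
}

/* Decide how the new hole [offset, end) is merged; *out receives the
 * surviving item whenever the result is not MERGE_NONE. */
static enum merge_action merge_hole(const struct hole_item *prev,
				    const struct hole_item *next,
				    uint64_t offset, uint64_t end,
				    struct hole_item *out)
{
	bool with_prev, with_next;

	assert(offset < end);
	with_prev = mergeable(prev, offset, end);
	with_next = mergeable(next, offset, end);

	if (with_prev) {
		/* The previous item grows forward over the new hole ... */
		out->valid = true;
		out->offset = prev->offset;
		out->num_bytes = prev->num_bytes + (end - offset);
		if (!with_next)
			return MERGE_PREV;
		/* ... and also absorbs the next item, which is then deleted. */
		out->num_bytes += next->num_bytes;
		return MERGE_BOTH;
	}
	if (with_next) {
		/* The next item grows backward: its key offset becomes 'offset'. */
		out->valid = true;
		out->offset = offset;
		out->num_bytes = next->num_bytes + (end - offset);
		return MERGE_NEXT;
	}
	return MERGE_NONE;
}
```

With the commit message's example (previous hole [4K, 8K), next hole [12K, 16K), new hole [8K, 12K)), `merge_hole()` returns `MERGE_BOTH` with a surviving item at offset 4096 covering 12288 bytes, i.e. [4K, 16K). In the patch, `MERGE_BOTH` corresponds to setting both `modify_slot` and `del_slot`, and `MERGE_NEXT` to `update_offset = true`. The deletion in the `MERGE_BOTH` case is why the search switches to `ins_len = -1`.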

end of thread

Thread overview: 6+ messages
2026-04-29  2:13 [PATCH] btrfs: fix fill_holes() to merge a new hole with both adjacent items Dave Chen
2026-04-29 11:23 ` Filipe Manana
2026-04-30  2:27 ` [PATCH v2] btrfs: optimize " Dave Chen
2026-04-30 11:21   ` Filipe Manana
2026-05-04  1:43 ` [PATCH v3] " Dave Chen
2026-05-04 15:47   ` Filipe Manana
