public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: some optimizations for send
@ 2024-02-19 11:59 fdmanana
  2024-02-19 11:59 ` [PATCH 1/2] btrfs: send: don't issue unnecessary zero writes for trailing hole fdmanana
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: fdmanana @ 2024-02-19 11:59 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

The following are two optimizations for send, one to avoid sending
unnecessary holes (writes full of zeros), which can waste a lot of space
at the receiver and increase stream size for cases where sparse files are
used, such as images of thin provisioned filesystems for example as
recently reported by a user. The second is just a small optimization to
avoid repeating a btree search. More details in the respective change logs.

Filipe Manana (2):
  btrfs: send: don't issue unnecessary zero writes for trailing hole
  btrfs: send: avoid duplicated search for last extent when sending hole

 fs/btrfs/send.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] btrfs: send: don't issue unnecessary zero writes for trailing hole
  2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
@ 2024-02-19 11:59 ` fdmanana
  2024-02-19 11:59 ` [PATCH 2/2] btrfs: send: avoid duplicated search for last extent when sending hole fdmanana
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: fdmanana @ 2024-02-19 11:59 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

If we have a sparse file with a trailing hole (from the last extent's end
to i_size) and then create an extent in the file that ends before the
file's i_size, then when doing an incremental send we will issue a write
full of zeroes for the range that starts immediately after the new extent
ends up to i_size. While this isn't incorrect because the file ends up
with exactly the same data, it unnecessarily results in using extra space
at the destination with one or more extents full of zeroes instead of
having a hole. In same cases this results in using megabytes or even
gigabytes of unnecessary space.

Example, reproducer:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdh
   MNT=/mnt/sdh

   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   # Create 1G sparse file.
   xfs_io -f -c "truncate 1G" $MNT/foobar

   # Create base snapshot.
   btrfs subvolume snapshot -r $MNT $MNT/mysnap1

   # Create send stream (full send) for the base snapshot.
   btrfs send -f /tmp/1.snap $MNT/mysnap1

   # Now write one extent at the beginning of the file and one somewhere
   # in the middle, leaving a gap between the end of this second extent
   # and the file's size.
   xfs_io -c "pwrite -S 0xab 0 128K" \
          -c "pwrite -S 0xcd 512M 128K" \
          $MNT/foobar

   # Now create a second snapshot which is going to be used for an
   # incremental send operation.
   btrfs subvolume snapshot -r $MNT $MNT/mysnap2

   # Create send stream (incremental send) for the second snapshot.
   btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2

   # Now recreate the filesystem by receiving both send streams and
   # verify we get the same content that the original filesystem had
   # and file foobar has only two extents with a size of 128K each.
   umount $MNT
   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   btrfs receive -f /tmp/1.snap $MNT
   btrfs receive -f /tmp/2.snap $MNT

   echo -e "\nFile fiemap in the second snapshot:"
   # Should have:
   #
   # 128K extent at file range [0, 128K[
   # hole at file range [128K, 512M[
   # 128K extent file range [512M, 512M + 128K[
   # hole at file range [512M + 128K, 1G[
   xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar

   # File should be using 256K of data (two 128K extents).
   echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar | cut -f 1)"

   umount $MNT

Running the test, we can see with fiemap that we get an extent for the
range [512M, 1G[, while in the source filesystem we have an extent for
the range [512M, 512M + 128K[ and a hole for the rest of the file (the
range [512M + 128K, 1G[):

   $ ./test.sh
   (...)
   File fiemap in the second snapshot:
   /mnt/sdh/mysnap2/foobar:
    EXT: FILE-OFFSET        BLOCK-RANGE        TOTAL FLAGS
      0: [0..255]:          26624..26879         256   0x0
      1: [256..1048575]:    hole             1048320
      2: [1048576..2097151]: 2156544..3205119 1048576   0x1

   Space used by the file: 513M

This happens because once we finish processing an inode, at
finish_inode_if_needed(), we always issue a hole (write operations full
of zeros) if there's a gap between the end of the last processed extent
and the file's size, even if that range is already a hole in the parent
snapshot. Fix this by issuing the hole only if the range is not already
a hole.

After this change, running the test above, we get the expected layout:

   $ ./test.sh
   (...)
   File fiemap in the second snapshot:
   /mnt/sdh/mysnap2/foobar:
    EXT: FILE-OFFSET        BLOCK-RANGE      TOTAL FLAGS
      0: [0..255]:          26624..26879       256   0x0
      1: [256..1048575]:    hole             1048320
      2: [1048576..1048831]: 26880..27135       256   0x1
      3: [1048832..2097151]: hole             1048320

   Space used by the file: 256K

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 6.1+
Reported-by: Dorai Ashok S A <dash.btrfs@inix.me>
Link: https://lore.kernel.org/linux-btrfs/c0bf7818-9c45-46a8-b3d3-513230d0c86e@inix.me/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e96d511f9dd9..dc18d5624ec7 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6723,11 +6723,20 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 				if (ret)
 					goto out;
 			}
-			if (sctx->cur_inode_last_extent <
-			    sctx->cur_inode_size) {
-				ret = send_hole(sctx, sctx->cur_inode_size);
-				if (ret)
+			if (sctx->cur_inode_last_extent < sctx->cur_inode_size) {
+				ret = range_is_hole_in_parent(sctx,
+						      sctx->cur_inode_last_extent,
+						      sctx->cur_inode_size);
+				if (ret < 0) {
 					goto out;
+				} else if (ret == 0) {
+					ret = send_hole(sctx, sctx->cur_inode_size);
+					if (ret < 0)
+						goto out;
+				} else {
+					/* Range is already a hole, skip. */
+					ret = 0;
+				}
 			}
 		}
 		if (need_truncate) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] btrfs: send: avoid duplicated search for last extent when sending hole
  2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
  2024-02-19 11:59 ` [PATCH 1/2] btrfs: send: don't issue unnecessary zero writes for trailing hole fdmanana
@ 2024-02-19 11:59 ` fdmanana
  2024-02-19 19:00 ` [PATCH 0/2] btrfs: some optimizations for send Sweet Tea Dorminy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: fdmanana @ 2024-02-19 11:59 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

During an incremental send, before determining if we need to send a hole
(write operations full of zeroes) we will search for the last extent's
end offset if we are at the first slot of a leaf and the last processed
extent's end offset is smaller then the current extent's start offset.
However we are repeating this search in case we had the last extent's end
offset undefined (set to the (u64)-1 value) when we entered
maybe_send_hole(), wasting time.

So avoid this duplicated search by combining the two conditions that
trigger a search for the last extent's end offset into a single if
statement.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index dc18d5624ec7..a5da096c64fe 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6476,21 +6476,18 @@ static int maybe_send_hole(struct send_ctx *sctx, struct btrfs_path *path,
 	if (sctx->cur_ino != key->objectid || !need_send_hole(sctx))
 		return 0;
 
-	if (sctx->cur_inode_last_extent == (u64)-1) {
-		ret = get_last_extent(sctx, key->offset - 1);
-		if (ret)
-			return ret;
-	}
-
-	if (path->slots[0] == 0 &&
-	    sctx->cur_inode_last_extent < key->offset) {
-		/*
-		 * We might have skipped entire leafs that contained only
-		 * file extent items for our current inode. These leafs have
-		 * a generation number smaller (older) than the one in the
-		 * current leaf and the leaf our last extent came from, and
-		 * are located between these 2 leafs.
-		 */
+	/*
+	 * Get last extent's end offset (exclusive) if we haven't determined it
+	 * yet (we're processing the first file extent item that is new), or if
+	 * we're at the first slot of a leaf and the last extent's end is less
+	 * than the current extent's offset, because we might have skipped
+	 * entire leaves that contained only file extent items for our current
+	 * inode. These leaves have a generation number smaller (older) than the
+	 * one in the current leaf and the leaf our last extent came from, and
+	 * are located between these 2 leaves.
+	 */
+	if ((sctx->cur_inode_last_extent == (u64)-1) ||
+	    (path->slots[0] == 0 && sctx->cur_inode_last_extent < key->offset)) {
 		ret = get_last_extent(sctx, key->offset - 1);
 		if (ret)
 			return ret;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] btrfs: some optimizations for send
  2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
  2024-02-19 11:59 ` [PATCH 1/2] btrfs: send: don't issue unnecessary zero writes for trailing hole fdmanana
  2024-02-19 11:59 ` [PATCH 2/2] btrfs: send: avoid duplicated search for last extent when sending hole fdmanana
@ 2024-02-19 19:00 ` Sweet Tea Dorminy
  2024-02-19 19:10 ` David Sterba
  2024-02-20 15:21 ` Josef Bacik
  4 siblings, 0 replies; 6+ messages in thread
From: Sweet Tea Dorminy @ 2024-02-19 19:00 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2/19/24 06:59, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The following are two optimizations for send, one to avoid sending
> unnecessary holes (writes full of zeros), which can waste a lot of space
> at the receiver and increase stream size for cases where sparse files are
> used, such as images of thin provisioned filesystems for example as
> recently reported by a user. The second is just a small optimization to
> avoid repeating a btree search. More details in the respective change logs.
> 
> Filipe Manana (2):
>    btrfs: send: don't issue unnecessary zero writes for trailing hole
>    btrfs: send: avoid duplicated search for last extent when sending hole
> 
>   fs/btrfs/send.c | 44 +++++++++++++++++++++++++-------------------
>   1 file changed, 25 insertions(+), 19 deletions(-)
> 

For both:
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] btrfs: some optimizations for send
  2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
                   ` (2 preceding siblings ...)
  2024-02-19 19:00 ` [PATCH 0/2] btrfs: some optimizations for send Sweet Tea Dorminy
@ 2024-02-19 19:10 ` David Sterba
  2024-02-20 15:21 ` Josef Bacik
  4 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2024-02-19 19:10 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Mon, Feb 19, 2024 at 11:59:29AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The following are two optimizations for send, one to avoid sending
> unnecessary holes (writes full of zeros), which can waste a lot of space
> at the receiver and increase stream size for cases where sparse files are
> used, such as images of thin provisioned filesystems for example as
> recently reported by a user. The second is just a small optimization to
> avoid repeating a btree search. More details in the respective change logs.
> 
> Filipe Manana (2):
>   btrfs: send: don't issue unnecessary zero writes for trailing hole
>   btrfs: send: avoid duplicated search for last extent when sending hole

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] btrfs: some optimizations for send
  2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
                   ` (3 preceding siblings ...)
  2024-02-19 19:10 ` David Sterba
@ 2024-02-20 15:21 ` Josef Bacik
  4 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2024-02-20 15:21 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Mon, Feb 19, 2024 at 11:59:29AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The following are two optimizations for send, one to avoid sending
> unnecessary holes (writes full of zeros), which can waste a lot of space
> at the receiver and increase stream size for cases where sparse files are
> used, such as images of thin provisioned filesystems for example as
> recently reported by a user. The second is just a small optimization to
> avoid repeating a btree search. More details in the respective change logs.
> 

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-02-20 15:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-19 11:59 [PATCH 0/2] btrfs: some optimizations for send fdmanana
2024-02-19 11:59 ` [PATCH 1/2] btrfs: send: don't issue unnecessary zero writes for trailing hole fdmanana
2024-02-19 11:59 ` [PATCH 2/2] btrfs: send: avoid duplicated search for last extent when sending hole fdmanana
2024-02-19 19:00 ` [PATCH 0/2] btrfs: some optimizations for send Sweet Tea Dorminy
2024-02-19 19:10 ` David Sterba
2024-02-20 15:21 ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox