[PATCH 0/2] btrfs-progs: mkfs/rootdir: add hole detection

public inbox for linux-btrfs@vger.kernel.org

* [PATCH 0/2] btrfs-progs: mkfs/rootdir: add hole detection
@ 2026-03-23  5:51 Qu Wenruo
  2026-03-23  5:51 ` [PATCH 1/2] " Qu Wenruo
  2026-03-23  5:51 ` [PATCH 2/2] btrfs-progs: mkfs-tests: add a test case for mkfs " Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2026-03-23  5:51 UTC
  To: linux-btrfs

This patchset has already been submitted as a GitHub PR; this posting is
mostly for tracking purposes:

 https://github.com/kdave/btrfs-progs/pull/1097

Currently mkfs.btrfs --rootdir always reads the full content of every
file in rootdir and writes it into the target fs, holes included.

This is IO-intensive, as the data must be read from the host fs and
written into the target btrfs, and CPU-intensive, as the data checksum
must be generated for it.

With hole detection we not only skip the IO and csum generation, but
also reduce the space usage of the target fs.

The first patch implements the hole detection, and the second one adds
a test case for it.

A small note on the test case: since the first patch has to handle the
2GiB hole size problem, the test case should ideally include a huge hole
(>= 2GiB). Unfortunately that would greatly slow down the test case,
since md5sum has no shortcut for handling large holes.

To keep the CI machine time down, I intentionally limited the hole size
in the test case.

If one day we are no longer limited by the free machine time from GitHub
CI, or have a local CI worker, I'm definitely fine with adding a large
hole to the test case.

Qu Wenruo (2):
  btrfs-progs: mkfs/rootdir: add hole detection
  btrfs-progs: mkfs-tests: add a test case for mkfs hole detection

 mkfs/rootdir.c                              | 36 ++++++++++-
 tests/mkfs-tests/041-hole-detection/test.sh | 70 +++++++++++++++++++++
 2 files changed, 103 insertions(+), 3 deletions(-)
 create mode 100755 tests/mkfs-tests/041-hole-detection/test.sh

--
2.53.0


* [PATCH 1/2] btrfs-progs: mkfs/rootdir: add hole detection
  2026-03-23  5:51 [PATCH 0/2] btrfs-progs: mkfs/rootdir: add hole detection Qu Wenruo
@ 2026-03-23  5:51 ` Qu Wenruo
  2026-03-23  5:51 ` [PATCH 2/2] btrfs-progs: mkfs-tests: add a test case for mkfs " Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2026-03-23  5:51 UTC
  To: linux-btrfs

Currently mkfs.btrfs --rootdir cannot detect holes, thus it always reads
the full content of each file and writes it into the destination
filesystem.

This is time consuming for both the IO and the checksum calculation, so
skipping those holes can save a lot of CPU and IO time.

This patch adds such detection through the SEEK_DATA whence value of
lseek64().

There are some minor corner cases:

- If @file_pos is already inside a hole that extends to EOF
  lseek64() will return -1 and set errno to ENXIO.
  We need to handle this case properly rather than treating it as an
  error.

- The host fs may have a different block size
  The returned offset from lseek64() is only aligned to the host fs
  block size, which can be different from the target btrfs block size.
  This is especially important as we are going to support all block
  sizes, thus we need to take extra care.

  Here we keep it simple by rejecting any returned position that is not
  aligned to the target fs block size, and falling back to the regular
  read/write path.

  In theory we can do better, but I believe 4K fs blocks will be the
  standard, thus we do not need to bother too much yet.

- Limit the read length to the next hole start
  Otherwise we would read out the hole just after the non-hole range.

- Prevent return value overflow of add_file_item_extent()
  That function returns the number of bytes processed as an int, so for
  a huge hole (>= 2GiB) the length would overflow INT_MAX and make the
  caller treat the return value as an error.

  Here we limit the hole length to 1GiB to prevent the overflow.
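
The corner cases above can be sketched as a standalone helper. This is
only an illustration under stated assumptions, not the code from the
patch: next_segment(), struct segment and MAX_HOLE_LEN are hypothetical
names, plain lseek() with a 64-bit off_t stands in for lseek64(), and
the "next <= pos" guard on the SEEK_HOLE result is an addition of this
sketch.

```c
#define _GNU_SOURCE           /* SEEK_DATA / SEEK_HOLE on glibc */
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t for plain lseek() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>            /* open(), for callers that obtain the fd */
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA             /* Linux values, in case the libc hides them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* 1GiB cap, so a length returned through an int can never overflow. */
#define MAX_HOLE_LEN (UINT64_C(1) << 30)

/* Hypothetical result type for this sketch; not a btrfs-progs structure. */
struct segment {
	bool is_hole;   /* true: nothing to read or checksum in this range */
	uint64_t len;   /* bytes covered, starting at the queried position */
};

static uint64_t round_up_u64(uint64_t v, uint64_t align)
{
	return (v + align - 1) / align * align;
}

/* Classify the range starting at @pos in a file of @size bytes. */
static struct segment next_segment(int fd, uint64_t pos, uint64_t size,
				   uint64_t blocksize)
{
	struct segment seg = { .is_hole = false, .len = 0 };
	off_t next;

	assert(pos < size);

	next = lseek(fd, (off_t)pos, SEEK_DATA);
	/* ENXIO: no data between @pos and EOF, the hole runs to the end. */
	if (next == (off_t)-1 && errno == ENXIO)
		next = (off_t)round_up_u64(size, blocksize);

	/* Only trust data offsets aligned to the target block size. */
	if (next != (off_t)-1 && (uint64_t)next > pos &&
	    (uint64_t)next % blocksize == 0) {
		uint64_t len = (uint64_t)next - pos;

		seg.is_hole = true;
		seg.len = len < MAX_HOLE_LEN ? len : MAX_HOLE_LEN;
		return seg;
	}

	/*
	 * Data, or the fallback path: stop at the next usable hole, else at
	 * EOF. SEEK_HOLE returns @pos itself when @pos sits in a hole that
	 * was rejected above, hence the extra "next <= pos" guard here.
	 */
	next = lseek(fd, (off_t)pos, SEEK_HOLE);
	if (next == (off_t)-1 || (uint64_t)next <= pos ||
	    (uint64_t)next % blocksize != 0 || (uint64_t)next > size)
		next = (off_t)size;
	seg.len = (uint64_t)next - pos;
	return seg;
}
```

A caller would loop from offset 0, adding seg.len to the position after
each call, and only read and checksum the ranges where is_hole is false.
Unlike the patch, the sketch does not additionally cap data ranges by a
buffer size.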

Here is a simple benchmark, almost the best case scenario:

 rootdir/
 \- middle_hole

The file "middle_hole" is generated by the following command:

 xfs_io -f -c "pwrite 2g 4k" -c "truncate 5g" middle_hole

So it's a single 4K block in the middle of a 5G sparse file.
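
For readers without xfs_io at hand, a file with the same layout can be
produced with pwrite() and ftruncate(). The following is a minimal
sketch, not part of the patch; make_sparse_file() and DATA_LEN are
hypothetical names, and the 0xcd fill byte is an arbitrary non-zero
choice.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t for multi-GiB offsets */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DATA_LEN 4096  /* one 4K data block, as in the benchmark file */

/*
 * Write one DATA_LEN block at @data_off, then extend the file to
 * @total_size without writing anything else, leaving holes around the
 * block on filesystems that support sparse files.
 * Returns 0 on success, -1 on failure.
 */
static int make_sparse_file(const char *path, off_t data_off,
			    off_t total_size)
{
	char buf[DATA_LEN];
	int ret = -1;
	int fd;

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	memset(buf, 0xcd, sizeof(buf));  /* arbitrary non-zero fill */
	if (pwrite(fd, buf, sizeof(buf), data_off) != (ssize_t)sizeof(buf))
		goto out;
	/* Growing the file with ftruncate() allocates no blocks. */
	if (ftruncate(fd, total_size) != 0)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling make_sparse_file("middle_hole", (off_t)2 << 30, (off_t)5 << 30)
would mirror the offsets of the xfs_io command above.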

Unpatched timed mkfs:

 real	0m24.824s
 user	0m22.213s
 sys	0m2.078s

Patched timed mkfs:

 real	0m0.026s
 user	0m0.006s
 sys	0m0.007s

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/rootdir.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 7bdd6245f0d6..ed0fcfccd42a 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -798,6 +798,7 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
 	bool datasum = true;
 	ssize_t comp_ret;
 	u64 flags = btrfs_stack_inode_flags(btrfs_inode);
+	off64_t next;
 
 	if (g_do_reflink || flags & BTRFS_INODE_NOCOMPRESS)
 		do_comp = false;
@@ -807,9 +808,38 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
 		do_comp = false;
 	}
 
-	buf_size = do_comp ? BTRFS_MAX_COMPRESSED : MAX_EXTENT_SIZE;
-	to_read = min(file_pos + buf_size, source->size) - file_pos;
+	next = lseek64(source->fd, file_pos, SEEK_DATA);
+	/* The current offset is inside a hole that extends to the end of the file. */
+	if (next == (off64_t)-1 && errno == ENXIO)
+		next = round_up(source->size, sectorsize);
+	if (next != (off64_t)-1 && next > file_pos && IS_ALIGNED(next, sectorsize)) {
+		/*
+		 * Limit the hole size to 1G to avoid overflowing the int return
+		 * value, otherwise a hole length >= 2G would be treated as an error.
+		 */
+		const u64 length = min_t(u64, next - file_pos, SZ_1G);
 
+		btrfs_set_stack_file_extent_num_bytes(&stack_fi, length);
+		btrfs_set_stack_file_extent_ram_bytes(&stack_fi, length);
+		ret = btrfs_insert_file_extent(trans, root, objectid, file_pos, &stack_fi);
+		if (ret < 0) {
+			error("cannot insert hole for range [%llu, %llu)",
+			      file_pos, file_pos + length);
+			return ret;
+		}
+		return length;
+	}
+
+	/*
+	 * We have skipped to the next data, try to locate the next hole
+	 * to limit the read size.
+	 */
+	next = lseek64(source->fd, file_pos, SEEK_HOLE);
+	if (next == (off64_t)-1 || !IS_ALIGNED(next, sectorsize) || next > source->size)
+		next = source->size;
+
+	buf_size = do_comp ? BTRFS_MAX_COMPRESSED : MAX_EXTENT_SIZE;
+	to_read = min_t(u64, file_pos + buf_size, next) - file_pos;
 	bytes_read = 0;
 
 	while (bytes_read < to_read) {
@@ -874,7 +904,7 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
 			btrfs_set_stack_inode_flags(btrfs_inode, flags);
 
 			buf_size = MAX_EXTENT_SIZE;
-			to_read = min(file_pos + buf_size, source->size) - file_pos;
+			to_read = min_t(u64, file_pos + buf_size, next) - file_pos;
 
 			while (bytes_read < to_read) {
 				ssize_t ret_read;
-- 
2.53.0



* [PATCH 2/2] btrfs-progs: mkfs-tests: add a test case for mkfs hole detection
  2026-03-23  5:51 [PATCH 0/2] btrfs-progs: mkfs/rootdir: add hole detection Qu Wenruo
  2026-03-23  5:51 ` [PATCH 1/2] " Qu Wenruo
@ 2026-03-23  5:51 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2026-03-23  5:51 UTC
  To: linux-btrfs

The simple test case includes the following 3 patterns inside the
rootdir for hole detection:

- Full hole
  The whole file is a single hole.

- Middle data
  The head and tail parts are holes, and there is a single block of data
  in the middle.

- Middle hole
  There is a single data block at both the head and the tail of the
  file, with a hole in between.

Those files are inside a temporary directory, which is normally inside
'/tmp' and thus uses the page size as block size, but it can still be on
another fs.
Thus we have to properly grab the block size of the temporary directory
and check that we can mount the created fs first.
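
The same two checks can be sketched in C. This is an illustration only,
not what the test script runs: fs_block_size() and size_in_list() are
hypothetical helpers, and statvfs()'s f_frsize is assumed to be roughly
what 'stat -f -c %S' reports.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/statvfs.h>

/* Fundamental block size of the fs holding @path, or 0 on error. */
static unsigned long fs_block_size(const char *path)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return 0;
	return sv.f_frsize;
}

/*
 * Check whether @size appears in a whitespace-separated list of numbers,
 * such as the content of
 * /sys/fs/btrfs/features/supported_sectorsizes.
 */
static bool size_in_list(const char *list, unsigned long size)
{
	const char *p = list;
	char *end;

	for (;;) {
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)           /* no further number to parse */
			return false;
		if (v == size)
			return true;
		p = end;
	}
}
```

If the block size of the temporary directory is not in the supported
list, the test is skipped rather than failed.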

The test then verifies the following aspects of those files after
mounting the created fs:

- The total amount of blocks written to the fs
  This is done by a plain 'du' call.
  We should get 3 data blocks written to the target fs thanks to the
  hole detection.

- The content of the files
  They should not change compared to the data in rootdir.

- Basic btrfs check
  The resulting fs should always pass btrfs check.
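
The 'du' check works because the space allocated to a file is tracked
separately from its length. A minimal C sketch of that distinction,
assuming Linux, where st_blocks is counted in 512-byte units;
allocated_bytes() and apparent_bytes() are hypothetical helpers, not
part of the test:

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit st_size and st_blocks */
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Bytes actually allocated for @path, or -1 on error. */
static int64_t allocated_bytes(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	/* On Linux st_blocks uses 512-byte units, whatever the fs block size. */
	return (int64_t)st.st_blocks * 512;
}

/* Length of @path as seen by readers (holes included), or -1 on error. */
static int64_t apparent_bytes(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (int64_t)st.st_size;
}
```

For a file consisting of a single hole, apparent_bytes() returns the
full length while allocated_bytes() is expected to stay at or near
zero, which is why preserved holes show up as a low block count.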

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/mkfs-tests/041-hole-detection/test.sh | 70 +++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100755 tests/mkfs-tests/041-hole-detection/test.sh

diff --git a/tests/mkfs-tests/041-hole-detection/test.sh b/tests/mkfs-tests/041-hole-detection/test.sh
new file mode 100755
index 000000000000..8c21762ae6fe
--- /dev/null
+++ b/tests/mkfs-tests/041-hole-detection/test.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# Test basic hole detection features
+
+source "$TEST_TOP/common" || exit
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+check_global_prereq du
+
+if ! [ -f "/sys/fs/btrfs/features/supported_sectorsizes" ]; then
+	_not_run "kernel support for different block sizes missing"
+fi
+
+setup_root_helper
+prepare_test_dev
+
+tmp=$(_mktemp_dir mkfs-rootdir)
+
+# Get the fs block size, normally it's page size (using tmpfs for /tmp),
+# but it can still be other values if /tmp is on a regular fs.
+blocksize=$(stat -f -c %S "$tmp")
+
+blocksize_supported=false
+for bs in $(cat /sys/fs/btrfs/features/supported_sectorsizes); do
+	if [ "$blocksize" == "$bs" ]; then
+		blocksize_supported=true
+	fi
+done
+
+if [ "$blocksize_supported" != "true" ]; then
+	_not_run "kernel support for mounting blocksize $blocksize is missing"
+fi
+
+run_check truncate -s 1M "$tmp/full_hole"
+full_hole_before=$(run_check_stdout md5sum "$tmp/full_hole" | awk '{print $1}')
+run_check dd if=/dev/urandom of="$tmp/middle_data" bs=$blocksize seek=16 count=1
+run_check truncate -s $(($blocksize * 32)) "$tmp/middle_data"
+middle_data_before=$(run_check_stdout md5sum "$tmp/middle_data" | awk '{print $1}')
+run_check dd if=/dev/urandom of="$tmp/middle_hole" bs=$blocksize count=1
+run_check dd if=/dev/urandom of="$tmp/middle_hole" bs=$blocksize count=1 seek=16
+middle_hole_before=$(run_check_stdout md5sum "$tmp/middle_hole" | awk '{print $1}')
+
+run_check_mkfs_test_dev -s $blocksize --rootdir "$tmp"
+run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+run_check_mount_test_dev
+
+# Only 3 blocks are written, thus 'du' should report exactly those 3 blocks
+# as used.
+blocks=$(run_check_stdout du -B $blocksize "$TEST_MNT" | awk '{print $1}')
+
+if [ "$blocks" != "3" ]; then
+	_fail "Unexpected number of blocks written, has $blocks expect 3"
+fi
+
+full_hole_after=$(run_check_stdout md5sum "$TEST_MNT/full_hole" | awk '{print $1}')
+middle_data_after=$(run_check_stdout md5sum "$TEST_MNT/middle_data" | awk '{print $1}')
+middle_hole_after=$(run_check_stdout md5sum "$TEST_MNT/middle_hole" | awk '{print $1}')
+
+if [ "$full_hole_before" != "$full_hole_after" ]; then
+	_fail "full_hole content changed"
+fi
+if [ "$middle_data_before" != "$middle_data_after" ]; then
+	_fail "middle_data content changed"
+fi
+if [ "$middle_hole_before" != "$middle_hole_after" ]; then
+	_fail "middle_hole content changed"
+fi
+
+run_check_umount_test_dev
+run_check rm -rf -- "$tmp"
-- 
2.53.0

