* [PATCH 1/2] btrfs-progs: mkfs/rootdir: add hole detection
From: Qu Wenruo @ 2026-03-23 5:51 UTC
To: linux-btrfs
Currently mkfs.btrfs --rootdir cannot detect holes, thus it always
reads out the full content of each file, holes included, and writes it
into the destination filesystem.
Reading and checksumming those zeros costs both IO and CPU time, so
skipping the holes can save a lot of both.
This patch adds such detection through the SEEK_DATA and SEEK_HOLE
whence values of lseek64().
There are some minor corner cases:
- If the @file_pos is already inside a hole covering EOF
lseek64() will return -1 and set errno to ENXIO.
We need to properly handle this case rather than treating it as an
error.
- The host fs may have a different block size
The returned offset from lseek64() is only aligned to the host fs
block size, which can be different from the target btrfs block size.
This is especially important as we are going to support all block
sizes, thus we need to take extra care.
Here we make it simple by rejecting any returned position that is not
aligned to the target fs block size, and falling back to the regular
read/write path.
In theory we can do better, but I believe 4K fs block will be the
standard thus we do not need to bother too much yet.
- Limit the read length to the next hole start
Otherwise the read can run into the hole right after the data range.
- Prevent return value overflow of add_file_item_extent()
That function returns the number of bytes processed as an int, so for
a huge hole (>= 2GiB) the length will overflow INT_MAX and make the
caller treat the return value as an error.
Here we limit the hole length to 1GiB to prevent the overflow.
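The corner cases above can be condensed into one small decision function.
The following is only a sketch, not the code of this patch: hole_length()
and its parameters are hypothetical names, and the lseek() result and errno
are passed in as plain values so the logic can be read without a real file.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper (not part of the patch): given the result of
 * lseek(fd, file_pos, SEEK_DATA), return the number of bytes that can be
 * recorded as a hole starting at file_pos, or 0 when the caller should
 * fall back to the regular read/write path.
 */
static uint64_t hole_length(uint64_t file_pos, int64_t seek_ret, int seek_errno,
			    uint64_t file_size, uint32_t blocksize)
{
	uint64_t next;

	assert(blocksize > 0);

	if (seek_ret < 0) {
		/* Only ENXIO means "file_pos is in a hole that runs to EOF". */
		if (seek_errno != ENXIO)
			return 0;
		/* Round the file size up to the target block size. */
		next = (file_size + blocksize - 1) / blocksize * blocksize;
	} else {
		next = (uint64_t)seek_ret;
	}

	/* Already at data, or the host fs boundary is not target-aligned. */
	if (next <= file_pos || next % blocksize != 0)
		return 0;

	/* Cap at 1GiB so the length always fits into an int return value. */
	return next - file_pos < SZ_1G ? next - file_pos : SZ_1G;
}
```

With a 4K target block size, a hole from offset 0 to EOF of a 5G file is
reported in 1GiB pieces, while a data start at offset 4608 (aligned only to
a 512 byte host block size) yields 0 and takes the regular read path.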
Here is a simple benchmark, almost the best case scenario:
rootdir/
\- middle_hole
The file "middle_hole" is generated by the following command:
xfs_io -f -c "pwrite 2g 4k" -c "truncate 5g" middle_hole
So it is a single 4K data block at offset 2G of a 5G sparse file.
Unpatched timed mkfs:
real 0m24.824s
user 0m22.213s
sys 0m2.078s
Patched timed mkfs:
real 0m0.026s
user 0m0.006s
sys 0m0.007s
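For reference, the layout of a file like "middle_hole" can be reproduced
and inspected from C. This is a minimal sketch under a few assumptions:
make_sparse_file() and count_data_bytes() are hypothetical helpers, plain
lseek()/off_t are used instead of lseek64()/off64_t (equivalent on 64-bit
hosts), and the SEEK_DATA/SEEK_HOLE fallback values are the Linux ones.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Write data_len bytes at data_off, then extend the file to total bytes. */
static int make_sparse_file(int fd, off_t data_off, size_t data_len, off_t total)
{
	char *buf;
	ssize_t written;

	assert(data_len > 0);
	buf = malloc(data_len);
	if (!buf)
		return -1;
	memset(buf, 0xaa, data_len);
	written = pwrite(fd, buf, data_len, data_off);
	free(buf);
	if (written != (ssize_t)data_len)
		return -1;
	return ftruncate(fd, total);
}

/* Sum up the bytes the host fs reports as data, or return -1 on error. */
static int64_t count_data_bytes(int fd, off_t size)
{
	int64_t total = 0;
	off_t pos = 0;

	while (pos < size) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		off_t hole;

		if (data == (off_t)-1)
			/* ENXIO: only a hole is left up to EOF. */
			return errno == ENXIO ? total : -1;
		if (data >= size)
			break;
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole == (off_t)-1 || hole <= data)
			return -1;
		if (hole > size)
			hole = size;
		total += hole - data;
		pos = hole;
	}
	return total;
}
```

Calling make_sparse_file(fd, 2LL << 30, 4096, 5LL << 30) should give the
same layout as the xfs_io command above. The value returned by
count_data_bytes() depends on the host fs block size, so it can be larger
than the 4K actually written.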
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
mkfs/rootdir.c | 36 +++++++++++++++++++++++++++++++++---
1 file changed, 33 insertions(+), 3 deletions(-)
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 7bdd6245f0d6..ed0fcfccd42a 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -798,6 +798,7 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
bool datasum = true;
ssize_t comp_ret;
u64 flags = btrfs_stack_inode_flags(btrfs_inode);
+ off64_t next;
if (g_do_reflink || flags & BTRFS_INODE_NOCOMPRESS)
do_comp = false;
@@ -807,9 +808,38 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
do_comp = false;
}
- buf_size = do_comp ? BTRFS_MAX_COMPRESSED : MAX_EXTENT_SIZE;
- to_read = min(file_pos + buf_size, source->size) - file_pos;
+ next = lseek64(source->fd, file_pos, SEEK_DATA);
+ /* The current offset is inside a hole extending to the end of the file. */
+ if (next == (off64_t)-1 && errno == ENXIO)
+ next = round_up(source->size, sectorsize);
+ if (next != (off64_t)-1 && next > file_pos && IS_ALIGNED(next, sectorsize)) {
+ /*
+ * Limit the hole size to 1G to avoid overflowing the int return type,
+ * otherwise a hole length >= 2G would be treated as an error.
+ */
+ const u64 length = min_t(u64, next - file_pos, SZ_1G);
+ btrfs_set_stack_file_extent_num_bytes(&stack_fi, length);
+ btrfs_set_stack_file_extent_ram_bytes(&stack_fi, length);
+ ret = btrfs_insert_file_extent(trans, root, objectid, file_pos, &stack_fi);
+ if (ret < 0) {
+ error("cannot insert hole for range [%llu, %llu)",
+ file_pos, file_pos + length);
+ return ret;
+ }
+ return length;
+ }
+
+ /*
+ * We have skipped to the next data, try to locate the next hole
+ * to limit the read size.
+ */
+ next = lseek64(source->fd, file_pos, SEEK_HOLE);
+ if (next == (off64_t)-1 || next <= file_pos || !IS_ALIGNED(next, sectorsize) || next > source->size)
+ next = source->size;
+
+ buf_size = do_comp ? BTRFS_MAX_COMPRESSED : MAX_EXTENT_SIZE;
+ to_read = min_t(u64, file_pos + buf_size, next) - file_pos;
bytes_read = 0;
while (bytes_read < to_read) {
@@ -874,7 +904,7 @@ static int add_file_item_extent(struct btrfs_trans_handle *trans,
btrfs_set_stack_inode_flags(btrfs_inode, flags);
buf_size = MAX_EXTENT_SIZE;
- to_read = min(file_pos + buf_size, source->size) - file_pos;
+ to_read = min_t(u64, file_pos + buf_size, next) - file_pos;
while (bytes_read < to_read) {
ssize_t ret_read;
--
2.53.0
* [PATCH 2/2] btrfs-progs: mkfs-tests: add a test case for mkfs hole detection
From: Qu Wenruo @ 2026-03-23 5:51 UTC
To: linux-btrfs
The simple test case includes the following 3 patterns inside the
rootdir for hole detection:
- Full hole
The whole file is a single hole.
- Middle data
The head and tail parts are holes, and there is a single block of data
in the middle.
- Middle hole
There is a single data block at both the head and the tail of the
file, with a hole in between.
Those files are created inside a temporary directory, which is normally
under '/tmp' (tmpfs, thus using the page size as block size), but it can
still be on another fs.
Thus we have to grab the block size of the temporary directory and
check first if the kernel can mount a btrfs with that block size.
Then verify the following aspects of those files after mounting the
created fs:
- The total amount of blocks written to the fs
This is done by a plain 'du' call.
We should get 3 data blocks written to the target fs thanks to the
hole detection.
- The content of the files
They should not change compared to the data in rootdir.
- Basic btrfs check
The resulting fs should always pass btrfs check.
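The expected count of 3 comes from 0 blocks for full_hole, 1 for
middle_data and 2 for middle_hole. As a rough sketch of what the 'du' check
measures, assuming 'du' sums st_blocks from stat(2) in 512 byte units and
rounds up to the requested block size, the conversion looks like this.
allocated_fs_blocks() is a hypothetical name and not part of the test.

```c
#include <assert.h>
#include <stdint.h>

/* st_blocks from stat(2) is counted in 512 byte units on Linux. */
#define STAT_BLOCK_SIZE 512ULL

/* Convert a summed st_blocks value into fs blocks, rounding up. */
static uint64_t allocated_fs_blocks(uint64_t st_blocks, uint64_t blocksize)
{
	assert(blocksize > 0);
	return (st_blocks * STAT_BLOCK_SIZE + blocksize - 1) / blocksize;
}
```

For a 4K block size, three allocated data blocks show up as st_blocks == 24
in total, which converts back to 3.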
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tests/mkfs-tests/041-hole-detection/test.sh | 70 +++++++++++++++++++++
1 file changed, 70 insertions(+)
create mode 100755 tests/mkfs-tests/041-hole-detection/test.sh
diff --git a/tests/mkfs-tests/041-hole-detection/test.sh b/tests/mkfs-tests/041-hole-detection/test.sh
new file mode 100755
index 000000000000..8c21762ae6fe
--- /dev/null
+++ b/tests/mkfs-tests/041-hole-detection/test.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# Test basic hole detection features
+
+source "$TEST_TOP/common" || exit
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+check_global_prereq du
+
+if ! [ -f "/sys/fs/btrfs/features/supported_sectorsizes" ]; then
+ _not_run "kernel support for different block sizes missing"
+fi
+
+setup_root_helper
+prepare_test_dev
+
+tmp=$(_mktemp_dir mkfs-rootdir)
+
+# Get the fs block size, normally it's page size (using tmpfs for /tmp),
+# but it can still be other values if /tmp is on a regular fs.
+blocksize=$(stat -f -c %S "$tmp")
+
+blocksize_supported=false
+for bs in $(cat /sys/fs/btrfs/features/supported_sectorsizes); do
+ if [ "$blocksize" == "$bs" ]; then
+ blocksize_supported=true
+ fi
+done
+
+if [ "$blocksize_supported" != "true" ]; then
+ _not_run "kernel support for mounting blocksize $blocksize is missing"
+fi
+
+run_check truncate -s 1M "$tmp/full_hole"
+full_hole_before=$(run_check_stdout md5sum "$tmp/full_hole" | awk '{print $1}')
+run_check dd if=/dev/urandom of="$tmp/middle_data" bs=$blocksize seek=16 count=1
+run_check truncate -s $(($blocksize * 32)) "$tmp/middle_data"
+middle_data_before=$(run_check_stdout md5sum "$tmp/middle_data" | awk '{print $1}')
+run_check dd if=/dev/urandom of="$tmp/middle_hole" bs=$blocksize count=1
+run_check dd if=/dev/urandom of="$tmp/middle_hole" bs=$blocksize count=1 seek=16
+middle_hole_before=$(run_check_stdout md5sum "$tmp/middle_hole" | awk '{print $1}')
+
+run_check_mkfs_test_dev -s $blocksize --rootdir "$tmp"
+run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+run_check_mount_test_dev
+
+# There are only 3 blocks written, thus 'du' should only report such 3 blocks
+# used.
+blocks=$(run_check_stdout du -B $blocksize "$TEST_MNT" | awk '{print $1}')
+
+if [ "$blocks" != "3" ]; then
+ _fail "Unexpected number of blocks written, has $blocks, expected 3"
+fi
+
+full_hole_after=$(run_check_stdout md5sum "$TEST_MNT/full_hole" | awk '{print $1}')
+middle_data_after=$(run_check_stdout md5sum "$TEST_MNT/middle_data" | awk '{print $1}')
+middle_hole_after=$(run_check_stdout md5sum "$TEST_MNT/middle_hole" | awk '{print $1}')
+
+if [ "$full_hole_before" != "$full_hole_after" ]; then
+ _fail "full_hole content changed"
+fi
+if [ "$middle_data_before" != "$middle_data_after" ]; then
+ _fail "middle_data content changed"
+fi
+if [ "$middle_hole_before" != "$middle_hole_after" ]; then
+ _fail "middle_hole content changed"
+fi
+
+run_check_umount_test_dev
+run_check rm -rf -- "$tmp"
--
2.53.0