public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] generic/790: test post-EOF gap zeroing persistence
@ 2026-04-24  9:22 Zhang Yi
  2026-04-24 13:09 ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Zhang Yi @ 2026-04-24  9:22 UTC (permalink / raw)
  To: fstests, zlang
  Cc: linux-ext4, linux-fsdevel, bfoster, jack, yi.zhang, yi.zhang,
	yizhang089, yangerkun

From: Zhang Yi <yi.zhang@huawei.com>

Test that extending a file past a non-block-aligned EOF correctly
zero-fills the gap [old_EOF, block_boundary), and that this zeroing
persists through a filesystem shutdown+remount cycle.

Stale data beyond EOF can persist on disk when append write data blocks
are flushed before the on-disk file size update, or when concurrent
append writeback and mmap writes persist non-zero data past EOF.
Subsequent post-EOF operations (append write, fallocate, truncate up)
must zero-fill and persist the gap to prevent exposing stale data.

The test pollutes the file's last physical block (via FIEMAP + raw
device write) with a sentinel pattern beyond i_size, then performs each
extend operation and verifies the gap is zeroed both in memory and on
disk.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v2:
 - Add _require_no_realtime to prevent testing on XFS realtime devices,
   where file data may reside on $SCRATCH_RTDEV.
 - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
   physical device offsets, writing to these offsets on $SCRATCH_DEV
   would corrupt the filesystem in multi-device setups. Besides, since
   btrfs doesn't support shutdown right now, we can support it later.
 - Add -v flag to od in _check_gap_zero() to prevent line folding of
   identical consecutive lines.
 - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
   size was not rolled back after shutdown+remount cycle, and also drop
   the unnecessary file size check before the shutdown as well.
 - Clarify the comment regarding when stale data beyond EOF can persist.

 tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
 tests/generic/790.out |   4 ++
 2 files changed, 168 insertions(+)
 create mode 100755 tests/generic/790
 create mode 100644 tests/generic/790.out

diff --git a/tests/generic/790 b/tests/generic/790
new file mode 100755
index 00000000..2adc06f8
--- /dev/null
+++ b/tests/generic/790
@@ -0,0 +1,164 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Huawei.  All Rights Reserved.
+#
+# FS QA Test No. 790
+#
+# Test that extending a file past a non-block-aligned EOF correctly zero-fills
+# the gap [old_EOF, block_boundary), and that this zeroing persists through a
+# filesystem shutdown+remount cycle.
+#
+# Stale data beyond EOF can persist on disk when:
+# 1) append write data blocks are flushed before the on-disk file size update,
+#    and the system crashes in this window.
+# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
+#
+# Subsequent post-EOF operations (append write, fallocate, truncate up) must
+# zero-fill and persist the gap to prevent exposing stale data.
+#
+# The test pollutes the file's last physical block (via FIEMAP + raw device
+# write) with a sentinel pattern beyond i_size, then performs each extend
+# operation and verifies the gap is zeroed both in memory and on disk.
+#
+. ./common/preamble
+_begin_fstest auto quick rw shutdown
+
+. ./common/filter
+
+_require_scratch
+_require_block_device $SCRATCH_DEV
+_require_no_realtime
+_require_scratch_shutdown
+_require_metadata_journaling $SCRATCH_DEV
+
+# FIEMAP on Btrfs returns logical addresses within the filesystem's address
+# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
+# would corrupt the filesystem in multi-device setups.
+_exclude_fs btrfs
+
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "pwrite"
+_require_xfs_io_command "truncate"
+_require_xfs_io_command "sync_range"
+
+# Check that gap region [offset, offset+nbytes) is entirely zero
+_check_gap_zero()
+{
+	local file="$1"
+	local offset="$2"
+	local nbytes="$3"
+	local label="$4"
+	local data
+	local stripped
+
+	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
+
+	# Remove whitespace and check if any byte is non-zero
+	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
+	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
+		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
+		_hexdump -N $((offset + nbytes)) "$file"
+		return 1
+	fi
+	return 0
+}
+
+# Get the physical block offset (in bytes) of the file's first block on device
+_get_phys_offset()
+{
+	local file="$1"
+	local fiemap_output
+	local phys_blk
+
+	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
+	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
+	if [ -z "$phys_blk" ]; then
+		echo ""
+		return
+	fi
+	# Convert 512-byte blocks to bytes
+	echo $((phys_blk * 512))
+}
+
+_test_eof_zeroing()
+{
+	local test_name="$1"
+	local extend_cmd="$2"
+	local expected_new_sz="$3"
+	local file=$SCRATCH_MNT/testfile_${test_name}
+
+	echo "$test_name" | tee -a $seqres.full
+
+	# Compute non-block-aligned EOF offset
+	local gap_bytes=16
+	local eof_offset=$((blksz - gap_bytes))
+
+	# Step 1: Write one full block to ensure the filesystem allocates a
+	#         physical block for the file instead of using inline data.
+	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
+		"$file" >> $seqres.full 2>&1
+
+	# Step 2: Get physical block offset on device via FIEMAP
+	local phys_offset
+	phys_offset=$(_get_phys_offset "$file")
+	if [ -z "$phys_offset" ]; then
+		_fail "$test_name: failed to get physical block offset via fiemap"
+	fi
+
+	# Step 3: Truncate file to non-block-aligned size and fsync.
+	#         The on-disk region [eof_offset, blksz) may or may not be
+	#         zeroed by the filesystem at this point.
+	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
+		"$file" >> $seqres.full 2>&1
+
+	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
+	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
+	#         the stale pattern is present on disk. Then remount.
+	_scratch_unmount
+	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
+		$SCRATCH_DEV >> $seqres.full 2>&1
+	_scratch_mount >> $seqres.full 2>&1
+
+	# Step 5: Execute the extend operation.
+	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
+
+	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
+	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
+
+	# Step 7: Sync the extended range and shutdown the filesystem with
+	#         journal flush. This persists the file size extending, and
+	#         the filesystem should persist the zeroed data in the gap
+	#         range as well.
+	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
+		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
+			"$file" >> $seqres.full 2>&1
+	fi
+	_scratch_shutdown -f
+
+	# Step 8: Remount and verify gap is still zeroed
+	_scratch_cycle_mount
+
+	# Verify file size was not rolled back after shutdown+remount
+	local sz
+	sz=$(stat -c %s "$file")
+	if [ "$sz" -ne "$expected_new_sz" ]; then
+		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
+	fi
+
+	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
+}
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+blksz=$(_get_block_size $SCRATCH_MNT)
+
+# Test three variants of EOF-extending operations
+_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
+_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
+_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/790.out b/tests/generic/790.out
new file mode 100644
index 00000000..e5e2cc09
--- /dev/null
+++ b/tests/generic/790.out
@@ -0,0 +1,4 @@
+QA output created by 790
+append_write
+truncate_up
+fallocate
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] generic/790: test post-EOF gap zeroing persistence
  2026-04-24  9:22 [PATCH v2] generic/790: test post-EOF gap zeroing persistence Zhang Yi
@ 2026-04-24 13:09 ` Brian Foster
  2026-04-25  3:06   ` Zhang Yi
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2026-04-24 13:09 UTC (permalink / raw)
  To: Zhang Yi
  Cc: fstests, zlang, linux-ext4, linux-fsdevel, jack, yi.zhang,
	yizhang089, yangerkun

On Fri, Apr 24, 2026 at 05:22:28PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Test that extending a file past a non-block-aligned EOF correctly
> zero-fills the gap [old_EOF, block_boundary), and that this zeroing
> persists through a filesystem shutdown+remount cycle.
> 
> Stale data beyond EOF can persist on disk when append write data blocks
> are flushed before the on-disk file size update, or when concurrent
> append writeback and mmap writes persist non-zero data past EOF.
> Subsequent post-EOF operations (append write, fallocate, truncate up)
> must zero-fill and persist the gap to prevent exposing stale data.
> 
> The test pollutes the file's last physical block (via FIEMAP + raw
> device write) with a sentinel pattern beyond i_size, then performs each
> extend operation and verifies the gap is zeroed both in memory and on
> disk.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> v1->v2:
>  - Add _require_no_realtime to prevent testing on XFS realtime devices,
>    where file data may reside on $SCRATCH_RTDEV.
>  - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
>    physical device offsets, writing to these offsets on $SCRATCH_DEV
>    would corrupt the filesystem in multi-device setups. Besides, since
>    btrfs doesn't support shutdown right now, we can support it later.
>  - Add -v flag to od in _check_gap_zero() to prevent line folding of
>    identical consecutive lines.
>  - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
>    size was not rolled back after shutdown+remount cycle, and also drop
>    the unnecessary file size check before the shutdown as well.
>  - Clarify the comment regarding when stale data beyond EOF can persist.
> 

Thanks for the tweaks. This all LGTM from a review standpoint. I gave it
a quick test on latest master and I see a few failures in a couple runs:

- On XFS (mkfs defaults) I saw one unexpected i_size failure and one
  zeroing failure, both on write extension fwiw.
- On ext4 I saw a few unexpected i_size failures (both with mkfs
  defaults and 1k block size).

I haven't dug into anything beyond that. Does this match what you're
seeing on current kernels or are these unexpected failures?

Brian

>  tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/790.out |   4 ++
>  2 files changed, 168 insertions(+)
>  create mode 100755 tests/generic/790
>  create mode 100644 tests/generic/790.out
> 
> diff --git a/tests/generic/790 b/tests/generic/790
> new file mode 100755
> index 00000000..2adc06f8
> --- /dev/null
> +++ b/tests/generic/790
> @@ -0,0 +1,164 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2026 Huawei.  All Rights Reserved.
> +#
> +# FS QA Test No. 790
> +#
> +# Test that extending a file past a non-block-aligned EOF correctly zero-fills
> +# the gap [old_EOF, block_boundary), and that this zeroing persists through a
> +# filesystem shutdown+remount cycle.
> +#
> +# Stale data beyond EOF can persist on disk when:
> +# 1) append write data blocks are flushed before the on-disk file size update,
> +#    and the system crashes in this window.
> +# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
> +#
> +# Subsequent post-EOF operations (append write, fallocate, truncate up) must
> +# zero-fill and persist the gap to prevent exposing stale data.
> +#
> +# The test pollutes the file's last physical block (via FIEMAP + raw device
> +# write) with a sentinel pattern beyond i_size, then performs each extend
> +# operation and verifies the gap is zeroed both in memory and on disk.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick rw shutdown
> +
> +. ./common/filter
> +
> +_require_scratch
> +_require_block_device $SCRATCH_DEV
> +_require_no_realtime
> +_require_scratch_shutdown
> +_require_metadata_journaling $SCRATCH_DEV
> +
> +# FIEMAP on Btrfs returns logical addresses within the filesystem's address
> +# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
> +# would corrupt the filesystem in multi-device setups.
> +_exclude_fs btrfs
> +
> +_require_xfs_io_command "fiemap"
> +_require_xfs_io_command "falloc"
> +_require_xfs_io_command "pwrite"
> +_require_xfs_io_command "truncate"
> +_require_xfs_io_command "sync_range"
> +
> +# Check that gap region [offset, offset+nbytes) is entirely zero
> +_check_gap_zero()
> +{
> +	local file="$1"
> +	local offset="$2"
> +	local nbytes="$3"
> +	local label="$4"
> +	local data
> +	local stripped
> +
> +	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
> +
> +	# Remove whitespace and check if any byte is non-zero
> +	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
> +	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
> +		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
> +		_hexdump -N $((offset + nbytes)) "$file"
> +		return 1
> +	fi
> +	return 0
> +}
> +
> +# Get the physical block offset (in bytes) of the file's first block on device
> +_get_phys_offset()
> +{
> +	local file="$1"
> +	local fiemap_output
> +	local phys_blk
> +
> +	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
> +	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
> +	if [ -z "$phys_blk" ]; then
> +		echo ""
> +		return
> +	fi
> +	# Convert 512-byte blocks to bytes
> +	echo $((phys_blk * 512))
> +}
> +
> +_test_eof_zeroing()
> +{
> +	local test_name="$1"
> +	local extend_cmd="$2"
> +	local expected_new_sz="$3"
> +	local file=$SCRATCH_MNT/testfile_${test_name}
> +
> +	echo "$test_name" | tee -a $seqres.full
> +
> +	# Compute non-block-aligned EOF offset
> +	local gap_bytes=16
> +	local eof_offset=$((blksz - gap_bytes))
> +
> +	# Step 1: Write one full block to ensure the filesystem allocates a
> +	#         physical block for the file instead of using inline data.
> +	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
> +		"$file" >> $seqres.full 2>&1
> +
> +	# Step 2: Get physical block offset on device via FIEMAP
> +	local phys_offset
> +	phys_offset=$(_get_phys_offset "$file")
> +	if [ -z "$phys_offset" ]; then
> +		_fail "$test_name: failed to get physical block offset via fiemap"
> +	fi
> +
> +	# Step 3: Truncate file to non-block-aligned size and fsync.
> +	#         The on-disk region [eof_offset, blksz) may or may not be
> +	#         zeroed by the filesystem at this point.
> +	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
> +		"$file" >> $seqres.full 2>&1
> +
> +	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
> +	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
> +	#         the stale pattern is present on disk. Then remount.
> +	_scratch_unmount
> +	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
> +		$SCRATCH_DEV >> $seqres.full 2>&1
> +	_scratch_mount >> $seqres.full 2>&1
> +
> +	# Step 5: Execute the extend operation.
> +	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
> +
> +	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
> +	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
> +
> +	# Step 7: Sync the extended range and shutdown the filesystem with
> +	#         journal flush. This persists the file size extending, and
> +	#         the filesystem should persist the zeroed data in the gap
> +	#         range as well.
> +	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
> +		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
> +			"$file" >> $seqres.full 2>&1
> +	fi
> +	_scratch_shutdown -f
> +
> +	# Step 8: Remount and verify gap is still zeroed
> +	_scratch_cycle_mount
> +
> +	# Verify file size was not rolled back after shutdown+remount
> +	local sz
> +	sz=$(stat -c %s "$file")
> +	if [ "$sz" -ne "$expected_new_sz" ]; then
> +		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
> +	fi
> +
> +	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
> +}
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_scratch_mount
> +
> +blksz=$(_get_block_size $SCRATCH_MNT)
> +
> +# Test three variants of EOF-extending operations
> +_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
> +_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
> +_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/790.out b/tests/generic/790.out
> new file mode 100644
> index 00000000..e5e2cc09
> --- /dev/null
> +++ b/tests/generic/790.out
> @@ -0,0 +1,4 @@
> +QA output created by 790
> +append_write
> +truncate_up
> +fallocate
> -- 
> 2.52.0
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] generic/790: test post-EOF gap zeroing persistence
  2026-04-24 13:09 ` Brian Foster
@ 2026-04-25  3:06   ` Zhang Yi
  2026-04-27 13:03     ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Zhang Yi @ 2026-04-25  3:06 UTC (permalink / raw)
  To: Brian Foster
  Cc: fstests, zlang, linux-ext4, linux-fsdevel, jack, yi.zhang,
	yizhang089, yangerkun

On 4/24/2026 9:09 PM, Brian Foster wrote:
> On Fri, Apr 24, 2026 at 05:22:28PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Test that extending a file past a non-block-aligned EOF correctly
>> zero-fills the gap [old_EOF, block_boundary), and that this zeroing
>> persists through a filesystem shutdown+remount cycle.
>>
>> Stale data beyond EOF can persist on disk when append write data blocks
>> are flushed before the on-disk file size update, or when concurrent
>> append writeback and mmap writes persist non-zero data past EOF.
>> Subsequent post-EOF operations (append write, fallocate, truncate up)
>> must zero-fill and persist the gap to prevent exposing stale data.
>>
>> The test pollutes the file's last physical block (via FIEMAP + raw
>> device write) with a sentinel pattern beyond i_size, then performs each
>> extend operation and verifies the gap is zeroed both in memory and on
>> disk.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> v1->v2:
>>  - Add _require_no_realtime to prevent testing on XFS realtime devices,
>>    where file data may reside on $SCRATCH_RTDEV.
>>  - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
>>    physical device offsets, writing to these offsets on $SCRATCH_DEV
>>    would corrupt the filesystem in multi-device setups. Besides, since
>>    btrfs doesn't support shutdown right now, we can support it later.
>>  - Add -v flag to od in _check_gap_zero() to prevent line folding of
>>    identical consecutive lines.
>>  - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
>>    size was not rolled back after shutdown+remount cycle, and also drop
>>    the unnecessary file size check before the shutdown as well.
>>  - Clarify the comment regarding when stale data beyond EOF can persist.
>>
> 
> Thanks for the tweaks. This all LGTM from a review standpoint. I gave it
> a quick test on latest master and I see a few failures in a couple runs:
> 
> - On XFS (mkfs defaults) I saw one unexpected i_size failure and one
>   zeroing failure, both on write extension fwiw.

Previously, I only discovered the zeroing failure of append write. This
is because xfs_file_write_checks() -> xfs_file_write_zero_eof() only
zeroes the gap range in the page cache, without providing any
synchronous or asynchronous persistence (instead, truncate up does
synchronously writeback in xfs_vn_setattr_size(), and ext4 achieves
persistence via asynchronous writeback in data=ordered mode). So I think
this is a XFS problem.

Regarding the i_size failure, I did not directly reproduce this issue.
After analysis, I believe it is because the test case did not include
the -a option in sync_range, meaning it did not wait for IO writeback
completion and file size update persistence. I reproduced this issue by
adding a delay in the XFS end IO path. This is a problem with the test
case, and I will fix it in v3. Thank you for pointing this out.

> - On ext4 I saw a few unexpected i_size failures (both with mkfs
>   defaults and 1k block size).
> 

This is an ext4 issue on the shutdown path. Since ext4 set the shutdown
flag too early, it was unable to write back ordered zero data when
flushing the journal, which led to a journal abort and prevented the file
size update from being persisted. I have submitted a patch to fix this
issue. Please see below link for details.

 https://lore.kernel.org/linux-ext4/20260424104201.1930823-1-yi.zhang@huaweicloud.com/

Thanks
Yi.

> I haven't dug into anything beyond that. Does this match what you're
> seeing on current kernels or are these unexpected failures?
> 
> Brian
> 
>>  tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
>>  tests/generic/790.out |   4 ++
>>  2 files changed, 168 insertions(+)
>>  create mode 100755 tests/generic/790
>>  create mode 100644 tests/generic/790.out
>>
>> diff --git a/tests/generic/790 b/tests/generic/790
>> new file mode 100755
>> index 00000000..2adc06f8
>> --- /dev/null
>> +++ b/tests/generic/790
>> @@ -0,0 +1,164 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2026 Huawei.  All Rights Reserved.
>> +#
>> +# FS QA Test No. 790
>> +#
>> +# Test that extending a file past a non-block-aligned EOF correctly zero-fills
>> +# the gap [old_EOF, block_boundary), and that this zeroing persists through a
>> +# filesystem shutdown+remount cycle.
>> +#
>> +# Stale data beyond EOF can persist on disk when:
>> +# 1) append write data blocks are flushed before the on-disk file size update,
>> +#    and the system crashes in this window.
>> +# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
>> +#
>> +# Subsequent post-EOF operations (append write, fallocate, truncate up) must
>> +# zero-fill and persist the gap to prevent exposing stale data.
>> +#
>> +# The test pollutes the file's last physical block (via FIEMAP + raw device
>> +# write) with a sentinel pattern beyond i_size, then performs each extend
>> +# operation and verifies the gap is zeroed both in memory and on disk.
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto quick rw shutdown
>> +
>> +. ./common/filter
>> +
>> +_require_scratch
>> +_require_block_device $SCRATCH_DEV
>> +_require_no_realtime
>> +_require_scratch_shutdown
>> +_require_metadata_journaling $SCRATCH_DEV
>> +
>> +# FIEMAP on Btrfs returns logical addresses within the filesystem's address
>> +# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
>> +# would corrupt the filesystem in multi-device setups.
>> +_exclude_fs btrfs
>> +
>> +_require_xfs_io_command "fiemap"
>> +_require_xfs_io_command "falloc"
>> +_require_xfs_io_command "pwrite"
>> +_require_xfs_io_command "truncate"
>> +_require_xfs_io_command "sync_range"
>> +
>> +# Check that gap region [offset, offset+nbytes) is entirely zero
>> +_check_gap_zero()
>> +{
>> +	local file="$1"
>> +	local offset="$2"
>> +	local nbytes="$3"
>> +	local label="$4"
>> +	local data
>> +	local stripped
>> +
>> +	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
>> +
>> +	# Remove whitespace and check if any byte is non-zero
>> +	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
>> +	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
>> +		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
>> +		_hexdump -N $((offset + nbytes)) "$file"
>> +		return 1
>> +	fi
>> +	return 0
>> +}
>> +
>> +# Get the physical block offset (in bytes) of the file's first block on device
>> +_get_phys_offset()
>> +{
>> +	local file="$1"
>> +	local fiemap_output
>> +	local phys_blk
>> +
>> +	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
>> +	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
>> +	if [ -z "$phys_blk" ]; then
>> +		echo ""
>> +		return
>> +	fi
>> +	# Convert 512-byte blocks to bytes
>> +	echo $((phys_blk * 512))
>> +}
>> +
>> +_test_eof_zeroing()
>> +{
>> +	local test_name="$1"
>> +	local extend_cmd="$2"
>> +	local expected_new_sz="$3"
>> +	local file=$SCRATCH_MNT/testfile_${test_name}
>> +
>> +	echo "$test_name" | tee -a $seqres.full
>> +
>> +	# Compute non-block-aligned EOF offset
>> +	local gap_bytes=16
>> +	local eof_offset=$((blksz - gap_bytes))
>> +
>> +	# Step 1: Write one full block to ensure the filesystem allocates a
>> +	#         physical block for the file instead of using inline data.
>> +	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
>> +		"$file" >> $seqres.full 2>&1
>> +
>> +	# Step 2: Get physical block offset on device via FIEMAP
>> +	local phys_offset
>> +	phys_offset=$(_get_phys_offset "$file")
>> +	if [ -z "$phys_offset" ]; then
>> +		_fail "$test_name: failed to get physical block offset via fiemap"
>> +	fi
>> +
>> +	# Step 3: Truncate file to non-block-aligned size and fsync.
>> +	#         The on-disk region [eof_offset, blksz) may or may not be
>> +	#         zeroed by the filesystem at this point.
>> +	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
>> +		"$file" >> $seqres.full 2>&1
>> +
>> +	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
>> +	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
>> +	#         the stale pattern is present on disk. Then remount.
>> +	_scratch_unmount
>> +	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
>> +		$SCRATCH_DEV >> $seqres.full 2>&1
>> +	_scratch_mount >> $seqres.full 2>&1
>> +
>> +	# Step 5: Execute the extend operation.
>> +	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
>> +
>> +	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
>> +	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
>> +
>> +	# Step 7: Sync the extended range and shutdown the filesystem with
>> +	#         journal flush. This persists the file size extending, and
>> +	#         the filesystem should persist the zeroed data in the gap
>> +	#         range as well.
>> +	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
>> +		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
>> +			"$file" >> $seqres.full 2>&1
>> +	fi
>> +	_scratch_shutdown -f
>> +
>> +	# Step 8: Remount and verify gap is still zeroed
>> +	_scratch_cycle_mount
>> +
>> +	# Verify file size was not rolled back after shutdown+remount
>> +	local sz
>> +	sz=$(stat -c %s "$file")
>> +	if [ "$sz" -ne "$expected_new_sz" ]; then
>> +		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
>> +	fi
>> +
>> +	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
>> +}
>> +
>> +_scratch_mkfs >> $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +blksz=$(_get_block_size $SCRATCH_MNT)
>> +
>> +# Test three variants of EOF-extending operations
>> +_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
>> +_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
>> +_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/790.out b/tests/generic/790.out
>> new file mode 100644
>> index 00000000..e5e2cc09
>> --- /dev/null
>> +++ b/tests/generic/790.out
>> @@ -0,0 +1,4 @@
>> +QA output created by 790
>> +append_write
>> +truncate_up
>> +fallocate
>> -- 
>> 2.52.0
>>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] generic/790: test post-EOF gap zeroing persistence
  2026-04-25  3:06   ` Zhang Yi
@ 2026-04-27 13:03     ` Brian Foster
  0 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2026-04-27 13:03 UTC (permalink / raw)
  To: Zhang Yi
  Cc: fstests, zlang, linux-ext4, linux-fsdevel, jack, yi.zhang,
	yizhang089, yangerkun

On Sat, Apr 25, 2026 at 11:06:27AM +0800, Zhang Yi wrote:
> On 4/24/2026 9:09 PM, Brian Foster wrote:
> > On Fri, Apr 24, 2026 at 05:22:28PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Test that extending a file past a non-block-aligned EOF correctly
> >> zero-fills the gap [old_EOF, block_boundary), and that this zeroing
> >> persists through a filesystem shutdown+remount cycle.
> >>
> >> Stale data beyond EOF can persist on disk when append write data blocks
> >> are flushed before the on-disk file size update, or when concurrent
> >> append writeback and mmap writes persist non-zero data past EOF.
> >> Subsequent post-EOF operations (append write, fallocate, truncate up)
> >> must zero-fill and persist the gap to prevent exposing stale data.
> >>
> >> The test pollutes the file's last physical block (via FIEMAP + raw
> >> device write) with a sentinel pattern beyond i_size, then performs each
> >> extend operation and verifies the gap is zeroed both in memory and on
> >> disk.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >> ---
> >> v1->v2:
> >>  - Add _require_no_realtime to prevent testing on XFS realtime devices,
> >>    where file data may reside on $SCRATCH_RTDEV.
> >>  - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
> >>    physical device offsets, writing to these offsets on $SCRATCH_DEV
> >>    would corrupt the filesystem in multi-device setups. Besides, since
> >>    btrfs doesn't support shutdown right now, we can support it later.
> >>  - Add -v flag to od in _check_gap_zero() to prevent line folding of
> >>    identical consecutive lines.
> >>  - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
> >>    size was not rolled back after shutdown+remount cycle, and also drop
> >>    the unnecessary file size check before the shutdown as well.
> >>  - Clarify the comment regarding when stale data beyond EOF can persist.
> >>
> > 
> > Thanks for the tweaks. This all LGTM from a review standpoint. I gave it
> > a quick test on latest master and I see a few failures in a couple runs:
> > 
> > - On XFS (mkfs defaults) I saw one unexpected i_size failure and one
> >   zeroing failure, both on write extension fwiw.
> 
> Previously, I only discovered the zeroing failure of append write. This
> is because xfs_file_write_checks() -> xfs_file_write_zero_eof() only
> zeroes the gap range in the page cache, without providing any
> synchronous or asynchronous persistence (instead, truncate up does
> synchronously writeback in xfs_vn_setattr_size(), and ext4 achieves
> persistence via asynchronous writeback in data=ordered mode). So I think
> this is a XFS problem.
> 

Ah Ok.. the truncate flush has been there for a while to combat the NULL
files problem, and the zeroing check ties into that. I suspect the
reason we don't see this often is typical ascending offset writeback
order, whereas this test is doing a targeted sync range of a later
offset in the file (i.e. technically not overlapping with the range that
was zeroed internally).

I guess the simple thing to do there would be to similarly flush on
write extension. I'd want to think about that just a bit because we
haven't historically flushed there and I know the zeroing improvements
I've made in iomap have caused a couple performance regressions along
the way related to aggressive flushing that had to be fixed.

Maybe this is less of an issue now, but regardless if my assumptions are
correct here than I agree this isn't a test issue.

> Regarding the i_size failure, I did not directly reproduce this issue.
> After analysis, I believe it is because the test case did not include
> the -a option in sync_range, meaning it did not wait for IO writeback
> completion and file size update persistence. I reproduced this issue by
> adding a delay in the XFS end IO path. This is a problem with the test
> case, and I will fix it in v3. Thank you for pointing this out.
> 

Makes sense. It will be interesting to see if we can still reproduce the
above issue with this change.

> > - On ext4 I saw a few unexpected i_size failures (both with mkfs
> >   defaults and 1k block size).
> > 
> 
> This is an ext4 issue on the shutdown path. Since ext4 set the shutdown
> flag too early, it was unable to write back ordered zero data when
> flushing the journal, which led to a journal abort and prevented the file
> size update from being persisted. I have submitted a patch to fix this
> issue. Please see below link for details.
> 
>  https://lore.kernel.org/linux-ext4/20260424104201.1930823-1-yi.zhang@huaweicloud.com/
> 

Ok, thanks.

Brian

> Thanks
> Yi.
> 
> > I haven't dug into anything beyond that. Does this match what you're
> > seeing on current kernels or are these unexpected failures?
> > 
> > Brian
> > 
> >>  tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
> >>  tests/generic/790.out |   4 ++
> >>  2 files changed, 168 insertions(+)
> >>  create mode 100755 tests/generic/790
> >>  create mode 100644 tests/generic/790.out
> >>
> >> diff --git a/tests/generic/790 b/tests/generic/790
> >> new file mode 100755
> >> index 00000000..2adc06f8
> >> --- /dev/null
> >> +++ b/tests/generic/790
> >> @@ -0,0 +1,164 @@
> >> +#! /bin/bash
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +# Copyright (c) 2026 Huawei.  All Rights Reserved.
> >> +#
> >> +# FS QA Test No. 790
> >> +#
> >> +# Test that extending a file past a non-block-aligned EOF correctly zero-fills
> >> +# the gap [old_EOF, block_boundary), and that this zeroing persists through a
> >> +# filesystem shutdown+remount cycle.
> >> +#
> >> +# Stale data beyond EOF can persist on disk when:
> >> +# 1) append write data blocks are flushed before the on-disk file size update,
> >> +#    and the system crashes in this window.
> >> +# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
> >> +#
> >> +# Subsequent post-EOF operations (append write, fallocate, truncate up) must
> >> +# zero-fill and persist the gap to prevent exposing stale data.
> >> +#
> >> +# The test pollutes the file's last physical block (via FIEMAP + raw device
> >> +# write) with a sentinel pattern beyond i_size, then performs each extend
> >> +# operation and verifies the gap is zeroed both in memory and on disk.
> >> +#
> >> +. ./common/preamble
> >> +_begin_fstest auto quick rw shutdown
> >> +
> >> +. ./common/filter
> >> +
> >> +_require_scratch
> >> +_require_block_device $SCRATCH_DEV
> >> +_require_no_realtime
> >> +_require_scratch_shutdown
> >> +_require_metadata_journaling $SCRATCH_DEV
> >> +
> >> +# FIEMAP on Btrfs returns logical addresses within the filesystem's address
> >> +# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
> >> +# would corrupt the filesystem in multi-device setups.
> >> +_exclude_fs btrfs
> >> +
> >> +_require_xfs_io_command "fiemap"
> >> +_require_xfs_io_command "falloc"
> >> +_require_xfs_io_command "pwrite"
> >> +_require_xfs_io_command "truncate"
> >> +_require_xfs_io_command "sync_range"
> >> +
> >> +# Check that gap region [offset, offset+nbytes) is entirely zero
> >> +_check_gap_zero()
> >> +{
> >> +	local file="$1"
> >> +	local offset="$2"
> >> +	local nbytes="$3"
> >> +	local label="$4"
> >> +	local data
> >> +	local stripped
> >> +
> >> +	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
> >> +
> >> +	# Remove whitespace and check if any byte is non-zero
> >> +	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
> >> +	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
> >> +		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
> >> +		_hexdump -N $((offset + nbytes)) "$file"
> >> +		return 1
> >> +	fi
> >> +	return 0
> >> +}
> >> +
> >> +# Get the physical block offset (in bytes) of the file's first block on device
> >> +_get_phys_offset()
> >> +{
> >> +	local file="$1"
> >> +	local fiemap_output
> >> +	local phys_blk
> >> +
> >> +	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
> >> +	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
> >> +	if [ -z "$phys_blk" ]; then
> >> +		echo ""
> >> +		return
> >> +	fi
> >> +	# Convert 512-byte blocks to bytes
> >> +	echo $((phys_blk * 512))
> >> +}
> >> +
> >> +_test_eof_zeroing()
> >> +{
> >> +	local test_name="$1"
> >> +	local extend_cmd="$2"
> >> +	local expected_new_sz="$3"
> >> +	local file=$SCRATCH_MNT/testfile_${test_name}
> >> +
> >> +	echo "$test_name" | tee -a $seqres.full
> >> +
> >> +	# Compute non-block-aligned EOF offset
> >> +	local gap_bytes=16
> >> +	local eof_offset=$((blksz - gap_bytes))
> >> +
> >> +	# Step 1: Write one full block to ensure the filesystem allocates a
> >> +	#         physical block for the file instead of using inline data.
> >> +	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
> >> +		"$file" >> $seqres.full 2>&1
> >> +
> >> +	# Step 2: Get physical block offset on device via FIEMAP
> >> +	local phys_offset
> >> +	phys_offset=$(_get_phys_offset "$file")
> >> +	if [ -z "$phys_offset" ]; then
> >> +		_fail "$test_name: failed to get physical block offset via fiemap"
> >> +	fi
> >> +
> >> +	# Step 3: Truncate file to non-block-aligned size and fsync.
> >> +	#         The on-disk region [eof_offset, blksz) may or may not be
> >> +	#         zeroed by the filesystem at this point.
> >> +	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
> >> +		"$file" >> $seqres.full 2>&1
> >> +
> >> +	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
> >> +	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
> >> +	#         the stale pattern is present on disk. Then remount.
> >> +	_scratch_unmount
> >> +	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
> >> +		$SCRATCH_DEV >> $seqres.full 2>&1
> >> +	_scratch_mount >> $seqres.full 2>&1
> >> +
> >> +	# Step 5: Execute the extend operation.
> >> +	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
> >> +
> >> +	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
> >> +	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
> >> +
> >> +	# Step 7: Sync the extended range and shutdown the filesystem with
> >> +	#         journal flush. This persists the file size extending, and
> >> +	#         the filesystem should persist the zeroed data in the gap
> >> +	#         range as well.
> >> +	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
> >> +		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
> >> +			"$file" >> $seqres.full 2>&1
> >> +	fi
> >> +	_scratch_shutdown -f
> >> +
> >> +	# Step 8: Remount and verify gap is still zeroed
> >> +	_scratch_cycle_mount
> >> +
> >> +	# Verify file size was not rolled back after shutdown+remount
> >> +	local sz
> >> +	sz=$(stat -c %s "$file")
> >> +	if [ "$sz" -ne "$expected_new_sz" ]; then
> >> +		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
> >> +	fi
> >> +
> >> +	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
> >> +}
> >> +
> >> +_scratch_mkfs >> $seqres.full 2>&1
> >> +_scratch_mount
> >> +
> >> +blksz=$(_get_block_size $SCRATCH_MNT)
> >> +
> >> +# Test three variants of EOF-extending operations
> >> +_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
> >> +_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
> >> +_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
> >> +
> >> +# success, all done
> >> +status=0
> >> +exit
> >> diff --git a/tests/generic/790.out b/tests/generic/790.out
> >> new file mode 100644
> >> index 00000000..e5e2cc09
> >> --- /dev/null
> >> +++ b/tests/generic/790.out
> >> @@ -0,0 +1,4 @@
> >> +QA output created by 790
> >> +append_write
> >> +truncate_up
> >> +fallocate
> >> -- 
> >> 2.52.0
> >>
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-27 13:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-24  9:22 [PATCH v2] generic/790: test post-EOF gap zeroing persistence Zhang Yi
2026-04-24 13:09 ` Brian Foster
2026-04-25  3:06   ` Zhang Yi
2026-04-27 13:03     ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox