Linux EXT4 FS development
 help / color / mirror / Atom feed
From: Zhang Yi <yi.zhang@huaweicloud.com>
To: "Darrick J. Wong" <djwong@kernel.org>, Brian Foster <bfoster@redhat.com>
Cc: fstests@vger.kernel.org, zlang@kernel.org,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	jack@suse.cz, yi.zhang@huawei.com, yizhang089@gmail.com,
	yangerkun@huawei.com
Subject: Re: [PATCH v3] generic/790: test post-EOF gap zeroing persistence
Date: Wed, 13 May 2026 09:23:57 +0800	[thread overview]
Message-ID: <f9ad1efa-ccd3-4f5d-99dd-eacff4fc4eaf@huaweicloud.com> (raw)
In-Reply-To: <20260512180935.GB9544@frogsfrogsfrogs>

On 5/13/2026 2:09 AM, Darrick J. Wong wrote:
> On Wed, Apr 29, 2026 at 07:24:05AM -0400, Brian Foster wrote:
>> On Tue, Apr 28, 2026 at 04:57:50PM +0800, Zhang Yi wrote:
>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>
>>> Test that extending a file past a non-block-aligned EOF correctly
>>> zero-fills the gap [old_EOF, block_boundary), and that this zeroing
>>> persists through a filesystem shutdown+remount cycle.
>>>
>>> Stale data beyond EOF can persist on disk when append write data blocks
>>> are flushed before the on-disk file size update, or when concurrent
>>> append writeback and mmap writes persist non-zero data past EOF.
>>> Subsequent post-EOF operations (append write, fallocate, truncate up)
>>> must zero-fill and persist the gap to prevent exposing stale data.
>>>
>>> The test pollutes the file's last physical block (via FIEMAP + raw
>>> device write) with a sentinel pattern beyond i_size, then performs each
>>> extend operation and verifies the gap is zeroed both in memory and on
>>> disk.
> 
> Hmm.  So this test fails on ext4, which apparently doesn't persist the
> isize update, which seems like a reasonable behaveior.

This is an ext4 issue on the shutdown path. I have submitted a patch to fix
this issue. Please see below patch for details.

 https://lore.kernel.org/linux-ext4/20260424104201.1930823-1-yi.zhang@huaweicloud.com/

> 
> It also complains about seeing the 0x5a pattern on xfs, which surprises
> me a little.  Is that a bug in xfs, and is someone working on that?

The root cause of this XFS issue has been identified. Please refer to the
previous discussion below:

 https://lore.kernel.org/linux-ext4/038228d9-8058-43e0-8be9-b8afc9e8397c@huaweicloud.com/

I suppose Brian is considering how to fix this.

Cheers,
Yi.

> 
> --D
> 
>>>
>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>> ---
>>
>> Reviewed-by: Brian Foster <bfoster@redhat.com>
>>
>>> v2->v3:
>>>  - Add error check for the raw device pwrite, a failed pwrite would
>>>    silently leave the test continuing with an unpolluted block,
>>>    producing false-positive passes.
>>>  - Add sync_range -a to wait until the extending I/O completes and to
>>>    ensure file size update is persisted before shutdown, preventing
>>>    unexpected file size errors.
>>> v1->v2:
>>>  - Add _require_no_realtime to prevent testing on XFS realtime devices,
>>>    where file data may reside on $SCRATCH_RTDEV.
>>>  - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
>>>    physical device offsets, writing to these offsets on $SCRATCH_DEV
>>>    would corrupt the filesystem in multi-device setups. Besides, since
>>>    btrfs doesn't support shutdown right now, we can support it later.
>>>  - Add -v flag to od in _check_gap_zero() to prevent line folding of
>>>    identical consecutive lines.
>>>  - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
>>>    size was not rolled back after shutdown+remount cycle, and also drop
>>>    the unnecessary file size check before the shutdown as well.
>>>  - Clarify the comment regarding when stale data beyond EOF can persist.
>>>
>>>  tests/generic/790     | 168 ++++++++++++++++++++++++++++++++++++++++++
>>>  tests/generic/790.out |   4 +
>>>  2 files changed, 172 insertions(+)
>>>  create mode 100755 tests/generic/790
>>>  create mode 100644 tests/generic/790.out
>>>
>>> diff --git a/tests/generic/790 b/tests/generic/790
>>> new file mode 100755
>>> index 00000000..6daf3793
>>> --- /dev/null
>>> +++ b/tests/generic/790
>>> @@ -0,0 +1,168 @@
>>> +#! /bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +# Copyright (c) 2026 Huawei.  All Rights Reserved.
>>> +#
>>> +# FS QA Test No. 790
>>> +#
>>> +# Test that extending a file past a non-block-aligned EOF correctly zero-fills
>>> +# the gap [old_EOF, block_boundary), and that this zeroing persists through a
>>> +# filesystem shutdown+remount cycle.
>>> +#
>>> +# Stale data beyond EOF can persist on disk when:
>>> +# 1) append write data blocks are flushed before the on-disk file size update,
>>> +#    and the system crashes in this window.
>>> +# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
>>> +#
>>> +# Subsequent post-EOF operations (append write, fallocate, truncate up) must
>>> +# zero-fill and persist the gap to prevent exposing stale data.
>>> +#
>>> +# The test pollutes the file's last physical block (via FIEMAP + raw device
>>> +# write) with a sentinel pattern beyond i_size, then performs each extend
>>> +# operation and verifies the gap is zeroed both in memory and on disk.
>>> +#
>>> +. ./common/preamble
>>> +_begin_fstest auto quick rw shutdown
>>> +
>>> +. ./common/filter
>>> +
>>> +_require_scratch
>>> +_require_block_device $SCRATCH_DEV
>>> +_require_no_realtime
>>> +_require_scratch_shutdown
>>> +_require_metadata_journaling $SCRATCH_DEV
>>> +
>>> +# FIEMAP on Btrfs returns logical addresses within the filesystem's address
>>> +# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
>>> +# would corrupt the filesystem in multi-device setups.
>>> +_exclude_fs btrfs
>>> +
>>> +_require_xfs_io_command "fiemap"
>>> +_require_xfs_io_command "falloc"
>>> +_require_xfs_io_command "pwrite"
>>> +_require_xfs_io_command "truncate"
>>> +_require_xfs_io_command "sync_range"
>>> +
>>> +# Check that gap region [offset, offset+nbytes) is entirely zero
>>> +_check_gap_zero()
>>> +{
>>> +	local file="$1"
>>> +	local offset="$2"
>>> +	local nbytes="$3"
>>> +	local label="$4"
>>> +	local data
>>> +	local stripped
>>> +
>>> +	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
>>> +
>>> +	# Remove whitespace and check if any byte is non-zero
>>> +	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
>>> +	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
>>> +		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
>>> +		_hexdump -N $((offset + nbytes)) "$file"
>>> +		return 1
>>> +	fi
>>> +	return 0
>>> +}
>>> +
>>> +# Get the physical block offset (in bytes) of the file's first block on device
>>> +_get_phys_offset()
>>> +{
>>> +	local file="$1"
>>> +	local fiemap_output
>>> +	local phys_blk
>>> +
>>> +	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
>>> +	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
>>> +	if [ -z "$phys_blk" ]; then
>>> +		echo ""
>>> +		return
>>> +	fi
>>> +	# Convert 512-byte blocks to bytes
>>> +	echo $((phys_blk * 512))
>>> +}
>>> +
>>> +_test_eof_zeroing()
>>> +{
>>> +	local test_name="$1"
>>> +	local extend_cmd="$2"
>>> +	local expected_new_sz="$3"
>>> +	local file=$SCRATCH_MNT/testfile_${test_name}
>>> +
>>> +	echo "$test_name" | tee -a $seqres.full
>>> +
>>> +	# Compute non-block-aligned EOF offset
>>> +	local gap_bytes=16
>>> +	local eof_offset=$((blksz - gap_bytes))
>>> +
>>> +	# Step 1: Write one full block to ensure the filesystem allocates a
>>> +	#         physical block for the file instead of using inline data.
>>> +	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
>>> +		"$file" >> $seqres.full 2>&1
>>> +
>>> +	# Step 2: Get physical block offset on device via FIEMAP
>>> +	local phys_offset
>>> +	phys_offset=$(_get_phys_offset "$file")
>>> +	if [ -z "$phys_offset" ]; then
>>> +		_fail "$test_name: failed to get physical block offset via fiemap"
>>> +	fi
>>> +
>>> +	# Step 3: Truncate file to non-block-aligned size and fsync.
>>> +	#         The on-disk region [eof_offset, blksz) may or may not be
>>> +	#         zeroed by the filesystem at this point.
>>> +	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
>>> +		"$file" >> $seqres.full 2>&1
>>> +
>>> +	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
>>> +	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
>>> +	#         the stale pattern is present on disk. Then remount.
>>> +	_scratch_unmount
>>> +	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
>>> +		$SCRATCH_DEV >> $seqres.full 2>&1
>>> +	if [ $? -ne 0 ]; then
>>> +		_fail "$test_name: failed to inject stale data on disk"
>>> +	fi
>>> +	_scratch_mount >> $seqres.full 2>&1
>>> +
>>> +	# Step 5: Execute the extend operation.
>>> +	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
>>> +
>>> +	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
>>> +	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
>>> +
>>> +	# Step 7: Sync the extended range and shutdown the filesystem with
>>> +	#         journal flush. This persists the file size extending, and
>>> +	#         the filesystem should persist the zeroed data in the gap
>>> +	#         range as well.
>>> +	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
>>> +		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
>>> +			-c "sync_range -a $blksz $blksz" \
>>> +			"$file" >> $seqres.full 2>&1
>>> +	fi
>>> +	_scratch_shutdown -f
>>> +
>>> +	# Step 8: Remount and verify gap is still zeroed
>>> +	_scratch_cycle_mount
>>> +
>>> +	# Verify file size was not rolled back after shutdown+remount
>>> +	local sz
>>> +	sz=$(stat -c %s "$file")
>>> +	if [ "$sz" -ne "$expected_new_sz" ]; then
>>> +		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
>>> +	fi
>>> +
>>> +	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
>>> +}
>>> +
>>> +_scratch_mkfs >> $seqres.full 2>&1
>>> +_scratch_mount
>>> +
>>> +blksz=$(_get_block_size $SCRATCH_MNT)
>>> +
>>> +# Test three variants of EOF-extending operations
>>> +_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
>>> +_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
>>> +_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
>>> +
>>> +# success, all done
>>> +status=0
>>> +exit
>>> diff --git a/tests/generic/790.out b/tests/generic/790.out
>>> new file mode 100644
>>> index 00000000..e5e2cc09
>>> --- /dev/null
>>> +++ b/tests/generic/790.out
>>> @@ -0,0 +1,4 @@
>>> +QA output created by 790
>>> +append_write
>>> +truncate_up
>>> +fallocate
>>> -- 
>>> 2.52.0
>>>
>>
>>


  reply	other threads:[~2026-05-13  1:24 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28  8:57 [PATCH v3] generic/790: test post-EOF gap zeroing persistence Zhang Yi
2026-04-29 11:24 ` Brian Foster
2026-05-12 18:09   ` Darrick J. Wong
2026-05-13  1:23     ` Zhang Yi [this message]
2026-05-01 16:46 ` Zorro Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f9ad1efa-ccd3-4f5d-99dd-eacff4fc4eaf@huaweicloud.com \
    --to=yi.zhang@huaweicloud.com \
    --cc=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yizhang089@gmail.com \
    --cc=zlang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox