Linux filesystem development
 help / color / mirror / Atom feed
* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
       [not found] <20260611114029.176200-4-p.raghav@samsung.com>
@ 2026-06-16 13:31 ` Christoph Hellwig
  2026-06-17  9:44   ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:31 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

[API questions for Zhang and -fsdevel/ -api below)

> +	unsigned int		blksize = i_blocksize(inode);
> +	loff_t			offset_aligned = round_down(offset, blksize);

I think this actually needs to found up instead of rounding down.

> +	/*
> +	 * Zero the tail of the old EOF block and any space up to the new
> +	 * offset.
> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> +	 * zeroing those blocks.
> +	 */
> +	if (offset_aligned > old_size) {
> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> +				NULL, &did_zero);
> +		if (error)
> +			return error;
> +	}

... then this will properly zero from the old i_size to the first block
boundary after the old size.

> +	error = xfs_alloc_file_space(ip, offset, len,
> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);

... and here we need to pass offset_aligned instead of offset and
a new calculated len based on the last block boundary, and then
zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
allows unaligned ranges for file systems.  The block code doesn't,
but I can't quite follow the ext4 code if it does or not, and there
is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
tree.

Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
and make sure no existing data before the range is lost and the
entire range is zeroed?


> +	if (error)
> +		return error;
> +
> +	/*
> +	 * xfs_falloc_setsize() would re-zero the written extents via
> +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
> +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
> +	 * size to it.
> +	 */
> +	if (new_size > i_size_read(inode))
> +		i_size_write(inode, new_size);

I think Sashiko is right that we need a pagecache_isize_extended and
filemap_write_and_wait_range calls here.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-16 13:31 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Christoph Hellwig
@ 2026-06-17  9:44   ` Pankaj Raghav (Samsung)
  2026-06-18  3:22     ` Zhang Yi
  2026-06-18  9:00     ` Christoph Hellwig
  0 siblings, 2 replies; 13+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-17  9:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
> [API questions for Zhang and -fsdevel/ -api below)
> 
> > +	unsigned int		blksize = i_blocksize(inode);
> > +	loff_t			offset_aligned = round_down(offset, blksize);
> 
> I think this actually needs to found up instead of rounding down.
> 
> > +	/*
> > +	 * Zero the tail of the old EOF block and any space up to the new
> > +	 * offset.
> > +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> > +	 * zeroing those blocks.
> > +	 */
> > +	if (offset_aligned > old_size) {
> > +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> > +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> > +				NULL, &did_zero);
> > +		if (error)
> > +			return error;
> > +	}
> 
> ... then this will properly zero from the old i_size to the first block
> boundary after the old size.

Hmm, right now we do this:

|----------|----------|----------|
    ^      ^     ^    ^
    |      |     |    |
 old_size  |   offset |
           |          |
	off_rd       off_ru

At the moment, we zero out old_size to off_rd and pass offset to
xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.

What you are proposing is to zero out old_size to off_ru, and pass
off_ru to xfs_alloc_file_space. I don't exactly understand the
difference.

> 
> > +	error = xfs_alloc_file_space(ip, offset, len,
> > +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> 
> ... and here we need to pass offset_aligned instead of offset and
> a new calculated len based on the last block boundary, and then
> zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
> allows unaligned ranges for file systems.  The block code doesn't,
> but I can't quite follow the ext4 code if it does or not, and there
> is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
> tree.


I can't find any references to FALLOC_FL_WRITE_ZEROES in the man pages
master branch. Maybe we missed it. I can send a separate patch for that
once we have some clarity on the API.
> 
> Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> and make sure no existing data before the range is lost and the
> entire range is zeroed?
> 

I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
For example, generic/363 tests for unaligned writes and checks for any
stale data. By default, I think we do unaligned reads, writes and
truncate in fsx.

> 
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * xfs_falloc_setsize() would re-zero the written extents via
> > +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
> > +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
> > +	 * size to it.
> > +	 */
> > +	if (new_size > i_size_read(inode))
> > +		i_size_write(inode, new_size);
> 
> I think Sashiko is right that we need a pagecache_isize_extended and
> filemap_write_and_wait_range calls here.
> 

Ok. Current fsx or fsstress did not expose this
problem. I will look into this. Thanks Christoph.

--
Pankaj

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-17  9:44   ` Pankaj Raghav (Samsung)
@ 2026-06-18  3:22     ` Zhang Yi
  2026-06-18  8:18       ` Pankaj Raghav (Samsung)
  2026-06-18  8:59       ` Christoph Hellwig
  2026-06-18  9:00     ` Christoph Hellwig
  1 sibling, 2 replies; 13+ messages in thread
From: Zhang Yi @ 2026-06-18  3:22 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung), Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api

On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
> On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
>> [API questions for Zhang and -fsdevel/ -api below)
>>
>>> +	unsigned int		blksize = i_blocksize(inode);
>>> +	loff_t			offset_aligned = round_down(offset, blksize);
>>
>> I think this actually needs to found up instead of rounding down.
>>
>>> +	/*
>>> +	 * Zero the tail of the old EOF block and any space up to the new
>>> +	 * offset.
>>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
>>> +	 * zeroing those blocks.
>>> +	 */
>>> +	if (offset_aligned > old_size) {
>>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
>>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
>>> +				NULL, &did_zero);
>>> +		if (error)
>>> +			return error;
>>> +	}
>>
>> ... then this will properly zero from the old i_size to the first block
>> boundary after the old size.
> 
> Hmm, right now we do this:
> 
> |----------|----------|----------|
>     ^      ^     ^    ^
>     |      |     |    |
>  old_size  |   offset |
>            |          |
> 	off_rd       off_ru
> 
> At the moment, we zero out old_size to off_rd and pass offset to
> xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
> 
> What you are proposing is to zero out old_size to off_ru, and pass
> off_ru to xfs_alloc_file_space. I don't exactly understand the
> difference.

IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
'offset' and 'end' are not block-size aligned, then:

1) if the two blocks straddling the boundaries have not yet been allocated,
   or allocated as unwritten, we should round outward the allocation range
   and zero out all allocated blocks, including those two boundary blocks.
2) if the blocks at the boundaries are already in the written state — which
   can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
   should be careful here: we should only zero the ranges [offset, offset_ru)
   and [end_rd, end) for the boundary blocks, leaving the already-written
   portions of the boundary blocks intact.

Thoughs?

Regarding the second point, the current ext4 implementation has an issue —
it zeroes out the entire boundary blocks. I overlooked this previously, and
I appreciate you pointing it out.

> 
>>
>>> +	error = xfs_alloc_file_space(ip, offset, len,
>>> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
>>
>> ... and here we need to pass offset_aligned instead of offset and
>> a new calculated len based on the last block boundary, and then
>> zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
>> allows unaligned ranges for file systems.  The block code doesn't,
>> but I can't quite follow the ext4 code if it does or not, and there
>> is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
>> tree.
> 
> 
> I can't find any references to FALLOC_FL_WRITE_ZEROES in the man pages
> master branch. Maybe we missed it. I can send a separate patch for that
> once we have some clarity on the API.

Yeah, I missed to update the man pages last year. Thanks.

Best Regards,
Yi.

>>
>> Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
>> and make sure no existing data before the range is lost and the
>> entire range is zeroed?
>>
> 
> I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> For example, generic/363 tests for unaligned writes and checks for any
> stale data. By default, I think we do unaligned reads, writes and
> truncate in fsx.
> 
>>
>>> +	if (error)
>>> +		return error;
>>> +
>>> +	/*
>>> +	 * xfs_falloc_setsize() would re-zero the written extents via
>>> +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
>>> +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
>>> +	 * size to it.
>>> +	 */
>>> +	if (new_size > i_size_read(inode))
>>> +		i_size_write(inode, new_size);
>>
>> I think Sashiko is right that we need a pagecache_isize_extended and
>> filemap_write_and_wait_range calls here.
>>
> 
> Ok. Current fsx or fsstress did not expose this
> problem. I will look into this. Thanks Christoph.
> 
> --
> Pankaj
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  3:22     ` Zhang Yi
@ 2026-06-18  8:18       ` Pankaj Raghav (Samsung)
  2026-06-18  8:57         ` Christoph Hellwig
  2026-06-18  8:59       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18  8:18 UTC (permalink / raw)
  To: Zhang Yi, hch
  Cc: Christoph Hellwig, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
> On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
> > On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
> >> [API questions for Zhang and -fsdevel/ -api below)
> >>
> >>> +	unsigned int		blksize = i_blocksize(inode);
> >>> +	loff_t			offset_aligned = round_down(offset, blksize);
> >>
> >> I think this actually needs to found up instead of rounding down.
> >>
> >>> +	/*
> >>> +	 * Zero the tail of the old EOF block and any space up to the new
> >>> +	 * offset.
> >>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> >>> +	 * zeroing those blocks.
> >>> +	 */
> >>> +	if (offset_aligned > old_size) {
> >>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> >>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> >>> +				NULL, &did_zero);
> >>> +		if (error)
> >>> +			return error;
> >>> +	}
> >>
> >> ... then this will properly zero from the old i_size to the first block
> >> boundary after the old size.
> > 
> > Hmm, right now we do this:
> > 
> > |----------|----------|----------|
> >     ^      ^     ^    ^
> >     |      |     |    |
> >  old_size  |   offset |
> >            |          |
> > 	off_rd       off_ru
> > 
> > At the moment, we zero out old_size to off_rd and pass offset to
> > xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
> > 
> > What you are proposing is to zero out old_size to off_ru, and pass
> > off_ru to xfs_alloc_file_space. I don't exactly understand the
> > difference.
> 
> IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
> 'offset' and 'end' are not block-size aligned, then:
> 
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
> Thoughs?

Ok, this makes sense to me.

@Christoph, now I understood your reply about rounding up and rounding
down.

So, I could do xfs_zero_range(offset, offset_ru)[1] and xfs_zero_range(end_rd, end).
(offset_ru, end_rd) will be using the accelerated XFS_BMAPI_ZERO to 
zero out the extents. 

I also need to add pagecache_isize_extended and filemap_write_and_wait_range
to persist the xfs_zero_range calls before we call setfilesize.

xfs_zero_range should take care of the boundary blocks so that we don't
overwrite any data or zeroing out the unallocated or unwritten blocks as
pointed out in 1 and 2.

Let me know what you think. I am also wondering how fsx did not trigger
the boundary block edge case where the current impl might zero out user
data in the boundary blocks.

[1] if old_size < offset, then xfs_zero_range(old_size, offset_ru)) 
--
Pankaj

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  8:18       ` Pankaj Raghav (Samsung)
@ 2026-06-18  8:57         ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-18  8:57 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Zhang Yi, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 10:18:49AM +0200, Pankaj Raghav (Samsung) wrote:
> So, I could do xfs_zero_range(offset, offset_ru)[1] and xfs_zero_range(end_rd, end).
> (offset_ru, end_rd) will be using the accelerated XFS_BMAPI_ZERO to 
> zero out the extents. 

Yeah.

> I also need to add pagecache_isize_extended and filemap_write_and_wait_range
> to persist the xfs_zero_range calls before we call setfilesize.

Yeah,  Or we need to find a way to use xfs_falloc_setsize after all
which would share all that code, although I'm not really sure how
that would work best.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  3:22     ` Zhang Yi
  2026-06-18  8:18       ` Pankaj Raghav (Samsung)
@ 2026-06-18  8:59       ` Christoph Hellwig
  2026-06-18 10:26         ` Zhang Yi
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-18  8:59 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Pankaj Raghav (Samsung), Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
> Thoughs?

Yes.

> Regarding the second point, the current ext4 implementation has an issue —
> it zeroes out the entire boundary blocks. I overlooked this previously, and
> I appreciate you pointing it out.

Which means we're missing test coverage for this as well..


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-17  9:44   ` Pankaj Raghav (Samsung)
  2026-06-18  3:22     ` Zhang Yi
@ 2026-06-18  9:00     ` Christoph Hellwig
  2026-06-18  9:28       ` Pankaj Raghav (Samsung)
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-18  9:00 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Wed, Jun 17, 2026 at 11:44:47AM +0200, Pankaj Raghav (Samsung) wrote:
> > Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> > and make sure no existing data before the range is lost and the
> > entire range is zeroed?
> > 
> 
> I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> For example, generic/363 tests for unaligned writes and checks for any
> stale data. By default, I think we do unaligned reads, writes and
> truncate in fsx.

But I guess not unaligned FALLOC_FL_WRITE_ZEROES?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  9:00     ` Christoph Hellwig
@ 2026-06-18  9:28       ` Pankaj Raghav (Samsung)
  2026-06-18  9:36         ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Thu, Jun 18, 2026 at 02:00:16AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 17, 2026 at 11:44:47AM +0200, Pankaj Raghav (Samsung) wrote:
> > > Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> > > and make sure no existing data before the range is lost and the
> > > entire range is zeroed?
> > > 
> > 
> > I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> > For example, generic/363 tests for unaligned writes and checks for any
> > stale data. By default, I think we do unaligned reads, writes and
> > truncate in fsx.
> 
> But I guess not unaligned FALLOC_FL_WRITE_ZEROES?

        -r readbdy: 4096 would make reads page aligned (default 1)
        -t truncbdy: 4096 would make truncates page aligned (default 1)
        -w writebdy: 4096 would make writes page aligned (default 1)

FALLOC_FL_WRITE_ZEROES comes under truncate. So I would assume we also
do that. That is how I also found the issue with offset > EOF. I will
take a look or else, I will add a test case to test this condition!

Thanks.
--
Pankaj

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  9:28       ` Pankaj Raghav (Samsung)
@ 2026-06-18  9:36         ` Christoph Hellwig
  2026-06-18 13:26           ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-18  9:36 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Christoph Hellwig, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	Zhang Yi, linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:28:15AM +0200, Pankaj Raghav (Samsung) wrote:
> > But I guess not unaligned FALLOC_FL_WRITE_ZEROES?
> 
>         -r readbdy: 4096 would make reads page aligned (default 1)
>         -t truncbdy: 4096 would make truncates page aligned (default 1)
>         -w writebdy: 4096 would make writes page aligned (default 1)
> 
> FALLOC_FL_WRITE_ZEROES comes under truncate. So I would assume we also
> do that. That is how I also found the issue with offset > EOF. I will
> take a look or else, I will add a test case to test this condition!

A targeted test using xfs_io that does FALLOC_FL_WRITE_ZEROES on an
unaligned range and then checks that the data around it is preserved
while the unaligned data in the range is zeroed would also be useful.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  8:59       ` Christoph Hellwig
@ 2026-06-18 10:26         ` Zhang Yi
  0 siblings, 0 replies; 13+ messages in thread
From: Zhang Yi @ 2026-06-18 10:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav (Samsung), Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On 6/18/2026 4:59 PM, Christoph Hellwig wrote:
> On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
>> 1) if the two blocks straddling the boundaries have not yet been allocated,
>>    or allocated as unwritten, we should round outward the allocation range
>>    and zero out all allocated blocks, including those two boundary blocks.
>> 2) if the blocks at the boundaries are already in the written state — which
>>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>>    should be careful here: we should only zero the ranges [offset, offset_ru)
>>    and [end_rd, end) for the boundary blocks, leaving the already-written
>>    portions of the boundary blocks intact.
>>
>> Thoughs?
> 
> Yes.
> 
>> Regarding the second point, the current ext4 implementation has an issue —
>> it zeroes out the entire boundary blocks. I overlooked this previously, and
>> I appreciate you pointing it out.
> 
> Which means we're missing test coverage for this as well..
> 

Ha, I just re-checked the ext4 implementation and found that scenario 2 is
actually fine, Sorry I misread the code earlier. Fortunately, no data
corruption occurred. :-)

The real issue is in scenario 1, where the boundary blocks are not correctly
converted to written-type extents.

As for testing the unaligned case, I also agree that we should explicitly add
a dedicated test for it. Relying on existing fsstress and fsx is not reliable
enough in this regard.

Thanks,
Yi.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  9:36         ` Christoph Hellwig
@ 2026-06-18 13:26           ` Pankaj Raghav (Samsung)
  2026-06-18 13:37             ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18 13:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Thu, Jun 18, 2026 at 02:36:00AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 18, 2026 at 11:28:15AM +0200, Pankaj Raghav (Samsung) wrote:
> > > But I guess not unaligned FALLOC_FL_WRITE_ZEROES?
> > 
> >         -r readbdy: 4096 would make reads page aligned (default 1)
> >         -t truncbdy: 4096 would make truncates page aligned (default 1)
> >         -w writebdy: 4096 would make writes page aligned (default 1)
> > 
> > FALLOC_FL_WRITE_ZEROES comes under truncate. So I would assume we also
> > do that. That is how I also found the issue with offset > EOF. I will
> > take a look or else, I will add a test case to test this condition!
> 
> A targeted test using xfs_io that does FALLOC_FL_WRITE_ZEROES on an
> unaligned range and then checks that the data around it is preserved
> while the unaligned data in the range is zeroed would also be useful.
> 


|----------|----------|----------|----------|----------|
           ^     ^    ^                     ^     ^    ^
           |     |    |                     |     |    |
           |   offset |                     |    end   |
           |          |                     |          |
	off_rd       off_ru              end_rd       end_ru


I wrote a simple test case for unaligned FALLOC_FL_WRITE_ZEROES. The
existing code **does not** overwrite any user data. Here is why:

- xfs_free_file_space (punch hole) punches inward (offset_ru -> end_rd)
  and zeroes out from (offset -> offset_ru) and (end_rd -> end) with
  xfs_zero_range

- Luckily, even though we consider offset_rd to end_ru in
  alloc_file_space, XFS_BMAPI_ZERO will skip zeroing already written
  edge blocks and only offset_ru -> end_rd are zeroed using unmap zero range.
  ( (offset -> offset_ru) -> EXT_NORM, (offset_ru -> end_rd) -> HOLE,
  (end_rd -> end) -> EXT_NORM)

I didn't think it through but the code still holds for these cases. This
might be the reason why fsx did not complain? I will also document this
in the patch.

But as you said, we need a xfstest for this boundary block checks in case
some behaviour changes in the future.

I guess apart from this, the only change is to address persisting data
when offset > old_size as sashiko reported.

Let me know what you think.
--
Pankaj

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18 13:26           ` Pankaj Raghav (Samsung)
@ 2026-06-18 13:37             ` Christoph Hellwig
  2026-06-18 13:57               ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-06-18 13:37 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Christoph Hellwig, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	Zhang Yi, linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 03:26:14PM +0200, Pankaj Raghav (Samsung) wrote:
> existing code **does not** overwrite any user data. Here is why:
> 
> - xfs_free_file_space (punch hole) punches inward (offset_ru -> end_rd)
>   and zeroes out from (offset -> offset_ru) and (end_rd -> end) with
>   xfs_zero_range

Ah, right.

> - Luckily, even though we consider offset_rd to end_ru in
>   alloc_file_space, XFS_BMAPI_ZERO will skip zeroing already written
>   edge blocks and only offset_ru -> end_rd are zeroed using unmap zero range.
>   ( (offset -> offset_ru) -> EXT_NORM, (offset_ru -> end_rd) -> HOLE,
>   (end_rd -> end) -> EXT_NORM)

Oh.  So this was lucky because of the corner cases.  I still think doing
the proper alignment in the higher level code and only calling down for
the area that we really want to zero would be benefitial here, if only
to make the code easier to follow.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18 13:37             ` Christoph Hellwig
@ 2026-06-18 13:57               ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 13+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18 13:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Thu, Jun 18, 2026 at 06:37:54AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 18, 2026 at 03:26:14PM +0200, Pankaj Raghav (Samsung) wrote:
> > existing code **does not** overwrite any user data. Here is why:
> > 
> > - xfs_free_file_space (punch hole) punches inward (offset_ru -> end_rd)
> >   and zeroes out from (offset -> offset_ru) and (end_rd -> end) with
> >   xfs_zero_range
> 
> Ah, right.
> 
> > - Luckily, even though we consider offset_rd to end_ru in
> >   alloc_file_space, XFS_BMAPI_ZERO will skip zeroing already written
> >   edge blocks and only offset_ru -> end_rd are zeroed using unmap zero range.
> >   ( (offset -> offset_ru) -> EXT_NORM, (offset_ru -> end_rd) -> HOLE,
> >   (end_rd -> end) -> EXT_NORM)
> 
> Oh.  So this was lucky because of the corner cases.  I still think doing
> the proper alignment in the higher level code and only calling down for
> the area that we really want to zero would be benefitial here, if only
> to make the code easier to follow.

I agree. I will do the changes and document some of the nuances in the
code.

Thanks!

--
Pankaj

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-06-18 13:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260611114029.176200-4-p.raghav@samsung.com>
2026-06-16 13:31 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Christoph Hellwig
2026-06-17  9:44   ` Pankaj Raghav (Samsung)
2026-06-18  3:22     ` Zhang Yi
2026-06-18  8:18       ` Pankaj Raghav (Samsung)
2026-06-18  8:57         ` Christoph Hellwig
2026-06-18  8:59       ` Christoph Hellwig
2026-06-18 10:26         ` Zhang Yi
2026-06-18  9:00     ` Christoph Hellwig
2026-06-18  9:28       ` Pankaj Raghav (Samsung)
2026-06-18  9:36         ` Christoph Hellwig
2026-06-18 13:26           ` Pankaj Raghav (Samsung)
2026-06-18 13:37             ` Christoph Hellwig
2026-06-18 13:57               ` Pankaj Raghav (Samsung)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox