Linux XFS filesystem development
* [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs
@ 2026-06-22  8:31 Pankaj Raghav
  2026-06-22  8:31 ` [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
  2026-06-22  8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  0 siblings, 2 replies; 4+ messages in thread
From: Pankaj Raghav @ 2026-06-22  8:31 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

The benefits of FALLOC_FL_WRITE_ZEROES were already discussed as part
of Zhang Yi's initial patches [1]. Postgres developer Andres also
mentioned that they would like to use this feature in Postgres [2].
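
For readers who have not used the flag from userspace, the following is
a minimal sketch of how an application might request written, zeroed
blocks and fall back to a plain preallocation when the filesystem or
device does not support it. The helper name prealloc_zeroed() is
hypothetical, and the fallback #define of 0x80 is an assumption for
build environments whose uapi headers predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/falloc.h>

/* Assumption: older uapi headers may not define the flag yet. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Hypothetical helper: ask for written, zeroed blocks over
 * [off, off + len).
 *
 * Returns 1 if FALLOC_FL_WRITE_ZEROES succeeded, 0 if we fell back to
 * an ordinary preallocation (extents may then be unwritten), and -1 on
 * any other failure with errno set.
 */
int prealloc_zeroed(int fd, off_t off, off_t len)
{
	assert(fd >= 0);

	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, off, len) == 0)
		return 1;

	/* Unsupported by the kernel, the filesystem or the device. */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return -1;

	/* Fall back to mode 0: space is reserved, but not as written extents. */
	if (fallocate(fd, 0, off, len) == 0)
		return 0;

	return -1;
}
```

A caller that cares about overwrite latency could log which path was
taken, since only the first one avoids later unwritten-to-written
extent conversions.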

I tested the changes with fsstress and fsx based on the xfstests patch I
sent recently to test this flag[4]. generic/363 helped me debug the
crash I noticed when I did the initial implementation[3].

Dave initially suggested creating a common helper based on
xfs_iomap_convert_unwritten(), but as can be seen in the previous
version, a lot of the code had to be rewritten. The changes had more in
common with xfs_alloc_file_space(), so this version reuses
xfs_alloc_file_space() for write zeroes.

Thanks to Christoph for all the review comments and design suggestions
that were made both offline and online for this series.

The stress tests generic/363, generic/127 and xfs/131 pass. I have
started a full xfstests run for this series.

Changes since v6:
- Pass only the range that needs to be zeroed to xfs_alloc_file_space()
  (Christoph).
- Add RVB from Christoph.
- Change the call order. Call xfs_falloc_setsize() and then call
  xfs_alloc_file_space().
- Remove the prep patch to allow xfs_set_filesize to take 64-bit len.

Changes since v5:
- Add a prep patch to allow xfs_set_filesize to take 64-bit len
  (Sashiko)

Changes since v4:
- Introduce an enum for allocation mode in xfs_alloc_file_space (Christoph)
- Use xfs_set_filesize instead of updating the on-disk size in the
  function.

Changes since v3:
- Introduce xfs_bmap_alloc_or_convert_range() in xfs_iomap.c to make
  review easier (Christoph)
- Add extsz hint and rt support in xfs_bmap_alloc_or_convert_range()

Changes since v2:
- Add allow_write_zeroes to xfs_global so that we can enable this
  feature independent of the HW underneath.

Changes since v1 [5.1 5.2]:
- Added a new function xfs_bmap_alloc_or_convert_range() based on Dave's
  feedback.
- Changed xfs_falloc_write_zeroes() to use
  xfs_bmap_alloc_or_convert_range() instead of the prealloc-and-convert
  approach.

[1] https://lore.kernel.org/linux-fsdevel/20250619111806.3546162-1-yi.zhang@huaweicloud.com/
[2] https://lore.kernel.org/linux-fsdevel/20260217055103.GA6174@lst.de/T/#m7935b9bab32bb5ff372507f84803b8753ad1c814
[3] https://lore.kernel.org/linux-xfs/6i2jvzn3lyugjlbgmjzpped3gogzyqv5mpe2uqaifz4vjpaega@pomzoq7ley77/
[4] https://lore.kernel.org/linux-xfs/20260312195308.738189-1-p.raghav@samsung.com/
[5.1] https://lore.kernel.org/linux-xfs/20260309180708.427553-2-lukas@herbolt.com/
[5.2] https://lore.kernel.org/linux-xfs/abC1LvRElctaHPe5@dread/

Pankaj Raghav (2):
  xfs: add an allocation mode to xfs_alloc_file_space()
  xfs: add support for FALLOC_FL_WRITE_ZEROES

 fs/xfs/xfs_bmap_util.c | 42 +++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  7 +++-
 fs/xfs/xfs_file.c      | 76 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 114 insertions(+), 11 deletions(-)


base-commit: 6e24acc45ab58d39a0162b4d5f3fd001d07d868e
-- 
2.51.2



* [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space()
  2026-06-22  8:31 [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
@ 2026-06-22  8:31 ` Pankaj Raghav
  2026-06-22  8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  1 sibling, 0 replies; 4+ messages in thread
From: Pankaj Raghav @ 2026-06-22  8:31 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

xfs_alloc_file_space() hardcodes XFS_BMAPI_PREALLOC to preallocate
unwritten extents across a range.

In preparation for FALLOC_FL_WRITE_ZEROES, add an explicit allocation
mode argument, enum xfs_alloc_file_space_mode, and derive the xfs_bmapi
flags from it. The only mode for now is XFS_ALLOC_FILE_SPACE_PREALLOC,
which preallocates unwritten extents and marks the inode as preallocated
exactly as before, so there is no functional change.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_bmap_util.c | 25 +++++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  6 +++++-
 fs/xfs/xfs_file.c      |  9 ++++++---
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b9f262f8e91..8dfb3c1e3759 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -642,11 +642,19 @@ xfs_free_eofblocks(
 	return error;
 }
 
+/*
+ * Allocate space for a file according to @mode:
+ *
+ * XFS_ALLOC_FILE_SPACE_PREALLOC:
+ * Preallocate unwritten extents across the range and mark the inode as
+ * preallocated.
+ */
 int
 xfs_alloc_file_space(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	xfs_off_t		len)
+	xfs_off_t		len,
+	enum xfs_alloc_file_space_mode mode)
 {
 	xfs_mount_t		*mp = ip->i_mount;
 	xfs_off_t		count;
@@ -657,6 +665,7 @@ xfs_alloc_file_space(
 	int			rt;
 	xfs_trans_t		*tp;
 	xfs_bmbt_irec_t		imaps[1], *imapp;
+	uint32_t		bmapi_flags, nr_exts;
 	int			error;
 
 	if (xfs_is_always_cow_inode(ip))
@@ -674,6 +683,15 @@ xfs_alloc_file_space(
 	if (len <= 0)
 		return -EINVAL;
 
+	switch (mode) {
+	case XFS_ALLOC_FILE_SPACE_PREALLOC:
+		bmapi_flags = XFS_BMAPI_PREALLOC;
+		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
 
@@ -733,8 +751,7 @@ xfs_alloc_file_space(
 		if (error)
 			break;
 
-		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
+		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK, nr_exts);
 		if (error)
 			goto error;
 
@@ -748,7 +765,7 @@ xfs_alloc_file_space(
 		 * will eventually reach the requested range.
 		 */
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+				allocatesize_fsb, bmapi_flags, 0, imapp,
 				&nimaps);
 		if (error) {
 			if (error != -ENOSR)
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index c477b3361630..232b4c48247e 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -55,8 +55,12 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 			     int *is_empty);
 
 /* preallocation and hole punch interface */
+enum xfs_alloc_file_space_mode {
+	XFS_ALLOC_FILE_SPACE_PREALLOC,
+};
+
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
-		xfs_off_t len);
+		xfs_off_t len, enum xfs_alloc_file_space_mode mode);
 int	xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t len, struct xfs_zone_alloc_ctx *ac);
 int	xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 845a97c9b063..e90ea6ebdc8e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1406,7 +1406,8 @@ xfs_falloc_zero_range(
 		len = round_up(offset + len, blksize) -
 			round_down(offset, blksize);
 		offset = round_down(offset, blksize);
-		error = xfs_alloc_file_space(ip, offset, len);
+		error = xfs_alloc_file_space(ip, offset, len,
+				XFS_ALLOC_FILE_SPACE_PREALLOC);
 	}
 	if (error)
 		return error;
@@ -1432,7 +1433,8 @@ xfs_falloc_unshare_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
@@ -1460,7 +1462,8 @@ xfs_falloc_allocate_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
-- 
2.51.2



* [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-22  8:31 [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
  2026-06-22  8:31 ` [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
@ 2026-06-22  8:31 ` Pankaj Raghav
  2026-06-23 20:21   ` Pankaj Raghav
  1 sibling, 1 reply; 4+ messages in thread
From: Pankaj Raghav @ 2026-06-22  8:31 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

If the underlying block device supports the unmap write zeroes
operation, this flag allows users to quickly preallocate a file with
written extents that contain zeroes. This is beneficial for subsequent
overwrites as it prevents the need for unwritten-to-written extent
conversions, thereby significantly reducing metadata updates and journal
I/O overhead, improving overwrite performance.

Punch the range first so it becomes a hole, update the size via
xfs_falloc_setsize() while it is still a hole (so its xfs_zero_range()
skips it and avoids rezeroing), then convert it to written
zeroed extents. A crash between the size update and the conversion is
safe, as a hole within i_size reads back as zeroes.

Co-developed-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
I went back to calling xfs_falloc_setsize(), as using xfs_setfilesize()
would involve a lot of repetition in the function. By changing the call
order with xfs_falloc_setsize() we reuse most of the code.


 fs/xfs/xfs_bmap_util.c | 19 ++++++++++--
 fs/xfs/xfs_bmap_util.h |  1 +
 fs/xfs/xfs_file.c      | 67 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8dfb3c1e3759..55722b815117 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -643,11 +643,18 @@ xfs_free_eofblocks(
 }
 
 /*
- * Allocate space for a file according to @mode:
+ * Allocate space or convert extents for a file according to @mode:
  *
  * XFS_ALLOC_FILE_SPACE_PREALLOC:
  * Preallocate unwritten extents across the range and mark the inode as
  * preallocated.
+ *
+ * XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+ * Allocate written extents over holes and convert unwritten extents in the
+ * range to written extents, initialising both to contain zeroes.
+ *
+ * This function does not update the file size; callers that extend the file
+ * are responsible for updating it once the extents are allocated.
  */
 int
 xfs_alloc_file_space(
@@ -688,6 +695,10 @@ xfs_alloc_file_space(
 		bmapi_flags = XFS_BMAPI_PREALLOC;
 		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
 		break;
+	case XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
+		nr_exts = XFS_IEXT_WRITE_UNWRITTEN_CNT;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -776,8 +787,10 @@ xfs_alloc_file_space(
 			allocatesize_fsb -= imapp->br_blockcount;
 		}
 
-		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		if (mode == XFS_ALLOC_FILE_SPACE_PREALLOC) {
+			ip->i_diflags |= XFS_DIFLAG_PREALLOC;
+			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		}
 
 		error = xfs_trans_commit(tp);
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 232b4c48247e..e3d506ca9610 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -57,6 +57,7 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 /* preallocation and hole punch interface */
 enum xfs_alloc_file_space_mode {
 	XFS_ALLOC_FILE_SPACE_PREALLOC,
+	XFS_ALLOC_FILE_SPACE_WRITE_ZEROES,
 };
 
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e90ea6ebdc8e..0e1332ccdf79 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1368,6 +1368,68 @@ xfs_falloc_force_zero(
 	return XFS_TEST_ERROR(ip->i_mount, XFS_ERRTAG_FORCE_ZERO_RANGE);
 }
 
+static int
+xfs_falloc_write_zeroes(
+	struct file		*file,
+	int			mode,
+	loff_t			offset,
+	loff_t			len,
+	struct xfs_zone_alloc_ctx *ac)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	loff_t			new_size = 0;
+	unsigned int		blksize = i_blocksize(inode);
+	xfs_off_t		offset_aligned = round_up(offset, blksize);
+	xfs_off_t		end_aligned = round_down(offset + len, blksize);
+	xfs_off_t		len_aligned = end_aligned - offset_aligned;
+	int			error;
+
+	if (xfs_is_always_cow_inode(ip) ||
+	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
+		return -EOPNOTSUPP;
+
+	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
+	if (error)
+		return error;
+
+	/*
+	 *
+	 *    |----------|----------|----------|----------|----------|
+	 *    ^     ^    ^                     ^     ^    ^
+	 *    |     |    |                     |     |    |
+	 *    |   offset |                     |    end   |
+	 *    |          |                     |          |
+	 * offset_rd   offset_ru              end_rd    end_ru
+	 *
+	 * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
+	 * zeroes offset -> offset_ru and end_rd -> end.
+	 * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
+	 */
+	error = xfs_free_file_space(ip, offset, len, ac);
+	if (error)
+		return error;
+
+	/*
+	 * Publish the new size while the punched range is still a hole, then
+	 * fill it with written zeroes.  Like the other fallocate modes we use
+	 * xfs_falloc_setsize(), but it must run *before* we convert the range
+	 * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
+	 * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
+	 * It will also writeback partial EOF block before the on-disk size is
+	 * logged.
+	 */
+	error = xfs_falloc_setsize(file, new_size);
+	if (error)
+		return error;
+
+	if (len_aligned > 0)
+		error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
+				XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
+
+	return error;
+}
+
 /*
  * Punch a hole and prealloc the range.  We use a hole punch rather than
  * unwritten extent conversion for two reasons:
@@ -1473,7 +1535,7 @@ xfs_falloc_allocate_range(
 		(FALLOC_FL_ALLOCATE_RANGE | FALLOC_FL_KEEP_SIZE |	\
 		 FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_COLLAPSE_RANGE |	\
 		 FALLOC_FL_ZERO_RANGE |	FALLOC_FL_INSERT_RANGE |	\
-		 FALLOC_FL_UNSHARE_RANGE)
+		 FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_WRITE_ZEROES)
 
 STATIC long
 __xfs_file_fallocate(
@@ -1525,6 +1587,9 @@ __xfs_file_fallocate(
 	case FALLOC_FL_ALLOCATE_RANGE:
 		error = xfs_falloc_allocate_range(file, mode, offset, len);
 		break;
+	case FALLOC_FL_WRITE_ZEROES:
+		error = xfs_falloc_write_zeroes(file, mode, offset, len, ac);
+		break;
 	default:
 		error = -EOPNOTSUPP;
 		break;
-- 
2.51.2



* Re: [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-22  8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
@ 2026-06-23 20:21   ` Pankaj Raghav
  0 siblings, 0 replies; 4+ messages in thread
From: Pankaj Raghav @ 2026-06-23 20:21 UTC (permalink / raw)
  To: Pankaj Raghav, linux-xfs, hch
  Cc: bfoster, lukas, Darrick J . Wong, dgc, gost.dev, andres,
	kundan.kumar, hch, cem

> +static int
> +xfs_falloc_write_zeroes(
> +	struct file		*file,
> +	int			mode,
> +	loff_t			offset,
> +	loff_t			len,
> +	struct xfs_zone_alloc_ctx *ac)
> +{
> +	struct inode		*inode = file_inode(file);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	loff_t			new_size = 0;
> +	unsigned int		blksize = i_blocksize(inode);
> +	xfs_off_t		offset_aligned = round_up(offset, blksize);
> +	xfs_off_t		end_aligned = round_down(offset + len, blksize);
> +	xfs_off_t		len_aligned = end_aligned - offset_aligned;
> +	int			error;
> +
> +	if (xfs_is_always_cow_inode(ip) ||
> +	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
> +		return -EOPNOTSUPP;
> +
> +	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 *
> +	 *    |----------|----------|----------|----------|----------|
> +	 *    ^     ^    ^                     ^     ^    ^
> +	 *    |     |    |                     |     |    |
> +	 *    |   offset |                     |    end   |
> +	 *    |          |                     |          |
> +	 * offset_rd   offset_ru              end_rd    end_ru
> +	 *
> +	 * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
> +	 * zeroes offset -> offset_ru and end_rd -> end.
> +	 * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
> +	 */
> +	error = xfs_free_file_space(ip, offset, len, ac);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Publish the new size while the punched range is still a hole, then
> +	 * fill it with written zeroes.  Like the other fallocate modes we use
> +	 * xfs_falloc_setsize(), but it must run *before* we convert the range
> +	 * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
> +	 * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
> +	 * It will also writeback partial EOF block before the on-disk size is
> +	 * logged.
> +	 */
> +	error = xfs_falloc_setsize(file, new_size);
> +	if (error)
> +		return error;
> +
> +	if (len_aligned > 0)
> +		error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
> +				XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> +
> +	return error;
> +}
> +

Sashiko was not happy with this approach: there are cases where there is no data corruption,
but we might end up not allocating an extent and therefore get -ENOSPC at a later point.
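
The arithmetic behind this concern can be sketched in plain C. The v7
code allocates only the block-aligned interior (offset rounded up, end
rounded down), whereas the follow-up below relies on
xfs_alloc_file_space() rounding outward. The struct and function names
are hypothetical, the helpers assume a non-negative offset and a
positive length, and treating the sub-block case as the one Sashiko had
in mind is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range: start offset and length. */
struct blk_range {
	int64_t start;
	int64_t len;
};

/* Round x down / up to a multiple of blk (x >= 0, blk > 0). */
static int64_t rd(int64_t x, int64_t blk) { return x - (x % blk); }
static int64_t ru(int64_t x, int64_t blk) { return rd(x + blk - 1, blk); }

/* v7 behaviour: only the aligned interior, offset_ru -> end_rd. */
struct blk_range inner_range(int64_t off, int64_t len, int64_t blk)
{
	int64_t s = ru(off, blk);
	int64_t e = rd(off + len, blk);
	struct blk_range r = { s, e > s ? e - s : 0 };

	return r;
}

/* Outward rounding: every block touched, offset_rd -> end_ru. */
struct blk_range outer_range(int64_t off, int64_t len, int64_t blk)
{
	int64_t s = rd(off, blk);
	int64_t e = ru(off + len, blk);
	struct blk_range r = { s, e - s };

	return r;
}
```

With 4096-byte blocks, a request for offset 100 and length 200 has an
empty interior, so the inner range allocates nothing, while the outer
range covers the whole first block. If that block is a hole, only the
outward-rounded variant guarantees that space is reserved for it.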

I went back to what Zhang Yi pointed out in the previous version regarding the semantics [1].
I think the correct approach is the following:

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0e1332ccdf79..a27862037d22 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1379,10 +1379,6 @@ xfs_falloc_write_zeroes(
        struct inode            *inode = file_inode(file);
        struct xfs_inode        *ip = XFS_I(inode);
        loff_t                  new_size = 0;
-       unsigned int            blksize = i_blocksize(inode);
-       xfs_off_t               offset_aligned = round_up(offset, blksize);
-       xfs_off_t               end_aligned = round_down(offset + len, blksize);
-       xfs_off_t               len_aligned = end_aligned - offset_aligned;
        int                     error;

        if (xfs_is_always_cow_inode(ip) ||
@@ -1402,9 +1398,11 @@ xfs_falloc_write_zeroes(
         *    |          |                     |          |
         * offset_rd   offset_ru              end_rd    end_ru
         *
-        * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
-        * zeroes offset -> offset_ru and end_rd -> end.
-        * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
+        * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
+        * to holes and byte-zeroes the in-range parts of the partial edge blocks,
+        * offset -> offset_ru and end_rd -> end.  xfs_zero_range() only touches
+        * already-written blocks here; it skips holes and unwritten extents, so
+        * unallocated/unwritten edge blocks are left for the allocation below.
         */
        error = xfs_free_file_space(ip, offset, len, ac);
        if (error)
@@ -1423,11 +1421,19 @@ xfs_falloc_write_zeroes(
        if (error)
                return error;

-       if (len_aligned > 0)
-               error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
-                               XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
-
-       return error;
+       /*
+        * Allocate written, zeroed extents across the range.  xfs_alloc_file_space()
+        * rounds outward to block granularity:
+        *  - holes (the punched interior and any unallocated edge block) are
+        *    allocated and zeroed;
+        *  - unwritten extents (including unwritten edge blocks) are converted to
+        *    written and zeroed;
+        *  - already-written blocks are skipped, so the out-of-range bytes of a
+        *    written edge block keep their data; their in-range bytes were already
+        *    zeroed by xfs_free_file_space() above.
+        */
+       return xfs_alloc_file_space(ip, offset, len,
+                       XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
 }

 /*

We pass offset and len to xfs_alloc_file_space() without rounding, and its existing behaviour
handles them correctly. I could add xfstests cases covering all these edge cases so that
we don't regress.
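
The three per-block outcomes listed in the comment above can be
illustrated with a toy model. This is not kernel code: the enum, the
struct and toy_write_zeroes() are hypothetical names used only to show
which block states change and which are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy block states; these are not the kernel's extent states. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

struct blk {
	enum blk_state	state;
	int		zero_filled;	/* set if this call zero-initialised the block */
};

/*
 * Model of the allocation step over every block touched by the range:
 * holes are allocated and zeroed, unwritten blocks are converted and
 * zeroed, and already-written blocks are skipped so their contents
 * survive.
 */
void toy_write_zeroes(struct blk *blks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		switch (blks[i].state) {
		case BLK_HOLE:
		case BLK_UNWRITTEN:
			blks[i].state = BLK_WRITTEN;
			blks[i].zero_filled = 1;
			break;
		case BLK_WRITTEN:
			/* Left untouched by the allocation step. */
			break;
		}
	}
}
```

In the model, every block ends up written, which is the property that
avoids a later -ENOSPC, while a block that was already written is never
re-zeroed by the allocation step.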

If there are no more comments, I will send a v8 with this approach.

--
Pankaj

[1] https://lore.kernel.org/linux-xfs/557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com/
