[PATCH v6 0/3] add FALLOC_FL_WRITE

* [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs
@ 2026-06-11 11:40 Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

The benefits of FALLOC_FL_WRITE_ZEROES were already discussed as part
of Zhang Yi's initial patches[1]. Postgres developer Andres also
mentioned that they would like to use this feature in Postgres[2].

I tested the changes with fsstress and fsx based on the xfstests patch I
sent recently to test this flag[4]. generic/363 helped me debug the
crash I noticed when I did the initial implementation[3].

Dave initially suggested creating a common helper based on
xfs_iomap_convert_unwritten() but, as can be seen in the previous
version, a lot of the code had to be rewritten. The changes had more in
common with xfs_alloc_file_space(), so this version reuses
xfs_alloc_file_space() for write zeroes.

Thanks to Christoph for all the review comments and design suggestions
that were made both offline and online for this series.

The stress tests generic/363, generic/127 and xfs/131 are passing. I have
started a full xfstests run for this series.

Changes since v5:
- Add a prep patch to allow xfs_setfilesize() to take a 64-bit length
  (Sashiko)

Changes since v4:
- Introduce an enum for allocation mode in xfs_alloc_file_space (Christoph)
- Use xfs_setfilesize() instead of updating the on-disk size in the
  function.

Changes since v3:
- Introduce xfs_bmap_alloc_or_convert_range() in xfs_iomap.c to make
  review easier (Christoph)
- Add extsz hint and rt support in xfs_bmap_alloc_or_convert_range()

Changes since v2:
- Add allow_write_zeroes to xfs_global so that we can enable this
  feature independent of the HW underneath.

Changes since v1 [5.1 5.2]:
- Added a new function xfs_bmap_alloc_or_convert_range() based on Dave's
  feedback.
- Changed xfs_falloc_write_zeroes() to use
  xfs_bmap_alloc_or_convert_range() instead of the prealloc-and-convert
  approach.

[1] https://lore.kernel.org/linux-fsdevel/20250619111806.3546162-1-yi.zhang@huaweicloud.com/
[2] https://lore.kernel.org/linux-fsdevel/20260217055103.GA6174@lst.de/T/#m7935b9bab32bb5ff372507f84803b8753ad1c814
[3] https://lore.kernel.org/linux-xfs/6i2jvzn3lyugjlbgmjzpped3gogzyqv5mpe2uqaifz4vjpaega@pomzoq7ley77/
[4] https://lore.kernel.org/linux-xfs/20260312195308.738189-1-p.raghav@samsung.com/
[5.1] https://lore.kernel.org/linux-xfs/20260309180708.427553-2-lukas@herbolt.com/
[5.2] https://lore.kernel.org/linux-xfs/abC1LvRElctaHPe5@dread/

Pankaj Raghav (3):
  xfs: widen xfs_setfilesize() size argument to xfs_off_t
  xfs: add an allocation mode to xfs_alloc_file_space()
  xfs: add support for FALLOC_FL_WRITE_ZEROES

 fs/xfs/xfs_aops.c      |  2 +-
 fs/xfs/xfs_aops.h      |  2 +-
 fs/xfs/xfs_bmap_util.c | 42 +++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  7 +++-
 fs/xfs/xfs_file.c      | 75 +++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.h     |  8 ++---
 6 files changed, 119 insertions(+), 17 deletions(-)


base-commit: 46d91a29e0885a3867f49a7da09f0babef2d867f
-- 
2.51.2



* [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:15   ` Christoph Hellwig
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  2 siblings, 1 reply; 16+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

xfs_setfilesize() takes the written length as a size_t, which is only 32
bits wide on 32-bit architectures. The upcoming FALLOC_FL_WRITE_ZEROES
support calls it with a 64-bit fallocate length that can exceed 4GB.

Sashiko reported this[1].

Widen the size argument to xfs_off_t so large lengths are handled
correctly. The existing writeback caller passes a size_t and is unaffected
by the widening.
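
The truncation described above can be illustrated in userspace. This is
only a sketch and not part of the patch: uint32_t stands in for a 32-bit
size_t, int64_t stands in for xfs_off_t, and the helper names are
hypothetical.

```c
#include <stdint.h>

/*
 * Stand-in for passing a 64-bit length through a 32-bit size_t:
 * the conversion keeps only the low 32 bits of the value.
 */
static uint32_t len_as_size32(int64_t len)
{
	return (uint32_t)len;
}

/* Returns 1 if a non-negative length survives a 32-bit size_t unchanged. */
static int fits_in_size32(int64_t len)
{
	return len >= 0 && (uint64_t)len <= UINT32_MAX;
}
```

With these helpers a 5 GiB length is silently reduced to 1 GiB, which is
the kind of mismatch a 64-bit size argument avoids.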

[1] https://sashiko.dev/#/patchset/20260604101442.2613872-1-p.raghav%40samsung.com

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_aops.c  | 2 +-
 fs/xfs/xfs_aops.h  | 2 +-
 fs/xfs/xfs_trace.h | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1a82cf625a08..0766c5667b95 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -52,7 +52,7 @@ int
 xfs_setfilesize(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	size_t			size)
+	xfs_off_t		size)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 5a7a0f1a0b49..d8c4051f2a85 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -9,7 +9,7 @@
 extern const struct address_space_operations xfs_address_space_operations;
 extern const struct address_space_operations xfs_dax_aops;
 
-int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
+int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t size);
 void xfs_end_bio(struct bio *bio);
 
 #endif /* __XFS_AOPS_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d478693674f9..d5b50c033873 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1857,7 +1857,7 @@ DEFINE_IMAP_EVENT(xfs_iomap_alloc);
 DEFINE_IMAP_EVENT(xfs_iomap_found);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count),
 	TP_ARGS(ip, offset, count),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
@@ -1865,7 +1865,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__field(loff_t, isize)
 		__field(loff_t, disize)
 		__field(loff_t, offset)
-		__field(size_t, count)
+		__field(xfs_off_t, count)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
@@ -1876,7 +1876,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__entry->count = count;
 	),
 	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
-		  "pos 0x%llx bytecount 0x%zx",
+		  "pos 0x%llx bytecount 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->isize,
@@ -1887,7 +1887,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 
 #define DEFINE_SIMPLE_IO_EVENT(name)	\
 DEFINE_EVENT(xfs_simple_io_class, name,	\
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),	\
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count),	\
 	TP_ARGS(ip, offset, count))
 DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
-- 
2.51.2



* [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space()
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:15   ` Christoph Hellwig
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  2 siblings, 1 reply; 16+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

xfs_alloc_file_space() hardcodes XFS_BMAPI_PREALLOC to preallocate
unwritten extents across a range.

In preparation for FALLOC_FL_WRITE_ZEROES, add an explicit allocation
mode argument, enum xfs_alloc_file_space_mode, and derive the xfs_bmapi
flags from it. The only mode for now is XFS_ALLOC_FILE_SPACE_PREALLOC,
which preallocates unwritten extents and marks the inode as preallocated
exactly as before, so there is no functional change.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_bmap_util.c | 25 +++++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  6 +++++-
 fs/xfs/xfs_file.c      |  9 ++++++---
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b9f262f8e91..8dfb3c1e3759 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -642,11 +642,19 @@ xfs_free_eofblocks(
 	return error;
 }
 
+/*
+ * Allocate space for a file according to @mode:
+ *
+ * XFS_ALLOC_FILE_SPACE_PREALLOC:
+ * Preallocate unwritten extents across the range and mark the inode as
+ * preallocated.
+ */
 int
 xfs_alloc_file_space(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	xfs_off_t		len)
+	xfs_off_t		len,
+	enum xfs_alloc_file_space_mode mode)
 {
 	xfs_mount_t		*mp = ip->i_mount;
 	xfs_off_t		count;
@@ -657,6 +665,7 @@ xfs_alloc_file_space(
 	int			rt;
 	xfs_trans_t		*tp;
 	xfs_bmbt_irec_t		imaps[1], *imapp;
+	uint32_t		bmapi_flags, nr_exts;
 	int			error;
 
 	if (xfs_is_always_cow_inode(ip))
@@ -674,6 +683,15 @@ xfs_alloc_file_space(
 	if (len <= 0)
 		return -EINVAL;
 
+	switch (mode) {
+	case XFS_ALLOC_FILE_SPACE_PREALLOC:
+		bmapi_flags = XFS_BMAPI_PREALLOC;
+		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
 
@@ -733,8 +751,7 @@ xfs_alloc_file_space(
 		if (error)
 			break;
 
-		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
+		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK, nr_exts);
 		if (error)
 			goto error;
 
@@ -748,7 +765,7 @@ xfs_alloc_file_space(
 		 * will eventually reach the requested range.
 		 */
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+				allocatesize_fsb, bmapi_flags, 0, imapp,
 				&nimaps);
 		if (error) {
 			if (error != -ENOSR)
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index c477b3361630..232b4c48247e 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -55,8 +55,12 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 			     int *is_empty);
 
 /* preallocation and hole punch interface */
+enum xfs_alloc_file_space_mode {
+	XFS_ALLOC_FILE_SPACE_PREALLOC,
+};
+
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
-		xfs_off_t len);
+		xfs_off_t len, enum xfs_alloc_file_space_mode mode);
 int	xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t len, struct xfs_zone_alloc_ctx *ac);
 int	xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 845a97c9b063..e90ea6ebdc8e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1406,7 +1406,8 @@ xfs_falloc_zero_range(
 		len = round_up(offset + len, blksize) -
 			round_down(offset, blksize);
 		offset = round_down(offset, blksize);
-		error = xfs_alloc_file_space(ip, offset, len);
+		error = xfs_alloc_file_space(ip, offset, len,
+				XFS_ALLOC_FILE_SPACE_PREALLOC);
 	}
 	if (error)
 		return error;
@@ -1432,7 +1433,8 @@ xfs_falloc_unshare_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
@@ -1460,7 +1462,8 @@ xfs_falloc_allocate_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
-- 
2.51.2



* [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:31   ` Christoph Hellwig
  2 siblings, 1 reply; 16+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

If the underlying block device supports the unmap write zeroes
operation, this flag lets users quickly preallocate a file with written
extents that contain zeroes. This benefits subsequent overwrites because
no unwritten-to-written extent conversion is needed, which reduces
metadata updates and journal I/O and improves overwrite performance.
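
For reference, a caller might use the flag roughly as follows. This is a
userspace sketch and not part of the patch: prealloc_written_zeroes() is
a hypothetical name, and the fallback definition of
FALLOC_FL_WRITE_ZEROES as 0x80 is an assumption based on the upstream
uapi header, for systems whose libc headers do not define the flag yet.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Assumption: value taken from the upstream <linux/falloc.h>. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Try to preallocate [offset, offset + len) as written, zeroed extents.
 * Returns 0 if the flag was honoured, 1 if we fell back to a plain
 * (unwritten) preallocation, or a negative errno value on failure.
 */
static int prealloc_written_zeroes(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, offset, len) == 0)
		return 0;

	/* File system, kernel or device does not support the flag. */
	if (errno == EOPNOTSUPP) {
		if (fallocate(fd, 0, offset, len) == 0)
			return 1;
	}
	return -errno;
}
```

The fallback to mode 0 is a design choice of this sketch: the caller
still gets space reserved, but loses the benefit of skipping the
unwritten-to-written conversion on later overwrites.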

Co-developed-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_bmap_util.c | 19 ++++++++++--
 fs/xfs/xfs_bmap_util.h |  1 +
 fs/xfs/xfs_file.c      | 66 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8dfb3c1e3759..55722b815117 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -643,11 +643,18 @@ xfs_free_eofblocks(
 }
 
 /*
- * Allocate space for a file according to @mode:
+ * Allocate space or convert extents for a file according to @mode:
  *
  * XFS_ALLOC_FILE_SPACE_PREALLOC:
  * Preallocate unwritten extents across the range and mark the inode as
  * preallocated.
+ *
+ * XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+ * Allocate written extents over holes and convert unwritten extents in the
+ * range to written extents, initialising both to contain zeroes.
+ *
+ * This function does not update the file size; callers that extend the file
+ * are responsible for updating it once the extents are allocated.
  */
 int
 xfs_alloc_file_space(
@@ -688,6 +695,10 @@ xfs_alloc_file_space(
 		bmapi_flags = XFS_BMAPI_PREALLOC;
 		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
 		break;
+	case XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
+		nr_exts = XFS_IEXT_WRITE_UNWRITTEN_CNT;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -776,8 +787,10 @@ xfs_alloc_file_space(
 			allocatesize_fsb -= imapp->br_blockcount;
 		}
 
-		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		if (mode == XFS_ALLOC_FILE_SPACE_PREALLOC) {
+			ip->i_diflags |= XFS_DIFLAG_PREALLOC;
+			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		}
 
 		error = xfs_trans_commit(tp);
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 232b4c48247e..e3d506ca9610 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -57,6 +57,7 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 /* preallocation and hole punch interface */
 enum xfs_alloc_file_space_mode {
 	XFS_ALLOC_FILE_SPACE_PREALLOC,
+	XFS_ALLOC_FILE_SPACE_WRITE_ZEROES,
 };
 
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e90ea6ebdc8e..37623baaaed6 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1368,6 +1368,67 @@ xfs_falloc_force_zero(
 	return XFS_TEST_ERROR(ip->i_mount, XFS_ERRTAG_FORCE_ZERO_RANGE);
 }
 
+static int
+xfs_falloc_write_zeroes(
+	struct file		*file,
+	int			mode,
+	loff_t			offset,
+	loff_t			len,
+	struct xfs_zone_alloc_ctx *ac)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	loff_t			new_size = 0;
+	loff_t			old_size = XFS_ISIZE(ip);
+	int			error;
+	unsigned int		blksize = i_blocksize(inode);
+	loff_t			offset_aligned = round_down(offset, blksize);
+	bool			did_zero;
+
+	if (xfs_is_always_cow_inode(ip) ||
+	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
+		return -EOPNOTSUPP;
+
+	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
+	if (error)
+		return error;
+
+	error = xfs_free_file_space(ip, offset, len, ac);
+	if (error)
+		return error;
+
+	/*
+	 * Zero the tail of the old EOF block and any space up to the new
+	 * offset.
+	 * In the usual truncate path, xfs_falloc_setsize takes care of
+	 * zeroing those blocks.
+	 */
+	if (offset_aligned > old_size) {
+		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
+		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
+				NULL, &did_zero);
+		if (error)
+			return error;
+
+	}
+
+	error = xfs_alloc_file_space(ip, offset, len,
+			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
+	if (error)
+		return error;
+
+	/*
+	 * xfs_falloc_setsize() would re-zero the written extents via
+	 * iomap_zero_range(). Use xfs_setfilesize() instead.
+	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
+	 * size to it.
+	 */
+	if (new_size > i_size_read(inode))
+		i_size_write(inode, new_size);
+
+	return xfs_setfilesize(ip, offset, len);
+}
+
 /*
  * Punch a hole and prealloc the range.  We use a hole punch rather than
  * unwritten extent conversion for two reasons:
@@ -1473,7 +1534,7 @@ xfs_falloc_allocate_range(
 		(FALLOC_FL_ALLOCATE_RANGE | FALLOC_FL_KEEP_SIZE |	\
 		 FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_COLLAPSE_RANGE |	\
 		 FALLOC_FL_ZERO_RANGE |	FALLOC_FL_INSERT_RANGE |	\
-		 FALLOC_FL_UNSHARE_RANGE)
+		 FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_WRITE_ZEROES)
 
 STATIC long
 __xfs_file_fallocate(
@@ -1525,6 +1586,9 @@ __xfs_file_fallocate(
 	case FALLOC_FL_ALLOCATE_RANGE:
 		error = xfs_falloc_allocate_range(file, mode, offset, len);
 		break;
+	case FALLOC_FL_WRITE_ZEROES:
+		error = xfs_falloc_write_zeroes(file, mode, offset, len, ac);
+		break;
 	default:
 		error = -EOPNOTSUPP;
 		break;
-- 
2.51.2



* Re: [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
@ 2026-06-16 13:15   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:15 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

On Thu, Jun 11, 2026 at 01:40:27PM +0200, Pankaj Raghav wrote:
> xfs_setfilesize() takes the written length as a size_t, which is only 32
> bits wide on 32-bit architectures. The upcoming FALLOC_FL_WRITE_ZEROES
> support calls it with a 64-bit fallocate length that can exceed 4GB.

Others might be better than me in arguing what type the new size
parameter should be - xfs_off_t looks a bit odd, but the VFS uses loff_t
for the length which translates to xfs_off_t, so at least it is
consistent.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space()
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
@ 2026-06-16 13:15   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:15 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
@ 2026-06-16 13:31   ` Christoph Hellwig
  2026-06-17  9:44     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:31 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

[API questions for Zhang and -fsdevel/-api below]

> +	unsigned int		blksize = i_blocksize(inode);
> +	loff_t			offset_aligned = round_down(offset, blksize);

I think this actually needs to round up instead of rounding down.

> +	/*
> +	 * Zero the tail of the old EOF block and any space up to the new
> +	 * offset.
> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> +	 * zeroing those blocks.
> +	 */
> +	if (offset_aligned > old_size) {
> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> +				NULL, &did_zero);
> +		if (error)
> +			return error;
> +	}

... then this will properly zero from the old i_size to the first block
boundary after the old size.

> +	error = xfs_alloc_file_space(ip, offset, len,
> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);

... and here we need to pass offset_aligned instead of offset and
a newly calculated len based on the last block boundary, and then
zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
allows unaligned ranges for file systems.  The block code doesn't,
but I can't quite tell from the ext4 code whether it does, and there
is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
tree.

Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
and make sure no existing data before the range is lost and the
entire range is zeroed?

> +	if (error)
> +		return error;
> +
> +	/*
> +	 * xfs_falloc_setsize() would re-zero the written extents via
> +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
> +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
> +	 * size to it.
> +	 */
> +	if (new_size > i_size_read(inode))
> +		i_size_write(inode, new_size);

I think Sashiko is right that we need pagecache_isize_extended() and
filemap_write_and_wait_range() calls here.


* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-16 13:31   ` Christoph Hellwig
@ 2026-06-17  9:44     ` Pankaj Raghav (Samsung)
  2026-06-18  3:22       ` Zhang Yi
  2026-06-18  9:00       ` Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-17  9:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
> [API questions for Zhang and -fsdevel/ -api below)
> 
> > +	unsigned int		blksize = i_blocksize(inode);
> > +	loff_t			offset_aligned = round_down(offset, blksize);
> 
> I think this actually needs to round up instead of rounding down.
> 
> > +	/*
> > +	 * Zero the tail of the old EOF block and any space up to the new
> > +	 * offset.
> > +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> > +	 * zeroing those blocks.
> > +	 */
> > +	if (offset_aligned > old_size) {
> > +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> > +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> > +				NULL, &did_zero);
> > +		if (error)
> > +			return error;
> > +	}
> 
> ... then this will properly zero from the old i_size to the first block
> boundary after the old size.

Hmm, right now we do this:

|----------|----------|----------|
    ^      ^     ^    ^
    |      |     |    |
 old_size  |   offset |
           |          |
         off_rd     off_ru

At the moment, we zero out old_size to off_rd and pass offset to
xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.

What you are proposing is to zero out old_size to off_ru, and pass
off_ru to xfs_alloc_file_space. I don't exactly understand the
difference.
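
To put numbers on the two variants in the diagram, here is a small
userspace sketch. It is an illustration only: the helper names are
hypothetical, int64_t stands in for loff_t, and it models nothing beyond
the arithmetic of where the EOF zeroing stops.

```c
#include <stdint.h>

/* Round pos down to the start of its block (off_rd in the diagram). */
static int64_t off_rd(int64_t pos, int64_t blksize)
{
	return pos - (pos % blksize);
}

/* Round pos up to the next block boundary (off_ru in the diagram). */
static int64_t off_ru(int64_t pos, int64_t blksize)
{
	return off_rd(pos + blksize - 1, blksize);
}

/*
 * Number of bytes zeroed between the old EOF and the chosen boundary.
 * Nothing is zeroed when the boundary is not past the old EOF.
 */
static int64_t eof_zero_len(int64_t old_size, int64_t boundary)
{
	return boundary > old_size ? boundary - old_size : 0;
}
```

With old_size = 1000, offset = 6000 and a 4096-byte block, zeroing up to
off_rd covers 3096 bytes, while zeroing up to off_ru covers 7192 bytes
and includes the head of the block that contains offset.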

> 
> > +	error = xfs_alloc_file_space(ip, offset, len,
> > +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> 
> ... and here we need to pass offset_aligned instead of offset and
> a new calculated len based on the last block boundary, and then
> zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
> allows unaligned ranges for file systems.  The block code doesn't,
> but I can't quite follow the ext4 code if it does or not, and there
> is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
> tree.


I can't find any references to FALLOC_FL_WRITE_ZEROES in the man pages
master branch. Maybe we missed it. I can send a separate patch for that
once we have some clarity on the API.
> 
> Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> and make sure no existing data before the range is lost and the
> entire range is zeroed?
> 

I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
For example, generic/363 tests for unaligned writes and checks for any
stale data. By default, I think we do unaligned reads, writes and
truncate in fsx.

> 
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * xfs_falloc_setsize() would re-zero the written extents via
> > +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
> > +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
> > +	 * size to it.
> > +	 */
> > +	if (new_size > i_size_read(inode))
> > +		i_size_write(inode, new_size);
> 
> I think Sashiko is right that we need a pagecache_isize_extended and
> filemap_write_and_wait_range calls here.
> 

Ok. Current fsx or fsstress did not expose this
problem. I will look into this. Thanks Christoph.

--
Pankaj


* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-17  9:44     ` Pankaj Raghav (Samsung)
@ 2026-06-18  3:22       ` Zhang Yi
  2026-06-18  8:18         ` Pankaj Raghav (Samsung)
  2026-06-18  8:59         ` Christoph Hellwig
  2026-06-18  9:00       ` Christoph Hellwig
  1 sibling, 2 replies; 16+ messages in thread
From: Zhang Yi @ 2026-06-18  3:22 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung), Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api

On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
> On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
>> [API questions for Zhang and -fsdevel/ -api below)
>>
>>> +	unsigned int		blksize = i_blocksize(inode);
>>> +	loff_t			offset_aligned = round_down(offset, blksize);
>>
>> I think this actually needs to round up instead of rounding down.
>>
>>> +	/*
>>> +	 * Zero the tail of the old EOF block and any space up to the new
>>> +	 * offset.
>>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
>>> +	 * zeroing those blocks.
>>> +	 */
>>> +	if (offset_aligned > old_size) {
>>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
>>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
>>> +				NULL, &did_zero);
>>> +		if (error)
>>> +			return error;
>>> +	}
>>
>> ... then this will properly zero from the old i_size to the first block
>> boundary after the old size.
> 
> Hmm, right now we do this:
> 
> |----------|----------|----------|
>     ^      ^     ^    ^
>     |      |     |    |
>  old_size  |   offset |
>            |          |
> 	off_rd       off_ru
> 
> At the moment, we zero out old_size to off_rd and pass offset to
> xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
> 
> What you are proposing is to zero out old_size to off_ru, and pass
> off_ru to xfs_alloc_file_space. I don't exactly understand the
> difference.

IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases. If
'offset' and 'end' are not block-size aligned, then:

1) if the two blocks straddling the boundaries have not yet been allocated,
   or are allocated as unwritten, we should round the allocation range
   outward and zero out all allocated blocks, including those two boundary
   blocks.
2) if the blocks at the boundaries are already in the written state, which
   can occur when we call FALLOC_FL_WRITE_ZEROES within the file size, we
   should be careful: only the ranges [offset, offset_ru) and [end_rd, end)
   of the boundary blocks should be zeroed, leaving the already-written
   portions of the boundary blocks intact.

Thoughts?

Regarding the second point, the current ext4 implementation has an issue:
it zeroes out the entire boundary blocks. I overlooked this previously, and
I appreciate you pointing it out.
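
The split for already-written boundary blocks (case 2 above) can be
sketched in userspace as follows. This is only an illustration of the
range arithmetic under the assumption that unaligned ranges are allowed;
struct wz_split and the helper names are hypothetical and do not exist
in the kernel.

```c
#include <stdint.h>

/* Hypothetical helper type: three half-open byte ranges [start, end). */
struct wz_split {
	int64_t head_start, head_end;	/* partial first block: [offset, offset_ru) */
	int64_t mid_start, mid_end;	/* whole blocks: [offset_ru, end_rd) */
	int64_t tail_start, tail_end;	/* partial last block: [end_rd, end) */
};

static int64_t blk_round_down(int64_t pos, int64_t blksize)
{
	return pos - (pos % blksize);
}

static int64_t blk_round_up(int64_t pos, int64_t blksize)
{
	return blk_round_down(pos + blksize - 1, blksize);
}

/* Assumes offset >= 0, len > 0 and blksize > 0. */
static struct wz_split wz_split_range(int64_t offset, int64_t len,
				      int64_t blksize)
{
	int64_t end = offset + len;
	int64_t offset_ru = blk_round_up(offset, blksize);
	int64_t end_rd = blk_round_down(end, blksize);
	struct wz_split s;

	if (offset_ru > end_rd) {
		/* The range sits inside one block: no whole block to convert. */
		s.head_start = offset;
		s.head_end = end;
		s.mid_start = s.mid_end = end;
		s.tail_start = s.tail_end = end;
		return s;
	}

	s.head_start = offset;
	s.head_end = offset_ru;
	s.mid_start = offset_ru;
	s.mid_end = end_rd;
	s.tail_start = end_rd;
	s.tail_end = end;
	return s;
}
```

For offset = 1000, len = 10000 and a 4096-byte block, this gives a head
of [1000, 4096), whole blocks [4096, 8192) and a tail of [8192, 11000).
Only the head and tail would need sub-block zeroing, so data before
offset and after end in the boundary blocks stays intact.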

> 
>>
>>> +	error = xfs_alloc_file_space(ip, offset, len,
>>> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
>>
>> ... and here we need to pass offset_aligned instead of offset and
>> a new calculated len based on the last block boundary, and then
>> zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
>> allows unaligned ranges for file systems.  The block code doesn't,
>> but I can't quite follow the ext4 code if it does or not, and there
>> is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
>> tree.
> 
> 
> I can't find any references to FALLOC_FL_WRITE_ZEROES in the man pages
> master branch. Maybe we missed it. I can send a separate patch for that
> once we have some clarity on the API.

Yeah, I forgot to update the man pages last year. Thanks.

Best Regards,
Yi.

>>
>> Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
>> and make sure no existing data before the range is lost and the
>> entire range is zeroed?
>>
> 
> I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> For example, generic/363 tests for unaligned writes and checks for any
> stale data. By default, I think we do unaligned reads, writes and
> truncate in fsx.
> 
>>
>>> +	if (error)
>>> +		return error;
>>> +
>>> +	/*
>>> +	 * xfs_falloc_setsize() would re-zero the written extents via
>>> +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
>>> +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
>>> +	 * size to it.
>>> +	 */
>>> +	if (new_size > i_size_read(inode))
>>> +		i_size_write(inode, new_size);
>>
>> I think Sashiko is right that we need a pagecache_isize_extended and
>> filemap_write_and_wait_range calls here.
>>
> 
> Ok. Current fsx or fsstress did not expose this
> problem. I will look into this. Thanks Christoph.
> 
> --
> Pankaj
> 



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  3:22       ` Zhang Yi
@ 2026-06-18  8:18         ` Pankaj Raghav (Samsung)
  2026-06-18  8:57           ` Christoph Hellwig
  2026-06-18  8:59         ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18  8:18 UTC (permalink / raw)
  To: Zhang Yi, hch
  Cc: Christoph Hellwig, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
> On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
> > On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
> >> [API questions for Zhang and -fsdevel/ -api below)
> >>
> >>> +	unsigned int		blksize = i_blocksize(inode);
> >>> +	loff_t			offset_aligned = round_down(offset, blksize);
> >>
> >> I think this actually needs to round up instead of rounding down.
> >>
> >>> +	/*
> >>> +	 * Zero the tail of the old EOF block and any space up to the new
> >>> +	 * offset.
> >>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> >>> +	 * zeroing those blocks.
> >>> +	 */
> >>> +	if (offset_aligned > old_size) {
> >>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> >>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> >>> +				NULL, &did_zero);
> >>> +		if (error)
> >>> +			return error;
> >>> +	}
> >>
> >> ... then this will properly zero from the old i_size to the first block
> >> boundary after the old size.
> > 
> > Hmm, right now we do this:
> > 
> > |----------|----------|----------|
> >     ^      ^     ^    ^
> >     |      |     |    |
> >  old_size  |   offset |
> >            |          |
> >         off_rd     off_ru
> > 
> > At the moment, we zero out old_size to off_rd and pass offset to
> > xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
> > 
> > What you are proposing is to zero out old_size to off_ru, and pass
> > off_ru to xfs_alloc_file_space. I don't exactly understand the
> > difference.
> 
> IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
> 'offset' and 'end' are not block-size aligned, then:
> 
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
> Thoughts?

Ok, this makes sense to me.

@Christoph, now I understood your reply about rounding up and rounding
down.

So, I could do xfs_zero_range(offset, offset_ru)[1] and xfs_zero_range(end_rd, end).
The aligned middle range [offset_ru, end_rd) will use the accelerated
XFS_BMAPI_ZERO to zero out the extents.
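
As a rough user-space illustration of that split (not kernel code), the
sketch below computes the head, middle and tail pieces for a given offset,
length and block size. The struct and function names are hypothetical and
exist only for this example; it assumes a power-of-two block size and that
offset + len does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end); empty when start == end. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical container for the three pieces of one request. */
struct wz_split {
	struct byte_range head;   /* [offset, offset_ru): partial first block */
	struct byte_range middle; /* [offset_ru, end_rd): whole blocks */
	struct byte_range tail;   /* [end_rd, end): partial last block */
};

static uint64_t blk_round_down(uint64_t x, uint64_t blksize)
{
	return x & ~(blksize - 1);
}

static uint64_t blk_round_up(uint64_t x, uint64_t blksize)
{
	return blk_round_down(x + blksize - 1, blksize);
}

static struct wz_split wz_split_range(uint64_t offset, uint64_t len,
				      uint64_t blksize)
{
	uint64_t end = offset + len;
	uint64_t offset_ru, end_rd;
	struct wz_split s;

	/* Assumption: block size is a non-zero power of two. */
	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);

	offset_ru = blk_round_up(offset, blksize);
	end_rd = blk_round_down(end, blksize);

	if (offset_ru > end_rd) {
		/* The request sits inside one block: a single partial zero. */
		s.head.start = offset;
		s.head.end = end;
		s.middle.start = s.middle.end = end;
		s.tail.start = s.tail.end = end;
		return s;
	}

	s.head.start = offset;
	s.head.end = offset_ru;
	s.middle.start = offset_ru;
	s.middle.end = end_rd;
	s.tail.start = end_rd;
	s.tail.end = end;
	return s;
}
```

With 4096-byte blocks, offset 5000 and length 10000, this gives
head [5000, 8192), middle [8192, 12288) and tail [12288, 15000). For a
request that starts beyond the old EOF, footnote [1] would move the start of
the head back to old_size.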

I also need to add pagecache_isize_extended and filemap_write_and_wait_range
to persist the xfs_zero_range calls before we call setfilesize.

xfs_zero_range should take care of the boundary blocks: it should not
overwrite existing data in written blocks, and it should zero out the
unallocated or unwritten ones, as pointed out in 1) and 2).

Let me know what you think. I am also wondering how fsx did not trigger
the boundary block edge case where the current impl might zero out user
data in the boundary blocks.

[1] if old_size < offset, then xfs_zero_range(old_size, offset_ru)
--
Pankaj


* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  8:18         ` Pankaj Raghav (Samsung)
@ 2026-06-18  8:57           ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-18  8:57 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Zhang Yi, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 10:18:49AM +0200, Pankaj Raghav (Samsung) wrote:
> So, I could do xfs_zero_range(offset, offset_ru)[1] and xfs_zero_range(end_rd, end).
> (offset_ru, end_rd) will be using the accelerated XFS_BMAPI_ZERO to 
> zero out the extents. 

Yeah.

> I also need to add pagecache_isize_extended and filemap_write_and_wait_range
> to persist the xfs_zero_range calls before we call setfilesize.

Yeah.  Or we need to find a way to use xfs_falloc_setsize after all
which would share all that code, although I'm not really sure how
that would work best.


* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  3:22       ` Zhang Yi
  2026-06-18  8:18         ` Pankaj Raghav (Samsung)
@ 2026-06-18  8:59         ` Christoph Hellwig
  2026-06-18 10:26           ` Zhang Yi
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-18  8:59 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Pankaj Raghav (Samsung), Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
> Thoughts?

Yes.

> Regarding the second point, the current ext4 implementation has an issue —
> it zeroes out the entire boundary blocks. I overlooked this previously, and
> I appreciate you pointing it out.

Which means we're missing test coverage for this as well..



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-17  9:44     ` Pankaj Raghav (Samsung)
  2026-06-18  3:22       ` Zhang Yi
@ 2026-06-18  9:00       ` Christoph Hellwig
  2026-06-18  9:28         ` Pankaj Raghav (Samsung)
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-18  9:00 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Wed, Jun 17, 2026 at 11:44:47AM +0200, Pankaj Raghav (Samsung) wrote:
> > Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> > and make sure no existing data before the range is lost and the
> > entire range is zeroed?
> > 
> 
> I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> For example, generic/363 tests for unaligned writes and checks for any
> stale data. By default, I think we do unaligned reads, writes and
> truncate in fsx.

But I guess not unaligned FALLOC_FL_WRITE_ZEROES?



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  9:00       ` Christoph Hellwig
@ 2026-06-18  9:28         ` Pankaj Raghav (Samsung)
  2026-06-18  9:36           ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Pankaj Raghav (Samsung) @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

On Thu, Jun 18, 2026 at 02:00:16AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 17, 2026 at 11:44:47AM +0200, Pankaj Raghav (Samsung) wrote:
> > > Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
> > > and make sure no existing data before the range is lost and the
> > > entire range is zeroed?
> > > 
> > 
> > I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
> > For example, generic/363 tests for unaligned writes and checks for any
> > stale data. By default, I think we do unaligned reads, writes and
> > truncate in fsx.
> 
> But I guess not unaligned FALLOC_FL_WRITE_ZEROES?

        -r readbdy: 4096 would make reads page aligned (default 1)
        -t truncbdy: 4096 would make truncates page aligned (default 1)
        -w writebdy: 4096 would make writes page aligned (default 1)

FALLOC_FL_WRITE_ZEROES comes under truncate, so I would assume we also
issue it unaligned. That is how I found the issue with offset > EOF. I will
take a look and, if that is not the case, add a test case for this condition!

Thanks.
--
Pankaj


* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  9:28         ` Pankaj Raghav (Samsung)
@ 2026-06-18  9:36           ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-18  9:36 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Christoph Hellwig, Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	Zhang Yi, linux-fsdevel, linux-api

On Thu, Jun 18, 2026 at 11:28:15AM +0200, Pankaj Raghav (Samsung) wrote:
> > But I guess not unaligned FALLOC_FL_WRITE_ZEROES?
> 
>         -r readbdy: 4096 would make reads page aligned (default 1)
>         -t truncbdy: 4096 would make truncates page aligned (default 1)
>         -w writebdy: 4096 would make writes page aligned (default 1)
> 
> FALLOC_FL_WRITE_ZEROES comes under truncate. So I would assume we also
> do that. That is how I also found the issue with offset > EOF. I will
> take a look or else, I will add a test case to test this condition!

A targeted test using xfs_io that does FALLOC_FL_WRITE_ZEROES on an
unaligned range and then checks that the data around it is preserved
while the unaligned data in the range is zeroed would also be useful.
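
The check such a test has to make can be sketched in plain C. This is only
the verification step, written under stated assumptions: the test has already
captured the file contents into memory before and after the
FALLOC_FL_WRITE_ZEROES call, the range lies inside the file, and the file
size does not change. The function name is hypothetical; it is not part of
xfs_io or xfstests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare a file image taken before the call with one taken after it.
 * Returns true when every byte in [off, off + len) is zero and every
 * byte outside that range is unchanged.
 */
static bool wz_result_ok(const unsigned char *before,
			 const unsigned char *after,
			 size_t size, size_t off, size_t len)
{
	size_t i;

	/* Assumption: the zeroed range lies entirely inside the image. */
	assert(off <= size && len <= size - off);

	/* Data in front of the range must be preserved. */
	if (memcmp(before, after, off) != 0)
		return false;

	/* The range itself must read back as zeroes. */
	for (i = off; i < off + len; i++)
		if (after[i] != 0)
			return false;

	/* Data behind the range must be preserved. */
	return memcmp(before + off + len, after + off + len,
		      size - off - len) == 0;
}
```

Picking off and len so that neither end falls on a block boundary exercises
both partial boundary blocks at once.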



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-18  8:59         ` Christoph Hellwig
@ 2026-06-18 10:26           ` Zhang Yi
  0 siblings, 0 replies; 16+ messages in thread
From: Zhang Yi @ 2026-06-18 10:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pankaj Raghav (Samsung), Pankaj Raghav, linux-xfs, bfoster, lukas,
	Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api

On 6/18/2026 4:59 PM, Christoph Hellwig wrote:
> On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
>> 1) if the two blocks straddling the boundaries have not yet been allocated,
>>    or allocated as unwritten, we should round outward the allocation range
>>    and zero out all allocated blocks, including those two boundary blocks.
>> 2) if the blocks at the boundaries are already in the written state — which
>>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>>    should be careful here: we should only zero the ranges [offset, offset_ru)
>>    and [end_rd, end) for the boundary blocks, leaving the already-written
>>    portions of the boundary blocks intact.
>>
>> Thoughts?
> 
> Yes.
> 
>> Regarding the second point, the current ext4 implementation has an issue —
>> it zeroes out the entire boundary blocks. I overlooked this previously, and
>> I appreciate you pointing it out.
> 
> Which means we're missing test coverage for this as well..
> 

Ha, I just re-checked the ext4 implementation and found that scenario 2 is
actually fine. Sorry, I misread the code earlier. Fortunately, no data
corruption occurred. :-)

The real issue is in scenario 1, where the boundary blocks are not correctly
converted to written-type extents.
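
To make the two scenarios concrete, here is a small user-space model of the
rule for one boundary block. It is a sketch of the intended policy only, not
ext4 or XFS code; the enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent states of a boundary block before the call. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

/* Half-open byte range [start, end) that has to end up zeroed. */
struct zero_span {
	uint64_t start;
	uint64_t end;
};

static struct zero_span boundary_zero_span(enum blk_state state,
					   uint64_t blk_start,
					   uint64_t blksize,
					   uint64_t req_start,
					   uint64_t req_end)
{
	uint64_t blk_end = blk_start + blksize;
	/* Scenario 1 default: hole or unwritten, zero the whole block. */
	struct zero_span z = { blk_start, blk_end };

	/* The block must overlap the requested range. */
	assert(req_start < blk_end && req_end > blk_start);

	if (state == BLK_WRITTEN) {
		/* Scenario 2: clip to the request, keep the other bytes. */
		if (req_start > z.start)
			z.start = req_start;
		if (req_end < z.end)
			z.end = req_end;
	}
	return z;
}
```

For a request covering [5000, 15000) with 4096-byte blocks, an unwritten
first block yields [4096, 8192), while a written first block yields
[5000, 8192).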

As for testing the unaligned case, I also agree that we should explicitly add
a dedicated test for it. Relying on existing fsstress and fsx is not reliable
enough in this regard.

Thanks,
Yi.




end of thread, other threads:[~2026-06-18 10:26 UTC | newest]

Thread overview: 16+ messages
2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
2026-06-16 13:15   ` Christoph Hellwig
2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-16 13:15   ` Christoph Hellwig
2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-16 13:31   ` Christoph Hellwig
2026-06-17  9:44     ` Pankaj Raghav (Samsung)
2026-06-18  3:22       ` Zhang Yi
2026-06-18  8:18         ` Pankaj Raghav (Samsung)
2026-06-18  8:57           ` Christoph Hellwig
2026-06-18  8:59         ` Christoph Hellwig
2026-06-18 10:26           ` Zhang Yi
2026-06-18  9:00       ` Christoph Hellwig
2026-06-18  9:28         ` Pankaj Raghav (Samsung)
2026-06-18  9:36           ` Christoph Hellwig
