Linux XFS filesystem development
* [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs
@ 2026-06-11 11:40 Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

The benefits of FALLOC_FL_WRITE_ZEROES were already discussed as part
of Zhang Yi's initial patches[1]. Postgres developer Andres also
mentioned that they would like to use this feature in Postgres [2].

I tested the changes with fsstress and fsx based on the xfstests patch I
sent recently to test this flag[4]. generic/363 helped me debug the
crash I noticed when I did the initial implementation[3].

Dave initially suggested creating a common helper based on
xfs_iomap_convert_unwritten(), but as can be seen in the previous
version, a lot of the code had to be rewritten. The changes had more in
common with xfs_alloc_file_space(). This version reuses
xfs_alloc_file_space() for write zeroes.

Thanks to Christoph for all the review comments and design suggestions
that were made both offline and online for this series.

Stress tests generic/363, generic/127 and xfs/131 are passing. I have
started the full xfstests suite for this series.

Changes since v5:
- Add a prep patch to allow xfs_setfilesize() to take a 64-bit len
  (Sashiko)

Changes since v4:
- Introduce an enum for allocation mode in xfs_alloc_file_space (Christoph)
- Use xfs_setfilesize() instead of updating the on-disk size in the
  function.

Changes since v3:
- Introduce xfs_bmap_alloc_or_convert_range() in xfs_iomap.c for an
  easier review experience (Christoph)
- Add extsz hint and rt support in xfs_bmap_alloc_or_convert_range()

Changes since v2:
- Add allow_write_zeroes to xfs_global so that we can enable this
  feature independent of the HW underneath.

Changes since v1 [5.1 5.2]:
- Added a new function xfs_bmap_alloc_or_convert_range() based on Dave's
  feedback.
- Changed xfs_falloc_write_zeroes() to use
  xfs_bmap_alloc_or_convert_range() instead of the prealloc-and-convert
  approach.

[1] https://lore.kernel.org/linux-fsdevel/20250619111806.3546162-1-yi.zhang@huaweicloud.com/
[2] https://lore.kernel.org/linux-fsdevel/20260217055103.GA6174@lst.de/T/#m7935b9bab32bb5ff372507f84803b8753ad1c814
[3] https://lore.kernel.org/linux-xfs/6i2jvzn3lyugjlbgmjzpped3gogzyqv5mpe2uqaifz4vjpaega@pomzoq7ley77/
[4] https://lore.kernel.org/linux-xfs/20260312195308.738189-1-p.raghav@samsung.com/
[5.1] https://lore.kernel.org/linux-xfs/20260309180708.427553-2-lukas@herbolt.com/
[5.2] https://lore.kernel.org/linux-xfs/abC1LvRElctaHPe5@dread/

Pankaj Raghav (3):
  xfs: widen xfs_setfilesize() size argument to xfs_off_t
  xfs: add an allocation mode to xfs_alloc_file_space()
  xfs: add support for FALLOC_FL_WRITE_ZEROES

 fs/xfs/xfs_aops.c      |  2 +-
 fs/xfs/xfs_aops.h      |  2 +-
 fs/xfs/xfs_bmap_util.c | 42 +++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  7 +++-
 fs/xfs/xfs_file.c      | 75 +++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.h     |  8 ++---
 6 files changed, 119 insertions(+), 17 deletions(-)


base-commit: 46d91a29e0885a3867f49a7da09f0babef2d867f
-- 
2.51.2



* [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:15   ` Christoph Hellwig
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  2 siblings, 1 reply; 7+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

xfs_setfilesize() takes the written length as a size_t, which is only 32
bits wide on 32-bit architectures. The upcoming FALLOC_FL_WRITE_ZEROES
support calls it with a 64-bit fallocate length that can exceed 4GB.

Sashiko reported this[1].

Widen the size argument to xfs_off_t so large lengths are handled
correctly. The existing writeback caller passes a size_t and is unaffected
by the widening.

[1] https://sashiko.dev/#/patchset/20260604101442.2613872-1-p.raghav%40samsung.com

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
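The truncation described in the commit message can be sketched in
userspace as follows. This is only an illustration under stated
assumptions: size32_t is a hypothetical stand-in for size_t on a 32-bit
architecture, and end_narrow()/end_wide() are made-up helpers that model
"offset + size" with the old and the widened argument type. None of
these names exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for size_t on a 32-bit architecture. */
typedef uint32_t size32_t;

/* Models the old prototype: the length only has 32 bits of range. */
static int64_t end_narrow(int64_t offset, size32_t size)
{
	return offset + (int64_t)size;
}

/* Models the widened prototype: the length keeps all 64 bits. */
static int64_t end_wide(int64_t offset, int64_t size)
{
	return offset + size;
}

/* Self-check: a 5 GiB length loses its upper bits when narrowed. */
static void check_truncation(void)
{
	const int64_t len = 5LL << 30;

	/* Wide type: the end of the range is offset + 5 GiB as expected. */
	assert(end_wide(0, len) == len);
	/* Narrow type: 5 GiB modulo 4 GiB leaves only 1 GiB. */
	assert(end_narrow(0, (size32_t)len) == (1LL << 30));
}
```

Calling check_truncation() from a test program passes silently; the
narrow variant ends up 4 GiB short of the requested end of range.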
 fs/xfs/xfs_aops.c  | 2 +-
 fs/xfs/xfs_aops.h  | 2 +-
 fs/xfs/xfs_trace.h | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1a82cf625a08..0766c5667b95 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -52,7 +52,7 @@ int
 xfs_setfilesize(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	size_t			size)
+	xfs_off_t		size)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 5a7a0f1a0b49..d8c4051f2a85 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -9,7 +9,7 @@
 extern const struct address_space_operations xfs_address_space_operations;
 extern const struct address_space_operations xfs_dax_aops;
 
-int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
+int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t size);
 void xfs_end_bio(struct bio *bio);
 
 #endif /* __XFS_AOPS_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d478693674f9..d5b50c033873 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1857,7 +1857,7 @@ DEFINE_IMAP_EVENT(xfs_iomap_alloc);
 DEFINE_IMAP_EVENT(xfs_iomap_found);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count),
 	TP_ARGS(ip, offset, count),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
@@ -1865,7 +1865,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__field(loff_t, isize)
 		__field(loff_t, disize)
 		__field(loff_t, offset)
-		__field(size_t, count)
+		__field(xfs_off_t, count)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
@@ -1876,7 +1876,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__entry->count = count;
 	),
 	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
-		  "pos 0x%llx bytecount 0x%zx",
+		  "pos 0x%llx bytecount 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->isize,
@@ -1887,7 +1887,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 
 #define DEFINE_SIMPLE_IO_EVENT(name)	\
 DEFINE_EVENT(xfs_simple_io_class, name,	\
-	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),	\
+	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count),	\
 	TP_ARGS(ip, offset, count))
 DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
-- 
2.51.2



* [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space()
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:15   ` Christoph Hellwig
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
  2 siblings, 1 reply; 7+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

xfs_alloc_file_space() hardcodes XFS_BMAPI_PREALLOC to preallocate
unwritten extents across a range.

In preparation for FALLOC_FL_WRITE_ZEROES, add an explicit allocation
mode argument, enum xfs_alloc_file_space_mode, and derive the xfs_bmapi
flags from it. The only mode for now is XFS_ALLOC_FILE_SPACE_PREALLOC,
which preallocates unwritten extents and marks the inode as preallocated
exactly as before, so there is no functional change.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_bmap_util.c | 25 +++++++++++++++++++++----
 fs/xfs/xfs_bmap_util.h |  6 +++++-
 fs/xfs/xfs_file.c      |  9 ++++++---
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b9f262f8e91..8dfb3c1e3759 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -642,11 +642,19 @@ xfs_free_eofblocks(
 	return error;
 }
 
+/*
+ * Allocate space for a file according to @mode:
+ *
+ * XFS_ALLOC_FILE_SPACE_PREALLOC:
+ * Preallocate unwritten extents across the range and mark the inode as
+ * preallocated.
+ */
 int
 xfs_alloc_file_space(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	xfs_off_t		len)
+	xfs_off_t		len,
+	enum xfs_alloc_file_space_mode mode)
 {
 	xfs_mount_t		*mp = ip->i_mount;
 	xfs_off_t		count;
@@ -657,6 +665,7 @@ xfs_alloc_file_space(
 	int			rt;
 	xfs_trans_t		*tp;
 	xfs_bmbt_irec_t		imaps[1], *imapp;
+	uint32_t		bmapi_flags, nr_exts;
 	int			error;
 
 	if (xfs_is_always_cow_inode(ip))
@@ -674,6 +683,15 @@ xfs_alloc_file_space(
 	if (len <= 0)
 		return -EINVAL;
 
+	switch (mode) {
+	case XFS_ALLOC_FILE_SPACE_PREALLOC:
+		bmapi_flags = XFS_BMAPI_PREALLOC;
+		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
 
@@ -733,8 +751,7 @@ xfs_alloc_file_space(
 		if (error)
 			break;
 
-		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
+		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK, nr_exts);
 		if (error)
 			goto error;
 
@@ -748,7 +765,7 @@ xfs_alloc_file_space(
 		 * will eventually reach the requested range.
 		 */
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+				allocatesize_fsb, bmapi_flags, 0, imapp,
 				&nimaps);
 		if (error) {
 			if (error != -ENOSR)
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index c477b3361630..232b4c48247e 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -55,8 +55,12 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 			     int *is_empty);
 
 /* preallocation and hole punch interface */
+enum xfs_alloc_file_space_mode {
+	XFS_ALLOC_FILE_SPACE_PREALLOC,
+};
+
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
-		xfs_off_t len);
+		xfs_off_t len, enum xfs_alloc_file_space_mode mode);
 int	xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t len, struct xfs_zone_alloc_ctx *ac);
 int	xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 845a97c9b063..e90ea6ebdc8e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1406,7 +1406,8 @@ xfs_falloc_zero_range(
 		len = round_up(offset + len, blksize) -
 			round_down(offset, blksize);
 		offset = round_down(offset, blksize);
-		error = xfs_alloc_file_space(ip, offset, len);
+		error = xfs_alloc_file_space(ip, offset, len,
+				XFS_ALLOC_FILE_SPACE_PREALLOC);
 	}
 	if (error)
 		return error;
@@ -1432,7 +1433,8 @@ xfs_falloc_unshare_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
@@ -1460,7 +1462,8 @@ xfs_falloc_allocate_range(
 	if (error)
 		return error;
 
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+	error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+			XFS_ALLOC_FILE_SPACE_PREALLOC);
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
-- 
2.51.2



* [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-11 11:40 [PATCH v6 0/3] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
@ 2026-06-11 11:40 ` Pankaj Raghav
  2026-06-16 13:31   ` Christoph Hellwig
  2 siblings, 1 reply; 7+ messages in thread
From: Pankaj Raghav @ 2026-06-11 11:40 UTC (permalink / raw)
  To: linux-xfs
  Cc: bfoster, lukas, Darrick J . Wong, p.raghav, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

If the underlying block device supports the unmap write zeroes
operation, this flag allows users to quickly preallocate a file with
written extents that contain zeroes. This benefits subsequent
overwrites because it avoids unwritten-to-written extent conversions,
which significantly reduces metadata updates and journal I/O overhead
and improves overwrite performance.

Co-developed-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
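For reference, a userspace caller might use the flag roughly like this.
It is a minimal sketch, not taken from this series: prealloc_zeroed() is
a hypothetical helper, the fallback value 0x80 for
FALLOC_FL_WRITE_ZEROES is an assumption for build environments whose
headers predate the flag, and falling back to plain preallocation on
EOPNOTSUPP is just one possible policy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: value used by recent <linux/falloc.h>; older headers lack it. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Try to preallocate [offset, offset + len) as written, zeroed extents.
 * If the kernel, filesystem or device rejects the flag with EOPNOTSUPP,
 * fall back to ordinary preallocation (mode 0).
 * Returns 0 on success, or -1 with errno set.
 */
static int prealloc_zeroed(int fd, off_t offset, off_t len)
{
	assert(offset >= 0 && len > 0);

	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, offset, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP)
		return -1;

	/* Fallback: extents may be unwritten, but they still read as zeroes. */
	return fallocate(fd, 0, offset, len);
}
```

A database could call prealloc_zeroed(fd, 0, segment_size) when creating
a new data file, so that later overwrites need no unwritten extent
conversion on setups where the flag is supported.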
 fs/xfs/xfs_bmap_util.c | 19 ++++++++++--
 fs/xfs/xfs_bmap_util.h |  1 +
 fs/xfs/xfs_file.c      | 66 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8dfb3c1e3759..55722b815117 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -643,11 +643,18 @@ xfs_free_eofblocks(
 }
 
 /*
- * Allocate space for a file according to @mode:
+ * Allocate space or convert extents for a file according to @mode:
  *
  * XFS_ALLOC_FILE_SPACE_PREALLOC:
  * Preallocate unwritten extents across the range and mark the inode as
  * preallocated.
+ *
+ * XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+ * Allocate written extents over holes and convert unwritten extents in the
+ * range to written extents, initialising both to contain zeroes.
+ *
+ * This function does not update the file size; callers that extend the file
+ * are responsible for updating it once the extents are allocated.
  */
 int
 xfs_alloc_file_space(
@@ -688,6 +695,10 @@ xfs_alloc_file_space(
 		bmapi_flags = XFS_BMAPI_PREALLOC;
 		nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;
 		break;
+	case XFS_ALLOC_FILE_SPACE_WRITE_ZEROES:
+		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
+		nr_exts = XFS_IEXT_WRITE_UNWRITTEN_CNT;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -776,8 +787,10 @@ xfs_alloc_file_space(
 			allocatesize_fsb -= imapp->br_blockcount;
 		}
 
-		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		if (mode == XFS_ALLOC_FILE_SPACE_PREALLOC) {
+			ip->i_diflags |= XFS_DIFLAG_PREALLOC;
+			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		}
 
 		error = xfs_trans_commit(tp);
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 232b4c48247e..e3d506ca9610 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -57,6 +57,7 @@ int	xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 /* preallocation and hole punch interface */
 enum xfs_alloc_file_space_mode {
 	XFS_ALLOC_FILE_SPACE_PREALLOC,
+	XFS_ALLOC_FILE_SPACE_WRITE_ZEROES,
 };
 
 int	xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e90ea6ebdc8e..37623baaaed6 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1368,6 +1368,67 @@ xfs_falloc_force_zero(
 	return XFS_TEST_ERROR(ip->i_mount, XFS_ERRTAG_FORCE_ZERO_RANGE);
 }
 
+static int
+xfs_falloc_write_zeroes(
+	struct file		*file,
+	int			mode,
+	loff_t			offset,
+	loff_t			len,
+	struct xfs_zone_alloc_ctx *ac)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	loff_t			new_size = 0;
+	loff_t			old_size = XFS_ISIZE(ip);
+	int			error;
+	unsigned int		blksize = i_blocksize(inode);
+	loff_t			offset_aligned = round_down(offset, blksize);
+	bool			did_zero;
+
+	if (xfs_is_always_cow_inode(ip) ||
+	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
+		return -EOPNOTSUPP;
+
+	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
+	if (error)
+		return error;
+
+	error = xfs_free_file_space(ip, offset, len, ac);
+	if (error)
+		return error;
+
+	/*
+	 * Zero the tail of the old EOF block and any space up to the new
+	 * offset.
+	 * In the usual truncate path, xfs_falloc_setsize takes care of
+	 * zeroing those blocks.
+	 */
+	if (offset_aligned > old_size) {
+		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
+		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
+				NULL, &did_zero);
+		if (error)
+			return error;
+
+	}
+
+	error = xfs_alloc_file_space(ip, offset, len,
+			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
+	if (error)
+		return error;
+
+	/*
+	 * xfs_falloc_setsize() would re-zero the written extents via
+	 * iomap_zero_range(). Use xfs_setfilesize() instead.
+	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
+	 * size to it.
+	 */
+	if (new_size > i_size_read(inode))
+		i_size_write(inode, new_size);
+
+	return xfs_setfilesize(ip, offset, len);
+}
+
 /*
  * Punch a hole and prealloc the range.  We use a hole punch rather than
  * unwritten extent conversion for two reasons:
@@ -1473,7 +1534,7 @@ xfs_falloc_allocate_range(
 		(FALLOC_FL_ALLOCATE_RANGE | FALLOC_FL_KEEP_SIZE |	\
 		 FALLOC_FL_PUNCH_HOLE |	FALLOC_FL_COLLAPSE_RANGE |	\
 		 FALLOC_FL_ZERO_RANGE |	FALLOC_FL_INSERT_RANGE |	\
-		 FALLOC_FL_UNSHARE_RANGE)
+		 FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_WRITE_ZEROES)
 
 STATIC long
 __xfs_file_fallocate(
@@ -1525,6 +1586,9 @@ __xfs_file_fallocate(
 	case FALLOC_FL_ALLOCATE_RANGE:
 		error = xfs_falloc_allocate_range(file, mode, offset, len);
 		break;
+	case FALLOC_FL_WRITE_ZEROES:
+		error = xfs_falloc_write_zeroes(file, mode, offset, len, ac);
+		break;
 	default:
 		error = -EOPNOTSUPP;
 		break;
-- 
2.51.2



* Re: [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t
  2026-06-11 11:40 ` [PATCH v6 1/3] xfs: widen xfs_setfilesize() size argument to xfs_off_t Pankaj Raghav
@ 2026-06-16 13:15   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:15 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

On Thu, Jun 11, 2026 at 01:40:27PM +0200, Pankaj Raghav wrote:
> xfs_setfilesize() takes the written length as a size_t, which is only 32
> bits wide on 32-bit architectures. The upcoming FALLOC_FL_WRITE_ZEROES
> support calls it with a 64-bit fallocate length that can exceed 4GB.

Others might be better than me in arguing what type the new size
parameter should be - xfs_off_t looks a bit odd, but the VFS uses loff_t
for the length which translates to xfs_off_t, so at least it is
consistent.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space()
  2026-06-11 11:40 ` [PATCH v6 2/3] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
@ 2026-06-16 13:15   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:15 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, hch, cem, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
  2026-06-11 11:40 ` [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
@ 2026-06-16 13:31   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-16 13:31 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	pankaj.raghav, andres, kundan.kumar, cem, Zhang Yi, linux-fsdevel,
	linux-api

[API questions for Zhang and -fsdevel / -api below]

> +	unsigned int		blksize = i_blocksize(inode);
> +	loff_t			offset_aligned = round_down(offset, blksize);

I think this actually needs to round up instead of rounding down.

> +	/*
> +	 * Zero the tail of the old EOF block and any space up to the new
> +	 * offset.
> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
> +	 * zeroing those blocks.
> +	 */
> +	if (offset_aligned > old_size) {
> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
> +				NULL, &did_zero);
> +		if (error)
> +			return error;
> +	}

... then this will properly zero from the old i_size to the first block
boundary after the old size.

> +	error = xfs_alloc_file_space(ip, offset, len,
> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);

... and here we need to pass offset_aligned instead of offset and
a newly calculated len based on the last block boundary, and then
zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
allows unaligned ranges for file systems.  The block code doesn't,
but I can't quite tell from the ext4 code whether it does or not, and
there is no mention of FALLOC_FL_WRITE_ZEROES even in the latest
man-pages tree.
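
One way to read the range arithmetic described above is as a three-way
split of the request: an unaligned head up to the first block boundary,
a block-aligned middle that can go to extent allocation, and an
unaligned tail after the last block boundary. The sketch below is a
plain userspace model of that split; struct wz_split, wz_split_range()
and the rounding helpers are hypothetical names, not kernel code, and it
assumes a power-of-two block size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how an unaligned request is divided. */
struct wz_split {
	int64_t head_start, head_len;	/* bytes before the first block boundary */
	int64_t mid_start, mid_len;	/* whole blocks in between */
	int64_t tail_start, tail_len;	/* bytes after the last block boundary */
};

static int64_t round_down_blk(int64_t x, int64_t blksize)
{
	return x & ~(blksize - 1);
}

static int64_t round_up_blk(int64_t x, int64_t blksize)
{
	return (x + blksize - 1) & ~(blksize - 1);
}

static struct wz_split wz_split_range(int64_t offset, int64_t len,
				      int64_t blksize)
{
	struct wz_split s = {0};
	int64_t end = offset + len;
	int64_t first = round_up_blk(offset, blksize);	/* first boundary >= offset */
	int64_t last = round_down_blk(end, blksize);	/* last boundary <= end */

	assert(offset >= 0 && len > 0);
	assert(blksize > 0 && (blksize & (blksize - 1)) == 0);

	s.head_start = offset;
	if (first > last) {
		/* The range sits inside one block and touches no boundary. */
		s.head_len = len;
		return s;
	}
	s.head_len = first - offset;
	s.mid_start = first;
	s.mid_len = last - first;
	s.tail_start = last;
	s.tail_len = end - last;
	return s;
}
```

In this model the head and tail would be zeroed through the normal
zeroing path, while only the middle part would be handed to the written
extent allocation. Whether the interface should accept unaligned ranges
at all is exactly the open API question.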

Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
and make sure no existing data before the range is lost and the
entire range is zeroed?


> +	if (error)
> +		return error;
> +
> +	/*
> +	 * xfs_falloc_setsize() would re-zero the written extents via
> +	 * iomap_zero_range(). Use xfs_setfilesize() instead.
> +	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
> +	 * size to it.
> +	 */
> +	if (new_size > i_size_read(inode))
> +		i_size_write(inode, new_size);

I think Sashiko is right that we need pagecache_isize_extended() and
filemap_write_and_wait_range() calls here.


