From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: Pankaj Raghav <p.raghav@samsung.com>,
linux-xfs@vger.kernel.org, hch@infradead.org
Cc: bfoster@redhat.com, lukas@herbolt.com,
"Darrick J . Wong" <djwong@kernel.org>,
dgc@kernel.org, gost.dev@samsung.com, andres@anarazel.de,
kundan.kumar@samsung.com, hch@lst.de, cem@kernel.org
Subject: Re: [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Tue, 23 Jun 2026 22:21:20 +0200 [thread overview]
Message-ID: <d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev> (raw)
In-Reply-To: <20260622083106.2914092-3-p.raghav@samsung.com>
> +static int
> +xfs_falloc_write_zeroes(
> + struct file *file,
> + int mode,
> + loff_t offset,
> + loff_t len,
> + struct xfs_zone_alloc_ctx *ac)
> +{
> + struct inode *inode = file_inode(file);
> + struct xfs_inode *ip = XFS_I(inode);
> + loff_t new_size = 0;
> + unsigned int blksize = i_blocksize(inode);
> + xfs_off_t offset_aligned = round_up(offset, blksize);
> + xfs_off_t end_aligned = round_down(offset + len, blksize);
> + xfs_off_t len_aligned = end_aligned - offset_aligned;
> + int error;
> +
> + if (xfs_is_always_cow_inode(ip) ||
> + !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
> + return -EOPNOTSUPP;
> +
> + error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
> + if (error)
> + return error;
> +
> + /*
> + *
> + * |----------|----------|----------|----------|----------|
> + * ^ ^ ^ ^ ^ ^
> + * | | | | | |
> + * | offset | | end |
> + * | | | |
> + * offset_rd offset_ru end_rd end_ru
> + *
> + * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
> + * zeroes offset -> offset_ru and end_rd -> end.
> + * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
> + */
> + error = xfs_free_file_space(ip, offset, len, ac);
> + if (error)
> + return error;
> +
> + /*
> + * Publish the new size while the punched range is still a hole, then
> + * fill it with written zeroes. Like the other fallocate modes we use
> + * xfs_falloc_setsize(), but it must run *before* we convert the range
> + * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
> + * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
> + * It will also writeback partial EOF block before the on-disk size is
> + * logged.
> + */
> + error = xfs_falloc_setsize(file, new_size);
> + if (error)
> + return error;
> +
> + if (len_aligned > 0)
> + error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
> + XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> +
> + return error;
> +}
> +
Sashiko was not happy with this approach as there are cases where there will not be a data
corruption but we might end up not allocating an extent, therefore, getting an -ENOSPC at a later point.
I went back what Zhang yi pointed out in the previous version wrt semantics[1]. I think the correct
idea should be the following:
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0e1332ccdf79..a27862037d22 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1379,10 +1379,6 @@ xfs_falloc_write_zeroes(
struct inode *inode = file_inode(file);
struct xfs_inode *ip = XFS_I(inode);
loff_t new_size = 0;
- unsigned int blksize = i_blocksize(inode);
- xfs_off_t offset_aligned = round_up(offset, blksize);
- xfs_off_t end_aligned = round_down(offset + len, blksize);
- xfs_off_t len_aligned = end_aligned - offset_aligned;
int error;
if (xfs_is_always_cow_inode(ip) ||
@@ -1402,9 +1398,11 @@ xfs_falloc_write_zeroes(
* | | | |
* offset_rd offset_ru end_rd end_ru
*
- * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
- * zeroes offset -> offset_ru and end_rd -> end.
- * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
+ * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
+ * to holes and byte-zeroes the in-range parts of the partial edge blocks,
+ * offset -> offset_ru and end_rd -> end. xfs_zero_range() only touches
+ * already-written blocks here; it skips holes and unwritten extents, so
+ * unallocated/unwritten edge blocks are left for the allocation below.
*/
error = xfs_free_file_space(ip, offset, len, ac);
if (error)
@@ -1423,11 +1421,19 @@ xfs_falloc_write_zeroes(
if (error)
return error;
- if (len_aligned > 0)
- error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
- XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
-
- return error;
+ /*
+ * Allocate written, zeroed extents across the range. xfs_alloc_file_space()
+ * rounds outward to block granularity:
+ * - holes (the punched interior and any unallocated edge block) are
+ * allocated and zeroed;
+ * - unwritten extents (including unwritten edge blocks) are converted to
+ * written and zeroed;
+ * - already-written blocks are skipped, so the out-of-range bytes of a
+ * written edge block keep their data; their in-range bytes were already
+ * zeroed by xfs_free_file_space() above.
+ */
+ return xfs_alloc_file_space(ip, offset, len,
+ XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
}
/*
We pass offset and len without rounding to xfs_alloc_file_space, and the existing behaviour
correctly handles them. I could add test cases in xfstests to test out all these edge cases so that
we don't regress.
If I don't have anymore comments, I will send a v8 with this approach.
--
Pankaj
[1] https://lore.kernel.org/linux-xfs/557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com/
prev parent reply other threads:[~2026-06-23 20:21 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-22 8:31 [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-22 8:31 ` [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-22 8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-23 20:21 ` Pankaj Raghav [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev \
--to=pankaj.raghav@linux.dev \
--cc=andres@anarazel.de \
--cc=bfoster@redhat.com \
--cc=cem@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=kundan.kumar@samsung.com \
--cc=linux-xfs@vger.kernel.org \
--cc=lukas@herbolt.com \
--cc=p.raghav@samsung.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox