Linux XFS filesystem development
 help / color / mirror / Atom feed
From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: Pankaj Raghav <p.raghav@samsung.com>,
	linux-xfs@vger.kernel.org, hch@infradead.org
Cc: bfoster@redhat.com, lukas@herbolt.com,
	"Darrick J . Wong" <djwong@kernel.org>,
	dgc@kernel.org, gost.dev@samsung.com, andres@anarazel.de,
	kundan.kumar@samsung.com, hch@lst.de, cem@kernel.org
Subject: Re: [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Tue, 23 Jun 2026 22:21:20 +0200	[thread overview]
Message-ID: <d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev> (raw)
In-Reply-To: <20260622083106.2914092-3-p.raghav@samsung.com>

> +static int
> +xfs_falloc_write_zeroes(
> +	struct file		*file,
> +	int			mode,
> +	loff_t			offset,
> +	loff_t			len,
> +	struct xfs_zone_alloc_ctx *ac)
> +{
> +	struct inode		*inode = file_inode(file);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	loff_t			new_size = 0;
> +	unsigned int		blksize = i_blocksize(inode);
> +	xfs_off_t		offset_aligned = round_up(offset, blksize);
> +	xfs_off_t		end_aligned = round_down(offset + len, blksize);
> +	xfs_off_t		len_aligned = end_aligned - offset_aligned;
> +	int			error;
> +
> +	if (xfs_is_always_cow_inode(ip) ||
> +	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
> +		return -EOPNOTSUPP;
> +
> +	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 *
> +	 *    |----------|----------|----------|----------|----------|
> +	 *    ^     ^    ^                     ^     ^    ^
> +	 *    |     |    |                     |     |    |
> +	 *    |   offset |                     |    end   |
> +	 *    |          |                     |          |
> +	 * offset_rd   offset_ru              end_rd    end_ru
> +	 *
> +	 * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
> +	 * zeroes offset -> offset_ru and end_rd -> end.
> +	 * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
> +	 */
> +	error = xfs_free_file_space(ip, offset, len, ac);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Publish the new size while the punched range is still a hole, then
> +	 * fill it with written zeroes.  Like the other fallocate modes we use
> +	 * xfs_falloc_setsize(), but it must run *before* we convert the range
> +	 * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
> +	 * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
> +	 * It will also writeback partial EOF block before the on-disk size is
> +	 * logged.
> +	 */
> +	error = xfs_falloc_setsize(file, new_size);
> +	if (error)
> +		return error;
> +
> +	if (len_aligned > 0)
> +		error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
> +				XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> +
> +	return error;
> +}
> +

Sashiko was not happy with this approach as there are cases where there will not be a data
corruption but we might end up not allocating an extent, therefore, getting an -ENOSPC at a later point.

I went back what Zhang yi pointed out in the previous version wrt semantics[1]. I think the correct
idea should be the following:

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0e1332ccdf79..a27862037d22 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1379,10 +1379,6 @@ xfs_falloc_write_zeroes(
        struct inode            *inode = file_inode(file);
        struct xfs_inode        *ip = XFS_I(inode);
        loff_t                  new_size = 0;
-       unsigned int            blksize = i_blocksize(inode);
-       xfs_off_t               offset_aligned = round_up(offset, blksize);
-       xfs_off_t               end_aligned = round_down(offset + len, blksize);
-       xfs_off_t               len_aligned = end_aligned - offset_aligned;
        int                     error;

        if (xfs_is_always_cow_inode(ip) ||
@@ -1402,9 +1398,11 @@ xfs_falloc_write_zeroes(
         *    |          |                     |          |
         * offset_rd   offset_ru              end_rd    end_ru
         *
-        * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
-        * zeroes offset -> offset_ru and end_rd -> end.
-        * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
+        * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
+        * to holes and byte-zeroes the in-range parts of the partial edge blocks,
+        * offset -> offset_ru and end_rd -> end.  xfs_zero_range() only touches
+        * already-written blocks here; it skips holes and unwritten extents, so
+        * unallocated/unwritten edge blocks are left for the allocation below.
         */
        error = xfs_free_file_space(ip, offset, len, ac);
        if (error)
@@ -1423,11 +1421,19 @@ xfs_falloc_write_zeroes(
        if (error)
                return error;

-       if (len_aligned > 0)
-               error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
-                               XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
-
-       return error;
+       /*
+        * Allocate written, zeroed extents across the range.  xfs_alloc_file_space()
+        * rounds outward to block granularity:
+        *  - holes (the punched interior and any unallocated edge block) are
+        *    allocated and zeroed;
+        *  - unwritten extents (including unwritten edge blocks) are converted to
+        *    written and zeroed;
+        *  - already-written blocks are skipped, so the out-of-range bytes of a
+        *    written edge block keep their data; their in-range bytes were already
+        *    zeroed by xfs_free_file_space() above.
+        */
+       return xfs_alloc_file_space(ip, offset, len,
+                       XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
 }

 /*

We pass offset and len without rounding to xfs_alloc_file_space, and the existing behaviour
correctly handles them. I could add test cases in xfstests to test out all these edge cases so that
we don't regress.

If I don't have anymore comments, I will send a v8 with this approach.

--
Pankaj

[1] https://lore.kernel.org/linux-xfs/557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com/

      reply	other threads:[~2026-06-23 20:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-22  8:31 [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-22  8:31 ` [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-22  8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-23 20:21   ` Pankaj Raghav [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev \
    --to=pankaj.raghav@linux.dev \
    --cc=andres@anarazel.de \
    --cc=bfoster@redhat.com \
    --cc=cem@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=kundan.kumar@samsung.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lukas@herbolt.com \
    --cc=p.raghav@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox