All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: Pankaj Raghav <p.raghav@samsung.com>,
	linux-xfs@vger.kernel.org, hch@infradead.org
Cc: bfoster@redhat.com, lukas@herbolt.com,
	"Darrick J . Wong" <djwong@kernel.org>,
	dgc@kernel.org, gost.dev@samsung.com, andres@anarazel.de,
	kundan.kumar@samsung.com, hch@lst.de, cem@kernel.org
Subject: Re: [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Tue, 23 Jun 2026 22:21:20 +0200	[thread overview]
Message-ID: <d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev> (raw)
In-Reply-To: <20260622083106.2914092-3-p.raghav@samsung.com>

> +static int
> +xfs_falloc_write_zeroes(
> +	struct file		*file,
> +	int			mode,
> +	loff_t			offset,
> +	loff_t			len,
> +	struct xfs_zone_alloc_ctx *ac)
> +{
> +	struct inode		*inode = file_inode(file);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	loff_t			new_size = 0;
> +	unsigned int		blksize = i_blocksize(inode);
> +	xfs_off_t		offset_aligned = round_up(offset, blksize);
> +	xfs_off_t		end_aligned = round_down(offset + len, blksize);
> +	xfs_off_t		len_aligned = end_aligned - offset_aligned;
> +	int			error;
> +
> +	if (xfs_is_always_cow_inode(ip) ||
> +	    !bdev_write_zeroes_unmap_sectors(xfs_inode_buftarg(ip)->bt_bdev))
> +		return -EOPNOTSUPP;
> +
> +	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 *
> +	 *    |----------|----------|----------|----------|----------|
> +	 *    ^     ^    ^                     ^     ^    ^
> +	 *    |     |    |                     |     |    |
> +	 *    |   offset |                     |    end   |
> +	 *    |          |                     |          |
> +	 * offset_rd   offset_ru              end_rd    end_ru
> +	 *
> +	 * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
> +	 * zeroes offset -> offset_ru and end_rd -> end.
> +	 * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
> +	 */
> +	error = xfs_free_file_space(ip, offset, len, ac);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Publish the new size while the punched range is still a hole, then
> +	 * fill it with written zeroes.  Like the other fallocate modes we use
> +	 * xfs_falloc_setsize(), but it must run *before* we convert the range
> +	 * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
> +	 * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
> +	 * It will also writeback partial EOF block before the on-disk size is
> +	 * logged.
> +	 */
> +	error = xfs_falloc_setsize(file, new_size);
> +	if (error)
> +		return error;
> +
> +	if (len_aligned > 0)
> +		error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
> +				XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> +
> +	return error;
> +}
> +

Sashiko was not happy with this approach as there are cases where there will not be a data
corruption but we might end up not allocating an extent, therefore, getting an -ENOSPC at a later point.

I went back what Zhang yi pointed out in the previous version wrt semantics[1]. I think the correct
idea should be the following:

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0e1332ccdf79..a27862037d22 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1379,10 +1379,6 @@ xfs_falloc_write_zeroes(
        struct inode            *inode = file_inode(file);
        struct xfs_inode        *ip = XFS_I(inode);
        loff_t                  new_size = 0;
-       unsigned int            blksize = i_blocksize(inode);
-       xfs_off_t               offset_aligned = round_up(offset, blksize);
-       xfs_off_t               end_aligned = round_down(offset + len, blksize);
-       xfs_off_t               len_aligned = end_aligned - offset_aligned;
        int                     error;

        if (xfs_is_always_cow_inode(ip) ||
@@ -1402,9 +1398,11 @@ xfs_falloc_write_zeroes(
         *    |          |                     |          |
         * offset_rd   offset_ru              end_rd    end_ru
         *
-        * xfs_free_file_space() punches inside from offset_ru -> end_rd. It also
-        * zeroes offset -> offset_ru and end_rd -> end.
-        * Only pass offset_ru -> end_rd to be zeroed via xfs_alloc_file_space().
+        * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
+        * to holes and byte-zeroes the in-range parts of the partial edge blocks,
+        * offset -> offset_ru and end_rd -> end.  xfs_zero_range() only touches
+        * already-written blocks here; it skips holes and unwritten extents, so
+        * unallocated/unwritten edge blocks are left for the allocation below.
         */
        error = xfs_free_file_space(ip, offset, len, ac);
        if (error)
@@ -1423,11 +1421,19 @@ xfs_falloc_write_zeroes(
        if (error)
                return error;

-       if (len_aligned > 0)
-               error = xfs_alloc_file_space(ip, offset_aligned, len_aligned,
-                               XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
-
-       return error;
+       /*
+        * Allocate written, zeroed extents across the range.  xfs_alloc_file_space()
+        * rounds outward to block granularity:
+        *  - holes (the punched interior and any unallocated edge block) are
+        *    allocated and zeroed;
+        *  - unwritten extents (including unwritten edge blocks) are converted to
+        *    written and zeroed;
+        *  - already-written blocks are skipped, so the out-of-range bytes of a
+        *    written edge block keep their data; their in-range bytes were already
+        *    zeroed by xfs_free_file_space() above.
+        */
+       return xfs_alloc_file_space(ip, offset, len,
+                       XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
 }

 /*

We pass offset and len without rounding to xfs_alloc_file_space, and the existing behaviour
correctly handles them. I could add test cases in xfstests to test out all these edge cases so that
we don't regress.

If I don't have anymore comments, I will send a v8 with this approach.

--
Pankaj

[1] https://lore.kernel.org/linux-xfs/557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com/

      reply	other threads:[~2026-06-23 20:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-22  8:31 [PATCH v7 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-22  8:31 ` [PATCH v7 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-22  8:31 ` [PATCH v7 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-23 20:21   ` Pankaj Raghav [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d69477d2-f520-4b5e-bc29-95a74726e504@linux.dev \
    --to=pankaj.raghav@linux.dev \
    --cc=andres@anarazel.de \
    --cc=bfoster@redhat.com \
    --cc=cem@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=kundan.kumar@samsung.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lukas@herbolt.com \
    --cc=p.raghav@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.