From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: "Darrick J. Wong" <djwong@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>
Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, lukas@herbolt.com,
dgc@kernel.org, gost.dev@samsung.com,
Zhang Yi <yi.zhang@huaweicloud.com>,
andres@anarazel.de, kundan.kumar@samsung.com, hch@lst.de,
cem@kernel.org, hch@infradead.org
Subject: Re: [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Fri, 26 Jun 2026 18:04:35 +0200 [thread overview]
Message-ID: <f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev> (raw)
In-Reply-To: <20260625172006.GC6078@frogsfrogsfrogs>
>> + error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
>> + if (error)
>> + return error;
>> +
>> + /*
>> + *
>> + * |----------|----------|----------|----------|----------|
>> + * ^ ^ ^ ^ ^ ^
>> + * | | | | | |
>> + * | offset | | end |
>> + * | | | |
>> + * offset_rd offset_ru end_rd end_ru
>
> Do "_rd" and "_ru" mean "round down" and "round up"? And is that to the
> fsblock size, or the allocation unit size?
>
Hmm, until now, I was thinking of fs block size, but looking at the function again,
we change it if it is a realtime file.
startoffset_fsb = XFS_B_TO_FSB(mp, offset);
endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
/* We can only free complete realtime extents. */
if (xfs_inode_has_bigrtalloc(ip)) {
startoffset_fsb = xfs_fileoff_roundup_rtx(mp, startoffset_fsb);
endoffset_fsb = xfs_fileoff_rounddown_rtx(mp, endoffset_fsb);
}
>> + *
>> + * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
>> + * to holes and byte-zeroes the in-range parts of the partial edge blocks,
>
> xfs_free_file_space rounds inward to allocation unit granularity and
> punches out that range; and then it writes zeroes to non-hole space
> that doesn't get unmapped.
>
>> + * offset -> offset_ru and end_rd -> end. xfs_zero_range() only touches
>> + * already-written blocks here; it skips holes and unwritten extents, so
>> + * unallocated/unwritten edge blocks are left for the allocation below.
>> + */
>> + error = xfs_free_file_space(ip, offset, len, ac);
>> + if (error)
>> + return error;
>> +
>> + /*
>> + * Publish the new size while the punched range is still a hole, then
>> + * fill it with written zeroes. Like the other fallocate modes we use
>> + * xfs_falloc_setsize(), but it must run *before* we convert the range
>> + * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
>> + * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
>> + * It will also writeback partial EOF block before the on-disk size is
>> + * logged.
>> + * Note: extending the size before allocating means a failure below
>> + * leaves the file larger with unallocated holes in the new range.
>> + * That is safe as holes within i_size read back as zeroes and expose
>> + * no stale data while the error is propagated to the caller.
>> + */
>> + error = xfs_falloc_setsize(file, new_size);
>> + if (error)
>> + return error;
>
> Hrm ok so now that we've punched out some blocks and zeroed the rest,
> now we adjust the file size, which should only entail committing the new
> file size to disk...
>
>> +
>> + /*
>> + * Allocate written, zeroed extents across the range. xfs_alloc_file_space()
>> + * rounds outward to block granularity:
>> + * - holes (the punched interior and any unallocated edge block) are
>> + * allocated and zeroed;
>> + * - unwritten extents (including unwritten edge blocks) are converted to
>> + * written and zeroed;
>> + * - Already written edge blocks are skipped. The out-of-range bytes of
>> + * a written edge block keep their data (offset_rd -> offset and
>> + * end -> end_rd); their in-range bytes (offset -> offset_ru and
>> + * end_ru -> end were already zeroed by xfs_free_file_space().
>> + */
>> + return xfs_alloc_file_space(ip, offset, len,
>> + XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
>
> ...and now we can just do an accelerated "write zeroes to disk" which is
> conveniently always within EOF now. I /think/ this looks ok to me now,
> though I'm curious how extensively the new fallocate mode has been
> tested with fsx and unaligned file ranges? And rt volumes with rt
> extent size > 1 fsblock.
>
I tested it extensively with fsblock with fsx and I added some tests locally (which I will
send it upstream soon) for unaligned edges. Some of the corner cases I figured because of some
fsx test (generic/363). But I didn't do it for all the profiles. I will also test it for `-r
extsize=8k -b size=4k`.
Thanks for the review Darrick.
--
Pankaj
prev parent reply other threads:[~2026-06-26 16:04 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:45 [PATCH v8 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-25 11:45 ` [PATCH v8 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-25 17:01 ` Darrick J. Wong
2026-06-25 11:45 ` [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-25 17:20 ` Darrick J. Wong
2026-06-26 16:04 ` Pankaj Raghav [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev \
--to=pankaj.raghav@linux.dev \
--cc=andres@anarazel.de \
--cc=bfoster@redhat.com \
--cc=cem@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=kundan.kumar@samsung.com \
--cc=linux-xfs@vger.kernel.org \
--cc=lukas@herbolt.com \
--cc=p.raghav@samsung.com \
--cc=yi.zhang@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.