From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: "Darrick J. Wong" <djwong@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>
Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, lukas@herbolt.com,
dgc@kernel.org, gost.dev@samsung.com,
Zhang Yi <yi.zhang@huaweicloud.com>,
andres@anarazel.de, kundan.kumar@samsung.com, hch@lst.de,
cem@kernel.org, hch@infradead.org
Subject: Re: [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Fri, 26 Jun 2026 18:04:35 +0200 [thread overview]
Message-ID: <f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev> (raw)
In-Reply-To: <20260625172006.GC6078@frogsfrogsfrogs>
>> + error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
>> + if (error)
>> + return error;
>> +
>> + /*
>> + *
>> + * |----------|----------|----------|----------|----------|
>> + * ^ ^ ^ ^ ^ ^
>> + * | | | | | |
>> + * | offset | | end |
>> + * | | | |
>> + * offset_rd offset_ru end_rd end_ru
>
> Do "_rd" and "_ru" mean "round down" and "round up"? And is that to the
> fsblock size, or the allocation unit size?
>
Hmm, until now, I was thinking of fs block size, but looking at the function again,
we change it if it is a realtime file.
startoffset_fsb = XFS_B_TO_FSB(mp, offset);
endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
/* We can only free complete realtime extents. */
if (xfs_inode_has_bigrtalloc(ip)) {
startoffset_fsb = xfs_fileoff_roundup_rtx(mp, startoffset_fsb);
endoffset_fsb = xfs_fileoff_rounddown_rtx(mp, endoffset_fsb);
}
>> + *
>> + * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
>> + * to holes and byte-zeroes the in-range parts of the partial edge blocks,
>
> xfs_free_file_space rounds inward to allocation unit granularity and
> punches out that range; and then it writes zeroes to non-hole space
> that doesn't get unmapped.
>
>> + * offset -> offset_ru and end_rd -> end. xfs_zero_range() only touches
>> + * already-written blocks here; it skips holes and unwritten extents, so
>> + * unallocated/unwritten edge blocks are left for the allocation below.
>> + */
>> + error = xfs_free_file_space(ip, offset, len, ac);
>> + if (error)
>> + return error;
>> +
>> + /*
>> + * Publish the new size while the punched range is still a hole, then
>> + * fill it with written zeroes. Like the other fallocate modes we use
>> + * xfs_falloc_setsize(), but it must run *before* we convert the range
>> + * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
>> + * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
>> + * It will also writeback partial EOF block before the on-disk size is
>> + * logged.
>> + * Note: extending the size before allocating means a failure below
>> + * leaves the file larger with unallocated holes in the new range.
>> + * That is safe as holes within i_size read back as zeroes and expose
>> + * no stale data while the error is propagated to the caller.
>> + */
>> + error = xfs_falloc_setsize(file, new_size);
>> + if (error)
>> + return error;
>
> Hrm ok so now that we've punched out some blocks and zeroed the rest,
> now we adjust the file size, which should only entail committing the new
> file size to disk...
>
>> +
>> + /*
>> + * Allocate written, zeroed extents across the range. xfs_alloc_file_space()
>> + * rounds outward to block granularity:
>> + * - holes (the punched interior and any unallocated edge block) are
>> + * allocated and zeroed;
>> + * - unwritten extents (including unwritten edge blocks) are converted to
>> + * written and zeroed;
>> + * - Already written edge blocks are skipped. The out-of-range bytes of
>> + * a written edge block keep their data (offset_rd -> offset and
>> + * end -> end_rd); their in-range bytes (offset -> offset_ru and
>> + * end_ru -> end were already zeroed by xfs_free_file_space().
>> + */
>> + return xfs_alloc_file_space(ip, offset, len,
>> + XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
>
> ...and now we can just do an accelerated "write zeroes to disk" which is
> conveniently always within EOF now. I /think/ this looks ok to me now,
> though I'm curious how extensively the new fallocate mode has been
> tested with fsx and unaligned file ranges? And rt volumes with rt
> extent size > 1 fsblock.
>
I tested it extensively with fsblock with fsx and I added some tests locally (which I will
send it upstream soon) for unaligned edges. Some of the corner cases I figured because of some
fsx test (generic/363). But I didn't do it for all the profiles. I will also test it for `-r
extsize=8k -b size=4k`.
Thanks for the review Darrick.
--
Pankaj
prev parent reply other threads:[~2026-06-26 16:04 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:45 [PATCH v8 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-25 11:45 ` [PATCH v8 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-25 17:01 ` Darrick J. Wong
2026-06-25 11:45 ` [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-25 17:20 ` Darrick J. Wong
2026-06-26 16:04 ` Pankaj Raghav [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev \
--to=pankaj.raghav@linux.dev \
--cc=andres@anarazel.de \
--cc=bfoster@redhat.com \
--cc=cem@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=kundan.kumar@samsung.com \
--cc=linux-xfs@vger.kernel.org \
--cc=lukas@herbolt.com \
--cc=p.raghav@samsung.com \
--cc=yi.zhang@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox