Linux XFS filesystem development
 help / color / mirror / Atom feed
From: Pankaj Raghav <pankaj.raghav@linux.dev>
To: "Darrick J. Wong" <djwong@kernel.org>,
	Pankaj Raghav <p.raghav@samsung.com>
Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com, lukas@herbolt.com,
	dgc@kernel.org, gost.dev@samsung.com,
	Zhang Yi <yi.zhang@huaweicloud.com>,
	andres@anarazel.de, kundan.kumar@samsung.com, hch@lst.de,
	cem@kernel.org, hch@infradead.org
Subject: Re: [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES
Date: Fri, 26 Jun 2026 18:04:35 +0200	[thread overview]
Message-ID: <f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev> (raw)
In-Reply-To: <20260625172006.GC6078@frogsfrogsfrogs>

>> +	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
>> +	if (error)
>> +		return error;
>> +
>> +	/*
>> +	 *
>> +	 *    |----------|----------|----------|----------|----------|
>> +	 *    ^     ^    ^                     ^     ^    ^
>> +	 *    |     |    |                     |     |    |
>> +	 *    |   offset |                     |    end   |
>> +	 *    |          |                     |          |
>> +	 * offset_rd   offset_ru              end_rd    end_ru
> 
> Do "_rd" and "_ru" mean "round down" and "round up"?  And is that to the
> fsblock size, or the allocation unit size?
> 
Hmm, until now, I was thinking of fs block size, but looking at the function again,
we change it if it is a realtime file.

	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);

	/* We can only free complete realtime extents. */
	if (xfs_inode_has_bigrtalloc(ip)) {
		startoffset_fsb = xfs_fileoff_roundup_rtx(mp, startoffset_fsb);
		endoffset_fsb = xfs_fileoff_rounddown_rtx(mp, endoffset_fsb);
	}

>> +	 *
>> +	 * xfs_free_file_space() punches the aligned interior offset_ru -> end_rd
>> +	 * to holes and byte-zeroes the in-range parts of the partial edge blocks,
> 
> xfs_free_file_space rounds inward to allocation unit granularity and
> punches out that range; and then it writes zeroes to non-hole space
> that doesn't get unmapped.
> 
>> +	 * offset -> offset_ru and end_rd -> end.  xfs_zero_range() only touches
>> +	 * already-written blocks here; it skips holes and unwritten extents, so
>> +	 * unallocated/unwritten edge blocks are left for the allocation below.
>> +	 */
>> +	error = xfs_free_file_space(ip, offset, len, ac);
>> +	if (error)
>> +		return error;
>> +
>> +	/*
>> +	 * Publish the new size while the punched range is still a hole, then
>> +	 * fill it with written zeroes.  Like the other fallocate modes we use
>> +	 * xfs_falloc_setsize(), but it must run *before* we convert the range
>> +	 * to written extents: xfs_setattr_size() zeroes [old EOF, new size) via
>> +	 * xfs_zero_range(), which skips holes, so there is nothing to re-zero.
>> +	 * It will also writeback partial EOF block before the on-disk size is
>> +	 * logged.
>> +	 * Note: extending the size before allocating means a failure below
>> +	 * leaves the file larger with unallocated holes in the new range.
>> +	 * That is safe as holes within i_size read back as zeroes and expose
>> +	 * no stale data while the error is propagated to the caller.
>> +	 */
>> +	error = xfs_falloc_setsize(file, new_size);
>> +	if (error)
>> +		return error;
> 
> Hrm ok so now that we've punched out some blocks and zeroed the rest,
> now we adjust the file size, which should only entail committing the new
> file size to disk...
> 
>> +
>> +	/*
>> +	 * Allocate written, zeroed extents across the range.  xfs_alloc_file_space()
>> +	 * rounds outward to block granularity:
>> +	 *  - holes (the punched interior and any unallocated edge block) are
>> +	 *    allocated and zeroed;
>> +	 *  - unwritten extents (including unwritten edge blocks) are converted to
>> +	 *    written and zeroed;
>> +	 *  - Already written edge blocks are skipped. The out-of-range bytes of
>> +	 *    a written edge block keep their data (offset_rd -> offset and
>> +	 *    end -> end_rd); their in-range bytes (offset -> offset_ru and
>> +	 *    end_ru -> end were already zeroed by xfs_free_file_space().
>> +	 */
>> +	return xfs_alloc_file_space(ip, offset, len,
>> +			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
> 
> ...and now we can just do an accelerated "write zeroes to disk" which is
> conveniently always within EOF now.  I /think/ this looks ok to me now,
> though I'm curious how extensively the new fallocate mode has been
> tested with fsx and unaligned file ranges?  And rt volumes with rt
> extent size > 1 fsblock.
> 

I tested it extensively with fsblock with fsx and I added some tests locally (which I will
send it upstream soon) for unaligned edges. Some of the corner cases I figured because of some
fsx test (generic/363). But I didn't do it for all the profiles. I will also test it for `-r
extsize=8k -b size=4k`.

Thanks for the review Darrick.

--
Pankaj


      reply	other threads:[~2026-06-26 16:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 11:45 [PATCH v8 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-06-25 11:45 ` [PATCH v8 1/2] xfs: add an allocation mode to xfs_alloc_file_space() Pankaj Raghav
2026-06-25 17:01   ` Darrick J. Wong
2026-06-25 11:45 ` [PATCH v8 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
2026-06-25 17:20   ` Darrick J. Wong
2026-06-26 16:04     ` Pankaj Raghav [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f7c2276c-b36c-4a69-a913-c5abfc403d01@linux.dev \
    --to=pankaj.raghav@linux.dev \
    --cc=andres@anarazel.de \
    --cc=bfoster@redhat.com \
    --cc=cem@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=kundan.kumar@samsung.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lukas@herbolt.com \
    --cc=p.raghav@samsung.com \
    --cc=yi.zhang@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox