Linux Btrfs filesystem development
* [RFC PATCH 0/3] btrfs: implement FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE
@ 2026-04-18 14:38 Paul Richards
  2026-04-18 14:38 ` [PATCH 1/3] btrfs: refactor btrfs_fallocate() ahead of supporting more modes Paul Richards
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Paul Richards @ 2026-04-18 14:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, Paul Richards

This series adds support for FALLOC_FL_COLLAPSE_RANGE and
FALLOC_FL_INSERT_RANGE to btrfs_fallocate(). Both operations are
already supported by ext4 and xfs. The userspace contract is
documented in fallocate(2).
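
For reference, a minimal userspace sketch of how the two modes are
invoked. The helper names (range_args_ok, collapse_range, insert_range)
and the blksize parameter are illustrative assumptions and are not part
of this series; the argument checks only restate the constraints listed
in fallocate(2), and a filesystem may enforce others.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_COLLAPSE_RANGE, FALLOC_FL_INSERT_RANGE */
#include <sys/types.h>

/*
 * Restate the fallocate(2) preconditions for both modes.
 * blksize is the filesystem block size, supplied by the caller
 * (assumption: the caller knows it, e.g. from mkfs options).
 * Returns 1 if the arguments look valid, 0 otherwise.
 */
int range_args_ok(off_t file_size, off_t offset, off_t len,
		  off_t blksize, int mode)
{
	if (blksize <= 0 || len <= 0 || offset < 0)
		return 0;
	/* Both modes typically require block-aligned offset and len. */
	if (offset % blksize || len % blksize)
		return 0;
	/* Collapse must end strictly before EOF; use ftruncate(2) otherwise. */
	if (mode == FALLOC_FL_COLLAPSE_RANGE)
		return offset < file_size && len < file_size - offset;
	/* Insert must start strictly before EOF. */
	if (mode == FALLOC_FL_INSERT_RANGE)
		return offset < file_size;
	return 0;
}

/* Remove [offset, offset + len) from the file. Returns 0 or -errno. */
int collapse_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len) ? -errno : 0;
}

/* Open a hole of len bytes at offset, shifting later data up. Returns 0 or -errno. */
int insert_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) ? -errno : 0;
}
```

On a filesystem without support for a mode, both wrappers would be
expected to return -EOPNOTSUPP.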

Patch 1 refactors btrfs_fallocate() to dispatch via a switch statement,
moving punch_hole into its own function and decoupling locking from the
per-operation helpers. This is similar to the implementations for ext4
and xfs. The allocate-range and zero-range paths remain coupled since
they share some setup logic.

Patches 2 and 3 add COLLAPSE_RANGE and INSERT_RANGE respectively.

== Implementation approach ==

For COLLAPSE_RANGE:
 - The removed region [offset, offset+len) is punched out via
   btrfs_replace_file_extents(), which handles boundary splitting.
 - All EXTENT_DATA keys with key.offset >= offset+len are shifted
   leftward by len in forward order.
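
The ordering argument for the leftward shift can be checked with a
small userspace model. This is only a sketch under simplifying
assumptions: keys are plain integers in a sorted array, the hole punch
has already removed every key inside [offset, offset+len), and the
function names are made up for illustration. It is not the code from
patch 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if keys are strictly increasing, i.e. no two items share a key. */
bool keys_strictly_increasing(const uint64_t *keys, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (keys[i - 1] >= keys[i])
			return false;
	return true;
}

/*
 * Model of the collapse key shift. keys[] holds the sorted key.offset
 * values left after [offset, offset + len) was punched out. Every key
 * >= offset + len is moved left by len, lowest key first.
 *
 * Returns false if any intermediate state has colliding or unordered
 * keys, which is what a wrong iteration order would produce.
 */
bool collapse_shift_keys(uint64_t *keys, size_t n, uint64_t offset,
			 uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (keys[i] < offset + len)
			continue;
		keys[i] -= len;
		/* Check the invariant after every single move. */
		if (!keys_strictly_increasing(keys, n))
			return false;
	}
	return true;
}
```

In this model, moving the lowest affected key first means each key
lands in the gap already vacated by the punched range or by its
predecessor, so the invariant holds after every step.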

For INSERT_RANGE:
 - All EXTENT_DATA keys with key.offset >= offset are shifted rightward
   by len in reverse order (required to avoid key collisions).
 - No pre-splitting of straddling extents is needed: the left portion
   of a straddling extent stays in place, the right portion is shifted;
   both reference the same physical extent via their existing
   extent_offset fields.
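
A userspace model of the rightward shift, including one reading of how
a straddling extent ends up as two items. The struct and function names
are illustrative assumptions rather than the kernel's types, and
advancing extent_offset for the right-hand part is my interpretation of
the description above, not a quote from patch 3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a regular EXTENT_DATA item. */
struct file_extent {
	uint64_t file_offset;   /* models key.offset */
	uint64_t num_bytes;     /* length of this file range */
	uint64_t disk_bytenr;   /* start of the physical extent */
	uint64_t extent_offset; /* offset into the physical extent */
};

/*
 * Open a hole of len bytes at offset. ext[] is sorted by file_offset,
 * holds *n items and must have room for *n + 1. Items are visited from
 * last to first so a shifted item never lands on the key of an item
 * that has not been moved yet.
 */
void insert_range_model(struct file_extent *ext, size_t *n,
			uint64_t offset, uint64_t len)
{
	for (size_t i = *n; i-- > 0; ) {
		struct file_extent *e = &ext[i];

		if (e->file_offset >= offset) {
			/* Entirely at or after the insertion point: shift. */
			e->file_offset += len;
			continue;
		}
		if (e->file_offset + e->num_bytes > offset) {
			/* Straddling extent: left part stays, right part moves. */
			uint64_t left = offset - e->file_offset;
			struct file_extent right = {
				.file_offset = offset + len,
				.num_bytes = e->num_bytes - left,
				.disk_bytenr = e->disk_bytenr,
				.extent_offset = e->extent_offset + left,
			};

			/* Make room at index i + 1 for the right part. */
			memmove(&ext[i + 2], &ext[i + 1],
				(*n - i - 1) * sizeof(*ext));
			ext[i + 1] = right;
			e->num_bytes = left;
			(*n)++;
		}
		/* Everything earlier lies fully before offset. */
		break;
	}
}
```

Both halves of the straddling extent keep the same disk_bytenr, so no
data is copied in this model; only the file offsets and the offset into
the shared physical extent change.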

For each shifted key, the corresponding back-reference in the extent
tree is updated via a shared helper btrfs_shift_extent_backref().

After the key-shift loop, btrfs_drop_extent_map_range() is called to
invalidate the in-memory extent map cache. Without this, reads after
the fallocate operation could be served from stale cached mappings
instead of the data at the new offsets.

The page cache is flushed and invalidated upfront (before any extent
manipulation) following the ext4/xfs pattern. The inode lock
(BTRFS_ILOCK_MMAP) is held throughout, preventing new dirty pages from
appearing during the operation.

Transaction cycling: the key-shift loop cycles transactions every
BTRFS_INSERT_COLLAPSE_TRANSACTION_CYCLE_INTERVAL (32) items to avoid
holding a single transaction open across a large number of extents.
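
The cycling pattern can be sketched in userspace as follows. The
counters stand in for starting and ending a transaction; they are
hypothetical placeholders, not btrfs APIs, and the loop body is a stub.
Only the interval value (32) comes from this series.

```c
#include <stddef.h>

/* Mirrors BTRFS_INSERT_COLLAPSE_TRANSACTION_CYCLE_INTERVAL. */
#define CYCLE_INTERVAL 32

/* Bookkeeping for the model; real code would hold a transaction handle. */
struct txn_stats {
	size_t started;          /* transactions opened */
	size_t ended;            /* transactions closed */
	size_t max_items_in_txn; /* largest batch handled in one transaction */
};

/*
 * Process nr_items key shifts, closing the current transaction and
 * opening a new one after every CYCLE_INTERVAL items.
 * Returns the number of items processed.
 */
size_t shift_items_with_cycling(size_t nr_items, struct txn_stats *st)
{
	size_t in_txn = 0;
	size_t done = 0;

	st->started = st->ended = st->max_items_in_txn = 0;
	if (nr_items == 0)
		return 0;

	st->started++; /* open the first transaction */
	for (size_t i = 0; i < nr_items; i++) {
		/* ... shift one EXTENT_DATA key and its backref here ... */
		done++;
		in_txn++;
		if (in_txn > st->max_items_in_txn)
			st->max_items_in_txn = in_txn;

		/* Cycle only if there is more work left to do. */
		if (in_txn == CYCLE_INTERVAL && i + 1 < nr_items) {
			st->ended++;
			st->started++;
			in_txn = 0;
		}
	}
	st->ended++; /* close the last transaction */
	return done;
}
```

With this shape, no transaction in the model ever covers more than
CYCLE_INTERVAL items, whatever the total number of extents.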

Both operations are gated on CONFIG_BTRFS_EXPERIMENTAL.

== Known limitations ==

INSERT_RANGE returns -EOPNOTSUPP for files with inline extents.
Supporting them will require promoting the existing inline extent to a
regular one, since inline extents are supported only at the very start
of a file.

In the opposite direction, COLLAPSE_RANGE does not yet convert the
remaining data to an inline extent when it is small enough to qualify,
as it ideally should.

I intend to address both of these limitations.

== Testing ==

Tested with a Rust-based functional test suite covering:
 - Collapse and insert at the start and in the middle of a file
 - Multiple sequential operations on the same file
 - Files with multiple extents (fsync between writes to force separate
   extent items)
 - Files with holes (explicit punch_hole and implicit sparse writes)
 - Compressed extents (mount -o compress=zstd)
 - Transaction cycling (interval reduced to 4 during testing, verified
   in dmesg logs)
 - Inline files, verified that -EOPNOTSUPP is returned.

The same tests pass on both btrfs and xfs (modulo the inline files).

I have not yet run fstests, which I know contains tests for
INSERT_RANGE and COLLAPSE_RANGE. I will do so.

== Questions for reviewers ==

1. Transaction cycling interval: we use 32 items per cycle. Is this
   the right threshold, or is there an established convention in btrfs
   for this kind of loop?

2. Extent lock scope for collapse: we hold the extent lock only on
   [offset, offset+len) during the hole punch, not on the full
   [offset, i_size) range that the key-shift loop operates on. Is
   this safe, or should we lock the full affected range?

3. CONFIG_BTRFS_EXPERIMENTAL gate: is this the right gate for these
   operations, or should they be unconditionally available?

== Notes ==

This is my first kernel contribution. Development was significantly
assisted by an LLM (Amazon Q Developer). The implementation, testing,
and final review decisions are my own.

Various btrfs_info() print statements, assertions, and comments that
were useful during development and testing have been left in place,
but will be removed or streamlined in the next revision.

Paul Richards (3):
  btrfs: refactor btrfs_fallocate() ahead of supporting more modes
  btrfs: support for FALLOC_FL_COLLAPSE_RANGE in btrfs_fallocate()
  btrfs: support for FALLOC_FL_INSERT_RANGE in btrfs_fallocate()

 fs/btrfs/file.c | 601 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 578 insertions(+), 23 deletions(-)

-- 
2.53.0


Thread overview: 11+ messages
2026-04-18 14:38 [RFC PATCH 0/3] btrfs: implement FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE Paul Richards
2026-04-18 14:38 ` [PATCH 1/3] btrfs: refactor btrfs_fallocate() ahead of supporting more modes Paul Richards
2026-04-19  0:57   ` Qu Wenruo
2026-04-18 14:38 ` [PATCH 2/3] btrfs: support for FALLOC_FL_COLLAPSE_RANGE in btrfs_fallocate() Paul Richards
2026-04-19  1:29   ` Qu Wenruo
2026-04-18 14:38 ` [PATCH 3/3] btrfs: support for FALLOC_FL_INSERT_RANGE " Paul Richards
2026-04-19  4:44   ` Qu Wenruo
2026-04-19  0:25 ` [RFC PATCH 0/3] btrfs: implement FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE Qu Wenruo
2026-04-19  5:08   ` Qu Wenruo
2026-04-19 18:40     ` Paul Richards
2026-04-19 22:30       ` Qu Wenruo
