linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/9] btrfs: initial bs > ps support
@ 2025-09-21 22:40 Qu Wenruo
From: Qu Wenruo @ 2025-09-21 22:40 UTC
  To: linux-btrfs

[CHANGELOG]
v2:
- Add a new patch to fix the incorrect @max_bytes of
  find_lock_delalloc_range()
  This in fact also fixes a very rare corner case that affects bs < ps
  support as well.

  This allows us to re-enable extra large folios (folio > bs) for
  bs > ps cases.

RFC->v1:
- Disable extra large folios for bs > ps mounts
  Such extra large folios are larger than a block.

  Still debugging, but disabling it makes 8K block size runs survive the
  full fs tests, with some minor failures due to the limitations.

  This may be something affecting regular large folios (folio > bs,
  but bs <= ps).

This series enables the initial bs > ps support, with several
limitations:

- No direct IO support
  All direct IOs fall back to buffered ones.

- No RAID56 support
  Any fs with RAID56 feature will be rejected at mount time.

- No encoded read/write/send
  Encoded send will fall back to the regular send (reading from page
  cache).
  Encoded read/write utilized by send/receive will fall back to regular
  ones.

The above limitations come from the fact that we require large folios to
cover at least one fs block, so that no block can cross large folio
boundaries.

This simplifies our checksum and RAID56 handling.
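
As a rough illustration of the rule above, the sketch below computes the
smallest folio order that covers one block and checks whether a block
would cross a folio boundary. It is plain arithmetic, not kernel code:
the function names are hypothetical, and it assumes power-of-two sizes
and folios that are naturally aligned to their own size.

```python
def min_folio_order(block_size: int, page_size: int) -> int:
    """Smallest order such that (page_size << order) >= block_size.

    Hypothetical helper; assumes both sizes are positive powers of two.
    """
    for value in (block_size, page_size):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes must be positive powers of two")
    if block_size <= page_size:
        return 0  # bs <= ps: a single page already covers a block
    return (block_size // page_size).bit_length() - 1


def block_crosses_folio(file_offset: int, block_size: int, folio_size: int) -> bool:
    """True if the block at file_offset spans two folios.

    Assumes folios are naturally aligned to folio_size in the file.
    """
    first = file_offset // folio_size
    last = (file_offset + block_size - 1) // folio_size
    return first != last


if __name__ == "__main__":
    # 64K blocks on a 4K page system (the ppc64 -> x86_64 case).
    print(min_folio_order(64 * 1024, 4 * 1024))
    # An 8K block backed by 4K folios would straddle two of them.
    print(block_crosses_folio(8192, 8192, 4096))
```

With folios of at least the minimum order, an aligned block always lands
inside a single folio, which is the property the checksum and RAID56
code relies on.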

The problem is, user space programs can only ensure their memory is
properly aligned in virtual addresses, but have no control over the
backing folios. Thus they can get a virtually contiguous buffer that is
backed by non-contiguous pages.

That breaks the "no block can cross large folio boundaries" assumption,
and handling it would need a very complex mechanism for checksums,
especially for RAID56.
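
To make the direct IO problem concrete, here is a simplified model, not
anything taken from the patches. It takes the page frame numbers backing
a virtually contiguous user buffer and reports, per block, whether the
backing pages are physically consecutive. The function name and the
frame numbers are made up for illustration, and physical contiguity is
only a necessary condition for a block to sit inside one folio.

```python
def blocks_physically_contiguous(pfns: list[int], pages_per_block: int) -> list[bool]:
    """For each block of the buffer, report whether its pages are consecutive.

    pfns: page frame numbers backing the buffer, in virtual address order
          (hypothetical input for this model).
    pages_per_block: block size divided by page size, e.g. 2 for 8K on 4K.
    """
    if pages_per_block < 1 or len(pfns) % pages_per_block:
        raise ValueError("buffer must be a whole number of blocks")
    result = []
    for start in range(0, len(pfns), pages_per_block):
        chunk = pfns[start:start + pages_per_block]
        # Consecutive frames differ by exactly one.
        result.append(all(b == a + 1 for a, b in zip(chunk, chunk[1:])))
    return result


if __name__ == "__main__":
    # A 16K buffer, 8K block size, 4K pages: the second block is backed
    # by two unrelated pages, so it cannot be covered by one folio.
    print(blocks_physically_contiguous([100, 101, 200, 57], 2))
```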

The same applies to encoded send, which uses vmallocated memory.

In the long run, we will need to support all those complex mechanisms.

[FUTURE ROADMAP]
Currently bs > ps support only exists to allow extra compatibility, e.g.
allowing x86_64 to mount a btrfs that was originally created on ppc64
(64K page size, 64K block size).

But this should also open a new door for solving the btrfs RAID56 write
hole problem in the future, by enforcing a larger block size and a fixed
power-of-2 number of data stripes, so that every write can fill a full
stripe, just like RAIDZ.

E.g. with 8K block size, all data writes are now in multiples of 8K, and
will always be full stripe writes for a 3-disk RAID5 with a stripe
length of 4K.
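
The arithmetic behind that example can be sketched as follows. This is
only a model of the idea, with hypothetical function names; it assumes a
full stripe is (devices - parity devices) * stripe length bytes of data
and that a write must start and end on full stripe boundaries.

```python
def full_stripe_bytes(num_devices: int, stripe_len: int, parity_devices: int = 1) -> int:
    """Data bytes in one full stripe (parity excluded)."""
    data_stripes = num_devices - parity_devices
    if data_stripes < 1 or stripe_len <= 0:
        raise ValueError("need at least one data stripe and a positive stripe length")
    return data_stripes * stripe_len


def is_full_stripe_write(offset: int, length: int, num_devices: int,
                         stripe_len: int, parity_devices: int = 1) -> bool:
    """True if the write covers only whole full stripes, so no RMW is needed."""
    full = full_stripe_bytes(num_devices, stripe_len, parity_devices)
    return length > 0 and offset % full == 0 and length % full == 0


if __name__ == "__main__":
    # 3-disk RAID5 with 4K stripe length: 2 data stripes, 8K full stripe.
    print(full_stripe_bytes(3, 4096))
    # Any block-aligned 8K write is a full stripe write.
    print(is_full_stripe_write(16384, 8192, 3, 4096))
    # 4-disk RAID5 has 3 data stripes (12K), which an 8K block cannot fill.
    print(is_full_stripe_write(0, 8192, 4, 4096))
```

The last case shows why the number of data stripes has to be a power of
2: otherwise a power-of-2 block size can never be a multiple of the full
stripe size.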

This RAIDZ-like solution will allow a much simpler RAID56 (no more RMW),
at the cost of a larger block size (more write amplification, higher
memory usage, etc.).

Qu Wenruo (9):
  btrfs: fix the incorrect @max_bytes value for
    find_lock_delalloc_range()
  btrfs: prepare compression folio alloc/free for bs > ps cases
  btrfs: prepare zstd to support bs > ps cases
  btrfs: prepare lzo to support bs > ps cases
  btrfs: prepare zlib to support bs > ps cases
  btrfs: prepare scrub to support bs > ps cases
  btrfs: fix symbolic link reading when bs > ps
  btrfs: add extra ASSERT()s to catch unaligned bios
  btrfs: enable experimental bs > ps support

 fs/btrfs/bio.c         | 27 +++++++++++++++++++
 fs/btrfs/compression.c | 42 ++++++++++++++++++++---------
 fs/btrfs/compression.h |  2 +-
 fs/btrfs/direct-io.c   | 12 +++++++++
 fs/btrfs/disk-io.c     | 14 ++++++++--
 fs/btrfs/extent_io.c   | 21 +++++++++++----
 fs/btrfs/extent_io.h   |  3 ++-
 fs/btrfs/fs.c          | 20 ++++++++++++--
 fs/btrfs/fs.h          |  6 +++++
 fs/btrfs/inode.c       | 18 +++++++------
 fs/btrfs/ioctl.c       | 35 +++++++++++++++++-------
 fs/btrfs/lzo.c         | 59 ++++++++++++++++++++++-------------------
 fs/btrfs/raid56.c      | 42 +++++++++++++++++++----------
 fs/btrfs/raid56.h      |  4 +--
 fs/btrfs/scrub.c       | 51 +++++++++++++++++++----------------
 fs/btrfs/send.c        |  9 ++++++-
 fs/btrfs/zlib.c        | 60 +++++++++++++++++++++++++++---------------
 fs/btrfs/zstd.c        | 44 +++++++++++++++++--------------
 18 files changed, 321 insertions(+), 148 deletions(-)

-- 
2.50.1



Thread overview: 16+ messages
2025-09-21 22:40 [PATCH v2 0/9] btrfs: initial bs > ps support Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 1/9] btrfs: fix the incorrect @max_bytes value for find_lock_delalloc_range() Qu Wenruo
2025-09-21 22:46   ` Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 2/9] btrfs: prepare compression folio alloc/free for bs > ps cases Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 3/9] btrfs: prepare zstd to support " Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 4/9] btrfs: prepare lzo " Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 5/9] btrfs: prepare zlib " Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 6/9] btrfs: prepare scrub " Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 7/9] btrfs: fix symbolic link reading when bs > ps Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 8/9] btrfs: add extra ASSERT()s to catch unaligned bios Qu Wenruo
2025-09-21 22:40 ` [PATCH v2 9/9] btrfs: enable experimental bs > ps support Qu Wenruo
2025-09-22 10:21   ` David Sterba
2025-09-22 10:27     ` Qu Wenruo
2025-09-22 10:42       ` David Sterba
2025-09-22 10:51         ` Qu Wenruo
2025-09-22  9:12 ` [PATCH v2 0/9] btrfs: initial " David Sterba
