From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 0/9] btrfs: initial bs > ps support
Date: Mon, 22 Sep 2025 08:10:42 +0930
Message-ID: <cover.1758494326.git.wqu@suse.com>
[CHANGELOG]
v2:
- Add a new patch to fix the incorrect @max_bytes of
  find_lock_delalloc_range()
  This also fixes a very rare corner case that affects bs < ps
  support.
  This allows us to re-enable extra large folios (folio > bs) for
  bs > ps cases.
RFC->v1:
- Disable extra large folios for bs > ps mounts
  Such extra large folios are larger than a block.
  Still debugging, but disabling them allows 8K block size runs to
  survive the full fstests runs, with some minor failures due to the
  limitations.
  This may also be affecting regular large folios (folio > bs,
  but bs <= ps).
This series enables initial bs > ps support, with several
limitations:
- No direct IO support
  All direct IOs fall back to buffered ones.
- No RAID56 support
  Any fs with the RAID56 feature will be rejected at mount time.
- No encoded read/write/send
  Encoded send will fall back to the regular send (reading from the
  page cache).
  Encoded read/write utilized by send/receive will fall back to the
  regular ones.
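A minimal C sketch of the decisions these limitations imply. The struct `bs_ps_mount` and the helpers `check_mount()`, `pick_io_path()` and `can_use_encoded_io()` are hypothetical names for illustration only, not the btrfs implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical summary of a mount's geometry; these names are
 * illustrative and are not btrfs's real structures or helpers.
 */
struct bs_ps_mount {
	unsigned int block_size;	/* fs block size (bs) */
	unsigned int page_size;		/* system page size (ps) */
	bool has_raid56;		/* RAID56 feature is present */
};

enum io_path { IO_DIRECT, IO_BUFFERED };

static bool bs_gt_ps(const struct bs_ps_mount *m)
{
	assert(m->block_size && m->page_size);
	return m->block_size > m->page_size;
}

/* A RAID56 fs is refused at mount time when bs > ps. */
static int check_mount(const struct bs_ps_mount *m)
{
	if (bs_gt_ps(m) && m->has_raid56)
		return -EINVAL;
	return 0;
}

/* A direct IO request is served as buffered IO when bs > ps. */
static enum io_path pick_io_path(const struct bs_ps_mount *m, bool want_direct)
{
	if (want_direct && !bs_gt_ps(m))
		return IO_DIRECT;
	return IO_BUFFERED;
}

/* Encoded read/write/send falls back to the regular path when bs > ps. */
static bool can_use_encoded_io(const struct bs_ps_mount *m)
{
	return !bs_gt_ps(m);
}
```

Returning -EINVAL follows the usual kernel convention for rejecting a mount; the exact errno btrfs uses is an assumption here.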
The above limits exist because we require large folios to cover at
least one fs block, so that no block can cross a large folio boundary.
This simplifies our checksum and RAID56 handling.
The problem is that user space programs can only ensure their memory is
properly aligned in virtual addresses, but have no control over the
backing folios. Thus they can get virtually contiguous memory that is
backed by non-contiguous pages.
In that case, the "no block can cross large folio boundaries"
assumption breaks, and we would need a very complex mechanism to handle
checksums, especially for RAID56.
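To illustrate why user-provided buffers break the assumption, here is a hypothetical C helper (`block_phys_contiguous()` is not a kernel API) that takes the physical address of each page backing a virtually contiguous buffer and checks whether a given fs block lands on physically consecutive pages. Since a folio is physically contiguous memory, a block that fails this check cannot lie inside a single folio:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: @page_phys[i] is the physical address of the
 * i-th page backing a virtually contiguous buffer. Returns true if the
 * block [off, off + bs) within that buffer sits on physically
 * consecutive pages, i.e. it could be covered by one folio.
 */
static bool block_phys_contiguous(const uint64_t *page_phys, size_t nr_pages,
				  uint64_t page_size, uint64_t off, uint64_t bs)
{
	assert(page_size > 0 && bs > 0);

	uint64_t first = off / page_size;
	uint64_t last = (off + bs - 1) / page_size;

	assert(last < nr_pages);
	/* Every page must directly follow the previous one physically. */
	for (uint64_t i = first; i < last; i++) {
		if (page_phys[i] + page_size != page_phys[i + 1])
			return false;
	}
	return true;
}
```

For example, with 4K pages and 8K blocks, a buffer whose first two pages sit at unrelated physical addresses yields a first block that spans two separate pages, so its checksum would have to be computed over two discontiguous pieces.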
The same applies to encoded send, which uses vmalloc()ed memory.
In the long run, we will need to support all those complex mechanisms.
[FUTURE ROADMAP]
Currently bs > ps support is only meant to provide extra compatibility,
e.g. allowing x86_64 to mount a btrfs that was originally created on
ppc64 (64K page size, 64K block size).
But this should also open a new door to solving the btrfs RAID56 write
hole problem in the future, by enforcing a larger block size and a
fixed power-of-2 number of data stripes, so that every write fills a
full stripe, just like RAIDZ.
E.g. with an 8K block size, all data writes are in 8K units, and will
always be full stripe writes for a 3-disk RAID5 with a stripe length
of 4K (2 data stripes x 4K = 8K).
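The full-stripe arithmetic in the example can be sketched in C as follows. The helper names are hypothetical, and the sketch assumes writes start at block-aligned logical offsets within the chunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes in one full RAID5 stripe: (nr_devices - 1) data stripes. */
static uint64_t raid5_full_stripe_len(unsigned int nr_devices, uint64_t stripe_len)
{
	/* Need at least one data stripe plus one parity stripe. */
	assert(nr_devices >= 2 && stripe_len > 0);
	return (uint64_t)(nr_devices - 1) * stripe_len;
}

/*
 * If the block size is a multiple of the full stripe length, every
 * block-aligned write covers whole full stripes, so no read-modify-write
 * of existing data/parity is needed.
 */
static bool block_is_full_stripe_write(uint64_t block_size,
				       unsigned int nr_devices,
				       uint64_t stripe_len)
{
	uint64_t full = raid5_full_stripe_len(nr_devices, stripe_len);

	return block_size % full == 0;
}
```

With 4 disks (3 data stripes) a full stripe is 12K, and no power-of-2 block size is a multiple of it, which is why the number of data stripes needs to be a power of 2.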
This RAIDZ-like solution will allow a much simpler RAID56 (no more
RMW), at the cost of a larger block size (more write amplification,
higher memory usage, etc.).
Qu Wenruo (9):
btrfs: fix the incorrect @max_bytes value for
find_lock_delalloc_range()
btrfs: prepare compression folio alloc/free for bs > ps cases
btrfs: prepare zstd to support bs > ps cases
btrfs: prepare lzo to support bs > ps cases
btrfs: prepare zlib to support bs > ps cases
btrfs: prepare scrub to support bs > ps cases
btrfs: fix symbolic link reading when bs > ps
btrfs: add extra ASSERT()s to catch unaligned bios
btrfs: enable experimental bs > ps support
fs/btrfs/bio.c | 27 +++++++++++++++++++
fs/btrfs/compression.c | 42 ++++++++++++++++++++---------
fs/btrfs/compression.h | 2 +-
fs/btrfs/direct-io.c | 12 +++++++++
fs/btrfs/disk-io.c | 14 ++++++++--
fs/btrfs/extent_io.c | 21 +++++++++++----
fs/btrfs/extent_io.h | 3 ++-
fs/btrfs/fs.c | 20 ++++++++++++--
fs/btrfs/fs.h | 6 +++++
fs/btrfs/inode.c | 18 +++++++------
fs/btrfs/ioctl.c | 35 +++++++++++++++++-------
fs/btrfs/lzo.c | 59 ++++++++++++++++++++++-------------------
fs/btrfs/raid56.c | 42 +++++++++++++++++++----------
fs/btrfs/raid56.h | 4 +--
fs/btrfs/scrub.c | 51 +++++++++++++++++++----------------
fs/btrfs/send.c | 9 ++++++-
fs/btrfs/zlib.c | 60 +++++++++++++++++++++++++++---------------
fs/btrfs/zstd.c | 44 +++++++++++++++++--------------
18 files changed, 321 insertions(+), 148 deletions(-)
--
2.50.1