From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
Date: Wed, 21 Oct 2020 14:24:46 +0800
Message-ID: <20201021062554.68132-1-wqu@suse.com>
Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage_data_fullpage_write
=== Overview ===
To allow 64K page size systems to mount btrfs with 4K sector size and
do regular read-write.
=== What works ===
- Subpage data read
Both uncompressed and compressed data
- Subpage metadata read/write
So far, single-threaded "fsstress" loops haven't crashed the system with
all debug options enabled.
(Currently running with "-n 2048" in a 1024 run loop).
This also means, we can do subpage sized pure metadata operations like
reflink. (e.g. we can produce a 4K sector extent using reflink without
problems)
- Full page data write
Only uncompressed data has been tested so far.
This means all data writes happen in page-sized units, including:
* buffered write
* dio write
* hole punch for unaligned range
This means even if just one 4K sector is dirtied, we will write back the
full 64K page as a data extent.
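As a rough illustration of the full page write behavior described above, the sketch below expands a dirty byte range to whole-page boundaries. It is not code from the patchset: EX_PAGE_SIZE, struct byte_range and full_page_range() are hypothetical names chosen for this example, and the 64K page size is simply the configuration discussed in this letter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: the 64K page size discussed in this letter. */
#define EX_PAGE_SIZE (64u * 1024u)

/* Hypothetical helper type: a byte range inside a file. */
struct byte_range {
	uint64_t start; /* first byte, inclusive */
	uint64_t len;   /* length in bytes */
};

/* Round v down to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t v, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return v & ~(align - 1);
}

/* Round v up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t v, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Expand a dirty byte range so that it covers only whole pages. */
static struct byte_range full_page_range(uint64_t start, uint64_t len)
{
	struct byte_range r;
	uint64_t end = round_up_pow2(start + len, EX_PAGE_SIZE);

	r.start = round_down_pow2(start, EX_PAGE_SIZE);
	r.len = end - r.start;
	return r;
}
```

With these assumptions, dirtying the single 4K sector at offset 4K yields the range [0, 64K), i.e. the whole page is written back.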
=== What doesn't work ===
- Balance
- Scrub
Both fail with csum check failures. This may be quick to solve, but the
current development status and patchset size are enough for a milestone.
- Dev replace
Unable to submit subpage data writes.
=== Challenges and solutions ===
- Metadata
* One 64K page can contain several tree blocks
Instead of full page read/write/lock, we use extent io tree to do
sector aligned read/write/lock, and avoid full page lock if
possible.
* Metadata can cross 64K page boundary
This only happens for certain converted fs. Considering how rarely
this occurs, just reject them for now and fix convert.
Overall, metadata is not that complex as metadata has very limited
interfaces.
- Data
* Data has more page status and uses ordered extents
* Data subpage write can be handled by iomap
Instead of using extent io tree for each page status, go with full
page writeback.
That way I won't waste time implementing something which is designed
to be replaced.
- Testing
* No way to test under x86_64
Currently I'm using an RK3399 board with an NVMe drive, planning to
move to a Xavier AGX board.
But we plan to add 2K sector size support as a pure testing sector
size for x86_64 (but still 4K as minimal node size) to test subpage
routines and make my life a little easier.
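The metadata constraints listed above (several tree blocks per 64K page, and rejecting tree blocks that cross a page boundary) can be sketched as follows. This is only an illustration under stated assumptions: EX_PAGE_SIZE, tree_blocks_per_page() and tree_block_crosses_page() are hypothetical names for this example, not functions from the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the 64K page size discussed in this letter. */
#define EX_PAGE_SIZE (64u * 1024u)

/* Number of tree blocks of the given nodesize that fit in one page. */
static unsigned int tree_blocks_per_page(uint32_t nodesize)
{
	return nodesize >= EX_PAGE_SIZE ? 1 : EX_PAGE_SIZE / nodesize;
}

/*
 * Return true if the tree block at bytenr spans more than one page.
 * Only meaningful for nodesize < EX_PAGE_SIZE, the subpage case.
 */
static bool tree_block_crosses_page(uint64_t bytenr, uint32_t nodesize)
{
	uint64_t first_page = bytenr / EX_PAGE_SIZE;
	uint64_t last_page = (bytenr + nodesize - 1) / EX_PAGE_SIZE;

	return first_page != last_page;
}
```

For a 16K nodesize, four tree blocks share one page. A block starting at 48K stays inside the first page, while one starting at 56K spills into the next page and would be rejected under the rule above.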
=== TODO ===
- More testing
Obviously
- Balance and scrub support
- Limited data subpage write
Mostly for balance and replace, as a workaround.
- Iomap support for true subpage data writeback
=== Patchset structure ===
Patch 01~03: Small bug fixes
Patch 04~22: Generic cleanup and refactors, which make sense without
subpage support
Patch 23~27: Subpage specific cleanup and refactors.
Patch 28~42: Enablement for subpage RO mount
Patch 43~52: Enablement for subpage metadata write
Patch 53~68: Enablement for subpage data write (although still in
page size)
=== Changelog ===
v2:
- Migrating to extent_io_tree based status/locking mechanism
This gets rid of the ad-hoc subpage_eb_mapping structure and extra
timing to verify the extent buffers.
This also brings some extra cleanups for btree inode extent io tree
hooks which make no sense for both subpage and regular sector size.
This also completely removes the requirement for page status like
Locked/Uptodate/Dirty. Now metadata pages only utilize Private status,
while private pointer is always NULL.
- Submit proper subpage sized read for metadata
With the help of extent io tree, we no longer need to bother with full
page reads. Now submit subpage sized metadata read and do subpage locking.
- Remove some unnecessary refactors
Some refactors like extracting detach_extent_buffer_pages() don't
really make the code cleaner. We can easily add a subpage specific
branch.
- Address the comments from v1
v3:
- Add compressed data read fix
- Also update page status according to extent status for btree inode
This lets us reuse more code from the existing code base.
- Add metadata write support
Only manually tested (with a fs created under x86_64, and script to do
metadata only operations under aarch64 with 64K page size).
- More cleanup/refactors during metadata write support development.
v4:
- Add more refactors
The most obvious one is the refactor of __set/__clear_extent_bit()
to make the less common options less visible, and allow me to add more
options more easily.
- Add full data page write support
- More bug fixes for existing patches
Mostly bugs found during fsstress tests.
- Reduce page locking to minimal for metadata
I hit a possible ABBA lock, where extent io tree locking and page
locking lead to a deadlock.
To resolve it without adding more requirements for page locking
sequence, subpage metadata only relies on extent io tree locking.
Page locking is only reserved for unavoidable cases, like calling
clear_page_dirty_for_io().
Goldwyn Rodrigues (1):
btrfs: use iosize while reading compressed pages
Qu Wenruo (67):
btrfs: extent-io-tests: remove invalid tests
btrfs: extent_io: fix the comment on lock_extent_buffer_for_io().
btrfs: extent_io: update the comment for find_first_extent_bit()
btrfs: extent_io: sink the @failed_start parameter for
set_extent_bit()
btrfs: make btree inode io_tree has its special owner
btrfs: disk-io: replace @fs_info and @private_data with @inode for
btrfs_wq_submit_bio()
btrfs: inode: sink parameter @start and @len for check_data_csum()
btrfs: extent_io: unexport extent_invalidatepage()
btrfs: extent_io: remove the forward declaration and rename
__process_pages_contig
btrfs: extent_io: rename pages_locked in process_pages_contig()
btrfs: extent_io: only require sector size alignment for page read
btrfs: extent_io: remove the extent_start/extent_len for
end_bio_extent_readpage()
btrfs: extent_io: integrate page status update into
endio_readpage_release_extent()
btrfs: extent_io: rename page_size to io_size in submit_extent_page()
btrfs: extent_io: add assert_spin_locked() for
attach_extent_buffer_page()
btrfs: extent_io: extract the btree page submission code into its own
helper function
btrfs: extent_io: calculate inline extent buffer page size based on
page size
btrfs: extent_io: make btrfs_fs_info::buffer_radix to take sector size
devided values
btrfs: extent_io: sink less common parameters for __set_extent_bit()
btrfs: extent_io: sink less common parameters for __clear_extent_bit()
btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for
btrfs_mark_buffer_dirty()
btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than
page size
btrfs: disk-io: extract the extent buffer verification from
btree_readpage_end_io_hook()
btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
btrfs: introduce a helper to determine if the sectorsize is smaller
than PAGE_SIZE
btrfs: extent_io: allow find_first_extent_bit() to find a range with
exact bits match
btrfs: extent_io: don't allow tree block to cross page boundary for
subpage support
btrfs: extent_io: update num_extent_pages() to support subpage sized
extent buffer
btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
btrfs: extent-io: make type of extent_state::state to be at least 32
bits
btrfs: extent_io: use extent_io_tree to handle subpage extent buffer
allocation
btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
subpage size
btrfs: extent_io: make the assert test on page uptodate able to handle
subpage
btrfs: extent_io: implement subpage metadata read and its endio
function
btrfs: extent_io: implement try_release_extent_buffer() for subpage
metadata support
btrfs: extent_io: extra the core of test_range_bit() into
test_range_bit_nolock()
btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage
data read
btrfs: set btree inode track_uptodate for subpage support
btrfs: allow RO mount of 4K sector size fs on 64K page system
btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
on subpage metadata
btrfs: disk-io: support subpage metadata csum calculation at write
time
btrfs: extent_io: prevent extent_state from being merged for btree io
tree
btrfs: extent_io: make set_extent_buffer_dirty() to support subpage
sized metadata
btrfs: extent_io: add subpage support for clear_extent_buffer_dirty()
btrfs: extent_io: make set_btree_ioerr() accept extent buffer
btrfs: extent_io: introduce write_one_subpage_eb() function
btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible
btrfs: extent_io: introduce submit_btree_subpage() to submit a page
for subpage metadata write
btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
btrfs: inode: make can_nocow_extent() check only return 1 if the range
is no smaller than PAGE_SIZE
btrfs: file: calculate reserve space based on PAGE_SIZE for buffered
write
btrfs: file: make hole punching page aligned for subpage
btrfs: file: make btrfs_dirty_pages() follow page size to mark extent
io tree
btrfs: file: make btrfs_file_write_iter() to be page aligned
btrfs: output extra info for space info update underflow
btrfs: delalloc-space: make data space reservation to be page aligned
btrfs: scrub: allow scrub to work with subpage sectorsize
btrfs: inode: make btrfs_truncate_block() to do page alignment
btrfs: file: make hole punch and zero range to be page aligned
btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize
btrfs: inode: always mark the full page range delalloc for
btrfs_page_mkwrite()
btrfs: inode: require page alignement for direct io
btrfs: inode: only do NOCOW write for page aligned extent
btrfs: reflink: do full page writeback for reflink prepare
btrfs: support subpage read write for test
 fs/btrfs/block-group.c           |    2 +-
 fs/btrfs/btrfs_inode.h           |   12 +
 fs/btrfs/ctree.c                 |    5 +-
 fs/btrfs/ctree.h                 |   43 +-
 fs/btrfs/delalloc-space.c        |   19 +-
 fs/btrfs/disk-io.c               |  425 ++++++--
 fs/btrfs/disk-io.h               |    8 +-
 fs/btrfs/extent-io-tree.h        |  145 ++-
 fs/btrfs/extent-tree.c           |    2 +-
 fs/btrfs/extent_io.c             | 1576 ++++++++++++++++++++++--------
 fs/btrfs/extent_io.h             |   27 +-
 fs/btrfs/extent_map.c            |    2 +-
 fs/btrfs/file.c                  |  140 ++-
 fs/btrfs/free-space-cache.c      |    2 +-
 fs/btrfs/inode.c                 |  117 ++-
 fs/btrfs/reflink.c               |   36 +-
 fs/btrfs/relocation.c            |    2 +-
 fs/btrfs/scrub.c                 |    8 -
 fs/btrfs/space-info.h            |    4 +-
 fs/btrfs/struct-funcs.c          |   18 +-
 fs/btrfs/tests/extent-io-tests.c |   26 +-
 fs/btrfs/transaction.c           |    4 +-
 fs/btrfs/volumes.c               |    2 +-
 include/trace/events/btrfs.h     |    1 +
 24 files changed, 1927 insertions(+), 699 deletions(-)
--
2.28.0