From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [PATCH 00/17] btrfs: split bio at btrfs_map_bio() time
Date: Wed, 1 Dec 2021 16:57:35 +0800 [thread overview]
Message-ID: <bf794c47-35aa-7823-35ea-854873a20949@suse.com> (raw)
In-Reply-To: <20211201051756.53742-1-wqu@suse.com>
On 2021/12/1 13:17, Qu Wenruo wrote:
> [BACKGROUND]
>
> Currently btrfs never uses bio_split() to split its bio against RAID
> stripe boundaries.
>
> Instead inside btrfs we check our stripe boundary everytime we allocate
> a new bio, and ensure the new bio never cross stripe boundaries.
>
> [PROBLEMS]
>
> Although this behavior works fine, it's against the common practice used in
> stacked drivers, and is making the effort to convert to iomap harder.
>
> There is also an hidden burden, every time we allocate a new bio, we uses
> BIO_MAX_BVECS, but since we know the boundaries, for RAID0/RAID10 we can
> only fit at most 16 pages (fixed 64K stripe size, and 4K page size),
> wasting the 256 slots we allocated.
>
> [CHALLENGES]
>
> To change the situation, this patchset attempts to improve the situation
> by moving the bio split into btrfs_map_bio() time, so upper layer should
> no longer bother the bio split against RAID stripes or even chunk
> boundaries.
>
> But there are several challenges:
>
> - Conflicts in various endio functions
> We want the existing granularity, instead of chained endio, thus we
> must make the involved endio functions to handle split bios.
>
> Although most endio functions are already doing their works
> independent of the bio size, they are not yet fully handling split
> bio.
>
> This patch will convert them to use saved bi_iter and only iterate
> the split range instead of the whole bio.
> This change involved 3 types of IOs:
>
> * Buffered IO
> Including both data and metadata
> * Direct IO
> * Compressed IO
>
> Their endio functions needs different level of updates to handle split
> bios.
>
> Furthermore, there is another endio, end_workqueue_bio(), it can't
> handle split bios at all, thus we change the timing so that
> btrfs_bio_wq_end_io() is only called after the bio being split.
>
> - Checksum verification
> Currently we rely on btrfs_bio::csum to contain the checksum for the
> whole bio.
> If one bio get split, csum will no longer points to the correct
> location for the split bio.
>
> This can be solved by introducing btrfs_bio::offset_to_original, and
> use that new member to calculate where we should read csum from.
>
> For the parent bio, it still has btrfs_bio::csum for the whole bio,
> thus it can still free it correctly.
>
> - Independent endio for each split bio
> Unlike stack drivers, for RAID10 btrfs needs to try its best effort to
> read every sectors, to handle the following case: (X means bad, either
> unable to read or failed to pass checksum verification, V means good)
>
> Dev 1 (missing) | D1 (X) |
> Dev 2 (OK) | D1 (V) |
> Dev 3 (OK) | D2 (V) |
> Dev 4 (OK) | D2 (X) |
>
> In the above RAID10 case, dev1 is missing, and although dev4 is fine,
> its D2 sector is corrupted (by bit rot or whatever).
>
> If we use bio_chain(), read bio for both D1 and D2 will be split, and
> since D1 is missing, the whole D1 and D2 read will be marked as error,
> thus we will try to read from dev2 and dev4.
>
> But D2 in dev4 has csum mismatch, we can only rebuild D1 and D2
> correctly from dev2:D1 and dev3:D2.
>
> This patchset resolve this by saving bi_iter into btrfs_bio::iter, and
> uses that at endio to iterate only the split part of an bio.
> Other than this, existing read/write page endio functions can handle
> them properly without problem.
>
> - Bad RAID56 naming/functionality
> There are quite some RAID56 call sites relies on specific behavior on
> __btrfs_map_block(), like returning @map_length as stripe_len other
> than real mapped length.
>
> This is handled by some small cleanups specific for RAID56.
>
> [NEED FEEDBACK]
> In this refactor, btrfs is utilizing a lot of call sites like:
>
> btrfs_bio_save_iter(); // Save bi_iter into some other location
> __bio_for_each_segment(bvec, bio, iter, btrfs_bio->iter) {
> /* Doing endio for each bvec */
> }
>
> And manually implementing an endio which does some work of
> __bio_chain_endio() but with extra btrfs specific workload.
>
> I'm wondering if block layer is fine to provide some *enhanced* chain
> bio facilities?
>
> [CHANGELOG]
> RFC->v1:
> - Better patch split
> Now patch 01~06 are refactors/cleanups/preparations.
> While 07~13 are the patches that doing the conversion while can handle
> both old and new bio split timings.
> Finally patch 14~16 convert the bio split call sites one by one to
> newer facility.
> The final patch is just a small clean up.
>
> - Various bug fixes
> During the full fstests run, various stupid bugs are exposed and
> fixed.
The latest version can be fetched from this branch:
https://github.com/adam900710/linux/tree/refactor_chunk_map
Which already has the fix for the btrfs/187 crash, which is caused by a
missing modification for scrub_stripe_index_and_offset().
Thanks,
Qu
>
> Qu Wenruo (17):
> btrfs: update an stale comment on btrfs_submit_bio_hook()
> btrfs: save bio::bi_iter into btrfs_bio::iter before submitting
> btrfs: use correct bio size for error message in btrfs_end_dio_bio()
> btrfs: refactor btrfs_map_bio()
> btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
> btrfs: replace btrfs_dio_private::refs with
> btrfs_dio_private::pending_bytes
> btrfs: introduce btrfs_bio_split() helper
> btrfs: make data buffered read path to handle split bio properly
> btrfs: make data buffered write endio function to be split bio
> compatible
> btrfs: make metadata write endio functions to be split bio compatible
> btrfs: make dec_and_test_compressed_bio() to be split bio compatible
> btrfs: return proper mapped length for RAID56 profiles in
> __btrfs_map_block()
> btrfs: allow btrfs_map_bio() to split bio according to chunk stripe
> boundaries
> btrfs: remove buffered IO stripe boundary calculation
> btrfs: remove stripe boundary calculation for compressed IO
> btrfs: remove the stripe boundary calculation for direct IO
> btrfs: unexport btrfs_get_io_geometry()
>
> fs/btrfs/btrfs_inode.h | 10 +-
> fs/btrfs/compression.c | 70 +++-----------
> fs/btrfs/disk-io.c | 9 +-
> fs/btrfs/extent_io.c | 189 +++++++++++++++++++++++++------------
> fs/btrfs/extent_io.h | 2 +
> fs/btrfs/inode.c | 210 ++++++++++++++++-------------------------
> fs/btrfs/raid56.c | 14 ++-
> fs/btrfs/raid56.h | 2 +-
> fs/btrfs/scrub.c | 4 +-
> fs/btrfs/volumes.c | 157 ++++++++++++++++++++++--------
> fs/btrfs/volumes.h | 75 +++++++++++++--
> 11 files changed, 435 insertions(+), 307 deletions(-)
>
prev parent reply other threads:[~2021-12-01 8:57 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-01 5:17 [PATCH 00/17] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
2021-12-01 5:17 ` [PATCH 01/17] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
2021-12-01 5:17 ` [PATCH 02/17] btrfs: save bio::bi_iter into btrfs_bio::iter before submitting Qu Wenruo
2021-12-01 11:52 ` Qu Wenruo
2021-12-01 5:17 ` [PATCH 03/17] btrfs: use correct bio size for error message in btrfs_end_dio_bio() Qu Wenruo
2021-12-01 5:17 ` [PATCH 04/17] btrfs: refactor btrfs_map_bio() Qu Wenruo
2021-12-01 5:17 ` [PATCH 05/17] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio() Qu Wenruo
2021-12-01 5:17 ` [PATCH 06/17] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes Qu Wenruo
2021-12-01 5:17 ` [PATCH 07/17] btrfs: introduce btrfs_bio_split() helper Qu Wenruo
2021-12-01 5:17 ` [PATCH 08/17] btrfs: make data buffered read path to handle split bio properly Qu Wenruo
2021-12-01 5:17 ` [PATCH 09/17] btrfs: make data buffered write endio function to be split bio compatible Qu Wenruo
2021-12-01 5:17 ` [PATCH 10/17] btrfs: make metadata write endio functions " Qu Wenruo
2021-12-01 5:17 ` [PATCH 11/17] btrfs: make dec_and_test_compressed_bio() " Qu Wenruo
2022-03-16 19:46 ` Josef Bacik
2022-03-16 23:30 ` Qu Wenruo
2021-12-01 5:17 ` [PATCH 12/17] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() Qu Wenruo
2021-12-01 6:52 ` Qu Wenruo
2021-12-01 5:17 ` [PATCH 13/17] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries Qu Wenruo
2021-12-01 5:17 ` [PATCH 14/17] btrfs: remove buffered IO stripe boundary calculation Qu Wenruo
2021-12-01 5:17 ` [PATCH 15/17] btrfs: remove stripe boundary calculation for compressed IO Qu Wenruo
2021-12-01 5:17 ` [PATCH 16/17] btrfs: remove the stripe boundary calculation for direct IO Qu Wenruo
2021-12-01 5:17 ` [PATCH 17/17] btrfs: unexport btrfs_get_io_geometry() Qu Wenruo
2021-12-01 8:57 ` Qu Wenruo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bf794c47-35aa-7823-35ea-854873a20949@suse.com \
--to=wqu@suse.com \
--cc=dm-devel@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox