Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 00/11] btrfs: scrub: rework to get rid of the complex bio formshaping
Date: Tue,  6 Dec 2022 16:23:27 +0800	[thread overview]
Message-ID: <cover.1670314744.git.wqu@suse.com> (raw)

This is a proof-of-concept patchset, thus don't merge.

This series is mostly a full rework of low level scrub code.

Although I implemented the full support for all profiles, only raid56
code is partially cleaned up (which is already over 1K lines removed).
The estimated full cleanup will be around 1~2K more lines removed
eventually.

The series is sent out for feedback, as the full patchset can be very
large (mostly to remove old codes).

[PROBLEMS OF SCRUB]

TL;DR
The current scrub code is way too complex for future expansion.

Current scrub code has a complex system to manage its in-flight bios.

This behavior is designed to improve scrub performance, but has a lot of
disadvantage too:

- Too many indirect calls/jumps

  To scrub one extent in a simple mirror (like SINGLE), the call chain
  involves the following functions:

  /* Before entering scrub_simple_mirror() */
  scrub_ctx()
  |- INIT_WORK(&sbio->work, scrub_bio_end_io_worker);

  /* In scrub_simple_mirror() */
  scrub_extent()
  |- scrub_sectors()
     |- scrub_add_sector_to_rd_bio()
        |- sbio->bio->bi_end_io = scrub_bio_end_io;

  /* Now it jumps to the endio function */

  scrub_bio_end_io()
  |- queue_work()

  /* Now it jumps to workqueue, which is setup in scrub_ctx() */
  scrub_bio_end_io_worker()
  |- scrub_block_complete()
     |- scrub_handle_errored_block() /* For corruption case */
     |  |- scrub_recheck_block()
     |     |- scrub_repair_block_from_good_copy()
     |- scrub_checksum()
     |- scrub_write_block_to_dev_replace()

  The whole jumps/delayed calls are really a mess to read.
  This makes me wonder if the original code is really designed for human
  to read.

- Complex bio form-shaping
  We have scrub_bio to manage the in-flight bios.

  It has at most 64 bios for one device scrub, and each bio can be as
  large as 128K.

  This is definitely a big performance enhancement.

  But I'm not convinced that scrub performance should be the first thing
  to consider.

- No usage of btrfs_submit_bio()
  This means we're doing a lot of things which can already be handled by
  btrfs_submit_bio().

  This means quite some duplicated code.

[ENHANCEMENT]

This patchset will introduce a new infrasturcture, scrub2_stripe.

The "scrub2" prefix is to avoid naming conflicts.

The overall idea is, we always do scrub in the unit of BTRFS_STRIPE_LEN,
hence the "scrub2_stripe".

Furthermore, all work will done in a submit-and-wait fashion, reducing
the delayed calls/jumps to minimal.

The new scrub entrance in scrub_simple_mirror() would looks like this:

  scrub_simple_mirror()
  |  /* Find a stripe with at least one sector used */
  |- scrub2_find_fill_first_stripe()
  |
  |  /* Submit a read for the whole 64KiB */
  |- scrub2_read_and_wait_stripe()
  |  |- btrfs_submit_bio()
  |  |  /* do the verification for all sectors */
  |  |- scrub2_verify_one_stripe()
  |
  |- scrub2_reapair_one_stripe()
  |  |- scrub2_repair_from_mirror()
  |     |  /* reuse the existing read path */
  |     |- scrub2_read_and_wait_stripe()
  |
  |- scrub2_report_errors()
  |
  |  /*
  |   * For dev-replace and regular scrub repair, the write range
  |   * will be different.
  |   * (replace will writeback all good sectors, repair only writes
  |   *  back repaired sectors).
  |   */
  |- scrub2_writeback_sectors()
  
Everything is done in a submit-and-wait fashion.

Although this will reduce concurrency, the readability will be greatly
improved.

Furthermore since we're already submitting read in 64KiB size, it's less
IOPS intense, thus it's already doing optimization for read.

Even if the performance is not that good, it can be enhanced later by
using scrub2_stripe_group to submit multiple stripes in one go (aka,
enlarge the block size) to improve performance, without damaging
readability.

[CURRENT FEATURES]

- Working scrub/replace for all profiles
  Currently only non-raid56 is tested.

[TODO]

There are a lot of todos:

- Completely remove scrub_bio/scrub_block/scrub_sector
  I estimate that would result further 1~2K lines removed.

  Unfortunately, thus huge cleanup will take way more patches than
  usual.

- Add proper support for zoned devices
  Currently zoned code is not touched, but existing zoned code relies on
  scrub_bio.
  Thus they won't work at all.

- Proper performance benchmarking


Qu Wenruo (11):
  btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
    interface
  btrfs: scrub: introduce a helper to find and fill the sector info for
    a scrub2_stripe
  btrfs: scrub: introduce a helper to verify one scrub2_stripe
  btrfs: scrub: add the repair function for scrub2_stripe
  btrfs: scrub: add a writeback helper for scrub2_stripe
  btrfs: scrub: add the error reporting for scrub2_stripe
  btrfs: scrub: add raid56 P/Q scrubbing support
  btrfs: scrub: use dedicated super block verification function to scrub
    one super block
  btrfs: scrub: switch to the new scrub2_stripe based infrastructure
  btrfs: scrub: cleanup the old scrub_parity infrastructure
  btrfs: scrub: cleanup scrub_extent() and its related functions

 fs/btrfs/disk-io.c   |   10 +-
 fs/btrfs/disk-io.h   |    2 +
 fs/btrfs/file-item.c |    8 +-
 fs/btrfs/file-item.h |    3 +-
 fs/btrfs/raid56.c    |    2 +-
 fs/btrfs/scrub.c     | 2263 +++++++++++++++++++++++-------------------
 6 files changed, 1238 insertions(+), 1050 deletions(-)

-- 
2.38.1


             reply	other threads:[~2022-12-06  8:24 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06  8:23 Qu Wenruo [this message]
2022-12-06  8:23 ` [PoC PATCH 01/11] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 02/11] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub2_stripe Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 03/11] btrfs: scrub: introduce a helper to verify one scrub2_stripe Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 04/11] btrfs: scrub: add the repair function for scrub2_stripe Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 05/11] btrfs: scrub: add a writeback helper " Qu Wenruo
2022-12-06  8:45   ` Christoph Hellwig
2022-12-06  8:23 ` [PoC PATCH 06/11] btrfs: scrub: add the error reporting " Qu Wenruo
2022-12-06 18:48   ` kernel test robot
2022-12-06  8:23 ` [PoC PATCH 07/11] btrfs: scrub: add raid56 P/Q scrubbing support Qu Wenruo
2022-12-27 10:45   ` kernel test robot
2022-12-06  8:23 ` [PoC PATCH 08/11] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 09/11] btrfs: scrub: switch to the new scrub2_stripe based infrastructure Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 10/11] btrfs: scrub: cleanup the old scrub_parity infrastructure Qu Wenruo
2022-12-06  8:23 ` [PoC PATCH 11/11] btrfs: scrub: cleanup scrub_extent() and its related functions Qu Wenruo
2022-12-06  8:40 ` [PoC PATCH 00/11] btrfs: scrub: rework to get rid of the complex bio formshaping Christoph Hellwig
2022-12-06  9:29   ` Qu Wenruo
2022-12-13 22:08 ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1670314744.git.wqu@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox