public inbox for linux-btrfs@vger.kernel.org
From: Qu WenRuo <wqu@suse.com>
To: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time
Date: Thu, 10 Oct 2019 02:40:48 +0000
Message-ID: <115dcff2-1cff-1a33-3b71-20acaa2f2ef9@suse.com>
In-Reply-To: <20191010023928.24586-1-wqu@suse.com>

On 2019/10/10 10:39 AM, Qu Wenruo wrote:
> This patchset can be fetched from:
> https://github.com/adam900710/linux/tree/bg_tree
> which is based on the v5.4-rc1 tag.
> 
> This patchset greatly reduces the mount time of large filesystems by
> putting all block group items into their own tree.
> 
> The old behavior reads all block group items at mount time. However,
> because the block group item keys are scattered among tons of extent
> items, we must call btrfs_search_slot() for each block group.
> 
> This works fine for small filesystems, but once the number of block
> groups goes beyond 200, these tree searches become random reads,
> causing an obvious slowdown.
> 
> On the other hand, btrfs_read_chunk_tree() is still very fast, since we
> put CHUNK_ITEMs into their own tree and pack them next to each other.
> 
> Following this idea, we can do the same thing for block group items:
> instead of calling btrfs_search_slot() for each block group, we just
> call btrfs_next_item(), which in most cases can be served from memory,
> hugely speeding up mount (see the benchmark below).
> 
> The only disadvantage is that this method introduces an incompatible
> feature, so existing filesystems can't use it directly. Either enable
> it at mkfs time or use the btrfs-progs offline convert tool.
> 
> [[Benchmark]]
> Since I have upgraded my rig to all-NVMe storage, there are no HDD
> test results.
> 
> Physical device:	NVMe SSD
> VM device:		VirtIO block device, backed by a sparse file
> Nodesize:		4K  (to bump up tree height)
> Extent data size:	4M
> Fs size used:		1T
> 
> All file extents on disk are 4M in size and preallocated, to reduce
> space usage (as the VM uses a loopback block device backed by a sparse file).
> 
> Without the patchset (measured with the ftrace function_graph tracer):
> 
>  7)               |  open_ctree [btrfs]() {
>  7)               |    btrfs_read_block_groups [btrfs]() {
>  7) @ 805851.8 us |    }
>  7) @ 911890.2 us |  }
> 
>  btrfs_read_block_groups() takes 88% of the total open_ctree() (mount) time.
> 
> With the patchset, using the -O bg-tree mkfs option:
> 
>  6)               |  open_ctree [btrfs]() {
>  6)               |    btrfs_read_block_groups [btrfs]() {
>  6) * 91204.69 us |    }
>  6) @ 192039.5 us |  }
> 
>   open_ctree() now takes only 21% of the original mount time, and
>   btrfs_read_block_groups() takes only 47% of the new open_ctree()
>   execution time.
> 
> The reason is pretty obvious when considering how many tree blocks need
> to be read from disk:
> - Original extent tree:
>   nodes:	55
>   leaves:	1025
>   total:	1080
> - Block group tree:
>   nodes:	1
>   leaves:	13
>   total:	14
> 
> Moreover, tree block readahead works well for the bg tree, as we read
> every item in it, while readahead for the extent tree is simply a
> disaster, as the block group items are scattered across the whole
> extent tree.
> 
> Changelog:
> v2:
> - Rebase to v5.4-rc1
>   Minor conflicts due to code moved to block-group.c
> - Fix a bug where some block groups would not be loaded at mount time.
>   It was a bug in the refactor patch, not exposed by the previous round
>   of tests.
> - Add a new patch to remove a dead check
> - Update benchmark to NVMe based result
>   Hardware upgrades are not always a good thing for benchmarks.
> 
> Changelog:
> v3:
> - Add a separate patch to fix possible memory leak
> - Add Reviewed-by tag for the refactor patch
> - Reword the refactor patch to mention the changed use of
>   btrfs_fs_incompat()
Forgot one:

- Remove a wrong patch that could break the usebackuproot mount option.

Thanks,
Qu
> 
> Qu Wenruo (3):
>   btrfs: block-group: Fix a memory leak due to missing
>     btrfs_put_block_group()
>   btrfs: block-group: Refactor btrfs_read_block_groups()
>   btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time
> 
>  fs/btrfs/block-group.c          | 306 ++++++++++++++++++++------------
>  fs/btrfs/ctree.h                |   5 +-
>  fs/btrfs/disk-io.c              |  13 ++
>  fs/btrfs/sysfs.c                |   2 +
>  include/uapi/linux/btrfs.h      |   1 +
>  include/uapi/linux/btrfs_tree.h |   3 +
>  6 files changed, 212 insertions(+), 118 deletions(-)
> 

Thread overview: 17+ messages
2019-10-10  2:39 [PATCH v3 0/3] btrfs: Introduce new incompat feature BG_TREE to hugely reduce mount time Qu Wenruo
2019-10-10  2:39 ` [PATCH v3 1/3] btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Qu Wenruo
2019-10-10  2:51   ` Anand Jain
2019-10-10  7:20   ` Johannes Thumshirn
2019-10-11 19:23   ` David Sterba
2019-10-10  2:39 ` [PATCH v3 2/3] btrfs: block-group: Refactor btrfs_read_block_groups() Qu Wenruo
2019-10-10  2:52   ` Anand Jain
2019-10-30  4:59   ` Qu WenRuo
2019-11-04 19:53     ` David Sterba
2019-11-04 21:44       ` David Sterba
2019-11-05  0:47         ` Qu Wenruo
2019-11-04 19:55   ` David Sterba
2019-10-10  2:39 ` [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time Qu Wenruo
2019-10-10  5:21   ` Naohiro Aota
2019-10-11 13:23   ` Josef Bacik
2019-10-14  9:08   ` Anand Jain
2019-10-10  2:40 ` Qu WenRuo [this message]