From: Qu WenRuo <wqu@suse.com>
To: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC 0/2] btrfs: Introduce new incompat feature SKINNY_BG_TREE to hugely reduce mount time
Date: Mon, 4 Nov 2019 13:32:48 +0000
Message-ID: <6e6cde26-194d-d3df-044e-340340d4c7ac@suse.com>
In-Reply-To: <20191104120347.56342-1-wqu@suse.com>
On 2019/11/4 8:03 PM, Qu Wenruo wrote:
> This patchset can be fetched from:
> https://github.com/adam900710/linux/tree/skinny_bg_tree
> It is based on the david/for-next-20191024 branch.
>
> This patchset greatly reduces the mount time of large filesystems by
> putting all block group items into their own tree, and further compacts
> the block group item design to make full use of btrfs_key.
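To see why a more compact item matters, here is a back-of-the-envelope sketch of how many items fit into one leaf. It uses the on-disk sizes of struct btrfs_header (101 bytes), struct btrfs_item (25 bytes, including the 17-byte btrfs_disk_key) and struct btrfs_block_group_item (24 bytes). The skinny item body size is an assumption for illustration only (a body-less item), not the layout from the patch, and items_per_leaf() is a hypothetical helper.

```c
#include <assert.h>

/* On-disk sizes of packed btrfs structures, in bytes. */
#define HEADER_SIZE      101u /* struct btrfs_header */
#define ITEM_HEADER_SIZE  25u /* struct btrfs_item: 17-byte disk key + offset + size */
#define BG_ITEM_SIZE      24u /* struct btrfs_block_group_item: used, chunk_objectid, flags */

/*
 * ASSUMPTION for illustration only: a "skinny" item that keeps everything
 * it needs in the key and carries no body. Not taken from the patch.
 */
#define SKINNY_BODY_SIZE_ASSUMED 0u

/*
 * Hypothetical helper: how many items with a @body_size-byte body fit in a
 * leaf of @nodesize bytes. The leaf header is paid once; every item costs
 * one item header plus its body.
 */
static unsigned int items_per_leaf(unsigned int nodesize, unsigned int body_size)
{
	assert(nodesize > HEADER_SIZE);
	return (nodesize - HEADER_SIZE) / (ITEM_HEADER_SIZE + body_size);
}
```

With the 4K nodesize used in the benchmark, this gives 81 regular block group items per leaf versus 159 body-less items, so under this assumption a dedicated tree of skinny items needs roughly half as many leaves.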
>
> The old behavior reads all block group items at mount time. However,
> since block group item keys are scattered across tons of extent items,
> we must call btrfs_search_slot() for each block group.
>
> This works fine for small filesystems, but once the number of block
> groups goes beyond 200, each such tree search becomes a random read,
> causing an obvious slowdown.
>
> On the other hand, btrfs_read_chunk_tree() is still very fast, since we
> put CHUNK_ITEMS into their own tree and pack them next to each other.
>
> Following this idea, we can do the same for block group items: instead
> of calling btrfs_search_slot() for each block group, we just call
> btrfs_next_item(), which in most cases can be done in memory. This
> greatly speeds up mount (see Benchmark below).
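The difference between one btrfs_search_slot() per block group and a sequential btrfs_next_item() walk can be illustrated with a toy model. This is not btrfs code: a tree is reduced to a sorted array of items split into leaves of a fixed capacity, and only distinct leaf reads are counted. The function names, the fixed capacity and the "one block group item followed by N extent items" layout are assumptions; internal nodes, caching and readahead are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (assumed layout): in the extent tree every block group item is
 * followed by @extents_per_bg extent items, so block group items are
 * scattered. Count how many distinct leaves hold a block group item.
 */
static size_t leaves_read_scattered(size_t nr_bgs, size_t extents_per_bg,
				    size_t cap)
{
	size_t count = 0;
	size_t last_leaf = SIZE_MAX;

	assert(cap > 0);
	for (size_t i = 0; i < nr_bgs; i++) {
		/* Position of the i-th block group item in the sorted tree. */
		size_t pos = i * (1 + extents_per_bg);
		size_t leaf = pos / cap;

		/* Positions only grow, so a new leaf index is a new leaf read. */
		if (leaf != last_leaf) {
			count++;
			last_leaf = leaf;
		}
	}
	return count;
}

/* Dedicated tree: block group items are packed next to each other. */
static size_t leaves_read_dedicated(size_t nr_bgs, size_t cap)
{
	assert(cap > 0);
	return (nr_bgs + cap - 1) / cap;
}
```

For example, 1000 block groups with 100 extent items each and 80 items per leaf touch 1000 distinct leaves in the scattered layout, versus 13 leaves when the block group items are packed together.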
>
> The only disadvantage is that this method introduces an incompatible
> feature, so existing filesystems can't use it directly.
> This could be relaxed to an RO-compatible feature, as long as btrfs can
> use skip_bg automatically (another patchset needed).
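The practical difference between an incompat and an RO-compat feature can be sketched as a userspace model of the general rule btrfs applies when mounting: unknown incompat bits refuse any mount, while unknown RO-compat bits refuse only a read-write mount. The function name and bit values below are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not from the patch). */
#define TOY_INCOMPAT_SKINNY_BG_TREE (1ULL << 20)
#define TOY_INCOMPAT_SUPP_OLD       ((1ULL << 20) - 1) /* an "old" kernel */

/*
 * Hypothetical model of the mount-time check: any set incompat bit the
 * kernel does not support refuses the mount; an unsupported RO-compat bit
 * only refuses a read-write mount.
 */
static bool can_mount(uint64_t incompat, uint64_t compat_ro,
		      uint64_t incompat_supp, uint64_t compat_ro_supp,
		      bool read_only)
{
	if (incompat & ~incompat_supp)
		return false;
	if ((compat_ro & ~compat_ro_supp) && !read_only)
		return false;
	return true;
}
```

Under this rule, a kernel that does not know an incompat SKINNY_BG_TREE bit cannot mount the filesystem at all, whereas an RO-compat bit would still allow read-only mounts, which is why relaxing it depends on being able to skip block group items.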
>
> Either specify it at mkfs time, or use the btrfs-progs offline conversion tool.
>
> [[Benchmark]]
> Since I have upgraded my rig to all-NVMe storage, there are no HDD
> test results.
>
> Physical device: NVMe SSD
> VM device: VirtIO block device, backed by a sparse file
> Nodesize: 4K (to bump up tree height)
> Extent data size: 4M
> Fs size used: 1T
>
> All file extents on disk are 4M in size and preallocated to reduce space
> usage (as the VM uses a loopback block device backed by a sparse file).
>
> Without the patchset, measured with the ftrace function_graph tracer:
>
> 7)               |  open_ctree [btrfs]() {
> 7)               |    btrfs_read_block_groups [btrfs]() {
> 7) @ 805851.8 us |    }
> 7) @ 911890.2 us |  }
>
> btrfs_read_block_groups() takes 88% of the total mount time.
>
> With the patchset, using the -O skinny-bg-tree mkfs option:
>
> 5)               |  open_ctree [btrfs]() {
> 5)               |    btrfs_read_block_groups [btrfs]() {
> 5) * 63395.75 us |    }
> 5) @ 143106.9 us |  }
>
> open_ctree() now takes only about 16% of the original mount time,
> and btrfs_read_block_groups() takes only about 7% of the original
> open_ctree() execution time.
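The percentages can be checked directly against the ftrace durations quoted above. This is just a small arithmetic sketch; the macro names and pct() are hypothetical.

```c
#include <assert.h>

/* ftrace function_graph durations from the benchmark, in microseconds. */
#define OLD_OPEN_CTREE_US 911890.2
#define OLD_READ_BGS_US   805851.8
#define NEW_OPEN_CTREE_US 143106.9
#define NEW_READ_BGS_US    63395.75

/* @part as a percentage of @whole. */
static double pct(double part, double whole)
{
	assert(whole > 0.0);
	return 100.0 * part / whole;
}
```

This gives about 88.4% for the old btrfs_read_block_groups() share, 15.7% for the new open_ctree() relative to the old one, and 6.95% for the new btrfs_read_block_groups() relative to the old open_ctree(). Relative to the new, much shorter open_ctree(), btrfs_read_block_groups() is still about 44%.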
>
> The reason is pretty obvious when considering how many tree blocks need
> to be read from disk:
>
>         | Extent tree | Regular bg tree | Skinny bg tree |
> ----------------------------------------------------------
> nodes   |          55 |               1 |              1 |
> leaves  |        1025 |              13 |              7 |
> total   |        1080 |              14 |              8 |
>
> Not to mention that tree block readahead works pretty well for the bg
> tree, as we read every item, while readahead for the extent tree would
> just be a disaster, as block groups are scattered across the whole
> extent tree.
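The block counts in the table translate into the ratios discussed in the RFC note at the end of this mail. A small sketch, with hypothetical helper names, that checks the table totals and computes both the ratio and the reduction:

```c
#include <assert.h>

/* Total tree blocks read, from the table. */
#define EXTENT_TREE_BLOCKS     1080
#define REGULAR_BG_TREE_BLOCKS   14
#define SKINNY_BG_TREE_BLOCKS     8

/* @new_blocks as a percentage of @old_blocks. */
static double ratio_pct(int new_blocks, int old_blocks)
{
	assert(old_blocks > 0);
	return 100.0 * new_blocks / old_blocks;
}

/* How much smaller @new_blocks is than @old_blocks, in percent. */
static double reduction_pct(int new_blocks, int old_blocks)
{
	return 100.0 - ratio_pct(new_blocks, old_blocks);
}
```

The skinny bg tree needs about 0.74% of the extent tree blocks, but about 57% of the regular bg tree blocks, i.e. a reduction of roughly 43%.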
>
> Changelog:
> (v2 and v3 are both the original bg-tree design)
> v2:
> - Rebase to v5.4-rc1
> Minor conflicts due to code moved to block-group.c
> - Fix a bug where some block groups were not loaded at mount time
> It was a bug in the refactor patch that the previous round of tests
> did not expose.
> - Add a new patch to remove a dead check
> - Update benchmark to NVMe-based results
> Hardware upgrade is not always a good thing for benchmark.
>
> v3:
> - Add a separate patch to fix possible memory leak
> - Add Reviewed-by tag for the refactor patch
> - Reword the refactor patch to mention the changed use of
> btrfs_fs_incompat()
>
> RFC:
> - Make bg-tree use the global rsv space.
> - Explore the skinny-bg-tree design.
>
I forgot to mention the reason for the RFC:
I'm not sure the tradeoff is good enough to justify all the extra trouble.
If we compare the number of unique tree blocks needed, the skinny bg tree
needs an impressive 0.74% of the original extent tree, but still 57% of
the regular bg tree (only a 43% reduction).
So any feedback is welcome.
Thanks,
Qu
> Qu Wenruo (2):
> btrfs: block-group: Refactor btrfs_read_block_groups()
> btrfs: Introduce new incompat feature, SKINNY_BG_TREE, to further
> reduce mount time
>
> fs/btrfs/block-group.c | 462 +++++++++++++++++++++-----------
> fs/btrfs/block-rsv.c | 2 +
> fs/btrfs/ctree.h | 5 +-
> fs/btrfs/disk-io.c | 14 +
> fs/btrfs/sysfs.c | 2 +
> include/uapi/linux/btrfs.h | 1 +
> include/uapi/linux/btrfs_tree.h | 11 +
> 7 files changed, 342 insertions(+), 155 deletions(-)
>