From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap
Date: Thu, 1 Sep 2022 10:23:07 -0400 [thread overview]
Message-ID: <YxDAS3KLIvGxcLYW@localhost.localdomain> (raw)
In-Reply-To: <5e696c29b65f6558b8012596aa513101ed04a21a.1662022922.git.fdmanana@suse.com>
On Thu, Sep 01, 2022 at 02:18:28PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> One of the most expensive tasks performed during fiemap is to check if
> an extent is shared. This task has two major steps:
>
> 1) Check if the data extent is shared. This implies checking the extent
> item in the extent tree, checking delayed references, etc. If we
> find the data extent is directly shared, we terminate immediately;
>
> 2) If the data extent is not directly shared (its extent item has a
> refcount of 1), then it may be shared if we have snapshots that share
> subtrees of the inode's subvolume b+tree. So we check if the leaf
> containing the file extent item is shared, then its parent node, then
> the parent node of the parent node, etc, until we reach the root node
> or we find one of them is shared - in which case we stop immediately.
>
> During fiemap we process the extents of a file from left to right, from
> file offset 0 to eof. This means that we iterate b+tree leaves from left
> to right, and has the implication that we keep repeating that second step
> above several times for the same b+tree path of the inode's subvolume
> b+tree.
>
> For example, if we have two file extent items in leaf X, and the path to
> leaf X is A -> B -> C -> X, then when we try to determine if the data
> extent referenced by the first extent item is shared, we check if the data
> extent is shared - if it's not, then we check if leaf X is shared, if not,
> then we check if node C is shared, if not, then check if node B is shared,
> if not than check if node A is shared. When we move to the next file
> extent item, after determining the data extent is not shared, we repeat
> the checks for X, C, B and A - doing all the expensive searches in the
> extent tree, delayed refs, etc. If we have thousands of tile extents, then
> we keep repeating the sharedness checks for the same paths over and over.
>
> On a file that has no shared extents or only a small portion, it's easy
> to see that this scales terribly with the number of extents in the file
> and the sizes of the extent and subvolume b+trees.
>
> This change eliminates the repeated sharedness check on extent buffers
> by caching the results of the last path used. The results can be used as
> long as no snapshots were created since they were cached (for not shared
> extent buffers) or no roots were dropped since they were cached (for
> shared extent buffers). This greatly reduces the time spent by fiemap for
> files with thousands of extents and/or large extent and subvolume b+trees.
>
> Example performance test:
>
> $ cat fiemap-perf-test.sh
> #!/bin/bash
>
> DEV=/dev/sdi
> MNT=/mnt/sdi
>
> mkfs.btrfs -f $DEV
> mount -o compress=lzo $DEV $MNT
>
> # 40G gives 327680 128K file extents (due to compression).
> xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
>
> umount $MNT
> mount -o compress=lzo $DEV $MNT
>
> start=$(date +%s%N)
> filefrag $MNT/foobar
> end=$(date +%s%N)
> dur=$(( (end - start) / 1000000 ))
> echo "fiemap took $dur milliseconds (metadata not cached)"
>
> start=$(date +%s%N)
> filefrag $MNT/foobar
> end=$(date +%s%N)
> dur=$(( (end - start) / 1000000 ))
> echo "fiemap took $dur milliseconds (metadata cached)"
>
> umount $MNT
>
> Before this patch:
>
> $ ./fiemap-perf-test.sh
> (...)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 3597 milliseconds (metadata not cached)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 2107 milliseconds (metadata cached)
>
> After this patch:
>
> $ ./fiemap-perf-test.sh
> (...)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 1646 milliseconds (metadata not cached)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 698 milliseconds (metadata cached)
>
> That's about 2.2x faster when no metadata is cached, and about 3x faster
> when all metadata is cached. On a real filesystem with many other files,
> data, directories, etc, the b+trees will be 2 or 3 levels higher,
> therefore this optimization will have a higher impact.
>
> Several reports of a slow fiemap show up often, the two Link tags below
> refer to two recent reports of such slowness. This patch, together with
> the next ones in the series, is meant to address that.
>
> Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
next prev parent reply other threads:[~2022-09-01 14:23 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 13:18 [PATCH 00/10] btrfs: make lseek and fiemap much more efficient fdmanana
2022-09-01 13:18 ` [PATCH 01/10] btrfs: allow hole and data seeking to be interruptible fdmanana
2022-09-01 13:58 ` Josef Bacik
2022-09-01 21:49 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 02/10] btrfs: make hole and data seeking a lot more efficient fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 15:00 ` Filipe Manana
2022-09-02 13:26 ` Josef Bacik
2022-09-01 22:18 ` Qu Wenruo
2022-09-02 8:36 ` Filipe Manana
2022-09-11 22:12 ` Qu Wenruo
2022-09-12 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 03/10] btrfs: remove check for impossible block start for an extent map at fiemap fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 22:19 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 04/10] btrfs: remove zero length check when entering fiemap fdmanana
2022-09-01 14:04 ` Josef Bacik
2022-09-01 22:24 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 05/10] btrfs: properly flush delalloc " fdmanana
2022-09-01 14:06 ` Josef Bacik
2022-09-01 22:38 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 06/10] btrfs: allow fiemap to be interruptible fdmanana
2022-09-01 14:07 ` Josef Bacik
2022-09-01 22:42 ` Qu Wenruo
2022-09-02 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 07/10] btrfs: rename btrfs_check_shared() to a more descriptive name fdmanana
2022-09-01 14:08 ` Josef Bacik
2022-09-01 22:45 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap fdmanana
2022-09-01 14:23 ` Josef Bacik [this message]
2022-09-01 22:50 ` Qu Wenruo
2022-09-02 8:46 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks " fdmanana
2022-09-01 14:26 ` Josef Bacik
2022-09-01 23:01 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 10/10] btrfs: make fiemap more efficient and accurate reporting extent sharedness fdmanana
2022-09-01 14:35 ` Josef Bacik
2022-09-01 15:04 ` Filipe Manana
2022-09-02 13:25 ` Josef Bacik
2022-09-01 23:27 ` Qu Wenruo
2022-09-02 8:59 ` Filipe Manana
2022-09-02 9:34 ` Qu Wenruo
2022-09-02 9:41 ` Filipe Manana
2022-09-02 9:50 ` Qu Wenruo
2022-09-02 0:53 ` [PATCH 00/10] btrfs: make lseek and fiemap much more efficient Wang Yugui
2022-09-02 8:24 ` Filipe Manana
2022-09-02 11:41 ` Wang Yugui
2022-09-02 11:45 ` Filipe Manana
2022-09-05 14:39 ` Filipe Manana
2022-09-06 16:20 ` David Sterba
2022-09-06 17:13 ` Filipe Manana
2022-09-07 9:12 ` Christoph Hellwig
2022-09-07 9:47 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YxDAS3KLIvGxcLYW@localhost.localdomain \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox