From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap
Date: Thu, 1 Sep 2022 10:23:07 -0400 [thread overview]
Message-ID: <YxDAS3KLIvGxcLYW@localhost.localdomain> (raw)
In-Reply-To: <5e696c29b65f6558b8012596aa513101ed04a21a.1662022922.git.fdmanana@suse.com>
On Thu, Sep 01, 2022 at 02:18:28PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> One of the most expensive tasks performed during fiemap is to check if
> an extent is shared. This task has two major steps:
>
> 1) Check if the data extent is shared. This implies checking the extent
> item in the extent tree, checking delayed references, etc. If we
> find the data extent is directly shared, we terminate immediately;
>
> 2) If the data extent is not directly shared (its extent item has a
> refcount of 1), then it may be shared if we have snapshots that share
> subtrees of the inode's subvolume b+tree. So we check if the leaf
> containing the file extent item is shared, then its parent node, then
> the parent node of the parent node, etc, until we reach the root node
> or we find one of them is shared - in which case we stop immediately.
>
> During fiemap we process the extents of a file from left to right, from
> file offset 0 to eof. This means that we iterate b+tree leaves from left
> to right, and has the implication that we keep repeating that second step
> above several times for the same b+tree path of the inode's subvolume
> b+tree.
>
> For example, if we have two file extent items in leaf X, and the path to
> leaf X is A -> B -> C -> X, then when we try to determine if the data
> extent referenced by the first extent item is shared, we check if the data
> extent is shared - if it's not, then we check if leaf X is shared, if not,
> then we check if node C is shared, if not, then check if node B is shared,
> if not than check if node A is shared. When we move to the next file
> extent item, after determining the data extent is not shared, we repeat
> the checks for X, C, B and A - doing all the expensive searches in the
> extent tree, delayed refs, etc. If we have thousands of tile extents, then
> we keep repeating the sharedness checks for the same paths over and over.
>
> On a file that has no shared extents or only a small portion, it's easy
> to see that this scales terribly with the number of extents in the file
> and the sizes of the extent and subvolume b+trees.
>
> This change eliminates the repeated sharedness check on extent buffers
> by caching the results of the last path used. The results can be used as
> long as no snapshots were created since they were cached (for not shared
> extent buffers) or no roots were dropped since they were cached (for
> shared extent buffers). This greatly reduces the time spent by fiemap for
> files with thousands of extents and/or large extent and subvolume b+trees.
>
> Example performance test:
>
> $ cat fiemap-perf-test.sh
> #!/bin/bash
>
> DEV=/dev/sdi
> MNT=/mnt/sdi
>
> mkfs.btrfs -f $DEV
> mount -o compress=lzo $DEV $MNT
>
> # 40G gives 327680 128K file extents (due to compression).
> xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
>
> umount $MNT
> mount -o compress=lzo $DEV $MNT
>
> start=$(date +%s%N)
> filefrag $MNT/foobar
> end=$(date +%s%N)
> dur=$(( (end - start) / 1000000 ))
> echo "fiemap took $dur milliseconds (metadata not cached)"
>
> start=$(date +%s%N)
> filefrag $MNT/foobar
> end=$(date +%s%N)
> dur=$(( (end - start) / 1000000 ))
> echo "fiemap took $dur milliseconds (metadata cached)"
>
> umount $MNT
>
> Before this patch:
>
> $ ./fiemap-perf-test.sh
> (...)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 3597 milliseconds (metadata not cached)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 2107 milliseconds (metadata cached)
>
> After this patch:
>
> $ ./fiemap-perf-test.sh
> (...)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 1646 milliseconds (metadata not cached)
> /mnt/sdi/foobar: 327680 extents found
> fiemap took 698 milliseconds (metadata cached)
>
> That's about 2.2x faster when no metadata is cached, and about 3x faster
> when all metadata is cached. On a real filesystem with many other files,
> data, directories, etc, the b+trees will be 2 or 3 levels higher,
> therefore this optimization will have a higher impact.
>
> Several reports of a slow fiemap show up often, the two Link tags below
> refer to two recent reports of such slowness. This patch, together with
> the next ones in the series, is meant to address that.
>
> Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
next prev parent reply other threads:[~2022-09-01 14:23 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 13:18 [PATCH 00/10] btrfs: make lseek and fiemap much more efficient fdmanana
2022-09-01 13:18 ` [PATCH 01/10] btrfs: allow hole and data seeking to be interruptible fdmanana
2022-09-01 13:58 ` Josef Bacik
2022-09-01 21:49 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 02/10] btrfs: make hole and data seeking a lot more efficient fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 15:00 ` Filipe Manana
2022-09-02 13:26 ` Josef Bacik
2022-09-01 22:18 ` Qu Wenruo
2022-09-02 8:36 ` Filipe Manana
2022-09-11 22:12 ` Qu Wenruo
2022-09-12 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 03/10] btrfs: remove check for impossible block start for an extent map at fiemap fdmanana
2022-09-01 14:03 ` Josef Bacik
2022-09-01 22:19 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 04/10] btrfs: remove zero length check when entering fiemap fdmanana
2022-09-01 14:04 ` Josef Bacik
2022-09-01 22:24 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 05/10] btrfs: properly flush delalloc " fdmanana
2022-09-01 14:06 ` Josef Bacik
2022-09-01 22:38 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 06/10] btrfs: allow fiemap to be interruptible fdmanana
2022-09-01 14:07 ` Josef Bacik
2022-09-01 22:42 ` Qu Wenruo
2022-09-02 8:38 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 07/10] btrfs: rename btrfs_check_shared() to a more descriptive name fdmanana
2022-09-01 14:08 ` Josef Bacik
2022-09-01 22:45 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap fdmanana
2022-09-01 14:23 ` Josef Bacik [this message]
2022-09-01 22:50 ` Qu Wenruo
2022-09-02 8:46 ` Filipe Manana
2022-09-01 13:18 ` [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks " fdmanana
2022-09-01 14:26 ` Josef Bacik
2022-09-01 23:01 ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 10/10] btrfs: make fiemap more efficient and accurate reporting extent sharedness fdmanana
2022-09-01 14:35 ` Josef Bacik
2022-09-01 15:04 ` Filipe Manana
2022-09-02 13:25 ` Josef Bacik
2022-09-01 23:27 ` Qu Wenruo
2022-09-02 8:59 ` Filipe Manana
2022-09-02 9:34 ` Qu Wenruo
2022-09-02 9:41 ` Filipe Manana
2022-09-02 9:50 ` Qu Wenruo
2022-09-02 0:53 ` [PATCH 00/10] btrfs: make lseek and fiemap much more efficient Wang Yugui
2022-09-02 8:24 ` Filipe Manana
2022-09-02 11:41 ` Wang Yugui
2022-09-02 11:45 ` Filipe Manana
2022-09-05 14:39 ` Filipe Manana
2022-09-06 16:20 ` David Sterba
2022-09-06 17:13 ` Filipe Manana
2022-09-07 9:12 ` Christoph Hellwig
2022-09-07 9:47 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YxDAS3KLIvGxcLYW@localhost.localdomain \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.