Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap
Date: Thu, 1 Sep 2022 10:23:07 -0400	[thread overview]
Message-ID: <YxDAS3KLIvGxcLYW@localhost.localdomain> (raw)
In-Reply-To: <5e696c29b65f6558b8012596aa513101ed04a21a.1662022922.git.fdmanana@suse.com>

On Thu, Sep 01, 2022 at 02:18:28PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> One of the most expensive tasks performed during fiemap is to check if
> an extent is shared. This task has two major steps:
> 
> 1) Check if the data extent is shared. This implies checking the extent
>    item in the extent tree, checking delayed references, etc. If we
>    find the data extent is directly shared, we terminate immediately;
> 
> 2) If the data extent is not directly shared (its extent item has a
>    refcount of 1), then it may be shared if we have snapshots that share
>    subtrees of the inode's subvolume b+tree. So we check if the leaf
>    containing the file extent item is shared, then its parent node, then
>    the parent node of the parent node, etc, until we reach the root node
>    or we find one of them is shared - in which case we stop immediately.
> 
> During fiemap we process the extents of a file from left to right, from
> file offset 0 to eof. This means that we iterate b+tree leaves from left
> to right, and has the implication that we keep repeating that second step
> above several times for the same b+tree path of the inode's subvolume
> b+tree.
> 
> For example, if we have two file extent items in leaf X, and the path to
> leaf X is A -> B -> C -> X, then when we try to determine if the data
> extent referenced by the first extent item is shared, we check if the data
> extent is shared - if it's not, then we check if leaf X is shared, if not,
> then we check if node C is shared, if not, then check if node B is shared,
> if not than check if node A is shared. When we move to the next file
> extent item, after determining the data extent is not shared, we repeat
> the checks for X, C, B and A - doing all the expensive searches in the
> extent tree, delayed refs, etc. If we have thousands of tile extents, then
> we keep repeating the sharedness checks for the same paths over and over.
> 
> On a file that has no shared extents or only a small portion, it's easy
> to see that this scales terribly with the number of extents in the file
> and the sizes of the extent and subvolume b+trees.
> 
> This change eliminates the repeated sharedness check on extent buffers
> by caching the results of the last path used. The results can be used as
> long as no snapshots were created since they were cached (for not shared
> extent buffers) or no roots were dropped since they were cached (for
> shared extent buffers). This greatly reduces the time spent by fiemap for
> files with thousands of extents and/or large extent and subvolume b+trees.
> 
> Example performance test:
> 
>     $ cat fiemap-perf-test.sh
>     #!/bin/bash
> 
>     DEV=/dev/sdi
>     MNT=/mnt/sdi
> 
>     mkfs.btrfs -f $DEV
>     mount -o compress=lzo $DEV $MNT
> 
>     # 40G gives 327680 128K file extents (due to compression).
>     xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
> 
>     umount $MNT
>     mount -o compress=lzo $DEV $MNT
> 
>     start=$(date +%s%N)
>     filefrag $MNT/foobar
>     end=$(date +%s%N)
>     dur=$(( (end - start) / 1000000 ))
>     echo "fiemap took $dur milliseconds (metadata not cached)"
> 
>     start=$(date +%s%N)
>     filefrag $MNT/foobar
>     end=$(date +%s%N)
>     dur=$(( (end - start) / 1000000 ))
>     echo "fiemap took $dur milliseconds (metadata cached)"
> 
>     umount $MNT
> 
> Before this patch:
> 
>     $ ./fiemap-perf-test.sh
>     (...)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 3597 milliseconds (metadata not cached)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 2107 milliseconds (metadata cached)
> 
> After this patch:
> 
>     $ ./fiemap-perf-test.sh
>     (...)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 1646 milliseconds (metadata not cached)
>     /mnt/sdi/foobar: 327680 extents found
>     fiemap took 698 milliseconds (metadata cached)
> 
> That's about 2.2x faster when no metadata is cached, and about 3x faster
> when all metadata is cached. On a real filesystem with many other files,
> data, directories, etc, the b+trees will be 2 or 3 levels higher,
> therefore this optimization will have a higher impact.
> 
> Several reports of a slow fiemap show up often, the two Link tags below
> refer to two recent reports of such slowness. This patch, together with
> the next ones in the series, is meant to address that.
> 
> Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
> Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2022-09-01 14:23 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-01 13:18 [PATCH 00/10] btrfs: make lseek and fiemap much more efficient fdmanana
2022-09-01 13:18 ` [PATCH 01/10] btrfs: allow hole and data seeking to be interruptible fdmanana
2022-09-01 13:58   ` Josef Bacik
2022-09-01 21:49   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 02/10] btrfs: make hole and data seeking a lot more efficient fdmanana
2022-09-01 14:03   ` Josef Bacik
2022-09-01 15:00     ` Filipe Manana
2022-09-02 13:26       ` Josef Bacik
2022-09-01 22:18   ` Qu Wenruo
2022-09-02  8:36     ` Filipe Manana
2022-09-11 22:12   ` Qu Wenruo
2022-09-12  8:38     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 03/10] btrfs: remove check for impossible block start for an extent map at fiemap fdmanana
2022-09-01 14:03   ` Josef Bacik
2022-09-01 22:19   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 04/10] btrfs: remove zero length check when entering fiemap fdmanana
2022-09-01 14:04   ` Josef Bacik
2022-09-01 22:24   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 05/10] btrfs: properly flush delalloc " fdmanana
2022-09-01 14:06   ` Josef Bacik
2022-09-01 22:38   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 06/10] btrfs: allow fiemap to be interruptible fdmanana
2022-09-01 14:07   ` Josef Bacik
2022-09-01 22:42   ` Qu Wenruo
2022-09-02  8:38     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 07/10] btrfs: rename btrfs_check_shared() to a more descriptive name fdmanana
2022-09-01 14:08   ` Josef Bacik
2022-09-01 22:45   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 08/10] btrfs: speedup checking for extent sharedness during fiemap fdmanana
2022-09-01 14:23   ` Josef Bacik [this message]
2022-09-01 22:50   ` Qu Wenruo
2022-09-02  8:46     ` Filipe Manana
2022-09-01 13:18 ` [PATCH 09/10] btrfs: skip unnecessary extent buffer sharedness checks " fdmanana
2022-09-01 14:26   ` Josef Bacik
2022-09-01 23:01   ` Qu Wenruo
2022-09-01 13:18 ` [PATCH 10/10] btrfs: make fiemap more efficient and accurate reporting extent sharedness fdmanana
2022-09-01 14:35   ` Josef Bacik
2022-09-01 15:04     ` Filipe Manana
2022-09-02 13:25       ` Josef Bacik
2022-09-01 23:27   ` Qu Wenruo
2022-09-02  8:59     ` Filipe Manana
2022-09-02  9:34       ` Qu Wenruo
2022-09-02  9:41         ` Filipe Manana
2022-09-02  9:50           ` Qu Wenruo
2022-09-02  0:53 ` [PATCH 00/10] btrfs: make lseek and fiemap much more efficient Wang Yugui
2022-09-02  8:24   ` Filipe Manana
2022-09-02 11:41     ` Wang Yugui
2022-09-02 11:45     ` Filipe Manana
2022-09-05 14:39       ` Filipe Manana
2022-09-06 16:20 ` David Sterba
2022-09-06 17:13   ` Filipe Manana
2022-09-07  9:12 ` Christoph Hellwig
2022-09-07  9:47   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YxDAS3KLIvGxcLYW@localhost.localdomain \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox