From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/4] xfs: speed up rmap lookups by using non-overlapped lookups when possible
Date: Sat, 23 Apr 2022 07:43:36 +1000 [thread overview]
Message-ID: <20220422214336.GW1544202@dread.disaster.area> (raw)
In-Reply-To: <164997685638.383709.4789775648712621300.stgit@magnolia>
On Thu, Apr 14, 2022 at 03:54:16PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Reverse mapping on a reflink-capable filesystem has some pretty high
> overhead when performing file operations. This is because the rmap
> records for logically and physically adjacent extents might not be
> adjacent in the rmap index due to data block sharing. As a result, we
> use expensive overlapped-interval btree search, which walks every record
> that overlaps with the supplied key in the hopes of finding the record.
>
> However, profiling data shows that when the index contains a record that
> is an exact match for a query key, the non-overlapped btree search
> function can find the record much faster than the overlapped version.
> Try the non-overlapped lookup first, which will make scrub run much
> faster.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/libxfs/xfs_rmap.c | 38 ++++++++++++++++++++++++++++++++------
> 1 file changed, 32 insertions(+), 6 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 3eea8056e7bc..5aa94deb3afd 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -402,12 +402,38 @@ xfs_rmap_lookup_le_range(
> info.irec = irec;
> info.stat = stat;
>
> - trace_xfs_rmap_lookup_le_range(cur->bc_mp,
> - cur->bc_ag.pag->pag_agno, bno, 0, owner, offset, flags);
> - error = xfs_rmap_query_range(cur, &info.high, &info.high,
> - xfs_rmap_lookup_le_range_helper, &info);
> - if (error == -ECANCELED)
> - error = 0;
> + trace_xfs_rmap_lookup_le_range(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> + bno, 0, owner, offset, flags);
> +
> + /*
> + * Historically, we always used the range query to walk every reverse
> + * mapping that could possibly overlap the key that the caller asked
> + * for, and filter out the ones that don't. That is very slow when
> + * there are a lot of records.
> + *
> + * However, there are two scenarios where the classic btree search can
> + * produce correct results -- if the index contains a record that is an
> + * exact match for the lookup key; and if there are no other records
> + * between the record we want and the key we supplied.
> + *
> + * As an optimization, try a non-overlapped lookup first. This makes
> + * scrub run much faster on most filesystems because bmbt records are
> + * usually an exact match for rmap records. If we don't find what we
> + * want, we fall back to the overlapped query.
> + */
> + error = xfs_rmap_lookup_le(cur, bno, owner, offset, flags, irec, stat);
> + if (error)
> + return error;
> + if (*stat) {
> + *stat = 0;
> + xfs_rmap_lookup_le_range_helper(cur, irec, &info);
> + }
> + if (!(*stat)) {
> + error = xfs_rmap_query_range(cur, &info.high, &info.high,
> + xfs_rmap_lookup_le_range_helper, &info);
> + if (error == -ECANCELED)
> + error = 0;
> + }
Ok, I can see what this is doing, but the code is nasty - zeroing
info.stat via *stat = 0, then having
xfs_rmap_lookup_le_range_helper() modify *stat via info.stat and
then relying on that implicit update to skip the very next if
(!(*stat)) clause is not very nice.
xfs_rmap_lookup_le_range_helper() returns -ECANCELED when it's
found a match, so we can use this rather than relying on *stat
to determine what to do:
error = xfs_rmap_lookup_le(cur, bno, owner, offset, flags, irec, stat);
if (error)
return error;
info.irec = irec;
info.stat = 0;
if (*stat)
error = xfs_rmap_lookup_le_range_helper(cur, irec, &info);
if (!error)
error = xfs_rmap_query_range(cur, &info.high, &info.high,
xfs_rmap_lookup_le_range_helper, &info);
if (error == -ECANCELED)
error = 0;
*stat = info.stat;
....
Cheers,
Dave.
> if (*stat)
> trace_xfs_rmap_lookup_le_range_result(cur->bc_mp,
> cur->bc_ag.pag->pag_agno, irec->rm_startblock,
>
>
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-04-22 22:41 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-14 22:53 [PATCHSET 0/4] xfs: fix rmap inefficiencies Darrick J. Wong
2022-04-14 22:54 ` [PATCH 1/4] xfs: capture buffer ops in the xfs_buf tracepoints Darrick J. Wong
2022-04-22 21:47 ` Dave Chinner
2022-04-14 22:54 ` [PATCH 2/4] xfs: simplify xfs_rmap_lookup_le call sites Darrick J. Wong
2022-04-22 21:48 ` Dave Chinner
2022-04-24 5:45 ` Christoph Hellwig
2022-04-14 22:54 ` [PATCH 3/4] xfs: speed up rmap lookups by using non-overlapped lookups when possible Darrick J. Wong
2022-04-22 21:43 ` Dave Chinner [this message]
2022-04-25 18:35 ` Darrick J. Wong
2022-04-14 22:54 ` [PATCH 4/4] xfs: speed up write operations " Darrick J. Wong
2022-04-22 21:46 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2022-04-27 0:51 [PATCHSET v2 0/4] xfs: fix rmap inefficiencies Darrick J. Wong
2022-04-27 0:51 ` [PATCH 3/4] xfs: speed up rmap lookups by using non-overlapped lookups when possible Darrick J. Wong
2022-04-27 4:19 ` Dave Chinner
2022-04-28 12:44 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220422214336.GW1544202@dread.disaster.area \
--to=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox