From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: Re: [PATCH v2 7/7] reftable/block: avoid decoding keys when searching restart points
Date: Tue, 2 Apr 2024 19:15:48 +0200
Message-ID: <Zgw9RByBo8sKpeRf@tanuki>
In-Reply-To: <eiyd2nmwxjaetkux4prwm6adcx7z77ry3wc62art6gnfklvgmw@hox32vwuu5sj>
On Tue, Apr 02, 2024 at 11:47:16AM -0500, Justin Tobler wrote:
> On 24/03/25 11:11AM, Patrick Steinhardt wrote:
> > When searching over restart points in a block we decode the key of each
> > of the records, which results in a memory allocation. This is quite
> > pointless though given that records at restart points will never use
> > prefix compression and thus store their keys verbatim in the block.
> >
> > Refactor the code so that we can avoid decoding the keys, which saves us
> > some allocations.
>
> Out of curiosity, do you have any benchmarks around this change and
> would that be something we would want to add to the commit message?
I don't have a benchmark. The problem is that the difference isn't
really measurable when doing only a single seek, because seeks are
simply too fast. The only use case I know of that performs a ton of
record seeks is writes, but there the performance improvement gets
drowned out by everything else.
You can try to measure allocations and indeed see a difference. But
again, this is getting drowned out by the noise for writes. With my
block reader refactorings (ps/reftable-block-iteration-optim) you can
see the difference when iterating through refs. Before:

    HEAP SUMMARY:
        in use at exit: 13,603 bytes in 125 blocks
      total heap usage: 314 allocs, 189 frees, 106,035 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,603 bytes in 125 blocks
      total heap usage: 303 allocs, 178 frees, 105,763 bytes allocated

But yeah, it's nothing that'd make you go "Oh, wow!". As said, it will
add up when doing many seeks, but I haven't managed to find a proper
benchmark yet that would be worth including in the commit message.
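
For illustration, the allocation-free comparison boils down to roughly
the following. This is only a standalone sketch, not the code from the
patch: `struct view` and `needle_less()` are made-up names, and the
sketch assumes the key lengths have already been decoded and validated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a byte range, e.g. a key stored verbatim in a block. */
struct view {
	const unsigned char *buf;
	size_t len;
};

/*
 * Return 1 if `needle` sorts strictly before `key`, 0 otherwise.
 *
 * The shared prefix is compared with memcmp(); if it is identical, the
 * shorter of the two sorts first. The key is compared in place, so no
 * copy and thus no allocation is needed.
 */
static int needle_less(struct view needle, struct view key)
{
	size_t min_len = needle.len < key.len ? needle.len : key.len;
	int cmp = memcmp(needle.buf, key.buf, min_len);
	if (cmp)
		return cmp < 0;
	return needle.len < key.len;
}
```

The tie-break on the lengths is what makes this order keys the same way
a comparison of two fully decoded strings would.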
Patrick
> -Justin
>
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> > reftable/block.c | 29 +++++++++++++++++++----------
> > 1 file changed, 19 insertions(+), 10 deletions(-)
> >
> > diff --git a/reftable/block.c b/reftable/block.c
> > index ca80a05e21..8bb4e43cec 100644
> > --- a/reftable/block.c
> > +++ b/reftable/block.c
> > @@ -287,23 +287,32 @@ static int restart_needle_less(size_t idx, void *_args)
> > .buf = args->reader->block.data + off,
> > .len = args->reader->block_len - off,
> > };
> > - struct strbuf kth_restart_key = STRBUF_INIT;
> > - uint8_t unused_extra;
> > - int result, n;
> > + uint64_t prefix_len, suffix_len;
> > + uint8_t extra;
> > + int n;
> >
> > /*
> > - * TODO: The restart key is verbatim in the block, so we can in theory
> > - * avoid decoding the key and thus save some allocations.
> > + * Records at restart points are stored without prefix compression, so
> > + * there is no need to fully decode the record key here. This removes
> > + * the need for allocating memory.
> > */
> > - n = reftable_decode_key(&kth_restart_key, &unused_extra, in);
> > - if (n < 0) {
> > + n = reftable_decode_keylen(in, &prefix_len, &suffix_len, &extra);
> > + if (n < 0 || prefix_len) {
> > args->error = 1;
> > return -1;
> > }
> >
> > - result = strbuf_cmp(&args->needle, &kth_restart_key);
> > - strbuf_release(&kth_restart_key);
> > - return result < 0;
> > + string_view_consume(&in, n);
> > + if (suffix_len > in.len) {
> > + args->error = 1;
> > + return -1;
> > + }
> > +
> > + n = memcmp(args->needle.buf, in.buf,
> > + args->needle.len < suffix_len ? args->needle.len : suffix_len);
> > + if (n)
> > + return n < 0;
> > + return args->needle.len < suffix_len;
> > }
> >
> > void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> > --
> > 2.44.GIT
> >
>
>