git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: Re: [PATCH v2 7/7] reftable/block: avoid decoding keys when searching restart points
Date: Tue, 2 Apr 2024 19:15:48 +0200
Message-ID: <Zgw9RByBo8sKpeRf@tanuki>
In-Reply-To: <eiyd2nmwxjaetkux4prwm6adcx7z77ry3wc62art6gnfklvgmw@hox32vwuu5sj>

On Tue, Apr 02, 2024 at 11:47:16AM -0500, Justin Tobler wrote:
> On 24/03/25 11:11AM, Patrick Steinhardt wrote:
> > When searching over restart points in a block we decode the key of each
> > of the records, which results in a memory allocation. This is quite
> > pointless though given that records at restart points will never use
> > prefix compression and thus store their keys verbatim in the block.
> > 
> > Refactor the code so that we can avoid decoding the keys, which saves us
> > some allocations.
> 
> Out of curiosity, do you have any benchmarks around this change and
> would that be something we would want to add to the commit message?

I don't have a benchmark. The problem is that the difference isn't
really measurable when doing only a single seek, because seeks are
simply too fast. The only use case I know of that does a ton of record
seeks is writes, but there the performance improvement is drowned out
by everything else.

You can try to measure allocations and indeed see a difference. But
again, this is getting drowned out by the noise for writes. With my
block reader refactorings (ps/reftable-block-iteration-optim) you can
see the difference when iterating through refs. Before:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 314 allocs, 189 frees, 106,035 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 303 allocs, 178 frees, 105,763 bytes allocated

That is 11 fewer allocations and 272 fewer bytes allocated. But yeah,
it's nothing that'd make you go "Oh, wow!". As said, it will add up when
doing many seeks, but I haven't yet managed to find a proper benchmark
that would be worth putting into the commit message.

Patrick

> -Justin
> 
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  reftable/block.c | 29 +++++++++++++++++++----------
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/reftable/block.c b/reftable/block.c
> > index ca80a05e21..8bb4e43cec 100644
> > --- a/reftable/block.c
> > +++ b/reftable/block.c
> > @@ -287,23 +287,32 @@ static int restart_needle_less(size_t idx, void *_args)
> >  		.buf = args->reader->block.data + off,
> >  		.len = args->reader->block_len - off,
> >  	};
> > -	struct strbuf kth_restart_key = STRBUF_INIT;
> > -	uint8_t unused_extra;
> > -	int result, n;
> > +	uint64_t prefix_len, suffix_len;
> > +	uint8_t extra;
> > +	int n;
> >  
> >  	/*
> > -	 * TODO: The restart key is verbatim in the block, so we can in theory
> > -	 * avoid decoding the key and thus save some allocations.
> > +	 * Records at restart points are stored without prefix compression, so
> > +	 * there is no need to fully decode the record key here. This removes
> > +	 * the need for allocating memory.
> >  	 */
> > -	n = reftable_decode_key(&kth_restart_key, &unused_extra, in);
> > -	if (n < 0) {
> > +	n = reftable_decode_keylen(in, &prefix_len, &suffix_len, &extra);
> > +	if (n < 0 || prefix_len) {
> >  		args->error = 1;
> >  		return -1;
> >  	}
> >  
> > -	result = strbuf_cmp(&args->needle, &kth_restart_key);
> > -	strbuf_release(&kth_restart_key);
> > -	return result < 0;
> > +	string_view_consume(&in, n);
> > +	if (suffix_len > in.len) {
> > +		args->error = 1;
> > +		return -1;
> > +	}
> > +
> > +	n = memcmp(args->needle.buf, in.buf,
> > +		   args->needle.len < suffix_len ? args->needle.len : suffix_len);
> > +	if (n)
> > +		return n < 0;
> > +	return args->needle.len < suffix_len;
> >  }
> >  
> >  void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> > -- 
> > 2.44.GIT
> > 
> 
> 
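
For readers who want to see the comparison in isolation, here is a
minimal, self-contained sketch of the idea behind the patch: compare the
needle against a key that sits verbatim in a buffer, without copying it
and without requiring NUL termination. The names `byte_view`,
`view_less()` and `first_key_after()` are hypothetical and exist only
for this illustration; they are not part of the reftable code, and the
sketch skips the key-length decoding and error handling that the real
function performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-delimited byte range inside a block. */
struct byte_view {
	const unsigned char *buf;
	size_t len;
};

/*
 * Return 1 if `needle` sorts strictly before `key`, 0 otherwise.
 * Neither view has to be NUL-terminated and nothing is allocated.
 */
static int view_less(struct byte_view needle, struct byte_view key)
{
	size_t common = needle.len < key.len ? needle.len : key.len;
	int cmp;

	assert(needle.buf && key.buf);

	/* Compare only the bytes that both views actually have. */
	cmp = memcmp(needle.buf, key.buf, common);
	if (cmp)
		return cmp < 0;

	/* Equal over the common length: the shorter view sorts first. */
	return needle.len < key.len;
}

/*
 * Binary search over `nr` sorted keys. Returns the index of the first
 * key that `needle` sorts strictly before, or `nr` if there is none.
 */
static size_t first_key_after(struct byte_view needle,
			      const struct byte_view *keys, size_t nr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (view_less(needle, keys[mid]))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}
```

The length tie-break after `memcmp()` is what makes a bounded compare
order the views the same way a full string compare would: a needle that
is a strict prefix of the key sorts before it.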


