From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: Re: [PATCH] reftable/block: fix binary search over restart counter
Date: Thu, 7 Mar 2024 17:24:14 +0100 [thread overview]
Message-ID: <ZenqLgfSomVlYDwh@tanuki> (raw)
In-Reply-To: <a4312698cceab5f2438c9dd34465da21d719e256.1709825186.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]
On Thu, Mar 07, 2024 at 04:26:38PM +0100, Patrick Steinhardt wrote:
> Records store their keys prefix-compressed. As many records will share a
> common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
> of disk space. The downside of this is that it is not possible to just
> seek into the middle of a block and consume the corresponding record
> because it may depend on prefixes read from preceding records.
>
> To help with this usecase, the reftable format writes every n'th record
> without using prefix compression, which is called a "restart". The list
> of restarts is stored at the end of each block so that a reader can
> figure out entry points at which to read a full record without having to
> read all preceding records.
>
> This allows us to do a binary search over the records in a block when
> searching for a particular key by iterating through the restarts until
> we have found the section in which our record must be located. From
> thereon we perform a linear search to locate the desired record.
>
> This mechanism is broken though. In `block_reader_seek()` we call
> `binsearch()` over the count of restarts in the current block. The
> function we pass to compare records with each other computes the key at
> the current index and then compares it to our search key by calling
> `strbuf_cmp()`, returning its result directly. But `binsearch()` expects
> us to return a truish value that indicates whether the current index is
> smaller than the searched-for key. And unless our key exactly matches
> the value at the restart counter we always end up returning a truish
> value.
>
> The consequence is that `binsearch()` essentially always returns 0,
> indicacting to us that we must start searching right at the beginning of
> the block. This works by chance because we now always do a linear scan
> from the start of the block, and thus we would still end up finding the
> desired record. But needless to say, this makes the optimization quite
> useless.
>
> Fix this bug by returning whether the current key is smaller than the
> searched key. As the current behaviour was correct it is not possible to
> write a test. Furthermore it is also not really possible to demonstrate
> in a benchmark that this fix speeds up seeking records.
>
> This may cause the reader to question whether this binary search makes
> sense in the first place if it doesn't even help with performance. But
> it would end up helping if we were to read a reftable with a much larger
> block size. Blocks can be up to 16MB in size, in which case it will
> become much more important to avoid the linear scan. We are not yet
> ready to read or write such larger blocks though, so we have to live
> without a benchmark demonstrating this.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> reftable/block.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/reftable/block.c b/reftable/block.c
> index 72eb73b380..3d7a7022e7 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
>
> result = strbuf_cmp(&a->key, &rkey);
> strbuf_release(&rkey);
> - return result;
> + return result <= 0;
> }
>
> void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> --
> 2.44.0
>
Hum. I noticed that this causes a memory leak in our CI. I'll need to
investigate.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-03-07 16:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-07 15:26 [PATCH] reftable/block: fix binary search over restart counter Patrick Steinhardt
2024-03-07 16:24 ` Patrick Steinhardt [this message]
2024-03-07 20:35 ` [PATCH v2 0/2] " Patrick Steinhardt
2024-03-07 20:35 ` [PATCH v2 1/2] reftable/record: fix memory leak when decoding object records Patrick Steinhardt
2024-03-07 20:36 ` [PATCH v2 2/2] reftable/block: fix binary search over restart counter Patrick Steinhardt
2024-03-07 23:29 ` Junio C Hamano
2024-03-08 0:40 ` Junio C Hamano
2024-03-11 5:18 ` Patrick Steinhardt
2024-03-11 17:33 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZenqLgfSomVlYDwh@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).