From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Eric Sunshine <sunshine@sunshineco.com>, John Cai <johncai86@gmail.com>
Subject: [PATCH v2 4/7] reftable/pq: allocation-less comparison of entry keys
Date: Mon, 12 Feb 2024 09:32:43 +0100
Message-ID: <fd09ba70fe16216114781dba9dd4d197bd0c4258.1707726654.git.ps@pks.im>
In-Reply-To: <cover.1707726654.git.ps@pks.im>
The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.
To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_cmp()` on the result. We then discard
the buffers immediately after the comparison.
This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.
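For illustration only, the following sketch merges several sorted tables
through a binary min-heap and counts how often keys are compared. It is not
the reftable code: `struct entry`, `struct heap`, `merge_tables()` and the
fixed capacity of 64 are hypothetical, and plain integers stand in for
record keys. It assumes every table holds the same number `n >= 1` of
sorted values.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 64 /* hypothetical capacity for this sketch */

/* Hypothetical queue entry: one pending key per table. */
struct entry {
	int key;      /* stands in for the record key */
	size_t table; /* table the key was read from */
};

struct heap {
	struct entry e[MAX_TABLES];
	size_t len;
	size_t ncmp; /* number of key comparisons performed so far */
};

static int heap_less(struct heap *h, size_t i, size_t j)
{
	h->ncmp++;
	return h->e[i].key < h->e[j].key;
}

static void heap_swap(struct heap *h, size_t i, size_t j)
{
	struct entry tmp = h->e[i];
	h->e[i] = h->e[j];
	h->e[j] = tmp;
}

/* Append an entry and sift it up; one comparison per level climbed. */
static void heap_push(struct heap *h, struct entry e)
{
	size_t i;
	assert(h->len < MAX_TABLES);
	i = h->len++;
	h->e[i] = e;
	while (i > 0) {
		size_t parent = (i - 1) / 2;
		if (!heap_less(h, i, parent))
			break;
		heap_swap(h, i, parent);
		i = parent;
	}
}

/* Remove the smallest entry, then sift the moved last entry down. */
static struct entry heap_pop(struct heap *h)
{
	struct entry top;
	size_t i = 0;
	assert(h->len > 0);
	top = h->e[0];
	h->e[0] = h->e[--h->len];
	for (;;) {
		size_t min = i, l = 2 * i + 1, r = 2 * i + 2;
		if (l < h->len && heap_less(h, l, min))
			min = l;
		if (r < h->len && heap_less(h, r, min))
			min = r;
		if (min == i)
			break;
		heap_swap(h, i, min);
		i = min;
	}
	return top;
}

/*
 * Merge `ntables` sorted runs of `n` keys each (stored back to back in
 * `tables`) into `out`. Returns the number of key comparisons.
 */
static size_t merge_tables(const int *tables, size_t ntables, size_t n, int *out)
{
	struct heap h = { .len = 0, .ncmp = 0 };
	size_t pos[MAX_TABLES] = { 0 }, nout = 0;

	assert(ntables <= MAX_TABLES && n >= 1);
	for (size_t t = 0; t < ntables; t++) {
		heap_push(&h, (struct entry){ tables[t * n], t });
		pos[t] = 1;
	}
	while (h.len) {
		struct entry top = heap_pop(&h);
		out[nout++] = top.key;
		if (pos[top.table] < n) {
			size_t next = top.table * n + pos[top.table]++;
			heap_push(&h, (struct entry){ tables[next], top.table });
		}
	}
	return h.ncmp;
}
```

Every emitted key costs one pop and at most one push, each of which walks a
single path through a heap holding at most one entry per table. The returned
count therefore grows with the number of keys times the heap depth, which is
why the cost of a single comparison matters so much here.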
Refactor the code to use the new `reftable_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.
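As a rough sketch of what comparing keys in place can look like for ref
records, whose key is the ref name: the types and functions below
(`struct ref_record`, `struct entry`, `ref_record_cmp()`, `entry_less()`)
are simplified, hypothetical stand-ins and not the implementation from the
preceding commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ref record: the key is just the ref name. */
struct ref_record {
	const char *refname;
};

/* Hypothetical queue entry: a record plus the index of its table. */
struct entry {
	struct ref_record rec;
	size_t index;
};

/* Compare the keys where they already live; nothing is copied or freed. */
static int ref_record_cmp(const struct ref_record *a, const struct ref_record *b)
{
	return strcmp(a->refname, b->refname);
}

/*
 * Order entries by key. On equal keys, the entry with the larger index
 * sorts first, mirroring the tie-break in pq_less().
 */
static int entry_less(const struct entry *a, const struct entry *b)
{
	int cmp = ref_record_cmp(&a->rec, &b->rec);
	if (cmp == 0)
		return a->index > b->index;
	return cmp < 0;
}
```

The comparison reads both names directly, so the per-call cost is bounded by
the length of the shared prefix instead of two buffer allocations, two
copies and two frees.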
The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:
  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     224.4 ms ±   6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max):   216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     172.9 ms ±   4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max):   166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/pq.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/reftable/pq.c b/reftable/pq.c
index dcefeb793a..7220efc39a 100644
--- a/reftable/pq.c
+++ b/reftable/pq.c
@@ -14,20 +14,9 @@ license that can be found in the LICENSE file or at
int pq_less(struct pq_entry *a, struct pq_entry *b)
{
- struct strbuf ak = STRBUF_INIT;
- struct strbuf bk = STRBUF_INIT;
- int cmp = 0;
- reftable_record_key(&a->rec, &ak);
- reftable_record_key(&b->rec, &bk);
-
- cmp = strbuf_cmp(&ak, &bk);
-
- strbuf_release(&ak);
- strbuf_release(&bk);
-
+ int cmp = reftable_record_cmp(&a->rec, &b->rec);
if (cmp == 0)
return a->index > b->index;
-
return cmp < 0;
}
--
2.43.GIT