git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: karthik nayak <karthik.188@gmail.com>,
	 Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v4 10/10] refs/reftable: reuse iterators when reading refs
Date: Tue, 26 Nov 2024 07:43:01 +0100	[thread overview]
Message-ID: <20241126-pks-reftable-backend-reuse-iter-v4-10-b17fd27df126@pks.im> (raw)
In-Reply-To: <20241126-pks-reftable-backend-reuse-iter-v4-0-b17fd27df126@pks.im>

When reading references the reftable backend has to:

  1. Create a new ref iterator.

  2. Seek the iterator to the record we're searching for.

  3. Read the record.

We cannot really avoid the last two steps, but re-creating the iterator
every single time we want to read a reference is kind of expensive and a
waste of resources. We couldn't help it in the past though because it
was not possible to reuse iterators. But starting with 5bf96e0c39
(reftable/generic: move seeking of records into the iterator,
2024-05-13) we have split up the iterator lifecycle such that creating
the iterator and seeking are two different concerns.

Refactor the code such that we cache iterators in the reftable backend.
This cache is invalidated whenever the respective stack is reloaded such
that we know to recreate the iterator in that case. This leads to a
sizeable speedup when creating many refs, which requires a lot of random
reference reads:

    Benchmark 1: update-ref: create many refs (refcount = 100000, revision = master)
      Time (mean ± σ):      1.793 s ±  0.010 s    [User: 0.954 s, System: 0.835 s]
      Range (min … max):    1.781 s …  1.811 s    10 runs

    Benchmark 2: update-ref: create many refs (refcount = 100000, revision = HEAD)
      Time (mean ± σ):      1.680 s ±  0.013 s    [User: 0.846 s, System: 0.831 s]
      Range (min … max):    1.664 s …  1.702 s    10 runs

    Summary
      update-ref: create many refs (refcount = 100000, revision = HEAD) ran
        1.07 ± 0.01 times faster than update-ref: create many refs (refcount = 100000, revision = master)

While 7% is not a huge win, you have to consider that the benchmark is
_writing_ data, so _reading_ references is only one part of what we do.
Flame graphs show that we spend around 40% of our time reading refs, so
the speedup when reading refs is approximately ~2.5x that. I could not
find better benchmarks where we perform a lot of random ref reads.

You can also see a sizeable impact on memory usage when creating 100k
references. Before this change:

    HEAP SUMMARY:
        in use at exit: 19,112,538 bytes in 200,170 blocks
      total heap usage: 8,400,426 allocs, 8,200,256 frees, 454,367,048 bytes allocated

After this change:

    HEAP SUMMARY:
        in use at exit: 674,416 bytes in 169 blocks
      total heap usage: 7,929,872 allocs, 7,929,703 frees, 281,509,985 bytes allocated

As an additional factor, this refactoring opens up the possibility for
more performance optimizations in how we re-seek iterators. Any change
that allows us to optimize re-seeking by e.g. reusing data structures
would thus also directly speed up random reads.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index b6638d43028d11ae7621dcd4e0dbf1d3174743b7..a80c334f384acd0d72a1c4ef9e59c93600dd6db4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -36,19 +36,30 @@
 
 struct reftable_backend {
 	struct reftable_stack *stack;
+	struct reftable_iterator it;
 };
 
+static void reftable_backend_on_reload(void *payload)
+{
+	struct reftable_backend *be = payload;
+	reftable_iterator_destroy(&be->it);
+}
+
 static int reftable_backend_init(struct reftable_backend *be,
 				 const char *path,
-				 const struct reftable_write_options *opts)
+				 const struct reftable_write_options *_opts)
 {
-	return reftable_new_stack(&be->stack, path, opts);
+	struct reftable_write_options opts = *_opts;
+	opts.on_reload = reftable_backend_on_reload;
+	opts.on_reload_payload = be;
+	return reftable_new_stack(&be->stack, path, &opts);
 }
 
 static void reftable_backend_release(struct reftable_backend *be)
 {
 	reftable_stack_destroy(be->stack);
 	be->stack = NULL;
+	reftable_iterator_destroy(&be->it);
 }
 
 static int reftable_backend_read_ref(struct reftable_backend *be,
@@ -60,10 +71,25 @@ static int reftable_backend_read_ref(struct reftable_backend *be,
 	struct reftable_ref_record ref = {0};
 	int ret;
 
-	ret = reftable_stack_read_ref(be->stack, refname, &ref);
+	if (!be->it.ops) {
+		ret = reftable_stack_init_ref_iterator(be->stack, &be->it);
+		if (ret)
+			goto done;
+	}
+
+	ret = reftable_iterator_seek_ref(&be->it, refname);
 	if (ret)
 		goto done;
 
+	ret = reftable_iterator_next_ref(&be->it, &ref);
+	if (ret)
+		goto done;
+
+	if (strcmp(ref.refname, refname)) {
+		ret = 1;
+		goto done;
+	}
+
 	if (ref.value_type == REFTABLE_REF_SYMREF) {
 		strbuf_reset(referent);
 		strbuf_addstr(referent, ref.value.symref);

-- 
2.47.0.366.gd4f858ca17.dirty


      parent reply	other threads:[~2024-11-26  6:43 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-04 15:11 [PATCH 0/8] refs/reftable: reuse iterators when reading refs Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 1/8] refs/reftable: encapsulate reftable stack Patrick Steinhardt
2024-11-05 11:03   ` karthik nayak
2024-11-04 15:11 ` [PATCH 2/8] refs/reftable: handle reloading stacks in the reftable backend Patrick Steinhardt
2024-11-05 11:14   ` karthik nayak
2024-11-06 10:43     ` Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 3/8] refs/reftable: read references via `struct reftable_backend` Patrick Steinhardt
2024-11-05 11:20   ` karthik nayak
2024-11-04 15:11 ` [PATCH 4/8] refs/reftable: refactor reading symbolic refs to use reftable backend Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 5/8] refs/reftable: refactor reflog expiry " Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 6/8] reftable/stack: add mechanism to notify callers on reload Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 7/8] reftable/merged: drain priority queue on reseek Patrick Steinhardt
2024-11-05  3:16   ` Junio C Hamano
2024-11-05  3:23     ` Junio C Hamano
2024-11-05  7:14       ` Patrick Steinhardt
2024-11-04 15:11 ` [PATCH 8/8] refs/reftable: reuse iterators when reading refs Patrick Steinhardt
2024-11-05  4:49 ` [PATCH 0/8] " Junio C Hamano
2024-11-05  9:11 ` [PATCH v2 " Patrick Steinhardt
2024-11-05  9:11   ` [PATCH v2 1/8] refs/reftable: encapsulate reftable stack Patrick Steinhardt
2024-11-12  6:07     ` Junio C Hamano
2024-11-05  9:12   ` [PATCH v2 2/8] refs/reftable: handle reloading stacks in the reftable backend Patrick Steinhardt
2024-11-12  6:41     ` Junio C Hamano
2024-11-12  9:05       ` Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 3/8] refs/reftable: read references via `struct reftable_backend` Patrick Steinhardt
2024-11-12  7:26     ` Junio C Hamano
2024-11-12  9:05       ` Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 4/8] refs/reftable: refactor reading symbolic refs to use reftable backend Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 5/8] refs/reftable: refactor reflog expiry " Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 6/8] reftable/stack: add mechanism to notify callers on reload Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 7/8] reftable/merged: drain priority queue on reseek Patrick Steinhardt
2024-11-05  9:12   ` [PATCH v2 8/8] refs/reftable: reuse iterators when reading refs Patrick Steinhardt
2024-11-25  7:38 ` [PATCH v3 0/9] " Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 1/9] refs/reftable: encapsulate reftable stack Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 2/9] refs/reftable: handle reloading stacks in the reftable backend Patrick Steinhardt
2024-11-26  0:31     ` Junio C Hamano
2024-11-25  7:38   ` [PATCH v3 3/9] reftable/stack: add accessor for the hash ID Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 4/9] refs/reftable: read references via `struct reftable_backend` Patrick Steinhardt
2024-11-26  0:48     ` Junio C Hamano
2024-11-26  6:41       ` Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 5/9] refs/reftable: refactor reading symbolic refs to use reftable backend Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 6/9] refs/reftable: refactor reflog expiry " Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 7/9] reftable/stack: add mechanism to notify callers on reload Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 8/9] reftable/merged: drain priority queue on reseek Patrick Steinhardt
2024-11-25  7:38   ` [PATCH v3 9/9] refs/reftable: reuse iterators when reading refs Patrick Steinhardt
2024-11-25  9:47   ` [PATCH v3 0/9] " Christian Couder
2024-11-25  9:52     ` Patrick Steinhardt
2024-11-26  6:42 ` [PATCH v4 00/10] " Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 01/10] refs/reftable: encapsulate reftable stack Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 02/10] refs/reftable: handle reloading stacks in the reftable backend Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 03/10] reftable/stack: add accessor for the hash ID Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 04/10] refs/reftable: figure out hash via `reftable_stack` Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 05/10] refs/reftable: read references via `struct reftable_backend` Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 06/10] refs/reftable: refactor reading symbolic refs to use reftable backend Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 07/10] refs/reftable: refactor reflog expiry " Patrick Steinhardt
2024-11-26  6:42   ` [PATCH v4 08/10] reftable/stack: add mechanism to notify callers on reload Patrick Steinhardt
2024-11-26  6:43   ` [PATCH v4 09/10] reftable/merged: drain priority queue on reseek Patrick Steinhardt
2024-11-26  6:43   ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241126-pks-reftable-backend-reuse-iter-v4-10-b17fd27df126@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).