From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>
Subject: Re: [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
Date: Wed, 18 Feb 2026 12:26:21 -0600 [thread overview]
Message-ID: <aZX4w8C5In9gEF33@denethor> (raw)
In-Reply-To: <20260216-b4-pks-receive-pack-optimize-shallow-v1-1-e98886daff2b@pks.im>
On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> The function `lookup_commit_reference_gently()` can be used to look up a
> committish by object ID. As such, the function knows to peel for example
> tag objects so that we eventually end up with the commit.
>
> The function is used quite a lot throughout our tree. One such user is
> "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
> function is to figure out whether a shallow push is missing any objects
> that are required to satisfy the ref updates, and if so, which of the
> ref updates is missing objects.
>
> This is done by painting the tree with `UNINTERESTING`. We start
> painting by calling `refs_for_each_ref()` so that we can mark all
> existing referenced objects as the boundary of objects that we already
> have, and which are supposed to be fully connected. The reference tips
> are then parsed via `lookup_commit_reference_gently()`, and the commit
> commit is then marked as uninteresting.
s/commit commit/commit/
> But references may not necessarily point to a committish, and if a lot
> of them aren't then this step takes a lot of time. This is mostly due to
> the way that `lookup_commit_reference_gently()` is implemented: before
> we learn about the type of the object we already call `parse_object()`
> on the object ID. This has two consequences:
>
> - We parse all objects, including trees and blobs, even though we
> don't even need the contents of them.
>
> - More importantly though, `parse_object()` will cause us to check
> whether the object ID matches its contents.
>
> Combined this means that we deflate and hash every non-committish
> object, and that of course ends up being both CPU- and memory-intensive.
IIUC, the `mark_uninteresting()` callback used by `refs_for_each_ref()`
is resulting in resolved/peeled object being parsed regardless of its
type. This is wasteful though because we should only care about commits.
This makes sense.
> Improve the logic so that we first use `peel_object()`. This function
> won't parse the object for us, and thus it allows us to learn about the
> object's type before we parse and return it.
>
> The following benchmark pushes a single object from a shallow clone into
> a repository that has 100,000 refs. These refs were created by listing
> all objects via `git rev-list(1) --objects --all` and creating refs for
> a subset of them, so lots of those refs will cover non-commit objects.
>
> Benchmark 1: git-receive-pack (rev = HEAD~)
> Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s]
> Range (min … max): 62.191 s … 63.010 s 3 runs
>
> Benchmark 2: git-receive-pack (rev = HEAD)
> Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s]
> Range (min … max): 38.176 s … 38.551 s 3 runs
>
> Summary
> git-receive-pack . </tmp/input (rev = HEAD) ran
> 1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
>
> This leads to a sizeable speedup as we now skip reading and parsing
> non-commit objects. Before this change we spent around 40% of the time
> in `assign_shallow_commits_to_refs()`, after the change we only spend
> around 1.2% of the time in there. Almost the entire remainder of the
> time is spent in git-rev-list(1) to perform the connectivity checks.
Nice!
> Despite the speedup though, this also leads to a massive reduction in
> allocations. Before:
>
> HEAP SUMMARY:
> in use at exit: 352,480,441 bytes in 97,185 blocks
> total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
>
> And after:
>
> HEAP SUMMARY:
> in use at exit: 17,524,978 bytes in 22,393 blocks
> total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
>
> Note that when all references refer to commits performance stays roughly
> the same, as expected. The following benchmark was executed with 600k
> commits:
>
> Benchmark 1: git-receive-pack (rev = HEAD~)
> Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s]
> Range (min … max): 9.095 s … 9.106 s 3 runs
>
> Benchmark 2: git-receive-pack (rev = HEAD)
> Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s]
> Range (min … max): 9.019 s … 9.188 s 3 runs
>
> Summary
> git-receive-pack (rev = HEAD~) ran
> 1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
>
> This will be improved in the next commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> commit.c | 32 +++++++++++++++++++++++++++-----
> object.c | 23 ++++++++++++++++++-----
> object.h | 5 +++++
> 3 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 9bb471d217..b7c4ec2eb5 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -43,13 +43,35 @@ const char *commit_type = "commit";
> struct commit *lookup_commit_reference_gently(struct repository *r,
> const struct object_id *oid, int quiet)
> {
> - struct object *obj = deref_tag(r,
> - parse_object(r, oid),
> - NULL, 0);
Here we drop the use of `deref_tag()` which required us to
unconfitionally parse the object.
> + const struct object_id *maybe_peeled;
> + struct object_id peeled_oid;
> + struct object *object;
> + enum object_type type;
>
> - if (!obj)
> + switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
Here we peel the object and also get its type.
> + case PEEL_NON_TAG:
> + maybe_peeled = oid;
> + break;
> + case PEEL_PEELED:
> + maybe_peeled = &peeled_oid;
> + break;
> + default:
> return NULL;
> - return object_as_type(obj, OBJ_COMMIT, quiet);
> + }
> +
> + if (type != OBJ_COMMIT) {
> + if (!quiet)
> + error(_("object %s is a %s, not a %s"),
> + oid_to_hex(oid), type_name(type),
> + type_name(OBJ_COMMIT));
> + return NULL;
> + }
> +
> + object = parse_object(r, maybe_peeled);
> + if (!object)
> + return NULL;
Now we only parse the object if it is a commit.
> +
> + return object_as_type(object, OBJ_COMMIT, quiet);
> }
>
> struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
> diff --git a/object.c b/object.c
> index 4669b8d65e..99b6df3780 100644
> --- a/object.c
> +++ b/object.c
> @@ -207,10 +207,11 @@ struct object *lookup_object_by_type(struct repository *r,
> }
> }
>
> -enum peel_status peel_object(struct repository *r,
> - const struct object_id *name,
> - struct object_id *oid,
> - unsigned flags)
> +enum peel_status peel_object_ext(struct repository *r,
> + const struct object_id *name,
> + struct object_id *oid,
> + unsigned flags,
> + enum object_type *typep)
> {
> struct object *o = lookup_unknown_object(r, name);
>
> @@ -220,8 +221,10 @@ enum peel_status peel_object(struct repository *r,
> return PEEL_INVALID;
> }
>
> - if (o->type != OBJ_TAG)
> + if (o->type != OBJ_TAG) {
> + *typep = o->type;
> return PEEL_NON_TAG;
> + }
>
> while (o && o->type == OBJ_TAG) {
> o = parse_object(r, &o->oid);
> @@ -241,9 +244,19 @@ enum peel_status peel_object(struct repository *r,
> return PEEL_INVALID;
>
> oidcpy(oid, &o->oid);
> + *typep = o->type;
Here we are modifying the existing `peel_object()` to also provide
object type info back to the caller. With this, we can avoid the
unconditional object parsing in `lookup_commit_reference_gently()`.
Looks good.
-Justin
next prev parent reply other threads:[~2026-02-18 18:26 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26 ` Justin Tobler [this message]
2026-02-19 17:35 ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23 ` Justin Tobler
2026-02-19 7:18 ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZX4w8C5In9gEF33@denethor \
--to=jltobler@gmail.com \
--cc=git@vger.kernel.org \
--cc=msmiley@gitlab.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.