From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>
Subject: Re: [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
Date: Wed, 18 Feb 2026 12:26:21 -0600 [thread overview]
Message-ID: <aZX4w8C5In9gEF33@denethor> (raw)
In-Reply-To: <20260216-b4-pks-receive-pack-optimize-shallow-v1-1-e98886daff2b@pks.im>
On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> The function `lookup_commit_reference_gently()` can be used to look up a
> committish by object ID. As such, the function knows to peel for example
> tag objects so that we eventually end up with the commit.
>
> The function is used quite a lot throughout our tree. One such user is
> "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
> function is to figure out whether a shallow push is missing any objects
> that are required to satisfy the ref updates, and if so, which of the
> ref updates is missing objects.
>
> This is done by painting the tree with `UNINTERESTING`. We start
> painting by calling `refs_for_each_ref()` so that we can mark all
> existing referenced objects as the boundary of objects that we already
> have, and which are supposed to be fully connected. The reference tips
> are then parsed via `lookup_commit_reference_gently()`, and the commit
> commit is then marked as uninteresting.
s/commit commit/commit/
> But references may not necessarily point to a committish, and if a lot
> of them aren't then this step takes a lot of time. This is mostly due to
> the way that `lookup_commit_reference_gently()` is implemented: before
> we learn about the type of the object we already call `parse_object()`
> on the object ID. This has two consequences:
>
> - We parse all objects, including trees and blobs, even though we
> don't even need the contents of them.
>
> - More importantly though, `parse_object()` will cause us to check
> whether the object ID matches its contents.
>
> Combined this means that we deflate and hash every non-committish
> object, and that of course ends up being both CPU- and memory-intensive.
IIUC, the `mark_uninteresting()` callback used by `refs_for_each_ref()`
is resulting in resolved/peeled object being parsed regardless of its
type. This is wasteful though because we should only care about commits.
This makes sense.
> Improve the logic so that we first use `peel_object()`. This function
> won't parse the object for us, and thus it allows us to learn about the
> object's type before we parse and return it.
>
> The following benchmark pushes a single object from a shallow clone into
> a repository that has 100,000 refs. These refs were created by listing
> all objects via `git rev-list(1) --objects --all` and creating refs for
> a subset of them, so lots of those refs will cover non-commit objects.
>
> Benchmark 1: git-receive-pack (rev = HEAD~)
> Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s]
> Range (min … max): 62.191 s … 63.010 s 3 runs
>
> Benchmark 2: git-receive-pack (rev = HEAD)
> Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s]
> Range (min … max): 38.176 s … 38.551 s 3 runs
>
> Summary
> git-receive-pack . </tmp/input (rev = HEAD) ran
> 1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
>
> This leads to a sizeable speedup as we now skip reading and parsing
> non-commit objects. Before this change we spent around 40% of the time
> in `assign_shallow_commits_to_refs()`, after the change we only spend
> around 1.2% of the time in there. Almost the entire remainder of the
> time is spent in git-rev-list(1) to perform the connectivity checks.
Nice!
> Despite the speedup though, this also leads to a massive reduction in
> allocations. Before:
>
> HEAP SUMMARY:
> in use at exit: 352,480,441 bytes in 97,185 blocks
> total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
>
> And after:
>
> HEAP SUMMARY:
> in use at exit: 17,524,978 bytes in 22,393 blocks
> total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
>
> Note that when all references refer to commits performance stays roughly
> the same, as expected. The following benchmark was executed with 600k
> commits:
>
> Benchmark 1: git-receive-pack (rev = HEAD~)
> Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s]
> Range (min … max): 9.095 s … 9.106 s 3 runs
>
> Benchmark 2: git-receive-pack (rev = HEAD)
> Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s]
> Range (min … max): 9.019 s … 9.188 s 3 runs
>
> Summary
> git-receive-pack (rev = HEAD~) ran
> 1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
>
> This will be improved in the next commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> commit.c | 32 +++++++++++++++++++++++++++-----
> object.c | 23 ++++++++++++++++++-----
> object.h | 5 +++++
> 3 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 9bb471d217..b7c4ec2eb5 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -43,13 +43,35 @@ const char *commit_type = "commit";
> struct commit *lookup_commit_reference_gently(struct repository *r,
> const struct object_id *oid, int quiet)
> {
> - struct object *obj = deref_tag(r,
> - parse_object(r, oid),
> - NULL, 0);
Here we drop the use of `deref_tag()` which required us to
unconfitionally parse the object.
> + const struct object_id *maybe_peeled;
> + struct object_id peeled_oid;
> + struct object *object;
> + enum object_type type;
>
> - if (!obj)
> + switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
Here we peel the object and also get its type.
> + case PEEL_NON_TAG:
> + maybe_peeled = oid;
> + break;
> + case PEEL_PEELED:
> + maybe_peeled = &peeled_oid;
> + break;
> + default:
> return NULL;
> - return object_as_type(obj, OBJ_COMMIT, quiet);
> + }
> +
> + if (type != OBJ_COMMIT) {
> + if (!quiet)
> + error(_("object %s is a %s, not a %s"),
> + oid_to_hex(oid), type_name(type),
> + type_name(OBJ_COMMIT));
> + return NULL;
> + }
> +
> + object = parse_object(r, maybe_peeled);
> + if (!object)
> + return NULL;
Now we only parse the object if it is a commit.
> +
> + return object_as_type(object, OBJ_COMMIT, quiet);
> }
>
> struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
> diff --git a/object.c b/object.c
> index 4669b8d65e..99b6df3780 100644
> --- a/object.c
> +++ b/object.c
> @@ -207,10 +207,11 @@ struct object *lookup_object_by_type(struct repository *r,
> }
> }
>
> -enum peel_status peel_object(struct repository *r,
> - const struct object_id *name,
> - struct object_id *oid,
> - unsigned flags)
> +enum peel_status peel_object_ext(struct repository *r,
> + const struct object_id *name,
> + struct object_id *oid,
> + unsigned flags,
> + enum object_type *typep)
> {
> struct object *o = lookup_unknown_object(r, name);
>
> @@ -220,8 +221,10 @@ enum peel_status peel_object(struct repository *r,
> return PEEL_INVALID;
> }
>
> - if (o->type != OBJ_TAG)
> + if (o->type != OBJ_TAG) {
> + *typep = o->type;
> return PEEL_NON_TAG;
> + }
>
> while (o && o->type == OBJ_TAG) {
> o = parse_object(r, &o->oid);
> @@ -241,9 +244,19 @@ enum peel_status peel_object(struct repository *r,
> return PEEL_INVALID;
>
> oidcpy(oid, &o->oid);
> + *typep = o->type;
Here we are modifying the existing `peel_object()` to also provide
object type info back to the caller. With this, we can avoid the
unconditional object parsing in `lookup_commit_reference_gently()`.
Looks good.
-Justin
next prev parent reply other threads:[~2026-02-18 18:26 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26 ` Justin Tobler [this message]
2026-02-19 17:35 ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23 ` Justin Tobler
2026-02-19 7:18 ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZX4w8C5In9gEF33@denethor \
--to=jltobler@gmail.com \
--cc=git@vger.kernel.org \
--cc=msmiley@gitlab.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox