[PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to

public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()`
@ 2026-02-16 15:38 Patrick Steinhardt
  2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2026-02-16 15:38 UTC (permalink / raw)
  To: git; +Cc: Matt Smiley

Hi,

this patch series optimizes how git-receive-pack(1) handles shallow
pushes in `assign_shallow_commits_to_refs()`. This covers some edge
cases that we have hit in a production repository, where shallow pushes
could take many minutes and allocate a ton of memory.

Thanks!

Patrick

---
Patrick Steinhardt (3):
      commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
      commit: make `repo_parse_commit_no_graph()` more robust
      commit: use commit graph in `lookup_commit_reference_gently()`

 commit.c                        | 32 +++++++++++++++++++++++++++-----
 commit.h                        | 14 ++++++++++++--
 contrib/coccinelle/commit.cocci |  2 +-
 object.c                        | 23 ++++++++++++++++++-----
 object.h                        |  5 +++++
 5 files changed, 63 insertions(+), 13 deletions(-)


---
base-commit: 852829b3dd2fe4e7c7fc4d8badde644cf1b66c74
change-id: 20260216-b4-pks-receive-pack-optimize-shallow-0a6cc508549d


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
  2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
@ 2026-02-16 15:38 ` Patrick Steinhardt
  2026-02-18 18:26   ` Justin Tobler
  2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
  2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
  2 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2026-02-16 15:38 UTC (permalink / raw)
  To: git; +Cc: Matt Smiley

The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.

The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.

This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
commit is then marked as uninteresting.

But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:

  - We parse all objects, including trees and blobs, even though we
    don't even need the contents of them.

  - More importantly though, `parse_object()` will cause us to check
    whether the object ID matches its contents.

Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.

Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.

The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list(1) --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
    Range (min … max):   62.191 s … 63.010 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
    Range (min … max):   38.176 s … 38.551 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)

This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.

Despite the speedup though, this also leads to a massive reduction in
allocations. Before:

  HEAP SUMMARY:
      in use at exit: 352,480,441 bytes in 97,185 blocks
    total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated

And after:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

Note that when all references refer to commits performance stays roughly
the same, as expected. The following benchmark was executed with 600k
commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.101 s ±  0.006 s    [User: 8.800 s, System: 0.520 s]
    Range (min … max):    9.095 s …  9.106 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      9.128 s ±  0.094 s    [User: 8.820 s, System: 0.522 s]
    Range (min … max):    9.019 s …  9.188 s    3 runs

  Summary
    git-receive-pack (rev = HEAD~) ran
      1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)

This will be improved in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.c | 32 +++++++++++++++++++++++++++-----
 object.c | 23 ++++++++++++++++++-----
 object.h |  5 +++++
 3 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/commit.c b/commit.c
index 9bb471d217..b7c4ec2eb5 100644
--- a/commit.c
+++ b/commit.c
@@ -43,13 +43,35 @@ const char *commit_type = "commit";
 struct commit *lookup_commit_reference_gently(struct repository *r,
 		const struct object_id *oid, int quiet)
 {
-	struct object *obj = deref_tag(r,
-				       parse_object(r, oid),
-				       NULL, 0);
+	const struct object_id *maybe_peeled;
+	struct object_id peeled_oid;
+	struct object *object;
+	enum object_type type;

-	if (!obj)
+	switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
+	case PEEL_NON_TAG:
+		maybe_peeled = oid;
+		break;
+	case PEEL_PEELED:
+		maybe_peeled = &peeled_oid;
+		break;
+	default:
 		return NULL;
-	return object_as_type(obj, OBJ_COMMIT, quiet);
+	}
+
+	if (type != OBJ_COMMIT) {
+		if (!quiet)
+			error(_("object %s is a %s, not a %s"),
+			      oid_to_hex(oid), type_name(type),
+			      type_name(OBJ_COMMIT));
+		return NULL;
+	}
+
+	object = parse_object(r, maybe_peeled);
+	if (!object)
+		return NULL;
+
+	return object_as_type(object, OBJ_COMMIT, quiet);
 }

 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
diff --git a/object.c b/object.c
index 4669b8d65e..99b6df3780 100644
--- a/object.c
+++ b/object.c
@@ -207,10 +207,11 @@ struct object *lookup_object_by_type(struct repository *r,
 	}
 }

-enum peel_status peel_object(struct repository *r,
-			     const struct object_id *name,
-			     struct object_id *oid,
-			     unsigned flags)
+enum peel_status peel_object_ext(struct repository *r,
+				 const struct object_id *name,
+				 struct object_id *oid,
+				 unsigned flags,
+				 enum object_type *typep)
 {
 	struct object *o = lookup_unknown_object(r, name);

@@ -220,8 +221,10 @@ enum peel_status peel_object(struct repository *r,
 			return PEEL_INVALID;
 	}

-	if (o->type != OBJ_TAG)
+	if (o->type != OBJ_TAG) {
+		*typep = o->type;
 		return PEEL_NON_TAG;
+	}

 	while (o && o->type == OBJ_TAG) {
 		o = parse_object(r, &o->oid);
@@ -241,9 +244,19 @@ enum peel_status peel_object(struct repository *r,
 		return PEEL_INVALID;

 	oidcpy(oid, &o->oid);
+	*typep = o->type;
 	return PEEL_PEELED;
 }

+enum peel_status peel_object(struct repository *r,
+			     const struct object_id *name,
+			     struct object_id *oid,
+			     unsigned flags)
+{
+	enum object_type dummy;
+	return peel_object_ext(r, name, oid, flags, &dummy);
+}
+
 struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p)
 {
 	struct object *obj;
diff --git a/object.h b/object.h
index 4bca957b8d..8f98382243 100644
--- a/object.h
+++ b/object.h
@@ -309,6 +309,11 @@ enum peel_status peel_object(struct repository *r,
 			     const struct object_id *name,
 			     struct object_id *oid,
 			     unsigned flags);
+enum peel_status peel_object_ext(struct repository *r,
+				 const struct object_id *name,
+				 struct object_id *oid,
+				 unsigned flags,
+				 enum object_type *typep);

 struct object_list *object_list_insert(struct object *item,
 				       struct object_list **list_p);

-- 
2.53.0.352.gd1286b26eb.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
  2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
@ 2026-02-18 18:26   ` Justin Tobler
  2026-02-19 17:35     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Tobler @ 2026-02-18 18:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Matt Smiley

On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> The function `lookup_commit_reference_gently()` can be used to look up a
> committish by object ID. As such, the function knows to peel for example
> tag objects so that we eventually end up with the commit.
> 
> The function is used quite a lot throughout our tree. One such user is
> "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
> function is to figure out whether a shallow push is missing any objects
> that are required to satisfy the ref updates, and if so, which of the
> ref updates is missing objects.
> 
> This is done by painting the tree with `UNINTERESTING`. We start
> painting by calling `refs_for_each_ref()` so that we can mark all
> existing referenced objects as the boundary of objects that we already
> have, and which are supposed to be fully connected. The reference tips
> are then parsed via `lookup_commit_reference_gently()`, and the commit
> commit is then marked as uninteresting.

s/commit commit/commit/

> But references may not necessarily point to a committish, and if a lot
> of them aren't then this step takes a lot of time. This is mostly due to
> the way that `lookup_commit_reference_gently()` is implemented: before
> we learn about the type of the object we already call `parse_object()`
> on the object ID. This has two consequences:
> 
>   - We parse all objects, including trees and blobs, even though we
>     don't even need the contents of them.
> 
>   - More importantly though, `parse_object()` will cause us to check
>     whether the object ID matches its contents.
> 
> Combined this means that we deflate and hash every non-committish
> object, and that of course ends up being both CPU- and memory-intensive.

IIUC, the `mark_uninteresting()` callback used by `refs_for_each_ref()`
is resulting in resolved/peeled object being parsed regardless of its
type. This is wasteful though because we should only care about commits.
This makes sense.

> Improve the logic so that we first use `peel_object()`. This function
> won't parse the object for us, and thus it allows us to learn about the
> object's type before we parse and return it.
> 
> The following benchmark pushes a single object from a shallow clone into
> a repository that has 100,000 refs. These refs were created by listing
> all objects via `git rev-list(1) --objects --all` and creating refs for
> a subset of them, so lots of those refs will cover non-commit objects.
> 
>   Benchmark 1: git-receive-pack (rev = HEAD~)
>     Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
>     Range (min … max):   62.191 s … 63.010 s    3 runs
> 
>   Benchmark 2: git-receive-pack (rev = HEAD)
>     Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
>     Range (min … max):   38.176 s … 38.551 s    3 runs
> 
>   Summary
>     git-receive-pack . </tmp/input (rev = HEAD) ran
>       1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
> 
> This leads to a sizeable speedup as we now skip reading and parsing
> non-commit objects. Before this change we spent around 40% of the time
> in `assign_shallow_commits_to_refs()`, after the change we only spend
> around 1.2% of the time in there. Almost the entire remainder of the
> time is spent in git-rev-list(1) to perform the connectivity checks.

Nice!

> Despite the speedup though, this also leads to a massive reduction in
> allocations. Before:
> 
>   HEAP SUMMARY:
>       in use at exit: 352,480,441 bytes in 97,185 blocks
>     total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
> 
> And after:
> 
>   HEAP SUMMARY:
>       in use at exit: 17,524,978 bytes in 22,393 blocks
>     total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
> 
> Note that when all references refer to commits performance stays roughly
> the same, as expected. The following benchmark was executed with 600k
> commits:
> 
>   Benchmark 1: git-receive-pack (rev = HEAD~)
>     Time (mean ± σ):      9.101 s ±  0.006 s    [User: 8.800 s, System: 0.520 s]
>     Range (min … max):    9.095 s …  9.106 s    3 runs
> 
>   Benchmark 2: git-receive-pack (rev = HEAD)
>     Time (mean ± σ):      9.128 s ±  0.094 s    [User: 8.820 s, System: 0.522 s]
>     Range (min … max):    9.019 s …  9.188 s    3 runs
> 
>   Summary
>     git-receive-pack (rev = HEAD~) ran
>       1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
> 
> This will be improved in the next commit.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  commit.c | 32 +++++++++++++++++++++++++++-----
>  object.c | 23 ++++++++++++++++++-----
>  object.h |  5 +++++
>  3 files changed, 50 insertions(+), 10 deletions(-)
> 
> diff --git a/commit.c b/commit.c
> index 9bb471d217..b7c4ec2eb5 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -43,13 +43,35 @@ const char *commit_type = "commit";
>  struct commit *lookup_commit_reference_gently(struct repository *r,
>  		const struct object_id *oid, int quiet)
>  {
> -	struct object *obj = deref_tag(r,
> -				       parse_object(r, oid),
> -				       NULL, 0);

Here we drop the use of `deref_tag()` which required us to
unconfitionally parse the object.

> +	const struct object_id *maybe_peeled;
> +	struct object_id peeled_oid;
> +	struct object *object;
> +	enum object_type type;
>  
> -	if (!obj)
> +	switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {

Here we peel the object and also get its type.

> +	case PEEL_NON_TAG:
> +		maybe_peeled = oid;
> +		break;
> +	case PEEL_PEELED:
> +		maybe_peeled = &peeled_oid;
> +		break;
> +	default:
>  		return NULL;
> -	return object_as_type(obj, OBJ_COMMIT, quiet);
> +	}
> +
> +	if (type != OBJ_COMMIT) {
> +		if (!quiet)
> +			error(_("object %s is a %s, not a %s"),
> +			      oid_to_hex(oid), type_name(type),
> +			      type_name(OBJ_COMMIT));
> +		return NULL;
> +	}
> +
> +	object = parse_object(r, maybe_peeled);
> +	if (!object)
> +		return NULL;

Now we only parse the object if it is a commit.

> +
> +	return object_as_type(object, OBJ_COMMIT, quiet);
>  }
>  
>  struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
> diff --git a/object.c b/object.c
> index 4669b8d65e..99b6df3780 100644
> --- a/object.c
> +++ b/object.c
> @@ -207,10 +207,11 @@ struct object *lookup_object_by_type(struct repository *r,
>  	}
>  }
>  
> -enum peel_status peel_object(struct repository *r,
> -			     const struct object_id *name,
> -			     struct object_id *oid,
> -			     unsigned flags)
> +enum peel_status peel_object_ext(struct repository *r,
> +				 const struct object_id *name,
> +				 struct object_id *oid,
> +				 unsigned flags,
> +				 enum object_type *typep)
>  {
>  	struct object *o = lookup_unknown_object(r, name);
>  
> @@ -220,8 +221,10 @@ enum peel_status peel_object(struct repository *r,
>  			return PEEL_INVALID;
>  	}
>  
> -	if (o->type != OBJ_TAG)
> +	if (o->type != OBJ_TAG) {
> +		*typep = o->type;
>  		return PEEL_NON_TAG;
> +	}
>  
>  	while (o && o->type == OBJ_TAG) {
>  		o = parse_object(r, &o->oid);
> @@ -241,9 +244,19 @@ enum peel_status peel_object(struct repository *r,
>  		return PEEL_INVALID;
>  
>  	oidcpy(oid, &o->oid);
> +	*typep = o->type;

Here we are modifying the existing `peel_object()` to also provide
object type info back to the caller. With this, we can avoid the
unconditional object parsing in `lookup_commit_reference_gently()`.
Looks good.

-Justin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
  2026-02-18 18:26   ` Justin Tobler
@ 2026-02-19 17:35     ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2026-02-19 17:35 UTC (permalink / raw)
  To: Patrick Steinhardt, Justin Tobler; +Cc: git, Matt Smiley

Justin Tobler <jltobler@gmail.com> writes:

> On 26/02/16 04:38PM, Patrick Steinhardt wrote:
>> The function `lookup_commit_reference_gently()` can be used to look up a
>> committish by object ID. As such, the function knows to peel for example
>> tag objects so that we eventually end up with the commit.
>> 
>> The function is used quite a lot throughout our tree. One such user is
>> "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
>> function is to figure out whether a shallow push is missing any objects
>> that are required to satisfy the ref updates, and if so, which of the
>> ref updates is missing objects.
>> 
>> This is done by painting the tree with `UNINTERESTING`. We start
>> painting by calling `refs_for_each_ref()` so that we can mark all
>> existing referenced objects as the boundary of objects that we already
>> have, and which are supposed to be fully connected. The reference tips
>> are then parsed via `lookup_commit_reference_gently()`, and the commit
>> commit is then marked as uninteresting.
>
> s/commit commit/commit/

Will locally amend (no need to reroll only to fix this).

Thanks for carefully reading.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
  2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
  2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
@ 2026-02-16 15:38 ` Patrick Steinhardt
  2026-02-18 19:23   ` Justin Tobler
  2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
  2 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2026-02-16 15:38 UTC (permalink / raw)
  To: git; +Cc: Matt Smiley

In the next commit we will start to parse more commits via the
commit-graph. This change will lead to a segfault though because we try
to access the tree of a commit via `repo_get_commit_tree()`, but:

  - The commit has been parsed via the commit-graph, and thus its
    `maybe_tree` field is not yet populated.

  - We cannot use the commit-graph to populate the commit's tree because
    we're in the process of writing the commit-graph.

The consequence is that we'll get a `NULL` pointer for the tree in
`write_graph_chunk_data()`.

In theory we are already mindful of this situation, as we explicitly use
`repo_parse_commit_no_graph()` to parse the commit without the help of
the commit-graph. But that doesn't do the trick as the commit is already
marked as parsed, so the function will not re-populate it. And as the
commit-graph has been closed, neither will `get_commit_tree_oid()` be
able to load the tree for us.

It seems like this issue can only be hit under artificial circumstances:
the error was hit via `git_test_write_commit_graph_or_die()`, which is
run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:

  $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
      --test-args=-ix -i
  ...
  ++ git -c commit.verbose=true commit --amend
  hint: Waiting for your editor to close the file...
  ./test-lib.sh: line 1012: 55895 Segmentation fault         (core dumped) git -c commit.verbose=true commit --amend

To the best of my knowledge, this is the only case where we end up
writing a commit-graph in the same process that might have already
consulted the commit-graph to look up arbitrary objects. But regardless
of that, this feels like a bigger accident that is just waiting to
happen.

Make the code more robust by extending `repo_parse_commit_no_graph()` to
unparse a commit first in case we detect it's coming from a graph. This
ensures that we will re-read the object without it, and thus we will
populate `maybe_tree` properly.

This fix shouldn't have any performance consequences: the function is
only ever called in the "commit-graph.c" code, and we'll only re-parse
the commit at most once.

Add an exclusion to our Coccinelle rules so that it doesn't complain
about us accessing `maybe_tree` directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.h                        | 14 ++++++++++++--
 contrib/coccinelle/commit.cocci |  2 +-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/commit.h b/commit.h
index 1635de418b..f2f39e1a89 100644
--- a/commit.h
+++ b/commit.h
@@ -103,16 +103,26 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
 	return repo_parse_commit_gently(r, item, 0);
 }

+void unparse_commit(struct repository *r, const struct object_id *oid);
+
 static inline int repo_parse_commit_no_graph(struct repository *r,
 					     struct commit *commit)
 {
+	/*
+	 * When the commit has been parsed but its tree wasn't populated then
+	 * this is an indicator that it has been parsed via the commit-graph.
+	 * We cannot read the tree via the commit-graph, as we're explicitly
+	 * told not to use it. We thus have to first un-parse the object so
+	 * that we can re-parse it without the graph.
+	 */
+	if (commit->object.parsed && !commit->maybe_tree)
+		unparse_commit(r, &commit->object.oid);
+
 	return repo_parse_commit_internal(r, commit, 0, 0);
 }

 void parse_commit_or_die(struct commit *item);

-void unparse_commit(struct repository *r, const struct object_id *oid);
-
 struct buffer_slab;
 struct buffer_slab *allocate_commit_buffer_slab(void);
 void free_commit_buffer_slab(struct buffer_slab *bs);
diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index c5284604c5..42725161e9 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -26,7 +26,7 @@ expression s;
 // repo_get_commit_tree() on the LHS.
 @@
 identifier f != { repo_get_commit_tree, get_commit_tree_in_graph_one,
-		  load_tree_for_commit, set_commit_tree };
+		  load_tree_for_commit, set_commit_tree, repo_parse_commit_no_graph };
 expression c;
 @@
   f(...) {<...

-- 
2.53.0.352.gd1286b26eb.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
  2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
@ 2026-02-18 19:23   ` Justin Tobler
  2026-02-19  7:18     ` Patrick Steinhardt
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Tobler @ 2026-02-18 19:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Matt Smiley

On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> In the next commit we will start to parse more commits via the
> commit-graph. This change will lead to a segfault though because we try
> to access the tree of a commit via `repo_get_commit_tree()`, but:
> 
>   - The commit has been parsed via the commit-graph, and thus its
>     `maybe_tree` field is not yet populated.
> 
>   - We cannot use the commit-graph to populate the commit's tree because
>     we're in the process of writing the commit-graph.
> 
> The consequence is that we'll get a `NULL` pointer for the tree in
> `write_graph_chunk_data()`.

IIUC, when a commit has been parsed via the commit graph, if the commit
graph is closed, there is no longer a way for commit object tree to be
read. This results `repo_get_commit_tree()` always returning NULL in
such scenarios.

> In theory we are already mindful of this situation, as we explicitly use
> `repo_parse_commit_no_graph()` to parse the commit without the help of
> the commit-graph. But that doesn't do the trick as the commit is already
> marked as parsed, so the function will not re-populate it. And as the
> commit-graph has been closed, neither will `get_commit_tree_oid()` be
> able to load the tree for us.

And `repo_parse_commit_no_graph()` doesn't work because the commit has
already been marked as parsed.

> It seems like this issue can only be hit under artificial circumstances:
> the error was hit via `git_test_write_commit_graph_or_die()`, which is
> run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
> 
>   $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
>       --test-args=-ix -i
>   ...
>   ++ git -c commit.verbose=true commit --amend
>   hint: Waiting for your editor to close the file...
>   ./test-lib.sh: line 1012: 55895 Segmentation fault         (core dumped) git -c commit.verbose=true commit --amend
> 
> To the best of my knowledge, this is the only case where we end up
> writing a commit-graph in the same process that might have already
> consulted the commit-graph to look up arbitrary objects. But regardless
> of that, this feels like a bigger accident that is just waiting to
> happen.

So I assume we end up closing the commit-graph when writing a new one.
If we need to read the trees of commits parsed via commit-graph, this
will trigger a segfault since the commit tree will always be NULL.

> Make the code more robust by extending `repo_parse_commit_no_graph()` to
> unparse a commit first in case we detect it's coming from a graph. This
> ensures that we will re-read the object without it, and thus we will
> populate `maybe_tree` properly.

Hmm, I wonder if this is conceptually the correct place to address this
problem. Naively, I would expect `repo_get_commit_tree()` to always be
capable of providing the commit tree. I guess the problem though is that
this would require `repo_get_commit_tree()` to detect this scenario and
reparse the object itself. Maybe we could at least have
`repo_get_commit_tree()` BUG() in this scenario though?

> This fix shouldn't have any performance consequences: the function is
> only ever called in the "commit-graph.c" code, and we'll only re-parse
> the commit at most once.
> 
> Add an exclusion to our Coccinelle rules so that it doesn't complain
> about us accessing `maybe_tree` directly.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  commit.h                        | 14 ++++++++++++--
>  contrib/coccinelle/commit.cocci |  2 +-
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/commit.h b/commit.h
> index 1635de418b..f2f39e1a89 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -103,16 +103,26 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
>  	return repo_parse_commit_gently(r, item, 0);
>  }
>  
> +void unparse_commit(struct repository *r, const struct object_id *oid);
> +
>  static inline int repo_parse_commit_no_graph(struct repository *r,
>  					     struct commit *commit)
>  {
> +	/*
> +	 * When the commit has been parsed but its tree wasn't populated then
> +	 * this is an indicator that it has been parsed via the commit-graph.
> +	 * We cannot read the tree via the commit-graph, as we're explicitly
> +	 * told not to use it. We thus have to first un-parse the object so
> +	 * that we can re-parse it without the graph.
> +	 */
> +	if (commit->object.parsed && !commit->maybe_tree)
> +		unparse_commit(r, &commit->object.oid);

This unparses commits that were read via commit-graph so they can be
reparsed normally. Looks good.

-Justin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
  2026-02-18 19:23   ` Justin Tobler
@ 2026-02-19  7:18     ` Patrick Steinhardt
  0 siblings, 0 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2026-02-19  7:18 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, Matt Smiley

On Wed, Feb 18, 2026 at 01:23:47PM -0600, Justin Tobler wrote:
> On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> > Make the code more robust by extending `repo_parse_commit_no_graph()` to
> > unparse a commit first in case we detect it's coming from a graph. This
> > ensures that we will re-read the object without it, and thus we will
> > populate `maybe_tree` properly.
> 
> Hmm, I wonder if this is conceptually the correct place to address this
> problem. Naively, I would expect `repo_get_commit_tree()` to always be
> capable of providing the commit tree. I guess the problem though is that
> this would require `repo_get_commit_tree()` to detect this scenario and
> reparse the object itself. Maybe we could at least have
> `repo_get_commit_tree()` BUG() in this scenario though?

I was wondering myself whether we should make `repo_get_commit_tree()`
more robust instead. But I shy away from just calling BUG() because a
bunch of callers are prepared to handle a `NULL` return value, even
though many others aren't.

The alternative would be parse the commit via the ODB in case we see
that the commit is not from a graph. We cannot unparse the object here
though as that would potentially invalidate for example the list of
parents. So we'd have to refactor the code base a bit so that we can
parse the tree object.

But there's also another problem: we cannot expect the commit or tree to
even exist at all due to partial clones, and the question is what we
should do in this case. Should we fetch the missing objects in case
we're asked to get the tree? Maybe, but I guess it depends on the
context of the caller.

Overall this becomes complex pretty fast, unfortunately. I've attached a
naive implementation of the above, but as mentioned it fails in some
tests related to partial clones.

Patrick

--- >8 ---

diff --git a/archive.c b/archive.c
index fcd474c682..974f318f6d 100644
--- a/archive.c
+++ b/archive.c
@@ -49,7 +49,7 @@ void init_archivers(void)
 	init_zip_archiver();
 }
 
-static void format_subst(const struct commit *commit,
+static void format_subst(struct commit *commit,
 			 const char *src, size_t len,
 			 struct strbuf *buf, struct pretty_print_context *ctx)
 {
@@ -90,7 +90,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
 				    unsigned long *sizep)
 {
 	void *buffer;
-	const struct commit *commit = args->convert ? args->commit : NULL;
+	struct commit *commit = args->convert ? args->commit : NULL;
 	struct checkout_metadata meta;
 
 	init_checkout_metadata(&meta, args->refname,
@@ -489,7 +489,7 @@ static void parse_treeish_arg(const char **argv,
 	const struct object_id *commit_oid;
 	time_t archive_time;
 	struct tree *tree;
-	const struct commit *commit;
+	struct commit *commit;
 	struct object_id oid;
 	char *ref = NULL;
 
diff --git a/archive.h b/archive.h
index bbe65ba0f9..0d0f064ffa 100644
--- a/archive.h
+++ b/archive.h
@@ -15,7 +15,7 @@ struct archiver_args {
 	size_t baselen;
 	struct tree *tree;
 	const struct object_id *commit_oid;
-	const struct commit *commit;
+	struct commit *commit;
 	const char *mtime_option;
 	timestamp_t time;
 	struct pathspec pathspec;
diff --git a/commit-graph.c b/commit-graph.c
index 1fcceb3920..45a9b486a7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2774,7 +2774,7 @@ static int verify_one_commit_graph(struct commit_graph *g,
 
 		graph_commit = lookup_commit(r, &cur_oid);
 		odb_commit = (struct commit *)create_object(r, &cur_oid, alloc_commit_node(r));
-		if (repo_parse_commit_internal(r, odb_commit, 0, 0)) {
+		if (repo_parse_commit_no_graph(r, odb_commit)) {
 			graph_report(_("failed to parse commit %s from object database for commit-graph"),
 				     oid_to_hex(&cur_oid));
 			continue;
diff --git a/commit.c b/commit.c
index b7c4ec2eb5..03a2ddfb78 100644
--- a/commit.c
+++ b/commit.c
@@ -435,24 +435,6 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
 	c->maybe_tree = t;
 }
 
-struct tree *repo_get_commit_tree(struct repository *r,
-				  const struct commit *commit)
-{
-	if (commit->maybe_tree || !commit->object.parsed)
-		return commit->maybe_tree;
-
-	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
-		return get_commit_tree_in_graph(r, commit);
-
-	return NULL;
-}
-
-struct object_id *get_commit_tree_oid(const struct commit *commit)
-{
-	struct tree *tree = repo_get_commit_tree(the_repository, commit);
-	return tree ? &tree->object.oid : NULL;
-}
-
 void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
 {
 	set_commit_tree(c, NULL);
@@ -483,6 +465,34 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 	return ret;
 }
 
+static int parse_commit_tree(struct repository *r, struct commit *item,
+			     const char *buffer, unsigned long size,
+			     const char **out)
+{
+	const int tree_entry_len = the_hash_algo->hexsz + 5;
+	const char *tail = buffer + size;
+	struct object_id tree_oid;
+	struct tree *tree;
+
+	if (tail <= buffer + tree_entry_len + 1 || memcmp(buffer, "tree ", 5) ||
+			buffer[tree_entry_len] != '\n')
+		return error("bogus commit object %s", oid_to_hex(&item->object.oid));
+	if (get_oid_hex(buffer + 5, &tree_oid) < 0)
+		return error("bad tree pointer in commit %s",
+			     oid_to_hex(&item->object.oid));
+
+	tree = lookup_tree(r, &tree_oid);
+	if (!tree)
+		return error("bad tree pointer %s in commit %s",
+			     oid_to_hex(&tree_oid),
+			     oid_to_hex(&item->object.oid));
+	set_commit_tree(item, tree);
+
+	if (out)
+		*out = buffer + tree_entry_len + 1;
+	return 0;
+}
+
 int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph)
 {
 	const char *tail = buffer;
@@ -490,9 +500,8 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	struct object_id parent;
 	struct commit_list **pptr;
 	struct commit_graft *graft;
-	const int tree_entry_len = the_hash_algo->hexsz + 5;
 	const int parent_entry_len = the_hash_algo->hexsz + 7;
-	struct tree *tree;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
@@ -506,19 +515,11 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	item->parents = NULL;
 
 	tail += size;
-	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
-			bufptr[tree_entry_len] != '\n')
-		return error("bogus commit object %s", oid_to_hex(&item->object.oid));
-	if (get_oid_hex(bufptr + 5, &parent) < 0)
-		return error("bad tree pointer in commit %s",
-			     oid_to_hex(&item->object.oid));
-	tree = lookup_tree(r, &parent);
-	if (!tree)
-		return error("bad tree pointer %s in commit %s",
-			     oid_to_hex(&parent),
-			     oid_to_hex(&item->object.oid));
-	set_commit_tree(item, tree);
-	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
+
+	ret = parse_commit_tree(r, item, buffer, size, &bufptr);
+	if (ret < 0)
+		return ret;
+
 	pptr = &item->parents;
 
 	graft = lookup_commit_graft(r, &item->object.oid);
@@ -567,18 +568,16 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	return 0;
 }
 
-int repo_parse_commit_internal(struct repository *r,
-			       struct commit *item,
-			       int quiet_on_missing,
-			       int use_commit_graph)
+static int repo_read_commit_buffer(struct repository *r,
+				   const struct object_id *oid,
+				   void **buffer, unsigned long *size,
+				   int quiet_on_missing)
 {
 	enum object_type type;
-	void *buffer;
-	unsigned long size;
 	struct object_info oi = {
 		.typep = &type,
-		.sizep = &size,
-		.contentp = &buffer,
+		.sizep = size,
+		.contentp = buffer,
 	};
 	/*
 	 * Git does not support partial clones that exclude commits, so set
@@ -586,6 +585,51 @@ int repo_parse_commit_internal(struct repository *r,
 	 */
 	int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
 		OBJECT_INFO_DIE_IF_CORRUPT;
+
+	if (odb_read_object_info_extended(r->objects, oid, &oi, flags) < 0)
+		return quiet_on_missing ? -1 :
+			error("Could not read %s", oid_to_hex(oid));
+	if (type != OBJ_COMMIT) {
+		free(buffer);
+		return error("Object %s not a commit", oid_to_hex(oid));
+	}
+
+	return 0;
+}
+
+struct tree *repo_get_commit_tree(struct repository *r,
+				  struct commit *commit)
+{
+	unsigned long size;
+	void *buffer;
+
+	if (commit->maybe_tree)
+		return commit->maybe_tree;
+
+	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
+		return get_commit_tree_in_graph(r, commit);
+
+	if (repo_read_commit_buffer(r, &commit->object.oid, &buffer, &size, 0) < 0 ||
+	    parse_commit_tree(r, commit, buffer, size, NULL) < 0)
+		die(NULL);
+
+	free(buffer);
+	return commit->maybe_tree;
+}
+
+struct object_id *get_commit_tree_oid(struct commit *commit)
+{
+	struct tree *tree = repo_get_commit_tree(the_repository, commit);
+	return tree ? &tree->object.oid : NULL;
+}
+
+static int repo_parse_commit_internal(struct repository *r,
+				      struct commit *item,
+				      int quiet_on_missing,
+				      int use_commit_graph)
+{
+	unsigned long size;
+	void *buffer;
 	int ret;
 
 	if (!item)
@@ -608,16 +652,10 @@ int repo_parse_commit_internal(struct repository *r,
 		return 0;
 	}
 
-	if (odb_read_object_info_extended(r->objects, &item->object.oid,
-					  &oi, flags) < 0)
-		return quiet_on_missing ? -1 :
-			error("Could not read %s",
-			     oid_to_hex(&item->object.oid));
-	if (type != OBJ_COMMIT) {
-		free(buffer);
-		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
-	}
+	ret = repo_read_commit_buffer(r, &item->object.oid, &buffer, &size,
+				      quiet_on_missing);
+	if (ret < 0)
+		return ret;
 
 	ret = parse_commit_buffer(r, item, buffer, size, 0);
 	if (save_commit_buffer && !ret &&
@@ -635,6 +673,12 @@ int repo_parse_commit_gently(struct repository *r,
 	return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
 }
 
+int repo_parse_commit_no_graph(struct repository *r,
+			       struct commit *commit)
+{
+	return repo_parse_commit_internal(r, commit, 0, 0);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (repo_parse_commit(the_repository, item))
diff --git a/commit.h b/commit.h
index 1635de418b..f3c1932e74 100644
--- a/commit.h
+++ b/commit.h
@@ -93,8 +93,6 @@ struct commit *lookup_commit_reference_by_name_gently(const char *name,
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph);
-int repo_parse_commit_internal(struct repository *r, struct commit *item,
-			       int quiet_on_missing, int use_commit_graph);
 int repo_parse_commit_gently(struct repository *r,
 			     struct commit *item,
 			     int quiet_on_missing);
@@ -103,11 +101,8 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
 	return repo_parse_commit_gently(r, item, 0);
 }
 
-static inline int repo_parse_commit_no_graph(struct repository *r,
-					     struct commit *commit)
-{
-	return repo_parse_commit_internal(r, commit, 0, 0);
-}
+int repo_parse_commit_no_graph(struct repository *r,
+			       struct commit *commit);
 
 void parse_commit_or_die(struct commit *item);
 
@@ -153,8 +148,8 @@ void repo_unuse_commit_buffer(struct repository *r,
  */
 void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
 
-struct tree *repo_get_commit_tree(struct repository *, const struct commit *);
-struct object_id *get_commit_tree_oid(const struct commit *);
+struct tree *repo_get_commit_tree(struct repository *, struct commit *);
+struct object_id *get_commit_tree_oid(struct commit *);
 
 /*
  * Release memory related to a commit, including the parent list and
diff --git a/pretty.c b/pretty.c
index e0646bbc5d..2d8261e772 100644
--- a/pretty.c
+++ b/pretty.c
@@ -889,7 +889,7 @@ enum trunc_type {
 
 struct format_commit_context {
 	struct repository *repository;
-	const struct commit *commit;
+	struct commit *commit;
 	const struct pretty_print_context *pretty_ctx;
 	unsigned commit_header_parsed:1;
 	unsigned commit_message_parsed:1;
@@ -1440,7 +1440,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 				void *context)
 {
 	struct format_commit_context *c = context;
-	const struct commit *commit = c->commit;
+	struct commit *commit = c->commit;
 	const char *msg = c->message;
 	struct commit_list *p;
 	const char *arg, *eol;
@@ -1984,7 +1984,7 @@ void userformat_find_requirements(const char *fmt, struct userformat_want *w)
 }
 
 void repo_format_commit_message(struct repository *r,
-				const struct commit *commit,
+				struct commit *commit,
 				const char *format, struct strbuf *sb,
 				const struct pretty_print_context *pretty_ctx)
 {
@@ -2282,7 +2282,7 @@ void pp_remainder(struct pretty_print_context *pp,
 }
 
 void pretty_print_commit(struct pretty_print_context *pp,
-			 const struct commit *commit,
+			 struct commit *commit,
 			 struct strbuf *sb)
 {
 	unsigned long beginning_of_body;
@@ -2363,7 +2363,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	repo_unuse_commit_buffer(the_repository, commit, reencoded);
 }
 
-void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
+void pp_commit_easy(enum cmit_fmt fmt, struct commit *commit,
 		    struct strbuf *sb)
 {
 	struct pretty_print_context pp = {0};
diff --git a/pretty.h b/pretty.h
index fac699033e..8b5b448988 100644
--- a/pretty.h
+++ b/pretty.h
@@ -82,8 +82,8 @@ void userformat_find_requirements(const char *fmt, struct userformat_want *w);
  * Shortcut for invoking pretty_print_commit if we do not have any context.
  * Context would be set empty except "fmt".
  */
-void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
-			struct strbuf *sb);
+void pp_commit_easy(enum cmit_fmt fmt, struct commit *commit,
+		    struct strbuf *sb);
 
 /*
  * Get information about user and date from "line", format it and
@@ -117,9 +117,9 @@ void pp_remainder(struct pretty_print_context *pp, const char **msg_p,
  * Please use this function for custom formats.
  */
 void repo_format_commit_message(struct repository *r,
-			const struct commit *commit,
-			const char *format, struct strbuf *sb,
-			const struct pretty_print_context *context);
+				struct commit *commit,
+				const char *format, struct strbuf *sb,
+				const struct pretty_print_context *context);
 
 /*
  * Parse given arguments from "arg", check it for correctness and
@@ -133,8 +133,8 @@ void get_commit_format(const char *arg, struct rev_info *);
  * Please use this function if you have a context (candidate for "pp").
  */
 void pretty_print_commit(struct pretty_print_context *pp,
-			const struct commit *commit,
-			struct strbuf *sb);
+			 struct commit *commit,
+			 struct strbuf *sb);
 
 /*
  * Change line breaks in "msg" to "line_separator" and put it into "sb".

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()`
  2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
  2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
  2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
@ 2026-02-16 15:38 ` Patrick Steinhardt
  2026-02-18 19:34   ` Justin Tobler
  2 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2026-02-16 15:38 UTC (permalink / raw)
  To: git; +Cc: Matt Smiley

In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.

But while this case is now faster, we still have the issue that
accepting pushes with lots of "normal" refs that point to commits are
still slow. This is mostly because we look up the commits via the object
database, and that is rather costly.

Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is way more
cost efficient.

This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single objects from a shallow clone
into a repository with 600,000 references that all point to commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.179 s ±  0.031 s    [User: 8.858 s, System: 0.528 s]
    Range (min … max):    9.154 s …  9.213 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      2.337 s ±  0.032 s    [User: 2.331 s, System: 0.234 s]
    Range (min … max):    2.308 s …  2.371 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)

Also, this again leads to a significant reduction in memory allocations.
Before this change:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

And after this change:

  HEAP SUMMARY:
      in use at exit: 11,534,036 bytes in 12,406 blocks
    total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index b7c4ec2eb5..014f74822c 100644
--- a/commit.c
+++ b/commit.c
@@ -45,7 +45,7 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
 {
 	const struct object_id *maybe_peeled;
 	struct object_id peeled_oid;
-	struct object *object;
+	struct commit *commit;
 	enum object_type type;
 
 	switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
@@ -67,11 +67,11 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
 		return NULL;
 	}
 
-	object = parse_object(r, maybe_peeled);
-	if (!object)
+	commit = lookup_commit(r, maybe_peeled);
+	if (!commit || repo_parse_commit_gently(r, commit, quiet) < 0)
 		return NULL;
 
-	return object_as_type(object, OBJ_COMMIT, quiet);
+	return commit;
 }
 
 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)

-- 
2.53.0.352.gd1286b26eb.dirty


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()`
  2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
@ 2026-02-18 19:34   ` Justin Tobler
  0 siblings, 0 replies; 9+ messages in thread
From: Justin Tobler @ 2026-02-18 19:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Matt Smiley

On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> In the preceding commit we refactored `lookup_commit_reference_gently()`
> so that it doesn't parse non-commit objects anymore. This has led to a
> speedup when git-receive-pack(1) accepts a shallow push into a repo
> with lots of refs that point to blobs or trees.
> 
> But while this case is now faster, we still have the issue that
> accepting pushes with lots of "normal" refs that point to commits are
> still slow. This is mostly because we look up the commits via the object
> database, and that is rather costly.
> 
> Adapt the code to use `repo_parse_commit_gently()` instead of
> `parse_object()` to parse the resulting commit object. This function
> knows to use the commit-graph to fill in the object, which is way more
> cost efficient.

Makes sense.

> This leads to another significant speedup when accepting shallow pushes.
> The following benchmark pushes a single objects from a shallow clone
> into a repository with 600,000 references that all point to commits:
> 
>   Benchmark 1: git-receive-pack (rev = HEAD~)
>     Time (mean ± σ):      9.179 s ±  0.031 s    [User: 8.858 s, System: 0.528 s]
>     Range (min … max):    9.154 s …  9.213 s    3 runs
> 
>   Benchmark 2: git-receive-pack (rev = HEAD)
>     Time (mean ± σ):      2.337 s ±  0.032 s    [User: 2.331 s, System: 0.234 s]
>     Range (min … max):    2.308 s …  2.371 s    3 runs
> 
>   Summary
>     git-receive-pack . </tmp/input (rev = HEAD) ran
>       3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)
> 
> Also, this again leads to a significant reduction in memory allocations.
> Before this change:
> 
>   HEAP SUMMARY:
>       in use at exit: 17,524,978 bytes in 22,393 blocks
>     total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
> 
> And after this change:
> 
>   HEAP SUMMARY:
>       in use at exit: 11,534,036 bytes in 12,406 blocks
>     total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated

Very nice :)

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  commit.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/commit.c b/commit.c
> index b7c4ec2eb5..014f74822c 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -45,7 +45,7 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
>  {
>  	const struct object_id *maybe_peeled;
>  	struct object_id peeled_oid;
> -	struct object *object;
> +	struct commit *commit;
>  	enum object_type type;
>  
>  	switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
> @@ -67,11 +67,11 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
>  		return NULL;
>  	}
>  
> -	object = parse_object(r, maybe_peeled);
> -	if (!object)
> +	commit = lookup_commit(r, maybe_peeled);
> +	if (!commit || repo_parse_commit_gently(r, commit, quiet) < 0)
>  		return NULL;

Now it is possible to use the commit-graph to parse the commit.

>  
> -	return object_as_type(object, OBJ_COMMIT, quiet);
> +	return commit;
>  }
>  
>  struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)

Thanks Patrick. This looks good to me.

-Justin

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-19 17:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26   ` Justin Tobler
2026-02-19 17:35     ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23   ` Justin Tobler
2026-02-19  7:18     ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34   ` Justin Tobler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox