* [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes
2025-12-05 8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
@ 2025-12-05 8:19 ` Patrick Steinhardt
2025-12-10 19:31 ` Justin Tobler
2025-12-05 8:19 ` [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-05 8:19 UTC (permalink / raw)
To: git
When repacking a repository with promisor remotes git-repack(1) knows to
pass "--exclude-promisor-objects" to git-pack-objects(1). This option
ensures that the new pack will not contain any promised object that do
not yet exist locally.
This command line option is incompatible with "--stdin-packs": the
latter option enables the rev-walk-based machinery to figure out which
objects to add to the pack, whereas the former tells git-pack-objects(1)
to merge all packs passed via stdin into one large pack. As we do not
know to filter those packs via the passed-in revisions it is clear that
at the current point in time nothing sensible comes out of combining
these two options.
But there is one case where git-repack(1) decides to pass both options:
when performing a geometric repack we always pass "--stdin-packs" to
identify the packs that should be merged. So if one performs a geometric
repack in a partial clone we'll end up with both options, and that
causes the repack to fail.
Fix this issue by never passing "--exclude-promisor-objects" when we
have a geometric split factor. We don't need the option anyway when
doing a geometric repack as we will only ever pack loose objects or
merge multiple packs. And neither of those cases can yield a promisor
object.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/repack.c | 5 +++--
t/t7703-repack-geometric.sh | 26 ++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index d9012141f6..4621eed3e6 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -294,9 +294,10 @@ int cmd_repack(int argc,
strvec_push(&cmd.args, "--all");
strvec_push(&cmd.args, "--reflog");
strvec_push(&cmd.args, "--indexed-objects");
+
+ if (repo_has_promisor_remote(repo))
+ strvec_push(&cmd.args, "--exclude-promisor-objects");
}
- if (repo_has_promisor_remote(repo))
- strvec_push(&cmd.args, "--exclude-promisor-objects");
if (!write_midx) {
if (write_bitmaps > 0)
strvec_push(&cmd.args, "--write-bitmap-index");
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9fc1626fbf..6d2c712bff 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -445,4 +445,30 @@ test_expect_success '--geometric -l disables writing bitmaps with non-local pack
test_path_is_file member/.git/objects/pack/multi-pack-index-*.bitmap
'
+test_expect_success '--geometric works with promisor packs' '
+ test_when_finished "rm -fr remote local" &&
+
+ git init remote &&
+ test_commit -C remote first file first &&
+ test_commit -C remote second file second &&
+ git -C remote config set uploadpack.allowfilter 1 &&
+ git -C remote config set uploadpack.allowanysha1inwant 1 &&
+ git -C remote repack -Ad &&
+
+ git clone --filter=blob:none file://"$(pwd)"/remote local &&
+ git -C local rev-list --objects --missing=print HEAD >missing-objects &&
+ test_grep "^?" missing-objects &&
+
+ # Assert that promisor packs are left alone and that we still manage to
+ # create new geometric packs.
+ ls local/.git/objects/pack/*.promisor >promisors-before &&
+ ls local/.git/objects/pack/*.pack >packs-before &&
+ test_commit -C local change &&
+ git -C local repack --geometric=2 &&
+ ls local/.git/objects/pack/*.promisor >promisors-after &&
+ ls local/.git/objects/pack/*.pack >packs-after &&
+ ! cmp packs-before packs-after &&
+ test_cmp promisors-before promisors-after
+'
+
test_done
--
2.52.0.239.gd5f0c6e74e.dirty
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes
2025-12-05 8:19 ` [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes Patrick Steinhardt
@ 2025-12-10 19:31 ` Justin Tobler
2025-12-11 5:46 ` Patrick Steinhardt
0 siblings, 1 reply; 17+ messages in thread
From: Justin Tobler @ 2025-12-10 19:31 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/12/05 09:19AM, Patrick Steinhardt wrote:
> When repacking a repository with promisor remotes git-repack(1) knows to
> pass "--exclude-promisor-objects" to git-pack-objects(1). This option
> ensures that the new pack will not contain any promised object that do
> not yet exist locally.
>
> This command line option is incompatible with "--stdin-packs": the
> latter option enables the rev-walk-based machinery to figure out which
> objects to add to the pack, whereas the former tells git-pack-objects(1)
> to merge all packs passed via stdin into one large pack. As we do not
> know to filter those packs via the passed-in revisions it is clear that
> at the current point in time nothing sensible comes out of combining
> these two options.
Is the latter/former part here backwards? I find it a bit confusing to
read. As I understand it, --stdin-packs expects the packfiles provided
as input to dictate the source of objects when repacking. With
--exclude-promisor-objects, we walk the object graph normally, but
exclude promisor objects. Thus combining these two options would create
a conflict regarding which objects are included.
> But there is one case where git-repack(1) decides to pass both options:
> when performing a geometric repack we always pass "--stdin-packs" to
> identify the packs that should be merged. So if one performs a geometric
> repack in a partial clone we'll end up with both options, and that
> causes the repack to fail.
>
> Fix this issue by never passing "--exclude-promisor-objects" when we
> have a geometric split factor. We don't need the option anyway when
> doing a geometric repack as we will only ever pack loose objects or
> merge multiple packs. And neither of those cases can yield a promisor
> object.
I'm not sure I fully understand why --exclude-promisor-objects would not
be needed for geometric repacks. To clarify, do geometric repacks
already exclude promisor packfiles when merging? If so, then this change
makes sense.
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/repack.c | 5 +++--
> t/t7703-repack-geometric.sh | 26 ++++++++++++++++++++++++++
> 2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index d9012141f6..4621eed3e6 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -294,9 +294,10 @@ int cmd_repack(int argc,
> strvec_push(&cmd.args, "--all");
> strvec_push(&cmd.args, "--reflog");
> strvec_push(&cmd.args, "--indexed-objects");
> +
> + if (repo_has_promisor_remote(repo))
> + strvec_push(&cmd.args, "--exclude-promisor-objects");
> }
> - if (repo_has_promisor_remote(repo))
> - strvec_push(&cmd.args, "--exclude-promisor-objects");
Ok, now the --exclude-promisor-objects flag is only added when there is
a promisor remote and geometric repacking is not used.
> if (!write_midx) {
> if (write_bitmaps > 0)
> strvec_push(&cmd.args, "--write-bitmap-index");
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 9fc1626fbf..6d2c712bff 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -445,4 +445,30 @@ test_expect_success '--geometric -l disables writing bitmaps with non-local pack
> test_path_is_file member/.git/objects/pack/multi-pack-index-*.bitmap
> '
>
> +test_expect_success '--geometric works with promisor packs' '
> + test_when_finished "rm -fr remote local" &&
> +
> + git init remote &&
> + test_commit -C remote first file first &&
> + test_commit -C remote second file second &&
> + git -C remote config set uploadpack.allowfilter 1 &&
> + git -C remote config set uploadpack.allowanysha1inwant 1 &&
> + git -C remote repack -Ad &&
> +
> + git clone --filter=blob:none file://"$(pwd)"/remote local &&
> + git -C local rev-list --objects --missing=print HEAD >missing-objects &&
> + test_grep "^?" missing-objects &&
> +
> + # Assert that promisor packs are left alone and that we still manage to
> + # create new geometric packs.
> + ls local/.git/objects/pack/*.promisor >promisors-before &&
> + ls local/.git/objects/pack/*.pack >packs-before &&
> + test_commit -C local change &&
> + git -C local repack --geometric=2 &&
> + ls local/.git/objects/pack/*.promisor >promisors-after &&
> + ls local/.git/objects/pack/*.pack >packs-after &&
> + ! cmp packs-before packs-after &&
> + test_cmp promisors-before promisors-after
Ok, so it does seem to be the case that promisor packfiles are ignored
when performing a geometric repack. Naive question: does this mean there
are scenarios where a repository could accumulate many promisor
packfiles, but never repack them?
-Justin
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes
2025-12-10 19:31 ` Justin Tobler
@ 2025-12-11 5:46 ` Patrick Steinhardt
0 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-11 5:46 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
On Wed, Dec 10, 2025 at 01:31:44PM -0600, Justin Tobler wrote:
> On 25/12/05 09:19AM, Patrick Steinhardt wrote:
> > But there is one case where git-repack(1) decides to pass both options:
> > when performing a geometric repack we always pass "--stdin-packs" to
> > identify the packs that should be merged. So if one performs a geometric
> > repack in a partial clone we'll end up with both options, and that
> > causes the repack to fail.
> >
> > Fix this issue by never passing "--exclude-promisor-objects" when we
> > have a geometric split factor. We don't need the option anyway when
> > doing a geometric repack as we will only ever pack loose objects or
> > merge multiple packs. And neither of those cases can yield a promisor
> > object.
>
> I'm not sure I fully understand why --exclude-promisor-objects would not
> be needed for geometric repacks. To clarify, do geometric repacks
> already exclude promisor packfiles when merging? If so, then this change
> makes sense.
Okay, I had a deeper look now, and turns out my claim was completely
wrong. We _do_ try to perform geometric repacking with promisor remotes,
but we don't know to handle them in any capacity:
- git-pack-objects(1) just dies right away.
- Even if it didn't, we would need to learn how to merge promisor
packs.
I'll drop this patch for now, thanks for prompting!
Patrick
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs
2025-12-05 8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
2025-12-05 8:19 ` [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes Patrick Steinhardt
@ 2025-12-05 8:19 ` Patrick Steinhardt
2025-12-10 19:49 ` Justin Tobler
2025-12-05 8:20 ` [PATCH 3/3] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-11 7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
3 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-05 8:19 UTC (permalink / raw)
To: git
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
reachable via any other reference, we'd count that object twice.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/gc.c | 8 +++++---
t/t7900-maintenance.sh | 26 ++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/builtin/gc.c b/builtin/gc.c
index 92c6e7b954..17ff68cbd9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1130,8 +1130,10 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
return 0;
commit = lookup_commit(the_repository, maybe_peeled);
- if (!commit)
+ if (!commit || commit->object.flags & SEEN)
return 0;
+ commit->object.flags |= SEEN;
+
if (repo_parse_commit(the_repository, commit) ||
commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
return 0;
@@ -1141,7 +1143,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
if (data->num_not_in_graph >= data->limit)
return 1;
- commit_list_append(commit, &stack);
+ commit_list_insert(commit, &stack);
while (!result && stack) {
struct commit_list *parent;
@@ -1162,7 +1164,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
break;
}
- commit_list_append(parent->item, &stack);
+ commit_list_insert(parent->item, &stack);
}
}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6b36f52df7..6f3117304f 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -206,6 +206,32 @@ test_expect_success 'commit-graph auto condition' '
test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
'
+test_expect_success 'commit-graph auto condition with merges' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git config set maintenance.auto false &&
+ git commit --allow-empty -m initial &&
+ git switch --create feature &&
+ git commit --allow-empty -m feature-1 &&
+ git commit --allow-empty -m feature-2 &&
+ git switch - &&
+ git commit --allow-empty -m main-1 &&
+ git commit --allow-empty -m main-2 &&
+ git merge feature &&
+ git branch -D feature &&
+
+ # We have 6 commit, none of which are covered by a commit
+ # graph. So this must be the boundary at which we start to
+ # perform maintenance.
+ test_must_fail git -c maintenance.commit-graph.auto=7 \
+ maintenance is-needed --auto --task=commit-graph &&
+ git -c maintenance.commit-graph.auto=6 \
+ maintenance is-needed --auto --task=commit-graph
+ )
+'
+
test_expect_success 'run --task=bogus' '
test_must_fail git maintenance run --task=bogus 2>err &&
test_grep "is not a valid task" err
--
2.52.0.239.gd5f0c6e74e.dirty
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs
2025-12-05 8:19 ` [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
@ 2025-12-10 19:49 ` Justin Tobler
2025-12-11 5:48 ` Patrick Steinhardt
0 siblings, 1 reply; 17+ messages in thread
From: Justin Tobler @ 2025-12-10 19:49 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/12/05 09:19AM, Patrick Steinhardt wrote:
> When performing auto-maintenance we check whether commit graphs need to
> be generated by counting the number of commits that are reachable by any
> reference, but not covered by a commit graph. This search is performed
> by iterating through all references and then doing a depth-first search
> until we have found enough commits that are not present in the commit
> graph.
>
> This logic has a memory leak though:
>
> Direct leak of 16 byte(s) in 1 object(s) allocated from:
> #0 0x55555562e433 in malloc (git+0xda433)
> #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
> #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
> #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
> #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
> #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
> #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
> #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
> #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
> #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
> #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
> #11 0x55555575166a in run_builtin ../git.c:506:11
> #12 0x5555557502f0 in handle_builtin ../git.c:779:9
> #13 0x555555751127 in run_argv ../git.c:862:4
> #14 0x55555575007b in cmd_main ../git.c:984:19
> #15 0x5555557523aa in main ../common-main.c:9:11
> #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
> #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
> #18 0x5555555f0934 in _start (git+0x9c934)
>
> The root cause of this memory leak is our use of `commit_list_append()`.
> This function expects as parameters the item to append and the _tail_ of
> the list to append. This tail will then be overwritten with the new tail
> of the list so that it can be used in subsequent calls. But we call it
> with `commit_list_append(parent->item, &stack)`, so we end up losing
> everything but the new item.
>
> This issue only surfaces when counting merge commits. Next to being a
> memory leak, it also shows that we're in fact miscounting as we only
> respect children of the last parent. All previous parents are discarded,
> so their children will be disregarded unless they are hit via another
> reference.
>
> While crafting a test case for the issue I was puzzled that I couldn't
> establish the proper border at which the auto-condition would be
> fulfilled. As it turns out, there's another bug: if an object is at the
> tip of any reference we don't mark it as seen. Consequently, if it is
> reachable via any other reference, we'd count that object twice.
>
> Fix both of these bugs so that we properly count objects without leaking
> any memory.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/gc.c | 8 +++++---
> t/t7900-maintenance.sh | 26 ++++++++++++++++++++++++++
> 2 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 92c6e7b954..17ff68cbd9 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1130,8 +1130,10 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
> return 0;
>
> commit = lookup_commit(the_repository, maybe_peeled);
> - if (!commit)
> + if (!commit || commit->object.flags & SEEN)
> return 0;
> + commit->object.flags |= SEEN;
Now we are marking the object at the reference tip as seen so it will
not be counted more than once if used by other references. Makes sense.
> +
> if (repo_parse_commit(the_repository, commit) ||
> commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
> return 0;
> @@ -1141,7 +1143,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
> if (data->num_not_in_graph >= data->limit)
> return 1;
>
> - commit_list_append(commit, &stack);
> + commit_list_insert(commit, &stack);
>
> while (!result && stack) {
> struct commit_list *parent;
> @@ -1162,7 +1164,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
> break;
> }
>
> - commit_list_append(parent->item, &stack);
> + commit_list_insert(parent->item, &stack);
We change from commit_list_append() to commit_list_insert() so the new
item is added to the list without discarding the other entries. This
fixes the memory leak and corrects the other miscounting issue. Looks
good.
> }
> }
>
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 6b36f52df7..6f3117304f 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -206,6 +206,32 @@ test_expect_success 'commit-graph auto condition' '
> test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
> '
>
> +test_expect_success 'commit-graph auto condition with merges' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git config set maintenance.auto false &&
> + git commit --allow-empty -m initial &&
> + git switch --create feature &&
> + git commit --allow-empty -m feature-1 &&
> + git commit --allow-empty -m feature-2 &&
> + git switch - &&
> + git commit --allow-empty -m main-1 &&
> + git commit --allow-empty -m main-2 &&
> + git merge feature &&
> + git branch -D feature &&
If we left the feature branch instead of deleting it, would that help
test that commits are not counted twice?
> +
> + # We have 6 commit, none of which are covered by a commit
> + # graph. So this must be the boundary at which we start to
> + # perform maintenance.
> + test_must_fail git -c maintenance.commit-graph.auto=7 \
> + maintenance is-needed --auto --task=commit-graph &&
> + git -c maintenance.commit-graph.auto=6 \
> + maintenance is-needed --auto --task=commit-graph
> + )
> +'
Nice fix.
-Justin
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs
2025-12-10 19:49 ` Justin Tobler
@ 2025-12-11 5:48 ` Patrick Steinhardt
0 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-11 5:48 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
On Wed, Dec 10, 2025 at 01:49:39PM -0600, Justin Tobler wrote:
> On 25/12/05 09:19AM, Patrick Steinhardt wrote:
> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> > index 6b36f52df7..6f3117304f 100755
> > --- a/t/t7900-maintenance.sh
> > +++ b/t/t7900-maintenance.sh
> > @@ -206,6 +206,32 @@ test_expect_success 'commit-graph auto condition' '
> > test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
> > '
> >
> > +test_expect_success 'commit-graph auto condition with merges' '
> > + test_when_finished "rm -rf repo" &&
> > + git init repo &&
> > + (
> > + cd repo &&
> > + git config set maintenance.auto false &&
> > + git commit --allow-empty -m initial &&
> > + git switch --create feature &&
> > + git commit --allow-empty -m feature-1 &&
> > + git commit --allow-empty -m feature-2 &&
> > + git switch - &&
> > + git commit --allow-empty -m main-1 &&
> > + git commit --allow-empty -m main-2 &&
> > + git merge feature &&
> > + git branch -D feature &&
>
> If we left the feature branch instead of deleting it, would that help
> test that commits are not counted twice?
Indeed! I couldn't make any sense of the results at the beginning of
writing this test, but now that I fixed the relveant bugs we can retain
the branch.
Patrick
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/3] odb: properly close sources before freeing them
2025-12-05 8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
2025-12-05 8:19 ` [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes Patrick Steinhardt
2025-12-05 8:19 ` [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
@ 2025-12-05 8:20 ` Patrick Steinhardt
2025-12-05 23:14 ` Eric Sunshine
2025-12-11 7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
3 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-05 8:20 UTC (permalink / raw)
To: git
In the next commit we are about to move the packfile store into the ODB
source so that we have one store per source. This will lead to a memory
leak in the following commit when reading data from a submodule via
git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/odb.c b/odb.c
index dc8f292f3d..8e67afe185 100644
--- a/odb.c
+++ b/odb.c
@@ -1132,13 +1132,13 @@ void odb_free(struct object_database *o)
oidmap_clear(&o->replace_map, 1);
pthread_mutex_destroy(&o->replace_mutex);
+ odb_close(o);
odb_free_sources(o);
for (size_t i = 0; i < o->cached_object_nr; i++)
free((char *) o->cached_objects[i].value.buf);
free(o->cached_objects);
- odb_close(o);
packfile_store_free(o->packfiles);
string_list_clear(&o->submodule_source_paths, 0);
--
2.52.0.239.gd5f0c6e74e.dirty
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH 3/3] odb: properly close sources before freeing them
2025-12-05 8:20 ` [PATCH 3/3] odb: properly close sources before freeing them Patrick Steinhardt
@ 2025-12-05 23:14 ` Eric Sunshine
2025-12-06 11:38 ` Patrick Steinhardt
0 siblings, 1 reply; 17+ messages in thread
From: Eric Sunshine @ 2025-12-05 23:14 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Fri, Dec 5, 2025 at 6:36 AM Patrick Steinhardt <ps@pks.im> wrote:
> In the next commit we are about to move the packfile store into the ODB
> source so that we have one store per source. This will lead to a memory
> leak in the following commit when reading data from a submodule via
> git-grep(1):
> [...]
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
Considering that this is patch [3/3], to what does "In the next
commit..." refer?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] odb: properly close sources before freeing them
2025-12-05 23:14 ` Eric Sunshine
@ 2025-12-06 11:38 ` Patrick Steinhardt
2025-12-06 11:43 ` Eric Sunshine
0 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-06 11:38 UTC (permalink / raw)
To: Eric Sunshine; +Cc: git
On Fri, Dec 05, 2025 at 06:14:22PM -0500, Eric Sunshine wrote:
> On Fri, Dec 5, 2025 at 6:36 AM Patrick Steinhardt <ps@pks.im> wrote:
> > In the next commit we are about to move the packfile store into the ODB
> > source so that we have one store per source. This will lead to a memory
> > leak in the following commit when reading data from a submodule via
> > git-grep(1):
> > [...]
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
>
> Considering that this is patch [3/3], to what does "In the next
> commit..." refer?
Good catch! I split this out of another, bigger, patch series. But as
I've started to hit the leak in a different patch series, as well, I
decided to split it out into a smaller patch series.
I've queued the following change locally, but will refrain from sending
out a new version for now.
Thanks!
Patrick
1: 5c15065406 = 1: 9f813d92f3 builtin/repack: fix geometric repacks with promisor remotes
2: 2fa3991003 = 2: 02167bfb16 builtin/gc: fix condition for whether to write commit graphs
3: a06d0716c3 ! 3: c9ca233c29 odb: properly close sources before freeing them
@@ Commit message
odb: properly close sources before freeing them
In the next commit we are about to move the packfile store into the ODB
- source so that we have one store per source. This will lead to a memory
- leak in the following commit when reading data from a submodule via
- git-grep(1):
+ source so that we have one store per source. This can lead to a memory
+ leak when reading data from a submodule via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH 3/3] odb: properly close sources before freeing them
2025-12-06 11:38 ` Patrick Steinhardt
@ 2025-12-06 11:43 ` Eric Sunshine
2025-12-06 12:04 ` Patrick Steinhardt
0 siblings, 1 reply; 17+ messages in thread
From: Eric Sunshine @ 2025-12-06 11:43 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Sat, Dec 6, 2025 at 6:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Fri, Dec 05, 2025 at 06:14:22PM -0500, Eric Sunshine wrote:
> > On Fri, Dec 5, 2025 at 6:36 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > In the next commit we are about to move the packfile store into the ODB
> > > source so that we have one store per source. This will lead to a memory
> > > leak in the following commit when reading data from a submodule via
> > > git-grep(1):
> > > [...]
> > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> >
> > Considering that this is patch [3/3], to what does "In the next
> > commit..." refer?
>
> Good catch! I split this out of another, bigger, patch series. But as
> I've started to hit the leak in a different patch series, as well, I
> decided to split it out into a smaller patch series.
>
> I've queued the following change locally, but will refrain from sending
> out a new version for now.
>
> 3: a06d0716c3 ! 3: c9ca233c29 odb: properly close sources before freeing them
> @@ Commit message
> In the next commit we are about to move the packfile store into the ODB
> - source so that we have one store per source. This will lead to a memory
> - leak in the following commit when reading data from a submodule via
> - git-grep(1):
> + source so that we have one store per source. This can lead to a memory
> + leak when reading data from a submodule via git-grep(1):
I would think that you would also want to drop the "In the next commit
we are about to..." bit (considering, again, that this is patch
[3/3]).
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] odb: properly close sources before freeing them
2025-12-06 11:43 ` Eric Sunshine
@ 2025-12-06 12:04 ` Patrick Steinhardt
0 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-06 12:04 UTC (permalink / raw)
To: Eric Sunshine; +Cc: git
On Sat, Dec 06, 2025 at 06:43:40AM -0500, Eric Sunshine wrote:
> On Sat, Dec 6, 2025 at 6:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> > On Fri, Dec 05, 2025 at 06:14:22PM -0500, Eric Sunshine wrote:
> > > On Fri, Dec 5, 2025 at 6:36 AM Patrick Steinhardt <ps@pks.im> wrote:
> > > > In the next commit we are about to move the packfile store into the ODB
> > > > source so that we have one store per source. This will lead to a memory
> > > > leak in the following commit when reading data from a submodule via
> > > > git-grep(1):
> > > > [...]
> > > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > >
> > > Considering that this is patch [3/3], to what does "In the next
> > > commit..." refer?
> >
> > Good catch! I split this out of another, bigger, patch series. But as
> > I've started to hit the leak in a different patch series, as well, I
> > decided to split it out into a smaller patch series.
> >
> > I've queued the following change locally, but will refrain from sending
> > out a new version for now.
> >
> > 3: a06d0716c3 ! 3: c9ca233c29 odb: properly close sources before freeing them
> > @@ Commit message
> > In the next commit we are about to move the packfile store into the ODB
> > - source so that we have one store per source. This will lead to a memory
> > - leak in the following commit when reading data from a submodule via
> > - git-grep(1):
> > + source so that we have one store per source. This can lead to a memory
> > + leak when reading data from a submodule via git-grep(1):
>
> I would think that you would also want to drop the "In the next commit
> we are about to..." bit (considering, again, that this is patch
> [3/3]).
Ugh, of course. It's the weekend, so my brain is clearly not working.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 0/2] Some random object database related fixes
2025-12-05 8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
` (2 preceding siblings ...)
2025-12-05 8:20 ` [PATCH 3/3] odb: properly close sources before freeing them Patrick Steinhardt
@ 2025-12-11 7:19 ` Patrick Steinhardt
2025-12-11 7:19 ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
` (2 more replies)
3 siblings, 3 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-11 7:19 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Eric Sunshine
Hi,
this patch series fixes some small issues I've discovered while working
on some other patch series. I've decided to split it out of these
because I'm hitting the same issues in multiple series, and I don't want
those to become dependent on one another.
The patch series is built on top of f0ef5b6d9b with
ps/object-source-management at ac65c70663 (odb: handle recreation of
quarantine directories, 2025-11-19) merged into it.
Changes in v2:
- Drop the first commit that regards geometric repacking with promisor
remotes. As it turns out my assertion was wrong: geometric repacks
do and have to consider promisors, but they will fail to handle
them. This is a bigger topic to fix though, so I'll rather want to
move this into a separate patch series.
- Tighten tests a bit for the commit-graph generation.
- Stop referring to a "subsequent" commit that doesn't exist.
- Link to v1: https://lore.kernel.org/r/20251205-odb-related-fixes-v1-0-ef4250abb584@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (2):
builtin/gc: fix condition for whether to write commit graphs
odb: properly close sources before freeing them
builtin/gc.c | 8 +++++---
odb.c | 2 +-
t/t7900-maintenance.sh | 25 +++++++++++++++++++++++++
3 files changed, 31 insertions(+), 4 deletions(-)
Range-diff versus v1:
1: 5c15065406 < -: ---------- builtin/repack: fix geometric repacks with promisor remotes
2: 2fa3991003 ! 1: 1702bf6e7f builtin/gc: fix condition for whether to write commit graphs
@@ t/t7900-maintenance.sh: test_expect_success 'commit-graph auto condition' '
+ git commit --allow-empty -m main-1 &&
+ git commit --allow-empty -m main-2 &&
+ git merge feature &&
-+ git branch -D feature &&
+
+ # We have 6 commit, none of which are covered by a commit
+ # graph. So this must be the boundary at which we start to
3: a06d0716c3 ! 2: 7dd4e6fabe odb: properly close sources before freeing them
@@ Metadata
## Commit message ##
odb: properly close sources before freeing them
- In the next commit we are about to move the packfile store into the ODB
- source so that we have one store per source. This will lead to a memory
- leak in the following commit when reading data from a submodule via
- git-grep(1):
+ It is possible to hit a memory leak when reading data from a submodule
+ via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
---
base-commit: 2797238193944b52d12624a04a962f40b9bcad69
change-id: 20251205-odb-related-fixes-5f48a0993ef7
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs
2025-12-11 7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
@ 2025-12-11 7:19 ` Patrick Steinhardt
2025-12-11 14:16 ` Toon Claes
2025-12-11 7:19 ` [PATCH v2 2/2] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-12 15:01 ` [PATCH v2 0/2] Some random object database related fixes Justin Tobler
2 siblings, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-11 7:19 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Eric Sunshine
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
reachable via any other reference, we'd count that object twice.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/gc.c | 8 +++++---
t/t7900-maintenance.sh | 25 +++++++++++++++++++++++++
2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/builtin/gc.c b/builtin/gc.c
index 92c6e7b954..17ff68cbd9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1130,8 +1130,10 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
return 0;
commit = lookup_commit(the_repository, maybe_peeled);
- if (!commit)
+ if (!commit || commit->object.flags & SEEN)
return 0;
+ commit->object.flags |= SEEN;
+
if (repo_parse_commit(the_repository, commit) ||
commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
return 0;
@@ -1141,7 +1143,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
if (data->num_not_in_graph >= data->limit)
return 1;
- commit_list_append(commit, &stack);
+ commit_list_insert(commit, &stack);
while (!result && stack) {
struct commit_list *parent;
@@ -1162,7 +1164,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
break;
}
- commit_list_append(parent->item, &stack);
+ commit_list_insert(parent->item, &stack);
}
}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6b36f52df7..a2b4403595 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -206,6 +206,31 @@ test_expect_success 'commit-graph auto condition' '
test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
'
+test_expect_success 'commit-graph auto condition with merges' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git config set maintenance.auto false &&
+ git commit --allow-empty -m initial &&
+ git switch --create feature &&
+ git commit --allow-empty -m feature-1 &&
+ git commit --allow-empty -m feature-2 &&
+ git switch - &&
+ git commit --allow-empty -m main-1 &&
+ git commit --allow-empty -m main-2 &&
+ git merge feature &&
+
+ # We have 6 commit, none of which are covered by a commit
+ # graph. So this must be the boundary at which we start to
+ # perform maintenance.
+ test_must_fail git -c maintenance.commit-graph.auto=7 \
+ maintenance is-needed --auto --task=commit-graph &&
+ git -c maintenance.commit-graph.auto=6 \
+ maintenance is-needed --auto --task=commit-graph
+ )
+'
+
test_expect_success 'run --task=bogus' '
test_must_fail git maintenance run --task=bogus 2>err &&
test_grep "is not a valid task" err
--
2.52.0.270.g3f4935d65f.dirty
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs
2025-12-11 7:19 ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
@ 2025-12-11 14:16 ` Toon Claes
0 siblings, 0 replies; 17+ messages in thread
From: Toon Claes @ 2025-12-11 14:16 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Justin Tobler, Eric Sunshine
Patrick Steinhardt <ps@pks.im> writes:
> The root cause of this memory leak is our use of `commit_list_append()`.
> This function expects as parameters the item to append and the _tail_ of
> the list to append. This tail will then be overwritten with the new tail
> of the list so that it can be used in subsequent calls. But we call it
> with `commit_list_append(parent->item, &stack)`, so we end up losing
> everything but the new item.
>
> This issue only surfaces when counting merge commits. Next to being a
> memory leak, it also shows that we're in fact miscounting as we only
> respect children of the last parent. All previous parents are discarded,
> so their children will be disregarded unless they are hit via another
> reference.
>
> While crafting a test case for the issue I was puzzled that I couldn't
> establish the proper border at which the auto-condition would be
> fulfilled. As it turns out, there's another bug: if an object is at the
> tip of any reference we don't mark it as seen. Consequently, if it is
> reachable via any other reference, we'd count that object twice.
>
> Fix both of these bugs so that we properly count objects without leaking
> any memory.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/gc.c | 8 +++++---
> t/t7900-maintenance.sh | 25 +++++++++++++++++++++++++
> 2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 92c6e7b954..17ff68cbd9 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1130,8 +1130,10 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
> return 0;
>
> commit = lookup_commit(the_repository, maybe_peeled);
> - if (!commit)
> + if (!commit || commit->object.flags & SEEN)
> return 0;
> + commit->object.flags |= SEEN;
> +
> if (repo_parse_commit(the_repository, commit) ||
> commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
> return 0;
> @@ -1141,7 +1143,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
> if (data->num_not_in_graph >= data->limit)
> return 1;
>
> - commit_list_append(commit, &stack);
> + commit_list_insert(commit, &stack);
commit_list_insert() prepends the commit to the beginning of the list,
while commit_list_append() appends it at the end. Because the list is
only used for counting, we don't care about the order. So this fix looks
good to me.
I also approve the other changes in this series.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 2/2] odb: properly close sources before freeing them
2025-12-11 7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
2025-12-11 7:19 ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
@ 2025-12-11 7:19 ` Patrick Steinhardt
2025-12-12 15:01 ` [PATCH v2 0/2] Some random object database related fixes Justin Tobler
2 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2025-12-11 7:19 UTC (permalink / raw)
To: git; +Cc: Justin Tobler, Eric Sunshine
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/odb.c b/odb.c
index dc8f292f3d..8e67afe185 100644
--- a/odb.c
+++ b/odb.c
@@ -1132,13 +1132,13 @@ void odb_free(struct object_database *o)
oidmap_clear(&o->replace_map, 1);
pthread_mutex_destroy(&o->replace_mutex);
+ odb_close(o);
odb_free_sources(o);
for (size_t i = 0; i < o->cached_object_nr; i++)
free((char *) o->cached_objects[i].value.buf);
free(o->cached_objects);
- odb_close(o);
packfile_store_free(o->packfiles);
string_list_clear(&o->submodule_source_paths, 0);
--
2.52.0.270.g3f4935d65f.dirty
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH v2 0/2] Some random object database related fixes
2025-12-11 7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
2025-12-11 7:19 ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2025-12-11 7:19 ` [PATCH v2 2/2] odb: properly close sources before freeing them Patrick Steinhardt
@ 2025-12-12 15:01 ` Justin Tobler
2 siblings, 0 replies; 17+ messages in thread
From: Justin Tobler @ 2025-12-12 15:01 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Eric Sunshine
On 25/12/11 08:19AM, Patrick Steinhardt wrote:
> Changes in v2:
> - Drop the first commit that regards geometric repacking with promisor
> remotes. As it turns out my assertion was wrong: geometric repacks
> do and have to consider promisors, but they will fail to handle
> them. This is a bigger topic to fix though, so I'll rather want to
> move this into a separate patch series.
> - Tighten tests a bit for the commit-graph generation.
> - Stop referring to a "subsequent" commit that doesn't exist.
> - Link to v1: https://lore.kernel.org/r/20251205-odb-related-fixes-v1-0-ef4250abb584@pks.im
The change is this version look good to me.
-Justin
^ permalink raw reply [flat|nested] 17+ messages in thread