* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
From: Kristofer Karlsson @ 2026-06-24 14:38 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <19639ad3-2d16-4f3b-be79-138e00144ea3@gmail.com>
On Wed, 24 Jun 2026 at 15:54, Derrick Stolee <stolee@gmail.com> wrote:
>
> I'm grateful to see these changes happening to the doc in real-
> time. I know it was extra work, but I'm grateful right now.
>
> Hopefully future historians will also benefit from this effort.
It was honestly not bad at all, and I agree it felt quite nice to
see how the doc naturally changed along with the implementation.
> > +static struct commit *paint_queue_get(struct paint_state *state)
> > +{
>
> Since we are going to make this a more complete termination
> condition, we may want to make that very explicit with a doc-
> comment. Something along the lines of "dequeue a commit when
> possible, but also signal termination of the walk when we
> conclude that no more merge bases will be discovered due to
> internal state."
Yes, I'll make sure to clean that part up more, maybe also
rename the function to be more descriptive.
> You mentioned in your cover letter how the min_generation value
> can add extra termination conditions. It may be a good idea to
> insert min_generation into the paint_queue struct and make it a
> termination condition for paint_queue_get(). If you consider this
> direction, then I'd make it a separate patch on top of this one
> _before_ adding the one-sided change. The extra tests that cover
> the exact number of walked commits can help to guarantee the same
> behavior, assuming that some of those tests check a non-zero
> min_generation input. (It may be good to add such trace tests in
> an earlier patch to help confidence in this case.)
I think I might wait with this - the patch series already feels
quite big, and I think it has a natural progression and finish now.
But I can definitely commit to following up later -- it would be a
smaller series that is easier to reason about, likely a single commit.
Thanks,
Kristofer
^ permalink raw reply
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
From: Kristofer Karlsson @ 2026-06-24 14:33 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <b4b33635-1279-46c0-819a-d29cc13921f5@gmail.com>
On Wed, 24 Jun 2026 at 15:43, Derrick Stolee <stolee@gmail.com> wrote:
>
> One way to make these tests have potential to check exact stats
> without too much extra work would be to update 'test_all_modes'
> to run each command with GIT_TRACE2_EVENT set to a known trace
> file (reset each time) that can then be checked after verifying
> that the results of each command is the same.
>
> Then, these tests could have lines such as
>
> test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
> test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
> test_trace2_data paint_down_to_common steps 40 <trace-none.txt
>
> after the test_all_modes line.
That does indeed look quite clean, I will try to massage the tests to
utilize that, though I haven't fully worked it through in my head yet so
I am not sure if/where I would get stuck. :)
Thanks,
Kristofer
^ permalink raw reply
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
From: Kristofer Karlsson @ 2026-06-24 14:31 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <560c91df-3c07-4c8f-9924-ef0cc7646e08@gmail.com>
On Wed, 24 Jun 2026 at 15:41, Derrick Stolee <stolee@gmail.com> wrote:
>
> (highlighting this chunk)
>
> > + rm -rf .git/objects/info/commit-graph \
> > + .git/objects/info/commit-graphs &&
> > +
> > + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
>
> I'd rather see the whitespace line before the `rm` to make it
> more obvious that it's setting up the "none" case.
Ah yes, good point, will fix.
> > + cp commit-graph-full .git/objects/info/commit-graph &&
> > + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> > +
> > + cp commit-graph-half .git/objects/info/commit-graph &&
> > + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> > +'
> > +
>
> This test is a great example. I look forward to seeing that it
> updates in the future.
>
> One thing I was hoping to see was that your side-exhaustion tests
> (from patch v2 2/7) would also include these checks so they are
> more obviously updating when the implementation updates later.
I was internally contemplating how much I should introduce the steps
validation to existing tests. My worry was that it might make tests
fragile - for example I repeatedly got some off-by-one changes
after refactoring the halt condition slightly (differs depending on
adding the halts solely within paint_queue_get or having it at the end
of the loop) and I think potentially other future work could affect it.
But I'm happy to attach the steps checks for more relevant tests,
it's not much work to change.
> One way to accomplish that is to reorder this patch before adding
> those tests so their first version includes these checks and then
> the values update when changing the implementation.
I was thinking I could keep the same order, but the patch to introduce
the trace could also modify the tests at the same time - that would
perhaps make it even more clear. Also this means I could avoid
making changes to Elijah's commit that I already _partly_ butchered
(extracted the test change as-is, but dropped the other file changes)
and I don't want to make that one more unclean.
Thanks,
Kristofer
^ permalink raw reply
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Kristofer Karlsson @ 2026-06-24 14:25 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <67c00a9f-2aa2-4e83-9c0a-317ca589b232@gmail.com>
On Wed, 24 Jun 2026 at 15:34, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like seeing these updates including the deterministic steps. Is there
> a reason you don't include the step data for 'merge-tree (across import)'
> in your monorepo case? The wall-clock is substantial, so it's not like the
> last two examples in git.git where there may not be any difference.
I will have to attribute to laziness I suppose :)
I ran the initial benchmarks before adding the trace, and I didn't
update all of them,
just enough to show the improvement and value of the trace data.
I will ensure that I include all of it in the next version though
(maybe 1-2 days from now?) or maybe drop some of the benchmarks to
not overload with partly redundant information.
(merge-tree benchmarks doesn't perhaps add much significance on top
of merge-base in practice).
Thanks,
Kristofer
^ permalink raw reply
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-24 14:09 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> commit-reach: terminate merge-base walk when one side is exhausted
>
> Optimize paint_down_to_common() for merge-base queries that hit large
> one-sided histories.
I completed my review of this version. All of my comments are around
either making the commit history a little cleaner or expanding the
tests that use the trace2 data.
I believe that this code is _correct_ and could be shipped as-is. My
comments are focused on making it the best that it could be, with an
eye towards a cleaner final result or a more robust test setup.
The most actionable things are:
1. You can add tracing before the new tests, allowing the new tests
to also check the step counts in their first versions and then
get updated in the final patch to demonstrate how they change
with that behavior change.
2. The t6600 helper 'test_all_modes' could set GIT_TRACE2_EVENT for
each mode into a different trace file that can be scanned later.
This will simplify your current tracing tests but also unlock
easier tracing like this in the future.
3. The termination condition depending on min_generation could be
refactored into paint_queue_get() to help make things even more
obvious as to when we terminate. This should help with your
concerns that you mentioned in response to patch 2/6 of the
previous version:
> I am not 100% happy with the halt-condition placement yet --
> the existing loop in master already has several exit paths
> (while condition, min_generation break, FIND_ALL break) and I
> think there is an opportunity to consolidate them. But that is
> a separate discussion and I do not want to derail this series.
> I can propose some alternatives in a follow-up after this
> lands.
I then have some super minor comments around making the diffs
even easier to read, but they could be ignored as they are very
nit-picky. It's the kind of detail that I would try to resolve
if I was the author, but I'm _not_. You are. Your time is
valuable so make your own conclusions as to whether you want to
go down that road. You've already entertained my ideas around
updating the docs as the implementation changes.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-24 14:02 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <d84b932e5b078edc8255b6944ecb67fc1aa086b0.1782303254.git.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced earlier. Once the walk enters the
> finite-generation region, terminate early when one side's exclusive
> count drops to zero -- no new merge-base can form without both paint
> sides meeting.
Having this as the last patch is truly a nice climax moment for the
patch series!
> @@ -94,6 +94,9 @@ ends when one of the following conditions holds:
>
> 1. The queue is empty.
> 2. The queue contains only stale entries.
> + 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
> + remain in the queue, no pending merge-base candidates exist,
> + and the walk has entered the finite-generation region.
...> +Side-exhaustion condition
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +A new merge-base requires commits from both sides to meet. When one
> +side's exclusive counter reaches zero and there are no pending
> +merge-base candidates, no future traversal step can produce a new
> +candidate.
> +
> +This optimization only activates in the finite-generation region
> +where topological ordering holds. In that region, children are
> +always visited before parents, so paint flags are final at visit
> +time and an exhausted side cannot reappear. In the INFINITY region,
> +commit-date ordering can violate this guarantee, so the check is
> +skipped.
> +
And these doc updates inline make me happy.
> Related documentation
> ---------------------
>
> diff --git a/commit-reach.c b/commit-reach.c
> index e0d9874f99..f79d0b64d6 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
>
> static struct commit *paint_queue_get(struct paint_state *state)
> {
> - struct commit *commit;
> + struct commit *commit = prio_queue_get(&state->queue);
>
> - if (!state->p1_count && !state->p2_count &&
> - !state->pending_merge_bases)
> + if (!commit)
> return NULL;
I see how the previous implementation has a termination condition
before calling prio_queue_get(), which is technically more
efficient. It does make this initial diff a bit more complicated
because we are moving the prio_queue_get() line.
If the introduction of the method in patch 5/7 looked like this:
+static struct commit *paint_queue_get(struct paint_state *state)
+{
+ struct commit *commit = prio_queue_get(&state->queue);
+
+ if (!commit)
+ return NULL;
+
+ if (!state->p1_count && !state->p2_count &&
+ !state->pending_merge_bases)
+ return NULL;
+
+ commit->object.flags &= ~ENQUEUED;
+ paint_count_update(state, commit->object.flags, -1);
+ return commit;
+}
Then this diff would look cleaner.
(This is the nittiest of nitpicks so feel free to ignore if this
doesn't bother you at all.)
> - commit = prio_queue_get(&state->queue);
> - if (commit) {
> - commit->object.flags &= ~ENQUEUED;
> - paint_count_update(state, commit->object.flags, -1);
> + commit->object.flags &= ~ENQUEUED;
> +
> + if (!state->pending_merge_bases) {
> + if (!state->p1_count && !state->p2_count)
> + return NULL;
> + /*
> + * Side exhaustion: a new merge-base can only form
> + * when both PARENT1-only and PARENT2-only commits
> + * remain in the queue. In the finite-generation
> + * region the queue is ordered topologically, so
> + * no future step can add paint to visited commits
> + * and an exhausted side cannot reappear.
> + */
> + if ((!state->p1_count || !state->p2_count) &&
> + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
> + return NULL;
> }
> +
> + paint_count_update(state, commit->object.flags, -1);
> return commit;
> }
I like how the crux of this implementation is entirely within
paint_queue_get() now.
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index c1109fb42f..03175befb3 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
> cp commit-graph-full .git/objects/info/commit-graph &&
> GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> git merge-base --all commit-9-9 commit-9-1 >actual &&
> - test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> + test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
>
> cp commit-graph-half .git/objects/info/commit-graph &&
> GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> git merge-base --all commit-9-9 commit-9-1 >actual &&
> - test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> + test_trace2_data paint_down_to_common steps 57 <trace-half.txt
> '
I love to see these steps change. If you take my suggestion to
update more tests with these checks, then this diff will get bigger
(but in a deserved way).
Also, when I suggested that 'test_all_modes' creates the trace
files on our behalf, I forgot to mention that this specific test
that you added in patch 4/7 simplifies by running the merge-base
check under 'test_all_modes' and then checking the trace2 data
on the three well-known files afterwards.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
From: Derrick Stolee @ 2026-06-24 13:55 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <8c72f01083237c00397dd074beda8f854e882cbe.1782303254.git.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
> unused after the previous commit. The core nonstale_queue functions
> remain in use by ahead_behind().
This is a nice cleanup that makes the previous diff easier to
read. Thanks!
^ permalink raw reply
* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
From: Derrick Stolee @ 2026-06-24 13:54 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <f24edd45f0af1da64513164d5d720fe70c1decff.1782303254.git.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> Termination
> -----------
>
> -The walk uses a `nonstale_queue` wrapper around `prio_queue` that
> -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
> -so far. Once that commit is dequeued, every remaining entry is known
> -to be STALE and the loop terminates. Specifically, the main loop
> +The walk tracks the number of commits of each type in the queue
> +(PARENT1-only, PARENT2-only, pending merge-base). The main loop
> ends when one of the following conditions holds:
>
> 1. The queue is empty.
> - 2. `max_nonstale` has been dequeued, meaning the queue only contains
> - STALE entries.
> + 2. The queue contains only stale entries.
I'm grateful to see these changes happening to the doc in real-
time. I know it was extra work, but I'm grateful right now.
Hopefully future historians will also benefit from this effort.
> +static void paint_count_update(struct paint_state *state,
> + unsigned flags, int delta)
> +{
> + switch (flags & (PARENT1 | PARENT2 | STALE)) {
> + case PARENT1:
> + state->p1_count += delta;
> + break;
> +
> + case PARENT2:
> + state->p2_count += delta;
> + break;
> +
> + case PARENT1 | PARENT2:
> + state->pending_merge_bases += delta;
> + break;
> +
> + case PARENT1 | PARENT2 | STALE:
> + break;
> +
> + default:
> + BUG("unexpected paint state");
> + }
> +}
I like the use of 'delta' to allow reuse of this switch.
> +
> +static void paint_queue_put(struct paint_state *state,
> + struct commit *c, unsigned add_flags)
> +{
> + unsigned old_flags = c->object.flags;
> + c->object.flags |= add_flags;
> +
> + if (old_flags & ENQUEUED) {
> + paint_count_update(state, old_flags, -1);
> + paint_count_update(state, c->object.flags, 1);
> + } else {
> + c->object.flags |= ENQUEUED;
> + prio_queue_put(&state->queue, c);
> + paint_count_update(state, c->object.flags, 1);
> + }
> +}
ok: if we are already in the queue then we have old flags and
may need to subtract their values because they were counted
already. Otherwise, we need to queue it for the first time and
only add the values. Makes sense.
> +
> +static struct commit *paint_queue_get(struct paint_state *state)
> +{
Since we are going to make this a more complete termination
condition, we may want to make that very explicit with a doc-
comment. Something along the lines of "dequeue a commit when
possible, but also signal termination of the walk when we
conclude that no more merge bases will be discovered due to
internal state."
> @@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r,
> return error(_("could not parse commit %s"),
> oid_to_hex(&p->object.oid));
> }
> - p->object.flags |= flags;
> - nonstale_queue_put_dedup(&queue, p);
> + paint_queue_put(&state, p, flags);
I like how this simplifies the flag-assignment logic somewhat.
You mentioned in your cover letter how the min_generation value
can add extra termination conditions. It may be a good idea to
insert min_generation into the paint_queue struct and make it a
termination condition for paint_queue_get(). If you consider this
direction, then I'd make it a separate patch on top of this one
_before_ adding the one-sided change. The extra tests that cover
the exact number of walked commits can help to guarantee the same
behavior, assuming that some of those tests check a non-zero
min_generation input. (It may be good to add such trace tests in
an earlier patch to help confidence in this case.)
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
From: Derrick Stolee @ 2026-06-24 13:43 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <6151b8e0a3989a51e6d9717e0ceac439f26f1c1d.1782303254.git.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
>
> - in_merge_bases_many:self: commit is both A and one of the X inputs
> - get_merge_bases_many:duplicate-twos: duplicate entries in X list
> - get_merge_bases_many:pending-stale: STALE transition on an
> already-painted commit (ps-* diamond topology)
> - get_merge_bases_many:infinity-both-sides: both tips outside the
> commit-graph with non-monotonic dates (pi-* topology)
I'm happy that these cases now exist.
> +test_expect_success 'in_merge_bases_many:self' '
> + cat >input <<-\EOF &&
> + A:commit-6-8
> + X:commit-5-9
> + X:commit-6-8
> + EOF
> + echo "in_merge_bases_many(A,X):1" >expect &&
> + test_all_modes in_merge_bases_many
> +'
and using 'test_all_modes' is great to get coverage of all the
different commit-graph states. In reply to patch v2 4/7 I ask
to see the results of the traces in these kinds of test cases,
but each of these modes will have different values.
One way to make these tests have potential to check exact stats
without too much extra work would be to update 'test_all_modes'
to run each command with GIT_TRACE2_EVENT set to a known trace
file (reset each time) that can then be checked after verifying
that the results of each command is the same.
Then, these tests could have lines such as
test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
test_trace2_data paint_down_to_common steps 40 <trace-none.txt
after the test_all_modes line.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
From: Derrick Stolee @ 2026-06-24 13:41 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <6ade4df2ed2a836a3b4c5400ab13e8247e36c029.1782303254.git.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add a step counter and trace2_data_intmax() call so that the number
> of commits visited during the paint walk is observable via
> GIT_TRACE2_PERF. This provides a way to measure the impact of
> future optimizations without relying on wall-clock benchmarks alone.
> + trace2_data_intmax("paint_down_to_common", r,
> + "steps", steps);
This is great data. Very clearly marked for what we should be
doing here.
> +test_expect_success 'merge-base --all commit-walk steps' '
> + test_when_finished rm -rf .git/objects/info/commit-graph \
> + .git/objects/info/commit-graphs &&
(highlighting this chunk)
> + rm -rf .git/objects/info/commit-graph \
> + .git/objects/info/commit-graphs &&
> +
> + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
I'd rather see the whitespace line before the `rm` to make it
more obvious that it's setting up the "none" case.
> +
> + cp commit-graph-full .git/objects/info/commit-graph &&
> + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> +
> + cp commit-graph-half .git/objects/info/commit-graph &&
> + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> +'
> +
This test is a great example. I look forward to seeing that it
updates in the future.
One thing I was hoping to see was that your side-exhaustion tests
(from patch v2 2/7) would also include these checks so they are
more obviously updating when the implementation updates later.
One way to accomplish that is to reorder this patch before adding
those tests so their first version includes these checks and then
the values update when changing the implementation.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-24 13:34 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> Benchmarks
>
> Step counts are deterministic (measured via trace2_data_intmax added in
> patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
> to performance.
>
> 2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
>
> steps wall-clock
> merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms
> merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms
> merge-tree (across import) - 8.11s -> 100ms
>
>
> git.git (88k commits, commit-graph):
>
> steps wall-clock
> merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms
> merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms
> merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms
> merge-base HEAD HEAD~1000 - 9ms -> 9ms
> merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms
I like seeing these updates including the deterministic steps. Is there
a reason you don't include the step data for 'merge-tree (across import)'
in your monorepo case? The wall-clock is substantial, so it's not like the
last two examples in git.git where there may not be any difference.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Karthik Nayak @ 2026-06-24 13:25 UTC (permalink / raw)
To: Pablo Sabater
Cc: gitster, peff, eric.peijian, chriscool, git, jltobler, toon,
chandrapratap3519
In-Reply-To: <CAN5EUNRN4_5PS3cbQtQfpyRuwByvV=qvAVKnVbgT-pirKGnnTg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3976 bytes --]
Pablo Sabater <pabloosabaterr@gmail.com> writes:
> El lun, 22 jun 2026 a las 22:43, Karthik Nayak
> (<karthik.188@gmail.com>) escribió:
>>
>> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>>
>> [snip]
>>
>> > diff --git a/connect.c b/connect.c
>> > index 1dced8e632..78c69d4485 100644
>> > --- a/connect.c
>> > +++ b/connect.c
>> > @@ -700,16 +700,16 @@ int server_supports(const char *feature)
>> > return !!server_feature_value(feature, NULL);
>> > }
>> >
>> > -void write_fetch_command_and_capabilities(struct strbuf *req_buf,
>> > - const struct string_list *server_options)
>> > +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
>> > + const struct string_list *server_options)
>> > {
>> > const char *hash_name;
>> > int advertise_sid;
>> >
>> > repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
>> >
>> > - ensure_server_supports_v2("fetch");
>> > - packet_buf_write(req_buf, "command=fetch");
>> > + ensure_server_supports_v2(command);
>> > + packet_buf_write(req_buf, "command=%s", command);
>> > if (server_supports_v2("agent"))
>> > packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
>> > if (advertise_sid && server_supports_v2("session-id"))
>> > @@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
>> > die(_("mismatched algorithms: client %s; server %s"),
>> > the_hash_algo->name, hash_name);
>> > packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
>> > - } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
>> > + } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
>> > die(_("the server does not support algorithm '%s'"),
>> > the_hash_algo->name);
>> > }
>>
>> Why did we make this change? If the server doesn't support v2, then the
>> object format should be `GIT_HASH_SHA1_LEGACY`. While the value of it is
>> indeed `GIT_HASH_SHA1`, it indicates a scenario where there was no
>> option to select object hash, which is the scenario here.
>>
>> If there is a reason to make such a change, perhaps we should highlight
>> this in the commit message.
>
> Hi!
> There should be no diff related to that line, In some point between
> Eric's last version (v11) and mine's firs (v12) the original code
> changed. On the diff from v11 [1] the object format is the same, i
> didn't notice this change and it's wrong, I'll fix it for v14, Thanks!
>
>>
>> > diff --git a/connect.h b/connect.h
>> > index c4f6ea4b0a..8f4c523892 100644
>> > --- a/connect.h
>> > +++ b/connect.h
>> > @@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
>> > struct packet_reader *reader,
>> > const char *error);
>> >
>> > +/*
>> > + * Writes a command along with the requested server capabilities/features into a
>> > + * request buffer.
>> > + */
>> > struct string_list;
>>
>> The comment should be above the function and not the forward
>> declaration.
>
> True, I'll fix it for v14.
>
>>
>> While we're here, why not `#include "string-list.h"` and remove the
>> forward declaration, is there a circular dependency?
>
> I believe this was right because from what I know forward declarations
> are prefered in headers when in this case, the struct is only used as
> a pointer. Investigating, this came from a review from patrick [2].
>
That's fair.
Nit: Maybe add this context to the commit message?
Thanks
> [snip]
>
> [1]: https://lore.kernel.org/git/20250221190451.12536-5-eric.peijian@gmail.com/
> [2]: https://lore.kernel.org/git/Z0RIqUAoEob8lGfM@pks.im/
>
> Thanks for the review,
> Pablo.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply
* [ANNOUNCE] Git for Windows 2.55.0-rc2
From: Johannes Schindelin @ 2026-06-24 13:07 UTC (permalink / raw)
To: git, git-packagers
Dear Git users,
I hereby announce that Git for Windows 2.55.0-rc2 is available from:
https://github.com/git-for-windows/git/releases/tag/v2.55.0-rc2.windows.1
Changes since Git for Windows v2.54.0 (April 20th 2026)
Following the MSYS2 project, on which Git for Windows is based, Windows
8.1 support will be dropped after Git for Windows v2.55.
New Features
* Comes with Git v2.55.0-rc2.
* Comes with the MSYS2 runtime (Git for Windows flavor) based on
Cygwin v3.6.9.
* Comes with Git Credential Manager v2.8.0.
* Comes with cURL v8.20.0.
* Comes with less 702.
* The FSCache now accelerates more git add scenarios.
* Comes with OpenSSL v3.5.7.
* The diff helper handling Word documents was ported from Perl to
Rust.
* Comes with Bash v5.3.15.
Bug Fixes
* A regression in v2.54.0 that could cause endless "Unlink of file
'.git/objects/pack/pack-.idx' failed. Should I try again?" loops on
older Windows 10 versions during git fetch operations was fixed.
* A bug that prevented proper shutdown of processes launched via Git
Bash under certain circumstances was fixed.
* A bug was fixed which could cause parallel checkouts to fail under
certain circumstances when the FSCache is enabled.
* Git Bash (MinTTY) now respects screen scaling settings under more
circumstances.
* A couple of bugs revolving around very large clones/packfiles/
objects were fixed.
* Following TortoiseGit v2.19, .git file validations have been
tightened. As a consequence, access to remote repositories via UNC
file paths needs to be allowed explicitly via the safe.directory
configuration.
Git-2.55.0-rc2-64-bit.exe | 804b7e3392c94c26b5bd3a979febcf5f0bdac04b09d20572c10ab3daca67dd0e
Git-2.55.0-rc2-arm64.exe | 08009883dd388a2de7f6fbaa0e2efafe35d3a041269760d265c0462a52cd5802
PortableGit-2.55.0-rc2-64-bit.7z.exe | ed27e72d93a42ab9f4e62ff0a9f7b54a84c8712d29b3817af60b2a986b0a626f
PortableGit-2.55.0-rc2-arm64.7z.exe | cf1a4ab5f7ba0d35d4a9919d84327fd3e0ca170fbdcea4a82a49091614e925e5
MinGit-2.55.0-rc2-64-bit.zip | 8cb142c6bc4d8385d5efbe0068f8804573e3b569720f6f7f64e90425b92d05c9
MinGit-2.55.0-rc2-arm64.zip | 64226274ad601325aa181bca160529d0d26f13cd64742ce1b939ac096297476c
MinGit-2.55.0-rc2-32-bit.zip | a57922d0c1cd8925f24affee4a42e8e216dacb0ac51f675bbd85ef8e2ebfde01
MinGit-2.55.0-rc2-busybox-64-bit.zip | caf5daa4d5d9b251cc7d109202aee424100bffa12f9e146810255e908d02e9fb
MinGit-2.55.0-rc2-busybox-32-bit.zip | 1edddbfa04d2f544a9f34b275fd9844d3b365d7c318ab29f996bca9bb48963a2
Git-2.55.0-rc2-64-bit.tar.bz2 | a700459442bff0705e62e3441c1f6e5cfdaf191156e7cc06b68544678d663f3a
Git-2.55.0-rc2-arm64.tar.bz2 | 44fc3dfd5235c17fa607dcd9972f7c6b67fb106a4bdbcabb74ab17313d9aeaaa
Ciao,
Johannes
^ permalink raw reply
* Re: [PATCH 1/2] branch: suggest <remote>/<branch> on upstream slip
From: Ben Knoble @ 2026-06-24 12:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Harald Nordgren via GitGitGadget, git, Harald Nordgren
In-Reply-To: <xmqq1pdytkmj.fsf@gitster.g>
> Le 22 juin 2026 à 15:58, Junio C Hamano <gitster@pobox.com> a écrit :
>
> "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Harald Nordgren <haraldnordgren@gmail.com>
>>
>> "git branch --set-upstream-to origin main" reads the trailing word as
>> the local branch to operate on and dies with "branch 'main' does not
>> exist", pointing at the wrong problem.
>
> When 'main' does not exist locally,
>
> $ git branch --set-upstream-to "$anything" main
>
> would fail before even looking at the "$anything" (which is supposed
> to specify the new_upstream for the named local branch 'main'). The
> operation is to set the upstream for 'main', and if 'main' does not
> exist, doesn't the user deserve the error that says 'main' does not
> exist, no matter what "$anything" is, whether it is a well-formed or
> ill-formed remote tracking branch name?
>
> So it is unclear, at least to me, why "branch 'main' does not exist"
> is an inappropriate message, mostly because these three lines does
> not clearly tell me what the user _expected_ the command line to do.
>
> When 'main' does exist, but named upstream "$anything" does not, we
> get
>
> $ git branch sample master ;# make sure the thing exists
> $ git branch --set-upstream-to origin sample
> fatal: the requested upstream branch 'origin' does not exist
Relatedly, if memory serves: when origin/HEAD is available locally, this works to set sample’s upstream branch to whatever origin/HEAD refers to, right? So it may not even be a mistake. (I’m pretty sure I’ve used something like « git branch -u origin » when I didn’t give an upstream of « origin » to « git switch -c branch » for various reasons.)
^ permalink raw reply
* Re: [PATCH GSoC RFC v13 05/12] fetch-pack: move function to connect.c
From: Pablo Sabater @ 2026-06-24 12:24 UTC (permalink / raw)
To: Chandra Pratap
Cc: gitster, peff, eric.peijian, chriscool, git, jltobler,
karthik.188, toon
In-Reply-To: <CA+J6zkQEqTeNWkHJWDD6MmK4hesKofBVobDt9OcQ-FSVLC28pw@mail.gmail.com>
El dom, 21 jun 2026 a las 7:38, Chandra Pratap
(<chandrapratap3519@gmail.com>) escribió:
>
> On Fri, 19 Jun 2026 at 20:26, Pablo Sabater <pabloosabaterr@gmail.com> wrote:
> >
> > write_fetch_command_and_capabilities will be refactored in a subsequent
>
> Nit: the rest of this patch's body referes to this function as:
> `write_fetch_command_and_capabilities()`
>
> Let's use that here as well.
I'll do that, thanks.
>
> > commit where it will become a more general-purpose function, making it
> > more accessible to additional commands in the future.
> >
> > To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
> > to adjust how `advertise_sid` is managed. Previously in `fetch_pack.c`,
> > `advertise_sid` was a static variable, modified using
> > `repo_config_get_bool()`.
> >
> > In `connect.c`, we now initialize `advertise_sid` at the begining by
> > directly using `repo_config_get_bool()`. This change is safe because:
> >
> > In the original `fetch-pack.c` code, there are only two places that write
> > `advertise_sid`:
> >
> > 1. In function `do_fetch_pack()`:
> > if (!sever_supports("session_id"))
>
> s/sever/server
True, thanks.
>
> > advertise_sid = 0;
> > 2. In function `fetch_pack_config()`:
> > repo_config_get_bool("transfer.advertisesid", &advertise_sid);
> >
> > About 1, since `do_fetch_pack()` is only relevant for protocol v1, this
> > assignment can be ignored, as `write_fetch_command_and_capabilities()`
> > is only used in v2.
> >
> > About 2, `repo_config_get_bool()` is from `config.h` and it's an out-of-box
> > dependency of `connect.c`, so we can reuse it directly.
> >
> > Move `write_fetch_command_and_capabilities()` to `connect.c`
>
> Nit: this is a better patch header than "move function to connect.c",
> since it better describes the exact change we intend to make.
>
> Let's use it instead.
Okay, I'll use it.
Thanks for the review,
Pablo
^ permalink raw reply
* Re: [PATCH GSoC RFC v13 05/12] fetch-pack: move function to connect.c
From: Pablo Sabater @ 2026-06-24 12:21 UTC (permalink / raw)
To: Karthik Nayak
Cc: gitster, peff, eric.peijian, chriscool, git, jltobler, toon,
chandrapratap3519
In-Reply-To: <CAOLa=ZRUoBKPAjh6He0qgdZdzAzMxmeS9RMRi-czpHEfKG6EKw@mail.gmail.com>
El lun, 22 jun 2026 a las 12:30, Karthik Nayak
(<karthik.188@gmail.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > write_fetch_command_and_capabilities will be refactored in a subsequent
> > commit where it will become a more general-purpose function, making it
> > more accessible to additional commands in the future.
>
> Okay.
>
> > To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
> > to adjust how `advertise_sid` is managed. Previously in `fetch_pack.c`,
> > `advertise_sid` was a static variable, modified using
> > `repo_config_get_bool()`.
>
> Nit: What's missing is why do we need to move it to 'connect.c', I
> assume this is because it being generic means its better placed in
> connect.c over 'fetch-pack.c'. Would be nice to explicitly mention that
> perhaps?
True, it is for that reason, I'll write it explicitly in the next
version, thanks!
>
> >
> > In `connect.c`, we now initialize `advertise_sid` at the begining by
> > directly using `repo_config_get_bool()`. This change is safe because:
> >
> > In the original `fetch-pack.c` code, there are only two places that write
> > `advertise_sid`:
> >
> > 1. In function `do_fetch_pack()`:
> > if (!sever_supports("session_id"))
> > advertise_sid = 0;
> > 2. In function `fetch_pack_config()`:
> > repo_config_get_bool("transfer.advertisesid", &advertise_sid);
> >
> > About 1, since `do_fetch_pack()` is only relevant for protocol v1, this
> > assignment can be ignored, as `write_fetch_command_and_capabilities()`
> > is only used in v2.
> >
> > About 2, `repo_config_get_bool()` is from `config.h` and it's an out-of-box
> > dependency of `connect.c`, so we can reuse it directly.
> >
> > Move `write_fetch_command_and_capabilities()` to `connect.c`
> >
>
> Nit: Wouldn't it then make sense to split this into two?
> 1. Drop usage of the static `advertise_sid` within
> `write_fetch_command_and_capabilities()`.
> 2. Move `write_fetch_command_and_capabilities()` to `connect.c`
>
> That way the second patch is simply a move?
Okay, seems fair, I'll do that, thanks.
>
> [snip]
Thanks for the review,
Pablo
^ permalink raw reply
* [PATCH 6/6] odb: document object info fields
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
Some of the fields in `struct object_info` are undocumented. Add these
missing comments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/odb.h b/odb.h
index e0d05eaf87..a44ec46b08 100644
--- a/odb.h
+++ b/odb.h
@@ -283,12 +283,28 @@ struct object_info_source {
} u;
};
+/*
+ * The object info contains the query and response that is to be used for
+ * functions that end up reading object information. Callers are expected to
+ * populate pointers whose information they want to request.
+ */
struct object_info {
- /* Request */
+ /* The object type. */
enum object_type *typep;
+
+ /* The inflated object size in bytes. */
size_t *sizep;
+
+ /* The object size as stored on disk. */
off_t *disk_sizep;
+
+ /*
+ * The base the object is deltified against, in case it is stored as a
+ * delta.
+ */
struct object_id *delta_base_oid;
+
+ /* The object contents. Ownership of memory goes over to the caller. */
void **contentp;
/*
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 5/6] odb: drop `whence` field from object info
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
In the preceding commits we have migrated all callers to derive their
information of how a specific object is stored to use the new object
info source instead, and hence the field is now unused. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 1 -
odb.h | 7 -------
odb/source-inmemory.c | 2 --
odb/source-loose.c | 2 --
packfile.c | 2 --
5 files changed, 14 deletions(-)
diff --git a/odb.c b/odb.c
index 99f4e7551c..82c41f1793 100644
--- a/odb.c
+++ b/odb.c
@@ -691,7 +691,6 @@ static int oid_object_info_convert(struct repository *r,
return -1;
}
}
- input_oi->whence = new_oi.whence;
if (input_oi->sourcep)
*input_oi->sourcep = *new_oi.sourcep;
return ret;
diff --git a/odb.h b/odb.h
index 330a55879e..e0d05eaf87 100644
--- a/odb.h
+++ b/odb.h
@@ -311,13 +311,6 @@ struct object_info {
* or multiple times in the same source.
*/
struct object_info_source *sourcep;
-
- /* Response */
- enum {
- OI_CACHED,
- OI_LOOSE,
- OI_PACKED,
- } whence;
};
/*
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index 2328e62687..008e49bfe9 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -54,8 +54,6 @@ static void populate_object_info(struct odb_source_inmemory *source,
*oi->mtimep = 0;
if (oi->sourcep)
oi->sourcep->source = &source->base;
-
- oi->whence = OI_CACHED;
}
static int odb_source_inmemory_read_object_info(struct odb_source *source,
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 5c4e9892b5..e743ccab42 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -198,8 +198,6 @@ static int read_object_info_from_path(struct odb_source_loose *loose,
oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo);
if (oi->sourcep && !ret)
oi->sourcep->source = &loose->base;
- if (!ret)
- oi->whence = OI_LOOSE;
}
return ret;
diff --git a/packfile.c b/packfile.c
index fa22095b75..4a8c108034 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1421,8 +1421,6 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = OI_PACKED;
-
if (oi->sourcep) {
if (!source)
BUG("cannot request source without an owning source");
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 4/6] treewide: convert users of `whence` to the new source field
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The `whence` field has become redundant now that callers can learn about
the exact source an object has been looked up from via the `struct
object_info_source::source` field.
Adapt callers to use the new field. Note that all callsites already set
up the `info.sourcep` request pointer, so the conversion is rather
straight-forward.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 2 +-
builtin/index-pack.c | 3 ++-
builtin/pack-objects.c | 2 +-
reachable.c | 2 +-
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index adc626ce30..1b96150e5b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -834,7 +834,7 @@ static int batch_one_object_oi(const struct object_id *oid,
void *_payload)
{
struct for_each_object_payload *payload = _payload;
- if (oi && oi->whence == OI_PACKED)
+ if (oi && oi->sourcep->source->type == ODB_SOURCE_PACKED)
return payload->callback(oid, oi->sourcep->u.packed.pack,
oi->sourcep->u.packed.offset,
payload->payload);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 77af26db8f..1b03b07e5e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1833,7 +1833,8 @@ static void repack_local_links(void)
if (odb_read_object_info_extended(the_repository->objects, oid, &info, 0))
/* Missing; assume it is a promisor object */
continue;
- if (info.whence == OI_PACKED && info_source.u.packed.pack->pack_promisor)
+ if (info_source.source->type == ODB_SOURCE_PACKED &&
+ info_source.u.packed.pack->pack_promisor)
continue;
if (!cmd.args.nr) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9deb37e9e8..d0fdfad750 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5010,7 +5010,7 @@ static int is_not_in_promisor_pack_obj(struct object *obj, void *data UNUSED)
if (odb_read_object_info_extended(the_repository->objects, &obj->oid, &info, 0))
BUG("should_include_obj should only be called on existing objects");
- return info.whence != OI_PACKED || !info_source.u.packed.pack->pack_promisor;
+ return info_source.source->type != ODB_SOURCE_PACKED || !info_source.u.packed.pack->pack_promisor;
}
static int is_not_in_promisor_pack(struct commit *commit, void *data) {
diff --git a/reachable.c b/reachable.c
index 2fc5b82d62..123a658944 100644
--- a/reachable.c
+++ b/reachable.c
@@ -234,7 +234,7 @@ static int add_recent_object(const struct object_id *oid,
add_pending_object(data->revs, obj, "");
if (data->cb) {
- if (oi->whence == OI_PACKED)
+ if (oi->sourcep->source->type == ODB_SOURCE_PACKED)
data->cb(obj, oi->sourcep->u.packed.pack,
oi->sourcep->u.packed.offset, *oi->mtimep);
else
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 3/6] odb: add `source` field to struct object_info_source
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The previous commit introduced `struct object_info_source` as an opt-in
container for backend-specific information, but for now we only moved
preexisting data into this structure. Most importantly, the caller has
no way yet to learn about which source an object was actually looked up
from. Instead, callers have to rely on the `whence` enum to distinguish
the object type, but cannot use that enum to tell the object source.
Add a `struct odb_source *source` field to the structure and populate it
from each backend's lookup path.
The `whence` enum is still set and used by callers; it will be removed
in a subsequent commit now that `sourcep->source` can identify the
backend on its own.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 3 +++
odb/source-inmemory.c | 3 +++
odb/source-loose.c | 2 ++
packfile.c | 6 +++++-
4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/odb.h b/odb.h
index 770900289a..330a55879e 100644
--- a/odb.h
+++ b/odb.h
@@ -253,6 +253,9 @@ int odb_pretend_object(struct object_database *odb,
* more about how exactly it is stored.
*/
struct object_info_source {
+ /* The source that this object has been looked up from. */
+ struct odb_source *source;
+
/*
* Backend-specific information about the specific object. This can be
* used for example to uniquely identify a given object in case it
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index e004566d76..2328e62687 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -52,6 +52,9 @@ static void populate_object_info(struct odb_source_inmemory *source,
*oi->contentp = xmemdupz(object->buf, object->size);
if (oi->mtimep)
*oi->mtimep = 0;
+ if (oi->sourcep)
+ oi->sourcep->source = &source->base;
+
oi->whence = OI_CACHED;
}
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 66e6bb8d3f..5c4e9892b5 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -196,6 +196,8 @@ static int read_object_info_from_path(struct odb_source_loose *loose,
oi->typep = NULL;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo);
+ if (oi->sourcep && !ret)
+ oi->sourcep->source = &loose->base;
if (!ret)
oi->whence = OI_LOOSE;
}
diff --git a/packfile.c b/packfile.c
index 688c410b35..fa22095b75 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1324,7 +1324,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
+int packed_object_info_with_index_pos(struct odb_source_packed *source,
struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi)
{
@@ -1424,6 +1424,10 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
oi->whence = OI_PACKED;
if (oi->sourcep) {
+ if (!source)
+ BUG("cannot request source without an owning source");
+ oi->sourcep->source = &source->base;
+
oi->sourcep->u.packed.offset = obj_offset;
oi->sourcep->u.packed.pack = p;
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 2/6] odb: make backend-specific fields optional
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The `struct object_info` carries two pieces of information
about how an object was looked up:
- The `whence` enum identifying the backend.
- The backend-tagged union `u` exposing backend-specific details
(currently only the packed-source case, which records the owning
pack, offset and packed object type).
The union is populated unconditionally, even though most callers don't
care about provenance at all.
Split the backend-specific union out into a new public type, `struct
object_info_source`, and make the object info structure carry it via
just another opt-in request pointer. As with all the other requestable
information, callers that need source info allocate a `struct
object_info_source` on the stack and point `sourcep` at it; callers that
don't care about it simply leave the field as a `NULL` pointer. Adapt
callers accordingly.
Note that the `whence` enum is strictly-speaking also backend-specific
information, so it would be another good candidate to be moved into the
`struct object_info_source`. For now though it is left alone, as it will
be replaced by a `struct odb_source` pointer in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 8 +++++--
builtin/index-pack.c | 8 +++++--
builtin/pack-objects.c | 15 +++++++++----
odb.c | 3 ++-
odb.h | 60 +++++++++++++++++++++++++++++++++-----------------
packfile.c | 33 ++++++++++++++-------------
reachable.c | 5 ++++-
7 files changed, 87 insertions(+), 45 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8726485f1f..adc626ce30 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -835,7 +835,8 @@ static int batch_one_object_oi(const struct object_id *oid,
{
struct for_each_object_payload *payload = _payload;
if (oi && oi->whence == OI_PACKED)
- return payload->callback(oid, oi->u.packed.pack, oi->u.packed.offset,
+ return payload->callback(oid, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset,
payload->payload);
return payload->callback(oid, NULL, 0, payload->payload);
}
@@ -906,7 +907,10 @@ static void batch_each_object(struct batch_options *opt,
&payload, flags);
}
} else {
- struct object_info oi = { 0 };
+ struct object_info_source oi_source;
+ struct object_info oi = {
+ .sourcep = &oi_source,
+ };
for (source = the_repository->objects->sources; source; source = source->next) {
struct odb_source_files *files = odb_source_files_downcast(source);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f396658468..77af26db8f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1825,11 +1825,15 @@ static void repack_local_links(void)
oidset_iter_init(&outgoing_links, &iter);
while ((oid = oidset_iter_next(&iter))) {
- struct object_info info = OBJECT_INFO_INIT;
+ struct object_info_source info_source;
+ struct object_info info = {
+ .sourcep = &info_source,
+ };
+
if (odb_read_object_info_extended(the_repository->objects, oid, &info, 0))
/* Missing; assume it is a promisor object */
continue;
- if (info.whence == OI_PACKED && info.u.packed.pack->pack_promisor)
+ if (info.whence == OI_PACKED && info_source.u.packed.pack->pack_promisor)
continue;
if (!cmd.args.nr) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 620d9ce085..9deb37e9e8 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4491,8 +4491,9 @@ static int add_object_in_unpacked_pack(const struct object_id *oid,
void *data UNUSED)
{
if (cruft) {
- add_cruft_object_entry(oid, OBJ_NONE, oi->u.packed.pack,
- oi->u.packed.offset, NULL, *oi->mtimep);
+ add_cruft_object_entry(oid, OBJ_NONE, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset, NULL,
+ *oi->mtimep);
} else {
add_object_entry(oid, OBJ_NONE, "", 0);
}
@@ -4509,8 +4510,10 @@ static void add_objects_in_unpacked_packs(void)
ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS,
};
+ struct object_info_source oi_source;
struct object_info oi = {
.mtimep = &mtime,
+ .sourcep = &oi_source,
};
odb_prepare_alternates(to_pack.repo->objects);
@@ -5000,10 +5003,14 @@ static int option_parse_cruft_expiration(const struct option *opt UNUSED,
static int is_not_in_promisor_pack_obj(struct object *obj, void *data UNUSED)
{
- struct object_info info = OBJECT_INFO_INIT;
+ struct object_info_source info_source;
+ struct object_info info = {
+ .sourcep = &info_source,
+ };
+
if (odb_read_object_info_extended(the_repository->objects, &obj->oid, &info, 0))
BUG("should_include_obj should only be called on existing objects");
- return info.whence != OI_PACKED || !info.u.packed.pack->pack_promisor;
+ return info.whence != OI_PACKED || !info_source.u.packed.pack->pack_promisor;
}
static int is_not_in_promisor_pack(struct commit *commit, void *data) {
diff --git a/odb.c b/odb.c
index 7d555be09f..99f4e7551c 100644
--- a/odb.c
+++ b/odb.c
@@ -692,7 +692,8 @@ static int oid_object_info_convert(struct repository *r,
}
}
input_oi->whence = new_oi.whence;
- input_oi->u = new_oi.u;
+ if (input_oi->sourcep)
+ *input_oi->sourcep = *new_oi.sourcep;
return ret;
}
diff --git a/odb.h b/odb.h
index 3834a0dcbf..770900289a 100644
--- a/odb.h
+++ b/odb.h
@@ -248,6 +248,38 @@ int odb_pretend_object(struct object_database *odb,
void *buf, size_t len, enum object_type type,
struct object_id *oid);
+/*
+ * Object information that can be used to uniquely identify an object and learn
+ * more about how exactly it is stored.
+ */
+struct object_info_source {
+ /*
+ * Backend-specific information about the specific object. This can be
+ * used for example to uniquely identify a given object in case it
+ * exists multiple times.
+ */
+ union {
+ /*
+ * struct {
+ * ... Nothing to expose in this case
+ * } cached;
+ * struct {
+ * ... Nothing to expose in this case
+ * } loose;
+ */
+ struct {
+ struct packed_git *pack;
+ off_t offset;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
+ } packed;
+ } u;
+};
+
struct object_info {
/* Request */
enum object_type *typep;
@@ -269,32 +301,20 @@ struct object_info {
*/
time_t *mtimep;
+ /*
+ * Backend-specific information that tells the caller where exactly an
+ * object was looked up from. This information should help disambiguate
+ * object lookups in case the same object exists in multiple sources,
+ * or multiple times in the same source.
+ */
+ struct object_info_source *sourcep;
+
/* Response */
enum {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
} whence;
- union {
- /*
- * struct {
- * ... Nothing to expose in this case
- * } cached;
- * struct {
- * ... Nothing to expose in this case
- * } loose;
- */
- struct {
- struct packed_git *pack;
- off_t offset;
- enum packed_object_type {
- PACKED_OBJECT_TYPE_UNKNOWN,
- PACKED_OBJECT_TYPE_FULL,
- PACKED_OBJECT_TYPE_OFS_DELTA,
- PACKED_OBJECT_TYPE_REF_DELTA,
- } type;
- } packed;
- } u;
};
/*
diff --git a/packfile.c b/packfile.c
index 2b741d7a76..688c410b35 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1422,22 +1422,25 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
}
oi->whence = OI_PACKED;
- oi->u.packed.offset = obj_offset;
- oi->u.packed.pack = p;
- switch (type) {
- case OBJ_NONE:
- oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
- break;
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
+ if (oi->sourcep) {
+ oi->sourcep->u.packed.offset = obj_offset;
+ oi->sourcep->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_NONE:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
+ case OBJ_REF_DELTA:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
ret = 0;
diff --git a/reachable.c b/reachable.c
index 101cfc2727..2fc5b82d62 100644
--- a/reachable.c
+++ b/reachable.c
@@ -235,7 +235,8 @@ static int add_recent_object(const struct object_id *oid,
add_pending_object(data->revs, obj, "");
if (data->cb) {
if (oi->whence == OI_PACKED)
- data->cb(obj, oi->u.packed.pack, oi->u.packed.offset, *oi->mtimep);
+ data->cb(obj, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset, *oi->mtimep);
else
data->cb(obj, NULL, 0, *oi->mtimep);
}
@@ -252,9 +253,11 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
unsigned flags;
enum object_type type;
time_t mtime;
+ struct object_info_source oi_source;
struct object_info oi = {
.mtimep = &mtime,
.typep = &type,
+ .sourcep = &oi_source,
};
int r;
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 1/6] packfile: thread odb_source_packed through packed_object_info()
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
Add an optional `struct odb_source_packed *source` parameter to
`packed_object_info()` and `packed_object_info_with_index_pos()`. This
parameter is unused at this point in time, but it will be used in a
follow-up commit so that we can record the source of a specific object.
Note that callers in "odb/source-packed.c" pass the already-available
source, but all other callers pass `NULL` instead. This is fine though,
as we only care about populating this info when called via the packed
store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 2 +-
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
odb/source-packed.c | 4 ++--
pack-bitmap.c | 2 +-
packfile.c | 8 +++++---
packfile.h | 6 ++++--
t/helper/test-bitmap.c | 2 +-
8 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0f3dbd9850..8726485f1f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -497,7 +497,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(pack, offset, &data->info);
+ ret = packed_object_info(NULL, pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bc5f9ef321..620d9ce085 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2463,7 +2463,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(NULL, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3804,7 +3804,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
ofs = nth_packed_object_offset(p, pos);
oi.typep = &type;
- if (packed_object_info(p, ofs, &oi) < 0) {
+ if (packed_object_info(NULL, p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index c6d9c5c740..9dc8bd5eee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1538,7 +1538,7 @@ static int add_packed_commits(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(pack, offset, &oi) < 0)
+ if (packed_object_info(NULL, pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
return add_packed_commits_oi(oid, &oi, data);
diff --git a/odb/source-packed.c b/odb/source-packed.c
index 42c28fba0e..43fb53b72d 100644
--- a/odb/source-packed.c
+++ b/odb/source-packed.c
@@ -59,7 +59,7 @@ static int odb_source_packed_read_object_info(struct odb_source *source,
if (!oi)
return 0;
- ret = packed_object_info(e.p, e.offset, oi);
+ ret = packed_object_info(packed, e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
@@ -99,7 +99,7 @@ static int odb_source_packed_for_each_object_wrapper(const struct object_id *oid
off_t offset = nth_packed_object_offset(pack, index_pos);
struct object_info oi = *data->request;
- if (packed_object_info_with_index_pos(pack, offset,
+ if (packed_object_info_with_index_pos(data->store, pack, offset,
&index_pos, &oi) < 0) {
mark_bad_packed_object(pack, oid);
return -1;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 83eb47a28b..35774b6f0c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1877,7 +1877,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(pack, ofs, &oi) < 0) {
+ if (packed_object_info(NULL, pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index 1d1b23b6cc..2b741d7a76 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1324,7 +1324,8 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
+int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
+ struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1446,10 +1447,11 @@ int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
return ret;
}
-int packed_object_info(struct packed_git *p, off_t obj_offset,
+int packed_object_info(struct odb_source_packed *source,
+ struct packed_git *p, off_t obj_offset,
struct object_info *oi)
{
- return packed_object_info_with_index_pos(p, obj_offset, NULL, oi);
+ return packed_object_info_with_index_pos(source, p, obj_offset, NULL, oi);
}
static void *unpack_compressed_entry(struct packed_git *p,
diff --git a/packfile.h b/packfile.h
index 2329a69701..e1f77152b5 100644
--- a/packfile.h
+++ b/packfile.h
@@ -320,9 +320,11 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct packed_git *pack,
+int packed_object_info(struct odb_source_packed *source,
+ struct packed_git *pack,
off_t offset, struct object_info *);
-int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
+int packed_object_info_with_index_pos(struct odb_source_packed *source,
+ struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index b130832b81..8547ef67e2 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -52,7 +52,7 @@ static int add_packed_object(const struct object_id *oid,
entry = packlist_alloc(packed, oid);
entry->idx.offset = nth_packed_object_offset(pack, pos);
- if (packed_object_info(pack, entry->idx.offset, &oi) < 0)
+ if (packed_object_info(NULL, pack, entry->idx.offset, &oi) < 0)
die("could not get type of object %s",
oid_to_hex(oid));
oe_set_type(entry, type);
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 0/6] odb: refactor source-specific information in object info
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
Hi,
this patch series refactors `struct object_info` to not contain the
`whence` field anymore.
This field only gave the caller information about the type of source
this was read from, but it didn't allow them to figure out which source
specifically yielded the object. So instead, we replace this information
with a new `struct object_info_source` field that both contains info
about the source, and any backend-specific data.
With this in place we can re-query the same backend for any given
object. More importantly though, we can eventually also use the backend-
specific data to also uniquely identify any given object, e.g. by
recording the packfile and offset, so that we can even yield the same
object in case one source contains the object multiple times.
Furthermore, with this change all information in `struct object_info` is
now following the same request-response-field style.
The series is built on top of 26d8d94e94 (A few more topics before -rc2,
2026-06-21) with ps/odb-source-packed at 1bba3c035d (odb/source-packed:
drop pointer to "files" parent source, 2026-06-17) merged into it.
Thanks!
Patrick
---
Patrick Steinhardt (6):
packfile: thread odb_source_packed through packed_object_info()
odb: make backend-specific fields optional
odb: add `source` field to struct object_info_source
treewide: convert users of `whence` to the new source field
odb: drop `whence` field from object info
odb: document object info fields
builtin/cat-file.c | 12 +++++---
builtin/index-pack.c | 9 ++++--
builtin/pack-objects.c | 19 ++++++++----
commit-graph.c | 2 +-
odb.c | 4 +--
odb.h | 80 +++++++++++++++++++++++++++++++++++---------------
odb/source-inmemory.c | 3 +-
odb/source-loose.c | 4 +--
odb/source-packed.c | 4 +--
pack-bitmap.c | 2 +-
packfile.c | 45 ++++++++++++++++------------
packfile.h | 6 ++--
reachable.c | 7 +++--
t/helper/test-bitmap.c | 2 +-
14 files changed, 130 insertions(+), 69 deletions(-)
---
base-commit: 969dbd51a70f9105ee9965adec5c5a02e75ab5b3
change-id: 20260612-b4-pks-odb-drop-whence-1b0af9ab16f4
^ permalink raw reply
* [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier. Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
Step counts measured with trace2 on git.git with commit-graph:
merge-base --all v2.0.0 v2.55.0-rc1:
before: 72264 steps after: 44589 steps
merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
.../technical/paint-down-to-common.adoc | 17 ++++++++++++
commit-reach.c | 27 ++++++++++++++-----
t/t6600-test-reach.sh | 4 +--
3 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index 0f4e1892a5..983dfcf233 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -94,6 +94,9 @@ ends when one of the following conditions holds:
1. The queue is empty.
2. The queue contains only stale entries.
+ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
+ remain in the queue, no pending merge-base candidates exist,
+ and the walk has entered the finite-generation region.
Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
@@ -104,6 +107,20 @@ existing candidates by proving one is an ancestor of another, but
`remove_redundant()` handles that as a post-processing step, so it
is safe to exit early.
+Side-exhaustion condition
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A new merge-base requires commits from both sides to meet. When one
+side's exclusive counter reaches zero and there are no pending
+merge-base candidates, no future traversal step can produce a new
+candidate.
+
+This optimization only activates in the finite-generation region
+where topological ordering holds. In that region, children are
+always visited before parents, so paint flags are final at visit
+time and an exhausted side cannot reappear. In the INFINITY region,
+commit-date ordering can violate this guarantee, so the check is
+skipped.
+
Related documentation
---------------------
diff --git a/commit-reach.c b/commit-reach.c
index e0d9874f99..f79d0b64d6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
static struct commit *paint_queue_get(struct paint_state *state)
{
- struct commit *commit;
+ struct commit *commit = prio_queue_get(&state->queue);
- if (!state->p1_count && !state->p2_count &&
- !state->pending_merge_bases)
+ if (!commit)
return NULL;
- commit = prio_queue_get(&state->queue);
- if (commit) {
- commit->object.flags &= ~ENQUEUED;
- paint_count_update(state, commit->object.flags, -1);
+ commit->object.flags &= ~ENQUEUED;
+
+ if (!state->pending_merge_bases) {
+ if (!state->p1_count && !state->p2_count)
+ return NULL;
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
+ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
+ if ((!state->p1_count || !state->p2_count) &&
+ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
+ return NULL;
}
+
+ paint_count_update(state, commit->object.flags, -1);
return commit;
}
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c1109fb42f..03175befb3 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
cp commit-graph-full .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
+ test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
cp commit-graph-half .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 81 <trace-half.txt
+ test_trace2_data paint_down_to_common steps 57 <trace-half.txt
'
test_expect_success 'reduce_heads' '
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
unused after the previous commit. The core nonstale_queue functions
remain in use by ahead_behind().
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 18 ------------------
1 file changed, 18 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index bf102f5e28..e0d9874f99 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
queue->max_nonstale = NULL;
}
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
- struct commit *c)
-{
- if (c->object.flags & ENQUEUED)
- return;
- c->object.flags |= ENQUEUED;
- nonstale_queue_put(queue, c);
-}
-
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-{
- struct commit *commit = nonstale_queue_get(queue);
-
- if (commit)
- commit->object.flags &= ~ENQUEUED;
- return commit;
-}
-
/*
* Priority queue with per-side commit counters for paint_down_to_common().
* Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
--
gitgitgadget
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox