Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] branch: suggest <remote>/<branch> on upstream slip
From: Junio C Hamano @ 2026-06-22 21:35 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <xmqq1pdytkmj.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Harald Nordgren <haraldnordgren@gmail.com>
>>
>> "git branch --set-upstream-to origin main" reads the trailing word as
>> the local branch to operate on and dies with "branch 'main' does not
>> exist", pointing at the wrong problem.
>
> When 'main' does not exist locally,
>
>     $ git branch --set-upstream-to "$anything" main
>
> would fail before even looking at the "$anything" (which is supposed
> to specify the new_upstream for the named local branch 'main').  The
> operation is to set the upstream for 'main', and if 'main' does not
> exist, doesn't the user deserve the error that says 'main' does not
> exist, no matter what "$anything" is, whether it is a well-formed or
> ill-formed remote tracking branch name?
>
> So it is unclear, at least to me, why "branch 'main' does not exist"
> is an inappropriate message, mostly because these three lines does
> not clearly tell me what the user _expected_ the command line to do.

After pondering on this a bit, I _think_ (but I am guessing, and
your job as an author of proposed commit log message is to make sure
your readers do not have to guess) what the user expected was to set
the upstream for the currrent branch.

    When trying to set the upstream for the current branch to "main"
    branch of the remote "origin", i.e.,

        $ git branch --set-upstream-to origin/main

    it is easy for some users to mistakenly say

        $ git branch --set-upstream-to origin main

    But it is a request to set the upstream for the local branch
    "main" to "origin", which is not expected to work as the
    upstream most likely would look like <remote>/<branch> (e.g.,
    "origin/main").  The user would get either one of these errors:

        fatal: branch 'main' does not exist
        fatal: the requested upstream branch 'origin' does not exist

    Give a hint that we _suspect_ the user may have meant to set the
    upstream of the current branch to 'origin/main' (but do so only
    when 'origin/main' does exist), and tell them the right way to
    spell that request.

or something perhaps?

^ permalink raw reply

* Re: [PATCH 0/2] branch/push: suggest intended form when remote/branch slip given
From: Junio C Hamano @ 2026-06-22 21:16 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <pull.2331.git.git.1781262619.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> When the repository or upstream argument is a slip like "origin/main" or
> "origin main", suggest the intended "git push origin main" or "git branch
> --set-upstream-to=origin/main" form instead of failing with an unrelated
> error.

Sorry for asking a question that may be stupid, but what does the
word "slip" mean in the context of the above sentence?  I am having
a hard time coming up with a topic name while queuing these two
patches (an obvious candidate is hn/branch-push-slip-advise but I do
not know how well the word sits there).

Thanks.

^ permalink raw reply

* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson @ 2026-06-22 21:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <4f9cae3c-5cef-420b-954b-d1981d9d5a67@gmail.com>

On Mon, 22 Jun 2026 at 22:26, Derrick Stolee <stolee@gmail.com> wrote:
>
> I've used hyperfine [1] when doing specific performance tests
> in the past. You can build Git before and after and have hyperfine
> run the two modes and compare them:
>
>         hyperfine --warmup=3 \
>                 -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \
>                 -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B"
>
> [1] https://github.com/sharkdp/hyperfine

I can definitely use that, but I was thinking that the overhead
of operations such as repo_parse_commit would be high relative
to the overhead of the new paint_queue struct such that it would
be hard to properly measure and that it would be easier if I could
spread out that cost across multiple internal runs (which requires
a custom binary of some sort), but perhaps it's enough to just
show that there's no measurable regression here and then
hyperfine is indeed the right fit. I'll start with that and see if I need
to do anything more complex.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Karthik Nayak @ 2026-06-22 20:43 UTC (permalink / raw)
  To: Pablo Sabater, gitster
  Cc: peff, eric.peijian, chriscool, git, jltobler, toon,
	chandrapratap3519, Jonathan Tan, Calvin Wan
In-Reply-To: <20260619-ps-eric-work-rebase-v13-6-3d4c7315d2f8@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3712 bytes --]

Pablo Sabater <pabloosabaterr@gmail.com> writes:

[snip]

> diff --git a/connect.c b/connect.c
> index 1dced8e632..78c69d4485 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -700,16 +700,16 @@ int server_supports(const char *feature)
>  	return !!server_feature_value(feature, NULL);
>  }
>
> -void write_fetch_command_and_capabilities(struct strbuf *req_buf,
> -					  const struct string_list *server_options)
> +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
> +				    const struct string_list *server_options)
>  {
>  	const char *hash_name;
>  	int advertise_sid;
>
>  	repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
>
> -	ensure_server_supports_v2("fetch");
> -	packet_buf_write(req_buf, "command=fetch");
> +	ensure_server_supports_v2(command);
> +	packet_buf_write(req_buf, "command=%s", command);
>  	if (server_supports_v2("agent"))
>  		packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
>  	if (advertise_sid && server_supports_v2("session-id"))
> @@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
>  			die(_("mismatched algorithms: client %s; server %s"),
>  			    the_hash_algo->name, hash_name);
>  		packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
> -	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
> +	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
>  		die(_("the server does not support algorithm '%s'"),
>  		    the_hash_algo->name);
>  	}

Why did we make this change? If the server doesn't support v2, then the
object format should be `GIT_HASH_SHA1_LEGACY`. While the value of it is
indeed `GIT_HASH_SHA1`, it indicates a scenario where there was no
option to select object hash, which is the scenario here.

If there is a reason to make such a change, perhaps we should highlight
this in the commit message.

> diff --git a/connect.h b/connect.h
> index c4f6ea4b0a..8f4c523892 100644
> --- a/connect.h
> +++ b/connect.h
> @@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
>  			       struct packet_reader *reader,
>  			       const char *error);
>
> +/*
> + * Writes a command along with the requested server capabilities/features into a
> + * request buffer.
> + */
>  struct string_list;

The comment should be above the function and not the forward
declaration.

While we're here, why not `#include "string-list.h"` and remove the
forward declaration, is there a circular dependency?

> -void write_fetch_command_and_capabilities(struct strbuf *req_buf,
> -					  const struct string_list *server_options);
> +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
> +				    const struct string_list *server_options);
>
>  #endif
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 4a8a70b5f3..3d32114907 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1387,7 +1387,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
>  	int done_sent = 0;
>  	struct strbuf req_buf = STRBUF_INIT;
>
> -	write_fetch_command_and_capabilities(&req_buf, args->server_options);
> +	write_command_and_capabilities(&req_buf, "fetch", args->server_options);
>
>  	if (args->use_thin_pack)
>  		packet_buf_write(&req_buf, "thin-pack");
> @@ -2255,7 +2255,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_restrict_tips,
>  					   the_repository, "%d",
>  					   negotiation_round);
>  		strbuf_reset(&req_buf);
> -		write_fetch_command_and_capabilities(&req_buf, server_options);
> +		write_command_and_capabilities(&req_buf, "fetch", server_options);
>
>  		packet_buf_write(&req_buf, "wait-for-done");
>
>
> --
> 2.54.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] push: suggest <remote> <branch> for a slash slip
From: Junio C Hamano @ 2026-06-22 20:40 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <ea1412b1107f485cf52c953e387a513d95d82b53.1781262619.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Harald Nordgren <haraldnordgren@gmail.com>
>
> "git push origin/main" is treated as a repository and dies with
> "'origin/main' does not appear to be a git repository", with no hint
> that a space was meant instead of a slash.

This is easier for me to guess than what the user may have wanted to
do in the decription of [1/2].  But it still will be easier on
readers to say

    When pusing out up update the "main" branch to the remote
    "origin", i.e.,

        $ git push origin main

    it is easy for some users to mistakenly say

        $ git push origin/main

    instead.  This however instructs to push to remote "origin/main"
    with configured refspecs, which means a completely different
    thing.  Lucikly, often origin/main does not exist as a remote
    and the command fails without doing any harm, but still may
    leave the user puzzled what happened.  Give hint to ...

or something like that.

> When the argument is not an existing path or configured remote but its
> part before the first slash names one, suggest the intended
> "git push <remote> <branch>" form. The suggestion is shown as advice so
> it can be silenced with advice.pushRepoLooksLikeRef.

Sounds sensible.

>  	if (repo) {
>  		if (!add_remote_or_group(repo, &remote_group)) {
> +			const char *slash = strchr(repo, '/');
> +			struct remote *r;
> +
> +			/*
> +			 * A "<remote>/<branch>" argument that does not name
> +			 * a path is likely a slip for the separate
> +			 * "<remote> <branch>" form, so suggest that instead.
> +			 */
> +			if (slash && slash[1] && !file_exists(repo)) {
> +				struct strbuf name = STRBUF_INIT;
> +
> +				strbuf_add(&name, repo, slash - repo);
> +				if (remote_is_configured(remote_get(name.buf), 0)) {
> +					int code = die_message(_("'%s' is not a valid push target"), repo);
> +					advise_if_enabled(ADVICE_PUSH_REPO_LOOKS_LIKE_REF,
> +							  _("Did you mean to use: git push %s %s?"),
> +							  name.buf, slash + 1);
> +					strbuf_release(&name);
> +					exit(code);
> +				}
> +				strbuf_release(&name);
> +			}

Hmph, if this class of hint is not enabled, do we still have to
spend cycles on these "is this a remote?  is the first token a
remote?" computation?  I would have expected that a change here
would be a two-liner:

    if (!add_remote_or_group(...)) {
+	if (advise_enabled(ADVICE_PUSH_REPO_LOOKS_LIKE_REF))
+		die_if_plausible_typo(...);
	... do the "try treating it as a direct URL or path" thing ...
    }


with the bulk of the "if it has slash, it is not a file, then advise
and die" logic inside the new helper function.

What I find especially troubling is that even when advise for this
class of hint is not enabled, the new code will hit the new exit(),
without falling back to the "try treating it as a direct URL or
path" thing.  Or am I missing something?

Thanks.


^ permalink raw reply

* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
From: Derrick Stolee @ 2026-06-22 20:28 UTC (permalink / raw)
  To: Kristofer Karlsson; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <CAL71e4M0T4fFG4JuYTp_ZPHzNcHXf342Xkh0n0dt4LVKsuSu2Q@mail.gmail.com>

On 6/22/2026 3:25 PM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote:
>> It's usually my preference to see these tests show up before the
>> new code arrives, that way we can see that they already work with
>> the old logic and continue to work with the new logic.
>>
>> It's minor, but putting them after your code change may be adding
>> enforcement of a change of behavior.
> 
> Agreed, I actually also prefer that in practice so I am not
> sure why I ordered them this way - perhaps some attempt at
> making it easier to review (show the idea and change before
> the verification). I will reorder to put all new tests as the first commit
> (or second, if I will also introduce a status-quo technical first).
> 
>>
>> One thing that could be helpful here is to consider tracing a
>> count of "commits walked" in the merge-base code, then you could
>> have these tests demonstrate the performance benefit by checking
>> for that number changing.
> 
> Good idea, I actually had some of that locally when developing it,
> but I removed the ugly traces before submitting this. I will try to
> re-introduce that in a nice way. It would be neat to let tests
> inspect that side effect, though in the worst case that could make
> it fragile. At the very least it's good for human debugging though.

And to be clear, I'm suggesting using trace2_data_intmax() calls
to get structured data that can be parsed in the GIT_TRACE2_EVENT
logs during tests. It could also be picked up by teletry tools that
listen to trace2 output, if desired.

It will show up differently in GIT_TRACE2_PERF, but that's a nice
human-readable way to debug things.

>> In t6600, that tracing number would not be the same across the
>> three different data shapes (full graph, half graph, no graph) and
>> that could be valuable to demonstrate in tests.
> 
> Agreed, the number of commits visited would be more interesting
> than the relative performance numbers since it's an algorithmic
> change rather than a micro-optimization.
They are both interesting, but only the commit count can be
guaranteed rigorously in the test suite. It's possible that a
great improvement to such a trace doesn't result in great end-to-
end time improvement, but I believe that it is true in this case.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-22 20:26 UTC (permalink / raw)
  To: Kristofer Karlsson
  Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <CAL71e4NJZ9c_=0W4djRFCYPw4z_dkh_ZHEDWBk8cuwXhxT9jgw@mail.gmail.com>

On 6/22/2026 3:19 PM, Kristofer Karlsson wrote:

> I think I may need to create some type of (temporary, internal)
> test runner that runs the same walk multiple times to reduce
> the noise from parsing commits.
I've used hyperfine [1] when doing specific performance tests
in the past. You can build Git before and after and have hyperfine
run the two modes and compare them:

	hyperfine --warmup=3 \
		-n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \
		-n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B"

[1] https://github.com/sharkdp/hyperfine

Good luck!
-Stolee


^ permalink raw reply

* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
From: Derrick Stolee @ 2026-06-22 20:23 UTC (permalink / raw)
  To: Kristofer Karlsson
  Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <CAL71e4Pcw-UUbHBw_j6PFx2bXmxZ93VLMWG+3Qap=RmCJa_ZgA@mail.gmail.com>

On 6/22/2026 3:14 PM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>>

>> Also: technically "case 0" should be a BUG() state, right? We
>> shouldn't be walking any commit that isn't reachable from at
>> least one side. (case 0 does happen for old_paint, though.)
> 
> No, this is actually intended - initially I started with skipping
> case 0 and let it fall through, but that would hide _other_ bugs.
> I use 0 as a marker for "not in the queue" so we have this:
> Enqueuing: 0 -> flags
> Dequeueing: flags -> 0
> Only the case with the modified commit being in the queue
> will have non-zero flags. I tried to document this, but perhaps
> it is not clear enough, I will see if I can rephrase it, or add an
> inline comment around the case itself.

I bet this would be obvious if I tried to change the code and
run the tests. thanks for the explanation.

>>> +     while ((commit = paint_queue_get(&queue))) {
>> ...> +
>>> +             if (queue.p1_count + queue.p2_count +
>>> +                 queue.pending_merge_bases == 0)
>>> +                     break;
>>>       }
>> When possible, I like to try to make loops only have one terminating
>> condition. Should we have paint_queue_get() return NULL when it sees
>> this internal state condition?
> 
> Possibly, but that would couple the paint_queue struct very tightly with
> the usage. Not a problem in practice since it only has one call site, and
> it's unlikely that we want to add more of them but it may feel more natural
> to let the paint_queue purely have the queue semantics and counters,
> and keep the halt condition within the function itself. I don't feel
> super-strongly about this and can change it if needed, I will just need to
> verify that nothing else gets complex as a result, I have not fully thought
> through the effects.

Hm. Interesting. The coupling is perhaps expected, because the data
structure tracks counts that don't otherwise need to be tracked.
Maybe the terminating condition method could be descriptively named
to say why it would be completing.

>> Also, I'd rather see it of the form of (!count) instead of using
>> addition to make it clear that we care about each value being zero.
> 
> I did consider that, and most of the code in commit-reach.c at least
> prefers x and !x over x != 0 and x == 0, but my thinking was that
> other code in the repo did use comparison operators specifically
> for things like counters. Happy to change it to conform better though!
I just worry about the idea that a negative number (or an addition
overflow) would create conditions for termination that we did not
intend. That's why using the nonzero status as true/false combined
with ands and ors is better.

>> Finally, I think we actually want this case to get the benefit:
>>
>>         if ((!queue.p1_count || !queue.p2_count) &&
>>             !queue.pending_merge_bases)
>>
>> I do see that you have this condition in patch 3 with the extra
>> detail that the max generation in the queue is finite. I think this
>> is more reason to include this in the data structure method and not
>> in the loop.
> 
> Yes, but just to be clear, you don't want to merge together patch 2 and 3
> here, just grouping the halt conditions closer together
> (within paint_queue_get)? Keeping patch 2 and 3 separate would be nice
> to make it easier to show that introducing this extra counter bookkeeping
> does not negatively impact the overall performance too much.
No, I don't want you to squash them. I was perhaps unclear as I was
discovering the structure as we went. The thing I was missing above
was the "finite generation number" condition, which you make very
clear in patch 3.

Thanks,
-Stolee



^ permalink raw reply

* Re: [PATCH 1/2] branch: suggest <remote>/<branch> on upstream slip
From: Junio C Hamano @ 2026-06-22 19:56 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <21684539debaf433b6b63404e1a7622a5cc33283.1781262619.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Harald Nordgren <haraldnordgren@gmail.com>
>
> "git branch --set-upstream-to origin main" reads the trailing word as
> the local branch to operate on and dies with "branch 'main' does not
> exist", pointing at the wrong problem.

When 'main' does not exist locally,

    $ git branch --set-upstream-to "$anything" main

would fail before even looking at the "$anything" (which is supposed
to specify the new_upstream for the named local branch 'main').  The
operation is to set the upstream for 'main', and if 'main' does not
exist, doesn't the user deserve the error that says 'main' does not
exist, no matter what "$anything" is, whether it is a well-formed or
ill-formed remote tracking branch name?

So it is unclear, at least to me, why "branch 'main' does not exist"
is an inappropriate message, mostly because these three lines does
not clearly tell me what the user _expected_ the command line to do.

When 'main' does exist, but named upstream "$anything" does not, we
get

    $ git branch sample master ;# make sure the thing exists
    $ git branch --set-upstream-to origin sample
    fatal: the requested upstream branch 'origin' does not exist
    hint:
    hint: If you are planning on basing your work on an upstream
    hint: branch that already exists at the remote, you may need to
    hint: run "git fetch" to retrieve it.
    hint:
    hint: If you are planning to push out a new local branch that
    hint: will track its remote counterpart, you may want to use
    hint: "git push -u" to set the upstream config as you push.
    hint: Disable this message with "git config set advice.setUpstreamFailure false"

which does sound clear enough to me, even though it does not exactly
say "Even though upstream branch 'origin' does not exist, 'origin'
is a nickname for a remote, perhaps you meant to say
origin/something?"

I do not doubt you are trying to address a real issue, but the above
three-line description does not tell me what that problem is.

Now I do not regularly use --set-upstream-to, so I may be missing an
obvious common mistake modes, but a couple of my attempts to make
bad command invocations seem to give me reasonable responses:

    $ git branch --set-upstream-to ko/master sample
    branch 'sample' set up to track 'ko/master'.

OK, both are well formed so no problem.

    $ git branch --set-upstream-to ko/mastre sample
    fatal: the requested upstream branch 'ko/mastre' does not exist
    hint:
    hint: If you are planning on basing your work on an upstream
    hint: branch that already exists at the remote, you may need to
    hint: run "git fetch" to retrieve it.
    hint:
    hint: If you are planning to push out a new local branch that
    hint: will track its remote counterpart, you may want to use
    hint: "git push -u" to set the upstream config as you push.
    hint: Disable this message with "git config set advice.setUpstreamFailure false"

Misspelt upstream branch name diagnosed correctly, just like the
case where I gave 'origin', which does not exist, either.

^ permalink raw reply

* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
From: Kristofer Karlsson @ 2026-06-22 19:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <50dd5fb1-6b4e-448c-977c-cdc476f7fe40@gmail.com>

On Mon, 22 Jun 2026 at 20:21, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like the idea of documenting this so it's easier to understand.

Yes I was myself thinking that I can prove it to myself now that it works,
and anyone else could also prove it to themselves, but having it
explicit here is even better. I found the other documents
(i.e. commit-graph) to be a good source of inspiration here.

> There is risk of drift from the actual implementation. You may want
> to add a comment to the method in commit-reach.c to indicate that
> any change should be reflected in this document.

Good idea, will add that.

> > +Termination
> > +-----------
> > +
> > +Termination happens when we can prove that no extra progress is
> > +possible. We are done with the main loop when one of the following
> > +conditions holds:
> > +
> > +  1. The queue is empty.
> > +  2. The queue only contains STALE entries.
> > +  3. Side-exhaustion: the walk has reached the finite region and one
> > +     of the sides is fully exhausted.
> It could be an interesting exercise, but potentially wasteful, to
> add this document as a Patch 1, but reflecting the old algorithm
> and then to update the document at the same time as you update the
> code.

I did consider that initially but I was worried it would be considered
noisy. I am quite happy to rework it in a way that first
explains the status quo. That would make the document diff
more interesting. Agreed that should become the first patch,
and the patch that changes the algorithm should include
the documentation change.

> The changes in your patch 2 would impact this doc in terms of the
> data being tracked by the paint_queue data structure instead of the
> nonstale_queue structure (though those details are not currently
> handled in the current version). The change to the termination
> condition would come along with patch 3.

Agreed, I would need to rephrase from tracking non-stale
to tracking counts of p1 and p2 (and pending merge bases) commits,
but I think that would be a small tweak and well worth doing.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
From: Kristofer Karlsson @ 2026-06-22 19:25 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <1588b53d-9576-4752-9459-da48276e4b2a@gmail.com>

On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote:
> It's usually my preference to see these tests show up before the
> new code arrives, that way we can see that they already work with
> the old logic and continue to work with the new logic.
>
> It's minor, but putting them after your code change may be adding
> enforcement of a change of behavior.

Agreed, I actually also prefer that in practice so I am not
sure why I ordered them this way - perhaps some attempt at
making it easier to review (show the idea and change before
the verification). I will reorder to put all new tests as the first commit
(or second, if I will also introduce a status-quo technical first).

>
> One thing that could be helpful here is to consider tracing a
> count of "commits walked" in the merge-base code, then you could
> have these tests demonstrate the performance benefit by checking
> for that number changing.

Good idea, I actually had some of that locally when developing it,
but I removed the ugly traces before submitting this. I will try to
re-introduce that in a nice way. It would be neat to let tests
inspect that side effect, though in the worst case that could make
it fragile. At the very least it's good for human debugging though.

> In t6600, that tracing number would not be the same across the
> three different data shapes (full graph, half graph, no graph) and
> that could be valuable to demonstrate in tests.

Agreed, the number of commits visited would be more interesting
than the relative performance numbers since it's an algorithmic
change rather than a micro-optimization.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson @ 2026-06-22 19:19 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <5c43f6ce-4dfe-47dd-b96a-80de57ecf108@gmail.com>

On Mon, 22 Jun 2026 at 20:12, Derrick Stolee <stolee@gmail.com> wrote:
> > +             if (generation < GENERATION_NUMBER_INFINITY &&
> > +                 queue.pending_merge_bases == 0 &&
> > +                 (queue.p1_count == 0 || queue.p2_count == 0))
> > +                     break;
> I mentioned it earlier, but I think this check should be in the
> dequeueing method instead of in the tail of the loop.

Yes, I will try to fold this one into the paint_queue_get as well.

> I like that you broke this out into its own patch to demonstrate
> that this is the key performance boost. It may be good to have
> some performance test numbers that demonstrate that patch 2 does
> not add any substantial overhead (timing should match previous
> code) and in patch 3 this single condition gets us a huge benefit,
> though it requires the data tracking of patch 2 to work.

Good point, I will try to run enough local tests to ensure that patch 2
does not add too much overhead to slow things down.
I think I may need to create some type of (temporary, internal)
test runner that runs the same walk multiple times to reduce
the noise from parsing commits. I am not sure if I should also
commit such a performance test or simply include a brief summary
in the commit message

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
From: Kristofer Karlsson @ 2026-06-22 19:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <f0c9eb6e-60b1-4eb6-86be-3af4d87afe85@gmail.com>

On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
>
> > +     if (!(old_paint & STALE)) {
> > +             switch (old_paint & (PARENT1 | PARENT2)) {
> > +             case 0:                  break;
> > +             case PARENT1:            queue->p1_count--; break;
> > +             case PARENT2:            queue->p2_count--; break;
> > +             case PARENT1 | PARENT2:  queue->pending_merge_bases--; break;
> > +             default:                 BUG("unexpected paint state");
> > +             }
> > +     }
> > +     if (!(new_paint & STALE)) {
> > +             switch (new_paint & (PARENT1 | PARENT2)) {
> > +             case 0:                  break;
> > +             case PARENT1:            queue->p1_count++; break;
> > +             case PARENT2:            queue->p2_count++; break;
> > +             case PARENT1 | PARENT2:  queue->pending_merge_bases++; break;
> > +             default:                 BUG("unexpected paint state");
> > +             }
> > +     }
>
> While correct and compact, I don't believe that these switch
> statements follow the coding guidelines. We should split the
> lines appropriately so they are more standard, such as:
>
> if (!(new_paint & STALE)) {
>         switch (new_paint & (PARENT1 | PARENT2)) {
>         case 0:
>                 break;
>
>         case PARENT1:
>                 queue->p1_count++;
>                 break;
>
>         case PARENT2:
>                 queue->p2_count++;
>                 break;
>
>         case PARENT1 | PARENT2:
>                 queue->pending_merge_bases++;
>                 break;
>
>         default:
>                 BUG("unexpected paint state");
>         }
> }

Agreed, I will change to that style. I did try to look for style guidelines
but I missed the .clang-format file (I was only looking through text files).
Apologies, will remember clang-format for next time (and v2)

> Also: technically "case 0" should be a BUG() state, right? We
> shouldn't be walking any commit that isn't reachable from at
> least one side. (case 0 does happen for old_paint, though.)

No, this is actually intended - initially I started with skipping
case 0 and let it fall through, but that would hide _other_ bugs.
I use 0 as a marker for "not in the queue" so we have this:
Enqueuing: 0 -> flags
Dequeueing: flags -> 0
Only the case with the modified commit being in the queue
will have non-zero flags. I tried to document this, but perhaps
it is not clear enough, I will see if I can rephrase it, or add an
inline comment around the case itself.

> > -static void clear_nonstale_queue(struct nonstale_queue *queue)
> > +static void paint_queue_put(struct paint_queue *queue,
> > +                         struct commit *c, unsigned add_flags)
> >  {
> > -     clear_prio_queue(&queue->pq);
> > -     queue->max_nonstale = NULL;
> > -}
> > +     unsigned old_flags = c->object.flags;
> > +     c->object.flags |= add_flags;
>
> Diffs like this are part of the reason I'd like to see a _new_
> data structure instead of replacing the old one. Keeping the
> old one for ahead_behind seems like a good idea to me, but even
> if we don't land on that end state then deleting the old code
> _after_ adding the new code will make the diff more readable.

Agreed, will address that.

> > -     struct nonstale_queue queue = {
> > -             { compare_commits_by_gen_then_commit_date }
> > +     struct paint_queue queue = {
> > +             .pq = { compare_commits_by_gen_then_commit_date }
> >       };
>
> I didn't notice when reading the struct definition, but looking at
> 'pq' here makes me think that we shouldn't be using that abbreviation
> as it could stand for "prio_queue" or "paint_queue".

Good point, I should pick a longer name for the field. Perhaps simply queue
(I want to avoid prio_queue since it exactly matches the name of the struct
which could be confusing.)

> > +     while ((commit = paint_queue_get(&queue))) {
> ...> +
> > +             if (queue.p1_count + queue.p2_count +
> > +                 queue.pending_merge_bases == 0)
> > +                     break;
> >       }
> When possible, I like to try to make loops only have one terminating
> condition. Should we have paint_queue_get() return NULL when it sees
> this internal state condition?

Possibly, but that would couple the paint_queue struct very tightly with
the usage. Not a problem in practice since it only has one call site, and
it's unlikely that we want to add more of them but it may feel more natural
to let the paint_queue purely have the queue semantics and counters,
and keep the halt condition within the function itself. I don't feel
super-strongly about this and can change it if needed, I will just need to
verify that nothing else gets complex as a result, I have not fully thought
through the effects.

> Also, I'd rather see it of the form of (!count) instead of using
> addition to make it clear that we care about each value being zero.

I did consider that, and most of the code in commit-reach.c at least
prefers x and !x over x != 0 and x == 0, but my thinking was that
other code in the repo did use comparison operators specifically
for things like counters. Happy to change it to conform better though!

> Finally, I think we actually want this case to get the benefit:
>
>         if ((!queue.p1_count || !queue.p2_count) &&
>             !queue.pending_merge_bases)
>
> I do see that you have this condition in patch 3 with the extra
> detail that the max generation in the queue is finite. I think this
> is more reason to include this in the data structure method and not
> in the loop.

Yes, but just to be clear, you don't want to merge together patch 2 and 3
here, just grouping the halt conditions closer together
(within paint_queue_get)? Keeping patch 2 and 3 separate would be nice
to make it easier to show that introducing this extra counter bookkeeping
does not negatively impact the overall performance too much.

Thanks! I appreciate the thorough review of this patch
(which I feared was the most annoying one to look at).

Kristofer

^ permalink raw reply

* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
From: Kristofer Karlsson @ 2026-06-22 18:53 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <001e8da6-3232-4cfa-ba6b-35d3489e4779@gmail.com>

On Mon, 22 Jun 2026 at 20:00, Derrick Stolee <stolee@gmail.com> wrote:
>
> This change is only needed if we are intending to delete the nonstale
> queue struct, which is currently happening in your patch 2. But we
> are essentially recreating its logic in a more disjointed way here,
> leaving this code in a worse state.
>
> I'd rather see patch 2 create a _new_ data structure instead of
> _replacing_ one that already works for multiple callers. (It does
> drop to only one caller, but that seems cleaner to me right now.)

I can definitely do that and leave ahead_behind unchanged for v2.
I was thinking that with only a single caller, and ahead_behind
being simpler than paint_down in this respect, it would be
worthwhile to simplify it, but if so I could instead do that as
a standalone follow up (though it may prove to be not enough
value for the win).

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-22 18:22 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.git.1781951820.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Hi,
> 
> This follows up on my RFC [1] with a concrete proposal. I expect the design
> to still be scrutinized, but that may be easier with actual code to look at.
> 
> I tried to make this easier to review by splitting into atomic patches. The
> first two patches are the meatiest parts, though they are pure refactoring.
> The behavior change is in patch 3 and is in itself quite small. The last
> patch adds technical documentation to support future development.
Thanks for putting this together carefully.

I gave some feedback on the specific code and the patch organization.
Overall, I believe that this implementation is functionally correct
and everything I have to say is about presentation and data gathering.

I look forward to a non-RFC v2.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
From: Derrick Stolee @ 2026-06-22 18:21 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <9cbfc67d724d91b9abc3621f03a3c97208c76a70.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add a technical document describing the paint_down_to_common()
> algorithm used for merge-base computation.

I like the idea of documenting this so it's easier to understand.

There is risk of drift from the actual implementation. You may want
to add a comment to the method in commit-reach.c to indicate that
any change should be reflected in this document.

> +Termination
> +-----------
> +
> +Termination happens when we can prove that no extra progress is
> +possible. We are done with the main loop when one of the following
> +conditions holds:
> +
> +  1. The queue is empty.
> +  2. The queue only contains STALE entries.
> +  3. Side-exhaustion: the walk has reached the finite region and one
> +     of the sides is fully exhausted.
It could be an interesting exercise, but potentially wasteful, to
add this document as a Patch 1, but reflecting the old algorithm
and then to update the document at the same time as you update the
code.

The changes in your patch 2 would impact this doc in terms of the
data being tracked by the paint_queue data structure instead of the
nonstale_queue structure (though those details are not currently
handled in the current version). The change to the termination
condition would come along with patch 3.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests
From: Derrick Stolee @ 2026-06-22 18:16 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <faf5bc98ede79965e23bfe1535127d6f52221680.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Add t6099 to test the case where multiple merge-base candidates exist
> and one is an ancestor of another. This exercises the side-exhaustion
> optimization in paint_down_to_common together with the
> remove_redundant safety net in get_merge_bases_many_0.

Same as the previous patch: I'd like to see these before the code
change. And if we trace a count of commits walked, we'd be able to
see the number change in this specific case.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
From: Derrick Stolee @ 2026-06-22 18:15 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <91372b975fbe102538c05c7d2cdae356539d1bbd.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
> 
>  - in_merge_bases_many:self: commit is both A and one of the X inputs
>  - get_merge_bases_many:duplicate-twos: duplicate entries in X list
>  - get_merge_bases_many:pending-stale: STALE transition on an
>    already-painted commit (ps-* diamond topology)
>  - get_merge_bases_many:infinity-both-sides: both tips outside the
>    commit-graph with non-monotonic dates (pi-* topology)

It's usually my preference to see these tests show up before the
new code arrives, that way we can see that they already work with
the old logic and continue to work with the new logic.

It's minor, but putting them after your code change may be adding
enforcement of a change of behavior.

One thing that could be helpful here is to consider tracing a
count of "commits walked" in the merge-base code, then you could
have these tests demonstrate the performance benefit by checking
for that number changing.

In t6600, that tracing number would not be the same across the
three different data shapes (full graph, half graph, no graph) and
that could be valuable to demonstrate in tests.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-22 18:12 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <ed12a5cb5b76925cff08d2ab61efeda382b4477a.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced in the previous commit. Once the walk
> enters the finite-generation region, terminate early when one side's
> exclusive count drops to zero -- no new merge-base can form without
> both paint sides meeting.
> 
> The check also waits for pending_merge_bases to reach zero, ensuring
> all merge-base candidates have been popped and recorded before
> exiting.
> 
> The INFINITY gate ensures correctness: commits without a commit-graph
> entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
> which is not topologically reliable. The optimization only fires
> once the walk enters the finite-generation region where ordering
> guarantees hold.
> 
> On large repositories with commit-graph, this yields 100-1000x
> speedups for merge-base queries where one side (e.g. a PR branch) is
> much smaller than the other.
> 
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>  commit-reach.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index ba1e896f0f..fcd1ad0167 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r,
>  		if (queue.p1_count + queue.p2_count +
>  		    queue.pending_merge_bases == 0)
>  			break;
> +
> +		/*
> +		 * Side exhaustion: a new merge-base can only form
> +		 * when both PARENT1-only and PARENT2-only commits
> +		 * remain in the queue.  In the finite-generation
> +		 * region the queue is ordered topologically, so
> +		 * no future step can add paint to visited commits
> +		 * and an exhausted side cannot reappear.
> +		 */
> +		if (generation < GENERATION_NUMBER_INFINITY &&
> +		    queue.pending_merge_bases == 0 &&
> +		    (queue.p1_count == 0 || queue.p2_count == 0))
> +			break;
I mentioned it earlier, but I think this check should be in the
dequeueing method instead of in the tail of the loop.

But I think this is the correct ending case.

I like that you broke this out into its own patch to demonstrate
that this is the key performance boost. It may be good to have
some performance test numbers that demonstrate that patch 2 does
not add any substantial overhead (timing should match previous
code) and in patch 3 this single condition gets us a huge benefit,
though it requires the data tracking of patch 2 to work.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
From: Derrick Stolee @ 2026-06-22 18:10 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <316e4dfe261043730c77142639f86f5c3cabe370.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>

> +	if (!(old_paint & STALE)) {
> +		switch (old_paint & (PARENT1 | PARENT2)) {
> +		case 0:                  break;
> +		case PARENT1:            queue->p1_count--; break;
> +		case PARENT2:            queue->p2_count--; break;
> +		case PARENT1 | PARENT2:  queue->pending_merge_bases--; break;
> +		default:                 BUG("unexpected paint state");
> +		}
> +	}
> +	if (!(new_paint & STALE)) {
> +		switch (new_paint & (PARENT1 | PARENT2)) {
> +		case 0:                  break;
> +		case PARENT1:            queue->p1_count++; break;
> +		case PARENT2:            queue->p2_count++; break;
> +		case PARENT1 | PARENT2:  queue->pending_merge_bases++; break;
> +		default:                 BUG("unexpected paint state");
> +		}
> +	}

While correct and compact, I don't believe that these switch
statements follow the coding guidelines. We should split the
lines appropriately so they are more standard, such as:

if (!(new_paint & STALE)) {
	switch (new_paint & (PARENT1 | PARENT2)) {
	case 0:
		break;

	case PARENT1:
		queue->p1_count++;
		break;

	case PARENT2:
		queue->p2_count++;
		break;

	case PARENT1 | PARENT2:
		queue->pending_merge_bases++;
		break;

	default:
		BUG("unexpected paint state");
	}
}

Also: technically "case 0" should be a BUG() state, right? We
shouldn't be walking any commit that isn't reachable from at
least one side. (case 0 does happen for old_paint, though.)

>  }
>  
> -static void clear_nonstale_queue(struct nonstale_queue *queue)
> +static void paint_queue_put(struct paint_queue *queue,
> +			    struct commit *c, unsigned add_flags)
>  {
> -	clear_prio_queue(&queue->pq);
> -	queue->max_nonstale = NULL;
> -}
> +	unsigned old_flags = c->object.flags;
> +	c->object.flags |= add_flags;

Diffs like this are part of the reason I'd like to see a _new_
data structure instead of replacing the old one. Keeping the
old one for ahead_behind seems like a good idea to me, but even
if we don't land on that end state then deleting the old code
_after_ adding the new code will make the diff more readable.

> -	struct nonstale_queue queue = {
> -		{ compare_commits_by_gen_then_commit_date }
> +	struct paint_queue queue = {
> +		.pq = { compare_commits_by_gen_then_commit_date }
>  	};

I didn't notice when reading the struct definition, but looking at
'pq' here makes me think that we shouldn't be using that abbreviation
as it could stand for "prio_queue" or "paint_queue".

> +	while ((commit = paint_queue_get(&queue))) {
...> +
> +		if (queue.p1_count + queue.p2_count +
> +		    queue.pending_merge_bases == 0)
> +			break;
>  	}
When possible, I like to try to make loops only have one terminating
condition. Should we have paint_queue_get() return NULL when it sees
this internal state condition?

Also, I'd rather see it of the form of (!count) instead of using
addition to make it clear that we care about each value being zero.

Finally, I think we actually want this case to get the benefit:

	if ((!queue.p1_count || !queue.p2_count) &&
	    !queue.pending_merge_bases)

I do see that you have this condition in patch 3 with the extra
detail that the max generation in the queue is finite. I think this
is more reason to include this in the data structure method and not
in the loop.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH v3 0/2] environment: move ignore_case into repo_config_values
From: Junio C Hamano @ 2026-06-22 18:01 UTC (permalink / raw)
  To: Tian Yuchen; +Cc: git, ps, phillip.wood123, johannes.schindelin, stolee
In-Reply-To: <b5a9115a-c909-405f-b150-f956d866b1eb@malon.dev>

Tian Yuchen <cat@malon.dev> writes:

> On 6/22/26 04:16, Junio C Hamano wrote:
>> As the compat/ layer is not meant as a general purpose POSIX
>> emulation wrapper that is generally reusable to projects other than
>> us, if we have a knob settable by end users to affect behaviours of
>> lower layer in compat/, it is natural to make repo-settings
>> available to them.
>
> I see.
>
>> What is the perceived problem you have in mind, and what are your
>> proposed alternatives?
>
> Actually, my reason for showing this question wasn’t because I thought 
> there were any architectural problem, but because I felt that for a file 
> in compat/win32, which is more on the _downstream_ side (is that 
> correct?), we need to exercise extra caution and confirm with its 
> maintainer whether the changes are appropriate. That’s why I CC'd 
> Johannes Schindelin on this.
>
> Was that the right thing to do?

Yup, Dscho is the right person to decide on the design issues on
Windows build.

^ permalink raw reply

* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
From: Derrick Stolee @ 2026-06-22 18:00 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <5492acda0ad05eab67198880a5262e84a3f22ba6.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Move ahead_behind() off the shared nonstale_queue abstraction to use
> a plain prio_queue with a local max_nonstale pointer. The nonstale
> tracking is inlined into insert_no_dup().
> 
> This prepares for replacing nonstale_queue with a paint_queue struct
> that tracks per-side commit counts, which ahead_behind() does not
> need. No behavior change.

This change is only needed if we are intending to delete the nonstale
queue struct, which is currently happening in your patch 2. But we
are essentially recreating its logic in a more disjointed way here,
leaving this code in a worse state.

I'd rather see patch 2 create a _new_ data structure instead of
_replacing_ one that already works for multiple callers. (It does
drop to only one caller, but that seems cleaner to me right now.)

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH 3/3] connected: search promisor objects generically
From: Junio C Hamano @ 2026-06-22 17:57 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260622-pks-connected-generic-promisor-checks-v1-3-25eba2698202@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> When performing connectivity checks we have to figure out whether any of
> the new objects are promisor objects, as we cannot assume full
> connectivity if so.
>
> This check is performed by iterating through all packfiles in the
> repository and searching each of them for the given object. Of course,
> this mechanism is quite specific to implementation details of the object
> database, as we assume that it uses packfiles in the first place.
>
> Refactor the logic so that we instead use `odb_for_each_object_ext()`
> with an object prefix filter and the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY`
> flag. This will yield all objects that have the exact object name and
> that are part of a promisor pack in a generic way.
> ...


> -		 *
> -		 * Before checking for promisor packs, be sure we have the
> -		 * latest pack-files loaded into memory.
>  		 */
> -		odb_reprepare(the_repository->objects);

Hmph?

>  		do {
> -			struct packed_git *p;
> -
> -			repo_for_each_pack(the_repository, p) {
> -				if (!p->pack_promisor)
> -					continue;
> -				if (find_pack_entry_one(oid, p))
> -					goto promisor_pack_found;
> +			opts.prefix = oid;
> +
> +			err = odb_for_each_object_ext(the_repository->objects,
> +						      NULL, promised_object_cb,
> +						      NULL, &opts);
> +			if (err < 0)
> +				break;
> +			if (err > 0) {
> +				err = 0;
> +				continue;
>  			}

So we used to manually iterate and stop when we have a matching pack
entry, but now "stop when we find" is done by promisor_object_cb
callback that returns 1.

What is the reason why we no longer odb_(re)prepare() upfront before
going into the loop?  Would it make us miss a newly added promisor
packs?  We will fall back to rev-list for correctness, so it may not
matter, though.

> +
>  			/*
>  			 * Fallback to rev-list with oid and the rest of the
>  			 * object IDs provided by fn.
>  			 */
>  			goto no_promisor_pack_found;
> -promisor_pack_found:
> -			;
>  		} while ((oid = fn(cb_data)) != NULL);
> +
>  		if (opt->err_fd)
>  			close(opt->err_fd);
> -		return 0;
> +		return err;
>  	}
>  
>  no_promisor_pack_found:

^ permalink raw reply

* Re: [PATCH 1/3] odb/source-packed: extract logic to skip certain packs
From: Junio C Hamano @ 2026-06-22 17:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260622-pks-connected-generic-promisor-checks-v1-1-25eba2698202@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The caller can pass flags that allow them to filter out specific kinds
> of objects when iterating objects via `odb_for_each_object()`. This only
> works for "normal" iteration though, as we `BUG()` when the user passes
> flags and specifies an object prefix.
>
> This limitation will be lifted in the next commit. Prepare for this by
> extracting the logic that skips certain kinds of packs so that we can
> easily reuse it.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  odb/source-packed.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)

Quite straight-forward creation of a simple helper function.

> diff --git a/odb/source-packed.c b/odb/source-packed.c
> index 42c28fba0e..3afc4bf01f 100644
> --- a/odb/source-packed.c
> +++ b/odb/source-packed.c
> @@ -126,6 +126,22 @@ static int match_hash(unsigned len, const unsigned char *a, const unsigned char
>  	return 1;
>  }
>  
> +static bool should_exclude_pack(struct packed_git *p, enum odb_for_each_object_flags flags)
> +{
> +	if ((flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY) &&
> +	    !p->pack_promisor)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) &&
> +	    p->pack_keep_in_core)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) &&
> +	    p->pack_keep)
> +		return true;
> +	return false;
> +}
> +
>  static int for_each_prefixed_object_in_midx(
>  	struct odb_source_packed *store,
>  	struct multi_pack_index *m,
> @@ -306,17 +322,9 @@ static int odb_source_packed_for_each_object(struct odb_source *source,
>  	for (e = packfile_store_get_packs(packed); e; e = e->next) {
>  		struct packed_git *p = e->pack;
>  
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY) &&
> -		    !p->pack_promisor)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) &&
> -		    p->pack_keep_in_core)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) &&
> -		    p->pack_keep)
> +		if (should_exclude_pack(p, opts->flags))
>  			continue;
> +
>  		if (open_pack_index(p)) {
>  			pack_errors = 1;
>  			continue;

^ permalink raw reply

* Re: [GSoC Patch v7 1/3] path: extract append_formatted_path() and use in rev-parse
From: K Jayatheerth @ 2026-06-22 17:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: a3205153416, git, jltobler, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <xmqq1pdy36me.fsf@gitster.g>

Hey Junio,

On Mon, Jun 22, 2026 at 2:32 AM Junio C Hamano <gitster@pobox.com> wrote:

> So, for the existing user of this logic, the preimage ...
>
> > -static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
> >  {

...

> > -     free(cwd);
> >  }
>
> ... now becomes this postimage.
>

Yes that's right!

> > +static void print_path(const char *path, const char *prefix,
> > +                    enum format_type format, enum default_type def)
> >  {
> > +     struct strbuf sb = STRBUF_INIT;
> > +     enum path_format fmt;
> > +
> > +     if (format == FORMAT_RELATIVE) {
> > +             fmt = PATH_FORMAT_RELATIVE;
> > +     } else if (format == FORMAT_CANONICAL) {
> > +             fmt = PATH_FORMAT_CANONICAL;
> > +     } else /* FORMAT_DEFAULT */ {
> > +             switch (def) {
> > +             case DEFAULT_RELATIVE:
> > +                     fmt = PATH_FORMAT_RELATIVE;
> > +                     break;
> > +             case DEFAULT_RELATIVE_IF_SHARED:
> > +                     fmt = PATH_FORMAT_RELATIVE_IF_SHARED;
> > +                     break;
> > +             case DEFAULT_CANONICAL:
> > +                     fmt = PATH_FORMAT_CANONICAL;
> > +                     break;
> > +             case DEFAULT_UNMODIFIED:
> > +             default:
> > +                     fmt = PATH_FORMAT_UNMODIFIED;
> > +                     break;
> >               }
> >       }
> > +
> > +     append_formatted_path(&sb, path, prefix, fmt);
> > +     puts(sb.buf);
> > +
> > +     strbuf_release(&sb);
> >  }
>
> Mostly, the code translates FORMAT_FOO constants into the new
> PATH_FORMAT_FOO constants, and lets append_formatted_path() do the
> heavy lifting.
>
> It is a minor point, but wouldn't it make it simpler to handle
> format_default first?  I.e.,
>
>         if (format == FORMAT_DEFAULT)
>                 switch (def) {
>                 case DEFAULT_RELATIVE:
>                         format = DEFAULT_RELATIVE;
>                         break;
>                 ...
>                 case DEFAULT_UNMODIFIED:
>                 default:
>                         format = DEFAULT_UNMODIFIED;
>                         break;
>         }
>         switch (format) {
>         case FORMAT_RELATIVE: fmt = PATH_FORMAT_RELATIVE; break;
>         case FORMAT_CANONICAL: fmt = PATH_FORMAT_CANONICAL; break;
>         ...
>         }
>
> Perhaps yes, perhaps not.  I dunno.
>

I see you have continued this point further
I am going to respond to this in detail there.

> > diff --git a/path.c b/path.c
> > index d7e17bf174..6d8e892ada 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -1579,6 +1579,75 @@ char *xdg_cache_home(const char *filename)
> >       return NULL;
> >  }
> >
> > +void append_formatted_path(struct strbuf *dest, const char *path,
> > +                        const char *prefix, enum path_format format)
> > +{
> > +     switch (format) {
> > +     case PATH_FORMAT_UNMODIFIED:
> > +             strbuf_addstr(dest, path);
> > +             break;
>
> In the orignal "print_path()", DEFAULT/UNMODIFIED did this "show
> unmodified".  OK.
>
> > +     case PATH_FORMAT_RELATIVE: {
> > +             struct strbuf relative_buf = STRBUF_INIT;
> > +             struct strbuf real_path = STRBUF_INIT;
> > +             struct strbuf real_prefix = STRBUF_INIT;
> > +             char *cwd = NULL;
> > +
> > +             /*
> > +              * We don't ever produce a relative path if prefix is NULL,
> > +              * so set the prefix to the current directory so that we can
> > +              * produce a relative path whenever possible.
> > +              */
> > +             if (!prefix)
> > +                     prefix = cwd = xgetcwd();
>
> This is what was done in the original "print_path()" upfront, with
> a similar comment to explay why this happens.  Looking good.  Also
> we no longer call xgetcwd() when we do not need to, which is goodd.
>
> > +             if (!is_absolute_path(path)) {
> > +                     strbuf_realpath_forgiving(&real_path, path, 1);
> > +                     path = real_path.buf;
> > +             }
> > +             if (!is_absolute_path(prefix)) {
> > +                     strbuf_realpath_forgiving(&real_prefix, prefix, 1);
> > +                     prefix = real_prefix.buf;
> > +             }
>
> There used to be a comment explaining why we make realpath calls,
> which is now lost.  Perhaps what the comment said was so obvious
> that we are better off without it?  I offhand do not know.
>

When the logic was a single block, the comment felt necessary to
explain the flow.
By splitting it into explicit switch cases, the logic became a bit
more self-evident, so I removed it to reduce clutter.
I kept the other comments where the reasoning is less obvious.


> What is done to make the paths real is the same as before, which is
> good.
>
> > +             strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> > +
> > +             strbuf_release(&relative_buf);
> > +             strbuf_release(&real_path);
> > +             strbuf_release(&real_prefix);
> > +             free(cwd);
> > +             break;
> > +     }
>
> OK.
>
> > +     case PATH_FORMAT_RELATIVE_IF_SHARED: {
> > +             struct strbuf relative_buf = STRBUF_INIT;
> > +
> > +             /*
> > +              * If we're using RELATIVE_IF_SHARED mode, then we want an
> > +              * absolute path unless the two share a common prefix, so don't
> > +              * default the prefix to the current working directory. Doing so
> > +              * would cause a relative path to always be produced if possible.
> > +              */

I thought this comment made sense keeping in for instance.

> Identical to the original, which is good.
> > +
> > +     case PATH_FORMAT_CANONICAL: {
> > +             struct strbuf canonical_buf = STRBUF_INIT;
> > +
> > +             strbuf_realpath_forgiving(&canonical_buf, path, 1);
> > +             strbuf_addbuf(dest, &canonical_buf);
> > +
> > +             strbuf_release(&canonical_buf);
> > +             break;
> > +     }
> > +
> > +     default:
> > +             BUG("unknown path_format value %d", format);
> > +     }
> > +}
>
> OK.
>
> > +/**
> > + * Format a path according to the specified formatting strategy and append
> > + * the result to the given strbuf.
> > + *
> > + * `dest`   : The string buffer to append the formatted path to.
> > + * `path`   : The path string that needs to be formatted.
> > + * `prefix` : The directory prefix to calculate relative offsets against.
> > + * Pass NULL to default to the current working directory where applicable.
> > + * `format` : The formatting behavior rule to execute.
> > + */
> > +void append_formatted_path(struct strbuf *dest, const char *path,
> > +                        const char *prefix, enum path_format format);
> > +
>
> It is slightly unsatisfying that this function is defined to
> "append" to any existing value in the dest strbuf, rather than
> storing the result in the dest strbuf.  The original caller
> print_path() passes an empty strbuf to this helper, so it can let
> strbuf_realpath_*() functions to strbuf_reset() it (e.g.,
> abspath.c:get_root_part() called by strbuf_realpath_1(), wihch in
> turn is called by strbuf_realpath() and strbuf_realpath_forgiving())
> it freely, which means that use of temporary strbuf like
> canonical_buf only to copy it out to dest is wasteful and unneeded.
> But other callers we will have for this helper later may want to
> append to what they already have, so perhaps it is OK (on the other
> hand, we could say that preserving and appending is what these
> callers can do themselves).
>

Hmm, I thought about this for a while.

Then I looked at what ls-tree.c does(using an accumulator).
They already routinely use temporary `strbuf`s to calculate
relative/absolute paths before
appending them to their main output string.

Because callers who need to accumulate can easily do the preserving
and appending
themselves with a temporary buffer, there is no reason to force that
overhead into our helper.

I will change the semantics from "append" to "replace", rename the
helper back to `format_path()`.
I hope I am looking at ls-tree.c correctly here : )

Eliminate the wasteful `canonical_buf` allocations so we can pass the
destination buffer directly to functions like
`strbuf_realpath_forgiving()`.
This is a good suggestion actually, thanks!

> Otherwise, looking good as a no-op bug-to-bug compatible rewrite,
> with a slight optimization (to skip xgetcwd()).
>
> Thanks.

On Mon, Jun 22, 2026 at 9:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:

> > ...
> > It is a minor point, but wouldn't it make it simpler to handle
> > format_default first?  I.e.,
> >
> >       if (format == FORMAT_DEFAULT)
> >               switch (def) {
> >               case DEFAULT_RELATIVE:
> >                       format = DEFAULT_RELATIVE;
> >                       break;
> >               ...
> >               case DEFAULT_UNMODIFIED:
> >               default:
> >                       format = DEFAULT_UNMODIFIED;
> >                       break;
> >       }
> >       switch (format) {
> >         case FORMAT_RELATIVE: fmt = PATH_FORMAT_RELATIVE; break;
> >       case FORMAT_CANONICAL: fmt = PATH_FORMAT_CANONICAL; break;
> >       ...
> >       }
> >
> > Perhaps yes, perhaps not.  I dunno.
>
> I do not consider the above an blocker, but it might make a
> difference if we are going to acquire more modes and formats, so
> once somebody tries to rewrite the logic and finds the resulting
> code harder to follow (or not easier to follow), I would be happy to
> see the above discarded ;-)
>

True, if new formats are introduced
this would instantly become sloppy.

I will change it to future proof since I am
looking to send v8 for append_formatted_path().

Although I would be surprised to see an example for a new format.

> >> +/**
> >> + * Format a path according to the specified formatting strategy and append
> >> + * the result to the given strbuf.
> >> + *
> >> + * `dest`   : The string buffer to append the formatted path to.
> >> + * `path`   : The path string that needs to be formatted.
> >> + * `prefix` : The directory prefix to calculate relative offsets against.
> >> + * Pass NULL to default to the current working directory where applicable.
> >> + * `format` : The formatting behavior rule to execute.
> >> + */
> >> +void append_formatted_path(struct strbuf *dest, const char *path,
> >> +                       const char *prefix, enum path_format format);
> >> +
> >
> > It is slightly unsatisfying that this function is defined to
> > "append" to any existing value in the dest strbuf, rather than
> > storing the result in the dest strbuf.  The original caller
> > print_path() passes an empty strbuf to this helper, so it can let
> > strbuf_realpath_*() functions to strbuf_reset() it (e.g.,
> > abspath.c:get_root_part() called by strbuf_realpath_1(), wihch in
> > turn is called by strbuf_realpath() and strbuf_realpath_forgiving())
> > it freely, which means that use of temporary strbuf like
> > canonical_buf only to copy it out to dest is wasteful and unneeded.
> > But other callers we will have for this helper later may want to
> > append to what they already have, so perhaps it is OK (on the other
> > hand, we could say that preserving and appending is what these
> > callers can do themselves).
>
> This one we may want to consider a bit more seriously, but it is
> entirely up to the future callers of the helper.  If it would make
> the callers much easier to write for this helper to have "append"
> semantics, I'd be happy to accept the semantics of the above as-is,
> but otherwise, I suspect it would be simpler to use if the helper is
> defined to replase dest with the result, instead of appending the
> result to dest.
>

I am still unsure if I am following ls-tree.c correctly.
If I am then I think it is a very good change to have for v8 as I
specified above.

> > Otherwise, looking good as a no-op bug-to-bug compatible rewrite,
> > with a slight optimization (to skip xgetcwd()).
>
> This part of the review does not change in any case.  The
> refactoring looks good.

Thank you ; )

Regards,
- K Jayatheerth

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox