Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 0/6] receive-pack: use ODB transactions to stage object writes
From: Junio C Hamano @ 2026-06-24 20:09 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

Justin Tobler <jltobler@gmail.com> writes:

> Greetings,
>
> This patch series replaces direct usage of the `tmp_objdir` interfaces
> in git-receive-pack(1) to instead use the `odb_transaction` interfaces
> to create/manage a staging area to write objects to. The purpose of this
> change is to get git-receive-pack(1) one step closer to being ODB
> backend agnostic. For now, the object writes themselves are still
> "files" backend specific due to being handled by the git-index-pack(1)
> and git-unpack-objects(1) child processes. This will be tackled in a
> separate series though.
>
> Thanks,
> -Justin

The integration cycle this morning was somehow more painful than
other cycles.  

FYI, this topic had some interactions with ps/odb-source-packed and
ps/refs-avoid-chdir-notify-reparent and needed the following
evil-merge fix to make it build.

commit 721c15c1d5ef8f8aab8b9f50463e979e890b1ab6
Author: Junio C Hamano <gitster@pobox.com>
Date:   Wed Jun 24 12:45:35 2026 -0700

    merge-fix/jt/receive-pack-use-odb-transactions
    
    conflict with ps/odb-source-packed and ps/refs-avoid-chdir-notify-reparent

diff --git a/odb/source-packed.c b/odb/source-packed.c
index c3c26fb53e..b9231a8da0 100644
--- a/odb/source-packed.c
+++ b/odb/source-packed.c
@@ -545,7 +545,8 @@ static int odb_source_packed_write_object_stream(struct odb_source *source UNUSE
 }
 
 static int odb_source_packed_begin_transaction(struct odb_source *source UNUSED,
-					       struct odb_transaction **out UNUSED)
+					       struct odb_transaction **out UNUSED,
+					       enum odb_transaction_flags flags UNUSED)
 {
 	return error("packed backend cannot begin transactions");
 }
diff --git a/t/unit-tests/u-reftable-table.c b/t/unit-tests/u-reftable-table.c
index a35bbc8004..dd5db06534 100644
--- a/t/unit-tests/u-reftable-table.c
+++ b/t/unit-tests/u-reftable-table.c
@@ -235,7 +235,8 @@ void test_reftable_table__seek_invalid_log_offset(void)
 	uint8_t *footer;
 
 	cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs),
-				 logs, ARRAY_SIZE(logs), NULL);
+				 logs, ARRAY_SIZE(logs),
+				 REFTABLE_HASH_SHA1, NULL);
 
 	/*
 	 * Corrupt the log section offset stored in the footer so that it points
@@ -278,7 +279,8 @@ void test_reftable_table__new_with_truncated_table(void)
 	struct reftable_table *table;
 	struct reftable_buf buf = REFTABLE_BUF_INIT;
 
-	cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs), NULL, 0, NULL);
+	cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs), NULL, 0,
+				 REFTABLE_HASH_SHA1, NULL);
 
 	/*
 	 * Truncate the table so that it is large enough to read the header, but

^ permalink raw reply related

* Re: [PATCH v4 1/3] replay: refactor enum replay_mode into a bool
From: Toon Claes @ 2026-06-24 19:15 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt; +Cc: git, Elijah Newren
In-Reply-To: <xmqq7bnq37jm.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> That's fair, and the result is easier to write. But is it really easier
>> to read?

You're bringing up a very valid point there, and not directly in what
you're saying, but how it makes me reconsider.

So we're comparing:

    pick_regular_commit(revs->repo, commit, replayed_commits,
                        mode == REPLAY_MODE_REVERT ? last_commit : onto,
                        &merge_opt, &result, mode, opts->empty);

with:

    pick_regular_commit(revs->repo, commit, replayed_commits,
                        reverse ? last_commit : onto,
                        &merge_opt, &result, reverse, opts->empty);

You can argue which of both is easier to read, but the problem isn't
really whether it's a bool or an enum, but the ternary operator in this
lengthy function call is. That is the problem I was trying to solve, and
converting enum to bool isn't really the solution.

>> And what if we ever have to create a third mode going forward?

Personally I find this weak argument. As far as I know we most of the
time do not write code in a way so "it will be ready to add X in the
future". In my personal experience, I'm always wrong in predicting what
might be added in the future. Although I must say this case is
different, because we're not adding something new, no this commit was
dumbing down something existing. So I'll revisit this commit in the next
iteration.

>> I'm generally no fan of booleans as parameters as they basically give
>> you no information at all at the callsite, except if you're lucky and
>> you already have an aptly-named variable available that you can pass.
>> Which seems to be the case here, but I'm still not sure whether this
>> change really improves the code.

That's also a very valid argument, which I didn't take in mind.

> I tend to agree with you on both counts.  The "what happens when
> somebody else wants a third choice?" is a quesiton I would ask the
> first thing as the maintainer of a project.
>
> Even if the boolean parameter is so obviously named, the callsite
> can only say "true" or "false", unlike some other popular languages
> that lets you say
>
> 	my_function(use_revert_mode=true, verbose=false);
>
> and you cannot tell what effect the author wanted out of that "true"
> if all you can write were
>
> 	my_function(true, false);
>
> Of course, we could go ultra verbose, like
>
> 	my_function(true, /* use_revert_mode */
> 		    false, /* verbose */);
>
> but then we are often better off writing:
>
> 	my_function(REPLAY_MODE_REVERT, REPLAY_QUIET);

Thanks for bringing in this illustrative example. Point made, I'll
revisit.

-- 
Cheers,
Toon

^ permalink raw reply

* Re: [PATCH 2/6] object-file: propagate files transaction errors
From: Junio C Hamano @ 2026-06-24 18:35 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps
In-Reply-To: <20260624041920.2601961-3-jltobler@gmail.com>

Justin Tobler <jltobler@gmail.com> writes:

> The "files" transaction backend may encounter errors related to managing
> the temporary directory used to stage objects, but silently ignores
> these errors. Instead return errors encountered in the
> `odb_transaction_files_{prepare,begin,commit}()` interfaces to allow
> callers to handle as needed.

"handle them as needed", perhaps.

> -static void odb_transaction_files_prepare(struct odb_transaction *base)
> +static int odb_transaction_files_prepare(struct odb_transaction *base)
>  {
>  	struct odb_transaction_files *transaction =
>  		container_of_or_null(base, struct odb_transaction_files, base);
> @@ -511,11 +511,15 @@ static void odb_transaction_files_prepare(struct odb_transaction *base)
>  	 * added at the time they call odb_transaction_files_begin.
>  	 */
>  	if (!transaction || transaction->objdir)
> -		return;
> +		return 0;
>  
>  	transaction->objdir = tmp_objdir_create(base->source->odb->repo, "bulk-fsync");

If this fails, NULL is returned, and ...

> -	if (transaction->objdir)
> -		tmp_objdir_replace_primary_odb(transaction->objdir, 0);
> +	if (!transaction->objdir)
> +		return -1;

... we return -1 from here to signal an error now.

But callers of this function in write_loose_object(), and
odb_source_loose_write_stream() are not prepared to react to such an
error.

I guess this is nothing new.  The callers ignored such an error from here
in the original and proceeded writing the primary ODB anyway, and we
continue to do so after this step.

> @@ -542,13 +546,13 @@ static void fsync_loose_object_transaction(struct odb_transaction *base,
>  /*
>   * Cleanup after batch-mode fsync_object_files.
>   */
> -static void flush_loose_object_transaction(struct odb_transaction_files *transaction)
> +static int flush_loose_object_transaction(struct odb_transaction_files *transaction)
>  {
>  	struct strbuf temp_path = STRBUF_INIT;
>  	struct tempfile *temp;
>  
>  	if (!transaction->objdir)
> -		return;
> +		return 0;
>  
>  	/*
>  	 * Issue a full hardware flush against a temporary file to ensure
> @@ -570,8 +574,12 @@ static void flush_loose_object_transaction(struct odb_transaction_files *transac
>  	 * Make the object files visible in the primary ODB after their data is
>  	 * fully durable.
>  	 */
> -	tmp_objdir_migrate(transaction->objdir);
> +	if (tmp_objdir_migrate(transaction->objdir))
> +		return -1;
> +
>  	transaction->objdir = NULL;
> +
> +	return 0;
>  }

The caller of this function does react to a failure of it, ...

> @@ -1670,27 +1678,34 @@ int read_loose_object(struct repository *repo,
>  	return ret;
>  }
>  
> -static void odb_transaction_files_commit(struct odb_transaction *base)
> +static int odb_transaction_files_commit(struct odb_transaction *base)
>  {
>  	struct odb_transaction_files *transaction =
>  		container_of(base, struct odb_transaction_files, base);
>  
> -	flush_loose_object_transaction(transaction);
> +	if (flush_loose_object_transaction(transaction))
> +		return -1;
>  	flush_packfile_transaction(transaction);
> +
> +	return 0;
>  }

... like this, which is good.  Do we need an explicit "abort-transaction",
or is that implicit?

Thanks.

^ permalink raw reply

* Re: [PATCH 1/6] object-file: rename files transaction prepare function
From: Junio C Hamano @ 2026-06-24 18:26 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps
In-Reply-To: <20260624041920.2601961-2-jltobler@gmail.com>

Justin Tobler <jltobler@gmail.com> writes:

> The "files" ODB transaction backend lazily creates a temporary object
> directory when the first loose object is written to the transaction via
> `prepare_loose_object_transaction()`. In a subsequent commit, the
> temporary directory is used to also write packfiles to.
>
> Rename the function to `odb_transaction_files_prepare()` accordingly.

Taken by itself this renaming does make sense, but there are many
other function that follow the historical naming convention, like
{fsync,flush}_loose_object_transaction().  Should we rename them for
consistency with the new naming scheme, not necessarily as part of
this series but with a todo comment to do so once the dust settles,
or something?

^ permalink raw reply

* Re: [GSoC Patch v8 1/3] path: extract format_path() and use in rev-parse
From: Junio C Hamano @ 2026-06-24 18:15 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: a3205153416, git, jltobler, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <20260624033748.108281-2-jayatheerthkulkarni2005@gmail.com>

K Jayatheerth <jayatheerthkulkarni2005@gmail.com> writes:

> +void format_path(struct strbuf *dest, const char *path,
> +		 const char *prefix, enum path_format format)
> +{
> +	strbuf_reset(dest);
> +
> +	switch (format) {
> +	case PATH_FORMAT_UNMODIFIED:
> +		strbuf_addstr(dest, path);
> +		break;
> +
> +	case PATH_FORMAT_RELATIVE: {
> ...
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +
> +		strbuf_release(&relative_buf);
> +		strbuf_release(&real_path);
> +		strbuf_release(&real_prefix);
> +		free(cwd);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_RELATIVE_IF_SHARED: {
> ...
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +		strbuf_release(&relative_buf);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_CANONICAL:
> +		/*
> +		 * strbuf_realpath_forgiving inherently resets the destination
> +		 * buffer, safely aligning with our replace semantics.
> +		 */
> +		strbuf_realpath_forgiving(dest, path, 1);
> +		break;
> +
> +	default:
> +		BUG("unknown path_format value %d", format);
> +	}
> +}

Hmph.

I was hoping that we could lose even more strbuf, but since
relative_path() does not always leave its result in the strbuf that
is passed to it as its third parameter, we do need addstr() into
dest, which is a bit unsatisfying but not a fault of this patch at
all.  At least, we lost extra copy in the canonical codepath ;-)

Looking good.  Thanks.

^ permalink raw reply

* Re: [PATCH 0/6] odb: refactor source-specific information in object info
From: Junio C Hamano @ 2026-06-24 17:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> this patch series refactors `struct object_info` to not contain the
> `whence` field anymore.
>
> This field only gave the caller information about the type of source
> this was read from, but it didn't allow them to figure out which source
> specifically yielded the object. So instead, we replace this information
> with a new `struct object_info_source` field that both contains info
> about the source, and any backend-specific data.

Great.


^ permalink raw reply

* Re: [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc
From: Junio C Hamano @ 2026-06-24 17:09 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget
  Cc: git, Derrick Stolee, Elijah Newren, Kristofer Karlsson
In-Reply-To: <19ed743bd10be5341eee040eb8070876b984773d.1782303254.git.gitgitgadget@gmail.com>

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Kristofer Karlsson <krka@spotify.com>
>
> Add a technical document describing the paint_down_to_common()
> algorithm used for merge-base computation, covering the paint
> walk, generation number regions, and termination conditions.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>  Documentation/Makefile                        |   1 +
>  Documentation/technical/meson.build           |   1 +
>  .../technical/paint-down-to-common.adoc       | 114 ++++++++++++++++++
>  commit-reach.c                                |   6 +-
>  4 files changed, 121 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/technical/paint-down-to-common.adoc

Great write-up that very clearly and concisely explains what goes on
inside the merge-base computation.  Thanks for a pleasant read.

^ permalink raw reply

* Re: [PATCH v2 2/4] odb/source-packed: support flags when iterating an object prefix
From: Christian Couder @ 2026-06-24 17:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Christian Couder
In-Reply-To: <20260624-pks-connected-generic-promisor-checks-v2-2-132d73ee47b9@pks.im>

On Wed, Jun 24, 2026 at 12:37 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> Callers of `odb_for_each_object()` can specify an optional object name
> prefix so that we only yield objects that match it. This is incompatible
> though with passing flags at the same time, as we don't yet know to
> handle them.
>
> Loosen this restriction by calling `should_exclude_pack()`.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  odb/source-packed.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/odb/source-packed.c b/odb/source-packed.c
> index 3afc4bf01f..6f31f0ff94 100644
> --- a/odb/source-packed.c
> +++ b/odb/source-packed.c
> @@ -148,6 +148,7 @@ static int for_each_prefixed_object_in_midx(
>         const struct odb_for_each_object_options *opts,
>         struct odb_source_packed_for_each_object_wrapper_data *data)
>  {
> +       bool pack_errors = false;
>         int ret;
>
>         for (; m; m = m->base_midx) {
> @@ -171,6 +172,20 @@ static int for_each_prefixed_object_in_midx(
>                         const struct object_id *current = NULL;
>                         struct object_id oid;
>
> +                       if (opts->flags) {
> +                               uint32_t pack_id = nth_midxed_pack_int_id(m, i);
> +                               struct packed_git *pack;
> +
> +                               if (prepare_midx_pack(m, pack_id)) {
> +                                       pack_errors = true;
> +                                       continue;
> +                               }
> +
> +                               pack = nth_midxed_pack(m, pack_id);
> +                               if (should_exclude_pack(pack, opts->flags))
> +                                       continue;
> +                       }
> +
>                         current = nth_midxed_object_oid(&oid, m, i);
>
>                         if (!match_hash(len, opts->prefix->hash, current->hash))

It looks like this is:

                        if (!match_hash(len, opts->prefix->hash, current->hash))
                                break;

and I wonder if the `if (opts->flags) { ... }` block would be better
after that prefix check rather than before it.

Putting it after the prefix check would make sure we don't continue
when the prefix doesn't match.

^ permalink raw reply

* Re: [PATCH] gpg-interface: fix strip_cr_before_lf to only remove CR before LF
From: Junio C Hamano @ 2026-06-24 16:58 UTC (permalink / raw)
  To: Antonio De Stefani; +Cc: git
In-Reply-To: <20260624093618.17456-1-antonio.destefani08@gmail.com>

Antonio De Stefani <antonio.destefani08@gmail.com> writes:

> c4adea82 (Convert CR/LF to LF in tag signatures, 2008-07-11)
> introduced CR stripping for GPG output on Windows, but intentionally
> stripped all CR characters unconditionally to "keep the code simpler",
> even though only \r\n sequences (Windows line endings) needed to be
> normalized. 2f47eae2 (Split GPG interface into its own helper library,
> 2011-09-07) moved the code into gpg-interface.c, and 29b31577 (ssh
> signing: add ssh key format and signing code, 2021-09-10) extracted
> it into the remove_cr_after() helper when adding SSH signing support.
>
> The original laziness was safe at the time because lone CR characters
> are not expected in GPG signature output. However, the NEEDSWORK
> comment left by a previous reader correctly identified that only
> \r\n pairs should be stripped, not lone \r characters.
>
> Fix the loop to skip \r only when immediately followed by \n, keeping
> lone trailing CR characters intact. Rename the function to
> strip_cr_before_lf to reflect its corrected behavior, and update
> both call sites and their comments accordingly.
>
> Signed-off-by: Antonio De Stefani <antonio.destefani08@gmail.com>
> ---
>  gpg-interface.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)

Looking good.  Will queue. Thanks.

^ permalink raw reply

* Re: [PATCH v2 4/4] connected: search promisor objects generically
From: Junio C Hamano @ 2026-06-24 16:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder
In-Reply-To: <20260624-pks-connected-generic-promisor-checks-v2-4-132d73ee47b9@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> When performing connectivity checks we have to figure out whether any of
> the new objects are promisor objects, as we cannot assume full
> connectivity if so.
>
> This check is performed by iterating through all packfiles in the
> repository and searching each of them for the given object. Of course,
> this mechanism is quite specific to implementation details of the object
> database, as we assume that it uses packfiles in the first place.
>
> Refactor the logic so that we instead use `odb_for_each_object_ext()`
> with an object prefix filter and the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY`
> flag. This will yield all objects that have the exact object name and
> that are part of a promisor pack in a generic way.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  connected.c | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/connected.c b/connected.c
> index d2b334173f..b557ff5db9 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -11,6 +11,13 @@
>  #include "packfile.h"
>  #include "promisor-remote.h"
>  
> +static int promised_object_cb(const struct object_id *oid UNUSED,
> +			      struct object_info *oi UNUSED,
> +			      void *payload UNUSED)
> +{
> +	return 1;
> +}
> +
>  /*
>   * For partial clones, we don't want to have to do a regular connectivity check
>   * because we have to enumerate and exclude all promisor objects (slow), and
> @@ -30,25 +37,28 @@ static int check_connected_promisor(oid_iterate_fn fn,
>  				    void *cb_data,
>  				    const struct object_id **oid)
>  {
> +	struct odb_for_each_object_options opts = {
> +		.flags = ODB_FOR_EACH_OBJECT_PROMISOR_ONLY,
> +		.prefix_hex_len = the_repository->hash_algo->hexsz,
> +	};
> +	int err;
> +
>  	odb_reprepare(the_repository->objects);
>  	do {
> -		struct packed_git *p;
> +		opts.prefix = *oid;
>  
> -		repo_for_each_pack(the_repository, p) {
> -			if (!p->pack_promisor)
> -				continue;
> -			if (find_pack_entry_one(*oid, p))
> -				goto promisor_pack_found;
> -		}
> +		err = odb_for_each_object_ext(the_repository->objects,
> +					      NULL, promised_object_cb,
> +					      NULL, &opts);

promised_object_cb() returns 1 without any computation since we are
only interested in learning ODB_FOR_EACH_OBJECT_PROMISOR_ONLY finds
any such object.

odb_for_each_object_ext() returns 0 (if it iterates all the sources
to the end), but if its call to odb_source_for_each_object() yields
non-zero value, the returned value comes back as "err" here,
terminating the for-each iteration immediately.

odb_source_for_each_object() is implemented differently per the
source backend, but taking an example of "packfile" backend,
packfile_loose_for_each_object() ends up calling cb (wrapped in
packfile_store_for_each_object_wrapper_data) via
for_each_object_in_pack(), which stops immediately when cb returns
non-zero and the value returned from there is the value given by cb,
i.e., 1.  So we will have err==1 when we find any object.

> +		if (err < 0)
> +			return err;

And err presumably is 1 in such a case, so this does not trigger.

>  		/*
>  		 * We have found an object that is not part of a promisor pack,
>  		 * and thus we cannot skip the full connectivity check.
>  		 */
> -		return 0;
> -
> -promisor_pack_found:
> -		;
> +		if (err > 0)
> +			return 0;

And this does.

I may be misreading the patch, but as we return 0 from here, do we
cause the caller to fall back to full connectivity check?  The
caller, check_connected(), sees a zero returned from here.

>  	} while ((*oid = fn(cb_data)) != NULL);
>  
>  	return 1;

^ permalink raw reply

* Re: Fetching missing submodule refs unnecessarily fatal
From: Ben Knoble @ 2026-06-24 16:15 UTC (permalink / raw)
  To: Mike Crowe; +Cc: Git maillinglist
In-Reply-To: <ajvouniXVAPH8nyZ@mcrowe.com>

[re adding list, woops!]

> Le 24 juin 2026 à 10:24, Mike Crowe <mac@mcrowe.com> a écrit :
> 
> On Wednesday 24 June 2026 at 08:39:39 -0400, Ben Knoble wrote:
>> 
>>>> Le 23 juin 2026 à 11:04, Mike Crowe <mac@mcrowe.com> a écrit :
>>> 
>>> When Git fetches in a superproject with --recurse-submodules, it appears to
>>> try to fetch the corresponding submodule repository commits for every new
>>> or updated superproject branch. Presumably this is so that everthing is
>>> ready to switch to one of those branches without further fetching.
>>> 
>>> Developers may create commits that contain submodules that reference
>>> commits in the submodule repository, but those commits may not be pushed to
>>> the submodule's remote repository. When the superproject commits are pushed
>>> to a personal remote branch anyone else's Git fetch cannot find the
>>> corresponding submodule commit and fails.
>> 
>> This is the part that confuses me: if a (public) commit of history refers
>> to a submodule at a particular commit, and that commit is not available
>> anywhere, then we won’t be able to properly update submodules when using
>> that commit. That creates a problem!
> 
> It does. But only for that user's personal branch. Even though it is
> public, a personal branch is mostly only for the use of that user and it
> doesn't matter to anyone but them. (The user is probably working
> simultaneously on both the superproject and the submodule.)
> 
>> Why not instead make sure the submodule commit is also available for fetching?
> 
> This relies on the user realising what they've done. They might even think
> that they've made the right submodule commit available but forgot that they
> rebased it before pushing or something changing the commit hash.

Yeah, the recurse-submodules option for pushing makes this easier to notice/act on, too. 

> From a resilience point of view it shouldn't be possible for someone who
> can push changes to their own personal branch to perform a "denial of
> service" on anyone else fetching from the repository by making that fetch
> fail.
> 
> I hope this makes it clearer.

That does make some sense: I wouldn’t want to get interrupted by a colleague or collaborator’s bad push of an unrelated branch. 

> Thanks.
> 
> Mike.
> 
> (Was there a reason that you didn't reply to the list?)

Nope, mis-click. 

^ permalink raw reply

* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-24 15:07 UTC (permalink / raw)
  To: Kristofer Karlsson
  Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <CAL71e4N1zMz=v9umGdGPTvLP1nF-tNLVQc+vAEBnekt2L0b6zQ@mail.gmail.com>

On 6/24/2026 10:47 AM, Kristofer Karlsson wrote:
> On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote:

>>> -     test_trace2_data paint_down_to_common steps 81 <trace-half.txt
>>> +     test_trace2_data paint_down_to_common steps 57 <trace-half.txt
>>>  '
>> I love to see these steps change. If you take my suggestion to
>> update more tests with these checks, then this diff will get bigger
>> (but in a deserved way).
> 
> I will try to add them to some (but not all) tests since it's more
> closely related to performance than correctness and I want to
> avoid making too many tests overly fragile.
In this case, I think it's more about protecting all of our special-
cased termination conditions. The rigidity means that it is hard to
accidentally change the behavior. It does have the downside that
more tests need to change if there is an intentional change, but it
also gives the same _evidence_ that the change has the intended
impact.

We are definitely leaning into personal preferences, though. There
is no hard rule one way or another.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson @ 2026-06-24 14:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <6b0d81e7-7617-4fb4-9e39-cdf8bc778837@gmail.com>

On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote:
>
> I see how the previous implementation has a termination condition
> before calling prio_queue_get(), which is technically more
> efficient. It does make this initial diff a bit more complicated
> because we are moving the prio_queue_get() line.

I was thinking the efficiency here does not matter in practice -
prio_queue_get() only returns NULL once, and all other times
where we keep looping we do need the value.

I agree it does get a bit complex though.

> If the introduction of the method in patch 5/7 looked like this:
>
> +static struct commit *paint_queue_get(struct paint_state *state)
> +{
> +       struct commit *commit = prio_queue_get(&state->queue);
> +
> +       if (!commit)
> +               return NULL;
> +
> +       if (!state->p1_count && !state->p2_count &&
> +           !state->pending_merge_bases)
> +               return NULL;
> +
> +       commit->object.flags &= ~ENQUEUED;
> +       paint_count_update(state, commit->object.flags, -1);
> +       return commit;
> +}
>
> Then this diff would look cleaner.
>
> (This is the nittiest of nitpicks so feel free to ignore if this
> doesn't bother you at all.)

That's a good point. It doesn't technically bother me,
but it would be cleaner. The refactor commit would effectively
be looking into the future and prepare for it. I can change it for
the next version - my only thinking was that the current refactor
patch matched my original idea for how to best handle
the halt condition, but that did indeed change after this discussion.

> > -     test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> > +     test_trace2_data paint_down_to_common steps 57 <trace-half.txt
> >  '
> I love to see these steps change. If you take my suggestion to
> update more tests with these checks, then this diff will get bigger
> (but in a deserved way).

I will try to add them to some (but not all) tests since it's more
closely related to performance than correctness and I want to
avoid making too many tests overly fragile.

> Also, when I suggested that 'test_all_modes' creates the trace
> files on our behalf, I forgot to mention that this specific test
> that you added in patch 4/7 simplifies by running the merge-base
> check under 'test_all_modes' and then checking the trace2 data
> on the three well-known files afterwards.

That's a nice bonus, I will try to see if I can manage to utilize it.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
From: Kristofer Karlsson @ 2026-06-24 14:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <19639ad3-2d16-4f3b-be79-138e00144ea3@gmail.com>

On Wed, 24 Jun 2026 at 15:54, Derrick Stolee <stolee@gmail.com> wrote:
>
> I'm grateful to see these changes happening to the doc in real-
> time. I know it was extra work, but I'm grateful right now.
>
> Hopefully future historians will also benefit from this effort.

It was honestly not bad at all, and I agree it felt quite nice to
see how the doc naturally changed along with the implementation.

> > +static struct commit *paint_queue_get(struct paint_state *state)
> > +{
>
> Since we are going to make this a more complete termination
> condition, we may want to make that very explicit with a doc-
> comment. Something along the lines of "dequeue a commit when
> possible, but also signal termination of the walk when we
> conclude that no more merge bases will be discovered due to
> internal state."

Yes, I'll make sure to clean that part up more, maybe also
rename the function to be more descriptive.

> You mentioned in your cover letter how the min_generation value
> can add extra termination conditions. It may be a good idea to
> insert min_generation into the paint_queue struct and make it a
> termination condition for paint_queue_get(). If you consider this
> direction, then I'd make it a separate patch on top of this one
> _before_ adding the one-sided change. The extra tests that cover
> the exact number of walked commits can help to guarantee the same
> behavior, assuming that some of those tests check a non-zero
> min_generation input. (It may be good to add such trace tests in
> an earlier patch to help confidence in this case.)

I think I might wait with this - the patch series already feels
quite big, and I think it has a natural progression and finish now.
But I can definitely commit to following up later -- it would be a
smaller series that is easier to reason about, likely a single commit.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
From: Kristofer Karlsson @ 2026-06-24 14:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <b4b33635-1279-46c0-819a-d29cc13921f5@gmail.com>

On Wed, 24 Jun 2026 at 15:43, Derrick Stolee <stolee@gmail.com> wrote:
>
> One way to make these tests have potential to check exact stats
> without too much extra work would be to update 'test_all_modes'
> to run each command with GIT_TRACE2_EVENT set to a known trace
> file (reset each time) that can then be checked after verifying
> that the results of each command is the same.
>
> Then, these tests could have lines such as
>
>         test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
>         test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
>         test_trace2_data paint_down_to_common steps 40 <trace-none.txt
>
> after the test_all_modes line.

That does indeed look quite clean, I will try to massage the tests to
utilize that, though I haven't fully worked it through in my head yet so
I am not sure if/where I would get stuck. :)

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
From: Kristofer Karlsson @ 2026-06-24 14:31 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <560c91df-3c07-4c8f-9924-ef0cc7646e08@gmail.com>

On Wed, 24 Jun 2026 at 15:41, Derrick Stolee <stolee@gmail.com> wrote:
>
> (highlighting this chunk)
>
> > +     rm -rf .git/objects/info/commit-graph \
> > +             .git/objects/info/commit-graphs &&
> > +
> > +     GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> > +             git merge-base --all commit-9-9 commit-9-1 >actual &&
> > +     test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
>
> I'd rather see the whitespace line before the `rm` to make it
> more obvious that it's setting up the "none" case.

Ah yes, good point, will fix.

> > +     cp commit-graph-full .git/objects/info/commit-graph &&
> > +     GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> > +             git merge-base --all commit-9-9 commit-9-1 >actual &&
> > +     test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> > +
> > +     cp commit-graph-half .git/objects/info/commit-graph &&
> > +     GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> > +             git merge-base --all commit-9-9 commit-9-1 >actual &&
> > +     test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> > +'
> > +
>
> This test is a great example. I look forward to seeing that it
> updates in the future.
>
> One thing I was hoping to see was that your side-exhaustion tests
> (from patch v2 2/7) would also include these checks so they are
> more obviously updating when the implementation updates later.

I was internally contemplating how much I should introduce the steps
validation to existing tests. My worry was that it might make tests
fragile - for example I repeatedly got some off-by-one changes
after refactoring the halt condition slightly (differs depending on
adding the halts solely within paint_queue_get or having it at the end
of the loop) and I think potentially other future work could affect it.

But I'm happy to attach the steps checks for more relevant tests,
it's not much work to change.

> One way to accomplish that is to reorder this patch before adding
> those tests so their first version includes these checks and then
> the values update when changing the implementation.

I was thinking I could keep the same order, but the patch to introduce
the trace could also modify the tests at the same time - that would
perhaps make it even more clear. Also this means I could avoid
making changes to Elijah's commit that I already _partly_ butchered
(extracted the test change as-is, but dropped the other file changes)
and I don't want to make that one more unclean.

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Kristofer Karlsson @ 2026-06-24 14:25 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <67c00a9f-2aa2-4e83-9c0a-317ca589b232@gmail.com>

On Wed, 24 Jun 2026 at 15:34, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like seeing these updates including the deterministic steps. Is there
> a reason you don't include the step data for 'merge-tree (across import)'
> in your monorepo case? The wall-clock is substantial, so it's not like the
> last two examples in git.git where there may not be any difference.

I will have to attribute to laziness I suppose :)
I ran the initial benchmarks before adding the trace, and I didn't
update all of them,
just enough to show the improvement and value of the trace data.

I will ensure that I include all of it in the next version though
(maybe 1-2 days from now?) or maybe drop some of the benchmarks to
not overload with partly redundant information.
(merge-tree benchmarks doesn't perhaps add much significance on top
of merge-base in practice).

Thanks,
Kristofer

^ permalink raw reply

* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-24 14:09 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> commit-reach: terminate merge-base walk when one side is exhausted
> 
> Optimize paint_down_to_common() for merge-base queries that hit large
> one-sided histories.
I completed my review of this version. All of my comments are around
either making the commit history a little cleaner or expanding the
tests that use the trace2 data.

I believe that this code is _correct_ and could be shipped as-is. My
comments are focused on making it the best that it could be, with an
eye towards a cleaner final result or a more robust test setup.

The most actionable things are:

1. You can add tracing before the new tests, allowing the new tests
   to also check the step counts in their first versions and then
   get updated in the final patch to demonstrate how they change
   with that behavior change.

2. The t6600 helper 'test_all_modes' could set GIT_TRACE2_EVENT for
   each mode into a different trace file that can be scanned later.
   This will simplify your current tracing tests but also unlock
   easier tracing like this in the future.

3. The termination condition depending on min_generation could be
   refactored into paint_queue_get() to help make things even more
   obvious as to when we terminate. This should help with your
   concerns that you mentioned in response to patch 2/6 of the
   previous version:

> I am not 100% happy with the halt-condition placement yet --
> the existing loop in master already has several exit paths
> (while condition, min_generation break, FIND_ALL break) and I
> think there is an opportunity to consolidate them. But that is
> a separate discussion and I do not want to derail this series.
> I can propose some alternatives in a follow-up after this
> lands.

I then have some super minor comments around making the diffs
even easier to read, but they could be ignored as they are very
nit-picky. It's the kind of detail that I would try to resolve
if I was the author, but I'm _not_. You are. Your time is
valuable so make your own conclusions as to whether you want to
go down that road. You've already entertained my ideas around
updating the docs as the implementation changes.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-24 14:02 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <d84b932e5b078edc8255b6944ecb67fc1aa086b0.1782303254.git.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced earlier. Once the walk enters the
> finite-generation region, terminate early when one side's exclusive
> count drops to zero -- no new merge-base can form without both paint
> sides meeting.

Having this as the last patch is truly a nice climax moment for the
patch series!

> @@ -94,6 +94,9 @@ ends when one of the following conditions holds:
>  
>    1. The queue is empty.
>    2. The queue contains only stale entries.
> +  3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
> +     remain in the queue, no pending merge-base candidates exist,
> +     and the walk has entered the finite-generation region.
...> +Side-exhaustion condition
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +A new merge-base requires commits from both sides to meet. When one
> +side's exclusive counter reaches zero and there are no pending
> +merge-base candidates, no future traversal step can produce a new
> +candidate.
> +
> +This optimization only activates in the finite-generation region
> +where topological ordering holds. In that region, children are
> +always visited before parents, so paint flags are final at visit
> +time and an exhausted side cannot reappear. In the INFINITY region,
> +commit-date ordering can violate this guarantee, so the check is
> +skipped.
> +

And these doc updates inline make me happy.

>  Related documentation
>  ---------------------
>  
> diff --git a/commit-reach.c b/commit-reach.c
> index e0d9874f99..f79d0b64d6 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
>  
>  static struct commit *paint_queue_get(struct paint_state *state)
>  {
> -	struct commit *commit;
> +	struct commit *commit = prio_queue_get(&state->queue);
>  
> -	if (!state->p1_count && !state->p2_count &&
> -	    !state->pending_merge_bases)
> +	if (!commit)
>  		return NULL;
I see how the previous implementation has a termination condition
before calling prio_queue_get(), which is technically more
efficient. It does make this initial diff a bit more complicated
because we are moving the prio_queue_get() line.

If the introduction of the method in patch 5/7 looked like this:

+static struct commit *paint_queue_get(struct paint_state *state)
+{
+	struct commit *commit = prio_queue_get(&state->queue);
+
+	if (!commit)
+		return NULL;
+
+	if (!state->p1_count && !state->p2_count &&
+	    !state->pending_merge_bases)
+		return NULL;
+
+	commit->object.flags &= ~ENQUEUED;
+	paint_count_update(state, commit->object.flags, -1);
+	return commit;
+}

Then this diff would look cleaner.

(This is the nittiest of nitpicks so feel free to ignore if this
doesn't bother you at all.)

> -	commit = prio_queue_get(&state->queue);
> -	if (commit) {
> -		commit->object.flags &= ~ENQUEUED;
> -		paint_count_update(state, commit->object.flags, -1);
> +	commit->object.flags &= ~ENQUEUED;
> +
> +	if (!state->pending_merge_bases) {
> +		if (!state->p1_count && !state->p2_count)
> +			return NULL;
> +		/*
> +		 * Side exhaustion: a new merge-base can only form
> +		 * when both PARENT1-only and PARENT2-only commits
> +		 * remain in the queue. In the finite-generation
> +		 * region the queue is ordered topologically, so
> +		 * no future step can add paint to visited commits
> +		 * and an exhausted side cannot reappear.
> +		 */
> +		if ((!state->p1_count || !state->p2_count) &&
> +		    commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
> +			return NULL;
>  	}
> +
> +	paint_count_update(state, commit->object.flags, -1);
>  	return commit;
>  }

I like how the crux of this implementation is entirely within
paint_queue_get() now.

> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index c1109fb42f..03175befb3 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
>  	cp commit-graph-full .git/objects/info/commit-graph &&
>  	GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
>  		git merge-base --all commit-9-9 commit-9-1 >actual &&
> -	test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> +	test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
>  
>  	cp commit-graph-half .git/objects/info/commit-graph &&
>  	GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
>  		git merge-base --all commit-9-9 commit-9-1 >actual &&
> -	test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> +	test_trace2_data paint_down_to_common steps 57 <trace-half.txt
>  '
I love to see these steps change. If you take my suggestion to
update more tests with these checks, then this diff will get bigger
(but in a deserved way).

Also, when I suggested that 'test_all_modes' creates the trace
files on our behalf, I forgot to mention that this specific test
that you added in patch 4/7 simplifies by running the merge-base
check under 'test_all_modes' and then checking the trace2 data
on the three well-known files afterwards.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
From: Derrick Stolee @ 2026-06-24 13:55 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <8c72f01083237c00397dd074beda8f854e882cbe.1782303254.git.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
> unused after the previous commit. The core nonstale_queue functions
> remain in use by ahead_behind().
This is a nice cleanup that makes the previous diff easier to
read. Thanks!

^ permalink raw reply

* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
From: Derrick Stolee @ 2026-06-24 13:54 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <f24edd45f0af1da64513164d5d720fe70c1decff.1782303254.git.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
>  Termination
>  -----------
>  
> -The walk uses a `nonstale_queue` wrapper around `prio_queue` that
> -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
> -so far. Once that commit is dequeued, every remaining entry is known
> -to be STALE and the loop terminates. Specifically, the main loop
> +The walk tracks the number of commits of each type in the queue
> +(PARENT1-only, PARENT2-only, pending merge-base). The main loop
>  ends when one of the following conditions holds:
>  
>    1. The queue is empty.
> -  2. `max_nonstale` has been dequeued, meaning the queue only contains
> -     STALE entries.
> +  2. The queue contains only stale entries.

I'm grateful to see these changes happening to the doc in real-
time. I know it was extra work, but I'm grateful right now.

Hopefully future historians will also benefit from this effort.

> +static void paint_count_update(struct paint_state *state,
> +			       unsigned flags, int delta)
> +{
> +	switch (flags & (PARENT1 | PARENT2 | STALE)) {
> +	case PARENT1:
> +		state->p1_count += delta;
> +		break;
> +
> +	case PARENT2:
> +		state->p2_count += delta;
> +		break;
> +
> +	case PARENT1 | PARENT2:
> +		state->pending_merge_bases += delta;
> +		break;
> +
> +	case PARENT1 | PARENT2 | STALE:
> +		break;
> +
> +	default:
> +		BUG("unexpected paint state");
> +	}
> +}

I like the use of 'delta' to allow reuse of this switch.

> +
> +static void paint_queue_put(struct paint_state *state,
> +			    struct commit *c, unsigned add_flags)
> +{
> +	unsigned old_flags = c->object.flags;
> +	c->object.flags |= add_flags;
> +
> +	if (old_flags & ENQUEUED) {
> +		paint_count_update(state, old_flags, -1);
> +		paint_count_update(state, c->object.flags, 1);
> +	} else {
> +		c->object.flags |= ENQUEUED;
> +		prio_queue_put(&state->queue, c);
> +		paint_count_update(state, c->object.flags, 1);
> +	}
> +}

ok: if we are already in the queue then we have old flags and
may need to subtract their values because they were counted
already. Otherwise, we need to queue it for the first time and
only add the values. Makes sense.

> +
> +static struct commit *paint_queue_get(struct paint_state *state)
> +{

Since we are going to make this a more complete termination
condition, we may want to make that very explicit with a doc-
comment. Something along the lines of "dequeue a commit when
possible, but also signal termination of the walk when we
conclude that no more merge bases will be discovered due to
internal state."

> @@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r,
>  				return error(_("could not parse commit %s"),
>  					     oid_to_hex(&p->object.oid));
>  			}
> -			p->object.flags |= flags;
> -			nonstale_queue_put_dedup(&queue, p);
> +			paint_queue_put(&state, p, flags);

I like how this simplifies the flag-assignment logic somewhat.

You mentioned in your cover letter how the min_generation value
can add extra termination conditions. It may be a good idea to
insert min_generation into the paint_queue struct and make it a
termination condition for paint_queue_get(). If you consider this
direction, then I'd make it a separate patch on top of this one
_before_ adding the one-sided change. The extra tests that cover
the exact number of walked commits can help to guarantee the same
behavior, assuming that some of those tests check a non-zero
min_generation input. (It may be good to add such trace tests in
an earlier patch to help confidence in this case.)

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
From: Derrick Stolee @ 2026-06-24 13:43 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <6151b8e0a3989a51e6d9717e0ceac439f26f1c1d.1782303254.git.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
> 
>  - in_merge_bases_many:self: commit is both A and one of the X inputs
>  - get_merge_bases_many:duplicate-twos: duplicate entries in X list
>  - get_merge_bases_many:pending-stale: STALE transition on an
>    already-painted commit (ps-* diamond topology)
>  - get_merge_bases_many:infinity-both-sides: both tips outside the
>    commit-graph with non-monotonic dates (pi-* topology)

I'm happy that these cases now exist.

> +test_expect_success 'in_merge_bases_many:self' '
> +	cat >input <<-\EOF &&
> +	A:commit-6-8
> +	X:commit-5-9
> +	X:commit-6-8
> +	EOF
> +	echo "in_merge_bases_many(A,X):1" >expect &&
> +	test_all_modes in_merge_bases_many
> +'

and using 'test_all_modes' is great to get coverage of all the
different commit-graph states. In reply to patch v2 4/7 I ask
to see the results of the traces in these kinds of test cases,
but each of these modes will have different values.

One way to make these tests have potential to check exact stats
without too much extra work would be to update 'test_all_modes'
to run each command with GIT_TRACE2_EVENT set to a known trace
file (reset each time) that can then be checked after verifying
that the results of each command is the same.

Then, these tests could have lines such as

	test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
	test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
	test_trace2_data paint_down_to_common steps 40 <trace-none.txt

after the test_all_modes line.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
From: Derrick Stolee @ 2026-06-24 13:41 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <6ade4df2ed2a836a3b4c5400ab13e8247e36c029.1782303254.git.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add a step counter and trace2_data_intmax() call so that the number
> of commits visited during the paint walk is observable via
> GIT_TRACE2_PERF. This provides a way to measure the impact of
> future optimizations without relying on wall-clock benchmarks alone.

> +	trace2_data_intmax("paint_down_to_common", r,
> +			   "steps", steps);

This is great data. Very clearly marked for what we should be
doing here.

> +test_expect_success 'merge-base --all commit-walk steps' '
> +	test_when_finished rm -rf .git/objects/info/commit-graph \
> +		.git/objects/info/commit-graphs &&

(highlighting this chunk)

> +	rm -rf .git/objects/info/commit-graph \
> +		.git/objects/info/commit-graphs &&
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> +		git merge-base --all commit-9-9 commit-9-1 >actual &&
> +	test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&

I'd rather see the whitespace line before the `rm` to make it
more obvious that it's setting up the "none" case.

> +
> +	cp commit-graph-full .git/objects/info/commit-graph &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> +		git merge-base --all commit-9-9 commit-9-1 >actual &&
> +	test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> +
> +	cp commit-graph-half .git/objects/info/commit-graph &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> +		git merge-base --all commit-9-9 commit-9-1 >actual &&
> +	test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> +'
> +

This test is a great example. I look forward to seeing that it
updates in the future.

One thing I was hoping to see was that your side-exhaustion tests
(from patch v2 2/7) would also include these checks so they are
more obviously updating when the implementation updates later.

One way to accomplish that is to reorder this patch before adding
those tests so their first version includes these checks and then
the values update when changing the implementation.

Thanks,
-Stolee



^ permalink raw reply

* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-24 13:34 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>

On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:

> Benchmarks
> 
> Step counts are deterministic (measured via trace2_data_intmax added in
> patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
> to performance.
> 
> 2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
> 
>                                         steps              wall-clock
> merge-base --all  (across import)    2682391 ->  53521     7.26s ->   88ms
> merge-base --all  (1000 apart)       2659607 ->   1106     6.98s ->    8ms
> merge-tree        (across import)          -               8.11s ->  100ms
> 
> 
> git.git (88k commits, commit-graph):
> 
>                                         steps              wall-clock
> merge-base --all v2.0.0 v2.55.0-rc1   72264 ->  44589      82ms ->   49ms
> merge-base --all HEAD HEAD~1000         9873 ->   3817      19ms ->    9ms
> merge-base --all HEAD HEAD~10000       72285 ->  41523      80ms ->   48ms
> merge-base HEAD HEAD~1000                  -                 9ms ->    9ms
> merge-base --is-ancestor HEAD~1000 HEAD    -                 6ms ->    6ms

I like seeing these updates including the deterministic steps. Is there
a reason you don't include the step data for 'merge-tree (across import)'
in your monorepo case? The wall-clock is substantial, so it's not like the
last two examples in git.git where there may not be any difference.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Karthik Nayak @ 2026-06-24 13:25 UTC (permalink / raw)
  To: Pablo Sabater
  Cc: gitster, peff, eric.peijian, chriscool, git, jltobler, toon,
	chandrapratap3519
In-Reply-To: <CAN5EUNRN4_5PS3cbQtQfpyRuwByvV=qvAVKnVbgT-pirKGnnTg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3976 bytes --]

Pablo Sabater <pabloosabaterr@gmail.com> writes:

> El lun, 22 jun 2026 a las 22:43, Karthik Nayak
> (<karthik.188@gmail.com>) escribió:
>>
>> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>>
>> [snip]
>>
>> > diff --git a/connect.c b/connect.c
>> > index 1dced8e632..78c69d4485 100644
>> > --- a/connect.c
>> > +++ b/connect.c
>> > @@ -700,16 +700,16 @@ int server_supports(const char *feature)
>> >       return !!server_feature_value(feature, NULL);
>> >  }
>> >
>> > -void write_fetch_command_and_capabilities(struct strbuf *req_buf,
>> > -                                       const struct string_list *server_options)
>> > +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
>> > +                                 const struct string_list *server_options)
>> >  {
>> >       const char *hash_name;
>> >       int advertise_sid;
>> >
>> >       repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
>> >
>> > -     ensure_server_supports_v2("fetch");
>> > -     packet_buf_write(req_buf, "command=fetch");
>> > +     ensure_server_supports_v2(command);
>> > +     packet_buf_write(req_buf, "command=%s", command);
>> >       if (server_supports_v2("agent"))
>> >               packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
>> >       if (advertise_sid && server_supports_v2("session-id"))
>> > @@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
>> >                       die(_("mismatched algorithms: client %s; server %s"),
>> >                           the_hash_algo->name, hash_name);
>> >               packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
>> > -     } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
>> > +     } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
>> >               die(_("the server does not support algorithm '%s'"),
>> >                   the_hash_algo->name);
>> >       }
>>
>> Why did we make this change? If the server doesn't support v2, then the
>> object format should be `GIT_HASH_SHA1_LEGACY`. While the value of it is
>> indeed `GIT_HASH_SHA1`, it indicates a scenario where there was no
>> option to select object hash, which is the scenario here.
>>
>> If there is a reason to make such a change, perhaps we should highlight
>> this in the commit message.
>
> Hi!
> There should be no diff related to that line, In some point between
> Eric's last version (v11) and mine's firs (v12) the original code
> changed. On the diff from v11 [1] the object format is the same, i
> didn't notice this change and it's wrong, I'll fix it for v14, Thanks!
>
>>
>> > diff --git a/connect.h b/connect.h
>> > index c4f6ea4b0a..8f4c523892 100644
>> > --- a/connect.h
>> > +++ b/connect.h
>> > @@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
>> >                              struct packet_reader *reader,
>> >                              const char *error);
>> >
>> > +/*
>> > + * Writes a command along with the requested server capabilities/features into a
>> > + * request buffer.
>> > + */
>> >  struct string_list;
>>
>> The comment should be above the function and not the forward
>> declaration.
>
> True, I'll fix it for v14.
>
>>
>> While we're here, why not `#include "string-list.h"` and remove the
>> forward declaration, is there a circular dependency?
>
> I believe this was right because from what I know forward declarations
> are prefered in headers when in this case, the struct is only used as
> a pointer. Investigating, this came from a review from patrick [2].
>

That's fair.

Nit: Maybe add this context to the commit message?

Thanks

> [snip]
>
> [1]: https://lore.kernel.org/git/20250221190451.12536-5-eric.peijian@gmail.com/
> [2]: https://lore.kernel.org/git/Z0RIqUAoEob8lGfM@pks.im/
>
> Thanks for the review,
> Pablo.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox