Git development
* Re: [PATCH v17 5/7] branch: add --delete-merged <branch>
From: Phillip Wood @ 2026-06-22 15:36 UTC
  To: Harald Nordgren via GitGitGadget, git
  Cc: Kristoffer Haugsbakk, Johannes Sixt, Harald Nordgren
In-Reply-To: <46da7c814056ddbc23accf19a61d0451b949127e.1782113388.git.gitgitgadget@gmail.com>

Hi Harald

On 22/06/2026 08:29, Harald Nordgren via GitGitGadget wrote:
> From: Harald Nordgren <haraldnordgren@gmail.com>
> 
> +static int collect_upstream(const struct reference *ref, void *cb_data)
> +{
> +	struct string_list *upstreams = cb_data;
> +	struct branch *branch = branch_get(ref->name);
> +	const char *upstream = branch_get_upstream(branch, NULL);
> +
> +	string_list_append(upstreams, ref->name)->util =
> +		xstrdup_or_null(upstream);
> +	return 0;
> +}
> +
> +/*
> + * Keep any branch that another, surviving branch tracks as its
> + * upstream, so we never delete a branch out from under one stacked on
> + * top of it.  Sparing a branch makes it a survivor whose own upstream
> + * then needs the same protection, so repeat until nothing changes.
> + */
> +static void spare_stacked_bases(struct ref_store *refs, struct strset *deletable)
> +{
> +	struct string_list upstreams = STRING_LIST_INIT_DUP;
> +	struct string_list_item *item;
> +	bool spared;
> +
> +	refs_for_each_branch_ref(refs, collect_upstream, &upstreams);
> +	do {
> +		spared = false;
> +		for_each_string_list_item(item, &upstreams) {
> +			const char *up = item->util, *up_short;
> +
> +			if (!up || strset_contains(deletable, item->string))
> +				continue;
> +			if (!skip_prefix(up, "refs/heads/", &up_short) ||
> +			    !strset_contains(deletable, up_short))
> +				continue;
> +
> +			strset_remove(deletable, up_short);
> +			spared = true;
> +		}
> +	} while (spared);
> +
> +	string_list_clear(&upstreams, 1);
> +}

This keeps the whole chain of branches, which is the safest thing to
do but potentially keeps unneeded branches around. Only the upstream
branches of unmerged branches are really useful, so we could keep just
those and, if a kept branch's own upstream is going to be deleted,
clear the upstream config for that kept branch.

For example, if we have branch feature-3 with upstream feature-2 which
has upstream feature-1, then if feature-1 and feature-2 are merged we'd
delete feature-1 but keep feature-2 and clear its upstream config. If we
also had feature-4 that was not merged and had upstream feature-1 we'd
keep feature-1 and leave the upstream config for feature-2 unchanged.
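
To make the two cases above concrete, here is a toy model of that
policy. It is only a sketch: struct toy_branch, plan_deletion() and
will_delete() are made-up names that do not exist in git, branches are
plain array entries rather than refs, and the decision is taken from
the original set of merged branches so that the result does not
depend on the order in which branches are visited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local branch; not a git data structure. */
struct toy_branch {
	const char *name;
	int upstream;        /* index of the local upstream branch, -1 if none */
	bool merged;         /* merged, so initially a candidate for deletion */
	bool keep;           /* out: spared because an unmerged branch tracks it */
	bool clear_upstream; /* out: kept, but its own upstream is going away */
};

static void plan_deletion(struct toy_branch *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		b[i].keep = b[i].clear_upstream = false;

	/* Pass 1: spare the direct upstream of every unmerged branch. */
	for (i = 0; i < n; i++)
		if (!b[i].merged && b[i].upstream >= 0 &&
		    b[b[i].upstream].merged)
			b[b[i].upstream].keep = true;

	/*
	 * Pass 2: a spared branch whose own upstream is still going to
	 * be deleted should have its upstream config cleared.
	 */
	for (i = 0; i < n; i++) {
		int up = b[i].upstream;

		if (b[i].keep && up >= 0 && b[up].merged && !b[up].keep)
			b[i].clear_upstream = true;
	}
}

/* A branch is deleted if it was merged and nothing spared it. */
static bool will_delete(const struct toy_branch *b)
{
	return b->merged && !b->keep;
}
```

Feeding it feature-1 and feature-2 (both merged) plus an unmerged
feature-3 marks feature-1 for deletion and flags feature-2 for having
its upstream cleared; adding an unmerged feature-4 on top of feature-1
spares feature-1 and leaves feature-2 untouched.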

Here is a possible implementation for that. It compiles but I've not
written any new tests. It does pass the existing tests which means
either it is buggy or we don't have coverage for keeping a chain of
branches.

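/* Not shown in the original mail; inferred from how it is used below. */
struct collect_upstream_data {
	struct strset *deletable;
	struct strset *upstreams;
};

static int collect_upstreams(const struct reference *ref, void *cb_data)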
{
	struct collect_upstream_data *data = cb_data;
	struct strset *deletable = data->deletable;
	struct strset *upstreams = data->upstreams;
	struct branch *branch;
	const char *upstream_ref, *upstream_name;

	/*
	 * We're only interested in the upstreams of branches that
	 * are not being deleted.
	 */
	if (strset_contains(deletable, ref->name))
		return 0;
	branch = branch_get(ref->name);
	if (!branch)
		return 0;
	upstream_ref = branch_get_upstream(branch, NULL);
	/*
	 * We're only interested in the upstream if it is going to
	 * be deleted.
	 */
	if (!upstream_ref ||
	    !skip_prefix(upstream_ref, "refs/heads/", &upstream_name) ||
	    !strset_contains(deletable, upstream_name))
		return 0;
	/*
	 * Do not delete this branch because it is the upstream of
	 * an unmerged branch. Also remember it so we can check if
	 * its upstream is marked for deletion once we've visited all
	 * branches
	 */
	strset_remove(deletable, upstream_name);
	strset_add(upstreams, upstream_name);

	return 0;
}

/*
 * Keep any branch that another, surviving branch tracks as its
 * upstream, so we never delete a branch out from under one stacked on
 * top of it.  If the upstream branch has an upstream set that is marked
 * for deletion clear its upstream config.
 */
static void spare_stacked_bases(struct ref_store *refs, struct strset *deletable)
{
	struct strset upstreams = STRSET_INIT;
	struct collect_upstream_data data = {
		.deletable = deletable,
		.upstreams = &upstreams,
	};
	struct strbuf buf = STRBUF_INIT;
	struct hashmap_iter iter;
	struct strmap_entry *entry;

	refs_for_each_branch_ref(refs, collect_upstreams, &data);
	strset_for_each_entry(&upstreams, &iter, entry) {
		const char *upstream_upstream, *upstream_upstream_name;
		struct branch *upstream_branch;

		upstream_branch = branch_get(entry->key);
		upstream_upstream = branch_get_upstream(upstream_branch, NULL);
		/*
		 * deletable holds short branch names, so strip
		 * "refs/heads/" before looking the upstream up.
		 */
		if (upstream_upstream &&
		    skip_prefix(upstream_upstream, "refs/heads/",
				&upstream_upstream_name) &&
		    strset_contains(deletable, upstream_upstream_name)) {
			/*
			 * This branch has been merged and is the upstream of
			 * an unmerged branch.  Its upstream is marked for
			 * deletion because it is not the upstream of any
			 * unmerged branch so clear its upstream config.
			 */
			strbuf_reset(&buf);
			strbuf_addf(&buf, "branch.%s.merge", upstream_branch->name);
			repo_config_set_gently(the_repository, buf.buf, NULL);
			strbuf_setlen(&buf, buf.len - 5);
			strbuf_addstr(&buf, "remote");
			repo_config_set_gently(the_repository, buf.buf, NULL);
		}
	}
	strbuf_release(&buf);
	strset_clear(&upstreams);

}


> +	for (i = 0; i < candidates.nr; i++) {
> +		const char *short_name;
> +
> +		if (skip_prefix(candidates.items[i]->refname, "refs/heads/",
> +				&short_name) &&
> +		    strset_contains(&deletable, short_name))
> +			strvec_push(&to_delete, short_name);
> +	}

It would be nicer to use strset_for_each_entry() here. First declare

	struct hashmap_iter iter;
	struct strmap_entry *entry;

at the top of the function and then replace the loop with

	strset_for_each_entry(&deletable, &iter, entry)
		strvec_push(&to_delete, entry->key);

Thanks

Phillip


> +	if (to_delete.nr)
> +		ret = delete_branches(to_delete.nr, to_delete.v,
> +				      FILTER_REFS_BRANCHES,
> +				      DELETE_BRANCH_SKIP_UNMERGED |
> +				      DELETE_BRANCH_NO_HEAD_FALLBACK |
> +				      flags);
> +
> +	strvec_clear(&to_delete);
> +	strset_clear(&deletable);
> +	ref_array_clear(&candidates);
> +	ref_filter_clear(&filter);
> +	return ret;
> +}
> +
>   static GIT_PATH_FUNC(edit_description, "EDIT_DESCRIPTION")
>   
>   static int edit_branch_description(const char *branch_name)
> @@ -746,6 +862,7 @@ int cmd_branch(int argc,
>   	/* possible actions */
>   	int delete = 0, rename = 0, copy = 0, list = 0,
>   	    unset_upstream = 0, show_current = 0, edit_description = 0;
> +	int delete_merged = 0;
>   	const char *new_upstream = NULL;
>   	int noncreate_actions = 0;
>   	/* possible options */
> @@ -799,6 +916,8 @@ int cmd_branch(int argc,
>   		OPT_BOOL(0, "create-reflog", &reflog, N_("create the branch's reflog")),
>   		OPT_BOOL(0, "edit-description", &edit_description,
>   			 N_("edit the description for the branch")),
> +		OPT_BOOL(0, "delete-merged", &delete_merged,
> +			N_("delete local branches whose upstream matches <branch> and are merged")),
>   		OPT__FORCE(&force, N_("force creation, move/rename, deletion"), PARSE_OPT_NOCOMPLETE),
>   		OPT_MERGED(&filter, N_("print only branches that are merged")),
>   		OPT_NO_MERGED(&filter, N_("print only branches that are not merged")),
> @@ -846,7 +965,8 @@ int cmd_branch(int argc,
>   			     0);
>   
>   	if (!delete && !rename && !copy && !edit_description && !new_upstream &&
> -	    !show_current && !unset_upstream && argc == 0)
> +	    !show_current && !unset_upstream && !delete_merged &&
> +	    argc == 0)
>   		list = 1;
>   
>   	if (filter.with_commit || filter.no_commit ||
> @@ -856,7 +976,7 @@ int cmd_branch(int argc,
>   
>   	noncreate_actions = !!delete + !!rename + !!copy + !!new_upstream +
>   			    !!show_current + !!list + !!edit_description +
> -			    !!unset_upstream;
> +			    !!unset_upstream + !!delete_merged;
>   	if (noncreate_actions > 1)
>   		usage_with_options(builtin_branch_usage, options);
>   
> @@ -898,6 +1018,10 @@ int cmd_branch(int argc,
>   				      (delete > 1 ? DELETE_BRANCH_FORCE : 0) |
>   				      (quiet ? DELETE_BRANCH_QUIET : 0));
>   		goto out;
> +	} else if (delete_merged) {
> +		ret = delete_merged_branches(argc, argv,
> +					     quiet ? DELETE_BRANCH_QUIET : 0);
> +		goto out;
>   	} else if (show_current) {
>   		print_current_branch_name();
>   		ret = 0;
> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 3104c555f6..1d372f95e8 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1839,4 +1839,155 @@ test_expect_success '--forked narrows a <pattern> argument' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success '--delete-merged: setup' '
> +	git init -b main upstream &&
> +	(
> +		cd upstream &&
> +		test_commit base &&
> +		git checkout -b next &&
> +		test_commit next-work &&
> +		git checkout main
> +	) &&
> +	git init -b main other &&
> +	test_commit -C other other-base &&
> +	git init -b main fork
> +'
> +
> +setup_repo_for_delete_merged () {
> +	rm -rf repo &&
> +	git clone upstream repo &&
> +	(
> +		cd repo &&
> +		git remote add fork ../fork &&
> +		git remote add other ../other &&
> +		git config remote.pushDefault fork &&
> +		git config push.default current &&
> +		git fetch other
> +	)
> +}
> +
> +merged_branch () {
> +	(
> +		cd repo &&
> +		git checkout -b "$1" "$2" &&
> +		git commit --allow-empty -m "$1 work" &&
> +		git push origin "$1:next" &&
> +		git fetch origin &&
> +		git branch --set-upstream-to="$2" "$1"
> +	)
> +}
> +
> +test_expect_success '--delete-merged deletes merged branches and spares the rest' '
> +	test_when_finished "rm -rf repo" &&
> +	setup_repo_for_delete_merged &&
> +	merged_branch merged origin/next &&
> +	(
> +		cd repo &&
> +		git checkout -b unmerged origin/next &&
> +		git commit --allow-empty -m "unmerged work" &&
> +		git branch --set-upstream-to=origin/next unmerged &&
> +		git checkout -b tracks-other other/main &&
> +		git branch --set-upstream-to=other/main tracks-other &&
> +		git checkout --detach
> +	) &&
> +	sha=$(git -C repo rev-parse --short merged) &&
> +
> +	git -C repo branch --delete-merged origin/next >actual 2>&1 &&
> +
> +	echo "Deleted branch merged (was $sha)." >expect &&
> +	test_cmp expect actual &&
> +	git -C repo for-each-ref --format="%(refname:short)" refs/heads/ >actual &&
> +	cat >expect <<-\EOF &&
> +	main
> +	tracks-other
> +	unmerged
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--delete-merged deletes merged branches and spares protected ones' '
> +	test_when_finished "rm -rf repo" &&
> +	setup_repo_for_delete_merged &&
> +	merged_branch on-next origin/next &&
> +	merged_branch checked-out origin/next &&
> +	merged_branch upstream-gone origin/next &&
> +	(
> +		cd repo &&
> +		git checkout -b mainline main &&
> +		git checkout -b on-local mainline &&
> +		git branch --set-upstream-to=mainline on-local &&
> +		git update-ref refs/remotes/origin/topic refs/remotes/origin/next &&
> +		git branch --set-upstream-to=origin/topic upstream-gone &&
> +		git update-ref -d refs/remotes/origin/topic &&
> +		git branch --set-upstream-to=origin/main main &&
> +		git config branch.main.pushRemote origin &&
> +		git checkout -b tracks-other other/main &&
> +		git branch --set-upstream-to=other/main tracks-other &&
> +		git checkout checked-out
> +	) &&
> +
> +	git -C repo branch --delete-merged origin/next mainline &&
> +
> +	git -C repo for-each-ref --format="%(refname:short)" refs/heads/ >actual &&
> +	cat >expect <<-\EOF &&
> +	checked-out
> +	main
> +	mainline
> +	tracks-other
> +	upstream-gone
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--delete-merged requires at least one <branch>' '
> +	test_must_fail git -C forked branch --delete-merged 2>err &&
> +	test_grep "requires at least one <branch>" err
> +'
> +
> +test_expect_success '--delete-merged keeps a branch that is an upstream' '
> +	test_when_finished "rm -rf repo" &&
> +	setup_repo_for_delete_merged &&
> +	merged_branch feature origin/next &&
> +	(
> +		cd repo &&
> +		git checkout -b topic feature &&
> +		git commit --allow-empty -m "topic work" &&
> +		git branch --set-upstream-to=feature topic &&
> +		git checkout --detach
> +	) &&
> +
> +	git -C repo branch --delete-merged origin/next 2>err &&
> +
> +	test_must_be_empty err &&
> +	git -C repo rev-parse --verify refs/heads/feature &&
> +	git -C repo rev-parse --verify refs/heads/topic
> +'
> +
> +test_expect_success '--delete-merged keeps a chain of upstreams of a kept branch' '
> +	test_when_finished "rm -rf repo" &&
> +	setup_repo_for_delete_merged &&
> +	(
> +		cd repo &&
> +		git branch b3 origin/next &&
> +		git branch --set-upstream-to=origin/next b3 &&
> +		git branch b2 origin/next &&
> +		git branch --set-upstream-to=b3 b2 &&
> +		git checkout -b b1 b2 &&
> +		git commit --allow-empty -m "b1 work" &&
> +		git branch --set-upstream-to=b2 b1 &&
> +		git checkout --detach
> +	) &&
> +
> +	git -C repo branch --delete-merged origin/next &&
> +
> +	git -C repo for-each-ref --format="%(refname:short)" refs/heads/ >actual &&
> +	cat >expect <<-\EOF &&
> +	b1
> +	b2
> +	b3
> +	main
> +	EOF
> +	test_cmp expect actual
> +'
> +
>   test_done



* Re: [PATCH v4 1/3] replay: refactor enum replay_mode into a bool
From: Junio C Hamano @ 2026-06-22 15:43 UTC
  To: Patrick Steinhardt; +Cc: Toon Claes, git, Elijah Newren
In-Reply-To: <ajk-YQxLWfspNWIm@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Jun 22, 2026 at 02:41:55PM +0200, Toon Claes wrote:
>> In 2760ee4983 (replay: add --revert mode to reverse commit changes,
>> 2026-03-26) the enum `replay_mode` was introduced. This has two possible
>> values:
>> 
>>  - The value `REPLAY_MODE_REVERT` is used when option `--revert` is
>>    passed to git-replay(1). When using this value the commits are
>>    processed in reverse order and the inverse of the changes are
>>    applied.
>> 
>>  - The value `REPLAY_MODE_PICK` is used when either option `--onto` or
>>    `--advance` is used. In both cases the commits are processed in
>>    normal order, and the changes are applied as-is.
>> 
>> Since there are only two possible values of this enum, simplify the code
>> by converting the enum into a bool. This avoids adding code paths that
>> check for invalid values of the enum, and shortens code where the value
>> is checked with a ternary operator.
>
> That's fair, and the result is easier to write. But is it really easier
> to read? And what if we ever have to create a third mode going forward?
>
> I'm generally no fan of booleans as parameters as they basically give
> you no information at all at the callsite, except if you're lucky and
> you already have an aptly-named variable available that you can pass.
> Which seems to be the case here, but I'm still not sure whether this
> change really improves the code.

I tend to agree with you on both counts.  The "what happens when
somebody else wants a third choice?" is a question I would ask
first as the maintainer of a project.

Even if the boolean parameter is so obviously named, the callsite
can only say "true" or "false", unlike some other popular languages
that let you say

	my_function(use_revert_mode=true, verbose=false);

and you cannot tell what effect the author wanted out of that "true"
if all you could write were

	my_function(true, false);

Of course, we could go ultra verbose, like

	my_function(true /* use_revert_mode */,
		    false /* verbose */);

but then we are often better off writing:

	my_function(REPLAY_MODE_REVERT, REPLAY_QUIET);
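
As a rough illustration of that last form, here is a minimal sketch.
REPLAY_MODE_PICK and REPLAY_MODE_REVERT are the enumerators from the
patch under discussion and REPLAY_QUIET is taken from the call above;
REPLAY_VERBOSE, enum replay_verbosity and the body of my_function()
are made up for the example.

```c
#include <stdio.h>

enum replay_mode { REPLAY_MODE_PICK, REPLAY_MODE_REVERT };
/* Hypothetical second enum; only REPLAY_QUIET appears in the thread. */
enum replay_verbosity { REPLAY_QUIET, REPLAY_VERBOSE };

/* Returns 1 if commits would be processed in reverse order, else 0. */
static int my_function(enum replay_mode mode, enum replay_verbosity verbosity)
{
	int reverse = 0;

	/*
	 * No "default" arm on purpose: with -Wswitch, GCC and Clang
	 * warn here when somebody adds a third mode and forgets to
	 * handle it.
	 */
	switch (mode) {
	case REPLAY_MODE_PICK:
		reverse = 0;
		break;
	case REPLAY_MODE_REVERT:
		reverse = 1;
		break;
	}

	if (verbosity == REPLAY_VERBOSE)
		printf("mode: %s\n", reverse ? "revert" : "pick");
	return reverse;
}
```

The callsite my_function(REPLAY_MODE_REVERT, REPLAY_QUIET) then
documents itself, and swapping the two arguments by mistake at least
looks wrong to a reader.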

Thanks.


* Re: [PATCH v3] config.mak.uname: avoid macOS dup-library warning
From: D. Ben Knoble @ 2026-06-22 15:49 UTC
  To: Junio C Hamano
  Cc: Harald Nordgren via GitGitGadget, git, Harald Nordgren,
	Patrick Steinhardt, pbonzini@redhat.com
In-Reply-To: <CALnO6CAgNdkg0PnN9Zy=zLurLUSb2hUXYAGe_qB0oceZNy_=gg@mail.gmail.com>

On Sat, Jun 20, 2026 at 4:58 PM D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> On Fri, Jun 19, 2026 at 6:27 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Harald Nordgren <haraldnordgren@gmail.com>
> > >
> > > Building on macOS with Xcode 15 or newer emits:
> > >
> > >     ld: warning: ignoring duplicate libraries: 'libgit.a',
> > >     'target/release/libgitcore.a'
> > >
> > > Some link recipes list the same archive twice, which is harmless.
> > > Quiet the warning instead.
> > >
> > > Pass -Wl,-no_warn_duplicate_libraries on Xcode 15 and newer, whose
> > > linkers added both the warning and the suppression flag (ld64-907
> > > and dyld-1009). Earlier linkers reject the flag, so gate on the
> > > linker version. Broaden the existing -fno-common version probe to
> > > also match the "ld64-NNN" and "dyld-NNN" forms Xcode 15 reports.
> > >
> > > Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
> > > ---
> >
> > Yeah, this looks like what I expected.
> >
> > A few things to note.
> >
> >  * Can folks with different versions of Xcode (or is 15 sufficiently
> >    old that practically nobody is expected to have anything older?)
> >    test this patch?
> >
> >  * We only patch Makefile here; can folks who use meson report how
> >    well your build goes?
> >
> > Thanks.
>
> On one (old) machine I have available:
>
>     $ pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
>     [trimmed]
>     version: 14.2.0.0.1.1668646533
>
> On said machine, I don't get the duplicate warnings on a Meson build.
> No issues with the patch when running make.

That old machine has "ld -v":

    @(#)PROGRAM:ld  PROJECT:ld64-711
    BUILD 21:57:11 Nov 17 2021
    [trimmed]
    LTO support using: LLVM version 13.0.0, (clang-1300.0.29.30) (static support for 27, runtime is 27)
    TAPI support using: Apple TAPI version 13.0.0 (tapi-1300.0.6.5)

> I think I have seen this on my other machine, which is much newer.
> When I get around to trying it there, I'll report back as well.

Here are those results:

    $ pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
    [trimmed]
    version: 26.1.0.0.1.1761104275

sans patch:
- Meson: duplicate warning
- Make: duplicate warning

w/ patch:
- Meson: (unchanged, obviously)
- Make: no duplicate warning

Under Meson + Ninja the warning I get is (status line may not be
helpful given parallelism)

    [705/708] Linking target t/helper/test-tool
    ld: warning: ignoring duplicate libraries: '-lexpat', '-liconv', '-lresolv', '-lz'

That's in addition to a dozen or so

    ld: warning: reducing alignment of section __DATA,__common from 0x8000 to 0x4000 because it exceeds segment maximum alignment

Under Make I get (surrounding info may not be helpful given parallelism)

        LINK git
    ld: warning: ignoring duplicate libraries: 'libgit.a', 'target/release/libgitcore.a'
        MKDIR -p t/unit-tests/bin
        LINK git-sh-i18n--envsubst
        LINK t/helper/test-tool
        LINK git-remote-http
    ld: warning: ignoring duplicate libraries: 'libgit.a', 'target/release/libgitcore.a'

(No alignment warnings this time.)

Linker version (ld -v) on this machine:

    @(#)PROGRAM:ld PROJECT:ld-1230.1
    BUILD 16:18:08 Oct 17 2025
    [trimmed]
    LTO support using: LLVM version 17.0.0 (static support for 29, runtime is 29)
    TAPI support using: Apple TAPI version 17.0.0 (tapi-1700.3.8)

-- 
D. Ben Knoble


* Re: [GSoC Patch v7 1/3] path: extract append_formatted_path() and use in rev-parse
From: Junio C Hamano @ 2026-06-22 16:03 UTC
  To: K Jayatheerth
  Cc: a3205153416, git, jltobler, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <xmqqtsqv6204.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> K Jayatheerth <jayatheerthkulkarni2005@gmail.com> writes:
> ...
> It is a minor point, but wouldn't it make it simpler to handle
> format_default first?  I.e.,
>
> 	if (format == FORMAT_DEFAULT)
> 		switch (def) {
> 		case DEFAULT_RELATIVE:
> 			format = DEFAULT_RELATIVE;
> 			break;
> 		...
> 		case DEFAULT_UNMODIFIED:
> 		default:
> 			format = DEFAULT_UNMODIFIED; 
> 			break;
> 	}
> 	switch (format) {
>         case FORMAT_RELATIVE: fmt = PATH_FORMAT_RELATIVE; break;
> 	case FORMAT_CANONICAL: fmt = PATH_FORMAT_CANONICAL; break;
> 	...
> 	}
>
> Perhaps yes, perhaps not.  I dunno.

I do not consider the above a blocker, but it might make a
difference if we are going to acquire more modes and formats, so
once somebody tries to rewrite the logic and finds the resulting
code harder to follow (or not easier to follow), I would be happy to
see the above discarded ;-)
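
To spell out what the two-step shape might look like once it is
filled in, here is a stand-alone sketch.  All three enums are
simplified stand-ins declared just for this example (in particular
FORMAT_UNMODIFIED, DEFAULT_CANONICAL and PATH_FORMAT_UNMODIFIED are
assumptions), and resolve_path_format() is a made-up name, so treat it
as an outline of the idea rather than the code in the patch.

```c
/* Stand-in enums for the sketch; the real ones may differ. */
enum format_type {
	FORMAT_DEFAULT,
	FORMAT_RELATIVE,
	FORMAT_CANONICAL,
	FORMAT_UNMODIFIED
};
enum default_type {
	DEFAULT_RELATIVE,
	DEFAULT_CANONICAL,
	DEFAULT_UNMODIFIED
};
enum path_format {
	PATH_FORMAT_UNMODIFIED,
	PATH_FORMAT_RELATIVE,
	PATH_FORMAT_CANONICAL
};

static enum path_format resolve_path_format(enum format_type format,
					    enum default_type def)
{
	/* Step 1: turn "use the default" into a concrete format. */
	if (format == FORMAT_DEFAULT) {
		switch (def) {
		case DEFAULT_RELATIVE:
			format = FORMAT_RELATIVE;
			break;
		case DEFAULT_CANONICAL:
			format = FORMAT_CANONICAL;
			break;
		case DEFAULT_UNMODIFIED:
		default:
			format = FORMAT_UNMODIFIED;
			break;
		}
	}

	/* Step 2: only concrete formats are left to map. */
	switch (format) {
	case FORMAT_RELATIVE:
		return PATH_FORMAT_RELATIVE;
	case FORMAT_CANONICAL:
		return PATH_FORMAT_CANONICAL;
	default:
		return PATH_FORMAT_UNMODIFIED;
	}
}
```

The second switch never has to think about the default again, which
is where the simplification would come from if more modes and formats
are added.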

>> +/**
>> + * Format a path according to the specified formatting strategy and append
>> + * the result to the given strbuf.
>> + *
>> + * `dest`   : The string buffer to append the formatted path to.
>> + * `path`   : The path string that needs to be formatted.
>> + * `prefix` : The directory prefix to calculate relative offsets against.
>> + * Pass NULL to default to the current working directory where applicable.
>> + * `format` : The formatting behavior rule to execute.
>> + */
>> +void append_formatted_path(struct strbuf *dest, const char *path,
>> +			   const char *prefix, enum path_format format);
>> +
>
> It is slightly unsatisfying that this function is defined to
> "append" to any existing value in the dest strbuf, rather than
> storing the result in the dest strbuf.  The original caller
> print_path() passes an empty strbuf to this helper, so it can let
> strbuf_realpath_*() functions to strbuf_reset() it (e.g.,
> abspath.c:get_root_part() called by strbuf_realpath_1(), which in
> turn is called by strbuf_realpath() and strbuf_realpath_forgiving())
> it freely, which means that use of temporary strbuf like
> canonical_buf only to copy it out to dest is wasteful and unneeded.
> But other callers we will have for this helper later may want to
> append to what they already have, so perhaps it is OK (on the other
> hand, we could say that preserving and appending is what these
> callers can do themselves).

This one we may want to consider a bit more seriously, but it is
entirely up to the future callers of the helper.  If it would make
the callers much easier to write for this helper to have "append"
semantics, I'd be happy to accept the semantics of the above as-is,
but otherwise, I suspect it would be simpler to use if the helper is
defined to replace dest with the result, instead of appending the
result to dest.
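
A toy illustration of why "replace" is the more primitive of the two:
a caller that wants "append" can layer it on top with a temporary
buffer.  struct toy_buf and the toy_*() functions below are invented
for this sketch and merely stand in for strbuf and the helper under
review; the "formatting" is reduced to dropping a leading "./".

```c
#include <stdio.h>
#include <string.h>

/* Tiny fixed-size stand-in for struct strbuf. */
struct toy_buf {
	char data[256];
	size_t len;
};

/* Append s to b, truncating silently if the buffer is full. */
static void toy_addstr(struct toy_buf *b, const char *s)
{
	size_t room = sizeof(b->data) - b->len;
	int n = snprintf(b->data + b->len, room, "%s", s);

	if (n < 0)
		return;
	b->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* "Replace" semantics: whatever was in dest is discarded. */
static void toy_format_path(struct toy_buf *dest, const char *path)
{
	dest->len = 0;
	dest->data[0] = '\0';
	if (!strncmp(path, "./", 2))
		path += 2;	/* placeholder for the real formatting */
	toy_addstr(dest, path);
}

/* "Append" semantics, built by the caller on top of "replace". */
static void toy_append_formatted_path(struct toy_buf *dest, const char *path)
{
	struct toy_buf tmp;

	toy_format_path(&tmp, path);
	toy_addstr(dest, tmp.data);
}
```

The cost of the temporary lands only on the callers that want to
append, instead of on every caller of the helper.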

> Otherwise, looking good as a no-op bug-to-bug compatible rewrite,
> with a slight optimization (to skip xgetcwd()).

This part of the review does not change in any case.  The
refactoring looks good.

Thanks.


* Re: [PATCH v3 0/4] pack-objects: support bitmaps and delta-islands with `--path-walk`
From: Junio C Hamano @ 2026-06-22 16:26 UTC
  To: Derrick Stolee; +Cc: Taylor Blau, git, Jeff King, Elijah Newren
In-Reply-To: <b6ed816c-030b-400a-9fb6-6671fd3cb0b0@gmail.com>

Derrick Stolee <stolee@gmail.com> writes:

> On 6/22/2026 3:35 AM, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>
>>> Outside of the above, the series is functionally unchanged.
>>>
>>> Thanks in advance for another look.
>>>
>>> Taylor Blau (4):
>>>   t/perf: drop p5311's lookup-table permutation
>>>   pack-objects: support reachability bitmaps with `--path-walk`
>>>   pack-objects: extract `record_tree_depth()` helper
>>>   pack-objects: support `--delta-islands` with `--path-walk`
>> 
>> Very cleanly implemented.  I am not confident that I have followed
>> the detailed logic around delta islands in the last step but the
>> earlier three patches looked trivially good.
> I've been happy with the code, subject to the new data that is presented
> with this version confirming the expected performance benefits. I also
> lack confidence in the delta islands features, but based on my weak
> understanding it looks correct. I believe that Taylor has the right
> expertise here to make up for my lack of context.

Thanks.  Let me mark the topic for 'next' then.


* Re: [PATCH v3 0/2] environment: move ignore_case into repo_config_values
From: Tian Yuchen @ 2026-06-22 16:45 UTC
  To: Junio C Hamano; +Cc: git, ps, phillip.wood123, johannes.schindelin, stolee
In-Reply-To: <xmqqjyrr7ipf.fsf@gitster.g>

On 6/22/26 04:16, Junio C Hamano wrote:
> As the compat/ layer is not meant as a general purpose POSIX
> emulation wrapper that is generally reusable to projects other than
> us, if we have a knob settable by end users to affect behaviours of
> lower layer in compat/, it is natural to make repo-settings
> available to them.

I see.

> What is the perceived problem you have in mind, and what are your
> proposed alternatives?

Actually, my reason for raising this question wasn't that I thought
there was any architectural problem, but that I felt that for a file
in compat/win32, which is more on the _downstream_ side (is that
correct?), we need to exercise extra caution and confirm with its
maintainer whether the changes are appropriate. That's why I CC'd
Johannes Schindelin on this.

Was that the right thing to do?

Regards, yuchen


* Re: [PATCH v3 1/2] sequencer: factor out parsing of todo commands
From: Junio C Hamano @ 2026-06-22 17:00 UTC
  To: Phillip Wood; +Cc: git, Elijah Newren, Patrick Steinhardt
In-Reply-To: <d27dddff93144f7b6d7fc89719bdf53b6856c9fc.1782117361.git.phillip.wood@dunelm.org.uk>

Phillip Wood <phillip.wood123@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Move the code that parses todo commands into a separate function so
> that it can be shared with "git status" in the next commit. As we
> know the input is NUL terminated we do not pass a pointer to the end
> of the line and instead test for a blank line by looking for NUL, CR
> LF, or LF. We use starts_with() instead of starts_with_mem() for the
> same reason. This results in slightly different behavior when there
> a CR at the start of the line that is not followed by LF. Previously
> such a line was treated as a comment rather than an invalid line.

Meaning that the input validation is tighter than before?  I think
it is fine in this case, as I do not see a reason why anybody wants
to use a lone CR as comment introducer.
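
For the record, the difference can be reduced to the two predicates
below.  This is only a sketch: the function names are made up, and the
comment-prefix check that both versions also perform is left out so
that just the blank-line rule remains.

```c
#include <stdbool.h>

/*
 * Old rule, as in the removed parse_insn_line() code: an empty line,
 * or any line whose first byte is CR, counts as blank.
 */
static bool old_is_blank(const char *bol, const char *eol)
{
	return bol == eol || *bol == '\r';
}

/*
 * New rule, as in sequencer_parse_todo_command(): NUL, LF, or CR LF
 * at the start of a NUL-terminated line counts as blank.
 */
static bool new_is_blank(const char *s)
{
	return !s[0] || s[0] == '\n' || (s[0] == '\r' && s[1] == '\n');
}
```

A line such as "\rpick" satisfies the old predicate but not the new
one, so it now falls through to the "invalid command" error.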

> +bool sequencer_parse_todo_command(const char **p, enum todo_command *cmd)
> +{
> +	const char *s = *p;
> +
> +	for (int i = 0; i < TODO_COMMENT; i++)
> +		if (is_command(i, p)) {
> +			*cmd = i;
> +			return true;
> +		}
> +
> +	if (starts_with(s, comment_line_str)) {
> +		*cmd = TODO_COMMENT;
> +		return true;
> +	} else if (s[0] == '\n' || (s[0] == '\r' && s[1] == '\n') || !s[0]) {
> +		*cmd = TODO_COMMENT;
> +		return true;
> +	}
> +
> +	return false;
>  }

I notice that the order of checking for concrete commands and comment
lines is swapped relative to the original.  There is no inherently
"natural" order between them, so the change is perfectly OK.  I just
got confused slightly while reading it until I realized that is what
you did.

>  static int check_label_or_ref_arg(enum todo_command command, const char *arg)
> @@ -2716,29 +2737,23 @@ static int parse_insn_line(struct repository *r, struct replay_opts *opts,
>  {
>  	struct object_id commit_oid;
>  	char *end_of_object_name;
> -	int i, saved, status, padding;
> +	int saved, status, padding;
>  
>  	item->flags = 0;
>  
>  	/* left-trim */
>  	bol += strspn(bol, " \t");
>  
> -	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
> -		item->command = TODO_COMMENT;
> -		item->commit = NULL;
> -		item->arg_offset = bol - buf;
> -		item->arg_len = eol - bol;
> -		return 0;
> -	}
> -
> -	for (i = 0; i < TODO_COMMENT; i++)
> -		if (is_command(i, &bol)) {
> -			item->command = i;
> -			break;
> -		}
> -	if (i >= TODO_COMMENT)
> +	if (!sequencer_parse_todo_command(&bol, &item->command))
>  		return error(_("invalid command '%.*s'"),
>  			     (int)strcspn(bol, " \t\r\n"), bol);
> +
> +	if (item->command == TODO_COMMENT) {
> +		item->commit = NULL;
> +		item->arg_offset = bol - buf;
> +		item->arg_len = eol - bol;
> +		return 0;
> +	}

And the extra work that is only relevant to a comment line is
naturally done by the caller.  OK.

Thanks.  Looking good so far.


* Re: [PATCH v3 2/2] status: improve rebase todo list parsing
From: Junio C Hamano @ 2026-06-22 17:26 UTC
  To: Phillip Wood; +Cc: git, Elijah Newren, Patrick Steinhardt
In-Reply-To: <b3514e9b1c9515bf1a7f7983b9f120d63edba97f.1782117361.git.phillip.wood@dunelm.org.uk>

Phillip Wood <phillip.wood123@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When there is rebase in progress "git status" displays the last couple
> of completed and the next couple of pending commands from the todo
> list. When it does this it tries to abbreviate the object ids of
> the commits to be picked. Unfortunately it does not abbreviate the
> object ids when the line starts with "fixup -C" or "merge -C". It
> also mistakenly replaces the refname in "reset main" and "update-ref
> refs/heads/main" with the object id that the ref points to.
>
> Fix this by using the function added in the last commit to parse the
> command name and only try to abbreviate the argument for commands that
> take an object id. If a command accepts a label then try to resolve the
> object name as a label first and only if that fails try to resolve it
> as an object_id. When trying to abbreviate an object id, only replace
> the object name if it starts with the abbreviated object id so that
> tag or branch names that contain only hex digits are left unchanged.

;-)  

Strictly speaking, the original's "if it begins with exec, x, label,
or l, then don't bother" style could be extended to do this without
using the function added in the last commit, but the resulting code
presented here, which uses the parsed command enum and switches on it,
is certainly much more pleasant to read.

> Comments are now processed after stripping any leading
> whitespace from the line. This matches what the sequencer does in
> parse_insn_line(). The existing test cases are updated to test a
> wider variety of commands. Only the pending commands in the tests
> are changed to avoid removing existing coverage.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/wt-status.c b/wt-status.c
> index 479ccc3304b..4b15bda76f4 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1363,6 +1363,71 @@ static int split_commit_in_progress(struct wt_status *s)
>  	free(rebase_orig_head);
>  
>  	return split_in_progress;
> +}
> +
> +/*
> + * If the whitespace-delimited token starting at or just after *pp
> + * is a hex object id that is longer than its default abbreviation,
> + * abbreviate it in-place, shrinking `line` accordingly. On return
> + * *pp points one past the (possibly abbreviated) token. Leaves both
> + * `line` and *pp-advanced-past-the-token unchanged in all other cases
> + * (non-hex token, label name, unresolvable, or a refname that happens
> + * to consist only of hex digits).
> + */
> +static void abbrev_oid_in_line(struct repository *r, struct strbuf *scratch,
> +			       struct strbuf *line, bool maybe_label, char **pp)
> +{
> +	char *p = *pp;
> +	char *end_of_object_name, saved;
> +	const char *abbrev;
> +	struct object_id oid;
> +	bool have_oid;
> +
> +	p += strspn(p, " \t");
> +	end_of_object_name = p + strcspn(p, " \t");
> +	/*
> +	 * For "merge" and "reset" the object name may be a label or
> +	 * ref rather than a hex object id. Only abbreviate the object
> +	 * name if it is a hex object id.
> +	 */
> +	for (const char *q = p; q < end_of_object_name; q++) {
> +		if (!isxdigit(*q))
> +			goto out;
> +	}

OK.  If the string has non hexdigit, it cannot be a raw object name
so there is no point in rewriting.  OK.

> +	if (maybe_label) {
> +		strbuf_reset(scratch);
> +		strbuf_addf(scratch, "refs/rewritten/%.*s",
> +			    (int)(end_of_object_name - p), p);
> +		if (refs_ref_exists(get_main_ref_store(r), scratch->buf))
> +			goto out; /* object name was a label */
> +	}

If it could be a label, then we check if such a label exists, and if
so, we won't modify it.  OK.

> +	saved = *end_of_object_name;
> +	*end_of_object_name = '\0';
> +	have_oid = !repo_get_oid(r, p, &oid);
> +	*end_of_object_name = saved;
> +	if (!have_oid)
> +		goto out; /* invalid object name */

We obviously cannot abbreviate if we cannot even recognize it as an
object name.  OK.

> +	abbrev = repo_find_unique_abbrev(r, &oid, DEFAULT_ABBREV);
> +	if (!starts_with(p, abbrev))
> +		goto out; /* object name was a refname containing only xdigits */

OK, nice to see sufficient paranoia ;-)

> +	p += strlen(abbrev);
> +	strbuf_remove(line, p - line->buf, end_of_object_name - p);
> +	end_of_object_name = p;

By abbreviating, line->buf only shrinks, so we won't risk getting
confused by a realloc() happening under the hood.  Upon entry to
this helper, *pp must be pointing into line->buf, or everything will
go awry, but for a file-scope static helper function like this, that
probably is too obvious to anybody to have to be spelled out.  OK.
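
To illustrate the point about shrinking in place, here is a minimal
sketch.  It assumes a plain char array instead of a strbuf, and
remove_bytes() and shorten_token() are hypothetical helpers, not Git
functions.  Since only memmove() is involved, the buffer never moves
and a pointer into it stays usable after the token is shortened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for removing a span from a buffer: drop
 * `len` bytes at offset `pos` from a NUL-terminated buffer of
 * length *buflen.  Only memmove() is used, so `buf` never moves.
 */
static void remove_bytes(char *buf, size_t *buflen, size_t pos, size_t len)
{
	assert(pos + len <= *buflen);
	/* "+ 1" also moves the terminating NUL */
	memmove(buf + pos, buf + pos + len, *buflen - pos - len + 1);
	*buflen -= len;
}

/*
 * Shorten the whitespace-delimited token starting at *pp to at most
 * `keep` bytes and leave *pp pointing one past the (possibly
 * shortened) token.  *pp must point into buf.
 */
static void shorten_token(char *buf, size_t *buflen, char **pp, size_t keep)
{
	char *p = *pp;
	char *end = p + strcspn(p, " \t");

	assert(buf <= p && p <= buf + *buflen);
	if ((size_t)(end - p) > keep) {
		p += keep;
		remove_bytes(buf, buflen, (size_t)(p - buf), (size_t)(end - p));
		end = p;
	}
	*pp = end;
}
```

If the buffer could grow instead, the same code would need to
recompute the pointer from an offset after every modification.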

> +out:
> +	*pp = end_of_object_name;
> +}


> +/* Skip "[ \t]*(-[cC])?", returns true if "-c/-C" was skipped. */
> +static bool skip_dash_c(char **pp)
> +{
> +	bool ret;
> +	char *p = *pp;
> +
> +	p += strspn(p, " \t");
> +	ret = skip_prefix(p, "-C", &p) || skip_prefix(p, "-c", &p);
> +	*pp = p;
> +
> +	return ret;
>  }

OK.

> @@ -1371,29 +1436,68 @@ static int split_commit_in_progress(struct wt_status *s)
>   * into
>   * "pick d6a2f03 some message"
>   *
> - * The function assumes that the line does not contain useless spaces
> - * before or after the command.
> + * Returns false on comment lines, true otherwise
>   */
> -static void abbrev_oid_in_line(struct repository *r, struct strbuf *line)
> +static bool format_todo_line(struct repository *r, struct strbuf *line)
>  {
> -	struct string_list split = STRING_LIST_INIT_DUP;
> -	struct object_id oid;
> -
> -	if (starts_with(line->buf, "exec ") ||
> -	    starts_with(line->buf, "x ") ||
> -	    starts_with(line->buf, "label ") ||
> -	    starts_with(line->buf, "l "))
> -		return;
> -
> -	if ((2 <= string_list_split(&split, line->buf, " ", 2)) &&
> -	    !repo_get_oid(r, split.items[1].string, &oid)) {
> -		strbuf_reset(line);
> -		strbuf_addf(line, "%s ", split.items[0].string);
> -		strbuf_add_unique_abbrev(line, &oid, DEFAULT_ABBREV);
> -		for (size_t i = 2; i < split.nr; i++)
> -			strbuf_addf(line, " %s", split.items[i].string);
> -	}
> -	string_list_clear(&split, 0);

We essentially said, "do not molest exec and label, but everything
else, as long as there are two (or more) tokens and the second token
looks like an object name, replace it with its abbreviation",
regardless of what the actual command was.  Now we do the right
thing by ...

> +	enum todo_command cmd;
> +	struct strbuf scratch = STRBUF_INIT;
> +	char *p = line->buf;
> +
> +	if (!sequencer_parse_todo_command((const char**)&p, &cmd))
> +		return true; /* keep invalid lines */

... parsing out what the line is about, and ...

> +	switch (cmd) {

... switching on it, to make it clear that we cover all the cases
known to us (and the code will be maintained like so, by not having
a "default" arm).
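
As a small illustration of why leaving out "default" helps, consider
the sketch below.  The enum demo_command and takes_object_name() are
made up for this example and are much smaller than the real
todo_command.  With GCC or Clang and -Wall, adding a new enumerator
without a matching case arm should produce a -Wswitch warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much smaller stand-in for enum todo_command. */
enum demo_command {
	DEMO_PICK,
	DEMO_EXEC,
	DEMO_COMMENT
};

/*
 * Returns true if the command takes an object name.  There is no
 * "default" arm, so adding an enumerator to demo_command without
 * touching this switch is flagged by -Wswitch.
 */
static bool takes_object_name(enum demo_command cmd)
{
	switch (cmd) {
	case DEMO_PICK:
		return true;
	case DEMO_EXEC:
	case DEMO_COMMENT:
		return false;
	}
	/* Only reached if cmd holds a value outside the enum. */
	assert(!"cmd is outside enum demo_command");
	return false;
}
```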

> +	case TODO_COMMENT:
> +		return false;
> +
> +	case TODO_MERGE: {
> +		/*
> +		 * The argument to -C cannot be a label, but the parents
> +		 * can be labels.
> +		 */
> +		bool maybe_label = !skip_dash_c(&p);
> +
> +		while (true) {
> +			p += strspn(p, " \t");
> +			if (!p[0] || (p[0] == '#' && (!p[1] || isspace(p[1]))))
> +				break;
> +			abbrev_oid_in_line(r, &scratch, line, maybe_label, &p);
> +			maybe_label = true;
> +		}
> +		break;
> +	}
> +
> +	case TODO_FIXUP:
> +		skip_dash_c(&p);
> +		/* fallthrough */

Fixup always refers to raw object ID and never a label, so it
would be OK to just skip -c/-C here ...

> +	case TODO_DROP:
> +	case TODO_EDIT:
> +	case TODO_PICK:
> +	case TODO_REVERT:
> +	case TODO_REWORD:
> +	case TODO_SQUASH:

... and pass "false" for "maybe_label".  OK.

> +		abbrev_oid_in_line(r, &scratch, line, false, &p);
> +		break;
> +
> +	case TODO_RESET:
> +		abbrev_oid_in_line(r, &scratch, line, true, &p);
> +		break;
> +	/*
> +	 * Avoid "default" and instead list all the other commands so
> +	 * that -Wswitch (which is included in -Wall) warns if a new
> +	 * command is added without handling it in this function.
> +	 */
> +	case TODO_BREAK:
> +	case TODO_EXEC:
> +	case TODO_LABEL:
> +	case TODO_NOOP:
> +	case TODO_UPDATE_REF:
> +		break;
> +	}
> +
> +	strbuf_release(&scratch);
> +	return true;
>  }
>  
>  static int read_rebase_todolist(struct repository *r, const char *fname, struct string_list *lines)
> @@ -1411,13 +1515,9 @@ static int read_rebase_todolist(struct repository *r, const char *fname, struct
>  			  repo_git_path_replace(r, &buf, "%s", fname));
>  	}
>  	while (!strbuf_getline_lf(&buf, f)) {
> -		if (starts_with(buf.buf, comment_line_str))
> -			continue;
>  		strbuf_trim(&buf);
> -		if (!buf.len)
> -			continue;
> -		abbrev_oid_in_line(r, &buf);
> -		string_list_append(lines, buf.buf);
> +		if (format_todo_line(r, &buf))
> +			string_list_append(lines, buf.buf);
>  	}
>  	fclose(f);

This loop got much nicer than the original.

Looks good.

Thanks.

^ permalink raw reply

* Re: [GSoC Patch v7 1/3] path: extract append_formatted_path() and use in rev-parse
From: K Jayatheerth @ 2026-06-22 17:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: a3205153416, git, jltobler, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <xmqq1pdy36me.fsf@gitster.g>

Hey Junio,

On Mon, Jun 22, 2026 at 2:32 AM Junio C Hamano <gitster@pobox.com> wrote:

> So, for the existing user of this logic, the preimage ...
>
> > -static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
> >  {

...

> > -     free(cwd);
> >  }
>
> ... now becomes this postimage.
>

Yes that's right!

> > +static void print_path(const char *path, const char *prefix,
> > +                    enum format_type format, enum default_type def)
> >  {
> > +     struct strbuf sb = STRBUF_INIT;
> > +     enum path_format fmt;
> > +
> > +     if (format == FORMAT_RELATIVE) {
> > +             fmt = PATH_FORMAT_RELATIVE;
> > +     } else if (format == FORMAT_CANONICAL) {
> > +             fmt = PATH_FORMAT_CANONICAL;
> > +     } else /* FORMAT_DEFAULT */ {
> > +             switch (def) {
> > +             case DEFAULT_RELATIVE:
> > +                     fmt = PATH_FORMAT_RELATIVE;
> > +                     break;
> > +             case DEFAULT_RELATIVE_IF_SHARED:
> > +                     fmt = PATH_FORMAT_RELATIVE_IF_SHARED;
> > +                     break;
> > +             case DEFAULT_CANONICAL:
> > +                     fmt = PATH_FORMAT_CANONICAL;
> > +                     break;
> > +             case DEFAULT_UNMODIFIED:
> > +             default:
> > +                     fmt = PATH_FORMAT_UNMODIFIED;
> > +                     break;
> >               }
> >       }
> > +
> > +     append_formatted_path(&sb, path, prefix, fmt);
> > +     puts(sb.buf);
> > +
> > +     strbuf_release(&sb);
> >  }
>
> Mostly, the code translates FORMAT_FOO constants into the new
> PATH_FORMAT_FOO constants, and lets append_formatted_path() do the
> heavy lifting.
>
> It is a minor point, but wouldn't it make it simpler to handle
> format_default first?  I.e.,
>
>         if (format == FORMAT_DEFAULT)
>                 switch (def) {
>                 case DEFAULT_RELATIVE:
>                         format = DEFAULT_RELATIVE;
>                         break;
>                 ...
>                 case DEFAULT_UNMODIFIED:
>                 default:
>                         format = DEFAULT_UNMODIFIED;
>                         break;
>         }
>         switch (format) {
>         case FORMAT_RELATIVE: fmt = PATH_FORMAT_RELATIVE; break;
>         case FORMAT_CANONICAL: fmt = PATH_FORMAT_CANONICAL; break;
>         ...
>         }
>
> Perhaps yes, perhaps not.  I dunno.
>

I see you have continued this point further, so I am going to
respond to it in detail there.

> > diff --git a/path.c b/path.c
> > index d7e17bf174..6d8e892ada 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -1579,6 +1579,75 @@ char *xdg_cache_home(const char *filename)
> >       return NULL;
> >  }
> >
> > +void append_formatted_path(struct strbuf *dest, const char *path,
> > +                        const char *prefix, enum path_format format)
> > +{
> > +     switch (format) {
> > +     case PATH_FORMAT_UNMODIFIED:
> > +             strbuf_addstr(dest, path);
> > +             break;
>
> In the original "print_path()", DEFAULT/UNMODIFIED did this "show
> unmodified".  OK.
>
> > +     case PATH_FORMAT_RELATIVE: {
> > +             struct strbuf relative_buf = STRBUF_INIT;
> > +             struct strbuf real_path = STRBUF_INIT;
> > +             struct strbuf real_prefix = STRBUF_INIT;
> > +             char *cwd = NULL;
> > +
> > +             /*
> > +              * We don't ever produce a relative path if prefix is NULL,
> > +              * so set the prefix to the current directory so that we can
> > +              * produce a relative path whenever possible.
> > +              */
> > +             if (!prefix)
> > +                     prefix = cwd = xgetcwd();
>
> This is what was done in the original "print_path()" upfront, with
> a similar comment to explain why this happens.  Looking good.  Also
> we no longer call xgetcwd() when we do not need to, which is good.
>
> > +             if (!is_absolute_path(path)) {
> > +                     strbuf_realpath_forgiving(&real_path, path, 1);
> > +                     path = real_path.buf;
> > +             }
> > +             if (!is_absolute_path(prefix)) {
> > +                     strbuf_realpath_forgiving(&real_prefix, prefix, 1);
> > +                     prefix = real_prefix.buf;
> > +             }
>
> There used to be a comment explaining why we make realpath calls,
> which is now lost.  Perhaps what the comment said was so obvious
> that we are better off without it?  I offhand do not know.
>

When the logic was a single block, the comment felt necessary to
explain the flow.
By splitting it into explicit switch cases, the logic became a bit
more self-evident, so I removed it to reduce clutter.
I kept the other comments where the reasoning is less obvious.


> What is done to make the paths real is the same as before, which is
> good.
>
> > +             strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> > +
> > +             strbuf_release(&relative_buf);
> > +             strbuf_release(&real_path);
> > +             strbuf_release(&real_prefix);
> > +             free(cwd);
> > +             break;
> > +     }
>
> OK.
>
> > +     case PATH_FORMAT_RELATIVE_IF_SHARED: {
> > +             struct strbuf relative_buf = STRBUF_INIT;
> > +
> > +             /*
> > +              * If we're using RELATIVE_IF_SHARED mode, then we want an
> > +              * absolute path unless the two share a common prefix, so don't
> > +              * default the prefix to the current working directory. Doing so
> > +              * would cause a relative path to always be produced if possible.
> > +              */

This comment, for instance, is one I thought was worth keeping.

> Identical to the original, which is good.
> > +
> > +     case PATH_FORMAT_CANONICAL: {
> > +             struct strbuf canonical_buf = STRBUF_INIT;
> > +
> > +             strbuf_realpath_forgiving(&canonical_buf, path, 1);
> > +             strbuf_addbuf(dest, &canonical_buf);
> > +
> > +             strbuf_release(&canonical_buf);
> > +             break;
> > +     }
> > +
> > +     default:
> > +             BUG("unknown path_format value %d", format);
> > +     }
> > +}
>
> OK.
>
> > +/**
> > + * Format a path according to the specified formatting strategy and append
> > + * the result to the given strbuf.
> > + *
> > + * `dest`   : The string buffer to append the formatted path to.
> > + * `path`   : The path string that needs to be formatted.
> > + * `prefix` : The directory prefix to calculate relative offsets against.
> > + * Pass NULL to default to the current working directory where applicable.
> > + * `format` : The formatting behavior rule to execute.
> > + */
> > +void append_formatted_path(struct strbuf *dest, const char *path,
> > +                        const char *prefix, enum path_format format);
> > +
>
> It is slightly unsatisfying that this function is defined to
> "append" to any existing value in the dest strbuf, rather than
> storing the result in the dest strbuf.  The original caller
> print_path() passes an empty strbuf to this helper, so it can let
> strbuf_realpath_*() functions to strbuf_reset() it (e.g.,
> abspath.c:get_root_part() called by strbuf_realpath_1(), which in
> turn is called by strbuf_realpath() and strbuf_realpath_forgiving())
> it freely, which means that use of temporary strbuf like
> canonical_buf only to copy it out to dest is wasteful and unneeded.
> But other callers we will have for this helper later may want to
> append to what they already have, so perhaps it is OK (on the other
> hand, we could say that preserving and appending is what these
> callers can do themselves).
>

Hmm, I thought about this for a while.

Then I looked at what ls-tree.c does (it uses an accumulator).
It already routinely uses temporary `strbuf`s to calculate
relative/absolute paths before appending them to its main output
string.

Because callers that need to accumulate can easily do the preserving
and appending themselves with a temporary buffer, there is no reason
to force that overhead into our helper.

I will change the semantics from "append" to "replace" and rename the
helper back to `format_path()`.
I hope I am looking at ls-tree.c correctly here : )

That also eliminates the wasteful `canonical_buf` allocations, since we
can pass the destination buffer directly to functions like
`strbuf_realpath_forgiving()`.
This is a good suggestion actually, thanks!
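
A rough sketch of the "replace" semantics discussed here, for
illustration only.  It assumes a tiny fixed-size `struct demo_buf` in
place of `struct strbuf`, and every `demo_*` name is hypothetical; the
real helper would do actual path formatting rather than a plain copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for struct strbuf. */
struct demo_buf {
	char buf[256];
	size_t len;
};

static void demo_buf_reset(struct demo_buf *b)
{
	b->len = 0;
	b->buf[0] = '\0';
}

static void demo_buf_addstr(struct demo_buf *b, const char *s)
{
	size_t n = strlen(s);

	assert(b->len + n < sizeof(b->buf));
	memcpy(b->buf + b->len, s, n + 1);
	b->len += n;
}

/*
 * "Replace" semantics: whatever was in dest is discarded, so the
 * helper can write into dest directly without a temporary.
 */
static void demo_format_path(struct demo_buf *dest, const char *path)
{
	demo_buf_reset(dest);
	demo_buf_addstr(dest, path);
}

/*
 * A caller that wants to accumulate keeps its own temporary and
 * does the appending itself.
 */
static void demo_append_path(struct demo_buf *out, const char *path)
{
	struct demo_buf tmp;

	demo_format_path(&tmp, path);
	demo_buf_addstr(out, tmp.buf);
}
```

The cost of the temporary then falls only on the callers that need to
accumulate, not on every user of the helper.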

> Otherwise, looking good as a no-op bug-to-bug compatible rewrite,
> with a slight optimization (to skip xgetcwd()).
>
> Thanks.

On Mon, Jun 22, 2026 at 9:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:

> > ...
> > It is a minor point, but wouldn't it make it simpler to handle
> > format_default first?  I.e.,
> >
> >       if (format == FORMAT_DEFAULT)
> >               switch (def) {
> >               case DEFAULT_RELATIVE:
> >                       format = DEFAULT_RELATIVE;
> >                       break;
> >               ...
> >               case DEFAULT_UNMODIFIED:
> >               default:
> >                       format = DEFAULT_UNMODIFIED;
> >                       break;
> >       }
> >       switch (format) {
> >         case FORMAT_RELATIVE: fmt = PATH_FORMAT_RELATIVE; break;
> >       case FORMAT_CANONICAL: fmt = PATH_FORMAT_CANONICAL; break;
> >       ...
> >       }
> >
> > Perhaps yes, perhaps not.  I dunno.
>
> I do not consider the above a blocker, but it might make a
> difference if we are going to acquire more modes and formats, so
> once somebody tries to rewrite the logic and finds the resulting
> code harder to follow (or not easier to follow), I would be happy to
> see the above discarded ;-)
>

True, if new formats are introduced
this would instantly become sloppy.

I will change it to be future-proof since I am
looking to send v8 for append_formatted_path().

Although I would be surprised to see an example for a new format.
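
One possible shape for the mapping, as a sketch only and not
necessarily what v8 will contain.  The enumerator names come from the
patch, but the enum declarations and their order are assumptions, and
resolve_path_format() is a hypothetical helper.  An explicit format
wins, and the default is consulted only for FORMAT_DEFAULT.

```c
#include <assert.h>

/* Assumed declarations; only the enumerator names appear in the patch. */
enum format_type {
	FORMAT_DEFAULT,
	FORMAT_RELATIVE,
	FORMAT_CANONICAL
};

enum default_type {
	DEFAULT_RELATIVE,
	DEFAULT_RELATIVE_IF_SHARED,
	DEFAULT_CANONICAL,
	DEFAULT_UNMODIFIED
};

enum path_format {
	PATH_FORMAT_UNMODIFIED,
	PATH_FORMAT_RELATIVE,
	PATH_FORMAT_RELATIVE_IF_SHARED,
	PATH_FORMAT_CANONICAL
};

/* Hypothetical helper: an explicit format overrides the default. */
static enum path_format resolve_path_format(enum format_type format,
					    enum default_type def)
{
	switch (format) {
	case FORMAT_RELATIVE:
		return PATH_FORMAT_RELATIVE;
	case FORMAT_CANONICAL:
		return PATH_FORMAT_CANONICAL;
	case FORMAT_DEFAULT:
		break;
	}

	switch (def) {
	case DEFAULT_RELATIVE:
		return PATH_FORMAT_RELATIVE;
	case DEFAULT_RELATIVE_IF_SHARED:
		return PATH_FORMAT_RELATIVE_IF_SHARED;
	case DEFAULT_CANONICAL:
		return PATH_FORMAT_CANONICAL;
	case DEFAULT_UNMODIFIED:
		break;
	}
	return PATH_FORMAT_UNMODIFIED;
}
```

With both switches free of a "default" arm, a new format or default
added later would be flagged by -Wswitch instead of silently falling
through.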

> >> +/**
> >> + * Format a path according to the specified formatting strategy and append
> >> + * the result to the given strbuf.
> >> + *
> >> + * `dest`   : The string buffer to append the formatted path to.
> >> + * `path`   : The path string that needs to be formatted.
> >> + * `prefix` : The directory prefix to calculate relative offsets against.
> >> + * Pass NULL to default to the current working directory where applicable.
> >> + * `format` : The formatting behavior rule to execute.
> >> + */
> >> +void append_formatted_path(struct strbuf *dest, const char *path,
> >> +                       const char *prefix, enum path_format format);
> >> +
> >
> > It is slightly unsatisfying that this function is defined to
> > "append" to any existing value in the dest strbuf, rather than
> > storing the result in the dest strbuf.  The original caller
> > print_path() passes an empty strbuf to this helper, so it can let
> > strbuf_realpath_*() functions to strbuf_reset() it (e.g.,
> > abspath.c:get_root_part() called by strbuf_realpath_1(), which in
> > turn is called by strbuf_realpath() and strbuf_realpath_forgiving())
> > it freely, which means that use of temporary strbuf like
> > canonical_buf only to copy it out to dest is wasteful and unneeded.
> > But other callers we will have for this helper later may want to
> > append to what they already have, so perhaps it is OK (on the other
> > hand, we could say that preserving and appending is what these
> > callers can do themselves).
>
> This one we may want to consider a bit more seriously, but it is
> entirely up to the future callers of the helper.  If it would make
> the callers much easier to write for this helper to have "append"
> semantics, I'd be happy to accept the semantics of the above as-is,
> but otherwise, I suspect it would be simpler to use if the helper is
> defined to replace dest with the result, instead of appending the
> result to dest.
>

I am still unsure if I am following ls-tree.c correctly.
If I am, then I think it is a very good change to have for v8, as I
described above.

> > Otherwise, looking good as a no-op bug-to-bug compatible rewrite,
> > with a slight optimization (to skip xgetcwd()).
>
> This part of the review does not change in any case.  The
> refactoring looks good.

Thank you ; )

Regards,
- K Jayatheerth

^ permalink raw reply

* Re: [PATCH 1/3] odb/source-packed: extract logic to skip certain packs
From: Junio C Hamano @ 2026-06-22 17:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260622-pks-connected-generic-promisor-checks-v1-1-25eba2698202@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The caller can pass flags that allow them to filter out specific kinds
> of objects when iterating objects via `odb_for_each_object()`. This only
> works for "normal" iteration though, as we `BUG()` when the user passes
> flags and specifies an object prefix.
>
> This limitation will be lifted in the next commit. Prepare for this by
> extracting the logic that skips certain kinds of packs so that we can
> easily reuse it.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  odb/source-packed.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)

Quite straight-forward creation of a simple helper function.

> diff --git a/odb/source-packed.c b/odb/source-packed.c
> index 42c28fba0e..3afc4bf01f 100644
> --- a/odb/source-packed.c
> +++ b/odb/source-packed.c
> @@ -126,6 +126,22 @@ static int match_hash(unsigned len, const unsigned char *a, const unsigned char
>  	return 1;
>  }
>  
> +static bool should_exclude_pack(struct packed_git *p, enum odb_for_each_object_flags flags)
> +{
> +	if ((flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY) &&
> +	    !p->pack_promisor)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) &&
> +	    p->pack_keep_in_core)
> +		return true;
> +	if ((flags & ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) &&
> +	    p->pack_keep)
> +		return true;
> +	return false;
> +}
> +
>  static int for_each_prefixed_object_in_midx(
>  	struct odb_source_packed *store,
>  	struct multi_pack_index *m,
> @@ -306,17 +322,9 @@ static int odb_source_packed_for_each_object(struct odb_source *source,
>  	for (e = packfile_store_get_packs(packed); e; e = e->next) {
>  		struct packed_git *p = e->pack;
>  
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_PROMISOR_ONLY) &&
> -		    !p->pack_promisor)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS) &&
> -		    p->pack_keep_in_core)
> -			continue;
> -		if ((opts->flags & ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS) &&
> -		    p->pack_keep)
> +		if (should_exclude_pack(p, opts->flags))
>  			continue;
> +
>  		if (open_pack_index(p)) {
>  			pack_errors = 1;
>  			continue;

^ permalink raw reply

* Re: [PATCH 3/3] connected: search promisor objects generically
From: Junio C Hamano @ 2026-06-22 17:57 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260622-pks-connected-generic-promisor-checks-v1-3-25eba2698202@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> When performing connectivity checks we have to figure out whether any of
> the new objects are promisor objects, as we cannot assume full
> connectivity if so.
>
> This check is performed by iterating through all packfiles in the
> repository and searching each of them for the given object. Of course,
> this mechanism is quite specific to implementation details of the object
> database, as we assume that it uses packfiles in the first place.
>
> Refactor the logic so that we instead use `odb_for_each_object_ext()`
> with an object prefix filter and the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY`
> flag. This will yield all objects that have the exact object name and
> that are part of a promisor pack in a generic way.
> ...


> -		 *
> -		 * Before checking for promisor packs, be sure we have the
> -		 * latest pack-files loaded into memory.
>  		 */
> -		odb_reprepare(the_repository->objects);

Hmph?

>  		do {
> -			struct packed_git *p;
> -
> -			repo_for_each_pack(the_repository, p) {
> -				if (!p->pack_promisor)
> -					continue;
> -				if (find_pack_entry_one(oid, p))
> -					goto promisor_pack_found;
> +			opts.prefix = oid;
> +
> +			err = odb_for_each_object_ext(the_repository->objects,
> +						      NULL, promised_object_cb,
> +						      NULL, &opts);
> +			if (err < 0)
> +				break;
> +			if (err > 0) {
> +				err = 0;
> +				continue;
>  			}

So we used to manually iterate and stop when we have a matching pack
entry, but now "stop when we find" is done by the promised_object_cb
callback that returns 1.

What is the reason why we no longer odb_(re)prepare() upfront before
going into the loop?  Would it make us miss newly added promisor
packs?  We will fall back to rev-list for correctness, so it may not
matter, though.

> +
>  			/*
>  			 * Fallback to rev-list with oid and the rest of the
>  			 * object IDs provided by fn.
>  			 */
>  			goto no_promisor_pack_found;
> -promisor_pack_found:
> -			;
>  		} while ((oid = fn(cb_data)) != NULL);
> +
>  		if (opt->err_fd)
>  			close(opt->err_fd);
> -		return 0;
> +		return err;
>  	}
>  
>  no_promisor_pack_found:

^ permalink raw reply

* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
From: Derrick Stolee @ 2026-06-22 18:00 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <5492acda0ad05eab67198880a5262e84a3f22ba6.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Move ahead_behind() off the shared nonstale_queue abstraction to use
> a plain prio_queue with a local max_nonstale pointer. The nonstale
> tracking is inlined into insert_no_dup().
> 
> This prepares for replacing nonstale_queue with a paint_queue struct
> that tracks per-side commit counts, which ahead_behind() does not
> need. No behavior change.

This change is only needed if we are intending to delete the nonstale
queue struct, which is currently happening in your patch 2. But we
are essentially recreating its logic in a more disjointed way here,
leaving this code in a worse state.

I'd rather see patch 2 create a _new_ data structure instead of
_replacing_ one that already works for multiple callers. (It does
drop to only one caller, but that seems cleaner to me right now.)

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v3 0/2] environment: move ignore_case into repo_config_values
From: Junio C Hamano @ 2026-06-22 18:01 UTC (permalink / raw)
  To: Tian Yuchen; +Cc: git, ps, phillip.wood123, johannes.schindelin, stolee
In-Reply-To: <b5a9115a-c909-405f-b150-f956d866b1eb@malon.dev>

Tian Yuchen <cat@malon.dev> writes:

> On 6/22/26 04:16, Junio C Hamano wrote:
>> As the compat/ layer is not meant as a general purpose POSIX
>> emulation wrapper that is generally reusable to projects other than
>> us, if we have a knob settable by end users to affect behaviours of
>> lower layer in compat/, it is natural to make repo-settings
>> available to them.
>
> I see.
>
>> What is the perceived problem you have in mind, and what are your
>> proposed alternatives?
>
> Actually, my reason for raising this question wasn’t because I thought
> there was any architectural problem, but because I felt that for a file
> in compat/win32, which is more on the _downstream_ side (is that
> correct?), we need to exercise extra caution and confirm with its
> maintainer whether the changes are appropriate. That’s why I CC'd
> Johannes Schindelin on this.
>
> Was that the right thing to do?

Yup, Dscho is the right person to decide on the design issues on
Windows build.

^ permalink raw reply

* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
From: Derrick Stolee @ 2026-06-22 18:10 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <316e4dfe261043730c77142639f86f5c3cabe370.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>

> +	if (!(old_paint & STALE)) {
> +		switch (old_paint & (PARENT1 | PARENT2)) {
> +		case 0:                  break;
> +		case PARENT1:            queue->p1_count--; break;
> +		case PARENT2:            queue->p2_count--; break;
> +		case PARENT1 | PARENT2:  queue->pending_merge_bases--; break;
> +		default:                 BUG("unexpected paint state");
> +		}
> +	}
> +	if (!(new_paint & STALE)) {
> +		switch (new_paint & (PARENT1 | PARENT2)) {
> +		case 0:                  break;
> +		case PARENT1:            queue->p1_count++; break;
> +		case PARENT2:            queue->p2_count++; break;
> +		case PARENT1 | PARENT2:  queue->pending_merge_bases++; break;
> +		default:                 BUG("unexpected paint state");
> +		}
> +	}

While correct and compact, I don't believe that these switch
statements follow the coding guidelines. We should split the
lines appropriately so they are more standard, such as:

if (!(new_paint & STALE)) {
	switch (new_paint & (PARENT1 | PARENT2)) {
	case 0:
		break;

	case PARENT1:
		queue->p1_count++;
		break;

	case PARENT2:
		queue->p2_count++;
		break;

	case PARENT1 | PARENT2:
		queue->pending_merge_bases++;
		break;

	default:
		BUG("unexpected paint state");
	}
}

Also: technically "case 0" should be a BUG() state, right? We
shouldn't be walking any commit that isn't reachable from at
least one side. (case 0 does happen for old_paint, though.)

>  }
>  
> -static void clear_nonstale_queue(struct nonstale_queue *queue)
> +static void paint_queue_put(struct paint_queue *queue,
> +			    struct commit *c, unsigned add_flags)
>  {
> -	clear_prio_queue(&queue->pq);
> -	queue->max_nonstale = NULL;
> -}
> +	unsigned old_flags = c->object.flags;
> +	c->object.flags |= add_flags;
  
Diffs like this are part of the reason I'd like to see a _new_
data structure instead of replacing the old one. Keeping the
old one for ahead_behind seems like a good idea to me, but even
if we don't land on that end state then deleting the old code
_after_ adding the new code will make the diff more readable.

> -	struct nonstale_queue queue = {
> -		{ compare_commits_by_gen_then_commit_date }
> +	struct paint_queue queue = {
> +		.pq = { compare_commits_by_gen_then_commit_date }
>  	};

I didn't notice when reading the struct definition, but looking at
'pq' here makes me think that we shouldn't be using that abbreviation
as it could stand for "prio_queue" or "paint_queue".

> +	while ((commit = paint_queue_get(&queue))) {
...
> +
> +		if (queue.p1_count + queue.p2_count +
> +		    queue.pending_merge_bases == 0)
> +			break;
>  	}

When possible, I like to try to make loops only have one terminating
condition. Should we have paint_queue_get() return NULL when it sees
this internal state condition?

Also, I'd rather see it in the form of (!count) instead of using
addition, to make it clear that we care about each value being zero.

Finally, I think we actually want this case to get the benefit:

	if ((!queue.p1_count || !queue.p2_count) &&
	    !queue.pending_merge_bases)
	    
I do see that you have this condition in patch 3 with the extra
detail that the max generation in the queue is finite. I think this
is more reason to include this in the data structure method and not
in the loop.
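
To make the suggestion concrete, here is a rough sketch of a dequeue
method that owns the stop condition.  Everything named demo_* is
hypothetical: the queue is a plain array already in pop order, and
both the counter bookkeeping and the finite-generation check from
patch 3 are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct paint_queue. */
struct demo_paint_queue {
	const int *items;	/* pending entries, already in pop order */
	size_t nr, next;
	size_t p1_count;
	size_t p2_count;
	size_t pending_merge_bases;
};

/* True once no further merge-base can be discovered. */
static int demo_walk_is_done(const struct demo_paint_queue *q)
{
	return (!q->p1_count || !q->p2_count) && !q->pending_merge_bases;
}

/*
 * Returns the next entry, or NULL when the queue is empty or the
 * counters say the walk can stop, so the caller's loop has a single
 * terminating condition.
 */
static const int *demo_paint_queue_get(struct demo_paint_queue *q)
{
	assert(q->next <= q->nr);
	if (q->next == q->nr || demo_walk_is_done(q))
		return NULL;
	return &q->items[q->next++];
}
```

The caller then becomes a plain "while ((item = get(&queue)))" loop
with no extra break at its tail.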

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Derrick Stolee @ 2026-06-22 18:12 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <ed12a5cb5b76925cff08d2ab61efeda382b4477a.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced in the previous commit. Once the walk
> enters the finite-generation region, terminate early when one side's
> exclusive count drops to zero -- no new merge-base can form without
> both paint sides meeting.
> 
> The check also waits for pending_merge_bases to reach zero, ensuring
> all merge-base candidates have been popped and recorded before
> exiting.
> 
> The INFINITY gate ensures correctness: commits without a commit-graph
> entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
> which is not topologically reliable. The optimization only fires
> once the walk enters the finite-generation region where ordering
> guarantees hold.
> 
> On large repositories with commit-graph, this yields 100-1000x
> speedups for merge-base queries where one side (e.g. a PR branch) is
> much smaller than the other.
> 
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>  commit-reach.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index ba1e896f0f..fcd1ad0167 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r,
>  		if (queue.p1_count + queue.p2_count +
>  		    queue.pending_merge_bases == 0)
>  			break;
> +
> +		/*
> +		 * Side exhaustion: a new merge-base can only form
> +		 * when both PARENT1-only and PARENT2-only commits
> +		 * remain in the queue.  In the finite-generation
> +		 * region the queue is ordered topologically, so
> +		 * no future step can add paint to visited commits
> +		 * and an exhausted side cannot reappear.
> +		 */
> +		if (generation < GENERATION_NUMBER_INFINITY &&
> +		    queue.pending_merge_bases == 0 &&
> +		    (queue.p1_count == 0 || queue.p2_count == 0))
> +			break;

I mentioned it earlier, but I think this check should be in the
dequeueing method instead of in the tail of the loop.

But I think this is the correct ending case.

I like that you broke this out into its own patch to demonstrate
that this is the key performance boost. It may be good to have
some performance test numbers that demonstrate that patch 2 does
not add any substantial overhead (timing should match previous
code) and in patch 3 this single condition gets us a huge benefit,
though it requires the data tracking of patch 2 to work.

Thanks,
-Stolee



* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
From: Derrick Stolee @ 2026-06-22 18:15 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <91372b975fbe102538c05c7d2cdae356539d1bbd.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
> 
>  - in_merge_bases_many:self: commit is both A and one of the X inputs
>  - get_merge_bases_many:duplicate-twos: duplicate entries in X list
>  - get_merge_bases_many:pending-stale: STALE transition on an
>    already-painted commit (ps-* diamond topology)
>  - get_merge_bases_many:infinity-both-sides: both tips outside the
>    commit-graph with non-monotonic dates (pi-* topology)

It's usually my preference to see these tests show up before the
new code arrives, that way we can see that they already work with
the old logic and continue to work with the new logic.

It's minor, but putting them after your code change may be adding
enforcement of a change of behavior.

One thing that could be helpful here is to consider tracing a
count of "commits walked" in the merge-base code, then you could
have these tests demonstrate the performance benefit by checking
for that number changing.

In t6600, that tracing number would not be the same across the
three different data shapes (full graph, half graph, no graph) and
that could be valuable to demonstrate in tests.

Thanks,
-Stolee



* Re: [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests
From: Derrick Stolee @ 2026-06-22 18:16 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <faf5bc98ede79965e23bfe1535127d6f52221680.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Add t6099 to test the case where multiple merge-base candidates exist
> and one is an ancestor of another. This exercises the side-exhaustion
> optimization in paint_down_to_common together with the
> remove_redundant safety net in get_merge_bases_many_0.

Same as the previous patch: I'd like to see these before the code
change. And if we trace a count of commits walked, we'd be able to
see the number change in this specific case.

Thanks,
-Stolee



* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
From: Derrick Stolee @ 2026-06-22 18:21 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <9cbfc67d724d91b9abc3621f03a3c97208c76a70.1781951820.git.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> Add a technical document describing the paint_down_to_common()
> algorithm used for merge-base computation.

I like the idea of documenting this so it's easier to understand.

There is risk of drift from the actual implementation. You may want
to add a comment to the method in commit-reach.c to indicate that
any change should be reflected in this document.

> +Termination
> +-----------
> +
> +Termination happens when we can prove that no extra progress is
> +possible. We are done with the main loop when one of the following
> +conditions holds:
> +
> +  1. The queue is empty.
> +  2. The queue only contains STALE entries.
> +  3. Side-exhaustion: the walk has reached the finite region and one
> +     of the sides is fully exhausted.

It could be an interesting exercise, but potentially wasteful, to
add this document as a Patch 1, but reflecting the old algorithm
and then to update the document at the same time as you update the
code.

The changes in your patch 2 would impact this doc in terms of the
data being tracked by the paint_queue data structure instead of the
nonstale_queue structure (though those details are not currently
handled in the current version). The change to the termination
condition would come along with patch 3.

Thanks,
-Stolee



* Re: [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-22 18:22 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.git.1781951820.gitgitgadget@gmail.com>

On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Hi,
> 
> This follows up on my RFC [1] with a concrete proposal. I expect the design
> to still be scrutinized, but that may be easier with actual code to look at.
> 
> I tried to make this easier to review by splitting into atomic patches. The
> first two patches are the meatiest parts, though they are pure refactoring.
> The behavior change is in patch 3 and is in itself quite small. The last
> patch adds technical documentation to support future development.

Thanks for putting this together carefully.

I gave some feedback on the specific code and the patch organization.
Overall, I believe that this implementation is functionally correct
and everything I have to say is about presentation and data gathering.

I look forward to a non-RFC v2.

Thanks,
-Stolee


* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
From: Kristofer Karlsson @ 2026-06-22 18:53 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <001e8da6-3232-4cfa-ba6b-35d3489e4779@gmail.com>

On Mon, 22 Jun 2026 at 20:00, Derrick Stolee <stolee@gmail.com> wrote:
>
> This change is only needed if we are intending to delete the nonstale
> queue struct, which is currently happening in your patch 2. But we
> are essentially recreating its logic in a more disjointed way here,
> leaving this code in a worse state.
>
> I'd rather see patch 2 create a _new_ data structure instead of
> _replacing_ one that already works for multiple callers. (It does
> drop to only one caller, but that seems cleaner to me right now.)

I can definitely do that and leave ahead_behind unchanged for v2.
I was thinking that with only a single caller, and ahead_behind
being simpler than paint_down in this respect, it would be
worthwhile to simplify it, but I could instead do that as
a standalone follow-up (though it may turn out not to be
worth the effort).

Thanks,
Kristofer


* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
From: Kristofer Karlsson @ 2026-06-22 19:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <f0c9eb6e-60b1-4eb6-86be-3af4d87afe85@gmail.com>

On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
>
> > +     if (!(old_paint & STALE)) {
> > +             switch (old_paint & (PARENT1 | PARENT2)) {
> > +             case 0:                  break;
> > +             case PARENT1:            queue->p1_count--; break;
> > +             case PARENT2:            queue->p2_count--; break;
> > +             case PARENT1 | PARENT2:  queue->pending_merge_bases--; break;
> > +             default:                 BUG("unexpected paint state");
> > +             }
> > +     }
> > +     if (!(new_paint & STALE)) {
> > +             switch (new_paint & (PARENT1 | PARENT2)) {
> > +             case 0:                  break;
> > +             case PARENT1:            queue->p1_count++; break;
> > +             case PARENT2:            queue->p2_count++; break;
> > +             case PARENT1 | PARENT2:  queue->pending_merge_bases++; break;
> > +             default:                 BUG("unexpected paint state");
> > +             }
> > +     }
>
> While correct and compact, I don't believe that these switch
> statements follow the coding guidelines. We should split the
> lines appropriately so they are more standard, such as:
>
> if (!(new_paint & STALE)) {
>         switch (new_paint & (PARENT1 | PARENT2)) {
>         case 0:
>                 break;
>
>         case PARENT1:
>                 queue->p1_count++;
>                 break;
>
>         case PARENT2:
>                 queue->p2_count++;
>                 break;
>
>         case PARENT1 | PARENT2:
>                 queue->pending_merge_bases++;
>                 break;
>
>         default:
>                 BUG("unexpected paint state");
>         }
> }

Agreed, I will change to that style. I did try to look for style guidelines
but I missed the .clang-format file (I was only looking through text files).
Apologies, I will remember clang-format for next time (and for v2).

> Also: technically "case 0" should be a BUG() state, right? We
> shouldn't be walking any commit that isn't reachable from at
> least one side. (case 0 does happen for old_paint, though.)

No, this is actually intended. Initially I skipped case 0 and let
it fall through, but that would hide _other_ bugs.
I use 0 as a marker for "not in the queue", so we have this:

    Enqueuing:  0 -> flags
    Dequeueing: flags -> 0

Only a change to a commit that is already in the queue has
non-zero flags on both sides. I tried to document this, but perhaps
it is not clear enough. I will see if I can rephrase it, or add an
inline comment around the case itself.
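
To make the 0 convention concrete, here is a minimal standalone sketch of the transition bookkeeping. It is not the code from the patch: the `toy_` names and the flag bit values are assumptions made up for illustration, and `assert()` stands in for git's BUG().

```c
#include <assert.h>

/* Hypothetical bit values; git's commit-reach.c defines its own. */
#define PARENT1 (1u << 0)
#define PARENT2 (1u << 1)
#define STALE   (1u << 2)

struct toy_paint_counts {
	int p1_count;
	int p2_count;
	int pending_merge_bases;
};

/* Add delta (+1 or -1) to the counter that tracks this paint state. */
static void toy_count_paint(struct toy_paint_counts *c, unsigned paint,
			    int delta)
{
	if (paint & STALE)
		return;		/* STALE entries are not counted */

	switch (paint & (PARENT1 | PARENT2)) {
	case 0:
		break;		/* 0 means "not in the queue" */

	case PARENT1:
		c->p1_count += delta;
		break;

	case PARENT2:
		c->p2_count += delta;
		break;

	case PARENT1 | PARENT2:
		c->pending_merge_bases += delta;
		break;
	}

	/* A negative counter would mean the bookkeeping went wrong. */
	assert(c->p1_count >= 0 && c->p2_count >= 0 &&
	       c->pending_merge_bases >= 0);
}

/* Record one change of a commit's paint: old_paint -> new_paint. */
static void toy_paint_transition(struct toy_paint_counts *c,
				 unsigned old_paint, unsigned new_paint)
{
	toy_count_paint(c, old_paint, -1);
	toy_count_paint(c, new_paint, +1);
}
```

With this shape, enqueuing is `toy_paint_transition(&c, 0, flags)`, dequeueing is `toy_paint_transition(&c, flags, 0)`, and repainting a queued commit passes non-zero values on both sides.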

> > -static void clear_nonstale_queue(struct nonstale_queue *queue)
> > +static void paint_queue_put(struct paint_queue *queue,
> > +                         struct commit *c, unsigned add_flags)
> >  {
> > -     clear_prio_queue(&queue->pq);
> > -     queue->max_nonstale = NULL;
> > -}
> > +     unsigned old_flags = c->object.flags;
> > +     c->object.flags |= add_flags;
>
> Diffs like this are part of the reason I'd like to see a _new_
> data structure instead of replacing the old one. Keeping the
> old one for ahead_behind seems like a good idea to me, but even
> if we don't land on that end state then deleting the old code
> _after_ adding the new code will make the diff more readable.

Agreed, will address that.

> > -     struct nonstale_queue queue = {
> > -             { compare_commits_by_gen_then_commit_date }
> > +     struct paint_queue queue = {
> > +             .pq = { compare_commits_by_gen_then_commit_date }
> >       };
>
> I didn't notice when reading the struct definition, but looking at
> 'pq' here makes me think that we shouldn't be using that abbreviation
> as it could stand for "prio_queue" or "paint_queue".

Good point, I should pick a longer name for the field. Perhaps simply queue
(I want to avoid prio_queue since it exactly matches the name of the struct
which could be confusing.)

> > +     while ((commit = paint_queue_get(&queue))) {
> ...
> > +
> > +             if (queue.p1_count + queue.p2_count +
> > +                 queue.pending_merge_bases == 0)
> > +                     break;
> >       }
> When possible, I like to try to make loops only have one terminating
> condition. Should we have paint_queue_get() return NULL when it sees
> this internal state condition?

Possibly, but that would couple the paint_queue struct very tightly with
the usage. Not a problem in practice since it only has one call site, and
it's unlikely that we want to add more of them but it may feel more natural
to let the paint_queue purely have the queue semantics and counters,
and keep the halt condition within the function itself. I don't feel
super-strongly about this and can change it if needed, I will just need to
verify that nothing else gets complex as a result, I have not fully thought
through the effects.

> Also, I'd rather see it of the form of (!count) instead of using
> addition to make it clear that we care about each value being zero.

I did consider that, and most of the code in commit-reach.c at least
prefers x and !x over x != 0 and x == 0, but my thinking was that
other code in the repo did use comparison operators specifically
for things like counters. Happy to change it to conform better though!

> Finally, I think we actually want this case to get the benefit:
>
>         if ((!queue.p1_count || !queue.p2_count) &&
>             !queue.pending_merge_bases)
>
> I do see that you have this condition in patch 3 with the extra
> detail that the max generation in the queue is finite. I think this
> is more reason to include this in the data structure method and not
> in the loop.

Yes, but just to be clear, you don't want to merge together patch 2 and 3
here, just grouping the halt conditions closer together
(within paint_queue_get)? Keeping patch 2 and 3 separate would be nice
to make it easier to show that introducing this extra counter bookkeeping
does not negatively impact the overall performance too much.

Thanks! I appreciate the thorough review of this patch
(which I feared was the most annoying one to look at).

Kristofer


* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson @ 2026-06-22 19:19 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <5c43f6ce-4dfe-47dd-b96a-80de57ecf108@gmail.com>

On Mon, 22 Jun 2026 at 20:12, Derrick Stolee <stolee@gmail.com> wrote:
> > +             if (generation < GENERATION_NUMBER_INFINITY &&
> > +                 queue.pending_merge_bases == 0 &&
> > +                 (queue.p1_count == 0 || queue.p2_count == 0))
> > +                     break;
> I mentioned it earlier, but I think this check should be in the
> dequeueing method instead of in the tail of the loop.

Yes, I will try to fold this one into the paint_queue_get as well.
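
As a rough sketch of what the combined halt condition could look like once both checks live in one place, the predicate below takes only the counters and the generation number the walk has reached. The `toy_` names and the TOY_GENERATION_INFINITY value are assumptions for illustration, not git's definitions, and the real paint_queue_get() may end up shaped differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for git's GENERATION_NUMBER_INFINITY; the value is an assumption. */
#define TOY_GENERATION_INFINITY UINT32_MAX

struct toy_paint_counts {
	int p1_count;
	int p2_count;
	int pending_merge_bases;
};

/* Return true when the walk can stop without missing a merge-base. */
static bool toy_walk_is_done(const struct toy_paint_counts *c,
			     uint32_t generation)
{
	assert(c->p1_count >= 0 && c->p2_count >= 0 &&
	       c->pending_merge_bases >= 0);

	/* Nothing non-stale is left in the queue. */
	if (!c->p1_count && !c->p2_count && !c->pending_merge_bases)
		return true;

	/*
	 * Side exhaustion: only valid in the finite-generation region,
	 * and only once every pending merge-base has been recorded.
	 */
	if (generation < TOY_GENERATION_INFINITY &&
	    !c->pending_merge_bases &&
	    (!c->p1_count || !c->p2_count))
		return true;

	return false;
}
```

Keeping the predicate free of queue internals would let the dequeue method call it while the counters stay plain data, which is one way to limit the coupling concern raised for patch 2.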

> I like that you broke this out into its own patch to demonstrate
> that this is the key performance boost. It may be good to have
> some performance test numbers that demonstrate that patch 2 does
> not add any substantial overhead (timing should match previous
> code) and in patch 3 this single condition gets us a huge benefit,
> though it requires the data tracking of patch 2 to work.

Good point, I will try to run enough local tests to ensure that patch 2
does not add too much overhead to slow things down.
I think I may need to create some type of (temporary, internal)
test runner that runs the same walk multiple times to reduce
the noise from parsing commits. I am not sure if I should also
commit such a performance test or simply include a brief summary
in the commit message.

Thanks,
Kristofer


* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
From: Kristofer Karlsson @ 2026-06-22 19:25 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <1588b53d-9576-4752-9459-da48276e4b2a@gmail.com>

On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote:
> It's usually my preference to see these tests show up before the
> new code arrives, that way we can see that they already work with
> the old logic and continue to work with the new logic.
>
> It's minor, but putting them after your code change may be adding
> enforcement of a change of behavior.

Agreed, I actually also prefer that in practice so I am not
sure why I ordered them this way - perhaps some attempt at
making it easier to review (show the idea and change before
the verification). I will reorder to put all new tests in the first commit
(or the second, if I also introduce a status-quo technical document first).

>
> One thing that could be helpful here is to consider tracing a
> count of "commits walked" in the merge-base code, then you could
> have these tests demonstrate the performance benefit by checking
> for that number changing.

Good idea, I actually had some of that locally when developing it,
but I removed the ugly traces before submitting this. I will try to
re-introduce that in a nice way. It would be neat to let tests
inspect that side effect, though in the worst case that could make
it fragile. At the very least it's good for human debugging though.

> In t6600, that tracing number would not be the same across the
> three different data shapes (full graph, half graph, no graph) and
> that could be valuable to demonstrate in tests.

Agreed, the number of commits visited would be more interesting
than the relative performance numbers since it's an algorithmic
change rather than a micro-optimization.
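
To illustrate how a "commits walked" count separates the two behaviours, here is a toy, self-contained model of the walk. Everything in it is an assumption for illustration: the 12-commit history is made up, the commit index doubles as a finite generation number (so the INFINITY gate is left out), and the counters are recomputed by scanning rather than tracked incrementally as in patch 2.

```c
#include <assert.h>
#include <stdbool.h>

#define PARENT1 (1u << 0)	/* hypothetical bit values */
#define PARENT2 (1u << 1)
#define STALE   (1u << 2)
#define RESULT  (1u << 3)
#define TOY_NR 12

/* Made-up history; parents always have a smaller index (-1 = none). */
static const int toy_parents[TOY_NR][2] = {
	{ -1, -1 },	/* 0: root */
	{ 0, -1 },	/* 1: main */
	{ 0, -1 },	/* 2: side */
	{ 1, -1 },	/* 3: main */
	{ 2, -1 },	/* 4: side */
	{ 3, -1 },	/* 5: main */
	{ 4, -1 },	/* 6: side */
	{ 5, -1 },	/* 7: main, fork point of the topic */
	{ 6, -1 },	/* 8: side */
	{ 7, -1 },	/* 9: main */
	{ 7, -1 },	/* 10: topic tip */
	{ 9, 8 },	/* 11: main tip, merges the side branch */
};

struct toy_walk_result {
	int walked;	/* commits popped from the queue */
	int merge_base;	/* last RESULT commit found, or -1 */
};

static struct toy_walk_result toy_paint_down(int one, int two,
					     bool side_exhaustion)
{
	unsigned flags[TOY_NR] = { 0 };
	bool queued[TOY_NR] = { false };
	struct toy_walk_result res = { 0, -1 };

	assert(one >= 0 && one < TOY_NR && two >= 0 && two < TOY_NR);
	flags[one] |= PARENT1;
	flags[two] |= PARENT2;
	queued[one] = queued[two] = true;

	for (;;) {
		int p1 = 0, p2 = 0, pending = 0, top = -1;
		unsigned paint;

		/* Recount the non-stale queue entries by paint state. */
		for (int i = 0; i < TOY_NR; i++) {
			if (!queued[i])
				continue;
			top = i;	/* highest index is popped first */
			if (flags[i] & STALE)
				continue;
			if ((flags[i] & (PARENT1 | PARENT2)) == PARENT1)
				p1++;
			else if ((flags[i] & (PARENT1 | PARENT2)) == PARENT2)
				p2++;
			else
				pending++;
		}
		if (!p1 && !p2 && !pending)
			break;	/* queue is empty or holds only STALE entries */
		if (side_exhaustion && !pending && (!p1 || !p2))
			break;	/* one side is exhausted */

		queued[top] = false;
		res.walked++;
		paint = flags[top] & (PARENT1 | PARENT2 | STALE);
		if (paint == (PARENT1 | PARENT2)) {
			flags[top] |= RESULT;
			res.merge_base = top;
			paint |= STALE;	/* ancestors of a result are stale */
		}
		for (int k = 0; k < 2; k++) {
			int p = toy_parents[top][k];

			if (p < 0 || (flags[p] & paint) == paint)
				continue;
			flags[p] |= paint;
			queued[p] = true;
		}
	}
	return res;
}
```

On this toy history, `toy_paint_down(11, 10, false)` pops 11 commits and `toy_paint_down(11, 10, true)` pops 5, and both report commit 7 as the merge-base. A test that compares such a count would show the algorithmic difference without depending on timing noise.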

Thanks,
Kristofer


* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
From: Kristofer Karlsson @ 2026-06-22 19:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <50dd5fb1-6b4e-448c-977c-cdc476f7fe40@gmail.com>

On Mon, 22 Jun 2026 at 20:21, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like the idea of documenting this so it's easier to understand.

Yes, my thinking was that I can now prove to myself that it works,
and anyone else could do the same, but having it spelled out
here is even better. I found the other documents
(e.g. commit-graph) to be a good source of inspiration here.

> There is risk of drift from the actual implementation. You may want
> to add a comment to the method in commit-reach.c to indicate that
> any change should be reflected in this document.

Good idea, will add that.

> > +Termination
> > +-----------
> > +
> > +Termination happens when we can prove that no extra progress is
> > +possible. We are done with the main loop when one of the following
> > +conditions holds:
> > +
> > +  1. The queue is empty.
> > +  2. The queue only contains STALE entries.
> > +  3. Side-exhaustion: the walk has reached the finite region and one
> > +     of the sides is fully exhausted.
> It could be an interesting exercise, but potentially wasteful, to
> add this document as a Patch 1, but reflecting the old algorithm
> and then to update the document at the same time as you update the
> code.

I did consider that initially but I was worried it would be considered
noisy. I am quite happy to rework it in a way that first
explains the status quo. That would make the document diff
more interesting. Agreed that should become the first patch,
and the patch that changes the algorithm should include
the documentation change.

> The changes in your patch 2 would impact this doc in terms of the
> data being tracked by the paint_queue data structure instead of the
> nonstale_queue structure (though those details are not currently
> handled in the current version). The change to the termination
> condition would come along with patch 3.

Agreed, I would need to rephrase from tracking non-stale
to tracking counts of p1 and p2 (and pending merge bases) commits,
but I think that would be a small tweak and well worth doing.

Thanks,
Kristofer


* Re: [PATCH 1/2] branch: suggest <remote>/<branch> on upstream slip
From: Junio C Hamano @ 2026-06-22 19:56 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <21684539debaf433b6b63404e1a7622a5cc33283.1781262619.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Harald Nordgren <haraldnordgren@gmail.com>
>
> "git branch --set-upstream-to origin main" reads the trailing word as
> the local branch to operate on and dies with "branch 'main' does not
> exist", pointing at the wrong problem.

When 'main' does not exist locally,

    $ git branch --set-upstream-to "$anything" main

would fail before even looking at the "$anything" (which is supposed
to specify the new_upstream for the named local branch 'main').  The
operation is to set the upstream for 'main', and if 'main' does not
exist, doesn't the user deserve the error that says 'main' does not
exist, no matter what "$anything" is, whether it is a well-formed or
ill-formed remote tracking branch name?

So it is unclear, at least to me, why "branch 'main' does not exist"
is an inappropriate message, mostly because these three lines do
not clearly tell me what the user _expected_ the command line to do.

When 'main' does exist, but named upstream "$anything" does not, we
get

    $ git branch sample master ;# make sure the thing exists
    $ git branch --set-upstream-to origin sample
    fatal: the requested upstream branch 'origin' does not exist
    hint:
    hint: If you are planning on basing your work on an upstream
    hint: branch that already exists at the remote, you may need to
    hint: run "git fetch" to retrieve it.
    hint:
    hint: If you are planning to push out a new local branch that
    hint: will track its remote counterpart, you may want to use
    hint: "git push -u" to set the upstream config as you push.
    hint: Disable this message with "git config set advice.setUpstreamFailure false"

which does sound clear enough to me, even though it does not exactly
say "Even though upstream branch 'origin' does not exist, 'origin'
is a nickname for a remote, perhaps you meant to say
origin/something?"

I do not doubt you are trying to address a real issue, but the above
three-line description does not tell me what that problem is.

Now I do not regularly use --set-upstream-to, so I may be missing
some obvious common mistake modes, but a couple of my attempts to make
bad command invocations seem to give me reasonable responses:

    $ git branch --set-upstream-to ko/master sample
    branch 'sample' set up to track 'ko/master'.

OK, both are well formed so no problem.

    $ git branch --set-upstream-to ko/mastre sample
    fatal: the requested upstream branch 'ko/mastre' does not exist
    hint:
    hint: If you are planning on basing your work on an upstream
    hint: branch that already exists at the remote, you may need to
    hint: run "git fetch" to retrieve it.
    hint:
    hint: If you are planning to push out a new local branch that
    hint: will track its remote counterpart, you may want to use
    hint: "git push -u" to set the upstream config as you push.
    hint: Disable this message with "git config set advice.setUpstreamFailure false"

Misspelt upstream branch name diagnosed correctly, just like the
case where I gave 'origin', which does not exist, either.


