Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Justin Tobler @ 2026-06-09 20:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, Pablo Sabater, git, cat, ps, kaartic.sivaraam,
	ben.knoble
In-Reply-To: <xmqq5x3r1ph0.fsf@gitster.g>

On 26/06/09 12:30PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > I can see a situation where a user performs:
> >
> >   git history reword abcd1234
> >
> > with the intention to modify a commit message, but then for some reason
> > changes their mind and doesn't want history to change. Maybe the wrong
> > commit was referenced, or they decide the current message is actually
> > fine. From my understanding, there isn't a great way to abort rewording
> > a commit during editing and thus the user would have to reset history
> > afterwards if they care enough to go back to the previous point.
> >
> > So I do see some value in a mechanism to abort rewriting a commit
> > message.
> 
> I think we are saying the same thing in different ways.  I want to
> see that command "succeed" either case (normally we create a new
> commit object because we record an updated committer timestamp, but
> if there is no need to create a new commit object only to record an
> updated committer timestamp, we may choose not to and leave the
> history intact) and I do not want it to *abort*.
> 
> The mechanism to do so may be exactly the same, i.e., accept an
> updated log message, then try to "hash-object" (without -w) the
> commit object with everything, except for the updated commit log
> message, taken from the original commit, plus the updated log
> message.  And if the resulting hash is the same as the original, do
> not do anything further and return happily.  Aborting sounds more
> like complaining loudly "baa, you asked me to reword but you gave me
> the same message? is anything wrong with you?" with non-zero exit
> status, which I think the user does not deserve in such a case.

Yes, I completely agree. If the user doesn't update the commit message,
for whatever reason, that should still be considered a success since it
follows the user's intent. I don't think it makes sense to exit with a
non-zero code in such cases. I would also question if we should print
any message/note to the user at all for the same reasons.

-Justin

^ permalink raw reply

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Junio C Hamano @ 2026-06-09 19:30 UTC (permalink / raw)
  To: Justin Tobler
  Cc: Phillip Wood, Pablo Sabater, git, cat, ps, kaartic.sivaraam,
	ben.knoble
In-Reply-To: <aihH8ye-r4QuXlRD@denethor>

Justin Tobler <jltobler@gmail.com> writes:

> I can see a situation where a user performs:
>
>   git history reword abcd1234
>
> with the intention to modify a commit message, but then for some reason
> changes their mind and doesn't want history to change. Maybe the wrong
> commit was referenced, or they decide the current message is actually
> fine. From my understanding, there isn't a great way to abort rewording
> a commit during editing and thus the user would have to reset history
> afterwards if they care enough to go back to the previous point.
>
> So I do see some value in a mechanism to abort rewriting a commit
> message.

I think we are saying the same thing in different ways.  I want to
see that command "succeed" either case (normally we create a new
commit object because we record an updated committer timestamp, but
if there is no need to create a new commit object only to record an
updated committer timestamp, we may choose not to and leave the
history intact) and I do not want it to *abort*.

The mechanism to do so may be exactly the same, i.e., accept an
updated log message, then try to "hash-object" (without -w) the
commit object with everything, except for the updated commit log
message, taken from the original commit, plus the updated log
message.  And if the resulting hash is the same as the original, do
not do anything further and return happily.  Aborting sounds more
like complaining loudly "baa, you asked me to reword but you gave me
the same message? is anything wrong with you?" with non-zero exit
status, which I think the user does not deserve in such a case.

^ permalink raw reply

* [PATCH] commit-reach: remove get_reachable_subset()
From: Kristofer Karlsson via GitGitGadget @ 2026-06-09 19:28 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Kristofer Karlsson, Kristofer Karlsson

From: Kristofer Karlsson <krka@spotify.com>

get_reachable_subset() and tips_reachable_from_bases() answer the
same question: given a set of bases and a set of tips, which tips
are reachable from at least one base?

get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
for add_missing_tags() in remote.c. tips_reachable_from_bases()
was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
series. The two were never consolidated.

With a commit-graph, tips_reachable_from_bases() can have an edge:
its DFS raises the generation floor as lower targets are found,
pruning more aggressively than the static floor in
get_reachable_subset(). Without generation numbers, some edge cases
may be slower with DFS instead of BFS since the date-ordered
prio_queue naturally stays near the top of the graph, but this
should not matter in practice -- worst case both visit the full
graph down from the bases.

The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
because tips_reachable_from_bases() uses SEEN (bit 0) internally.
TMP_MARK is already used for deduplication earlier in the same
function and is cleared before the reachability check.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    commit-reach: remove get_reachable_subset()
    
    This removes get_reachable_subset() and converts its only caller to use
    tips_reachable_from_bases() directly. Both answer the same category-2
    reachability question ("which tips are reachable from these bases?") but
    were introduced years apart and never consolidated.
    
    On the no-commit-graph tradeoff: without generation numbers, the
    date-ordered BFS in get_reachable_subset() can be more disciplined than
    DFS since it naturally stays near the top of the graph. But this only
    matters for repositories that are both large enough for the difference
    to be measurable and missing a commit-graph -- a combination that would
    already struggle for many other reasons. The fix there is to enable the
    commit-graph, not to maintain two implementations of the same
    reachability query.
    
    Notes for reviewers:
    
     * The flag in remote.c changes from 1 to TMP_MARK because
       tips_reachable_from_bases() uses SEEN (bit 0) internally. TMP_MARK is
       already used earlier in the same function and is cleared before the
       reachability block.
    
     * The sent_tips array is converted to a commit_list to match the
       tips_reachable_from_bases() API. This is O(n) list-node allocations,
       negligible compared to the graph walk.
    
     * Test helper and test names rename from get_reachable_subset to
       tips_reachable_from_bases to match the function being exercised.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2144%2Fspkrka%2Fkrka%2Fremove-get-reachable-subset-v2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2144/spkrka/krka/remove-get-reachable-subset-v2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2144

 commit-reach.c        | 73 -------------------------------------------
 commit-reach.h        | 13 --------
 remote.c              | 19 ++++++-----
 t/helper/test-reach.c | 39 +++++++++++------------
 t/t6600-test-reach.sh | 18 +++++------
 5 files changed, 36 insertions(+), 126 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..e78752eb87 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1013,79 +1013,6 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 	return result;
 }
 
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
-					 struct commit **to, size_t nr_to,
-					 unsigned int reachable_flag)
-{
-	struct commit **item;
-	struct commit *current;
-	struct commit_list *found_commits = NULL;
-	struct commit **to_last = to + nr_to;
-	struct commit **from_last = from + nr_from;
-	timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
-	int num_to_find = 0;
-
-	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
-
-	for (item = to; item < to_last; item++) {
-		timestamp_t generation;
-		struct commit *c = *item;
-
-		repo_parse_commit(the_repository, c);
-		generation = commit_graph_generation(c);
-		if (generation < min_generation)
-			min_generation = generation;
-
-		if (!(c->object.flags & PARENT1)) {
-			c->object.flags |= PARENT1;
-			num_to_find++;
-		}
-	}
-
-	for (item = from; item < from_last; item++) {
-		struct commit *c = *item;
-		if (!(c->object.flags & PARENT2)) {
-			c->object.flags |= PARENT2;
-			repo_parse_commit(the_repository, c);
-
-			prio_queue_put(&queue, *item);
-		}
-	}
-
-	while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
-		struct commit_list *parents;
-
-		if (current->object.flags & PARENT1) {
-			current->object.flags &= ~PARENT1;
-			current->object.flags |= reachable_flag;
-			commit_list_insert(current, &found_commits);
-			num_to_find--;
-		}
-
-		for (parents = current->parents; parents; parents = parents->next) {
-			struct commit *p = parents->item;
-
-			repo_parse_commit(the_repository, p);
-
-			if (commit_graph_generation(p) < min_generation)
-				continue;
-
-			if (p->object.flags & PARENT2)
-				continue;
-
-			p->object.flags |= PARENT2;
-			prio_queue_put(&queue, p);
-		}
-	}
-
-	clear_prio_queue(&queue);
-
-	clear_commit_marks_many(nr_to, to, PARENT1);
-	clear_commit_marks_many(nr_from, from, PARENT2);
-
-	return found_commits;
-}
-
 define_commit_slab(bit_arrays, struct bitmap *);
 static struct bit_arrays bit_arrays;
 
diff --git a/commit-reach.h b/commit-reach.h
index 3f3a563d8a..b3e7051738 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -96,19 +96,6 @@ int can_all_from_reach_with_flag(struct object_array *from,
 int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		       int commit_date_cutoff);
 
-
-/*
- * Return a list of commits containing the commits in the 'to' array
- * that are reachable from at least one commit in the 'from' array.
- * Also add the given 'flag' to each of the commits in the returned list.
- *
- * This method uses the PARENT1 and PARENT2 flags during its operation,
- * so be sure these flags are not set before calling the method.
- */
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
-					 struct commit **to, size_t nr_to,
-					 unsigned int reachable_flag);
-
 struct ahead_behind_count {
 	/**
 	 * As input, the *_index members indicate which positions in
diff --git a/remote.c b/remote.c
index f1a3681b7c..7cdb59ed87 100644
--- a/remote.c
+++ b/remote.c
@@ -1459,9 +1459,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
 	 * sent to the other side.
 	 */
 	if (sent_tips.nr) {
-		const int reachable_flag = 1;
-		struct commit_list *found_commits;
 		struct commit_stack src_commits = COMMIT_STACK_INIT;
+		struct commit_list *bases = NULL;
 
 		for_each_string_list_item(item, &src_tag) {
 			struct ref *ref = item->util;
@@ -1479,11 +1478,12 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
 			commit_stack_push(&src_commits, commit);
 		}
 
-		found_commits = get_reachable_subset(sent_tips.items,
-						     sent_tips.nr,
-						     src_commits.items,
-						     src_commits.nr,
-						     reachable_flag);
+		for (size_t i = 0; i < sent_tips.nr; i++)
+			commit_list_insert(sent_tips.items[i], &bases);
+		tips_reachable_from_bases(the_repository,
+					 bases, src_commits.items,
+					 src_commits.nr, TMP_MARK);
+		commit_list_free(bases);
 
 		for_each_string_list_item(item, &src_tag) {
 			struct ref *dst_ref;
@@ -1503,7 +1503,7 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
 			 * Is this tag, which they do not have, reachable from
 			 * any of the commits we are sending?
 			 */
-			if (!(commit->object.flags & reachable_flag))
+			if (!(commit->object.flags & TMP_MARK))
 				continue;
 
 			/* Add it in */
@@ -1513,9 +1513,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
 		}
 
 		clear_commit_marks_many(src_commits.nr, src_commits.items,
-					reachable_flag);
+					TMP_MARK);
 		commit_stack_clear(&src_commits);
-		commit_list_free(found_commits);
 	}
 
 	string_list_clear(&src_tag, 0);
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5d86a96c17..eb44a64f50 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -7,6 +7,7 @@
 #include "hex.h"
 #include "object-name.h"
 #include "ref-filter.h"
+#include "revision.h"
 #include "setup.h"
 #include "string-list.h"
 #include "tag.h"
@@ -149,30 +150,26 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
 		clear_contains_cache(&cache);
-	} else if (!strcmp(av[1], "get_reachable_subset")) {
-		const int reachable_flag = 1;
-		int count = 0;
-		struct commit_list *current;
-		struct commit_list *list = get_reachable_subset(X_stack.items, X_stack.nr,
-								Y_stack.items, Y_stack.nr,
-								reachable_flag);
-		printf("get_reachable_subset(X,Y)\n");
-		for (current = list; current; current = current->next) {
-			if (!(list->item->object.flags & reachable_flag))
-				die(_("commit %s is not marked reachable"),
-				    oid_to_hex(&list->item->object.oid));
-			count++;
-		}
+	} else if (!strcmp(av[1], "tips_reachable_from_bases")) {
+		struct commit_list *bases = NULL;
+		struct commit_list *result = NULL;
+
+		for (size_t i = 0; i < X_stack.nr; i++)
+			commit_list_insert(X_stack.items[i], &bases);
+		tips_reachable_from_bases(the_repository,
+					 bases, Y_stack.items,
+					 Y_stack.nr, TMP_MARK);
+		commit_list_free(bases);
+
+		printf("tips_reachable_from_bases(X,Y)\n");
 		for (size_t i = 0; i < Y_stack.nr; i++) {
-			if (Y_stack.items[i]->object.flags & reachable_flag)
-				count--;
+			if (Y_stack.items[i]->object.flags & TMP_MARK)
+				commit_list_insert(Y_stack.items[i], &result);
 		}
+		print_sorted_commit_ids(result);
 
-		if (count < 0)
-			die(_("too many commits marked reachable"));
-
-		print_sorted_commit_ids(list);
-		commit_list_free(list);
+		clear_commit_marks_many(Y_stack.nr, Y_stack.items, TMP_MARK);
+		commit_list_free(result);
 	}
 
 	object_array_clear(&X_obj);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..51b140a539 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -391,7 +391,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' '
 	run_all_modes git rev-list --topo-order commit-3-8...commit-6-6
 '
 
-test_expect_success 'get_reachable_subset:all' '
+test_expect_success 'tips_reachable_from_bases:all' '
 	cat >input <<-\EOF &&
 	X:commit-9-1
 	X:commit-8-3
@@ -403,15 +403,15 @@ test_expect_success 'get_reachable_subset:all' '
 	Y:commit-5-6
 	EOF
 	(
-		echo "get_reachable_subset(X,Y)" &&
+		echo "tips_reachable_from_bases(X,Y)" &&
 		git rev-parse commit-3-3 \
 			      commit-1-7 \
 			      commit-5-6 | sort
 	) >expect &&
-	test_all_modes get_reachable_subset
+	test_all_modes tips_reachable_from_bases
 '
 
-test_expect_success 'get_reachable_subset:some' '
+test_expect_success 'tips_reachable_from_bases:some' '
 	cat >input <<-\EOF &&
 	X:commit-9-1
 	X:commit-8-3
@@ -422,14 +422,14 @@ test_expect_success 'get_reachable_subset:some' '
 	Y:commit-5-6
 	EOF
 	(
-		echo "get_reachable_subset(X,Y)" &&
+		echo "tips_reachable_from_bases(X,Y)" &&
 		git rev-parse commit-3-3 \
 			      commit-1-7 | sort
 	) >expect &&
-	test_all_modes get_reachable_subset
+	test_all_modes tips_reachable_from_bases
 '
 
-test_expect_success 'get_reachable_subset:none' '
+test_expect_success 'tips_reachable_from_bases:none' '
 	cat >input <<-\EOF &&
 	X:commit-9-1
 	X:commit-8-3
@@ -439,8 +439,8 @@ test_expect_success 'get_reachable_subset:none' '
 	Y:commit-7-6
 	Y:commit-2-8
 	EOF
-	echo "get_reachable_subset(X,Y)" >expect &&
-	test_all_modes get_reachable_subset
+	echo "tips_reachable_from_bases(X,Y)" >expect &&
+	test_all_modes tips_reachable_from_bases
 '
 
 test_expect_success 'for-each-ref ahead-behind:linear' '

base-commit: 600fe743028cbfb640855f659e9851522214bc0b
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Junio C Hamano @ 2026-06-09 19:17 UTC (permalink / raw)
  To: Pablo Sabater; +Cc: Phillip Wood, git, cat, ps, kaartic.sivaraam, ben.knoble
In-Reply-To: <CAN5EUNRz9F+njb_O=Q4DzVMec-q+rDf83Ow+MPJE4yLCBq9qww@mail.gmail.com>

Pablo Sabater <pabloosabaterr@gmail.com> writes:

>> > I wonder if we should check that the committer identity is unchanged as
>> > well in case anyone is using this to fix commits after committing with
>> > the wrong identity.
>
> I think that if you reword a commit committed by someone else but end
> up with no changes I want it to be kept as it was.

That depends on the reason why the feature to "reword" the commit is
being used, and the use case Phillip is talking about is a bit
different.

A very common mistake a new user makes when starting a repository is
to make commits before they realize that they used a wrong identity
to create them.  They are happy with what they committed, except
that they want these commits to be attributed to user.{name,email}
they corrected.

Also, people often use multiple identities (e.g., corp vs personal),
and when making commits to the project for their employer they do
not want to use their personal identity (and vice versa).  After
making a mistake to create commits under wrong identity, they want
to fix these commits.

In such situations, there is no room for leaving the committer name
as "someone else".  The user wants to get rid of the "someone else"s
identity out of these commits.

^ permalink raw reply

* Re: [PATCH GSoC RFC v12 12/12] cat-file: make remote-object-info allow-list dynamic
From: Junio C Hamano @ 2026-06-09 18:54 UTC (permalink / raw)
  To: Chandra Pratap
  Cc: Pablo Sabater, eric.peijian, calvinwan, chriscool, git, jltobler,
	jonathantanmy, karthik.188, toon
In-Reply-To: <CA+J6zkQ22en2HgH03EedKOfC+jLcHH2UbwpH0h_bDEAHR6B2pg@mail.gmail.com>

Chandra Pratap <chandrapratap3519@gmail.com> writes:

> On Mon, 8 Jun 2026 at 15:45, Pablo Sabater <pabloosabaterr@gmail.com> wrote:
>>
>> The static allow-list in expand_atom() is hardcoded to only allow
>> "objectname" and "objectsize" for remote queries. This works because
>>  ...
>>  }

You just forced readers to skip over 200+ lines of quoted material,
looking for something interesting you have said in response to
comment on the patch in vain.

>> diff --git a/fetch-object-info.c b/fetch-object-info.c
>> index 51a898430d..425929a269 100644
>> --- a/fetch-object-info.c
>> +++ b/fetch-object-info.c
>> @@ -39,6 +39,12 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
>>         case protocol_v2:
>>                 if (!server_supports_v2("object-info"))
>>                         die(_("object-info capability is not enabled on the server"));
>> +
>> +               for (int i = args->object_info_options->nr - 1; i >= 0; i--)
>
> Isn't args->object_info_options->nr of type size_t? We should probably
> do something
> like:
>
> for (size_t i = 0; i < args->args->object_info_options->nr; i++)
>
> instead.

This is a valid observation and a careful reading like this is very
much appreciated.  It is unfortunate that it was buried by 200+
lines of irrelevant material before we find it.

Thanks.


>> +                       if (!server_supports_feature("object-info",
>> +                                                    args->object_info_options->items[i].string, 0))
>> +                               unsorted_string_list_delete_item(args->object_info_options, i, 0);
>> +
>>                 send_object_info_request(fd_out, args);
>>                 break;
>>         case protocol_v1:
>>
>> --
>> 2.54.0
>
> Other than these, the patch series LGTM for now.
>
> Thanks,
> Chandra.

^ permalink raw reply

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Justin Tobler @ 2026-06-09 18:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, Pablo Sabater, git, cat, ps, kaartic.sivaraam,
	ben.knoble
In-Reply-To: <xmqq4ijbsn2m.fsf@gitster.g>

On 26/06/09 09:20AM, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> > Hi Pablo
> >
> > On 09/06/2026 11:42, Pablo Sabater wrote:
> >>   static int commit_tree_ext(struct repository *repo,
> >> @@ -135,6 +136,13 @@ static int commit_tree_ext(struct repository *repo,
> >>   					  original_body, action, &commit_message);
> >>   		if (ret < 0)
> >>   			goto out;
> >> +
> >> +		if (flags & COMMIT_TREE_ABORT_ON_SAME_MESSAGE &&
> >> +		    !strcmp(original_body, commit_message.buf)) {
> >> +			fprintf(stderr, _("Message unchanged, aborting reword.\n"));
> >> +			ret = 1;
> >> +			goto out;
> >> +		}
> >
> > I wonder if we should check that the committer identity is unchanged as 
> > well in case anyone is using this to fix commits after committing with 
> > the wrong identity.
> >
> > Aborting when the message and committer identity are unchanged seems 
> > like a good idea.
> 
> I am not sure why it would be a good idea.  The user wanted to make
> the commit have this message, and the commit ended up having the
> same message as the user gave.  That message may have been identical
> to what the commit originally had, or it may be different.  Why is
> the former an abort-worthy event?  A simple note, I may understand,
> but aborting with an error message?

I can see a situation where a user performs:

  git history reword abcd1234

with the intention to modify a commit message, but then for some reason
changes their mind and doesn't want history to change. Maybe the wrong
commit was referenced, or they decide the current message is actually
fine. From my understanding, there isn't a great way to abort rewording
a commit during editing and thus the user would have to reset history
afterwards if they care enough to go back to the previous point.

So I do see some value in a mechanism to abort rewriting a commit
message. An unchanged commit message does seem like a reasonable signal
to essentially abort the reword. I'm not sure committer identity should
be taken into consideration though since it would inhibit a users
ability to abort the reword if they ever touch a commit that they
themselves are not the previous committer. 

I don't think there is a need to have an error message though. Even in
the case where the user leaves the commit message unchanged and history
is left untouched, git-history(1) would be following exactly what the
user instructed it to do. I don't really see why the user should care
whether history was actually modified or not in such a scenario.

-Justin

^ permalink raw reply

* Re: [PATCH GSoC RFC v12 03/12] cat-file: add declaration of variable i inside its for loop
From: Pablo Sabater @ 2026-06-09 17:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: eric.peijian, calvinwan, chriscool, git, jltobler, jonathantanmy,
	karthik.188, toon, chandrapratap3519
In-Reply-To: <xmqqzf15w0cz.fsf@gitster.g>

El lun, 8 jun 2026 a las 16:52, Junio C Hamano (<gitster@pobox.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > From: Eric Ju <eric.peijian@gmail.com>
> > Subject: Re: [PATCH GSoC RFC v12 03/12] cat-file: add declaration of variable i inside its for loop
>
> "add" sounds a bit strange, as the existing code wouldn't have
> compiled if the variable were never declared.  What the patch did
> was to move (not add) the declaration of a function scope variable
> that is used to control for() loops.  Would any of these work?
>
> Subject: [PATCH GSOC v12 03/12] cat-file: narrow scope of loop counter
> Subject: [PATCH GSOC v12 03/12] cat-file: declare loop counter inside for()
>

Hi!
True, it's better to write "move" and about the title, works as well,
I'll change them for the next version.

> > Some code used in this series declares variable i and only uses it
> > in a for loop, not in any other logic outside the loop.
> >
> > Change the declaration of i to be inside the for loop for readability.
> > While at it, we also change its type from "int" to "size_t" where the latter makes more sense.
>
> Curious single line that is overly long?

True, I'll wrap it up correctly in the next version.

>
> > Helped-by: Christian Couder <chriscool@tuxfamily.org>
> > Signed-off-by: Eric Ju <eric.peijian@gmail.com>
> > Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
> > ---
> >  builtin/cat-file.c | 11 +++--------
> >  fetch-pack.c       |  3 +--
> >  2 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> > index fa45f774d7..c060fd4800 100644
> > --- a/builtin/cat-file.c
> > +++ b/builtin/cat-file.c
> > @@ -726,12 +726,10 @@ static void dispatch_calls(struct batch_options *opt,
> >               struct queued_cmd *cmd,
> >               int nr)
> >  {
> > -     int i;
> > -
> >       if (!opt->buffer_output)
> >               die(_("flush is only for --buffer mode"));
> >
> > -     for (i = 0; i < nr; i++)
> > +     for (size_t i = 0; i < nr; i++)
> >               cmd[i].fn(opt, cmd[i].line, output, data);
>
> The loop limit "nr" will not become as large as size_t because the
> caller passes a platform natural "int" to the function.  Wouldn't a
> stupid compiler give us warning on comparing unsigned size_t with
> signed int here?

Yes, I'll change "i" to be int.

>
> > @@ -739,9 +737,7 @@ static void dispatch_calls(struct batch_options *opt,
> >
> >  static void free_cmds(struct queued_cmd *cmd, size_t *nr)
> >  {
> > -     size_t i;
> > -
> > -     for (i = 0; i < *nr; i++)
> > +     for (size_t i = 0; i < *nr; i++)
> >               FREE_AND_NULL(cmd[i].line);
>
> No type change, so the result is as safe as the original.
>
> > @@ -768,7 +764,6 @@ static void batch_objects_command(struct batch_options *opt,
> >       size_t alloc = 0, nr = 0;
> >
> >       while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
> > -             int i;
> >               const struct parse_cmd *cmd = NULL;
> >               const char *p = NULL, *cmd_end;
> >               struct queued_cmd call = {0};
> > @@ -778,7 +773,7 @@ static void batch_objects_command(struct batch_options *opt,
> >               if (isspace(*input.buf))
> >                       die(_("whitespace before command: '%s'"), input.buf);
> >
> > -             for (i = 0; i < ARRAY_SIZE(commands); i++) {
> > +             for (size_t i = 0; i < ARRAY_SIZE(commands); i++) {
> >                       if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
> >                               continue;
>
> ARRAY_SIZE() is some arithmetic over sizeof(*commands) and
> sizeof(commands), which is of type size_t, so this is better than
> the original.  Use of size_t i of course is a natural way to index
> into commands[] array, so the result is just fine.
>
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 120e01f3cf..f13951d154 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -1388,9 +1388,8 @@ static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
> >       if (advertise_sid && server_supports_v2("session-id"))
> >               packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
> >       if (server_options && server_options->nr) {
> > -             int i;
> >               ensure_server_supports_v2("server-option");
> > -             for (i = 0; i < server_options->nr; i++)
> > +             for (size_t i = 0; i < server_options->nr; i++)
> >                       packet_buf_write(req_buf, "server-option=%s",
> >                                        server_options->items[i].string);
>
> server_options is a string_list whose .nr member is of type size_t,
> so this comparison is perfectly fine.  Ditto for ->items[i].string
> that is a natural way to index into an array.
>
> >       }
>
> v

Thanks,

Pablo.

^ permalink raw reply

* Re: [PATCH GSoC RFC v12 04/12] t1006: split test utility functions into new "lib-cat-file.sh"
From: Pablo Sabater @ 2026-06-09 17:44 UTC (permalink / raw)
  To: Chandra Pratap
  Cc: eric.peijian, calvinwan, chriscool, git, jltobler, jonathantanmy,
	karthik.188, toon
In-Reply-To: <CA+J6zkQe=K80QUOH8LwXCRw9nxv3tHBg+FtfDsYedY5xdHW79A@mail.gmail.com>

El mar, 9 jun 2026 a las 8:29, Chandra Pratap
(<chandrapratap3519@gmail.com>) escribió:
>
> On Mon, 8 Jun 2026 at 15:44, Pablo Sabater <pabloosabaterr@gmail.com> wrote:
> >
> > From: Eric Ju <eric.peijian@gmail.com>
> >
> > This refactor extracts utility functions from the cat-file's test
> > script "t1006-cat-file.sh" into a new "lib-cat-file.sh" dedicated
> > library file. The goal is to improve code reuse and readability,
> > enabling future tests to leverage these utilities without duplicating
> > code.
>
> Hmm, seems like a premature change to me. Do any of the subsequent
> commits require this refactor? Maybe the follow-up series that enables
> %objecttype support needs it? Did someone request this change in v11's
> feedback?
>
> If any of those are true, I think it's worthwhile mentioning it here. That will
> make it easier to determine whether this change is truly necessary.

Yes, these functions are needed for "t1017" which is created later in
the series [1] for the remote object info tests, so they are both used
in "t1006" (where they were originally) and "t1017".

>
> > Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
> > ---
> >  t/lib-cat-file.sh   | 16 ++++++++++++++++
> >  t/t1006-cat-file.sh | 13 +------------
> >  2 files changed, 17 insertions(+), 12 deletions(-)
> >
> > diff --git a/t/lib-cat-file.sh b/t/lib-cat-file.sh
> > new file mode 100644
> > index 0000000000..44af232d74
> > --- /dev/null
> > +++ b/t/lib-cat-file.sh
> > @@ -0,0 +1,16 @@
> > +# Library of git-cat-file related test functions.
> > +
> > +# Print a string without a trailing newline.
> > +echo_without_newline () {
> > +       printf '%s' "$*"
> > +}
> > +
> > +# Print a string without newlines and replace them with a NULL character (\0).
> > +echo_without_newline_nul () {
> > +       echo_without_newline "$@" | tr '\n' '\0'
> > +}
> > +
> > +# Calculate the length of a string.
> > +strlen () {
> > +       echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
> > +}
> > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> > index 8e2c52652c..8360f3bbd9 100755
> > --- a/t/t1006-cat-file.sh
> > +++ b/t/t1006-cat-file.sh
> > @@ -4,6 +4,7 @@ test_description='git cat-file'
> >
> >  . ./test-lib.sh
> >  . "$TEST_DIRECTORY/lib-loose.sh"
> > +. "$TEST_DIRECTORY"/lib-cat-file.sh
> >
> >  test_cmdmode_usage () {
> >         test_expect_code 129 "$@" 2>err &&
> > @@ -99,18 +100,6 @@ do
> >         '
> >  done
> >
> > -echo_without_newline () {
> > -    printf '%s' "$*"
> > -}
> > -
> > -echo_without_newline_nul () {
> > -       echo_without_newline "$@" | tr '\n' '\0'
> > -}
> > -
> > -strlen () {
> > -    echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
> > -}
> > -
> >  run_tests () {
> >      type=$1
> >      object_name="$2"
> >
> > --
> > 2.54.0

[1]: https://lore.kernel.org/git/20260608-ps-eric-work-rebase-v12-10-5338b766e658@gmail.com/

Thanks,

Pablo.

^ permalink raw reply

* Re: [PATCH GSoC RFC v12 12/12] cat-file: make remote-object-info allow-list dynamic
From: Pablo Sabater @ 2026-06-09 17:34 UTC (permalink / raw)
  To: Chandra Pratap
  Cc: eric.peijian, calvinwan, chriscool, git, jltobler, jonathantanmy,
	karthik.188, toon
In-Reply-To: <CA+J6zkQ22en2HgH03EedKOfC+jLcHH2UbwpH0h_bDEAHR6B2pg@mail.gmail.com>

El mar, 9 jun 2026 a las 17:32, Chandra Pratap
(<chandrapratap3519@gmail.com>) escribió:
>
> On Mon, 8 Jun 2026 at 15:45, Pablo Sabater <pabloosabaterr@gmail.com> wrote:
> >
> > The static allow-list in expand_atom() is hardcoded to only allow
> > "objectname" and "objectsize" for remote queries. This works because
> > up to this point all servers will either support object-info with name
> > and size or they do not support them at all, but we cannot expect that
> > in a future different servers with different git versions to have the
> > same object-info capabilities. Therefore, the allow_list needs to be
> > dynamic depending on what does the server advertise.
> >
> > The client will now:
> >
> > 1. Request the protocol option that the placeholder refers to (i.e.
> >    "size" when "%(objectsize)").
> >
> > 2. Filters the request in fetch_object_info() dropping any option that
> >    the server does not advertise.
> >
> > 3. After the fetching, the options that haven't been dropped are the ones
> >    fetched and supported by the server, these supported options are
> >    mapped and remote_allowed_atoms is populated with the placeholders.
> >
> > 4. expand_atom() checks remote_allowed_atoms with the same behaviour as
> >    the static allow_list had.
> >
> > Move object_info_options out of get_remote_info so the caller which has
> > data can select what options will be requested instead of requesting
> > always size.
> > Move batch_object_write() out so there will always be an output even if
> > all the placeholders are not supported by the server (returns an empty
> > line).
> >
> > Include "type" in the object_info_options so once the server supports
> > it, the clients know already how to request it.
> >
> > Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> > Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
> > Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
> > ---
> >  builtin/cat-file.c  | 85 ++++++++++++++++++++++++++++++++---------------------
> >  fetch-object-info.c |  6 ++++
> >  2 files changed, 58 insertions(+), 33 deletions(-)
> >
[snip]
> > diff --git a/fetch-object-info.c b/fetch-object-info.c
> > index 51a898430d..425929a269 100644
> > --- a/fetch-object-info.c
> > +++ b/fetch-object-info.c
> > @@ -39,6 +39,12 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
> >         case protocol_v2:
> >                 if (!server_supports_v2("object-info"))
> >                         die(_("object-info capability is not enabled on the server"));
> > +
> > +               for (int i = args->object_info_options->nr - 1; i >= 0; i--)
>
> Isn't args->object_info_options->nr of type size_t? We should probably
> do something
> like:
>
> for (size_t i = 0; i < args->args->object_info_options->nr; i++)
>
> instead.

Hi!

void unsorted_string_list_delete_item(struct string_list *list, int i,
int free_util)
{
        if (list->strdup_strings)
                free(list->items[i].string);
        if (free_util)
                free(list->items[i].util);
        list->items[i] = list->items[list->nr-1];
        list->nr--;
}


I made it backwards because of "list->items[i] = list->items[list->nr
- 1];" If we made it from 0..nr and we delete the first element, for
the next iteration, the last element is at [0] but we are on [1] and
that swapped element never gets evaluated.

About size_t, yes, it is size_t but because we go backwards 0 - 1
would fail, also unsorted_string_list_delete_item() signature has "int
i". The options that can be on that list will be a small number so
there should be no problem, should I cast it explicitly?

>
> > +                       if (!server_supports_feature("object-info",
> > +                                                    args->object_info_options->items[i].string, 0))
> > +                               unsorted_string_list_delete_item(args->object_info_options, i, 0);
> > +
> >                 send_object_info_request(fd_out, args);
> >                 break;
> >         case protocol_v1:
> >
> > --
> > 2.54.0
>
> Other than these, the patch series LGTM for now.
>
> Thanks,
> Chandra.

Thanks,

Pablo.

^ permalink raw reply

* Re: git-diff in a worktree is an order of magnitude slower?
From: D. Ben Knoble @ 2026-06-09 17:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Git
In-Reply-To: <20260609001134.GD358144@coredump.intra.peff.net>

On Mon, Jun 8, 2026 at 8:11 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jun 08, 2026 at 07:36:45PM -0400, D. Ben Knoble wrote:
>
> > I'd like to report and offer to help fix what I view as a serious performance
> > bug:
> >
> >     "git diff --no-ext-diff --quiet" performs about ~10x slower in a secondary
> >     worktree than in the main worktree.
>
> Hmm, I get the opposite effect: it is much faster in the worktree!
>
> I did:
>
>   git clone /path/to/linux.git
>   git -C linux worktree add --detach ../wt
>   hyperfine -L dir linux,wt 'git -C {dir} diff'
>
> which yielded:
>
>   Benchmark 1: git -C linux diff
>     Time (mean ± σ):     188.9 ms ±   2.5 ms    [User: 166.4 ms, System: 130.7 ms]
>     Range (min … max):   185.5 ms … 194.8 ms    16 runs
>
>   Benchmark 2: git -C wt diff
>     Time (mean ± σ):      20.0 ms ±   1.5 ms    [User: 23.4 ms, System: 103.5 ms]
>     Range (min … max):    17.2 ms …  24.6 ms    132 runs
>
>   Summary
>     git -C wt diff ran
>       9.43 ± 0.71 times faster than git -C linux diff
>
> Running:
>
>   perf record -g git -C wt --no-pager diff
>   perf record -g git -C linux --no-pager diff
>   perf diff
>
> implies that the slow case is spending a lot more time computing sha1s.
> Which implies that the entries are stat dirty. And indeed, if I run:
>
>   git -C linux update-index --refresh
>
> now they both take ~20ms.

Ah, TIL about --refresh. I suppose it could be nice if "git diff"
updated the index in this way, but that sounds like a band-aid. Maybe
creating a fresh worktree should do the equivalent to make sure it's
considered "fresh"?

(This also dropped my timings down to normal.)

At $DAYJOB, I _think_ some version of "git restore <stuff>" ended up
also updating the index.

> I wonder if it's just a racy-git problem? Many files are written in the
> same second as the index, so they end up with the same mtimes, and we
> have to err on the side of checking the contents.
>
> See Documentation/technical/racy-git.adoc for a larger discussion.
>
> So it is not really about worktrees at all, but just "bad luck" in
> generating that initial index (that goes away next time you actually
> make an index update that rewrites the whole thing).

Ah, that makes sense! I'm familiar with the raciness but didn't expect it here.

> I'd have thought USE_NSEC was the default these days, but looks like it
> isn't? Try building with that and I'll bet it goes away entirely.

Thanks, I'll take a look.

I can see on my Macbook that at least Meson does automatically set
either USE_ST_TIMESPEC or NO_NSEC automatically, but has no option to
enabled USE_NSEC and try that. I can probably write that patch (which
I'll do to test), and I can send it along with the "worktree add
should refresh the index" if you think that's an appropriate thing to
do.

> > PS I almost CC'd Peff and Patrick, whose names stood out in "git
> > shortlog builtin/{worktree,diff}* object-file* | sort -t\( -k2 -g",
> > but decided they'd be their own best judge of whether they can
> > understand what's going on? :)
>
> You might be interested in "git shortlog -ns". :)
>
> -Peff

Phew! Yeah, that's much nicer. Thanks! (When typing this out for
email, I even left out the "grep '^[^[:space:]]' |" filter :P)

-- 
D. Ben Knoble

^ permalink raw reply

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Pablo Sabater @ 2026-06-09 17:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, git, cat, ps, kaartic.sivaraam, ben.knoble
In-Reply-To: <xmqq4ijbsn2m.fsf@gitster.g>

El mar, 9 jun 2026 a las 18:20, Junio C Hamano (<gitster@pobox.com>) escribió:
>
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
> > Hi Pablo
> >
> > On 09/06/2026 11:42, Pablo Sabater wrote:
> >>   static int commit_tree_ext(struct repository *repo,
> >> @@ -135,6 +136,13 @@ static int commit_tree_ext(struct repository *repo,
> >>                                        original_body, action, &commit_message);
> >>              if (ret < 0)
> >>                      goto out;
> >> +
> >> +            if (flags & COMMIT_TREE_ABORT_ON_SAME_MESSAGE &&
> >> +                !strcmp(original_body, commit_message.buf)) {
> >> +                    fprintf(stderr, _("Message unchanged, aborting reword.\n"));
> >> +                    ret = 1;
> >> +                    goto out;
> >> +            }
> >
> > I wonder if we should check that the committer identity is unchanged as
> > well in case anyone is using this to fix commits after committing with
> > the wrong identity.

I think that if you reword a commit committed by someone else but end
up with no changes I want it to be kept as it was.

> >
> > Aborting when the message and committer identity are unchanged seems
> > like a good idea.
>
> I am not sure why it would be a good idea.  The user wanted to make
> the commit have this message, and the commit ended up having the
> same message as the user gave.  That message may have been identical
> to what the commit originally had, or it may be different.  Why is
> the former an abort-worthy event?  A simple note, I may understand,
> but aborting with an error message?

With what you said at [1], having this in an
"--avoid-unnecessary-rewrite" I think that the abort might be too much
as with the flag the user already expects this to happen and silent
might be better.

By the way, I feel that "--avoid-unnecessary-rewrite" is too long,
could it be something shorter? If not it could be set "-r" as the
short and leave the long as it is.

>
> Thanks.

[1]: https://lore.kernel.org/git/xmqqtsrbsvcm.fsf@gitster.g/

Thanks,
Pablo

^ permalink raw reply

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Junio C Hamano @ 2026-06-09 16:20 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Pablo Sabater, git, cat, ps, kaartic.sivaraam, ben.knoble
In-Reply-To: <54bd36e9-3d21-4f83-86d6-2882a14779de@gmail.com>

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Pablo
>
> On 09/06/2026 11:42, Pablo Sabater wrote:
>>   static int commit_tree_ext(struct repository *repo,
>> @@ -135,6 +136,13 @@ static int commit_tree_ext(struct repository *repo,
>>   					  original_body, action, &commit_message);
>>   		if (ret < 0)
>>   			goto out;
>> +
>> +		if (flags & COMMIT_TREE_ABORT_ON_SAME_MESSAGE &&
>> +		    !strcmp(original_body, commit_message.buf)) {
>> +			fprintf(stderr, _("Message unchanged, aborting reword.\n"));
>> +			ret = 1;
>> +			goto out;
>> +		}
>
> I wonder if we should check that the committer identity is unchanged as 
> well in case anyone is using this to fix commits after committing with 
> the wrong identity.
>
> Aborting when the message and committer identity are unchanged seems 
> like a good idea.

I am not sure why it would be a good idea.  The user wanted to make
the commit have this message, and the commit ended up having the
same message as the user gave.  That message may have been identical
to what the commit originally had, or it may be different.  Why is
the former an abort-worthy event?  A simple note, I may understand,
but aborting with an error message?

Thanks.

^ permalink raw reply

* Re: [PATCH RFC 1/2] builtin/history: abort reword on unchanged message
From: Pablo Sabater @ 2026-06-09 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Kaartic Sivaraam
In-Reply-To: <xmqqtsrbsvcm.fsf@gitster.g>

El mar, 9 jun 2026 a las 15:21, Junio C Hamano (<gitster@pobox.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > True, after reading it, history being more costly or the in memory are
> > not good args.
>
> And no argument, including that history is new, is a good excuse to
> make these three things inconsistent, period.
>
> One of the patches in your updated iteration claims
>
>     When using `git history reword <commit>` if the new message is the same
>     as the original, it continues and rewrites the history when nothing
>     changed.
>
>     `git commit --amend` and `git rebase -i` with reword share this behavior
>     and it is wrong as well, but changing them breaks what people are used
>     to. Take the opportunity of `git history` being a new command and handle
>     it correctly from the start.
>
> and I think this is a totally wrong attitude to go about this.
>
> I may have said that it may have been a better default to try hard
> to avoid making a change that is a no-op, other than that it changes
> committer timestamp, while making the current "always create a new
> commit object" behaviour optionally available, for these three
> commands, and cited that the behaviour of 'pick' in 'rebase -i' that
> avoids unnecessary rewrite as an example of a good practice.
>
> But I do not think the existing behaviour to always rewrite is
> *wrong* at all.  It may be wrong not to offer the other choice of
> pretending no content change means no commit object change, but that
> is a different story.
>
> I also do not think *aborting* only when the message happens to be
> the same is a valid mode of operation at all.
>
> The most sensible first step, I think, is to add a new command line
> option to "git history" (which will gain more history editing
> subcommands) that tells the command to leave the original history
> as-is when the only change rewriting commits would make would be to
> the committer ident or timestamp information.  If in a future a new
> replace-tree subcommand is added, e.g. if
>
>     $ git history replace-tree HEAD~20 HEAD~27^{tree}
>
> were a command to rewrite the history in such a way that 20th direct
> ancestor of the current HEAD had a tree object HEAD~27^{tree}, by
> derfault the command _should_ rewrite HEAD~10 and everything that
> has it as an ancestor.  With the "--avoid-unnecsssary-rewrite"
> optimization feature on, however, it may silently become a no-op
> when HEAD~27^{tree} happened to be the same tree as HEAD~20^{tree}
> so the only difference between rewritten and original HEAD~20 would
> be when that commit object was created and by whom.
>
> And give the same option to "rebase -i" or "commit --amend".  We can
> discuss, educate the users, and flip the default at a major version
> boundary, if the "avoid unnecessary rewrite" truly turns out to be a
> better default (right now it is merely our speculation, and we do
> not even know if the current behaviour is a worse default).

Hi Junio,

Sorry about how I expressed myself. I didn't mean by wrong to be bad
or anything similar, I just noticed this when testing `git history
reword` and thought that I would like it this other way.

Saying that git history is new or I would like this to be different
are not good arguments to have `git history` inconsistent with other
commands.

My idea was more of a defensive thing, where you would need a
"--force-rewrite" opt to explicitly change timestamps. But I see the
point of having it in an `--avoid-unnecessary-rewrite` so without
options it has the same behavior as other commands.

I'll try to express myself better in the next version and go with the
opt direction.

Sorry again,
Pablo

^ permalink raw reply

* Re: [PATCH GSoC RFC v12 12/12] cat-file: make remote-object-info allow-list dynamic
From: Chandra Pratap @ 2026-06-09 15:32 UTC (permalink / raw)
  To: Pablo Sabater
  Cc: eric.peijian, calvinwan, chriscool, git, jltobler, jonathantanmy,
	karthik.188, toon
In-Reply-To: <20260608-ps-eric-work-rebase-v12-12-5338b766e658@gmail.com>

On Mon, 8 Jun 2026 at 15:45, Pablo Sabater <pabloosabaterr@gmail.com> wrote:
>
> The static allow-list in expand_atom() is hardcoded to only allow
> "objectname" and "objectsize" for remote queries. This works because
> up to this point all servers will either support object-info with name
> and size or they do not support them at all, but we cannot expect that
> in a future different servers with different git versions to have the
> same object-info capabilities. Therefore, the allow_list needs to be
> dynamic depending on what does the server advertise.
>
> The client will now:
>
> 1. Request the protocol option that the placeholder refers to (i.e.
>    "size" when "%(objectsize)").
>
> 2. Filters the request in fetch_object_info() dropping any option that
>    the server does not advertise.
>
> 3. After the fetching, the options that haven't been dropped are the ones
>    fetched and supported by the server, these supported options are
>    mapped and remote_allowed_atoms is populated with the placeholders.
>
> 4. expand_atom() checks remote_allowed_atoms with the same behaviour as
>    the static allow_list had.
>
> Move object_info_options out of get_remote_info so the caller which has
> data can select what options will be requested instead of requesting
> always size.
> Move batch_object_write() out so there will always be an output even if
> all the placeholders are not supported by the server (returns an empty
> line).
>
> Include "type" in the object_info_options so once the server supports
> it, the clients know already how to request it.
>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
> ---
>  builtin/cat-file.c  | 85 ++++++++++++++++++++++++++++++++---------------------
>  fetch-object-info.c |  6 ++++
>  2 files changed, 58 insertions(+), 33 deletions(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 1166a046b4..055991b5af 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -341,13 +341,10 @@ struct expand_data {
>          * Flags about when an object info is being fetched from remote.
>          */
>         unsigned is_remote:1;
> -};
> -#define EXPAND_DATA_INIT  { .mode = S_IFINVALID, .type = OBJ_BAD }
>
> -static const char *remote_object_info_atoms[] = {
> -       "objectname",
> -       "objectsize",
> +       struct string_list remote_allowed_atoms;
>  };
> +#define EXPAND_DATA_INIT  { .mode = S_IFINVALID, .type = OBJ_BAD, .remote_allowed_atoms = STRING_LIST_INIT_NODUP }
>
>  static int is_atom(const char *atom, const char *s, int slen)
>  {
> @@ -359,17 +356,11 @@ static int expand_atom(struct strbuf *sb, const char *atom, int len,
>                        struct expand_data *data)
>  {
>         if (data->is_remote) {
> -               size_t i, allowed_nr = ARRAY_SIZE(remote_object_info_atoms);
> -               for (i = 0; i < allowed_nr; i++)
> -                       if (is_atom(remote_object_info_atoms[i], atom, len))
> +               size_t i;
> +               for (i = 0; i < data->remote_allowed_atoms.nr; i++)
> +                       if (is_atom(data->remote_allowed_atoms.items[i].string, atom, len))
>                                 break;
> -
> -               /*
> -                * On remote, skip unsupported atoms returning an empty sb,
> -                * honoring how for-each-ref handles known but inapplicable
> -                * atoms (e.g. %(tagger)).
> -                */
> -               if (i == allowed_nr)
> +               if (i == data->remote_allowed_atoms.nr)
>                         return 1;
>         }
>
> @@ -686,12 +677,12 @@ static int get_remote_info(struct batch_options *opt,
>                            int argc,
>                            const char **argv,
>                            struct object_info **remote_object_info,
> -                          struct oid_array *object_info_oids)
> +                          struct oid_array *object_info_oids,
> +                          struct string_list *object_info_options)
>  {
>         int retval = 0;
>         struct remote *remote = NULL;
>         struct object_id oid;
> -       struct string_list object_info_options = STRING_LIST_INIT_NODUP;
>         static struct transport *gtransport;
>
>         /*
> @@ -726,15 +717,12 @@ static int get_remote_info(struct batch_options *opt,
>         gtransport->smart_options->object_info = 1;
>         gtransport->smart_options->object_info_oids = object_info_oids;
>
> -       string_list_append(&object_info_options, "size");
> -
> -       if (object_info_options.nr > 0) {
> -               gtransport->smart_options->object_info_options = &object_info_options;
> +       if (object_info_options->nr > 0) {
> +               gtransport->smart_options->object_info_options = object_info_options;
>                 gtransport->smart_options->object_info_data = *remote_object_info;
>                 retval = transport_fetch_refs(gtransport, NULL);
>         }
>  cleanup:
> -       string_list_clear(&object_info_options, 0);
>         transport_disconnect(gtransport);
>         return retval;
>  }
> @@ -820,6 +808,21 @@ static void parse_cmd_mailmap(struct batch_options *opt UNUSED,
>                 load_mailmap();
>  }
>
> +struct protocol_placeholder_entry {
> +       const char *option;
> +       const char *atom;
> +};
> +
> +static const struct protocol_placeholder_entry remote_atom_map[] = {
> +       {"size", "objectsize"},
> +       {"type", "objecttype"},
> +       /*
> +        * Add new protocol options here. Even if the server doesn't support
> +        * them the allow_list will drop them if the server doesn't advertise
> +        * them.
> +        */
> +};
> +
>  static void parse_cmd_remote_object_info(struct batch_options *opt,
>                                          const char *line, struct strbuf *output,
>                                          struct expand_data *data)
> @@ -829,6 +832,7 @@ static void parse_cmd_remote_object_info(struct batch_options *opt,
>         char *line_to_split;
>         static struct object_info *remote_object_info;
>         static struct oid_array object_info_oids = OID_ARRAY_INIT;
> +       struct string_list object_info_options = STRING_LIST_INIT_NODUP;
>
>         if (strlen(line) >= MAX_REMOTE_OBJ_INFO_LINE)
>                 die(_("remote-object-info command too long"));
> @@ -841,30 +845,44 @@ static void parse_cmd_remote_object_info(struct batch_options *opt,
>                 die(_("remote-object-info supports at most %d objects"),
>                     MAX_ALLOWED_OBJ_LIMIT);
>
> +       if (data->info.sizep)
> +               string_list_append(&object_info_options, "size");
> +       if (data->info.typep)
> +               string_list_append(&object_info_options, "type");
> +
>         if (get_remote_info(opt, count, argv, &remote_object_info,
> -                           &object_info_oids))
> +                           &object_info_oids, &object_info_options))
>                 goto cleanup;
>
> +       string_list_clear(&data->remote_allowed_atoms, 0);
> +       string_list_append(&data->remote_allowed_atoms, "objectname");
> +       for (size_t i = 0; i < ARRAY_SIZE(remote_atom_map); i++)
> +               if (unsorted_string_list_has_string(&object_info_options, remote_atom_map[i].option))
> +                       string_list_append(&data->remote_allowed_atoms,
> +                                          remote_atom_map[i].atom);
> +
>         data->skip_object_info = 1;
>         for (size_t i = 0; i < object_info_oids.nr; i++) {
>                 data->oid = object_info_oids.oid[i];
> -               if (remote_object_info[i].sizep) {
> -                       /*
> -                        * When reaching here, it means remote-object-info can retrieve
> -                        * information from server without downloading them.
> -                        */
> +               /*
> +                * When reaching here, it means remote-object-info can retrieve
> +                * information from server without downloading them.
> +                */
> +               if (remote_object_info[i].sizep)
>                         data->size = *remote_object_info[i].sizep;
> -                       opt->batch_mode = BATCH_MODE_INFO;
> -                       data->is_remote = 1;
> -                       batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
> -                       data->is_remote = 0;
> -               }
> +               if (remote_object_info[i].typep)
> +                       data->type = *remote_object_info[i].typep;
> +               opt->batch_mode = BATCH_MODE_INFO;
> +               data->is_remote = 1;
> +               batch_object_write(argv[i + 1], output, opt, data, NULL, 0);
> +               data->is_remote = 0;
>         }
>         data->skip_object_info = 0;
>
>  cleanup:
>         for (size_t i = 0; i < object_info_oids.nr; i++)
>                 free_object_info_contents(&remote_object_info[i]);
> +       string_list_clear(&object_info_options, 0);
>         free(line_to_split);
>         free(argv);
>         free(remote_object_info);
> @@ -1177,6 +1195,7 @@ static int batch_objects(struct batch_options *opt)
>   cleanup:
>         strbuf_release(&input);
>         strbuf_release(&output);
> +       string_list_clear(&data.remote_allowed_atoms, 0);
>         warn_on_object_refname_ambiguity = save_warning;
>         return retval;
>  }
> diff --git a/fetch-object-info.c b/fetch-object-info.c
> index 51a898430d..425929a269 100644
> --- a/fetch-object-info.c
> +++ b/fetch-object-info.c
> @@ -39,6 +39,12 @@ int fetch_object_info(const enum protocol_version version, struct object_info_ar
>         case protocol_v2:
>                 if (!server_supports_v2("object-info"))
>                         die(_("object-info capability is not enabled on the server"));
> +
> +               for (int i = args->object_info_options->nr - 1; i >= 0; i--)

Isn't args->object_info_options->nr of type size_t? We should probably
do something
like:

for (size_t i = 0; i < args->args->object_info_options->nr; i++)

instead.

> +                       if (!server_supports_feature("object-info",
> +                                                    args->object_info_options->items[i].string, 0))
> +                               unsorted_string_list_delete_item(args->object_info_options, i, 0);
> +
>                 send_object_info_request(fd_out, args);
>                 break;
>         case protocol_v1:
>
> --
> 2.54.0

Other than these, the patch series LGTM for now.

Thanks,
Chandra.

^ permalink raw reply

* Re: [PATCH v2] describe: limit default ref iteration to tags
From: Tamir Duberstein @ 2026-06-09 14:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20260609110957.GB1509396@coredump.intra.peff.net>

On Tue, Jun 9, 2026 at 4:09 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jun 08, 2026 at 07:32:14PM -0700, Tamir Duberstein wrote:
>
> > The benchmark checkout had 120,532 refs, of which 330 were tags. With
> > `$repo` naming the checkout, `$commit` an exactly tagged commit, and
> > `$parent` and `$this` the two binaries, I ran:
> >
> >     hyperfine --warmup 3 --runs 15 \
> >         --command-name parent \
> >         '$parent -C $repo describe --exact-match $commit' \
> >         --command-name 'this commit' \
> >         '$this -C $repo describe --exact-match $commit'
> >
> > The results were:
> >
> >     Benchmark 1: parent
> >       Time (mean ± σ):     171.7 ms ±  18.5 ms    [User: 23.9 ms, System: 133.6 ms]
> >       Range (min … max):   142.3 ms … 198.3 ms    15 runs
> >
> >     Benchmark 2: this commit
> >       Time (mean ± σ):       9.9 ms ±   1.1 ms    [User: 3.3 ms, System: 4.7 ms]
> >       Range (min … max):     8.8 ms …  13.1 ms    15 runs
> >
> >     Summary
> >       this commit ran
> >        17.35 ± 2.63 times faster than parent
> >
> > Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> > efficiency cores) and 128 GB RAM.
>
> This patch looks fine to me, but let me pick a nit for a minute, because
> I think there is a broader conversation to be had.

Just to say from the start: I appreciate you taking the time to discuss this.

>
> Given the discussion in earlier rounds and sibling topics, I assume the
> commit message here was AI-generated. And it's OK in the sense that it
> is describing what happened and I assume is entirely accurate. But as a
> human reader, it feels so much more verbose than what I'd expect, as it
> is full of semi-irrelevant details. Why set --warmup and --runs? Why
> bother with --command-name, which just means you have to show the
> commands separately anyway? Is the amount of RAM in the machine
> important for this test? Surely it could be if it was absurdly tiny, but
> in general, no, I would not expect it to be.

Well, the details matter in case some human reader knows something I
don't, or wants to reproduce the findings and observes something
completely different - they aught to be able to reconstruct my
environment.

The command-name flag is needed; without it the output of the
hyperfine would include local paths and would require post-processing
to include in the message.

>
> So while it is perhaps reasonable to document every detail in case
> somebody later wants to verify or reproduce timings, it is a little
> overwhelming when trying to tell a story, the core of which is:
>
>   In a repo with ~120k refs, ~300 of which were tags, running:
>
>     git describe --exact-match $some_tag
>
>   went from ~170ms to ~10ms, since we no longer needed to iterate all of
>   those other refs.
>
> That has _way_ less detail, but makes the point succinctly.

I don't disagree. Ultimately, it is a matter of maintainer preference,
and I'm happy to follow (and instruct the AI to follow) the
preferences described in this thread.

>
> I dunno. I am not trying to pick apart your commit in particular, but am
> more interested in the broader use of AI commit messages going forward.
> This kind of verbosity is quite common in the output (from my limited
> experience), and I think creates more work for reviewers. Should we be
> expecting contributors to make things more concise before submitting
> (either manually or through prompting)? Or do people even agree that the
> shorter version is preferable? I could be the only one.

The AI does what you tell it; in this case I was telling it to follow
the precedent in the repo and to ensure its claims are always cited.
I'll tune it for succinct output going forward.

>
> I have a few other comments on the patch itself below.
>
> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index 1c47d7c0b7..3532c8ff22 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -740,6 +740,9 @@ int cmd_describe(int argc,
> >               return ret;
> >       }
> >
> > +     if (!all)
> > +             for_each_ref_opts.prefix = "refs/tags/";
> > +
> >       hashmap_init(&names, commit_name_neq, NULL, 0);
> >       refs_for_each_ref_ext(get_main_ref_store(the_repository),
> >                             get_name, NULL, &for_each_ref_opts);
>
> The code change looks fine. It creates a bit of a subtle dependency
> between what's happening here, and the filtering inside get_name(). But
> I think that's OK for the scope of a single command. It _might_ be
> possible to simplify the top of get_name(), since we'd no longer see
> non-tag refs in the input. But it also may not, since we have to strip
> out the prefix anyway. It can certainly come on top as a cleanup later
> if we want.
>
> > diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
>
> It is a little curious that we add a perf test here, but the commit
> message does not even show it off. ;)
>
> I ran it myself here and had trouble showing improvement, simply because
> it is already quite fast! I guess that's because I'm on Linux, where
> warm-cache filesystem operations are pretty fast. Bumping $ref_count by
> a factor of 10 made the "before" case 30ms, and after is still sub-1ms.
>
> > +test_expect_success 'set up many unrelated refs' '
> > +     ref_count=10000 &&
> > +     git tag -m tip tip HEAD &&
> > +     for i in $(test_seq $ref_count)
> > +     do
> > +             printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> > +             return 1
> > +     done >instructions &&
> > +     git update-ref --stdin <instructions
> > +'
>
> A few things come to mind on reading this.
>
> I have mixed feelings on sticking synthetic constructions in the t/perf
> suite. Part of the original point was that we'd run it against real
> repos to see how they perform. But that implies that people running it
> have some clue about which tests may be interesting on which repos,
> which is hopeful at best. So we've turned to this kind of synthetic
> construction at times (and this is certainly not the first). It's
> probably a reasonable tactic here.
>
> I suspect the resulting state is not all that realistic, though. If you
> have 10,000 refs, you probably didn't make them all at once. And so in
> practice the majority of them would be packed. Sticking "git pack-refs
> --all" at the end might give more realistic numbers.
>
> Bumping to a larger number of refs shows the effect more clearly, but at
> the cost of making the setup take a long time (since we have to take a
> lockfile on each ref!). We could sneak around it by generating a
> packed-refs file directly, but now the test really would be
> backend-specific. Probably better not to go there.
>
> And finally, the loop can be written a bit more succinctly these days
> as:
>
> diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
> index ed9f1abe18..b365dc67ee 100755
> --- a/t/perf/p6100-describe.sh
> +++ b/t/perf/p6100-describe.sh
> @@ -30,12 +30,8 @@ test_perf 'describe HEAD with one tag' '
>  test_expect_success 'set up many unrelated refs' '
>         ref_count=10000 &&
>         git tag -m tip tip HEAD &&
> -       for i in $(test_seq $ref_count)
> -       do
> -               printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> -               return 1
> -       done >instructions &&
> -       git update-ref --stdin <instructions
> +       test_seq -f "create refs/heads/describe-perf/%05d HEAD" $ref_count |
> +       git update-ref --stdin
>  '
>
>  test_perf 'describe exact tag with many unrelated refs' '
>
>
> Probably not worth re-rolling on its own, though.

The suggested changes seem reasonable to me. Certainly I am happy to
make them, and re-rolls are cheap. Do let me know explicitly if you'd
like that done.

>
> -Peff

Thanks for your time! I really appreciate it.

^ permalink raw reply

* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Justin Tobler @ 2026-06-09 14:31 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <CA+rGoLdpkuigWXqNSk3bS7-uhtzCizkPx2GGtNaTyy5J1SF7Rg@mail.gmail.com>

On 26/06/09 10:11AM, K Jayatheerth wrote:
> > > +     struct strbuf sb = STRBUF_INIT;
> > > +     enum path_format fmt = (arg_path_format != -1) ? arg_path_format : def_format;
> >
> > hmmm, so `arg_path_format` specifies what the user-provided format and
> > acts as a sentinel to signal there is no value provided and the fallback
> > format needs to be used. This feels a tad bit awkward to me.
> >
> > I wonder if we should introduce a PATH_FORMAT_DEFAULT to the
> > `path_format` enum that maps to one of the existing enum values in
> > `path.c:format_path()`. Here in `print_path()`, we could then intercept
> > a PATH_FORMAT_DEFAULT value and override it to the specified
> > `def_format`. I'm not sure if this is ultimately that much better
> > though.
> 
> You're right that the -1 is awkward
> it forces arg_path_format to be an int rather than the enum type
> itself, which loses type safety.
> 
> PATH_FORMAT_DEFAULT is cleaner in that regard, but it pushes the "what
> does default mean?" question into format_path()
> which currently has no notion of a fallback.
> Since the fallback is call-site specific (each path type in rev-parse
> has its own default),
> I'd rather keep that logic in print_path() where the context lives.
> 
> A middle ground would be adding PATH_FORMAT_DEFAULT to the enum but
> not handling it in format_path().
> 
> ---
> enum path_format_type format = PATH_FORMAT_DEFAULT;
> 
> /* ... */
> 
> static void print_path(const char *path, const char *prefix,
>                        enum path_format_type format,
>                        enum path_format_type def_format)
> {
>         struct strbuf sb = STRBUF_INIT;
>         enum path_format_type fmt =
>             (format == PATH_FORMAT_DEFAULT) ? def_format : format;
> 
>         format_path(&sb, path, prefix, fmt);
>         puts(sb.buf);
>         strbuf_release(&sb);
> }
> ---

Intercepting PATH_FORMAT_DEFAULT in print_path() and overriding it to
the appropriate default needed for the specific path printed by
git-rev-parse(1), as shown above, seems reasonable to me.

But I do think that PATH_FORMAT_DEFAULT should have an actual default in
format_path(). Otherwise we would have an enum value that requires
callers to explicitly handle prior to invoking format_path() which would
also be rather awkward. IMO, it probably wouldn't be a big deal to just
say PATH_FORMAT_DEFAULT is treated as PATH_FORMAT_UNMODIFIED when passed
to format_path() and document it. In practice, our rev-parse use-case
would always replace PATH_FORMAT_DEFAULT with the appropriate value
prior to invoking format_path().

-Justin

^ permalink raw reply

* [PATCH v3] transport-helper: fix TSAN race in transfer_debug()
From: Pushkar Singh @ 2026-06-09 13:47 UTC (permalink / raw)
  To: pushkarkumarsingh1970; +Cc: git, gitster, peff
In-Reply-To: <20260604132327.277693-3-pushkarkumarsingh1970@gmail.com>

Currently, transfer_debug() lazily initializes a static variable based
on GIT_TRANSLOOP_DEBUG. Since the function may be called from multiple
worker threads, this initialization is racy and is therefore suppressed
in .tsan-suppressions.

Initialize the variable in bidirectional_transfer_loop() before any
worker threads or processes are created. This patch removes the race and
allows dropping the corresponding TSAN suppression.

Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com>
---
Changes since v2:
- Treat an uninitialized transfer_debug_enabled as a BUG()
  instead of silently treating it as disabled.
- Follow Jeff King's suggestion to distinguish an
  uninitialized state from a disabled state.

 .tsan-suppressions |  1 -
 transport-helper.c | 19 ++++++++-----------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/.tsan-suppressions b/.tsan-suppressions
index 5ba86d6845..d84883bd90 100644
--- a/.tsan-suppressions
+++ b/.tsan-suppressions
@@ -7,7 +7,6 @@
 # A static variable is written to racily, but we always write the same value, so
 # in practice it (hopefully!) doesn't matter.
 race:^want_color$
-race:^transfer_debug$
 
 # A boolean value, which tells whether the replace_map has been initialized or
 # not, is read racily with an update. As this variable is written to only once,
diff --git a/transport-helper.c b/transport-helper.c
index 04d55572a9..5b639bff3d 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1361,24 +1361,18 @@ int transport_helper_init(struct transport *transport, const char *name)
 /* This should be enough to hold debugging message. */
 #define PBUFFERSIZE 8192
 
+static int transfer_debug_enabled = -1;
+
 /* Print bidirectional transfer loop debug message. */
 __attribute__((format (printf, 1, 2)))
 static void transfer_debug(const char *fmt, ...)
 {
-	/*
-	 * NEEDSWORK: This function is sometimes used from multiple threads, and
-	 * we end up using debug_enabled racily. That "should not matter" since
-	 * we always write the same value, but it's still wrong. This function
-	 * is listed in .tsan-suppressions for the time being.
-	 */
-
 	va_list args;
 	char msgbuf[PBUFFERSIZE];
-	static int debug_enabled = -1;
 
-	if (debug_enabled < 0)
-		debug_enabled = getenv("GIT_TRANSLOOP_DEBUG") ? 1 : 0;
-	if (!debug_enabled)
+	if (transfer_debug_enabled < 0)
+		BUG("somebody forgot to check GIT_TRANSLOOP_DEBUG!");
+	if (!transfer_debug_enabled)
 		return;
 
 	va_start(args, fmt);
@@ -1648,6 +1642,9 @@ int bidirectional_transfer_loop(int input, int output)
 {
 	struct bidirectional_transfer_state state;
 
+	if (transfer_debug_enabled < 0)
+		transfer_debug_enabled = getenv("GIT_TRANSLOOP_DEBUG") ? 1 : 0;
+
 	/* Fill the state fields. */
 	state.ptg.src = input;
 	state.ptg.dest = 1;
-- 
2.53.0.582.gca1db8a0f7


^ permalink raw reply related

* Re: [PATCH v1 3/4] environment: move 'trust_executable_bit' into repo_config_values
From: Junio C Hamano @ 2026-06-09 13:45 UTC (permalink / raw)
  To: Tian Yuchen
  Cc: git, christian.couder, ps, Ayush Chandekar, Olamide Caleb Bello
In-Reply-To: <8083b217-4a56-48ee-b34d-b4596d45e382@malon.dev>

Tian Yuchen <cat@malon.dev> writes:

> On 6/1/26 18:10, Tian Yuchen wrote:
>
>> That’s true: I had actually planned to start migrating has_symlinks as 
>> soon as this series was approved. Since you think it would be better to 
>> merge them into a single series, I’ll go ahead and do that ;)
>> 
>
> I’ve found that migrating has_symlinks seems to be quite a tricky 
> business. Some callers in certain files pass very few parameters, and 
> the call stack is quite deep, if I am correct. so I feel that adding a 
> repo for this purpose might be overkill. Perhaps it would be better to 
> focus on trust_executable_bit for now?
>
> Regards, yuchen

Sounds fine.  Thanks for digging.

^ permalink raw reply

* Re: [PATCH v2] describe: limit default ref iteration to tags
From: D. Ben Knoble @ 2026-06-09 13:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Tamir Duberstein, git, Junio C Hamano, Patrick Steinhardt
In-Reply-To: <20260609110957.GB1509396@coredump.intra.peff.net>

On Tue, Jun 9, 2026 at 7:10 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jun 08, 2026 at 07:32:14PM -0700, Tamir Duberstein wrote:
>
> > The benchmark checkout had 120,532 refs, of which 330 were tags. With
> > `$repo` naming the checkout, `$commit` an exactly tagged commit, and
> > `$parent` and `$this` the two binaries, I ran:
> >
> >     hyperfine --warmup 3 --runs 15 \
> >         --command-name parent \
> >         '$parent -C $repo describe --exact-match $commit' \
> >         --command-name 'this commit' \
> >         '$this -C $repo describe --exact-match $commit'
> >
> > The results were:
> >
> >     Benchmark 1: parent
> >       Time (mean ± σ):     171.7 ms ±  18.5 ms    [User: 23.9 ms, System: 133.6 ms]
> >       Range (min … max):   142.3 ms … 198.3 ms    15 runs
> >
> >     Benchmark 2: this commit
> >       Time (mean ± σ):       9.9 ms ±   1.1 ms    [User: 3.3 ms, System: 4.7 ms]
> >       Range (min … max):     8.8 ms …  13.1 ms    15 runs
> >
> >     Summary
> >       this commit ran
> >        17.35 ± 2.63 times faster than parent
> >
> > Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> > efficiency cores) and 128 GB RAM.
>
> This patch looks fine to me, but let me pick a nit for a minute, because
> I think there is a broader conversation to be had.
>
> Given the discussion in earlier rounds and sibling topics, I assume the
> commit message here was AI-generated. And it's OK in the sense that it
> is describing what happened and I assume is entirely accurate. But as a
> human reader, it feels so much more verbose than what I'd expect, as it
> is full of semi-irrelevant details. Why set --warmup and --runs? Why
> bother with --command-name, which just means you have to show the
> commands separately anyway? Is the amount of RAM in the machine
> important for this test? Surely it could be if it was absurdly tiny, but
> in general, no, I would not expect it to be.

[You probably know this] It is common in academic papers to report
benchmarks with details about the hardware and how they were run to
contextualize the results and help with reproducibility.

Of course, Git's commits do not form an academic paper… so I have no
real opinion on what to see here. But I've seen a few other mails
where having perf test outputs or similar was suggested (maybe that
was to be reserved for the cover letter? idk).

_If_ we show all the hyperfine details, I think it's reasonable to use
--command-name to make distinguishing the versions easy, unless it's
obvious from the path/to/git in each benchmark (which I think I've
seen from Peff's benchmark reports before?).

Someone with better lore skills can probably dig up a few exemplars of
how to write about performance in a commit message?

-- 
D. Ben Knoble

^ permalink raw reply

* Re: [PATCH v3 0/3] Documentation: recommend the use of b4
From: Junio C Hamano @ 2026-06-09 13:30 UTC (permalink / raw)
  To: Toon Claes
  Cc: Patrick Steinhardt, git, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk
In-Reply-To: <87a4t32a4g.fsf@emacs.iotcl.com>

Toon Claes <toon@iotcl.com> writes:

> Anyhow, I don't think it's worth it to keep bike shedding about this. In
> all methods we recommend to Cc people, I think that's more important
> then

Good point to stress about whom to involve.

> caring about how messages are threaded (for example, I've noticed
> LKML doesn't thread at all, i.e. `b4.send-same-thread=no` which is the
> default).

As long as it does not hurt automation, I do not care too much, but
that default is somewhat surprising to me.  The setting actively
discourages tools from finding previous iterations.

> Bottom line, for me this series is good to go in.

Yeah, sounds good to me.

^ permalink raw reply

* Re: [PATCH RFC v2 2/2] builtin/history: abort reword on same message
From: Phillip Wood @ 2026-06-09 13:25 UTC (permalink / raw)
  To: Pablo Sabater, git; +Cc: cat, ps, kaartic.sivaraam, ben.knoble, gitster
In-Reply-To: <20260609-ps-history-reword-v2-2-a0e6028ca9b4@gmail.com>

Hi Pablo

On 09/06/2026 11:42, Pablo Sabater wrote:
>   static int commit_tree_ext(struct repository *repo,
> @@ -135,6 +136,13 @@ static int commit_tree_ext(struct repository *repo,
>   					  original_body, action, &commit_message);
>   		if (ret < 0)
>   			goto out;
> +
> +		if (flags & COMMIT_TREE_ABORT_ON_SAME_MESSAGE &&
> +		    !strcmp(original_body, commit_message.buf)) {
> +			fprintf(stderr, _("Message unchanged, aborting reword.\n"));
> +			ret = 1;
> +			goto out;
> +		}

I wonder if we should check that the committer identity is unchanged as 
well in case anyone is using this to fix commits after committing with 
the wrong identity.

Aborting when the message and committer identity are unchanged seems 
like a good idea.

Thanks

Phillip

>   	} else {
>   		strbuf_addstr(&commit_message, original_body);
>   	}
> @@ -693,7 +701,8 @@ static int cmd_history_reword(int argc,
>   	struct strbuf reflog_msg = STRBUF_INIT;
>   	struct commit *original, *rewritten;
>   	struct rev_info revs = { 0 };
> -	enum commit_tree_flags flags = COMMIT_TREE_EDIT_MESSAGE;
> +	enum commit_tree_flags flags = COMMIT_TREE_EDIT_MESSAGE |
> +				       COMMIT_TREE_ABORT_ON_SAME_MESSAGE;
>   	int ret;
>   
>   	argc = parse_options(argc, argv, prefix, options, usage, 0);
> @@ -721,6 +730,9 @@ static int cmd_history_reword(int argc,
>   	if (ret < 0) {
>   		ret = error(_("failed writing reworded commit"));
>   		goto out;
> +	} else if (ret == 1) {
> +		ret = 0;
> +		goto out;
>   	}
>   
>   	strbuf_addf(&reflog_msg, "reword: updating %s", argv[0]);
> diff --git a/t/t3451-history-reword.sh b/t/t3451-history-reword.sh
> index de7b357685..6e0e278c42 100755
> --- a/t/t3451-history-reword.sh
> +++ b/t/t3451-history-reword.sh
> @@ -396,4 +396,20 @@ test_expect_success 'retains changes in the worktree and index' '
>   	)
>   '
>   
> +test_expect_success 'aborts if the commit message is the same' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit first &&
> +		test_commit second &&
> +
> +		git rev-parse HEAD >oid-before &&
> +		GIT_EDITOR=true git history reword HEAD 2>err &&
> +		git rev-parse HEAD >oid-after &&
> +		test_cmp oid-before oid-after &&
> +		test_grep "Message unchanged" err
> +	)
> +'
> +
>   test_done
> diff --git a/t/t3453-history-fixup.sh b/t/t3453-history-fixup.sh
> index 868298e248..9f9a3c93de 100755
> --- a/t/t3453-history-fixup.sh
> +++ b/t/t3453-history-fixup.sh
> @@ -443,6 +443,28 @@ test_expect_success '--reedit-message opens editor for the commit message' '
>   	)
>   '
>   
> +test_expect_success 'fixup --reedit-message does not abort with the same commit message' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +		echo content > file.txt &&
> +		git add file.txt &&
> +		git commit -m "add file" &&
> +
> +		echo fix >>file.txt &&
> +		git add file.txt &&
> +		GIT_EDITOR=true git history fixup --reedit-message HEAD &&
> +		expect_changes --branches <<-\EOF
> +		add file
> +		2	0	file.txt
> +		initial
> +		1	0	initial.t
> +		EOF
> +	)
> +'
> +
>   test_expect_success 'retains unstaged working tree changes after fixup' '
>   	test_when_finished "rm -rf repo" &&
>   	git init repo &&
> 


^ permalink raw reply

* Re: [PATCH v2] describe: limit default ref iteration to tags
From: Junio C Hamano @ 2026-06-09 13:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Tamir Duberstein, git, Patrick Steinhardt
In-Reply-To: <20260609110957.GB1509396@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> So while it is perhaps reasonable to document every detail in case
> somebody later wants to verify or reproduce timings, it is a little
> overwhelming when trying to tell a story, the core of which is:
>
>   In a repo with ~120k refs, ~300 of which were tags, running:
>
>     git describe --exact-match $some_tag
>
>   went from ~170ms to ~10ms, since we no longer needed to iterate all of
>   those other refs.
>
> That has _way_ less detail, but makes the point succinctly.
>
> I dunno. I am not trying to pick apart your commit in particular, but am
> more interested in the broader use of AI commit messages going forward.
> This kind of verbosity is quite common in the output (from my limited
> experience), and I think creates more work for reviewers. Should we be
> expecting contributors to make things more concise before submitting
> (either manually or through prompting)? Or do people even agree that the
> shorter version is preferable? I could be the only one.

Count me in.  You are the one who often gives us a patch with 60
lines that explains a single line change, but I haven't found these
60 lines are _overly verbose_ in the same way as AI generated log
messages.

^ permalink raw reply

* Re: [PATCH v14 2/6] branch: let delete_branches warn instead of error on bulk refusal
From: Phillip Wood @ 2026-06-09 13:21 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget, git
  Cc: Kristoffer Haugsbakk, Johannes Sixt, Harald Nordgren
In-Reply-To: <7ef9502e01055fd5442550cf34d491fd21a9b971.1780999917.git.gitgitgadget@gmail.com>

Hi Harald

On 09/06/2026 11:11, Harald Nordgren via GitGitGadget wrote:
> From: Harald Nordgren <haraldnordgren@gmail.com>
> 
> Add a warn-only mode to delete_branches() and check_branch_commit()
> so a bulk caller can report branches that are not fully merged as a
> short warning and carry on, rather than erroring with the longer
> "use 'git branch -D'" advice that the plain "git branch -d" path
> emits. Existing callers are unaffected.

There is no mention here of the conversion to use a flags argument which 
should be a separate preparatory commit
  > @@ -189,20 +189,33 @@ static int branch_merged(int kind, const char 
*name,
>   	return merged;
>   }
>   
> +enum delete_branch_flags {
> +	DELETE_BRANCH_FORCE = (1 << 0),
> +	DELETE_BRANCH_QUIET = (1 << 1),
> +	DELETE_BRANCH_WARN_ONLY = (1 << 2),
> +};
> +
>   static int check_branch_commit(const char *branchname, const char *refname,
>   			       const struct object_id *oid, struct commit *head_rev,
> -			       int kinds, int force)
> +			       int kinds, unsigned int flags)
>   {
> +	int force = flags & DELETE_BRANCH_FORCE;

This is missing "!!" to keep the value the same (alternatively we could 
perhaps convert "force" to a bool, though I haven't looked too closely 
at how it is used in the rest of the function). Apart from that this 
good, unlike the conversion below it means the rest of the function sees 
the same variables in the same state as it did before the conversion. It 
would be a good idea to follow this pattern for the new flag.

	bool warn = flags & DELETE_BRANCH_WARN_ONLY;

and then just use "warn" later. It is a common pattern in our code to 
take a flags argument and split it out into various boolean variables at 
the start of the function to avoid a lot of awkward bit masks in the 
main body of the function.

> @@ -217,8 +230,8 @@ static void delete_branch_config(const char *branchname)
>   	strbuf_release(&buf);
>   }
>   
> -static int delete_branches(int argc, const char **argv, int force, int kinds,
> -			   int quiet)
> +static int delete_branches(int argc, const char **argv, int kinds,
> +			   unsigned int flags)
>   {
>   	struct commit *head_rev = NULL;
>   	struct object_id oid;
> @@ -227,6 +240,7 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   	int i;
>   	int ret = 0;
>   	int remote_branch = 0;
> +	int force, quiet;

We should initialize these here using flags so that the rest of the code 
sees the same variables and values as it did before the conversion.

>   	struct strbuf bname = STRBUF_INIT;
>   	enum interpret_branch_kind allowed_interpret;
>   	struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> @@ -241,7 +255,7 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   		remote_branch = 1;
>   		allowed_interpret = INTERPRET_BRANCH_REMOTE;
>   
> -		force = 1;
> +		flags |= DELETE_BRANCH_FORCE;
>   		break;
>   	case FILTER_REFS_BRANCHES:
>   		fmt = "refs/heads/%s";
> @@ -252,12 +266,15 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   	}
>   	branch_name_pos = strcspn(fmt, "%");
>   
> +	force = flags & DELETE_BRANCH_FORCE;
> +	quiet = flags & DELETE_BRANCH_QUIET;
> +
>   	if (!force)
>   		head_rev = lookup_commit_reference(the_repository, &head_oid);
>   
>   	for (i = 0; i < argc; i++, strbuf_reset(&bname)) {
>   		char *target = NULL;
> -		int flags = 0;
> +		int ref_flags = 0;

This is sensible so we don't shadow the new function argument but this 
added complexity is a good reason to split the flags change into its own 
commit before adding the warning flag.

I'll take a look at the other patches later this week - there is no need 
to send a new version before I've commented on the rest of the series.

Thanks

Phillip

>   
>   		copy_branchname(&bname, argv[i], allowed_interpret);
>   		free(name);
> @@ -279,7 +296,7 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   					     RESOLVE_REF_READING
>   					     | RESOLVE_REF_NO_RECURSE
>   					     | RESOLVE_REF_ALLOW_BAD_NAME,
> -					     &oid, &flags);
> +					     &oid, &ref_flags);
>   		if (!target) {
>   			if (remote_branch) {
>   				error(_("remote-tracking branch '%s' not found"), bname.buf);
> @@ -291,7 +308,7 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   									   | RESOLVE_REF_NO_RECURSE
>   									   | RESOLVE_REF_ALLOW_BAD_NAME,
>   									   &oid,
> -									   &flags);
> +									   &ref_flags);
>   				FREE_AND_NULL(virtual_name);
>   
>   				if (virtual_target)
> @@ -306,16 +323,17 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>   			continue;
>   		}
>   
> -		if (!(flags & (REF_ISSYMREF|REF_ISBROKEN)) &&
> +		if (!(ref_flags & (REF_ISSYMREF|REF_ISBROKEN)) &&
>   		    check_branch_commit(bname.buf, name, &oid, head_rev, kinds,
> -					force)) {
> -			ret = 1;
> +					flags)) {
> +			if (!(flags & DELETE_BRANCH_WARN_ONLY))
> +				ret = 1;
>   			goto next;
>   		}
>   
>   		item = string_list_append(&refs_to_delete, name);
> -		item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
> -				    : (flags & REF_ISSYMREF) ? target
> +		item->util = xstrdup((ref_flags & REF_ISBROKEN) ? "broken"
> +				    : (ref_flags & REF_ISSYMREF) ? target
>   				    : repo_find_unique_abbrev(the_repository, &oid, DEFAULT_ABBREV));
>   
>   	next:
> @@ -872,7 +890,9 @@ int cmd_branch(int argc,
>   	if (delete) {
>   		if (!argc)
>   			die(_("branch name required"));
> -		ret = delete_branches(argc, argv, delete > 1, filter.kind, quiet);
> +		ret = delete_branches(argc, argv, filter.kind,
> +				      (delete > 1 ? DELETE_BRANCH_FORCE : 0) |
> +				      (quiet ? DELETE_BRANCH_QUIET : 0));
>   		goto out;
>   	} else if (show_current) {
>   		print_current_branch_name();


^ permalink raw reply

* Re: [PATCH RFC 1/2] builtin/history: abort reword on unchanged message
From: Junio C Hamano @ 2026-06-09 13:21 UTC (permalink / raw)
  To: Pablo Sabater; +Cc: git, Patrick Steinhardt, Kaartic Sivaraam
In-Reply-To: <CAN5EUNRW3gyLKGC7x5BBMTNKtunoQks9AaXJse4PHvCziRF87A@mail.gmail.com>

Pablo Sabater <pabloosabaterr@gmail.com> writes:

> True, after reading it, history being more costly or the in memory are
> not good args.

And no argument, including that history is new, is a good excuse to
make these three things inconsistent, period.

One of the patches in your updated iteration claims

    When using `git history reword <commit>` if the new message is the same
    as the original, it continues and rewrites the history when nothing
    changed.

    `git commit --amend` and `git rebase -i` with reword share this behavior
    and it is wrong as well, but changing them breaks what people are used
    to. Take the opportunity of `git history` being a new command and handle
    it correctly from the start.

and I think this is a totally wrong attitude to go about this.

I may have said that it may have been a better default to try hard
to avoid making a change that is a no-op, other than that it changes
committer timestamp, while making the current "always create a new
commit object" behaviour optionally available, for these three
commands, and cited that the behaviour of 'pick' in 'rebase -i' that
avoids unnecessary rewrite as an example of a good practice.

But I do not think the existing behaviour to always rewrite is
*wrong* at all.  It may be wrong not to offer the other choice of
pretending no content change means no commit object change, but that
is a different story.

I also do not think *aborting* only when the message happens to be
the same is a valid mode of operation at all.

The most sensible first step, I think, is to add a new command line
option to "git history" (which will gain more history editing
subcommands) that tells the command to leave the original history
as-is when the only change rewriting commits would make would be to
the committer ident or timestamp information.  If in a future a new
replace-tree subcommand is added, e.g. if

    $ git history replace-tree HEAD~20 HEAD~27^{tree}

were a command to rewrite the history in such a way that 20th direct
ancestor of the current HEAD had a tree object HEAD~27^{tree}, by
derfault the command _should_ rewrite HEAD~10 and everything that
has it as an ancestor.  With the "--avoid-unnecsssary-rewrite"
optimization feature on, however, it may silently become a no-op
when HEAD~27^{tree} happened to be the same tree as HEAD~20^{tree}
so the only difference between rewritten and original HEAD~20 would
be when that commit object was created and by whom.

And give the same option to "rebase -i" or "commit --amend".  We can
discuss, educate the users, and flip the default at a major version
boundary, if the "avoid unnecessary rewrite" truly turns out to be a
better default (right now it is merely our speculation, and we do
not even know if the current behaviour is a worse default).

^ permalink raw reply

* Re: [PATCH v13 2/6] branch: let delete_branches warn instead of error on bulk refusal
From: Harald Nordgren @ 2026-06-09 13:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Harald Nordgren via GitGitGadget, git, Kristoffer Haugsbakk,
	Johannes Sixt, Phillip Wood
In-Reply-To: <xmqqa4t3ubwj.fsf@gitster.g>

On Tue, Jun 9, 2026 at 2:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Harald Nordgren <haraldnordgren@gmail.com> writes:
>
> > The GitHub CI has been broken for some time, maybe I should have told
> > you about this earlier, but it coincided with a period where other
> > open source projects I worked on also had mass CI failures, so I
> > chalked it up to upstream issues (GitHub, Linux, etc). But it seems to
> > have not gone away.
> >
> > All of my GitHub pull requests have broken tests (see e.g. which a
> > quite minimal change: https://github.com/git/git/pull/2313). This
> > makes it harder to detect actual issues. But of course it's not an
> > excuse.
>
> FWIW, the breakage was observed in my local testing, and that is why
> I found it so disturbing.  Apparently you didn't see such breakages
> that can be detected so easily during your local testing (otherwise
> you wouldn't have pushed it out to update your GitHub pull request),
> which may mean something in the test are platform dependent?

No, it was broken on my local machine as well. I was sloppy when I
pushed out v13 and didn't run tests locally.

Usually I will push to my GitHub PR incrementally as I work on a new
version. I don’t necessarily keep the GitHub PR clean between
submitting versions. I will diff against my latest version tag to see
my own progress, and if I mess it up I can always hard reset to the
latest version tag to start over from there.

Normally, I would never push out a new version unless the GitHub CI
passes. But it’s been broken for a month.


Harald

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox