Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2] approxidate: make "today" wrap to midnight
From: Tuomas Ahola @ 2026-05-16 13:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King
In-Reply-To: <xmqqik8ncw98.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> wrote:

> Tuomas Ahola <taahol@utu.fi> writes:
> 
> > Although some commands do reject invalid approxidate expressions,
> > in other cases those are simply evaluated as the current time.
> > Oftentimes that is a perfectly good compromise to handle silly
> > requests, but it isn't without rough edges.
> > ...
> > Bind "today" to new function `date_today()` as an approxidate
> > special.  Make it return the last midnight if no specific time
> > is given; i.e. retain the old behavior of "noon today" and such.
> >
> > Document the new behavior of "git log --since=today" in
> > rev-list-options.adoc.
> >
> > Signed-off-by: Tuomas Ahola <taahol@utu.fi>
> > ---
> 
> I like this construction of argument.
> 
> How does this patch mesh with your earlier effort to make "noon" and
> "tea" more sensible?  Should we eject the "today is now" step from
> that series and instead queue this patch in its place?
> 
> Thanks.

Yes.  I have already rebased the series on top of this patch.  I will
soon(ish) post v4 without "today is now" step.

--Tuomas

^ permalink raw reply

* Re: [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
From: Derrick Stolee @ 2026-05-16 13:46 UTC (permalink / raw)
  To: Kristofer Karlsson, Jeff King; +Cc: Kristofer Karlsson via GitGitGadget, git
In-Reply-To: <CAL71e4NoKiRMGngCc-FYNX9PH5fTd6xpzMsfONefp+JwJ1-3BA@mail.gmail.com>

On 5/16/26 4:23 AM, Kristofer Karlsson wrote:
> Thanks for testing this, Jeff! You're right, the patch as posted
> regresses on your synthetic test case.
> 
> The issue is that when multiple refs point to the same commit,
> add_decoration overwrites earlier entries,
> so only one index gets stored. The marking itself is correct (the flag
> is on the shared commit object,
> so all duplicates get marked), but the j == min_generation_index check
> never fires for the minimum tip,
> so early termination breaks. The DFS walks the entire graph instead of
> stopping when all tips are found.
> 
> I have a fix for the early-termination bug (checking the flag at
> min_generation_index instead of comparing indices),
> but your suggestions about the API are well taken, I don't think the
> decoration hash is the right tool here.
> Since we only need set membership ("is this commit a tip?"), not a
> mapping, an object-flags bit or commit-slab would
> indeed be simpler and avoid the (void *)(i + 1) hack entirely.
> 
> I fixed it locally now for the linux test case and got a 4x speedup
> there too - the problem was failing the early termination.
> Some numbers when running against the linux repo on my machine:
> 
> Command          │ Baseline │     V1 (broken)     │     V2 (fixed)      │
> --no-merged HEAD │ 1.33s    │ 2.01s (1.5x slower) │ 0.31s (4.3x faster) │
> --merged HEAD    │ 1.35s    │ 1.96s (1.5x slower) │ 0.31s (4.3x faster) │
> 
> However, I'll still need to rethink the decoration map - I will come
> back with a better patch shortly.

This is indeed an interesting case (multiple decorations) that we should
make sure is covered by a test case so we don't fall into this mistake
again.

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v1 01/11] git-gui: allow specifying path '.' to the browser
From: Mark Levedahl @ 2026-05-16 13:38 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <796217c3-8998-47a8-9a46-298541708d41@kdbg.org>



On 5/15/26 11:54 AM, Johannes Sixt wrote:
> The description isn't precise, though. '.' means to list the current
> directory. The mentioned problem happens only if this is also the root
> of the working tree.
Easy to fix, will do.

Mark

^ permalink raw reply

* Re: [PATCH v2] approxidate: make "today" wrap to midnight
From: Junio C Hamano @ 2026-05-16 13:35 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: git, Jeff King
In-Reply-To: <20260516113622.23902-1-taahol@utu.fi>

Tuomas Ahola <taahol@utu.fi> writes:

> Although some commands do reject invalid approxidate expressions,
> in other cases those are simply evaluated as the current time.
> Oftentimes that is a perfectly good compromise to handle silly
> requests, but it isn't without rough edges.
> ...
> Bind "today" to new function `date_today()` as an approxidate
> special.  Make it return the last midnight if no specific time
> is given; i.e. retain the old behavior of "noon today" and such.
>
> Document the new behavior of "git log --since=today" in
> rev-list-options.adoc.
>
> Signed-off-by: Tuomas Ahola <taahol@utu.fi>
> ---

I like this construction of argument.

How does this patch mesh with your earlier effort to make "noon" and
"tea" more sensible?  Should we eject the "today is now" step from
that series and instead queue this patch in its place?

Thanks.

^ permalink raw reply

* Re: Noômen: Gifted Invite Enclosed (Github Community Partnership)
From: Sarah J @ 2026-05-16 13:29 UTC (permalink / raw)
  To: git
In-Reply-To: <019e17bf-b7d6-74cf-b622-b27affa43ca5@devthusiastcraft.com>

Hey Noômen,

We’re about to reassign the free lifetime Devthusiast membership we reserved for your GitHub profile.

If you’d like to claim it, just reply “yes.”

Otherwise we’ll pass it to another engineer in 2 days.

All the best,
Team @ Devthusiast

On Mon, May 11, 2026 3:55 PM, Sarah J <sarah.j@devthusiastcraft.com>
[sarah.j@devthusiastcraft.com]> wrote:

> Noômen, just checking if you saw my previous email.
> 
> Your GitHub profile was selected for a free lifetime membership to Devthusiast (normally $1,800/year).
> 
> If you want to join the newsletter, just reply “yes.”
> 
> Otherwise we’ll pass the spot to another engineer in 7 days.
> 
> All the best,
> Team @ Devthusiast
> On Sat, May 9, 2026 1:55 PM, Sarah J <sarah.j@devthusiastcraft.com>
> [sarah.j@devthusiastcraft.com]> wrote:
> 
> > Hey Noômen
> > 
> > Annually, we pick engineers from Github, and your GitHub profile https://github.com/bnhassin was selected this year.
> > 
> > You're officially invited to a lifetime membership to devthusiast, our email newsletter for tech founders that love to tinker. And because we selected your profile, it's completely free for you.
> > 
> > Some of what you can expect to find in our daily newsletter:
> > 
> > - Latest in AI: Latest AI news from our inside sources at OpenAI, Anthropic and Google
> > - VC Radar: The latest tech funding news, before they come out on Tech Crunch
> > - AI Wars, Model Power Rankings: Today’s leaderboard of the top AI models
> > - Tinker of the Week: One useful open-source tool that is flying under the radar
> > 
> > Please respond with “yes” to acknowledge receipt of this message, or we will have to choose a different profile. Once confirmed you will get your first newsletter edition!
> > 
> > Welcome,
> > Team @ Devthusiast
> >

^ permalink raw reply

* [PATCH v2] config: suggest the correct form when key contains "=" in set context
From: Harald Nordgren via GitGitGadget @ 2026-05-16 12:52 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2302.git.git.1778680725459.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

A user who types "git config pull.rebase=false" gets only "error:
invalid key: pull.rebase=false" with no clue what went wrong.

Emit a "did you mean ..." hint suggesting the split form.  Restrict it
to plausible-set contexts ("git config set", bare "git config <key>",
and their 2-arg forms); explicit "get"/"unset" keep the existing error.

"=" is legal inside a subsection, so only fire when "=" lands after
the last ".".  When the user supplied a separate value, use it in the
suggestion instead of the suffix after "=":

    $ git config set pull.rebase=false true
    error: invalid key: pull.rebase=false
    hint: did you mean "git config set pull.rebase true"?

Signed-off-by: Harald Nordgren <harald.nordgren@kostdoktorn.se>
---
    config: suggest the correct form when key contains "="
    
     * Hint moved from git_config_parse_key() to a new
       advise_setting_with_equals() in builtin/config.c; wired only into set
       and bare paths.
     * Only fires when = is after the last .; 2-arg forms use the user's
       value.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2302%2FHaraldNordgren%2Fconfig-hint-equals-key-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2302/HaraldNordgren/config-hint-equals-key-v2
Pull-Request: https://github.com/git/git/pull/2302

Range-diff vs v1:

 1:  56eb3ce6fd < -:  ---------- config: suggest the correct form when key contains "="
 -:  ---------- > 1:  40d9eb3e5c config: suggest the correct form when key contains "=" in set context


 builtin/config.c  | 30 ++++++++++++++++++++++++++++++
 t/t1300-config.sh | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/builtin/config.c b/builtin/config.c
index cf4ba0f7cc..f14a30e720 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -1,6 +1,7 @@
 #define USE_THE_REPOSITORY_VARIABLE
 #include "builtin.h"
 #include "abspath.h"
+#include "advice.h"
 #include "config.h"
 #include "color.h"
 #include "date.h"
@@ -210,6 +211,22 @@ static void check_argc(int argc, int min, int max)
 	exit(129);
 }
 
+static void advise_setting_with_equals(const char *key, const char *value)
+{
+	const char *last_dot = strrchr(key, '.');
+	const char *eq;
+
+	if (!last_dot)
+		return;
+	eq = strchr(last_dot + 1, '=');
+	if (!eq)
+		return;
+	if (!value)
+		value = eq + 1;
+	advise(_("did you mean \"git config set %.*s %s\"?"),
+	       (int)(eq - key), key, value);
+}
+
 static void show_config_origin(const struct config_display_options *opts,
 			       const struct key_value_info *kvi,
 			       struct strbuf *buf)
@@ -1133,6 +1150,11 @@ static int cmd_config_set(int argc, const char **argv, const char *prefix,
 
 	argc = parse_options(argc, argv, prefix, opts, builtin_config_set_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (argc == 1 && strchr(argv[0], '=')) {
+		error(_("wrong number of arguments, should be 2"));
+		advise_setting_with_equals(argv[0], NULL);
+		exit(129);
+	}
 	check_argc(argc, 2, 2);
 
 	if ((flags & CONFIG_FLAGS_FIXED_VALUE) && !value_pattern)
@@ -1160,6 +1182,8 @@ static int cmd_config_set(int argc, const char **argv, const char *prefix,
 			error(_("cannot overwrite multiple values with a single value\n"
 			"       Use --value=<pattern>, --append or --all to change %s."), argv[0]);
 	}
+	if (ret == CONFIG_INVALID_KEY)
+		advise_setting_with_equals(argv[0], argv[1]);
 
 	location_options_release(&location_opts);
 	free(comment);
@@ -1371,6 +1395,7 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix)
 	};
 	char *value = NULL, *comment = NULL;
 	int ret = 0;
+	int actions_implicit;
 	struct key_value_info default_kvi = KVI_INIT;
 
 	argc = parse_options(argc, argv, prefix, opts,
@@ -1385,6 +1410,7 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix)
 		exit(129);
 	}
 
+	actions_implicit = (actions == 0);
 	if (actions == 0)
 		switch (argc) {
 		case 1: actions = ACTION_GET; break;
@@ -1485,6 +1511,8 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix)
 		if (ret == CONFIG_NOTHING_SET)
 			error(_("cannot overwrite multiple values with a single value\n"
 			"       Use a regexp, --add or --replace-all to change %s."), argv[0]);
+		else if (ret == CONFIG_INVALID_KEY)
+			advise_setting_with_equals(argv[0], argv[1]);
 	}
 	else if (actions == ACTION_SET_ALL) {
 		check_write(&location_opts.source);
@@ -1515,6 +1543,8 @@ static int cmd_config_actions(int argc, const char **argv, const char *prefix)
 		check_argc(argc, 1, 2);
 		ret = get_value(&location_opts, &display_opts, argv[0], argv[1],
 				0, flags);
+		if (ret == CONFIG_INVALID_KEY && actions_implicit)
+			advise_setting_with_equals(argv[0], NULL);
 	}
 	else if (actions == ACTION_GET_ALL) {
 		check_argc(argc, 1, 2);
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 128971ee12..f46c081413 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -462,6 +462,53 @@ test_expect_success 'invalid key' '
 	test_must_fail git config inval.2key blabla
 '
 
+test_expect_success 'misplaced "=" in key: bare 1-arg form hints' '
+	test_must_fail git config pull.rebase=false 2>err &&
+	test_grep "invalid key: pull\\.rebase=false" err &&
+	test_grep "did you mean .git config set pull\\.rebase false." err
+'
+
+test_expect_success 'misplaced "=" in key: bare 2-arg form uses given value' '
+	test_must_fail git config pull.rebase=false true 2>err &&
+	test_grep "did you mean .git config set pull\\.rebase true." err
+'
+
+test_expect_success 'misplaced "=" in key: set subcommand uses given value' '
+	test_must_fail git config set pull.rebase=false true 2>err &&
+	test_grep "did you mean .git config set pull\\.rebase true." err
+'
+
+test_expect_success 'misplaced "=" in key: set with single arg hints' '
+	test_must_fail git config set pull.rebase=false 2>err &&
+	test_grep "wrong number of arguments" err &&
+	test_grep "did you mean .git config set pull\\.rebase false." err
+'
+
+test_expect_success 'misplaced "=" in key: explicit --get does not hint' '
+	test_must_fail git config --get pull.rebase=false 2>err &&
+	test_grep "invalid key: pull\\.rebase=false" err &&
+	test_grep ! "did you mean" err
+'
+
+test_expect_success 'misplaced "=" in key: get subcommand does not hint' '
+	test_must_fail git config get pull.rebase=false 2>err &&
+	test_grep ! "did you mean" err
+'
+
+test_expect_success 'misplaced "=" in key: unset subcommand does not hint' '
+	test_must_fail git config unset pull.rebase=false 2>err &&
+	test_grep ! "did you mean" err
+'
+
+test_expect_success '"=" inside subsection is valid, no hint' '
+	test_when_finished "rm -f subsection.cfg" &&
+	git config set -f subsection.cfg foo.bar=baz.boo qux 2>err &&
+	test_grep ! "did you mean" err &&
+	echo qux >expect &&
+	git config get -f subsection.cfg foo.bar=baz.boo >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'correct key' '
 	git config 123456.a123 987
 '

base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] config: suggest the correct form when key contains "="
From: Harald Nordgren @ 2026-05-16 12:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Harald Nordgren via GitGitGadget, git
In-Reply-To: <xmqqqzndel8c.fsf@gitster.g>

> And I think git_config_parse_key() is at a way too low level to tell
> in what context we are seeing this faulty key to guess end-user's
> intention to limit our "did you mean?"
>
> I also wonder if, given that "=" in anywhere other than three-level
> names, is invalid, we should just start accept
>
>         git config foo.bar=baz
>         git config set foo.bar=baz
>
> and interpret them as
>
>         git config set foo.bar baz

I tried implementing a version to be more liberal in what to accept, but
the implementation became very complex.

Moving in the other direction: show the warning, but try to make it more
correct.

(Also switching over to replying to emails with Gmail with 'plain text
mode'), hopefully there will be less miss-sends that end up on the wrong
topic from now on.)


Harald

^ permalink raw reply

* [PATCH v2] approxidate: make "today" wrap to midnight
From: Tuomas Ahola @ 2026-05-16 11:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Tuomas Ahola
In-Reply-To: <20260515205803.26211-1-taahol@utu.fi>

Although some commands do reject invalid approxidate expressions,
in other cases those are simply evaluated as the current time.
Oftentimes that is a perfectly good compromise to handle silly
requests, but it isn't without rough edges.

Because of the silent acceptance, it is easy to forget that
"today" isn't actually a valid approxidate format.  That is
a bit awkward because while the fallback logic of using the
current time does make some sense, there is no deliberative
decision behind such behavior of "today".  Indeed, whatever
(non-)action "today" currently has, is just an accidental
side effect.

That means "git log --since=today" is currently unlikely to
print anything at all as it tries to list commits dated with
*future* timestamps.  Arguably it would be more useful to
list the commits of the current day---i.e. those made since
midnight.

On the other hand, "git log --until=today" doesn't really
filter commits at all.  Changing the definition of "today"
would make it return the commits made before the current day.
That isn't without problems though---running "git log
--until=today" in the late afternoon could reasonably include
the work done earlier that day (as the command currently
does do).

Still the utility of no-op "--until=today" is debatable and
perhaps outweighed by the pros of having "--since=today" to
mean "--since=midnight".  The thing is that the approxidate
machinery doesn't know about its consumers, so the meaning
of "today" has to be the same for "--since" and "--until".

In fact, "git log --until=" is documented as

	`--until=<date>`::
	`--before=<date>`::
		Show commits older than _<date>_,

so excluding commits made today would actually match the
documentation more closely.

Moreover, a revision parameter "@{today}" is currently outright
rejected.  Making "today" a valid approxidate time format could
make a natural way to specify the state of the ref at the start
of the current day.

Bind "today" to new function `date_today()` as an approxidate
special.  Make it return the last midnight if no specific time
is given; i.e. retain the old behavior of "noon today" and such.

Document the new behavior of "git log --since=today" in
rev-list-options.adoc.

Signed-off-by: Tuomas Ahola <taahol@utu.fi>
---
Intervall-diff mot v1:
1:  849f058baf ! 1:  86bcb70ac2 approxidate: make "today" wrap to midnight
    @@ Commit message
         Oftentimes that is a perfectly good compromise to handle silly
         requests, but it isn't without rough edges.
     
    -    Let's consider what "git log --since=today" should yield.
    -    As it happens that "today" isn't actually a valid approxidate
    -    format, the command currently tries to list commits with
    -    *future* timestamps.  Perhaps it would make more sense if
    -    it returned the commits made since midnight---that is,
    -    during the current day.
    +    Because of the silent acceptance, it is easy to forget that
    +    "today" isn't actually a valid approxidate format.  That is
    +    a bit awkward because while the fallback logic of using the
    +    current time does make some sense, there is no deliberative
    +    decision behind such behavior of "today".  Indeed, whatever
    +    (non-)action "today" currently has, is just an accidental
    +    side effect.
    +
    +    That means "git log --since=today" is currently unlikely to
    +    print anything at all as it tries to list commits dated with
    +    *future* timestamps.  Arguably it would be more useful to
    +    list the commits of the current day---i.e. those made since
    +    midnight.
    +
    +    On the other hand, "git log --until=today" doesn't really
    +    filter commits at all.  Changing the definition of "today"
    +    would make it return the commits made before the current day.
    +    That isn't without problems though---running "git log
    +    --until=today" in the late afternoon could reasonably include
    +    the work done earlier that day (as the command currently
    +    does do).
    +
    +    Still the utility of no-op "--until=today" is debatable and
    +    perhaps outweighed by the pros of having "--since=today" to
    +    mean "--since=midnight".  The thing is that the approxidate
    +    machinery doesn't know about its consumers, so the meaning
    +    of "today" has to be the same for "--since" and "--until".
    +
    +    In fact, "git log --until=" is documented as
    +
    +            `--until=<date>`::
    +            `--before=<date>`::
    +                    Show commits older than _<date>_,
    +
    +    so excluding commits made today would actually match the
    +    documentation more closely.
     
         Moreover, a revision parameter "@{today}" is currently outright
         rejected.  Making "today" a valid approxidate time format could
    @@ Commit message
         special.  Make it return the last midnight if no specific time
         is given; i.e. retain the old behavior of "noon today" and such.
     
    +    Document the new behavior of "git log --since=today" in
    +    rev-list-options.adoc.
    +
         Signed-off-by: Tuomas Ahola <taahol@utu.fi>
     
    + ## Documentation/rev-list-options.adoc ##
    +@@ Documentation/rev-list-options.adoc: ordering and formatting options, such as `--reverse`.
    + 
    + `--since=<date>`::
    + `--after=<date>`::
    +-	Show commits more recent than _<date>_.
    ++	Show commits more recent than _<date>_.  As a special case,
    ++	'today' means the last midnight.
    + 
    + `--since-as-filter=<date>`::
    + 	Show all commits more recent than _<date>_. This visits
    +
      ## date.c ##
     @@ date.c: static void date_never(struct tm *tm, struct tm *now UNUSED, int *num)
      	*num = 0;

 Documentation/rev-list-options.adoc |  3 ++-
 date.c                              | 10 ++++++++++
 t/t0006-date.sh                     |  2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 2d195a1474..a5abadf689 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -23,7 +23,8 @@ ordering and formatting options, such as `--reverse`.
 
 `--since=<date>`::
 `--after=<date>`::
-	Show commits more recent than _<date>_.
+	Show commits more recent than _<date>_.  As a special case,
+	'today' means the last midnight.
 
 `--since-as-filter=<date>`::
 	Show all commits more recent than _<date>_. This visits
diff --git a/date.c b/date.c
index 17a95077cf..343d6aab6f 100644
--- a/date.c
+++ b/date.c
@@ -1192,6 +1192,15 @@ static void date_never(struct tm *tm, struct tm *now UNUSED, int *num)
 	*num = 0;
 }
 
+static void date_today(struct tm *tm, struct tm *now, int *num UNUSED)
+{
+	if (tm->tm_hour == now->tm_hour &&
+	    tm->tm_min == now->tm_min &&
+	    tm->tm_sec == now->tm_sec)
+		date_time(tm, now, 0);
+	update_tm(tm, now, 0);
+}
+
 static const struct special {
 	const char *name;
 	void (*fn)(struct tm *, struct tm *, int *);
@@ -1204,6 +1213,7 @@ static const struct special {
 	{ "AM", date_am },
 	{ "never", date_never },
 	{ "now", date_now },
+	{ "today", date_today },
 	{ NULL }
 };
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 53ced36df4..07bf6115ab 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -164,6 +164,7 @@ check_approxidate() {
 }
 
 check_approxidate now '2009-08-30 19:20:00'
+check_approxidate today '2009-08-30 00:00:00'
 check_approxidate '5 seconds ago' '2009-08-30 19:19:55'
 check_approxidate 5.seconds.ago '2009-08-30 19:19:55'
 check_approxidate 10.minutes.ago '2009-08-30 19:10:00'
@@ -187,6 +188,7 @@ check_approxidate 'last tuesday' '2009-08-25 19:20:00'
 check_approxidate 'July 5th' '2009-07-05 19:20:00'
 check_approxidate '06/05/2009' '2009-06-05 19:20:00'
 check_approxidate '06.05.2009' '2009-05-06 19:20:00'
+check_approxidate 'Jan 5 today' '2009-01-30 00:00:00'
 
 check_approxidate 'Jun 6, 5AM' '2009-06-06 05:00:00'
 check_approxidate '5AM Jun 6' '2009-06-06 05:00:00'

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
From: René Scharfe @ 2026-05-16 11:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano
In-Reply-To: <20260516025119.GA832077@coredump.intra.peff.net>

On 5/16/26 4:51 AM, Jeff King wrote:
> On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:
> 
>>> I think as long as the behavior remains "slow, but we do not overflow
>>> any buffers" when you reach these limits, that's OK. Nobody is going to
>>> do it in practice, and we just want to make sure that malicious inputs
>>> cannot get out-of-bounds writes. It might be worth adding a comment,
>>> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
>>> "alloc" in that macro.
>> There is no overflow check in either version (yet), so neither is safe
>> to operate close to the boundary.  Close meaning the intermediate term
>> (alloc + 16) * 3 being bigger than the maximum value.
> 
> Yes, but for some definition of safe. Both before and after your patch,
> as we get close to the boundary the allocation will grow slower than it
> should, but we'll never write out of bounds. The behavior for the "git
> foo" I showed earlier is slightly different:
> 
>   - before your patch, ~2GB we stop doubling and instead start growing
>     the array by one at each ALLOC_GROW() call. This is because
>     alloc_nr() overflows to a small value, but the:
> 
>       if (alloc_nr(alloc) < (nr))
>               alloc = (nr);
> 
>     check kicks in.
> 
>   - after your patch we grow to ~4GB, and then things get super slow.
>     This is because we correctly compute the new allocation as a size_t,
>     but then truncate it while assigning to alloc. So on the next
>     ALLOC_GROW() call, we'll think the buffer is way too small and try
>     to realloc again. I don't know why this is so much slower than the
>     grow-by-one above, but it is.
> 
> Neither is really correct, but both are in the realm of OK: stupidly
> large input doesn't perform well, but there's no buffer overflow
> vulnerability.
> 
> What I was worried about is what happens if you tweak your patch like
> this:
> 
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 2bc1f43f48..0730dd24ad 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
>  		size_t alloc_grow_new_alloc_; \
>  		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>  			alloc = alloc_grow_new_alloc_; \
> -			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
> +			REALLOC_ARRAY(x, alloc); \
>  		} \
>  	} while (0)
>  
> 
> In that case we really do end up with too-small allocations and
> out-of-bounds writes.
> 
> Maybe you saw that coming and that's why you wrote it as you did. But it
> is definitely subtle enough that I think it would merit a big warning
> comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
> same type, and hence not necessarily the same value.

It was the economic thing to do: Fetching the value of user-supplied
alloc variable makes no sense when we have our calculated size_t value
at hand.

>> Here's a demo program exercising the arithmetic part of the macros:
> 
> I think the difference isn't in the arithmetic values that come out, but
> in what is fed to realloc() itself. And in your harness, realloc is just
> "x = true". If you actually store the value that would be passed to
> realloc() like this:
> 
> diff --git a/foo.c.orig b/foo.c
> index 2fbce8c..7498f36 100644
> --- a/foo.c.orig
> +++ b/foo.c
> @@ -11,7 +11,7 @@
>  				alloc = (nr); \
>  			else \
>  				alloc = alloc_nr(alloc); \
> -			x = true; \
> +			x = alloc; \
>  		} \
>  	} while (0)
>  
> @@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
>  		size_t alloc_grow_new_alloc_; \
>  		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>  			alloc = alloc_grow_new_alloc_; \
> -			x = true; \
> +			x = alloc_grow_new_alloc_; \
>  		} \
>  	} while (0)
>  
> @@ -44,7 +44,7 @@ int main(int argc, char **argv)
>  	for (T i = 0;; i++) {
>  		for (T j = MIN;; j++) {
>  			T alloc1 = j, alloc2 = j;
> -			bool allocated1 = false, allocated2 = false;
> +			size_t allocated1 = 0, allocated2 = 0;
>  			ALLOC_GROW1(allocated1, i, alloc1);
>  			ALLOC_GROW2(allocated2, i, alloc2);
>  			if (alloc1 != alloc2 || allocated1 != allocated2)
> 
> then you see the differences. For negative values, yeah, you end up with
> big size_t values. But for an unsigned type you get different small
> allocations.
Good point.  For unsigned char I get differences starting at 156
elements, here just the first few:

unsigned char nr=156 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 2 (156 -> 2) vs 258 (156 -> 2)

So the current code cuts the allocation size to 0 if you have an
array of 155 and ask for more entries.  That would cause a buffer
overrun.  With the patch ALLOC_GROW actually grows the buffer.

For signed char I see a different failure mode:

signed char nr=71 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551490 (71 -> -126) vs 130 (71 -> -126)

The current code tries to allocate (something close to) infinity,
which would terminate the program.  With the patch ALLOC_GROW
actually grows the buffer.

I don't see this for int and unsigned, though.  Weird C integer
promotion rules hit us here, I guess.  Just this, as you mentioned:

ALLOC_GROW1 unsigned nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort
ALLOC_GROW1 int nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort

The demo code is now big and hairy enough to need its own
tests, though.  *snicker*

René


#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define alloc_nr(x) (((x)+16)*3/2)

#define ALLOC_GROW1(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			x = alloc; \
		} \
	} while (0)

static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
	if (nr > alloc) {
		size_t out = alloc_nr(alloc);
		*outp = out < nr ? nr : out;
		return true;
	}
	return false;
}


#define ALLOC_GROW2(x, nr, alloc) \
	do { \
		size_t alloc_grow_new_alloc_; \
		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
			alloc = alloc_grow_new_alloc_; \
			x = alloc_grow_new_alloc_; \
		} \
	} while (0)

#define COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2) do { \
	T orig_alloc1 = alloc1, orig_alloc2 = alloc2; \
	ALLOC_GROW1(alloc_sz1, nr, alloc1); \
	ALLOC_GROW2(alloc_sz2, nr, alloc2); \
	if (alloc_sz1 != alloc_sz2) \
		printf(#T" nr="P" %zu ("P" -> "P") vs %zu ("P" -> "P")\n", \
		       nr,  \
		       alloc_sz1, orig_alloc1, alloc1, \
		       alloc_sz2, orig_alloc2, alloc2); \
} while (0)

#define COMPARE_ALL(T, MIN, MAX, P) do { \
	for (T nr = 0;; nr++) { \
		for (T alloc = MIN;; alloc++) { \
			T alloc1 = alloc, alloc2 = alloc; \
			size_t alloc_sz1 = alloc1, alloc_sz2 = alloc2; \
			COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
			if (alloc == MAX) \
				break; \
		} \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define COMPARE_GROWTH(T, MAX, P) do { \
	T alloc1 = 0, alloc2 = 0; \
	size_t alloc_sz1 = 0, alloc_sz2 = 0; \
	for (T nr = 0;; nr++) { \
		COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW) do { \
	T alloc = 0; \
	size_t alloc_sz = 0; \
	for (T nr = 0;; nr++) { \
		T orig_alloc = alloc; \
		size_t orig_alloc_sz = alloc_sz; \
		ALLOC_GROW(alloc_sz, nr, alloc); \
		if (alloc_sz < (size_t)nr) \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " too small\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
		if (alloc_sz > alloc_nr((size_t)nr)) \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " too big\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
		if (alloc_sz > orig_alloc_sz && \
		    alloc_sz - alloc_sz / 3 < orig_alloc_sz) { \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " step too small, abort\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
			break; \
		} \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define CHECK_GROWTH(T, MAX, P) do { \
	CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW1); \
	CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW2); \
} while (0)

int main(int argc, char **argv)
{
	COMPARE_ALL(unsigned char, 0, UCHAR_MAX, "%hhu");
	COMPARE_ALL(signed char, 0, SCHAR_MAX, "%hhd");
	COMPARE_GROWTH(short, SHRT_MAX, "%hd");
	CHECK_GROWTH(short, SHRT_MAX, "%hd");
	CHECK_GROWTH(unsigned short, USHRT_MAX, "%hu");
	CHECK_GROWTH(unsigned, UINT_MAX, "%u");
	CHECK_GROWTH(int, INT_MAX, "%d");
	return 0;
}


^ permalink raw reply

* [PATCH v2] commit-reach: use object flags for tips_reachable_from_bases()
From: Kristofer Karlsson via GitGitGadget @ 2026-05-16  9:16 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Kristofer Karlsson, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2116.git.1778868463992.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

tips_reachable_from_bases() walks the commit graph from a set of base
commits to find which tip commits are reachable.  The inner loop does
a linear scan over the tips array to check whether each visited commit
is a tip, making the overall cost O(C * T) where C is commits walked
and T is the number of tips.

Use the RESULT object flag to mark tip commits, replacing the linear
scan with a single flag test per visited commit.  This reduces the
per-commit tip check from O(T) to O(1) and the overall cost from
O(C * T) to O(C + T).

When multiple refs point to the same commit, the shared object gets
the flag once, so all duplicates are handled automatically.  The
early-termination advancement loop checks the flag on the sorted
commits array directly, which naturally handles duplicates since the
flag is on the shared commit object.

This also removes the index field from struct commit_and_index, since
the indirection through the original tips array is no longer needed.

This function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via reach_filter() in
ref-filter.c.

Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):

  Command                           Before    After   Speedup
  for-each-ref --merged HEAD        6.57s     1.59s     4.1x
  for-each-ref --no-merged HEAD     6.67s     1.66s     4.0x
  branch --merged HEAD              0.68s     0.61s      10%
  branch --no-merged HEAD           0.65s     0.61s       8%
  tag --merged HEAD                 0.12s     0.12s       -

On linux.git with 10,000 synthetic branches at the root commit (worst
case for the DFS walk):

  Command                           Before    After   Speedup
  for-each-ref --merged HEAD        1.35s     0.35s     3.9x
  for-each-ref --no-merged HEAD     1.82s     0.31s     5.9x

The large speedup for for-each-ref is because it checks all 10,000
refs as tips, making the O(T) inner loop expensive.  The branch
subcommand only checks local branches (fewer tips), so the improvement
is smaller.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    commit-reach: use object flags for tips_reachable_from_bases()
    
    This replaces the O(C*T) linear scan in tips_reachable_from_bases() with
    an O(1) flag check using the RESULT object flag.
    
    The function is called by git for-each-ref --merged and git branch/tag
    --contains/--no-contains via reach_filter() in ref-filter.c.
    
    Benchmarks on a merge-heavy monorepo (2.3M commits, 10,000 refs):
    
     * for-each-ref --merged HEAD: 6.6s → 1.6s (4.1x)
     * for-each-ref --no-merged HEAD: 6.7s → 1.7s (4.0x)
    
    On linux.git with 10,000 synthetic branches at the root commit:
    
     * for-each-ref --merged HEAD: 1.35s → 0.35s (3.9x)
     * for-each-ref --no-merged HEAD: 1.82s → 0.31s (5.9x)
    
    v2 of this patch, addressing Jeff King's feedback:
    
     * Replaced the decoration hash with the RESULT object flag (simpler, no
       extra data structure, handles duplicate tips naturally)
     * Fixed early-termination bug when multiple refs point to the same
       commit (the decoration API overwrites on duplicate keys)
     * Removed the now-unused index field from struct commit_and_index
     * Diff is +11/-12 lines

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2116%2Fspkrka%2Ftips-reachable-minimal-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2116/spkrka/tips-reachable-minimal-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2116

Range-diff vs v1:

 1:  992c0aff0e ! 1:  7399a12518 commit-reach: use the decoration hash for tips_reachable_from_bases()
     @@ Metadata
      Author: Kristofer Karlsson <krka@spotify.com>
      
       ## Commit message ##
     -    commit-reach: use the decoration hash for tips_reachable_from_bases()
     +    commit-reach: use object flags for tips_reachable_from_bases()
      
          tips_reachable_from_bases() walks the commit graph from a set of base
          commits to find which tip commits are reachable.  The inner loop does
     @@ Commit message
          is a tip, making the overall cost O(C * T) where C is commits walked
          and T is the number of tips.
      
     -    Replace the linear scan with the decoration hash for lookups, reducing
     -    the per-commit tip check from O(T) to O(1) and the overall cost from
     +    Use the RESULT object flag to mark tip commits, replacing the linear
     +    scan with a single flag test per visited commit.  This reduces the
     +    per-commit tip check from O(T) to O(1) and the overall cost from
          O(C * T) to O(C + T).
      
     +    When multiple refs point to the same commit, the shared object gets
     +    the flag once, so all duplicates are handled automatically.  The
     +    early-termination advancement loop checks the flag on the sorted
     +    commits array directly, which naturally handles duplicates since the
     +    flag is on the shared commit object.
     +
     +    This also removes the index field from struct commit_and_index, since
     +    the indirection through the original tips array is no longer needed.
     +
          This function is called by `git for-each-ref --merged` and
          `git branch/tag --contains/--no-contains` via reach_filter() in
          ref-filter.c.
     @@ Commit message
          Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
      
            Command                           Before    After   Speedup
     -      for-each-ref --merged HEAD        6.64s     1.66s     4.0x
     -      for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
     +      for-each-ref --merged HEAD        6.57s     1.59s     4.1x
     +      for-each-ref --no-merged HEAD     6.67s     1.66s     4.0x
            branch --merged HEAD              0.68s     0.61s      10%
            branch --no-merged HEAD           0.65s     0.61s       8%
            tag --merged HEAD                 0.12s     0.12s       -
      
     +    On linux.git with 10,000 synthetic branches at the root commit (worst
     +    case for the DFS walk):
     +
     +      Command                           Before    After   Speedup
     +      for-each-ref --merged HEAD        1.35s     0.35s     3.9x
     +      for-each-ref --no-merged HEAD     1.82s     0.31s     5.9x
     +
          The large speedup for for-each-ref is because it checks all 10,000
          refs as tips, making the O(T) inner loop expensive.  The branch
          subcommand only checks local branches (fewer tips), so the improvement
     @@ Commit message
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
       ## commit-reach.c ##
     +@@ commit-reach.c: void ahead_behind(struct repository *r,
     + 
     + struct commit_and_index {
     + 	struct commit *commit;
     +-	unsigned int index;
     + 	timestamp_t generation;
     + };
     + 
      @@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
     - 	size_t min_generation_index = 0;
     - 	timestamp_t min_generation;
     - 	struct commit_list *stack = NULL;
     -+	struct decoration tip_index = { "tip_index" };
       
     - 	if (!bases || !tips || !tips_nr)
     - 		return;
     + 	for (size_t i = 0; i < tips_nr; i++) {
     + 		commits[i].commit = tips[i];
     +-		commits[i].index = i;
     + 		commits[i].generation = commit_graph_generation(tips[i]);
     + 	}
     + 
      @@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
       	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
       	min_generation = commits[0].generation;
       
      +	for (size_t i = 0; i < tips_nr; i++)
     -+		add_decoration(&tip_index, &commits[i].commit->object,
     -+			       (void *)(i + 1));
     ++		commits[i].commit->object.flags |= RESULT;
      +
       	while (bases) {
       		repo_parse_commit(r, bases->item);
     @@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
      -				break;
      -
      -			if (commits[j].commit == c) {
     +-				tips[commits[j].index]->object.flags |= mark;
      +		{
     -+			size_t j = (size_t)lookup_decoration(&tip_index, &c->object) - 1;
     -+			if (j < tips_nr) {
     - 				tips[commits[j].index]->object.flags |= mark;
     ++			if (c->object.flags & RESULT) {
     ++				c->object.flags |= mark;
       
     - 				if (j == min_generation_index) {
     +-				if (j == min_generation_index) {
     +-					unsigned int k = j + 1;
     ++				if (commits[min_generation_index].commit->object.flags & mark) {
     ++					unsigned int k = min_generation_index + 1;
     + 					while (k < tips_nr &&
     +-					       (tips[commits[k].index]->object.flags & mark))
     ++					       (commits[k].commit->object.flags & mark))
     + 						k++;
     + 
     + 					/* Terminate early if all found. */
      @@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
       	}
       
       done:
     -+	clear_decoration(&tip_index, NULL);
     ++	for (size_t i = 0; i < tips_nr; i++)
     ++		commits[i].commit->object.flags &= ~RESULT;
       	free(commits);
       	repo_clear_commit_marks(r, SEEN);
       	commit_list_free(stack);


 commit-reach.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index d3a9b3ed6f..82614d2409 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1125,7 +1125,6 @@ void ahead_behind(struct repository *r,
 
 struct commit_and_index {
 	struct commit *commit;
-	unsigned int index;
 	timestamp_t generation;
 };
 
@@ -1165,7 +1164,6 @@ void tips_reachable_from_bases(struct repository *r,
 
 	for (size_t i = 0; i < tips_nr; i++) {
 		commits[i].commit = tips[i];
-		commits[i].index = i;
 		commits[i].generation = commit_graph_generation(tips[i]);
 	}
 
@@ -1173,6 +1171,9 @@ void tips_reachable_from_bases(struct repository *r,
 	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
 	min_generation = commits[0].generation;
 
+	for (size_t i = 0; i < tips_nr; i++)
+		commits[i].commit->object.flags |= RESULT;
+
 	while (bases) {
 		repo_parse_commit(r, bases->item);
 		commit_list_insert(bases->item, &stack);
@@ -1183,20 +1184,16 @@ void tips_reachable_from_bases(struct repository *r,
 		int explored_all_parents = 1;
 		struct commit_list *p;
 		struct commit *c = stack->item;
-		timestamp_t c_gen = commit_graph_generation(c);
 
 		/* Does it match any of our tips? */
-		for (size_t j = min_generation_index; j < tips_nr; j++) {
-			if (c_gen < commits[j].generation)
-				break;
-
-			if (commits[j].commit == c) {
-				tips[commits[j].index]->object.flags |= mark;
+		{
+			if (c->object.flags & RESULT) {
+				c->object.flags |= mark;
 
-				if (j == min_generation_index) {
-					unsigned int k = j + 1;
+				if (commits[min_generation_index].commit->object.flags & mark) {
+					unsigned int k = min_generation_index + 1;
 					while (k < tips_nr &&
-					       (tips[commits[k].index]->object.flags & mark))
+					       (commits[k].commit->object.flags & mark))
 						k++;
 
 					/* Terminate early if all found. */
@@ -1232,6 +1229,8 @@ void tips_reachable_from_bases(struct repository *r,
 	}
 
 done:
+	for (size_t i = 0; i < tips_nr; i++)
+		commits[i].commit->object.flags &= ~RESULT;
 	free(commits);
 	repo_clear_commit_marks(r, SEEN);
 	commit_list_free(stack);

base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v1 00/11] Improve git gui operation without a worktree
From: Johannes Sixt @ 2026-05-16  8:28 UTC (permalink / raw)
  To: Mark Levedahl; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <20260514143322.865587-1-mlevedahl@gmail.com>

Am 14.05.26 um 16:33 schrieb Mark Levedahl:
> git gui has a number of inter-related problems that result in problems
> during startup from anything but a checked out worktree pointing at a
> valid git repository. Some of the symptoms are:
> - blame / browser subcommands, and launching gitk, are intended to be
>   useful without a worktree, but fail to work.
> - unlike git, git-gui is supposed to use the parent directory as a
>   worktree if started from the .git subdirectory in the very common
>   single worktree + embedded git repository format. This does not
>   work.
> - git-gui includes a repository picker allowing a user select a worktree
>   from a list and/or start a new repo+worktree: this dialog appears at
>   unexpected times, masking useful error feedback on configuration
>   problems.
> 
> This patch series addresses the above issues, substantially rewriting
> the blame / browser command line process, the initial repository and
> worktree discovery processes, and using git rev-parse when possible to
> handle repository / worktree discovery including any specification of
> GIT_DIR or GIT_WORK_TREE to reduce the future likelihood of conflict
> with command line git. This also allows explicit user control to avoid
> the repository picker masking a configuration error.

OK. Overall, this goes in the right direction. There are still open
questions and potential problems with this implementation. We also
disagree in a few details; see my comments on the patches.

> 
> Note: I question why git-gui ever exports GIT_WORK_TREE. If it is not
> empty, that is the current directory when startup is complete and any
> git command will use the current directory as the worktree.

I fully agree with this.

> If empty,
> there is no worktree and the current directory should be (and after this
> series, is) at the toplevel of the gitdir: again, there is nothing to
> communicate to another process.

Here I disagree. We should not need to change directory if no working
tree was found.

> If a process being launched needs a
> different worktree, that should be the startup directory given to the
> process without changing git-gui's current directory.

I haven't thought this through, but this sounds very reasonable.

> 
> Mark Levedahl (11):
>   git-gui: allow specifying path '.' to the browser
>   git-gui: refactor browser / blame argument parsing
>   git-gui: guard set/unset of GIT_DIR and GIT_WORK_TREE
>   git-gui: put choose_repository::pick in a proc
>   git-gui: use --absolute-git-dir
>   git gui: GIT_DIR / GIT_WORK_TREE make any discovery error fatal
>   git-gui: use rev-parse exclusively to find a repository
>   git-gui: simplify [is_bare] to report if a worktree is known
>   git-gui: support using repository parent dir as a worktree
>   git-gui: improve worktree discovery
>   git-gui: add gui and pick as explicit subcommands
> 
>  git-gui.sh | 276 ++++++++++++++++++++++++++---------------------------
>  1 file changed, 135 insertions(+), 141 deletions(-)
> 

-- Hannes


^ permalink raw reply

* Re: [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
From: Kristofer Karlsson @ 2026-05-16  8:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Kristofer Karlsson via GitGitGadget, git
In-Reply-To: <20260515211459.GA158762@coredump.intra.peff.net>

Thanks for testing this, Jeff! You're right, the patch as posted
regresses on your synthetic test case.

The issue is that when multiple refs point to the same commit,
add_decoration overwrites earlier entries,
so only one index gets stored. The marking itself is correct (the flag
is on the shared commit object,
so all duplicates get marked), but the j == min_generation_index check
never fires for the minimum tip,
so early termination breaks. The DFS walks the entire graph instead of
stopping when all tips are found.

I have a fix for the early-termination bug (checking the flag at
min_generation_index instead of comparing indices),
but your suggestions about the API are well taken, I don't think the
decoration hash is the right tool here.
Since we only need set membership ("is this commit a tip?"), not a
mapping, an object-flags bit or commit-slab would
indeed be simpler and avoid the (void *)(i + 1) hack entirely.

I fixed it locally now for the linux test case and got a 4x speedup
there too - the problem was failing the early termination.
Some numbers when running against the linux repo on my machine:

Command          │ Baseline │     V1 (broken)     │     V2 (fixed)      │
--no-merged HEAD │ 1.33s    │ 2.01s (1.5x slower) │ 0.31s (4.3x faster) │
--merged HEAD    │ 1.35s    │ 1.96s (1.5x slower) │ 0.31s (4.3x faster) │

However, I'll still need to rethink the decoration map - I will come
back with a better patch shortly.

- Kristofer

On Fri, 15 May 2026 at 23:15, Jeff King <peff@peff.net> wrote:
>
> On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:
>
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > tips_reachable_from_bases() walks the commit graph from a set of base
> > commits to find which tip commits are reachable.  The inner loop does
> > a linear scan over the tips array to check whether each visited commit
> > is a tip, making the overall cost O(C * T) where C is commits walked
> > and T is the number of tips.
> >
> > Replace the linear scan with the decoration hash for lookups, reducing
> > the per-commit tip check from O(T) to O(1) and the overall cost from
> > O(C * T) to O(C + T).
> >
> > This function is called by `git for-each-ref --merged` and
> > `git branch/tag --contains/--no-contains` via reach_filter() in
> > ref-filter.c.
> >
> > Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
> >
> >   Command                           Before    After   Speedup
> >   for-each-ref --merged HEAD        6.64s     1.66s     4.0x
> >   for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
> >   branch --merged HEAD              0.68s     0.61s      10%
> >   branch --no-merged HEAD           0.65s     0.61s       8%
> >   tag --merged HEAD                 0.12s     0.12s       -
> >
> > The large speedup for for-each-ref is because it checks all 10,000
> > refs as tips, making the O(T) inner loop expensive.  The branch
> > subcommand only checks local branches (fewer tips), so the improvement
> > is smaller.
>
> Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
> commits) with a lot of synthetic branches. I'd think that old branches
> would be the most expensive, so I did:
>
>   old=$(git rev-list --reverse HEAD | head -n1)
>   seq --format="update refs/heads/branch%g $old" 10000 |
>   git update-ref --stdin
>
> Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
> But with your patch, it goes to ~830ms!
>
> So what am I missing about your repo that it is so slow in the first
> place?
>
> >      * Hacking the array index into the decoration value as (void *)(i + 1)
> >        instead of storing a proper pointer
>
> The decoration API is not the most generic option here. There's an
> oidmap type, but you have to embed the hashmap bits into your struct,
> which is a lot of boilerplate if you're just storing an int. You can
> define a khash with a custom value type, and I think the existing
> oid_pos uses an int, which might be enough. All of those will store an
> extra copy of the oid, though for the sizes we're talking about that's
> not the end of the world.
>
> Since we're always mapping commits, you could define a commit-slab (each
> commit struct gets a unique id which we then index into a big array).
> See commit-slab.h for an example.
>
> I'm not very familiar with this code, but I wonder if we actually need
> to map at all. It looks like we are mostly interested in set inclusion,
> so perhaps an oidset() would work. Or even a bit in the object-flags.
>
> -Peff

^ permalink raw reply

* Re: [PATCH v1 11/11] git-gui: add gui and pick as explicit subcommands
From: Johannes Sixt @ 2026-05-16  8:18 UTC (permalink / raw)
  To: Mark Levedahl; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <20260514143322.865587-12-mlevedahl@gmail.com>

Am 14.05.26 um 16:33 schrieb Mark Levedahl:
> git-gui accepts subcommands blame | browser | citool, and assumes the
> subcommand is 'gui' if none is actually given, But, git gui also has a
> repository picker (choose_repository::pick) that can create a new
> repository + worktree, or choose an existing one, switch to that, and
> the run the gui. The user has no direct control over invoking the
> picker, instead the picker is triggered by failure in the repository /
> worktree discover process: this includes being started in a directory
> not controlled by git, which is probably the intended use case.
> 
> The picker can appear when the user has no intention of creating a new
> worktree, and the user cannot use the picker to create a new worktree
> inside another.
> 
> So, add two new explicit subcommands:
>     gui  - Run the gui if repository/worktree discovery succeeds, or die
>            with an error message, but never run the picker.
>     pick - First run the picker, regardless, then start the gui in
>            the chosen worktree.
> 
> Nothing in this changes the prior behavior, the alternates above must be
> explicitly selected to see any change.

OK.

> 
> Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
> ---
>  git-gui.sh | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/git-gui.sh b/git-gui.sh
> index 3a83dd5..c56aeef 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -1021,6 +1021,7 @@ proc load_config {include_global} {
>  ##
>  ## feature option selection
>  
> +set run_picker_on_error 1
>  if {[regexp {^git-(.+)$} [file tail $argv0] _junk subcommand]} {
>  	unset _junk
>  } else {
> @@ -1030,6 +1031,7 @@ if {$subcommand eq {gui.sh}} {
>  	set subcommand gui
>  }
>  if {$subcommand eq {gui} && [llength $argv] > 0} {
> +	set run_picker_on_error 0
>  	set subcommand [lindex $argv 0]
>  	set argv [lrange $argv 1 end]
>  }
> @@ -1047,6 +1049,7 @@ blame {
>  	disable_option multicommit
>  	disable_option branch
>  	disable_option transport
> +	set run_picker_on_error 0
>  }
>  citool {
>  	enable_option singlecommit
> @@ -1055,6 +1058,7 @@ citool {
>  	disable_option multicommit
>  	disable_option branch
>  	disable_option transport
> +	set run_picker_on_error 0
>  
>  	while {[llength $argv] > 0} {
>  		set a [lindex $argv 0]

Can we please use the available disable_option and enable_option feature
instead of a new variable. Just for consistency around repository discovery.

> @@ -1162,14 +1166,28 @@ proc pick_repo {} {
>  	set picked 1
>  }
>  
> +# run repository picker if explicitly requested
> +switch -- $subcommand {
> +	pick {
> +		pick_repo
> +		set subcommand gui
> +		set run_picker_on_error 0
> +	}
> +}
> +

It just feels wrong to have a new pick_repo call before repository
discovery. Can we not treat this case below as if regular repository
discovery failed and then end up in the existing call of pick_repo?

>  # find repository.
>  if {[catch {
>  	set _gitdir [git rev-parse --absolute-git-dir]
>  } err]} {
>  	if {[is_gitvars_error $err]} {
>  		exit 1
> -	} else {
> +	}
> +	if {$run_picker_on_error} {
>  		pick_repo
> +	} else {
> +		catch {wm withdraw .}
> +		error_popup [strcat [mc "Git directory not found:"] "\n\n$err"]
> +		exit 1
>  	}
>  }
>  
> @@ -3051,7 +3069,7 @@ gui {
>  	# fall through to setup UI for commits
>  }
>  default {
> -	set err "[mc usage:] $argv0 \[{blame|browser|citool}\]"
> +	set err "[mc usage:] $argv0 \[{blame|browser|citool|gui|pick}\]"
>  	if {[tk windowingsystem] eq "win32"} {
>  		wm withdraw .
>  		tk_messageBox -icon error -message $err \

We don't need to switch on the new subcommands?

As a follow-up to my comment on 04/11: How relevant is it that variable
$picked is set in a 'git gui pick' invocation?

-- Hannes


^ permalink raw reply

* Re: [PATCH v1 10/11] git-gui: improve worktree discovery
From: Johannes Sixt @ 2026-05-16  8:16 UTC (permalink / raw)
  To: Mark Levedahl; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <20260514143322.865587-11-mlevedahl@gmail.com>

Am 14.05.26 um 16:33 schrieb Mark Levedahl:
> git gui's worktree discovery needs update based upon prior work in this
> series. In the normal case, all information we need comes directly from
> git rev-parse (--show-toplevel, and --show-prefix). Should this work, we
> have a valid worktree and all git gui commands can run.
> 
> If not, we need to consider:
> - if GIT_DIR or GIT_WORK_TREE are in the environment, just stop as we
>   the input configuration was wrong, the user must fix that.
> - if we have a browser or blame subcommand, no worktree is needed so
>   git-gui can run without.
> - using the git repository's parent is a valid worktree (if possible),
>   restoring prior behavior.
> 
> The current directory should be either the root of the worktree, if one
> is found, or the top-level of the git repository.

I disagree in the case where no working tree is found. Then there is no
point in changing the current directory.

> 
> Make it so. Also, make worktree discover directly follow repository
> discovery, reducing the locations that might need error trapping to
> catch configuration issues.

Good!

> 
> Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>

BTW, please make the subject line more descriptive. The word "improve"
does not convey anything of importance.

> ---
>  git-gui.sh | 56 ++++++++++++++++++++++--------------------------------
>  1 file changed, 23 insertions(+), 33 deletions(-)
> 
> diff --git a/git-gui.sh b/git-gui.sh
> index e326401..3a83dd5 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -1173,6 +1173,28 @@ if {[catch {
>  	}
>  }
>  
> +# find worktree, continue without if not required
> +if {[catch {
> +	set _gitworktree [git rev-parse --show-toplevel]
> +	set _prefix [git rev-parse --show-prefix]
> +	cd $_gitworktree
> +} err]} {

We have three commands, each with their own possible failure sources.
One of the outcomes of an error is that we proceed anyway. I think that
this is incorrect if the 'cd' fails: we must not proceed if it fails.
Therefore, we must handle its failure separately.

> +	if {[is_gitvars_error $err]} {
> +		exit 1
> +	}
> +	set _gitworktree {}
> +	set _prefix {}
> +	if {[is_enabled bare]} {
> +		cd $_gitdir

Why change the directory here? If we run `git gui browser master dir` we
do not want to change the directory in an uncontrolled manner. The
argument parser will want to check for the existence of files, and then
we do not want to operate from a random directory.

Also, I think that the check must be for [is_bare] and not [is_enabled
bare].

> +	} elseif {![is_parent_worktree]} {
> +		catch {wm withdraw .}
> +		error_popup [strcat [mc "Cannot use bare repository:"] "\n\n" $_gitdir]
> +		exit 1
> +	}
> +}
> +
> +# repository and worktree config are complete, export them
> +set_gitdir_vars
>  
>  # Use object format as hash algorithm (either "sha1" or "sha256")
>  set hashalgorithm [git rev-parse --show-object-format]

This moves code around. In particular, we see load_config and
apply_config in the context below, which now happens only after these
calls. How certain are we that these have no effect on the code that
runs now earlier?

> @@ -1189,37 +1211,8 @@ if {$hashalgorithm eq "sha1"} {
>  load_config 0
>  apply_config
>  
> -set _gitworktree [git rev-parse --show-toplevel]
>  
> -if {$_prefix ne {}} {
> -	if {$_gitworktree eq {}} {
> -		regsub -all {[^/]+/} $_prefix ../ cdup
> -	} else {
> -		set cdup $_gitworktree
> -	}
> -	if {[catch {cd $cdup} err]} {
> -		catch {wm withdraw .}
> -		error_popup [strcat [mc "Cannot move to top of working directory:"] "\n\n$err"]
> -		exit 1
> -	}
> -	set _gitworktree [pwd]
> -	unset cdup
> -} elseif {![is_enabled bare]} {
> -	if {[is_bare]} {
> -		catch {wm withdraw .}
> -		error_popup [strcat [mc "Cannot use bare repository:"] "\n\n$_gitdir"]
> -		exit 1
> -	}
> -	if {$_gitworktree eq {}} {
> -		set _gitworktree [file dirname $_gitdir]
> -	}
> -	if {[catch {cd $_gitworktree} err]} {
> -		catch {wm withdraw .}
> -		error_popup [strcat [mc "No working directory"] " $_gitworktree:\n\n$err"]
> -		exit 1
> -	}
> -	set _gitworktree [pwd]
> -}
> +# Derive a human-readable repository name
>  set _reponame [file split [file normalize $_gitdir]]
>  if {[lindex $_reponame end] eq {.git}} {
>  	set _reponame [lindex $_reponame end-1]
> @@ -1227,9 +1220,6 @@ if {[lindex $_reponame end] eq {.git}} {
>  	set _reponame [lindex $_reponame end]
>  }
>  
> -# Export the final paths
> -set_gitdir_vars
> -
>  ######################################################################
>  ##
>  ## global init

-- Hannes


^ permalink raw reply

* Re: [PATCH v1 09/11] git-gui: support using repository parent dir as a worktree
From: Johannes Sixt @ 2026-05-16  8:14 UTC (permalink / raw)
  To: Mark Levedahl; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <20260514143322.865587-10-mlevedahl@gmail.com>

Am 14.05.26 um 16:33 schrieb Mark Levedahl:
> git-gui, since 87cd09f43e ("git-gui: work from the .git dir",
> 2010-01-23), has had the intent to allow starting from inside a
> repository, then switching to the parent directory if that is a valid
> worktree.
> 
> This certainly hasn't worked since 2d92ab32fd ("rev-parse: make
> --show-toplevel without a worktree an error", 2019-11-19) in git, but
> breaking this git-gui feature was unintentional.
> 
> Add a proc to test if the parent of the git repository is a valid
> worktree, and set that directory as the worktree if so. Use invocations
> of git rev-parse to assure all validity and safety checks included in
> git-core are executed.

BTW, missing sign-off.

> ---
>  git-gui.sh | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/git-gui.sh b/git-gui.sh
> index a03eaa7..e326401 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -1100,6 +1100,23 @@ unset argv0dir
>  ##
>  ## repository setup
>  
> +proc is_parent_worktree {} {
> +	# Directory 'parent' of a repository named 'parent/.git' might be the worktree
> +	set ok 0
> +	if {[file tail $::_gitdir] eq {.git}} {
> +		set gitdir_parent [file join $::_gitdir {..}]
> +		set expected_worktree [file normalize $gitdir_parent]

We have [file dirname ...]. Is there a reason to avoid it?

> +		catch {set git_worktree [git -C $gitdir_parent rev-parse --show-toplevel]}
> +		if {[string compare $expected_worktree $git_worktree] == 0} {

The purpose of this check should be explained in a comment. I think it is:

For a repository with the database in a directory named .git we assume
that the working tree is the directory containing .git. But
configuration may point to a different worktree. Then we do not want to
hold on to our assumption.

However, whether [git -C elsewhere ...] uses the same gitdir that we
have discovered so far cannot be told from this piece of code alone.
Therefore, I think it is wrong to extract this check into a function.

Also, I don't think we can use string comparison here. On Windows, the
command returns the Windows style path, but Tcl my operate with a POSIX
style path.

> +			set ::_prefix {}
> +			set ::_gitworktree $git_worktree
> +			cd $git_worktree

So many side-effects in a function whose name suggests that it only does
some checks. Please, don't do that.

> +			set ok 1
> +		}
> +	}
> +	return $ok
> +}
> +
>  proc is_gitvars_error {err} {
>  	set havevars 0
>  	set GIT_DIR {}

In general, I am not a fan of commits that add new functions, but no
call sites. Please squash this into 10/11. Ditto for is_gitvars_error in
06/11.

-- Hannes

^ permalink raw reply

* Re: [PATCH v1 08/11] git-gui: simplify [is_bare] to report if a worktree is known
From: Johannes Sixt @ 2026-05-16  8:12 UTC (permalink / raw)
  To: Mark Levedahl; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <20260514143322.865587-9-mlevedahl@gmail.com>

Am 14.05.26 um 16:33 schrieb Mark Levedahl:
> git-gui includes proc is_bare, used in several places to make decisions
> on whether a worktree exists, but also in discovery to tell if a
> worktree can be supported.
> 
> But, is_bare is out of date with regard to multiple worktrees, safe
> repository guards, and possibly other relevant features known to git
> rev-parse. Also, is_bare caches its result on the first call, so is not
> useful if a later step in the discovery process finds a worktree.
> 
> So, simplify is_bare to report whether git-gui has a worktree or is
> working only from a repository.
> 
> Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
> ---
>  git-gui.sh | 25 +------------------------
>  1 file changed, 1 insertion(+), 24 deletions(-)
> 
> diff --git a/git-gui.sh b/git-gui.sh
> index 81789dd..a03eaa7 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -372,7 +372,6 @@ if {[tk windowingsystem] eq "aqua"} {
>  set _appname {Git Gui}
>  set _gitdir {}
>  set _gitworktree {}
> -set _isbare {}
>  set _githtmldir {}
>  set _prefix {}
>  set _reponame {}
> @@ -524,29 +523,7 @@ proc get_config {name} {
>  }
>  
>  proc is_bare {} {
> -	global _isbare
> -	global _gitdir
> -	global _gitworktree
> -
> -	if {$_isbare eq {}} {
> -		if {[catch {
> -			set _bare [git rev-parse --is-bare-repository]
> -			switch  -- $_bare {
> -			true { set _isbare 1 }
> -			false { set _isbare 0}
> -			default { throw }
> -			}
> -		}]} {
> -			if {[is_config_true core.bare]
> -				|| ($_gitworktree eq {}
> -					&& [lindex [file split $_gitdir] end] ne {.git})} {
> -				set _isbare 1
> -			} else {
> -				set _isbare 0
> -			}
> -		}
> -	}
> -	return $_isbare
> +	return [expr {$::_gitworktree eq {}}]
>  }
>  
>  ######################################################################

Very nice!

IMO, regardless of which way we end up rewriting repository discovery,
the end result should be that we can use $_gitworktree like this to tell
whether we are in a bare repository or not.

-- Hannes


^ permalink raw reply

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
From: Johannes Sixt @ 2026-05-16  6:55 UTC (permalink / raw)
  To: Jeff King, René Scharfe; +Cc: Git List, Junio C Hamano
In-Reply-To: <20260515195049.GA149960@coredump.intra.peff.net>

Am 15.05.26 um 21:50 schrieb Jeff King:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
> 
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> +		size_t alloc_grow_new_alloc_; \
>>> +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> +			alloc = alloc_grow_new_alloc_; \
>>> +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>>  		} \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.

> 
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.

Protect against double-evaluation of "alloc", too, using

	size_t *palloc = &(alloc);

and use *palloc in the two places, then all callers are forced to work
with a size_t as third argument. Don't know what the damage would be,
though.

-- Hannes


^ permalink raw reply

* Re: UBSan failing on expensive test(s)
From: Derrick Stolee @ 2026-05-16  2:55 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: git
In-Reply-To: <20260516021343.GA174647@coredump.intra.peff.net>

On 5/15/26 10:13 PM, Jeff King wrote:
> On Sat, May 16, 2026 at 08:43:07AM +0900, Junio C Hamano wrote:
> 
>> This started happening on 'next' that runs EXPENSIVE tests thanks to
>> Dscho's recent updates to enable them in CI.
>>
>> https://github.com/git/git/actions/runs/25896439353/job/76110441841#step:10:2172
>>
>> It claims that """
>>
>>      commit.c:1574:6: runtime error: signed integer overflow:
>>      -2147483648 - 1 cannot be represented in type 'int'
>>
>> """.
>>
>> Another is related in the sense that it used to be hidden behind
>> EXPENSIVE prerequisite, but is probably unrelated.
>>
>> https://github.com/git/git/actions/runs/25896439353/job/76110441842#step:10:156
> 
> These patches should fix both.
> 
>    [1/2]: apply: plug leak on "patch too large" error
>    [2/2]: commit: handle large commit messages in utf8 verification
Thanks for the fast fixes. I agree with you that 2GB commit
messages are silly, but the methods you change could be used
for other things so having better types is good. We can
consider blocking large commits as a feature another time.

-Stolee


^ permalink raw reply

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
From: Jeff King @ 2026-05-16  2:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano
In-Reply-To: <1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de>

On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:

> > I think as long as the behavior remains "slow, but we do not overflow
> > any buffers" when you reach these limits, that's OK. Nobody is going to
> > do it in practice, and we just want to make sure that malicious inputs
> > cannot get out-of-bounds writes. It might be worth adding a comment,
> > though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> > "alloc" in that macro.
> There is no overflow check in either version (yet), so neither is safe
> to operate close to the boundary.  Close meaning the intermediate term
> (alloc + 16) * 3 being bigger than the maximum value.

Yes, but for some definition of safe. Both before and after your patch,
as we get close to the boundary the allocation will grow slower than it
should, but we'll never write out of bounds. The behavior for the "git
foo" I showed earlier is slightly different:

  - before your patch, ~2GB we stop doubling and instead start growing
    the array by one at each ALLOC_GROW() call. This is because
    alloc_nr() overflows to a small value, but the:

      if (alloc_nr(alloc) < (nr))
              alloc = (nr);

    check kicks in.

  - after your patch we grow to ~4GB, and then things get super slow.
    This is because we correctly compute the new allocation as a size_t,
    but then truncate it while assigning to alloc. So on the next
    ALLOC_GROW() call, we'll think the buffer is way too small and try
    to realloc again. I don't know why this is so much slower than the
    grow-by-one above, but it is.

Neither is really correct, but both are in the realm of OK: stupidly
large input doesn't perform well, but there's no buffer overflow
vulnerability.

What I was worried about is what happens if you tweak your patch like
this:

diff --git a/git-compat-util.h b/git-compat-util.h
index 2bc1f43f48..0730dd24ad 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
+			REALLOC_ARRAY(x, alloc); \
 		} \
 	} while (0)
 

In that case we really do end up with too-small allocations and
out-of-bounds writes.

Maybe you saw that coming and that's why you wrote it as you did. But it
is definitely subtle enough that I think it would merit a big warning
comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
same type, and hence not necessarily the same value.

> Here's a demo program exercising the arithmetic part of the macros:

I think the difference isn't in the arithmetic values that come out, but
in what is fed to realloc() itself. And in your harness, realloc is just
"x = true". If you actually store the value that would be passed to
realloc() like this:

diff --git a/foo.c.orig b/foo.c
index 2fbce8c..7498f36 100644
--- a/foo.c.orig
+++ b/foo.c
@@ -11,7 +11,7 @@
 				alloc = (nr); \
 			else \
 				alloc = alloc_nr(alloc); \
-			x = true; \
+			x = alloc; \
 		} \
 	} while (0)
 
@@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			x = true; \
+			x = alloc_grow_new_alloc_; \
 		} \
 	} while (0)
 
@@ -44,7 +44,7 @@ int main(int argc, char **argv)
 	for (T i = 0;; i++) {
 		for (T j = MIN;; j++) {
 			T alloc1 = j, alloc2 = j;
-			bool allocated1 = false, allocated2 = false;
+			size_t allocated1 = 0, allocated2 = 0;
 			ALLOC_GROW1(allocated1, i, alloc1);
 			ALLOC_GROW2(allocated2, i, alloc2);
 			if (alloc1 != alloc2 || allocated1 != allocated2)

then you see the differences. For negative values, yeah, you end up with
big size_t values. But for an unsigned type you get different small
allocations.

-Peff

^ permalink raw reply related

* [PATCH 2/2] commit: handle large commit messages in utf8 verification
From: Jeff King @ 2026-05-16  2:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <20260516021343.GA174647@coredump.intra.peff.net>

Running t4205 under UBSan with the EXPENSIVE prereq enabled triggers an
error when we try to create a commit message that is over 2GB:

  commit.c:1574:6: runtime error: signed integer overflow:
    -2147483648 - 1 cannot be represented in type 'int'

The problem is that find_invalid_utf8() is not prepared to handle
large buffers, as it uses an "int" to represent buffer sizes and
offsets.

We can fix this with a few changes:

  1. We'll take in "len" as a size_t (which is what the caller has
     anyway, since it's working with a strbuf).

  2. We need to return a size_t to give the offset to the invalid utf8,
     but we also need a sentinel value for "no invalid value"
     (previously "-1"). Let's split these to return a bool for "found
     invalid utf8" and then pass back the offset as an out-parameter.
     We'll switch the function name to match the new semantics.

  3. The caller in verify_utf8() uses a "long" to store buffer
     positions, which is a bit funny. This goes back to 08a94a145c
     (commit/commit-tree: correct latin1 to utf-8, 2012-06-28) and is
     perhaps trying to match our use of "unsigned long" for object sizes
     (though we don't care about it ever becoming negative here). This
     should be a size_t, too, as some platforms (like Windows) still use
     a 32-bit long on machines with 64-bit pointers.

  4. The "bytes" field within find_invalid_utf() does not have range
     problems. It is the number of bytes the utf8 sequence claims to
     have, so is limited by how many bits can be set in a single 8-bit
     byte. However, if we leave it as an "int" then the compiler will
     complain about the sign mismatch when comparing it to "len". So
     let's make it unsigned, too.

All of this is a little silly, of course, because 2GB text commit
messages are clearly nonsense. So we might consider rejecting them
outright, but it is easy enough to make these helper functions more
robust in the meantime.

Signed-off-by: Jeff King <peff@peff.net>
---
I tried to look carefully for any reasons why these variables could ever
be negative, but beyond the sentinel value in the return type, didn't
see one. But reviewers should double check. Note that "bad_offset =
offset-1" in the context looks suspicious, but we are guaranteed that
offset has been advanced at this point.

 commit.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/commit.c b/commit.c
index 4385ae4329..8cf09be39d 100644
--- a/commit.c
+++ b/commit.c
@@ -1558,16 +1558,16 @@ int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
 	return result;
 }
 
-static int find_invalid_utf8(const char *buf, int len)
+static bool has_invalid_utf8(const char *buf, size_t len, size_t *bad_offset)
 {
-	int offset = 0;
+	size_t offset = 0;
 	static const unsigned int max_codepoint[] = {
 		0x7f, 0x7ff, 0xffff, 0x10ffff
 	};
 
 	while (len) {
 		unsigned char c = *buf++;
-		int bytes, bad_offset;
+		unsigned bytes;
 		unsigned int codepoint;
 		unsigned int min_val, max_val;
 
@@ -1578,7 +1578,7 @@ static int find_invalid_utf8(const char *buf, int len)
 		if (c < 0x80)
 			continue;
 
-		bad_offset = offset-1;
+		*bad_offset = offset-1;
 
 		/*
 		 * Count how many more high bits set: that's how
@@ -1595,11 +1595,11 @@ static int find_invalid_utf8(const char *buf, int len)
 		 * codepoints beyond U+10FFFF, which are guaranteed never to exist.
 		 */
 		if (bytes < 1 || 3 < bytes)
-			return bad_offset;
+			return true;
 
 		/* Do we *have* that many bytes? */
 		if (len < bytes)
-			return bad_offset;
+			return true;
 
 		/*
 		 * Place the encoded bits at the bottom of the value and compute the
@@ -1617,23 +1617,23 @@ static int find_invalid_utf8(const char *buf, int len)
 			codepoint <<= 6;
 			codepoint |= *buf & 0x3f;
 			if ((*buf++ & 0xc0) != 0x80)
-				return bad_offset;
+				return true;
 		} while (--bytes);
 
 		/* Reject codepoints that are out of range for the sequence length. */
 		if (codepoint < min_val || codepoint > max_val)
-			return bad_offset;
+			return true;
 		/* Surrogates are only for UTF-16 and cannot be encoded in UTF-8. */
 		if ((codepoint & 0x1ff800) == 0xd800)
-			return bad_offset;
+			return true;
 		/* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
 		if ((codepoint & 0xfffe) == 0xfffe)
-			return bad_offset;
+			return true;
 		/* So are anything in the range U+FDD0..U+FDEF. */
 		if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
-			return bad_offset;
+			return true;
 	}
-	return -1;
+	return false;
 }
 
 /*
@@ -1645,15 +1645,14 @@ static int find_invalid_utf8(const char *buf, int len)
 static int verify_utf8(struct strbuf *buf)
 {
 	int ok = 1;
-	long pos = 0;
+	size_t pos = 0;
 
 	for (;;) {
-		int bad;
+		size_t bad;
 		unsigned char c;
 		unsigned char replace[2];
 
-		bad = find_invalid_utf8(buf->buf + pos, buf->len - pos);
-		if (bad < 0)
+		if (!has_invalid_utf8(buf->buf + pos, buf->len - pos, &bad))
 			return ok;
 		pos += bad;
 		ok = 0;
-- 
2.54.0.490.gaeb18d0c26

^ permalink raw reply related

* [PATCH 1/2] apply: plug leak on "patch too large" error
From: Jeff King @ 2026-05-16  2:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <20260516021343.GA174647@coredump.intra.peff.net>

In apply_patch(), we return immediately if read_patch_file() returns an
error. Traditionally this was OK, since an error from strbuf_read()
would restore the strbuf to its unallocated state.

But since f1c0e3946e (apply: reject patches larger than ~1 GiB,
2022-10-25), we may also return an error if we successfully read the
patch but it is too large. In this case we leak the strbuf contents when
apply_patch() returns.

You can see it in action by running t4141 under LSan with the EXPENSIVE
prereq enabled.

We can fix this in one of two places:

  1. In read_patch_file(), we could release the buffer before returning
     the error, behaving more like a raw strbuf_read() call.

  2. In apply_patch(), we can release the strbuf ourselves before
     returning.

I picked the latter, since it future proofs us against read_patch_file()
getting new error modes. We also have a cleanup label in that function
already, so now our error handling at this spot matches the rest of the
function (and all of the variables are initialized such that the rest of
the cleanup is correctly a noop at this point).

Signed-off-by: Jeff King <peff@peff.net>
---
 apply.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index 4aa1694cfa..249248d4f2 100644
--- a/apply.c
+++ b/apply.c
@@ -4881,8 +4881,10 @@ static int apply_patch(struct apply_state *state,

 	state->patch_input_file = filename;
 	state->linenr = 1;
-	if (read_patch_file(&buf, fd) < 0)
-		return -128;
+	if (read_patch_file(&buf, fd) < 0) {
+		res = -128;
+		goto end;
+	}
 	offset = 0;
 	while (offset < buf.len) {
 		struct patch *patch;
-- 
2.54.0.490.gaeb18d0c26

^ permalink raw reply related

* Re: UBSan failing on expensive test(s)
From: Jeff King @ 2026-05-16  2:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <871pfcdyt0.fsf@gitster.g>

On Sat, May 16, 2026 at 08:43:07AM +0900, Junio C Hamano wrote:

> This started happening on 'next' that runs EXPENSIVE tests thanks to
> Dscho's recent updates to enable them in CI.
> 
> https://github.com/git/git/actions/runs/25896439353/job/76110441841#step:10:2172
> 
> It claims that """
> 
>     commit.c:1574:6: runtime error: signed integer overflow:
>     -2147483648 - 1 cannot be represented in type 'int'
> 
> """.
> 
> Another is related in the sense that it used to be hidden behind
> EXPENSIVE prerequisite, but is probably unrelated.
> 
> https://github.com/git/git/actions/runs/25896439353/job/76110441842#step:10:156

These patches should fix both.

  [1/2]: apply: plug leak on "patch too large" error
  [2/2]: commit: handle large commit messages in utf8 verification

 apply.c  |  6 ++++--
 commit.c | 31 +++++++++++++++----------------
 2 files changed, 19 insertions(+), 18 deletions(-)

-Peff

^ permalink raw reply

* [PATCH] pack-objects: fix typo in code comment
From: Clinton Phillips @ 2026-05-16  1:01 UTC (permalink / raw)
  To: git; +Cc: Clinton Phillips

"accomodate" -> "accommodate". Pure typo fix, no behavior change.

Signed-off-by: Clinton Phillips <clintdotphillips@gmail.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index dd2480a7..80606890 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1341,7 +1341,7 @@ static void write_pack_file(void)
 			 * length of them as buffer length.
 			 *
 			 * Note that we need to subtract one though to
-			 * accomodate for the sideband byte.
+			 * accommodate for the sideband byte.
 			 */
 			struct hashfd_options opts = {
 				.progress = progress_state,
-- 
2.49.0


^ permalink raw reply related

* [PATCH] Documentation: fix typos in user docs and release notes
From: Clinton Phillips @ 2026-05-16  1:00 UTC (permalink / raw)
  To: git; +Cc: Clinton Phillips

Pure typo fixes, no semantic change.

- RelNotes/2.49.1.adoc: "updates to to Fedora base image" -> "updates to Fedora base image"
- RelNotes/2.49.0.adoc: "which turns out that that the object" -> "which turns out that the object"
- howto/revert-branch-rebase.adoc: "I happen to know that that merge" -> "I happen to know that the merge"
- config/bitmap-pseudo-merge.adoc: "psuedo-merge" -> "pseudo-merge"
- technical/bitmap-format.adoc: "included in the this psuedo-merge" -> "included in this pseudo-merge"

Signed-off-by: Clinton Phillips <clintdotphillips@gmail.com>
---
 Documentation/RelNotes/2.49.0.adoc            | 2 +-
 Documentation/RelNotes/2.49.1.adoc            | 2 +-
 Documentation/config/bitmap-pseudo-merge.adoc | 2 +-
 Documentation/howto/revert-branch-rebase.adoc | 2 +-
 Documentation/technical/bitmap-format.adoc    | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/RelNotes/2.49.0.adoc b/Documentation/RelNotes/2.49.0.adoc
index 494c8309..35684515 100644
--- a/Documentation/RelNotes/2.49.0.adoc
+++ b/Documentation/RelNotes/2.49.0.adoc
@@ -118,7 +118,7 @@ Fixes since v2.48
    placeholder signal (e.g. "--option=<value>").
    (merge 5b34dd08d0 as/long-option-help-i18n later to maint).
 
- * CI jobs gave sporadic failures, which turns out that that the
+ * CI jobs gave sporadic failures, which turns out that the
    object finalization code was giving an error when it did not have
    to.
    (merge d7fcbe2c56 ps/object-collision-check later to maint).
diff --git a/Documentation/RelNotes/2.49.1.adoc b/Documentation/RelNotes/2.49.1.adoc
index c619e8b4..697f0f22 100644
--- a/Documentation/RelNotes/2.49.1.adoc
+++ b/Documentation/RelNotes/2.49.1.adoc
@@ -9,4 +9,4 @@ notes for v2.43.7 for details.
 
 It also contains some updates to various CI bits to work around
 and/or to adjust to the deprecation of use of Ubuntu 20.04 GitHub
-Actions CI, updates to to Fedora base image.
+Actions CI, updates to Fedora base image.
diff --git a/Documentation/config/bitmap-pseudo-merge.adoc b/Documentation/config/bitmap-pseudo-merge.adoc
index 1f264eca..a0604a41 100644
--- a/Documentation/config/bitmap-pseudo-merge.adoc
+++ b/Documentation/config/bitmap-pseudo-merge.adoc
@@ -88,4 +88,4 @@ more useful).
 
 bitmapPseudoMerge.<name>.stableSize::
 	Determines the size (in number of commits) of a stable
-	psuedo-merge bitmap. The default is `512`.
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/howto/revert-branch-rebase.adoc b/Documentation/howto/revert-branch-rebase.adoc
index a3e5595a..d56dde64 100644
--- a/Documentation/howto/revert-branch-rebase.adoc
+++ b/Documentation/howto/revert-branch-rebase.adoc
@@ -27,7 +27,7 @@ $ git checkout -b revert-c99 master
 Now I am on the 'revert-c99' branch.  Let's figure out which commit to
 revert.  I happen to know that the top of the 'master' branch is a
 merge, and its second parent (i.e. foreign commit I merged from) has
-the change I would want to undo.  Further I happen to know that that
+the change I would want to undo.  Further I happen to know that the
 merge introduced 5 commits or so:
 
 ------------------------------------------------
diff --git a/Documentation/technical/bitmap-format.adoc b/Documentation/technical/bitmap-format.adoc
index bfb0ec7b..59859c80 100644
--- a/Documentation/technical/bitmap-format.adoc
+++ b/Documentation/technical/bitmap-format.adoc
@@ -338,7 +338,7 @@ the end of a `.bitmap` file. The format is as follows:
 * One or more pseudo-merge bitmaps, each containing:
 
   ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
-     commits included in the this psuedo-merge.
+     commits included in this pseudo-merge.
 
   ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
      the set of objects reachable from all commits listed in the
-- 
2.49.0


^ permalink raw reply related

* Re: [RFC PATCH] approxidate: make "today" wrap to midnight
From: Junio C Hamano @ 2026-05-16  0:03 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: git, Jeff King
In-Reply-To: <20260515205803.26211-1-taahol@utu.fi>

Tuomas Ahola <taahol@utu.fi> writes:

> Although some commands do reject invalid approxidate expressions,
> in other cases those are simply evaluated as the current time.
> Oftentimes that is a perfectly good compromise to handle silly
> requests, but it isn't without rough edges.
>
> Let's consider what "git log --since=today" should yield.
> As it happens that "today" isn't actually a valid approxidate
> format, the command currently tries to list commits with
> *future* timestamps.  Perhaps it would make more sense if
> it returned the commits made since midnight---that is,
> during the current day.

I actually am of two minds about this.  

What should "git log --until=today" do when you run it in the late
afternoon?  Wouldn't you want to see what you did in the morning and
early afternoon?

Because we cannot define "today" as "--since=today means the latest
midnight and later, while --until=today means until the end of today
[*]" without introducing an extra hint to calls to approxidate() to
tell it who is calling for what, it is impossible to give these two
sensible behavior at the same time.

    Side note: but because the existing history is all about the past,
    "until the end of today" is by definition a synonym for "up to
    now", so defining "today" the same as "now" would make "until"
    behave just as sensibly as if we define it as "the end of today".

In practice, using "--until" to *not* truncate at all (which is what
--until=now or --until=end.of.today would essentially mean) has no
practical value, while "--since=beginning.of.today" does have more
utility, allowing you to specify "the last 8 hours and 50 minutes"
without knowing that it is now at 08:50 in the morning.  So I am
still in favor of interpreting "today" as "the latest midnight, the
beginning of today" because that would give us a more useful
behavior than other possible definitions.  We may want to strengthen
the justification behind the chosen definition of why we chose what
we chose over any other time in today with something like what I
said above, mentioning "--until".

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox