Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 05/11] reftable/block: fix OOB write with bogus inflated log size
From: Patrick Steinhardt @ 2026-06-29  7:08 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, oxsignal
In-Reply-To: <CAP8UFD0y0GVjdnWYDkOsk6R9-ReGfzr6ZEm8PbyHOHrdAETXzg@mail.gmail.com>

On Fri, Jun 26, 2026 at 09:48:36AM +0200, Christian Couder wrote:
> On Wed, Jun 24, 2026 at 10:24 AM Patrick Steinhardt <ps@pks.im> wrote:
> 
> > diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
> > index f4bded7d26..40274af5c0 100644
> > --- a/t/unit-tests/u-reftable-block.c
> > +++ b/t/unit-tests/u-reftable-block.c
> > @@ -456,3 +456,47 @@ void test_reftable_block__iterator(void)
> >         block_writer_release(&writer);
> >         reftable_buf_release(&data);
> >  }
> > +
> > +void test_reftable_block__corrupt_log_block_size(void)
> > +{
> > +       struct reftable_block_source source = { 0 };
> > +       struct block_writer writer = {
> > +               .last_key = REFTABLE_BUF_INIT,
> > +       };
> > +       struct reftable_record rec = {
> > +               .type = REFTABLE_BLOCK_TYPE_LOG,
> > +               .u.log = {
> > +                       .refname = (char *) "refs/heads/main",
> > +                       .update_index = 1,
> > +                       .value_type = REFTABLE_LOG_UPDATE,
> > +               },
> > +       };
> > +       struct reftable_block block = { 0 };
> > +       struct reftable_buf data;
> > +
> > +       data.len = 1024;
> > +       REFTABLE_CALLOC_ARRAY(data.buf, data.len);
> > +       cl_assert(data.buf != NULL);
> > +
> > +       cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_LOG,
> > +                                      (uint8_t *) data.buf, data.len,
> > +                                      0, hash_size(REFTABLE_HASH_SHA1)));
> > +       cl_must_pass(block_writer_add(&writer, &rec));
> > +       cl_assert(block_writer_finish(&writer) > 0);
> 
> It looks like some of the block writing code above could be simplified
> using an helper function like:
> 
> int cl_reftable_write_block(struct reftable_buf *buf, uint8_t block_type,
>                            size_t block_size, uint32_t header_off,
>                            struct reftable_record *recs, size_t nrecs)
> {
>        struct block_writer writer = {
>                .last_key = REFTABLE_BUF_INIT,
>        };
>        int block_end;
> 
>        REFTABLE_CALLOC_ARRAY(buf->buf, block_size);
>        cl_assert(buf->buf != NULL);
>        buf->len = block_size;
> 
>        cl_must_pass(block_writer_init(&writer, block_type, (uint8_t *) buf->buf,
>                                       block_size, header_off,
>                                       hash_size(REFTABLE_HASH_SHA1)));
>        for (size_t i = 0; i < nrecs; i++)
>                cl_must_pass(block_writer_add(&writer, &recs[i]));
> 
>        block_end = block_writer_finish(&writer);
>        cl_assert(block_end > 0);
> 
>        block_writer_release(&writer);
> 
>        return block_end;
> }
> 
> This function could be introduced by a preparatory commit in
> t/unit-tests/lib-reftable.{c,h}. It would be kind of similar to the
> existing cl_reftable_write_to_buf() helper in those files.
> 
> It looks like it could already simplify existing tests like:
> 
> - test_reftable_block__log_read_write
> - test_reftable_block__obj_read_write
> - test_reftable_block__ref_read_write
> - test_reftable_block__iterator
> 
> and it could simplify the new tests introduced by other patches in this series:
> 
> - 06/11 reftable/block: fix OOB read with bogus block size
> - 07/11 reftable/block: fix OOB read with bogus restart count
> - 09/11 reftable/block: fix OOB read with bogus restart offset

Good point, will do. Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH] history: streamline message preparation and plug file stream leak
From: Patrick Steinhardt @ 2026-06-29  6:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin
In-Reply-To: <xmqqecht8df1.fsf@gitster.g>

On Fri, Jun 26, 2026 at 09:38:42AM -0700, Junio C Hamano wrote:
> diff --git a/builtin/history.c b/builtin/history.c
> index 8dcb9a6046..f17ec049c0 100644
> --- a/builtin/history.c
> +++ b/builtin/history.c
> @@ -41,11 +41,6 @@ static int fill_commit_message(struct repository *repo,
>  		  " empty message aborts the commit.\n");
>  	struct wt_status s;
>  
> -	strbuf_addstr(out, default_message);
> -	strbuf_addch(out, '\n');
> -	strbuf_commented_addf(out, comment_line_str, hint, action, comment_line_str);
> -	write_file_buf(path, out->buf, out->len);
> -
>  	wt_status_prepare(repo, &s);
>  	FREE_AND_NULL(s.branch);
>  	s.ahead_behind_flags = AHEAD_BEHIND_QUICK;
> @@ -57,14 +52,20 @@ static int fill_commit_message(struct repository *repo,
>  	s.whence = FROM_COMMIT;
>  	s.committable = 1;
>  
> -	s.fp = fopen(git_path_commit_editmsg(), "a");

Here we reuse the local `path` variable, which already carries the
result of `git_path_commit_editmsg()`.

> +	s.fp = fopen(path, "w");
>  	if (!s.fp)
> -		return error_errno(_("could not open '%s'"), git_path_commit_editmsg());
> +		return error_errno(_("could not open '%s'"), path);

Likewise.

> +	strbuf_addstr(out, default_message);
> +	strbuf_addch(out, '\n');
> +	strbuf_commented_addf(out, comment_line_str, hint, action, comment_line_str);
> +	fwrite(out->buf, 1, out->len, s.fp);
>  
>  	wt_status_collect_changes_trees(&s, old_tree, new_tree);
>  	wt_status_print(&s);
>  	wt_status_collect_free_buffers(&s);
>  	string_list_clear_func(&s.change, change_data_free);
> +	fclose(s.fp);

This is fixing the leaked file descriptor.

One thing I wonder though is that we don't perform any error checking on
the file in the new version. Previously, we would have died in case
`write_file_buf()` failed. But now we just `fwrite()` without error
checking. I don't think that "wt-status.c" does error checking either,
so we might end up with a partially-written file without us noticing.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v5 0/4] history: add squash subcommand to fold a range
From: Patrick Steinhardt @ 2026-06-29  6:26 UTC (permalink / raw)
  To: phillip.wood; +Cc: Harald Nordgren via GitGitGadget, git, Harald Nordgren
In-Reply-To: <d37e8f4f-d1f9-45aa-8c95-ebe676d54671@gmail.com>

On Fri, Jun 26, 2026 at 09:52:57AM +0100, Phillip Wood wrote:
> Hi Harald
> 
> On 24/06/2026 22:54, Harald Nordgren via GitGitGadget wrote:
> > Adds git history squash <revision-range> to fold a range of commits.
> 
> It would be helpful to give a bit more detail here about the command so that
> the reader has an overview of what is actually being implemented.
> 
>  - what does it do with fixup!, squash! and amend! commits? Can it use
>    the message from amend! commits to reword the commit?
>  - can the user reword the commit message?

Good things to document/discuss.

>  - what happens if a merge commit inside the range has a parent outside
>    the range?

Yeah, I agree that we should punt on merge commits for now. They are a
can of worms, and I'm not sure that we should just squash them. I would
at least like the user to ask a flag that tells us that it's fine
squashing those.

>  - what happens to branches that point to commits inside the range?

Yeah, this should be documented indeed.

> I had a quick play and found that it accepts ranges that containing a single
> commit (e.g. @^!) where there is nothing to squash. It also accepts ranges
> that are not ancestors of HEAD (e.g. checkout master and run "git history
> squash --dry-run origin/seen^2^!") without printing an error message. Only
> accepting a single argument is quite limiting as one cannot say
> 
> 	git history squash ^:/base :/tip

Note that it is intentional that you can rewrite branches that are not
currently checked out, and the other subcommands work the same. So I'd
argue this should also be the case for "squash".

Patrick

^ permalink raw reply

* Re: [PATCH] reftable: fix unlikely leak on API error
From: Patrick Steinhardt @ 2026-06-29  6:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20260628090314.GA661068@coredump.intra.peff.net>

On Sun, Jun 28, 2026 at 05:03:14AM -0400, Jeff King wrote:
> If the reftable writer sees a bogus block size, we return with
> REFTABLE_API_ERROR, leaking the reftable_writer struct we previously
> allocated. Originally this case was a BUG(), but it became a regular
> return in 445f9f4f35 (reftable: stop using `BUG()` in trivial cases,
> 2025-02-18).
> 
> We could obviously fix it by calling "reftable_free(wp)". But we can
> observe that we never use the allocated "wp" until after we've validated
> the input options. So let's just bump the allocation down. That fixes
> the leak, and I think makes the flow of the function more logical
> (we validate our inputs before doing any work).

Another alternative would be to create a common exit path where we free
the structure when we're about to return an error. But that might not
even be worth it.

> diff --git a/reftable/writer.c b/reftable/writer.c
> index 0133b64975..1bd4aa388b 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -152,16 +152,16 @@ int reftable_writer_new(struct reftable_writer **out,
>  	struct reftable_write_options opts = {0};
>  	struct reftable_writer *wp;
>  
> -	wp = reftable_calloc(1, sizeof(*wp));
> -	if (!wp)
> -		return REFTABLE_OUT_OF_MEMORY_ERROR;
> -
>  	if (_opts)
>  		opts = *_opts;
>  	options_set_defaults(&opts);
>  	if (opts.block_size >= (1 << 24))
>  		return REFTABLE_API_ERROR;
>  
> +	wp = reftable_calloc(1, sizeof(*wp));
> +	if (!wp)
> +		return REFTABLE_OUT_OF_MEMORY_ERROR;
> +
>  	reftable_buf_init(&wp->block_writer_data.last_key);
>  	reftable_buf_init(&wp->last_key);
>  	reftable_buf_init(&wp->scratch);

Makes sense. There's another early return in this function, but there we
already know to free the writer.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 2/2] odb: introduce `odb_prepare()`
From: Patrick Steinhardt @ 2026-06-29  6:16 UTC (permalink / raw)
  To: Toon Claes; +Cc: git
In-Reply-To: <87o6gx4i5w.fsf@emacs.iotcl.com>

On Fri, Jun 26, 2026 at 02:09:47PM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/odb.h b/odb.h
> > index c14c9030e4..b1c0f3767b 100644
> > --- a/odb.h
> > +++ b/odb.h
> > @@ -133,9 +133,13 @@ enum odb_prepare_flags {
> >  };
> >  
> >  /*
> > - * Clear caches, reload alternates and then reload object sources so that new
> > - * objects may become accessible.
> > + * Prepare the object database for use. Calling this function is generally not
> > + * needed, but can be useful in case the caller wants to pre-open individual
> > + * sources.
> >   */
> > +void odb_prepare(struct object_database *o, enum odb_prepare_flags flags);
> > +
> > +/* Equivalent to `odb_prepare(o, ODB_PREPARE_FLUSH_CACHES)`. */
> >  void odb_reprepare(struct object_database *o);
> 
> According to my grep results are there 17 callsites for odb_reprepare(),
> then I agree it makes sense to create this wrapper.

Yeah, I was a bit torn myself whether or not to keep the wrapper. I
eventually decided to just keep it because it reduces churn, and it's a
trivial wrapper anyway.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 1/2] odb/source: generalize `reprepare()` callback
From: Patrick Steinhardt @ 2026-06-29  6:16 UTC (permalink / raw)
  To: Toon Claes; +Cc: git
In-Reply-To: <87ldc14i4n.fsf@emacs.iotcl.com>

On Fri, Jun 26, 2026 at 02:10:32PM +0200, Toon Claes wrote:
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 8080d1bf5e..7361bf071e 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -1361,10 +1360,8 @@ int cmd_grep(int argc,
> >  			struct odb_source *source;
> >  
> >  			odb_prepare_alternates(the_repository->objects);
> > -			for (source = the_repository->objects->sources; source; source = source->next) {
> > -				struct odb_source_files *files = odb_source_files_downcast(source);
> 
> So you're downcasting inside the implementation by the backends itself.
> That makes sense, but would it be worth to say something about that in
> the commit message?

Hm. Would that provide much value? I'm probably quite a bit biased here,
but I think that it's implicit that the backends have to eventually cast
the generic structure to their own backend.

So I wouldn't really know how to clarify this. Did you have anything
specific in mind?

> > diff --git a/odb/source-packed.c b/odb/source-packed.c
> > index 42c28fba0e..fa5a072488 100644
> > --- a/odb/source-packed.c
> > +++ b/odb/source-packed.c
> > @@ -15,7 +15,7 @@ static int find_pack_entry(struct odb_source_packed *store,
> >  {
> >  	struct packfile_list_entry *l;
> >  
> > -	odb_source_packed_prepare(store);
> > +	odb_source_prepare(&store->base, 0);
> 
> Why are you not using ODB_PREPARE_FLUSH_CACHES here? It used to do
> before?

Because this was calling `odb_source_packed_prepare()` before, not
`odb_source_reprepare()`. So this was calling the non-flushing variant.

> >  	if (store->midx && fill_midx_entry(store->midx, oid, e))
> >  		return 1;
> >  
> > @@ -47,7 +47,7 @@ static int odb_source_packed_read_object_info(struct odb_source *source,
> >  	 * been added since the last time we have prepared the packfile store.
> >  	 */
> >  	if (flags & OBJECT_INFO_SECOND_READ)
> > -		odb_source_reprepare(source);
> > +		odb_source_prepare(source, ODB_PREPARE_FLUSH_CACHES);
> 
> I think the new code is correct, but why wasn't `packed` used here in
> the past? The old odb_source_reprepare() expected a downcasted, didn't
> it?

No, `odb_source_reprepare()` is the generic variant. The naming schema
is typically:

  - `odb_source_frobnicate()` for the generic variants, which receive a
    `struct odb_source` as input.

  - `odb_source_<type>_frobnitcate()` for their backend-specific
    implementations, which cast down the generic `struct odb_source` to
    their backend-specific struct.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH] meson: wire up USE_NSEC build knob
From: Patrick Steinhardt @ 2026-06-29  6:08 UTC (permalink / raw)
  To: Jeff King
  Cc: D. Ben Knoble, git, brian m . carlson, Junio C Hamano,
	Ramsay Jones
In-Reply-To: <20260628081806.GA3594700@coredump.intra.peff.net>

On Sun, Jun 28, 2026 at 04:18:06AM -0400, Jeff King wrote:
> On Mon, Jun 22, 2026 at 10:13:21AM +0200, Patrick Steinhardt wrote:
> 
> > > So I guess if we wanted to go further it would take some digging as to
> > > how each platform behaves, and then flipping the config.make.uname knob
> > > for ones where it can be argued that the behavior is always reasonable.
> > 
> > Yeah, it would be nice indeed to figure out whether these concerns still
> > apply. If they do, I would argue that it might even make sense to remove
> > the build option completely. It doesn't really make sense in my opinion
> > to have a build option that nobody uses and that is subtly broken when
> > enabled.
> 
> I suspect it works just fine on some platforms and some filesystems
> (i.e., those that actually store nanoseconds on disk). So probably Linux
> with ext4 is OK. That's just guessing, though.
> 
> If I understand the original problem correctly, then doing this:
> 
>   touch foo
>   ls --full-time foo
>   echo 3 | sudo tee /proc/sys/vm/drop_caches
>   ls --full-time foo
> 
> should be instructive. If it shows the same time for both "ls" calls,
> then USE_NSEC would be fine. If it doesn't, then the system is losing
> the nanosecond information when it drops the cache and has to reload
> from disk (and thus USE_NSEC would cause spurious stat mismatches).
> 
> On my ext4 system, I get the same answers. So far so good.
> 
> I get the same answers with a loopback-mounted ext2 system. Which
> surprised me a bit, but even unmounting and remounting the filesystem,
> the nanosecond times are still there. So...I guess ext2 supports
> nanoseconds.
> 
> I tried with a vfat mount, and it also works: we don't have nanoseconds
> either before or after. That makes sense, and implies that modern Linux
> will always be OK (because it limits the cached VFS response to what the
> underlying filesystem can handle).
> 
> So...maybe this is just a non-issue these days, at least on Linux?
> 
> > > But that's all outside the scope of your patch here.
> > 
> > Kind of, I guess. If we figure that this mechanism is still subtly broken
> > then I'd argue that it doesn't make sense to expose the option via
> > Meson.
> 
> True, but AFAICT it probably is safe these days, at least one some
> platforms.

Hm. That makes me wonder whether it is the completely wrong approach to
make this a build option then. If it works on some systems and only on
some filesystems, then a build option is just too coarse-grained. A
distro wouldn't really be able to ever enable the option, unless it knew
that repositories will only ever exist on a filesystem that works. Which
I guess is an assumption that no distro can make.

So instead, I wonder whether we should treat this the same as for
example "core.ignoreCase", where we only use nanosecond resolution when
opted in by the user. Ideally, if we had a way to detect brokenness, we
could even make git-init(1) set it automatically.

If so, we could unconditionally enable nanoseconds on platforms that
support them, but still have a runtime toggle for filesystems that
don't.

Patrick

^ permalink raw reply

* Re: [PATCH v4 1/1] environment: move excludes_file into repo_config_values
From: Christian Couder @ 2026-06-29  6:03 UTC (permalink / raw)
  To: Tian Yuchen
  Cc: Junio C Hamano, git, cirnovskyv, szeder.dev, Ayush Chandekar,
	Olamide Caleb Bello
In-Reply-To: <18ad7c1c-5ddc-4f62-ba7c-5cda53f5a48d@malon.dev>

On Sun, Jun 28, 2026 at 2:58 PM Tian Yuchen <cat@malon.dev> wrote:
>
> On 6/28/26 16:40, Junio C Hamano wrote:
> > Tian Yuchen <cat@malon.dev> writes:
> >
> >>> Wouldn't we rather want to try to be more strict and say
> >>>
> >>>     if (!repo || !repo->initialized)
> >>>             BUG("repo must be an initialied repository");
> >>>
> >>> here?  Aren't all the callers of this function supposed to be
> >>> dealing with an already initialized repository?
> >>
> >> That makes sense, but from my point of view...
> >>
> >> 'repo_config_values()' already has a check for 'repo->initialized'. If
> >> we're absolutely certain that the 'repo' is initialized, wouldn't it be
> >> better to simply remove all the checks inside the getter and leave the
> >> judgment to 'repo_config_values()'?
> >
> > Yes, that was what I was getting at ;-).
>
> A lot of CI tests are failing, but that just goes to show that the
> "bugs" are being properly identified, doesn’t it?
>
> It means there are a lot of "invalid" calls in the tests (if the way we
> define a 'valid' call, i.e. repo must be initialized, is correct)... It
> seems that code like 'if (repo != the_repository) return' or something
> similar is inevitably going to end up somewhere, even though, as you
> said, it’s "sweeping problems under the rug."
>
> I’m not sure how to proceed from here either..

I agree that the best end state would be to have no `if (!repo ||
!repo->initialized)` check, but we shouldn't have to get there right
away. I think it's fine to proceed in several steps:

1) `if (!repo || !repo->initialized) return NULL;` allows us to remove
the global variable and use getters which will help us in the next
step.

2) `if (!repo || !repo->initialized) return BUG("repo must be an
initialized repository");` now we want to find and fix callers
(including tests) that haven't properly initialized things.

3) No `if (!repo || !repo->initialized)` check, as we are sure that
all the callers that didn't properly initialized things have been
found and fixed.

So I think 1) is fine for now as long as we properly explain in the
commit messages and in code comments (maybe using NEEDSWORK comments)
that we know there is more work to do on this in the future.

^ permalink raw reply

* Re: [PATCH 3/3] t5551: pack refs after creating many tags
From: Patrick Steinhardt @ 2026-06-29  5:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Montalbo, git, Junio C Hamano
In-Reply-To: <20260628080710.GC107826@coredump.intra.peff.net>

On Sun, Jun 28, 2026 at 04:07:10AM -0400, Jeff King wrote:
> We have two tests that create 2,000 and 100,000 tags respectively.
> After doing so, the resulting state can be a bit slow to work with when
> using the "files" ref backend, as each of those refs is in its own file.
> 
> This isn't a very realistic scenario, as we'd expect most of those refs
> to be packed. If they accrue over time along with objects, they'd get
> packed by maintenance/gc runs. And if you have a process that creates a
> ton of refs at once (like a big fast-import), the usual recommendation
> is to run maintenance afterwards.
> 
> So let's follow that recommendation and pack the refs ourselves.
> Unfortunately, this does not seem to produce an improvement to the
> run-time of the test script! That's because after producing this state,
> we perform only a few fetches of it. And packing the refs costs at least
> as much as serving a ref advertisement (both have to iterate the refs,
> but packing additionally must write .lock files as we pack).

> My wall-clock time was slightly improved (but within the noise) with
> this patch, but my user and system CPU time were slightly worse!
> However, on a loaded system with I/O bottlenecks, it may be a net win.
> That's somewhat of a guess, though.
> 
> It would be nice if we had a way to generate all of these refs without
> writing so many individual files. But even if we taught the ref code to
> write large cases directly to the packed-refs file, we'd still need to
> take individual locks. The real solution is a backend like reftable,
> which shaves ~30% off of the test runtime.

We kind of already have this with the `REF_TRANSACTION_FLAG_INITIAL`
flag, but right now it is only used when performing a clone or when
migrating references. Also, it requires an empty repository that has no
references yet.

It raises the question whether we could also extend git-fast-import(1)
to use it, as it would typically be run on an almost-empty repository.
It's the "almost" that kills it though, as we already do have at least
the HEAD reference. So it could be feasible, but it's not as trivial as
just setting the flag and then we're magically faster.

And besides, in this particular test here we run git-fast-import(1)
multiple times in the same repository, so it wouldn't help us.

We could of course extend all of this so that Git is able to write into
the packed-refs directly, even with preexisting refs. But I agree with
your sentiment: it doesn't feel worth it as the reftable backend fixes
scenarios like this anyway.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I'm iffy on whether this one is worth it.
> 
> If you apply just this patch without patch 2, then the run-time does
> improve quite a bit. The cost of packing is amortized by the improved
> performance for all of those subsequent tests (but after patch 2, they
> never even see the unpacked state).
> 
> Likewise, I suspect this would make our timeout problems go away even
> without patch 1.
> 
> So the whole series _could_ be reduced to just this one patch. But
> hopefully the reasoning given in the earlier patches makes sense, at
> which point this one is kind of superfluous.

Agreed. I'd just merge the first two patches and drop this one here.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v6 3/4] history: add squash subcommand to fold a range
From: Junio C Hamano @ 2026-06-29  5:50 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <811e393ab48e2f79e6f8c78d883a5b92311791b3.1782635349.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -11,6 +11,7 @@ SYNOPSIS
>  git history fixup <commit> [--dry-run] [--update-refs=(branches|head)] [--reedit-message] [--empty=(drop|keep|abort)]
>  git history reword <commit> [--dry-run] [--update-refs=(branches|head)]
>  git history split <commit> [--dry-run] [--update-refs=(branches|head)] [--] [<pathspec>...]
> +git history squash <revision-range> [--dry-run] [--update-refs=(branches|head)] [--reedit-message]

Not your fault at all, as there are existing violators but the
gitcli tells us that the canonical order of arguments on the command
line is to have dashed options early before any real arguments.

Seeing <commit> followed by --dry-run, --update-refs, etc. with the
existing subcommands should also be corrected and this new command
should not spread the existing violation by mimicking.  The revision
range (which is not just a single token A..B, but the usual range
notation that rev-list takes, like ^A B..C) should come after all
these dashed options.

> +The range is given in the usual `<base>..<tip>` form, where _<base>_ is
> +the commit just below the oldest commit to squash. For example, `git
> +history squash @~3..` folds the three most recent commits into one, and

It is OK to use in your personal development, but in the official
documentation, avoid cryptic @ and always spell HEAD (except for a
single place that explain that @ can stand in for HEAD, for obvious
reasons).  It is extremely annoying and confusing to read as these
look so similar to reflog notation for the current branch, e.g., @~1
vs @{1}.

So "squash HEAD~3..HEAD" specifies a range of commits HEAD~2, HEAD~1
and HEAD (three commits), these are squashed into one commit on top
of HEAD~3, creating a sibling to HEAD~2 whose tree matches HEAD.

OK.  Looking good.

^ permalink raw reply

* Re: [PATCH v6 4/4] history: re-edit a squash with every message
From: Junio C Hamano @ 2026-06-29  5:50 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget; +Cc: git, Harald Nordgren
In-Reply-To: <4edf012b77fd2f2fb2a51eb10863bbf852fffa40.1782635349.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Harald Nordgren <haraldnordgren@gmail.com>
>
> By default "git history squash" reuses the oldest commit's message.
> When --reedit-message is given it only reopened that one message, so the
> messages of the folded-in commits were lost.
>
> Gather the messages of every commit in the range, oldest first, and use
> them as the editor template when re-editing, mirroring how "git rebase
> -i" presents a squash.

I doubt it would make practical difference, but one thing I notice
is that unlike "git rebase -i", this one does not intersperse
markers like "# This is the 1st commit message" in between the
messages taken from the squashed commits, so it is not exactly
"mirroring".

^ permalink raw reply

* Re: [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers
From: SZEDER Gábor @ 2026-06-29  5:25 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget
  Cc: git, Derrick Stolee, Elijah Newren, Kristofer Karlsson
In-Reply-To: <4db485b48aae810eeba28ea4feb47401ab352e88.1782649547.git.gitgitgadget@gmail.com>

On Sun, Jun 28, 2026 at 12:25:44PM +0000, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> 
> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
> unused after the previous commit. The core nonstale_queue functions
> remain in use by ahead_behind().

Please squash this patch into the previous one.  Since the last
callers of these static functions went away in that commit, it can't
be built with DEVELOPER=1:

  commit-reach.c:91:23: warning: ‘nonstale_queue_get_dedup’ defined but not used [-Wunused-function]
     91 | static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
        |                       ^~~~~~~~~~~~~~~~~~~~~~~~
  commit-reach.c:82:13: warning: ‘nonstale_queue_put_dedup’ defined but not used [-Wunused-function]
     82 | static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
        |             ^~~~~~~~~~~~~~~~~~~~~~~~

> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>  commit-reach.c | 18 ------------------
>  1 file changed, 18 deletions(-)
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index 9ae306f60c..176ffd68d0 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
>  	queue->max_nonstale = NULL;
>  }
>  
> -static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
> -				     struct commit *c)
> -{
> -	if (c->object.flags & ENQUEUED)
> -		return;
> -	c->object.flags |= ENQUEUED;
> -	nonstale_queue_put(queue, c);
> -}
> -
> -static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
> -{
> -	struct commit *commit = nonstale_queue_get(queue);
> -
> -	if (commit)
> -		commit->object.flags &= ~ENQUEUED;
> -	return commit;
> -}
> -
>  /*
>   * Priority queue with per-side commit counters for paint_down_to_common().
>   * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
> -- 
> gitgitgadget
> 

^ permalink raw reply

* Re: [PATCH 2/3] t5551: put many-tags case into its own repo
From: Jeff King @ 2026-06-29  0:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Montalbo, Patrick Steinhardt, git
In-Reply-To: <xmqqh5mm1gsf.fsf@gitster.g>

On Sun, Jun 28, 2026 at 02:44:32PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> > index e236e526f0..cd851f24b8 100755
> > --- a/t/t5551-http-fetch-smart.sh
> > +++ b/t/t5551-http-fetch-smart.sh
> > @@ -397,15 +397,16 @@ create_tags () {
> >  }
> >  
> >  test_expect_success 'create 2,000 tags in the repo' '
> > +	git init "$HTTPD_DOCUMENT_ROOT_PATH/many-tags.git" &&
> >  	(
> > -		cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
> > +		cd "$HTTPD_DOCUMENT_ROOT_PATH/many-tags.git" &&
> >  		create_tags 1 2000
> >  	)
> >  '
> 
> While all the other repositories used in this tests are bare
> repositories, this new one is a non-bare repository.
> 
> It shouldn't make any difference, but since I noticed it...

Ah, yeah. It should work either way, but it is slightly confusing for it
to be non-bare. I'll wait to re-send (though if nothing else comes up,
it may be simpler for you to just amend on your side).

-Peff

^ permalink raw reply

* Re: [PATCH] meson: wire up USE_NSEC build knob
From: brian m. carlson @ 2026-06-29  0:23 UTC (permalink / raw)
  To: Jeff King
  Cc: Patrick Steinhardt, D. Ben Knoble, git, Junio C Hamano,
	Ramsay Jones
In-Reply-To: <20260628084815.GA111587@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 2138 bytes --]

On 2026-06-28 at 08:48:15, Jeff King wrote:
> Oh, I also ran across this old thread:
> 
>   https://public-inbox.org/git/5605D88A.20104%40gmail.com/
> 
> that implies similar:
> 
>   * In-core file times may not be properly rounded to on-disk
>     precision, causing spurious file time changes when the cache is
>     refreshed from disk. This was fixed for typical Unix file systems
>     in kernel 2.6.11. The fix for CEPH, CIFS, NTFS, UFS and FUSE will
>     be in kernel 4.3. There's no fix for FAT-based file systems yet.
> 
> I also tested with CIFS on my system and it is fine. It looks like FAT
> systems were fixed since 2015. ;)
> 
> But there is another interesting question raised there, which is how
> different implementations may interact (e.g., two versions of Git
> without and without USE_NSEC, or JGit which may have to use
> millisecond-resolution APIs, etc). It should all work correctly as long
> as each implementation consistently uses its own resolution (so JGit
> would have to compare in millisecond-space and treat ties as racy). And
> I think that is _probably_ what is happening now, since we already store
> nanoseconds unconditionally (and only use them with USE_NSEC).
> 
> Though the opposite case is a performance problem but not a correctness
> one: if JGit writes out an index with milliseconds and USE_NSEC Git
> tries to read it, we will consider everything stat-dirty and re-read the
> contents.
> 
> I don't know if these would be a problem in practice or not, but it's an
> interesting potential gotcha. And one that nobody may have noticed,
> because probably hardly anybody bothers to build with USE_NSEC now.

I would suggest that we provide a config knob and then build with
USE_NSEC by default.  Most people are using Linux with typical Unix file
systems, NTFS, CIFS, or FUSE (e.g., sshfs).  In the event someone
detects a problem, there's an easy solution—adjust the knob—and we can
then add a Linux-specific statfs call to determine if the file system is
a safe one in a future version.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 325 bytes --]

^ permalink raw reply

* Re: [PATCH GSoC v14 09/13] serve: advertise object-info feature
From: Karthik Nayak @ 2026-06-28 22:55 UTC (permalink / raw)
  To: Pablo Sabater
  Cc: git, chandrapratap3519, chriscool, eric.peijian, gitster,
	jltobler, peff, toon, Calvin Wan, Jonathan Tan
In-Reply-To: <CAN5EUNTrdNArd5SX9df6x9bOhRfzE4c7dLOuNu7ONUdn4TLsUA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3367 bytes --]

Pablo Sabater <pabloosabaterr@gmail.com> writes:

> El sáb, 27 jun 2026 a las 0:23, Karthik Nayak
> (<karthik.188@gmail.com>) escribió:
>>
>> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>>
>> > From: Calvin Wan <calvinwan@google.com>
>> >
>> > In order for a client to know what object-info components a server can
>> > provide, advertise supported object-info features. This will allow a
>> > client to decide whether to query the server for object-info or fetch
>> > as a fallback.
>> >
>> > Helped-by: Jonathan Tan <jonathantanmy@google.com>
>> > Helped-by: Christian Couder <chriscool@tuxfamily.org>
>> > Signed-off-by: Calvin Wan <calvinwan@google.com>
>> > Signed-off-by: Eric Ju <eric.peijian@gmail.com>
>> > Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
>> > ---
>> >  serve.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/serve.c b/serve.c
>> > index 49a6e39b1d..2b07d922b3 100644
>> > --- a/serve.c
>> > +++ b/serve.c
>> > @@ -89,7 +89,7 @@ static void session_id_receive(struct repository *r UNUSED,
>> >       trace2_data_string("transfer", NULL, "client-sid", client_sid);
>> >  }
>> >
>> > -static int object_info_advertise(struct repository *r, struct strbuf *value UNUSED)
>> > +static int object_info_advertise(struct repository *r, struct strbuf *value)
>> >  {
>> >       if (advertise_object_info == -1 &&
>> >           repo_config_get_bool(r, "transfer.advertiseobjectinfo",
>> > @@ -97,6 +97,9 @@ static int object_info_advertise(struct repository *r, struct strbuf *value UNUS
>> >               /* disabled by default */
>> >               advertise_object_info = 0;
>> >       }
>> > +     /* Currently only size is supported */
>> > +     if (value && advertise_object_info)
>> > +             strbuf_addstr(value, "size");
>>
>> So is the plan that further options will be added here to value? If so,
>> whats the format we will follow?
>
> Hi!
> The current documented format is at `gitprotocol-v2.adoc`, however I
> think it could be improved. I have a more complete version in the
> not-yet-sent %(objecttype) support series, but since the question
> comes up here, I will update the format documentation in this series
> for size only:
>

Ah nice, I didn't see that.

> oid <oid>
>   Indicates to the server an object which the client wants to obtain
> - information for.
> + information for. They must be full object IDs.
>
> - info = PKT-LINE(attrs) LF)
> + info = PKT-LINE(attrs LF)
>        *PKT-LINE(obj-info LF)
>
>   attrs = attr | attrs SP attrs
>
> + obj-size = 1*DIGIT
> +
>   attr = "size"
>
> - obj-info = obj-id SP obj-size
> + obj-info = obj-id SP [obj-size]
> +
> +If the server does not recognize the object id, the response will be
> +`obj-id SP` regardless of the number of attributes requested.
>
> About the names `size` and future ones `type` they are arbitrarily
> chosen, so for example: `delta:base` could be `delta`. They are
> appended to the buffer so in case of adding `type`, it would look
> like:
>
> strbuf_addstr(value, "size type");
>
> What do you think?
>

I didn't know this part was already documented, so its all good :)

> Thanks for the review,
> Pablo.
>
>>
>> >       return advertise_object_info;
>> >  }
>> >
>> >
>> > --
>> > 2.54.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [PATCH] http: accept https:// proxies again
From: Junio C Hamano @ 2026-06-28 22:27 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Aliwoto, Johannes Schindelin
In-Reply-To: <xmqqjyrj2qsp.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> From this function nothing returns an error anymore, and looking at
>> the preimage of 663d7abe (http: reject unsupported proxy URL
>> schemes, 2026-05-05) that is the source of the bug, the original did
>> not do anything when the corresponding code did not find and set any
>> proxy settings, either.
>>
>> So perhaps it is a better fix to make it just a function that
>> returns void with early returns?
>
> Nah, I was being stupid.  Disregard the above.
>
> The whole point of 663d7abe was that we wanted to reject what we did
> not recognise, and we cannot do so without returning "good/bad" from
> that function.  The bug was that we did recognise https:// but still
> returned -1 because of the bug, which the patch in the thread fixed.

And as an important bugfix, this patch of course has been
fast-tracked.  I'll make sure we have it in 'master' before Git 2.55
gets tagged.

Thanks.

^ permalink raw reply

* Re: [PATCH 2/3] t5551: put many-tags case into its own repo
From: Junio C Hamano @ 2026-06-28 21:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Montalbo, Patrick Steinhardt, git
In-Reply-To: <20260628080345.GB107826@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> index e236e526f0..cd851f24b8 100755
> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -397,15 +397,16 @@ create_tags () {
>  }
>  
>  test_expect_success 'create 2,000 tags in the repo' '
> +	git init "$HTTPD_DOCUMENT_ROOT_PATH/many-tags.git" &&
>  	(
> -		cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
> +		cd "$HTTPD_DOCUMENT_ROOT_PATH/many-tags.git" &&
>  		create_tags 1 2000
>  	)
>  '

While all the other repositories used in this tests are bare
repositories, this new one is a non-bare repository.

It shouldn't make any difference, but since I noticed it...

^ permalink raw reply

* Re: [PATCH 3/3] t5551: pack refs after creating many tags
From: Junio C Hamano @ 2026-06-28 21:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Montalbo, Patrick Steinhardt, git
In-Reply-To: <20260628080710.GC107826@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> So let's follow that recommendation and pack the refs ourselves.
> Unfortunately, this does not seem to produce an improvement to the
> run-time of the test script! That's because after producing this state,
> we perform only a few fetches of it. And packing the refs costs at least
> as much as serving a ref advertisement (both have to iterate the refs,
> but packing additionally must write .lock files as we pack).

Testing a pathological set-up with too many loose refs may have
extra value, as long as we are also testing the recommended set-up,
so ideally we should have both ;-) but if we have to pick only one
and drop the other, we probably should be testing the packed case.

> I'm iffy on whether this one is worth it.

I am ambivalent, too, about this change for the purpose of the
"yeek, apache times out while enumerating refs" issue.  But see
above ;-)

^ permalink raw reply

* Re: 2.54.0: fyi: endless loop at 100% CPU
From: Steffen Nurpmeso @ 2026-06-28 16:37 UTC (permalink / raw)
  To: Michael Montalbo; +Cc: git, Steffen Nurpmeso
In-Reply-To: <CAC2Qwm+v2pRp30TYJpy8Wxzb7gbX+nzybZ_3A99cHb-xjjpCnQ@mail.gmail.com>

Michael Montalbo wrote in
 <CAC2Qwm+v2pRp30TYJpy8Wxzb7gbX+nzybZ_3A99cHb-xjjpCnQ@mail.gmail.com>:
 |On Sat, Jun 27, 2026 at 1:16 PM Steffen Nurpmeso <steffen@sdaoden.eu> \
 |wrote:
 |>
 |> Thanks for these pointers, i did not know about such configuration
 |> variables.  I will set them like you show.
 |
 |No problem! Just to clarify, I'm not sure you should actually use those
 |configuration values verbatim. I was more pointing in the direction of
 |potentially relevant options for debugging / working around the issue.

We'll see.  But if there was some kind of "with love from canada"
misconfiguration -- i have seen quite a bit of those, and
permanent sub-second page reload was one of those effects, in
a browser though .. and git has a little road 'till it gets to
that stage (i hope) -- then maybe these settings .. Or i have to
tweak.

Restartable "fetch" is likely not on that roadmap of git -- that
would (have) be(en) so cool.  (But for years i now have
a WireGuard VPN and go through that, which has improved my TCP
connectivity / stability massively.  But it is still a thriller to
go for some rate-limited fetch of large size ..)

Oh.  Maybe i see.

  $ git ls-remote https://gitlab.xiph.org/xiph/opus.git
  fatal: unable to access 'https://gitlab.xiph.org/xiph/opus.git/': Operation too slow. Less than 1000 bytes/sec transferred the last 10 seconds
  $ git ls-remote https://gitlab.xiph.org/xiph/opus.git
  fatal: unable to access 'https://gitlab.xiph.org/xiph/opus.git/': Operation too slow. Less than 500 bytes/sec transferred the last 10 seconds
  $ git ls-remote https://gitlab.xiph.org/xiph/opus.git
  fatal: unable to access 'https://gitlab.xiph.org/xiph/opus.git/': Operation too slow. Less than 500 bytes/sec transferred the last 21 seconds

It succeeds with

  lowSpeedLimit = 500
  lowSpeedTime = 33

But at least it does not busy loop:

  steffen   2960  2959   0  0.0   2738   2016 S+   00:00:00 18:26 pts/4    /usr/lib/git-core/git remote-https https://gitlab.xiph.org/xiph/opus.git https://gitlab.xiph.org/xiph/opus.git
  steffen   2961  2960   0  0.0  41213  11316 S+   00:00:00 18:26 pts/4    /usr/lib/git-core/git-remote-https https://gitlab.xiph.org/xiph/opus.git https://gitlab.xiph.org/xiph/opus.git

Unfortunately no info where and why it busy looped.  If it would
be non-blocking I/O and .. in this area, one could understand
a bit.  Some ticks-per-sec limit (without X progress) is not
configurable?  I am not really keen to create a rlimit wrapper
for git, or whatever.  (I have limiting cgroups, but still.)

A nice Sunday,
Ciao and greetings from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

^ permalink raw reply

* Re: [PATCH v4 0/8] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-28 15:16 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.v4.git.1782649547.gitgitgadget@gmail.com>

On 6/28/26 8:25 AM, Kristofer Karlsson via GitGitGadget wrote:
> commit-reach: terminate merge-base walk when one paint side is exhausted
> 
> Optimize paint_down_to_common() for merge-base queries that hit large
> one-sided histories.
> 
> When the walk from one side reaches a commit with a very low generation
> number that the other side never paints, the walk is forced to drain most of
> the graph. A common trigger is a repository import that grafts a separate
> history with its own root, but any merge that introduces a low-generation
> commit never painted by the other side has the same effect.

> Changes since v3:
> 
>   * Fixed BUG assertion that was accidentally made unconditional in v3:
>     restored the min_generation guard so it only fires when generation-based
>     ordering is active.
> 
>   * Moved generation cutoff and single-result termination conditions into the
>     documentation in patch 1/8, since they describe existing behavior.
> 
>   * Renamed paint_state counter fields for clarity: p1_count ->
>     parent1_count, p2_count -> parent2_count, pending_merge_bases ->
>     mb_candidate_count. Changed counter types from int to size_t. (Suggested
>     by Rene Scharfe.)

I reviewed the v3 discussion, the range-diff, and reread patch 8. I think
that this version is good to go.

Thanks for your hard work!
-Stolee


^ permalink raw reply

* Re: [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get()
From: Derrick Stolee @ 2026-06-28 15:15 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Elijah Newren, Kristofer Karlsson
In-Reply-To: <8dd15d44e6a60fc39bbf6d894628507e839f9248.1782649547.git.gitgitgadget@gmail.com>

On 6/28/26 8:25 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
...> @@ -138,11 +140,23 @@ static void paint_queue_put(struct paint_state *state,
>   static struct commit *paint_queue_get(struct paint_state *state)
>   {
>   	struct commit *commit = prio_queue_get(&state->queue);
> +	timestamp_t generation;
>   
>   	if (!commit)
>   		return NULL;
>   
>   	commit->object.flags &= ~ENQUEUED;
> +	generation = commit_graph_generation(commit);
> +
> +	if (state->min_generation && generation > state->last_gen)
> +		BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
> +		    generation, state->last_gen,
> +		    oid_to_hex(&commit->object.oid));
> +	state->last_gen = generation;
> +
> +	/* generation cutoff */
> +	if (generation < state->min_generation)
> +		return NULL;

...

> -		if (min_generation && generation > last_gen)
> -			BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
> -			    generation, last_gen,
> -			    oid_to_hex(&commit->object.oid));
> -		last_gen = generation;
> -
> -		if (generation < min_generation)
> -			break;

I'm just stopping in to say that this looks like a clean code move
in this version, without mutating this chunk in the previous patch.

LGTM.
-Stolee

^ permalink raw reply

* Re: [PATCH v4 1/1] environment: move excludes_file into repo_config_values
From: Tian Yuchen @ 2026-06-28 12:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, cirnovskyv, szeder.dev, Christian Couder, Ayush Chandekar,
	Olamide Caleb Bello
In-Reply-To: <xmqqbjcv2h3j.fsf@gitster.g>

On 6/28/26 16:40, Junio C Hamano wrote:
> Tian Yuchen <cat@malon.dev> writes:
> 
>>> Wouldn't we rather want to try to be more strict and say
>>>
>>> 	if (!repo || !repo->initialized)
>>> 		BUG("repo must be an initialied repository");
>>>
>>> here?  Aren't all the callers of this function supposed to be
>>> dealing with an already initialized repository?
>>
>> That makes sense, but from my point of view...
>>
>> 'repo_config_values()' already has a check for 'repo->initialized'. If
>> we're absolutely certain that the 'repo' is initialized, wouldn't it be
>> better to simply remove all the checks inside the getter and leave the
>> judgment to 'repo_config_values()'?
> 
> Yes, that was what I was getting at ;-).

A lot of CI tests are failing, but that just goes to show that the 
"bugs" are being properly identified, doesn’t it?

It means there are a lot of "invalid" calls in the tests (if the way we 
define a 'valid' call, i.e. repo must be initialized, is correct)... It 
seems that code like 'if (repo != the_repository) return' or something 
similar is inevitably going to end up somewhere, even though, as you 
said, it’s "sweeping problems under the rug."

I’m not sure how to proceed from here either..

Regards, yuchen

^ permalink raw reply

* [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get()
From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2149.v4.git.1782649547.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Consolidate the min_generation termination condition into
paint_queue_get(), alongside the existing stale-entry and
side-exhaustion checks.

Move last_gen into struct paint_state so that
commit_graph_generation() is called exactly once per dequeued commit
and the result is shared across all termination checks and the
monotonicity BUG assertion.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 commit-reach.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index e174b219c6..5c5c54d66e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -89,6 +89,8 @@ struct paint_state {
 	size_t parent1_count;
 	size_t parent2_count;
 	size_t mb_candidate_count;
+	timestamp_t min_generation;
+	timestamp_t last_gen;
 };
 
 static void paint_count_update(struct paint_state *state,
@@ -138,11 +140,23 @@ static void paint_queue_put(struct paint_state *state,
 static struct commit *paint_queue_get(struct paint_state *state)
 {
 	struct commit *commit = prio_queue_get(&state->queue);
+	timestamp_t generation;
 
 	if (!commit)
 		return NULL;
 
 	commit->object.flags &= ~ENQUEUED;
+	generation = commit_graph_generation(commit);
+
+	if (state->min_generation && generation > state->last_gen)
+		BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
+		    generation, state->last_gen,
+		    oid_to_hex(&commit->object.oid));
+	state->last_gen = generation;
+
+	/* generation cutoff */
+	if (generation < state->min_generation)
+		return NULL;
 
 	if (!state->mb_candidate_count) {
 		/* only stale entries remain */
@@ -151,7 +165,7 @@ static struct commit *paint_queue_get(struct paint_state *state)
 
 		/* one side is exhausted */
 		if ((!state->parent1_count || !state->parent2_count) &&
-		    commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
+		    generation < GENERATION_NUMBER_INFINITY)
 			return NULL;
 	}
 
@@ -177,9 +191,10 @@ static int paint_down_to_common(struct repository *r,
 	struct commit *commit;
 	int i;
 	int steps = 0;
-	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
 	struct commit_list **tail = result;
 
+	state.min_generation = min_generation;
+	state.last_gen = GENERATION_NUMBER_INFINITY;
 	if (!min_generation && !corrected_commit_dates_enabled(r))
 		state.queue.compare = compare_commits_by_commit_date;
 
@@ -196,18 +211,8 @@ static int paint_down_to_common(struct repository *r,
 	while ((commit = paint_queue_get(&state))) {
 		struct commit_list *parents;
 		int flags;
-		timestamp_t generation = commit_graph_generation(commit);
 		steps++;
 
-		if (min_generation && generation > last_gen)
-			BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
-			    generation, last_gen,
-			    oid_to_hex(&commit->object.oid));
-		last_gen = generation;
-
-		if (generation < min_generation)
-			break;
-
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
 		if (flags == (PARENT1 | PARENT2)) {
 			if (!(commit->object.flags & RESULT)) {
@@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r,
 				 * descendant of this one.
 				 */
 				if (!(mb_flags & MERGE_BASE_FIND_ALL) &&
-				    generation < GENERATION_NUMBER_INFINITY)
+				    state.last_gen < GENERATION_NUMBER_INFINITY)
 					break;
 			}
 			/* Mark parents of a found merge stale */
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v4 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2149.v4.git.1782649547.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier.  Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.

The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.

The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable.  The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.

Step counts measured with trace2 on git.git with commit-graph:

  merge-base --all v2.0.0 v2.55.0-rc1:
    before: 72264 steps    after: 44589 steps

  merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
    before:   110 steps    after:     7 steps

Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 .../technical/paint-down-to-common.adoc         | 17 +++++++++++++++++
 commit-reach.c                                  | 17 ++++++++++++++---
 t/t6600-test-reach.sh                           |  4 ++--
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index ac3e2b39a5..15adac7885 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -99,6 +99,9 @@ ends when one of the following conditions holds:
   4. Single result: the caller only needs one merge base, one has
      been found, and the walk has entered the finite-generation
      region.
+  5. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
+     remain in the queue, no pending merge-base candidates exist,
+     and the walk has entered the finite-generation region.
 
 Stale entry condition
 ~~~~~~~~~~~~~~~~~~~~~
@@ -109,6 +112,20 @@ existing candidates by proving one is an ancestor of another, but
 `remove_redundant()` handles that as a post-processing step, so it
 is safe to exit early.
 
+Side-exhaustion condition
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A new merge-base requires commits from both sides to meet. When one
+side's exclusive counter reaches zero and there are no pending
+merge-base candidates, no future traversal step can produce a new
+candidate.
+
+This optimization only activates in the finite-generation region
+where topological ordering holds. In that region, children are
+always visited before parents, so paint flags are final at visit
+time and an exhausted side cannot reappear. In the INFINITY region,
+commit-date ordering can violate this guarantee, so the check is
+skipped.
+
 Generation cutoff
 ~~~~~~~~~~~~~~~~~
 Some callers (notably `remove_redundant()`) supply a `min_generation`
diff --git a/commit-reach.c b/commit-reach.c
index 176ffd68d0..e174b219c6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -131,6 +131,10 @@ static void paint_queue_put(struct paint_state *state,
 	}
 }
 
+/*
+ * Dequeue the next commit for the paint walk, or return NULL when
+ * no more merge bases can be discovered.
+ */
 static struct commit *paint_queue_get(struct paint_state *state)
 {
 	struct commit *commit = prio_queue_get(&state->queue);
@@ -140,9 +144,16 @@ static struct commit *paint_queue_get(struct paint_state *state)
 
 	commit->object.flags &= ~ENQUEUED;
 
-	if (!state->parent1_count && !state->parent2_count &&
-	    !state->mb_candidate_count)
-		return NULL;
+	if (!state->mb_candidate_count) {
+		/* only stale entries remain */
+		if (!state->parent1_count && !state->parent2_count)
+			return NULL;
+
+		/* one side is exhausted */
+		if ((!state->parent1_count || !state->parent2_count) &&
+		    commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
+			return NULL;
+	}
 
 	paint_count_update(state, commit->object.flags, -1);
 	return commit;
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 51f3d70492..6365007560 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -220,7 +220,7 @@ test_expect_success 'in_merge_bases_many:self' '
 	EOF
 	echo "in_merge_bases_many(A,X):1" >expect &&
 	test_all_modes in_merge_bases_many &&
-	test_paint_down_steps 45 2 25 3
+	test_paint_down_steps 45 1 25 1
 '
 
 test_expect_success 'is_descendant_of:hit' '
@@ -337,7 +337,7 @@ test_expect_success 'merge-base --all commit-walk steps' '
 	>input &&
 	git rev-parse commit-9-1 >expect &&
 	run_all_modes git merge-base --all commit-9-9 commit-9-1 &&
-	test_paint_down_steps 81 80 81 81
+	test_paint_down_steps 81 9 57 10
 '
 
 test_expect_success 'reduce_heads' '
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers
From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2149.v4.git.1782649547.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
unused after the previous commit. The core nonstale_queue functions
remain in use by ahead_behind().

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 commit-reach.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 9ae306f60c..176ffd68d0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
 	queue->max_nonstale = NULL;
 }
 
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
-				     struct commit *c)
-{
-	if (c->object.flags & ENQUEUED)
-		return;
-	c->object.flags |= ENQUEUED;
-	nonstale_queue_put(queue, c);
-}
-
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-{
-	struct commit *commit = nonstale_queue_get(queue);
-
-	if (commit)
-		commit->object.flags &= ~ENQUEUED;
-	return commit;
-}
-
 /*
  * Priority queue with per-side commit counters for paint_down_to_common().
  * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
-- 
gitgitgadget


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox