Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v13 2/6] branch: let delete_branches warn instead of error on bulk refusal
From: Junio C Hamano @ 2026-06-08 23:56 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <a7672713f67d6a44992c0f0cf989770c7e9ca38b.1780684553.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Harald Nordgren <haraldnordgren@gmail.com>
>
> Add a warn-only mode to delete_branches() and check_branch_commit()
> so a bulk caller can report branches that are not fully merged as a
> short warning and carry on, rather than erroring with the longer
> "use 'git branch -D'" advice that the plain "git branch -d" path
> emits. Existing callers are unaffected.
>
> Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
> ---
>  builtin/branch.c | 50 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 34 insertions(+), 16 deletions(-)

This breaks t5404, t5514, and t5505, which contradicts with
"Existing callers are unaffected".

What's going on?  It is troubling that the breakage happens without
even getting merged with other topics in-flight, which means that
the environment you are developing in and testing on and the
environment that I apply patches on, integrate and test (something
based on Debian testing) are somehow behaving differently.

"cd t && sh t5404-*.sh -i -v" ends like so:

expecting success of 5404.7 'already deleted tracking branches ignored':
        git branch -d -r origin/b3 &&
        git push origin :b3 >output 2>&1 &&
        ! grep "^error: " output

error: the branch 'origin/b3' is not fully merged
hint: If you are sure you want to delete it, run 'git branch -D origin/b3'
hint: Disable this message with "git config set advice.forceDeleteBranch false"
not ok 7 - already deleted tracking branches ignored
#
#               git branch -d -r origin/b3 &&
#               git push origin :b3 >output 2>&1 &&
#               ! grep "^error: " output
#
1..7

but it may be possible that earlier steps are behaving differently
with the patches applied.  I didn't dig further but I think the CI
in the recent past have been affected by the same breakage.

^ permalink raw reply

* Re: [PATCH 0/6] Support hashing objects larger than 4GB on Windows
From: Junio C Hamano @ 2026-06-08 23:56 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin
In-Reply-To: <4e3430a1-e8ee-47de-b6f0-25abafe3c45b@iee.email>

Philip Oakley <philipoakley@iee.email> writes:

> On 04/06/2026 18:15, Johannes Schindelin via GitGitGadget wrote:
>> Philip Oakley has contributed these patches ~4.5 years ago, and they have
>> been carried in Git for Windows ever since.
>> 
>> Now that there are already other patch series flying around that try to
>> address various aspects about >4GB objects (which aren't handled well by Git
>> until it stops forcing unsigned long to do size_t's job), it seems a good
>> time to upstream these patches, too, at long last.
>
> Yay. I approve this message ;-)

While I very much appreciate the effort to switch to size_t where
appropriate (and the places we historically used ulong for size of
in-core memory region are the most appropriate places), such an old
series crashes with in-flight topics big time.  Can we get an update
on a more recent base?

No need to rush, as I'll be slowly processing the backlog to catch
up with the list traffic for a few days.

Thanks.

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Jeff King @ 2026-06-08 23:52 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren, Kristofer Karlsson
In-Reply-To: <CAJ-ks9ng3Obv8jydYiBD4kxmTSZCJX8xNb0YihNeSW8_8WL5Ew@mail.gmail.com>

On Mon, Jun 08, 2026 at 07:35:57PM -0400, Tamir Duberstein wrote:

> > So I think a better rule here is to tweak the selection in
> > commit_contains() to select the depth-first algorithm when we have
> > generation numbers enabled. There's a patch in an old thread, which was
> > revived a week or two ago by Kristofer (cc'd):
> >
> >   https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/
> 
> Very good catch, thank you. I reproduced the regression with a
> 100,000-commit history and generation numbers disabled. The parent
> took 13.0 ms, the unconditional depth-first version took 238.4 ms, and
> the generation-aware version took 9.1 ms.
> 
> I didn't find a patch in that thread, so I will reroll using the
> memoized walk for tags or when generation numbers are enabled, while
> retaining the breadth-first walk otherwise. If someone else would
> prefer to send that patch, that is fine by me as well.

It's just this:

diff --git a/commit-reach.c b/commit-reach.c
index 9b3ea46d6f..cdea0030b8 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -799,7 +799,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 int commit_contains(struct ref_filter *filter, struct commit *commit,
 		    struct commit_list *list, struct contains_cache *cache)
 {
-	if (filter->with_commit_tag_algo)
+	if (filter->with_commit_tag_algo ||
+	    generation_numbers_enabled(the_repository))
 		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
 	return repo_is_descendant_of(the_repository, commit, list);
 }

from:

  https://lore.kernel.org/git/20230324191009.GA536967@coredump.intra.peff.net/

But I won't be surprised if you recreated the identical patch yourself. ;)

-Peff

^ permalink raw reply related

* Re: followRemoteHEAD management question
From: Jeff King @ 2026-06-08 23:49 UTC (permalink / raw)
  To: Matt Hunter; +Cc: git, Bence Ferdinandy
In-Reply-To: <DJ19CI50W6UH.17QLIBNTXBWXU@lfurio.us>

On Fri, Jun 05, 2026 at 12:31:30PM -0400, Matt Hunter wrote:

> In the past, I've preferred to run 'git remote set-head <name> -d' when
> setting up a new repository, since I generally have an awareness of what
> the remote default branch is, and I don't like seeing them in branch
> listings or git-log annotations.  They are especially noisy to me if I
> have multiple remotes.  It's possible this config is ill-advised - I
> would love to be educated if so...

No, it's perfectly reasonable. Being able to refer to "origin" to mean
"origin/HEAD" is sometimes handy, but if you don't use it, there's no
reason to set up the symref in the first place.

> However, since b7f7d16562c3 (fetch: add configuration for set_head
> behaviour), these changes are undone by every 'git fetch'.
> 
> The topic mentioned above (merged in a1f34d595503) adds a new
> configuration key 'remote.<name>.followRemoteHEAD'.  I'm assuming that
> the intended use for followRemoteHEAD is really only in local /
> per-repository config, since trying to apply it to my personal
> .gitconfig has some odd behavior.

I think this is a gap in the new feature's implementation. It added
per-remote config, but there is no global config to fall back to (e.g.,
the way that remote.*.prune falls back to fetch.prune). There should be
a fetch.followRemoteHEAD option (or perhaps remote.followRemoteHEAD).

> The <name> in the key template does not accept a wildcard, so I must
> list out each of the common remote names I use across different
> repositories.  Since many of my repos don't actually have remotes
> established for all of these names, they pick up a kind of half-baked
> definition for each of them as git performs its config parsing.  For
> instance, a name will appear under 'git remote -v', but it won't
> have any actual properties configured.

Yes, this is a common problem with the remote-config namespace. Defining
_any_ key makes the remote "exist", even without a defined url, but that
isn't usually the intent.  But we can't distinguish that from the case
where you really do want to define a remote without a url (in which case
the url is the name of the remote).

> Is there another solution in place I've missed?  If not, would there be
> any opposition to a new key like 'remote.followRemoteHEAD' which serves
> to provide a default value for any remote that doesn't have its own
> 'remote.<name>.followRemoteHEAD' key?
> 
> I've started scouting out changes to make for such a patch.  It's not
> ready yet, but I figured I would throw this question out in case an easy
> answer can save the effort.

I think you are on the right track. I can see arguments for or against
putting it in fetch.* or remote.*, so you'll have to pick one. ;)

-Peff

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Jeff King @ 2026-06-08 23:41 UTC (permalink / raw)
  To: Simon Richter; +Cc: Junio C Hamano, Benson Muite, git
In-Reply-To: <fa075b7a-96f6-4fd9-ae94-30ddf323f759@hogyros.de>

On Thu, Jun 04, 2026 at 06:27:31PM +0900, Simon Richter wrote:

> Hi,
> 
> On 6/4/26 3:16 PM, Jeff King wrote:
> 
> > Here's a thought experiment. What if you put the UUID into a URL, like:
> >    repoid://123456789.git
> 
> Yes, that's the idea, except I would want to use a relative URL, like
> 
>     ../123456789.git
> 
> This could solve the "naive cloning" problem, because it creates an
> expectation that the submodules can be found on the same server, or in a
> nearby path.

I see. I forgot that we allowed relative submodule URLs.

> > Now, all of that said, do we still need uuids at all? If the canonical
> > submodule name is https://github.com/git/git.git, then anybody can just
> > rewrite that locally in the same way using url.*.insteadOf config.
> 
> Yes, but we'd then need a mechanism for a server to indicate "for cloning,
> you should use these 'insteadOf' settings, which is a massive can of worms
> from a security standpoint.
> 
> I also don't think these canonical URLs can ever be stable if they refer to
> infrastructure that is not under the control of the maintainer -- it would
> tie the project identity to the hosting provider, and increase the inertia
> to overcome for moves (such as the current exodus from github and gitlab
> towards codeberg).

From your description I was assuming the cloner had to always specify
insteadOf (which they find out about "somehow").

If they're not, then your choice of canonical URL is effectively trading
off some cases for others. In the scenario you care about, you assume
that the submodules are hosted relative to the superproject, so clients
can usually get what they need without further config. The server
operator and the superproject repo coordinate on the names.

But in many decentralized cases, there's no URL or administrative
relationship between the superproject and the submodules. They might
happen to be on the same server, but even that falls down if the
superproject is mirrored elsewhere. So using some canonical name which
works in practice _now_ is usually the best we can do.

> The common goal is that a naive clone should get submodules from a local
> server, ideally without us having to write some tool to make an initial
> checkout, enumerate submodules, create insteadOf settings, clone first layer
> of submodules, enumerate second layer, ...

You shouldn't need to do the recursive enumeration if you set up the
inteadOf ahead of time. You don't know which insteadOf settings you'll
want, but you can feed the whole possible mapping. How you get that
mapping is unspecified, but if you are mirroring the submodules already
on your local infrastructure, then whatever process does that can also
output the mapping.

Just to be clear, I'm not trying to dismiss what you're going for. I'm
looking at this from the lens of Git developers: how do existing Git
features fit into this space, and which features are missing that might
assist in a generalized way.

-Peff

^ permalink raw reply

* git-diff in a worktree is an order of magnitude slower?
From: D. Ben Knoble @ 2026-06-08 23:36 UTC (permalink / raw)
  To: Git

Hello all,

I'd like to report and offer to help fix what I view as a serious performance
bug:

    "git diff --no-ext-diff --quiet" performs about ~10x slower in a secondary
    worktree than in the main worktree.

Fortunately, this doesn't seem to extend to "--cached" (and "--no-ext-diff" and
"--quiet" are probably both red-herrings, since it _does_ extend to plain "git
diff").

Here's a short demo in Git:

    # git switch -d v2.54.0
    # ninja -C build # where my meson-generated build dir is
    # git worktree add --detach ../perf-test v2.54.0
    # hyperfine -N --warmup 10 './build/bin-wrappers/git diff'
    Benchmark 1: ./build/bin-wrappers/git diff
      Time (mean ± σ):       3.4 ms ±   0.5 ms    [User: 4.2 ms, System: 3.9 ms]
      Range (min … max):     2.5 ms …   5.8 ms    677 runs
    # pushd ../perf-test
    # hyperfine -N --warmup 10 '../git/build/bin-wrappers/git diff'
    Benchmark 1: ../git/build/bin-wrappers/git diff
      Time (mean ± σ):     223.3 ms ±  10.5 ms    [User: 210.4 ms,
System: 19.1 ms]
      Range (min … max):   213.5 ms … 243.9 ms    13 runs

I've had a similar experience at $DAYJOB, where a large repo takes ~6ms for the
former and ~650ms for the latter. I noticed because the Bash prompt functions
execute "git diff --no-ext-diff --quiet", and that was (AFAICT) the largest
culprit for a slow shell prompt in a worktree. To squelch that from the prompt,
I have to go down the rabbit hole of the worktree config extension, so I figured
better to fix the slow diff if possible anyway.

2 questions:

1. Is this known, and if so is anybody working on it?
2. How can I help identify problem areas?

A little more
-------------

I've reproduced this as far back as v2.50.0, which is as far back as I could get
the meson build to work with little effort (so I can't rule out that this is an
old regression).

Using "perf record -F 99 -g -- <bin-wrappers/git> diff" in both trees and then
"perf report":

- it looks like the main worktree spends most of it's time in preload_thread,
  threaded_has_symlink_leading_path, lstat_cache…
- the worktree spends a lot more time in ie_match_stat, ce_modified_check_fs,
  ce_compare_data, index_fd, would_convert_to_git_filter_fd…

Here's the relevant "perf stat":

main tree:

 Performance counter stats for './build/bin-wrappers/git diff':

                 0      context-switches:u               #      0,0
cs/sec  cs_per_second
                 0      cpu-migrations:u                 #      0,0
migrations/sec  migrations_per_second
               967      page-faults:u                    #  65036,4
faults/sec  page_faults_per_second
             14,87 msec task-clock:u                     #      0,3
CPUs  CPUs_utilized
            48 616      branch-misses:u                  #      3,2 %
branch_miss_rate         (57,19%)
         3 571 630      branches:u                       #    240,2
M/sec  branch_frequency
        13 635 411      cpu-cycles:u                     #      0,9
GHz  cycles_frequency
        22 120 068      instructions:u                   #      1,9
instructions  insn_per_cycle  (85,61%)
         3 634 065      stalled-cycles-frontend:u        #     0,28
frontend_cycles_idle        (9,56%)

       0,006860098 seconds time elapsed

       0,001364000 seconds user
       0,015157000 seconds sys

worktree:

 Performance counter stats for '../git/build/bin-wrappers/git diff':

                 0      context-switches:u               #      0,0
cs/sec  cs_per_second
                 0      cpu-migrations:u                 #      0,0
migrations/sec  migrations_per_second
             1 585      page-faults:u                    #   5058,0
faults/sec  page_faults_per_second
            313,37 msec task-clock:u                     #      0,9
CPUs  CPUs_utilized
         2 481 188      branch-misses:u                  #      1,5 %
branch_miss_rate         (48,94%)
       168 664 155      branches:u                       #    538,2
M/sec  branch_frequency     (51,21%)
     1 004 095 217      cpu-cycles:u                     #      3,2
GHz  cycles_frequency       (67,74%)
     3 864 851 223      instructions:u                   #      3,9
instructions  insn_per_cycle  (52,73%)
        70 755 234      stalled-cycles-frontend:u        #     0,07
frontend_cycles_idle        (49,29%)

       0,306707634 seconds time elapsed

       0,269027000 seconds user
       0,045512000 seconds sys

My observations:
- the worktree has ~twice as many page faults and
- executes ~150 times as many instructions (3.8b compared to 23m).

(When I try to run some "perf" stats as root to access other counters, like
syscalls, "git diff" in the worktree says "not a git repository", so I'm not
counting the actual behavior. Ditto with DTrace.)

PS I almost CC'd Peff and Patrick, whose names stood out in "git
shortlog builtin/{worktree,diff}* object-file* | sort -t\( -k2 -g",
but decided they'd be their own best judge of whether they can
understand what's going on? :)

-- 
D. Ben Knoble

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08 23:35 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren, Kristofer Karlsson
In-Reply-To: <20260608223430.GA340696@coredump.intra.peff.net>

On Mon, Jun 8, 2026 at 3:34 PM Jeff King <peff@peff.net> wrote:
>
> On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote:
>
> > git branch and git for-each-ref call repo_is_descendant_of() for each
> > candidate selected by --contains or --no-contains. Each call starts a
> > new graph walk, so refs with shared history repeatedly traverse the same
> > commits.
> >
> > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> > the tag traversal that caches positive and negative answers across
> > candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> > 2015-07-07) preserved the branch and tag implementations when ref-filter
> > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> > the '--contains' option, 2015-10-18) noted that they should be unified.
> >
> > Use the memoized traversal for every ref-filter contains check and
> > remove the implementation selector. The cache records answers for one
> > fixed target list, so document that callers must clear it before
> > changing the list.
>
> The subject line obfuscated the intent here (at least for me). I think a
> more clear subject would just be: "ref-filter: always use
> contains_tag_algo" or something.

Ack, changed to "ref-filter: memoize --contains with generations" in v2 draft.

>
> But more importantly, I think the analysis above is missing a key point
> about why we didn't make the tag algo the default in the first place: it
> is depth first, and thus slower when the merge base can be found quickly
> by the breadth-first traversal. For tags, you tend to have to look at
> all of history anyway (because you have at least one old tag that
> requires walking back that far), but that is often not true for
> branches.
>
> We are able to get the best of both worlds if we can cut off the
> depth-first traversal early using generation numbers.
>
> So I think a better rule here is to tweak the selection in
> commit_contains() to select the depth-first algorithm when we have
> generation numbers enabled. There's a patch in an old thread, which was
> revived a week or two ago by Kristofer (cc'd):
>
>   https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/

Very good catch, thank you. I reproduced the regression with a
100,000-commit history and generation numbers disabled. The parent
took 13.0 ms, the unconditional depth-first version took 238.4 ms, and
the generation-aware version took 9.1 ms.

I didn't find a patch in that thread, so I will reroll using the
memoized walk for tags or when generation numbers are enabled, while
retaining the breadth-first walk otherwise. If someone else would
prefer to send that patch, that is fine by me as well.

>
> > The memoized depth-first walk assumes acyclic ancestry, but replacement
> > refs can create cycles. Track commits while they are on the walk. If a
> > cycle is found, discard partial cache entries and use
> > repo_is_descendant_of() for that candidate.
>
> I can believe that the depth-first code doesn't handle cycles well. But
> if that's the case, then it's already a problem for "git tag
> --contains". And we should fix it as a separate patch from enabling that
> algorithm in more cases.

Agreed, and Karthik flagged the same. The cycle handling is now a
separate first patch.

>
> I'm not quite sure how ancestry should be defined in a cycle. How does
> the algorithm behave now when it sees a cycle? If it loops infinitely,
> we definitely would want to fix that. If not, then to some degree I
> don't care too much what answer is provided, since the input is somewhat
> nonsense in the first place. And if it is expensive to track, it might
> not be worth inflicting that penalty on the sane cases. But it looks
> like your solution is just setting an extra flag value in the slab,
> which should be pretty cheap.
>
> > The branch and for-each-ref path passed repo_is_descendant_of() through
> > a Boolean interface. In configurations where it returned -1 for missing
> > ancestry, ref-filter treated the error as "contains". The memoized path
> > instead fails when ancestry cannot be parsed, as git tag already did.
> > During review of the 2018 reachability series, making parse failures
> > fatal was explicitly deferred because that series was intended to
> > preserve behavior. Unifying the implementations now makes all callers
> > fail consistently instead of preserving that accidental Boolean
> > interpretation.
>
> I think that's a good outcome.
>
> > The added p1500 case uses up to 8,192 packed refs along one first-parent
> > history. It improves from 0.68 to 0.03 seconds.
> >
> > On a checkout with 62,174 remote-tracking refs, I ran:
> >
> >     hyperfine --warmup 0 --runs 3 \
> >         --command-name parent \
> >         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> >         --command-name this-commit \
> >         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> >
> > The results were:
> >
> >              parent       this commit
> >   elapsed    104.365 s     467.7 ms
> >   user        93.702 s     220.2 ms
> >   system       0.723 s     182.7 ms
>
> I didn't time it, but the probable regression case is something like
> this: a very deep history with a small number of branches diverging only
> a few commits away. Without a commit-graph file (or one without
> generation numbers), that probably makes "git branch --contains" slower.
>
> -Peff

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Jeff King @ 2026-06-08 23:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tamir Duberstein, git, René Scharfe, Patrick Steinhardt
In-Reply-To: <20260608230315.GC340696@coredump.intra.peff.net>

On Mon, Jun 08, 2026 at 07:03:15PM -0400, Jeff King wrote:

> > Adding an extra early `match_pathspec()` check before making slow
> > system calls like `lstat()` makes sense, especially when most of the
> > index entries need to be skipped.  But if most of them would match,
> > then we would end up doing the same match_pathspec() calls twice for
> > each path, and run lstat() anyway, so you may also be able to
> > construct a perf test that demonstrates a case where this approach
> > is not a clear win (or even degradation), perhaps?
> 
> The patchspec matching is linear in the number of pathspecs, so it's
> easy to get quadratic-ish results by just asking about:
> 
>   git ls-files -- $(git ls-files)
> 
> So that probably provides an easy regression demonstration for this
> patch.

Ah, yeah, it is easy to demonstrate. Making a repo of size $n like this:

  n=10000
  git init
  for i in $(seq $n); do
    echo $i >file$i
  done
  git add .
  git commit -m foo

If we then run:

  time git ls-files -- $(git ls-files) >/dev/null

then n=1000 takes ~15ms for me, but n=10000 takes ~800ms. So that shows
the slowdown of the existing pathspec code as the number of pathspecs
grows.

With this patch, starting with n=10000 and adding in "-m" (which
triggers the code in this patch), like:

  time git ls-files -m -- $(git ls-files) >/dev/null

the time goes from ~15ms (without the patch) to ~800ms with it. Which
makes sense. Nothing is modified, so the current code which puts the
lstat() check first eliminates each entry before we even consider
pathspecs. So it doesn't hit the slow case at all.

But after the patch, we do a preliminary pathspec match and
pay the cost.

So it really is a question of how many items are actually modified, the
cost of lstat(), and the cost of pathspec matching (which varies with
the size of the pathspec).

But like I said, this is kind of a silly case. If it actually starts to
matter in the real world, I think it may be more productive to make the
pathspec code scale better.

-Peff

^ permalink raw reply

* Re: limiting git branch --contains
From: Jeff King @ 2026-06-08 23:12 UTC (permalink / raw)
  To: Kristofer Karlsson; +Cc: git, Derrick Stolee
In-Reply-To: <20260527070510.3510836-1-krka@spotify.com>

On Wed, May 27, 2026 at 09:05:09AM +0200, Kristofer Karlsson wrote:

> With generation numbers (commit-graph), the cached algorithm is
> strictly at least as good as the uncached one. Both have the same
> walk floor -- the generation cutoff in contains_tag_algo is equivalent
> to the STALE boundary in paint_down_to_common. The cache then provides
> a pure win: O(total unique commits) instead of O(N * commits per ref).
> 
> In my benchmarks, the improvement is significant and scales with the
> number of refs and the depth of the target commit:

Yeah, I think this is worth pursuing. Looks like somebody else also
generated a similar patch recently:

  https://lore.kernel.org/git/20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com/

But I think the approach in this thread (to use depth-first only when we
have generation numbers) makes more sense.

I cc'd you on that thread.

-Peff

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Jeff King @ 2026-06-08 23:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tamir Duberstein, git, René Scharfe, Patrick Steinhardt
In-Reply-To: <xmqqa4t5yyee.fsf@gitster.g>

On Mon, Jun 08, 2026 at 06:06:33AM -0700, Junio C Hamano wrote:

> On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> > show_files() checks whether each index entry is deleted or modified
> > before show_ce() applies the pathspec. prune_index() avoids most of this
> > work for pathspecs with a common directory prefix, but a top-level name
> > or leading wildcard leaves every entry to be checked.
> > 
> > Match the pathspec before lstat() for the deleted and modified modes.
> > Keep the later match in show_ce() so --error-unmatch is satisfied only
> > by entries that are actually shown.
> 
> Adding an extra early `match_pathspec()` check before making slow
> system calls like `lstat()` makes sense, especially when most of the
> index entries need to be skipped.  But if most of them would match,
> then we would end up doing the same match_pathspec() calls twice for
> each path, and run lstat() anyway, so you may also be able to
> construct a perf test that demonstrates a case where this approach
> is not a clear win (or even degradation), perhaps?

The patchspec matching is linear in the number of pathspecs, so it's
easy to get quadratic-ish results by just asking about:

  git ls-files -- $(git ls-files)

So that probably provides an easy regression demonstration for this
patch.

I don't know how much it matters in the real world. That command is
_already_ slow, and mostly people don't care that much. Long ago I had
patches to build a trie of literal pathspecs, with the intent that
blame-tree/last-modified could use the pathspec mechanism, but
ultimately we took the code in a different direction. And nobody really
complained about it since. ;)

-Peff

^ permalink raw reply

* Re: [PATCH 0/3] config: allow disabling config includes
From: Jeff King @ 2026-06-08 22:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
In-Reply-To: <pull.2139.git.1780927027.gitgitgadget@gmail.com>

On Mon, Jun 08, 2026 at 01:57:03PM +0000, Derrick Stolee via GitGitGadget wrote:

> This series introduces a new way to ignore config include directives via two
> mechanisms:
> 
>  * GIT_CONFIG_INCLUDES=0 in the environment.
>  * git --no-includes ... in the command line.
> 
> My motivation is from a tricky situation where users want to do the risky
> thing and include a repo-tracked file for sharing aliases and other
> recommended config. They are then struggling in a later build step that is
> running Git commands (under a tool we don't control and can't change) that
> then cause filesystem accesses outside of the build system's sandbox.

I'm not opposed to global control of includes, but this is just one way
in which config can escape the sandbox. They can always point to files
(e.g., core.attributesFile) or even commands that totally leave the
sandbox (e.g., ext diff or textconv commands). Fundamentally git config
is equivalent to arbitrary code execution, so pointing an include at a
repo-tracked file carries the risk of confusion both malicious and
accidental.

So I dunno. From the described motivation, this feels like a band-aid
that fixes only one narrow instance of a greater problem.

The notion of enabling/disabling includes per-command is itself a
flexible building block. So it's possible that it has other uses in
general. But it's also a fairly broad hammer that covers more than your
use case. If you're planning to use "git --no-includes" in some script,
then it breaks the config of anybody who uses includes in their
user-level ~/.gitconfig file.

So you may need some more directed limiting.

> One thing I do worry about is whether or not this would cause a significant
> break in behavior, or if this is a relatively safe thing to allow.

Yeah. Consider something like:

  $ cat ~/.gitconfig
  [user]
  name = My Name
  email = me@personal.example.com
  [includeIf "gitdir:/path/to/work/stuff"]
  path = .gitconfig-work

  $ cat ~/.gitconfig-work
  [user]
  email = me@work.example.com

Using "git --no-include" will silently use the wrong user.email value.
That's OK if the user is asking for it, but if you are planning to
sprinkle "--no-include" inside scripts, that's likely to cause
confusion.

-Peff

^ permalink raw reply

* Re: [GSoC PATCH v2 4/4] repo: add path.commondir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-08 22:40 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, jltobler, kumarayushjha123,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-5-jayatheerthkulkarni2005@gmail.com>

This patch looks really straightforward after the previous one.
I hope the rest of the path.* series will be just like that.

> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/.git"' '../.git'
> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/custom-common"' '../custom-common' 'GIT_COMMON_DIR="$(cd .. && pwd)/custom-common" GIT_DIR=../.git'
> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/.git"' '../.git' 'GIT_DIR=../.git'

If you use the test_repo_info_path that I suggested in the
other answer, this would be:

test_repo_info_path 'commondir without env vars' 'commondir' 'common-no-env' \
                    '.git' '../.git'

test_repo_info_path 'commondir with GIT_COMMON_DIR and GIT_DIR' 'commondir' \
                    'commondir-envs' 'custom-common' '../custom-common'\
		    'export GIT_COMMON_DIR="$ROOT/custom-common" &&
		     export GIT_DIR="../.git" &&
		     git init --bare "$ROOT/custom-common"
		    '

test_repo_info_path 'commondir with only GIT_DIR' 'commondir' \
                    'commondir-only-gitdir' '.git' '../.git' 'GIT_DIR=../.git'


^ permalink raw reply

* Re: [PATCH] ref-filter: restore prefix-scoped iteration
From: Tamir Duberstein @ 2026-06-08 22:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu
In-Reply-To: <xmqqpl20vhni.fsf@gitster.g>

On Mon, Jun 8, 2026 at 2:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..2388a57b39 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
> >               prefix = "refs/tags/";
> >
> >       if (prefix) {
>
> Below, adding an extra call to get_main_ref_store(the_repository)
> makes one line unnecessarily split and harder to read.  How about
> doing
>
>                 struct ref_store *store = get_main_ref_store(the_repository);
>
> upfront here, and then use that to replace these two calls of
> get_main_ref_store(the_repository)?

Yep, done in v2.

Thanks for the review!

By the way, how long should I wait before sending new versions of my
patches? I have 4 outstanding at the moment.

^ permalink raw reply

* Re: [GSoC PATCH v2 0/4] teach git repo info to handle path keys
From: Junio C Hamano @ 2026-06-08 22:36 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, jltobler, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-1-jayatheerthkulkarni2005@gmail.com>

K Jayatheerth <jayatheerthkulkarni2005@gmail.com> writes:

> 2. Should we consider a default option?
>    Currently we have path.gitdir.absolute. Should we consider an
>    option where a plain `path.gitdir` returns some default?

Probably not.  It will invite folks wanting to tweak the default
between absolute and relative, rendering this feature useless for
robust scripting.  You do not necessarily want to save typing in
plumbing interface.  You want to reduce ambiguity by reducing more
than one ways to do a thing down to just one way, and as long as
that one way is not overly verbose, you are fine.

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Jeff King @ 2026-06-08 22:34 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren, Kristofer Karlsson
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>

On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote:

> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
> 
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
> 
> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.

The subject line obfuscated the intent here (at least for me). I think a
more clear subject would just be: "ref-filter: always use
contains_tag_algo" or something.

But more importantly, I think the analysis above is missing a key point
about why we didn't make the tag algo the default in the first place: it
is depth first, and thus slower when the merge base can be found quickly
by the breadth-first traversal. For tags, you tend to have to look at
all of history anyway (because you have at least one old tag that
requires walking back that far), but that is often not true for
branches.

We are able to get the best of both worlds if we can cut off the
depth-first traversal early using generation numbers.

So I think a better rule here is to tweak the selection in
commit_contains() to select the depth-first algorithm when we have
generation numbers enabled. There's a patch in an old thread, which was
revived a week or two ago by Kristofer (cc'd):

  https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/

> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.

I can believe that the depth-first code doesn't handle cycles well. But
if that's the case, then it's already a problem for "git tag
--contains". And we should fix it as a separate patch from enabling that
algorithm in more cases.

I'm not quite sure how ancestry should be defined in a cycle. How does
the algorithm behave now when it sees a cycle? If it loops infinitely,
we definitely would want to fix that. If not, then to some degree I
don't care too much what answer is provided, since the input is somewhat
nonsense in the first place. And if it is expensive to track, it might
not be worth inflicting that penalty on the sane cases. But it looks
like your solution is just setting an extra flag value in the slab,
which should be pretty cheap.

> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.

I think that's a good outcome.

> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
> 
> On a checkout with 62,174 remote-tracking refs, I ran:
> 
>     hyperfine --warmup 0 --runs 3 \
>         --command-name parent \
>         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
>         --command-name this-commit \
>         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> 
> The results were:
> 
>              parent       this commit
>   elapsed    104.365 s     467.7 ms
>   user        93.702 s     220.2 ms
>   system       0.723 s     182.7 ms

I didn't time it, but the probable regression case is something like
this: a very deep history with a small number of branches diverging only
a few commits away. Without a commit-graph file (or one without
generation numbers), that probably makes "git branch --contains" slower.

-Peff

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08 22:30 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren
In-Reply-To: <CAOLa=ZS_U+u43SV9ELSEU6AT7rzEQ44BuHPAi1BAHEGQAnPoPw@mail.gmail.com>

On Mon, Jun 8, 2026 at 2:18 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > git branch and git for-each-ref call repo_is_descendant_of() for each
> > candidate selected by --contains or --no-contains. Each call starts a
> > new graph walk, so refs with shared history repeatedly traverse the same
> > commits.
> >
> > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> > the tag traversal that caches positive and negative answers across
> > candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> > 2015-07-07) preserved the branch and tag implementations when ref-filter
> > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> > the '--contains' option, 2015-10-18) noted that they should be unified.
> >
>
> Nicely explained. We should've merged this long ago, so this is a
> worthwhile change.
>
> > Use the memoized traversal for every ref-filter contains check and
> > remove the implementation selector. The cache records answers for one
> > fixed target list, so document that callers must clear it before
> > changing the list.
> >
> > The memoized depth-first walk assumes acyclic ancestry, but replacement
> > refs can create cycles. Track commits while they are on the walk. If a
> > cycle is found, discard partial cache entries and use
> > repo_is_descendant_of() for that candidate.
> >
> > The branch and for-each-ref path passed repo_is_descendant_of() through
> > a Boolean interface. In configurations where it returned -1 for missing
> > ancestry, ref-filter treated the error as "contains". The memoized path
> > instead fails when ancestry cannot be parsed, as git tag already did.
> > During review of the 2018 reachability series, making parse failures
> > fatal was explicitly deferred because that series was intended to
> > preserve behavior. Unifying the implementations now makes all callers
> > fail consistently instead of preserving that accidental Boolean
> > interpretation.
> >
> > The added p1500 case uses up to 8,192 packed refs along one first-parent
> > history. It improves from 0.68 to 0.03 seconds.
> >
> > On a checkout with 62,174 remote-tracking refs, I ran:
> >
> >     hyperfine --warmup 0 --runs 3 \
> >         --command-name parent \
> >         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> >         --command-name this-commit \
> >         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> >
> > The results were:
> >
> >              parent       this commit
> >   elapsed    104.365 s     467.7 ms
> >   user        93.702 s     220.2 ms
> >   system       0.723 s     182.7 ms
> >
> > The wall-time standard deviations were 11.356 seconds and 133.8
> > milliseconds, respectively, for a 223x speedup. Both commands produced
> > output with SHA-256
> > 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
> >
> > Both revisions were rebuilt with the default -O2 flags using Apple clang
> > 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> > 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> > GB RAM.
> >
> > Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> > Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> > Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> > Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> > ---
> >  builtin/tag.c                  |  1 -
> >  commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
> >  commit-reach.h                 | 15 ++++++++++----
> >  ref-filter.c                   |  6 ++++--
> >  ref-filter.h                   |  7 +++----
> >  t/helper/test-reach.c          | 10 ++--------
> >  t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
> >  t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
> >  t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
> >  t/t6600-test-reach.sh          |  6 ++----
> >  10 files changed, 117 insertions(+), 36 deletions(-)
> >
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d51c2e3349..9f34d948d4 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >
> >       if (verify_ref_format(format))
> >               die(_("unable to parse format string"));
> > -     filter->with_commit_tag_algo = 1;
>
> We were selectively using the algo for `git tag`, like mentioned I guess
> we'll entirely remove `with_commit_tag_algo` below somewhere
>
> >       filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
> >
> >       free(to_free);
> > diff --git a/commit-reach.c b/commit-reach.c
> > index 9b3ea46d6f..6e599a3670 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -6,7 +6,6 @@
> >  #include "decorate.h"
> >  #include "hex.h"
> >  #include "prio-queue.h"
> > -#include "ref-filter.h"
> >  #include "revision.h"
> >  #include "tag.h"
> >  #include "commit-reach.h"
> > @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
> >
> >  /*
> >   * Test whether the candidate is contained in the list.
> > - * Do not recurse to find out, though, but return -1 if inconclusive.
> > + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> > + * inconclusive.
>
> Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
> at 0, so this makes sense.
>
> >   */
> >  static enum contains_result contains_test(struct commit *candidate,
> >                                         const struct commit_list *want,
> > @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
> >       contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
> >  }
> >
> > -static enum contains_result contains_tag_algo(struct commit *candidate,
> > -                                           const struct commit_list *want,
> > -                                           struct contains_cache *cache)
> > +static enum contains_result contains_algo(struct commit *candidate,
> > +                                       struct commit_list *want,
> > +                                       struct contains_cache *cache)
> >  {
> >       struct contains_stack contains_stack = { 0, 0, NULL };
> >       enum contains_result result;
> > @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >       if (result != CONTAINS_UNKNOWN)
> >               return result;
> >
> > +     *contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
> >       push_to_contains_stack(candidate, &contains_stack);
> >       while (contains_stack.nr) {
> >               struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> > @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >                       contains_stack.nr--;
> >               }
> >               /*
> > -              * If we just popped the stack, parents->item has been marked,
> > -              * therefore contains_test will return a meaningful yes/no.
> > +              * A parent may have just been popped and marked, or may still
> > +              * be active when replacement refs create a cycle.
> >                */
> >               else switch (contains_test(parents->item, want, cache, cutoff)) {
> >               case CONTAINS_YES:
> > @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >               case CONTAINS_NO:
> >                       entry->parents = parents->next;
> >                       break;
> > +             case CONTAINS_IN_PROGRESS:
> > +                     /*
> > +                      * Partial negative answers are not safe across a cycle.
> > +                      * Discard them and use the cycle-safe reachability walk.
> > +                      */
> > +                     goto cycle;
> >
>
> If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
> showcase a commit for which we still don't have a result. Since any
> commit with UNKNOWN will start a recursive search through its parents.
>
> So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
> this would indicate that we hit a cyclic graph and so go to the fallback
> of using repo_is_descendant_of(). So this avoids an infinite recursion
> in such instances. Makes sense.
>
> >               case CONTAINS_UNKNOWN:
> > +                     *contains_cache_at(cache, parents->item) =
> > +                             CONTAINS_IN_PROGRESS;
> >                       push_to_contains_stack(parents->item, &contains_stack);
> >                       break;
> >               }
> >       }
> >       free(contains_stack.contains_stack);
> >       return contains_test(candidate, want, cache, cutoff);
> > +
> > +cycle:
> > +     free(contains_stack.contains_stack);
> > +     clear_contains_cache(cache);
> > +     init_contains_cache(cache);
> > +
> > +     result = repo_is_descendant_of(the_repository, candidate, want);
> > +     if (result < 0)
> > +             exit(128);
> > +     *contains_cache_at(cache, candidate) =
> > +             result ? CONTAINS_YES : CONTAINS_NO;
> > +     return result ? CONTAINS_YES : CONTAINS_NO;
> >  }
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > -                 struct commit_list *list, struct contains_cache *cache)
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > +                 struct contains_cache *cache)
> >  {
> > -     if (filter->with_commit_tag_algo)
> > -             return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> > -     return repo_is_descendant_of(the_repository, commit, list);
> > +     if (!list)
> > +             return 1;
> > +     return contains_algo(commit, list, cache) == CONTAINS_YES;
> >  }
> >
> >  int can_all_from_reach_with_flag(struct object_array *from,
> > diff --git a/commit-reach.h b/commit-reach.h
> > index 3f3a563d8a..144dc56275 100644
> > --- a/commit-reach.h
> > +++ b/commit-reach.h
> > @@ -5,7 +5,6 @@
> >  #include "commit-slab.h"
> >
> >  struct commit_list;
> > -struct ref_filter;
> >  struct object_id;
> >  struct object_array;
> >
> > @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
> >  enum contains_result {
> >       CONTAINS_UNKNOWN = 0,
> >       CONTAINS_NO,
> > -     CONTAINS_YES
> > +     CONTAINS_YES,
> > +     CONTAINS_IN_PROGRESS
> >  };
> >
> >  define_commit_slab(contains_cache, enum contains_result);
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > -                 struct commit_list *list, struct contains_cache *cache);
> > +/*
> > + * Return whether "commit" is a descendant of any commit in "list". An empty
> > + * list matches.
> > + *
> > + * "cache" records answers for one fixed "list". Clear it before changing the
> > + * list.
> > + */
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > +                 struct contains_cache *cache);
> >
> >  /*
> >   * Determine if every commit in 'from' can reach at least one commit
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..7788147959 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
> >                       return NULL;
> >               /* We perform the filtering for the '--contains' option... */
> >               if (filter->with_commit &&
> > -                 !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> > +                 !commit_contains(commit, filter->with_commit,
> > +                                  &filter->internal.contains_cache))
> >                       return NULL;
> >               /* ...or for the `--no-contains' option */
> >               if (filter->no_commit &&
> > -                 commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> > +                 commit_contains(commit, filter->no_commit,
> > +                                 &filter->internal.no_contains_cache))
> >                       return NULL;
> >       }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 120221b47f..9e14afca9c 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -73,10 +73,9 @@ struct ref_filter {
> >       struct commit_list *reachable_from;
> >       struct commit_list *unreachable_from;
> >
> > -     unsigned int with_commit_tag_algo : 1,
> > -             match_as_path : 1,
> > -             ignore_case : 1,
> > -             detached : 1;
> > +     unsigned int match_as_path : 1,
> > +                  ignore_case : 1,
> > +                  detached : 1;
> >       unsigned int kind,
> >               lines;
> >       int abbrev,
>
> Nit: With the changes above. I do wish it was split into two commits.
> 1. Fix cyclic recursions in the algo.
> 2. Use the algo for all filter types.

Makes sense. I split v2 accordingly. The first patch now fixes the
existing git tag --contains traversal and tests that path directly.
The second patch then uses the traversal for the other ref-filter
callers.

>
> > diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> > index 5d86a96c17..82235f713e 100644
> > --- a/t/helper/test-reach.c
> > +++ b/t/helper/test-reach.c
> > @@ -6,7 +6,6 @@
> >  #include "gettext.h"
> >  #include "hex.h"
> >  #include "object-name.h"
> > -#include "ref-filter.h"
> >  #include "setup.h"
> >  #include "string-list.h"
> >  #include "tag.h"
> > @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
> >
> >               printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
> >       } else if (!strcmp(av[1], "commit_contains")) {
> > -             struct ref_filter filter = REF_FILTER_INIT;
> >               struct contains_cache cache;
> >               init_contains_cache(&cache);
> >
> > -             if (ac > 2 && !strcmp(av[2], "--tag"))
> > -                     filter.with_commit_tag_algo = 1;
> > -             else
> > -                     filter.with_commit_tag_algo = 0;
> > -
> > -             printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> > +             printf("%s(_,A,X,_):%d\n", av[1],
> > +                    commit_contains(A, X, &cache));
> >               clear_contains_cache(&cache);
> >       } else if (!strcmp(av[1], "get_reachable_subset")) {
> >               const int reachable_flag = 1;
>
> Makes sense.
>
> > diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> > index 5b23ce5db9..ac68fdbacd 100755
> > --- a/t/perf/p1500-graph-walks.sh
> > +++ b/t/perf/p1500-graph-walks.sh
> > @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
> >
> >  test_perf_large_repo
> >
> > +contains_ref_limit=8192
> > +
> >  test_expect_success 'setup' '
> >       git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
> >       sort -r allrefs | head -n 50 >refs &&
> > @@ -32,10 +34,25 @@ test_expect_success 'setup' '
> >               echo "X:$line" >>test-tool-tags || return 1
> >       done &&
> >
> > +     git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> > +     contains_ref_count=$(wc -l <contains-commits) &&
> > +     test "$contains_ref_count" -gt 0 &&
> > +     contains_base=$(tail -n 1 contains-commits) &&
> > +     export contains_base &&
> > +     awk "{
> > +             printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> > +     }" contains-commits |
> > +             git update-ref --stdin &&
> > +     git pack-refs --include "refs/contains-perf/*" &&
> > +
> >       commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
> >       git update-ref refs/heads/disjoint-base $commit &&
> >
> > -     git commit-graph write --reachable
> > +     git commit-graph write --reachable &&
> > +
> > +     git for-each-ref --contains="$contains_base" \
> > +             refs/contains-perf/ >actual &&
> > +     test_line_count = $contains_ref_count actual
>
> Shouldn't this be a separate test and not a part of the setup?
>
> >  '
> >
> >  test_perf 'ahead-behind counts: git for-each-ref' '
> > @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
> >       xargs git tag --merged=HEAD <tags
> >  '
> >
> > +test_perf 'contains: git for-each-ref --contains' '
> > +     git for-each-ref --contains="$contains_base" \
> > +             refs/contains-perf/ >/dev/null
> > +'
> > +
>
> Ah! we also have this, so perhaps moving the `test_line_count` here and
> dropping it above would be better.

Makes sense! Done in v2.

>
> >  test_perf 'is-base check: test-tool reach (refs)' '
> >       test-tool reach get_branch_base_for_tip <test-tool-refs
> >  '
> > diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> > index e06feb06e9..169cc70c23 100755
> > --- a/t/t6301-for-each-ref-errors.sh
> > +++ b/t/t6301-for-each-ref-errors.sh
> > @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
> >       test_must_be_empty brief-err
> >  '
> >
> > +test_expect_success 'missing ancestors are reported by contains filters' '
> > +     test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> > +     {
> > +             echo "tree $(git rev-parse HEAD^{tree})" &&
> > +             echo "parent $MISSING" &&
> > +             git cat-file commit HEAD |
> > +                     sed -n -e "/^author /p" -e "/^committer /p" &&
> > +             echo &&
> > +             echo "missing parent"
> > +     } > commit &&
> > +     broken=$(git hash-object -t commit -w commit) &&
> > +     git update-ref refs/heads/missing-parent "$broken" &&
> > +     test_must_fail git for-each-ref --contains=HEAD \
> > +             refs/heads/missing-parent >out 2>err &&
> > +     test_must_be_empty out &&
> > +     test_grep "unable to parse commit $MISSING" err
> > +'
> > +
> >  test_expect_success 'ahead-behind requires an argument' '
> >       test_must_fail git for-each-ref \
> >               --format="%(ahead-behind)" 2>err &&
> > diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> > index 7f060d97bf..423505d1fb 100755
> > --- a/t/t6302-for-each-ref-filter.sh
> > +++ b/t/t6302-for-each-ref-filter.sh
> > @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
> >       test_cmp expect actual
> >  '
> >
> > +test_expect_success 'contains handles cyclic replacement histories' '
> > +     one=$(git rev-parse one) &&
> > +     three=$(git rev-parse three) &&
> > +     test_when_finished "
> > +             git replace -d $one
> > +             git replace -d $three
> > +             git tag -d cycle-a cycle-b
> > +     " &&
> > +     git tag cycle-a "$one" &&
> > +     git tag cycle-b "$three" &&
> > +     git replace --graft "$one" "$three" two &&
> > +     git replace --graft "$three" "$one" &&
> > +     cat >expect <<-\EOF &&
> > +     refs/tags/cycle-a
> > +     refs/tags/cycle-b
> > +     EOF
> > +     git for-each-ref --format="%(refname)" --contains=two \
> > +             "refs/tags/cycle-*" >actual &&
> > +     test_cmp expect actual
> > +'
> > +
> >  test_expect_success '%(color) must fail' '
> >       test_must_fail git for-each-ref --format="%(color)%(refname)"
> >  '
> > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> > index b5b314e570..1ecc2571c2 100755
> > --- a/t/t6600-test-reach.sh
> > +++ b/t/t6600-test-reach.sh
> > @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
> >       X:commit-9-3
> >       EOF
> >       echo "commit_contains(_,A,X,_):1" >expect &&
> > -     test_all_modes commit_contains &&
> > -     test_all_modes commit_contains --tag
> > +     test_all_modes commit_contains
> >  '
> >
> >  test_expect_success 'commit_contains:miss' '
> > @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
> >       X:commit-9-3
> >       EOF
> >       echo "commit_contains(_,A,X,_):0" >expect &&
> > -     test_all_modes commit_contains &&
> > -     test_all_modes commit_contains --tag
> > +     test_all_modes commit_contains
> >  '
> >
> >  test_expect_success 'rev-list: basic topo-order' '
> >
> > ---
> > base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> > change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
> >
> > Best regards,
> > --
> > Tamir Duberstein <tamird@gmail.com>
>
> Overall the patch looks great. The perf improvements are also very
> welcome. Some small nits from me.

Thanks a lot for the review!

^ permalink raw reply

* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-08 22:17 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, jltobler, kumarayushjha123,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-4-jayatheerthkulkarni2005@gmail.com>


> +test_repo_info_path () {
> + field_name=$1
> + expect_absolute_eval=$2
> + expect_relative=$3
> + env_prefix=$4

This helper function needs a documentation.

> + test_expect_success "query individual key: path.$field_name.absolute${env_prefix:+ ($env_prefix)}" '

This makes the output polluted. What about changing it by something like:

	test_expect_success "absolute: $label' '...'
	test_expect_success "relative: $label' '...'

with a custom label?

> +
> +test_expect_success 'setup test repository layout for path fields' '
> + git init test-repo &&
> + mkdir -p test-repo/sub
> +'

The helper function `test_repo_info_path` is relying too much on the
existence of the `test-repo`. I think it would be better to add a new
parameter `repo_name` (or similar) because:

1. You could move this creation to the helper function and
   you won't need to place the test after that creation

2. You could use different for each (test_repo_info_path call, path format)
   pair. Currently, if more than one test fails, its result is overwritten
   and the `expect` and `actual` files from the trash directory will be
   the last of the broken tests.

3. You won't need to use the hacky 'echo "$(cd .. && pwd)' 

This applies my suggestions (feel free to use, adapt or discard it):

test_repo_info_path () {
        label=$1
	field_name=$2
	repo_name=$3
	expect_absolute=$4
	expect_relative=$5
	init_command=$6

	absolute_root="$repo_name"-absolute
	relative_root="$repo_name"-relative

	expect_absolute="$PWD"/"$absolute_root"/"$expect_absolute"

	test_expect_success 'setup test repository layout for path fields' '
		git init "$absolute_root" &&
		git init "$relative_root" &&
	       	mkdir -p "$absolute_root"/sub "$relative_root"/sub
	'

	test_expect_success "absolute: $label" '
		(
			export ROOT="$PWD"/"$absolute_root" &&
			cd "$absolute_root"/sub &&
			eval "$init_command" &&
			echo "path.$field_name.absolute=$expect_absolute" >expect &&
			git repo info path.$field_name.absolute >actual &&
			test_cmp expect actual
		)
	'

	test_expect_success "relative: $label" '
		(
			export ROOT="$PWD"/"$relative_root" &&
			cd "$relative_root"/sub &&
			eval "$init_command" &&
			echo "path.$field_name.relative=$expect_relative" >expect &&
			git repo info path.$field_name.relative >actual &&
			test_cmp expect actual
		)
	'
}

test_repo_info_path 'gitdir' 'gitdir' 'gitdir' '.git' '../.git'



^ permalink raw reply

* Re: [PATCH] docs: fix typos
From: Junio C Hamano @ 2026-06-08 22:16 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: git
In-Reply-To: <20260604131457.19215-1-taahol@utu.fi>

Tuomas Ahola <taahol@utu.fi> writes:

> Fix some typos and grammar errors in comments and documentation files.
>
> Signed-off-by: Tuomas Ahola <taahol@utu.fi>
> ---

Thanks, all changes make sense.  Will queue.


^ permalink raw reply

* Re: [PATCH 0/6] t: add lint-style.pl and convert grep to test_grep
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget
  Cc: git, D. Ben Knoble, Eric Sunshine, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The test suite has a test_grep wrapper that prints file contents on
> assertion failure, making debugging easier. Many tests still use bare 'grep'
> for assertions, which silently swallows context on failure.
>
> This series adds a lint tool (lint-style.pl) to mechanically detect and
> convert these, then applies it across the test suite.

I do not think we want an automated tool that rewrites the source
files.  I was hoping that we would get a patch or two that _adds_ to
existing test-lint framework (i.e., 'test-grep' that 'test-lint'
target depends on in t/Makefile) that gives diagnosis in a similar
fashion as test-lint-shell-syntax and test-chainlint do.

Also some existing uses of "grep" are not end-user facing and should
not be rewritten to "test_grep".

^ permalink raw reply

* Re: [PATCH] ref-filter: restore prefix-scoped iteration
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu
In-Reply-To: <20260605-fix-git-branch-regression-v1-1-02f40ad40929@gmail.com>

Tamir Duberstein <tamird@gmail.com> writes:

> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..2388a57b39 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
>  		prefix = "refs/tags/";
>  
>  	if (prefix) {

Below, adding an extra call to get_main_ref_store(the_repository)
makes one line unnecessarily split and harder to read.  How about
doing

		struct ref_store *store = get_main_ref_store(the_repository);

upfront here, and then use that to replace these two calls of
get_main_ref_store(the_repository)?

> +		if (filter->start_after) {
> +			struct ref_iterator *iter;
>  
> +			iter = refs_ref_iterator_begin(
> +				get_main_ref_store(the_repository), "", NULL, 0,
> +				0);
>  
>  			ret = start_ref_iterator_after(iter, filter->start_after);
> +			if (!ret)
> +				ret = do_for_each_ref_iterator(iter, fn,
> +							       cb_data);
> +		} else {
> +			/*
> +			 * Pass the prefix during construction because the files
> +			 * backend primes loose refs before a later seek can
> +			 * narrow the iterator.
> +			 */
> +			struct refs_for_each_ref_options opts = {
> +				.prefix = prefix,
> +			};
>  
> +			ret = refs_for_each_ref_ext(
> +				get_main_ref_store(the_repository), fn, cb_data,
> +				&opts);
> +		}
>  	} else if (filter->kind & FILTER_REFS_REGULAR) {

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Karthik Nayak @ 2026-06-08 21:18 UTC (permalink / raw)
  To: Tamir Duberstein, git
  Cc: Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 17907 bytes --]

Tamir Duberstein <tamird@gmail.com> writes:

> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
>
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
>

Nicely explained. We should've merged this long ago, so this is a
worthwhile change.

> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.
>
> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.
>
> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.
>
> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
>
> On a checkout with 62,174 remote-tracking refs, I ran:
>
>     hyperfine --warmup 0 --runs 3 \
>         --command-name parent \
>         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
>         --command-name this-commit \
>         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
>
> The results were:
>
>              parent       this commit
>   elapsed    104.365 s     467.7 ms
>   user        93.702 s     220.2 ms
>   system       0.723 s     182.7 ms
>
> The wall-time standard deviations were 11.356 seconds and 133.8
> milliseconds, respectively, for a 223x speedup. Both commands produced
> output with SHA-256
> 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
>
> Both revisions were rebuilt with the default -O2 flags using Apple clang
> 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> GB RAM.
>
> Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
>  builtin/tag.c                  |  1 -
>  commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
>  commit-reach.h                 | 15 ++++++++++----
>  ref-filter.c                   |  6 ++++--
>  ref-filter.h                   |  7 +++----
>  t/helper/test-reach.c          | 10 ++--------
>  t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
>  t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
>  t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
>  t/t6600-test-reach.sh          |  6 ++----
>  10 files changed, 117 insertions(+), 36 deletions(-)
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index d51c2e3349..9f34d948d4 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>
>  	if (verify_ref_format(format))
>  		die(_("unable to parse format string"));
> -	filter->with_commit_tag_algo = 1;

We were selectively using the algo for `git tag`, like mentioned I guess
we'll entirely remove `with_commit_tag_algo` below somewhere

>  	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
>
>  	free(to_free);
> diff --git a/commit-reach.c b/commit-reach.c
> index 9b3ea46d6f..6e599a3670 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -6,7 +6,6 @@
>  #include "decorate.h"
>  #include "hex.h"
>  #include "prio-queue.h"
> -#include "ref-filter.h"
>  #include "revision.h"
>  #include "tag.h"
>  #include "commit-reach.h"
> @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
>
>  /*
>   * Test whether the candidate is contained in the list.
> - * Do not recurse to find out, though, but return -1 if inconclusive.
> + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> + * inconclusive.

Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
at 0, so this makes sense.

>   */
>  static enum contains_result contains_test(struct commit *candidate,
>  					  const struct commit_list *want,
> @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
>  	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
>  }
>
> -static enum contains_result contains_tag_algo(struct commit *candidate,
> -					      const struct commit_list *want,
> -					      struct contains_cache *cache)
> +static enum contains_result contains_algo(struct commit *candidate,
> +					  struct commit_list *want,
> +					  struct contains_cache *cache)
>  {
>  	struct contains_stack contains_stack = { 0, 0, NULL };
>  	enum contains_result result;
> @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  	if (result != CONTAINS_UNKNOWN)
>  		return result;
>
> +	*contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
>  	push_to_contains_stack(candidate, &contains_stack);
>  	while (contains_stack.nr) {
>  		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  			contains_stack.nr--;
>  		}
>  		/*
> -		 * If we just popped the stack, parents->item has been marked,
> -		 * therefore contains_test will return a meaningful yes/no.
> +		 * A parent may have just been popped and marked, or may still
> +		 * be active when replacement refs create a cycle.
>  		 */
>  		else switch (contains_test(parents->item, want, cache, cutoff)) {
>  		case CONTAINS_YES:
> @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  		case CONTAINS_NO:
>  			entry->parents = parents->next;
>  			break;
> +		case CONTAINS_IN_PROGRESS:
> +			/*
> +			 * Partial negative answers are not safe across a cycle.
> +			 * Discard them and use the cycle-safe reachability walk.
> +			 */
> +			goto cycle;
>

If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
showcase a commit for which we still don't have a result. Since any
commit with UNKNOWN will start a recursive search through its parents.

So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
this would indicate that we hit a cyclic graph and so go to the fallback
of using repo_is_descendant_of(). So this avoids an infinite recursion
in such instances. Makes sense.

>  		case CONTAINS_UNKNOWN:
> +			*contains_cache_at(cache, parents->item) =
> +				CONTAINS_IN_PROGRESS;
>  			push_to_contains_stack(parents->item, &contains_stack);
>  			break;
>  		}
>  	}
>  	free(contains_stack.contains_stack);
>  	return contains_test(candidate, want, cache, cutoff);
> +
> +cycle:
> +	free(contains_stack.contains_stack);
> +	clear_contains_cache(cache);
> +	init_contains_cache(cache);
> +
> +	result = repo_is_descendant_of(the_repository, candidate, want);
> +	if (result < 0)
> +		exit(128);
> +	*contains_cache_at(cache, candidate) =
> +		result ? CONTAINS_YES : CONTAINS_NO;
> +	return result ? CONTAINS_YES : CONTAINS_NO;
>  }
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> -		    struct commit_list *list, struct contains_cache *cache)
> +int commit_contains(struct commit *commit, struct commit_list *list,
> +		    struct contains_cache *cache)
>  {
> -	if (filter->with_commit_tag_algo)
> -		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> -	return repo_is_descendant_of(the_repository, commit, list);
> +	if (!list)
> +		return 1;
> +	return contains_algo(commit, list, cache) == CONTAINS_YES;
>  }
>
>  int can_all_from_reach_with_flag(struct object_array *from,
> diff --git a/commit-reach.h b/commit-reach.h
> index 3f3a563d8a..144dc56275 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -5,7 +5,6 @@
>  #include "commit-slab.h"
>
>  struct commit_list;
> -struct ref_filter;
>  struct object_id;
>  struct object_array;
>
> @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
>  enum contains_result {
>  	CONTAINS_UNKNOWN = 0,
>  	CONTAINS_NO,
> -	CONTAINS_YES
> +	CONTAINS_YES,
> +	CONTAINS_IN_PROGRESS
>  };
>
>  define_commit_slab(contains_cache, enum contains_result);
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> -		    struct commit_list *list, struct contains_cache *cache);
> +/*
> + * Return whether "commit" is a descendant of any commit in "list". An empty
> + * list matches.
> + *
> + * "cache" records answers for one fixed "list". Clear it before changing the
> + * list.
> + */
> +int commit_contains(struct commit *commit, struct commit_list *list,
> +		    struct contains_cache *cache);
>
>  /*
>   * Determine if every commit in 'from' can reach at least one commit
> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..7788147959 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
>  			return NULL;
>  		/* We perform the filtering for the '--contains' option... */
>  		if (filter->with_commit &&
> -		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> +		    !commit_contains(commit, filter->with_commit,
> +				     &filter->internal.contains_cache))
>  			return NULL;
>  		/* ...or for the `--no-contains' option */
>  		if (filter->no_commit &&
> -		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> +		    commit_contains(commit, filter->no_commit,
> +				    &filter->internal.no_contains_cache))
>  			return NULL;
>  	}
>
> diff --git a/ref-filter.h b/ref-filter.h
> index 120221b47f..9e14afca9c 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -73,10 +73,9 @@ struct ref_filter {
>  	struct commit_list *reachable_from;
>  	struct commit_list *unreachable_from;
>
> -	unsigned int with_commit_tag_algo : 1,
> -		match_as_path : 1,
> -		ignore_case : 1,
> -		detached : 1;
> +	unsigned int match_as_path : 1,
> +		     ignore_case : 1,
> +		     detached : 1;
>  	unsigned int kind,
>  		lines;
>  	int abbrev,

Nit: With the changes above. I do wish it was split into two commits.
1. Fix cyclic recursions in the algo.
2. Use the algo for all filter types.

> diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> index 5d86a96c17..82235f713e 100644
> --- a/t/helper/test-reach.c
> +++ b/t/helper/test-reach.c
> @@ -6,7 +6,6 @@
>  #include "gettext.h"
>  #include "hex.h"
>  #include "object-name.h"
> -#include "ref-filter.h"
>  #include "setup.h"
>  #include "string-list.h"
>  #include "tag.h"
> @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
>
>  		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
>  	} else if (!strcmp(av[1], "commit_contains")) {
> -		struct ref_filter filter = REF_FILTER_INIT;
>  		struct contains_cache cache;
>  		init_contains_cache(&cache);
>
> -		if (ac > 2 && !strcmp(av[2], "--tag"))
> -			filter.with_commit_tag_algo = 1;
> -		else
> -			filter.with_commit_tag_algo = 0;
> -
> -		printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> +		printf("%s(_,A,X,_):%d\n", av[1],
> +		       commit_contains(A, X, &cache));
>  		clear_contains_cache(&cache);
>  	} else if (!strcmp(av[1], "get_reachable_subset")) {
>  		const int reachable_flag = 1;

Makes sense.

> diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> index 5b23ce5db9..ac68fdbacd 100755
> --- a/t/perf/p1500-graph-walks.sh
> +++ b/t/perf/p1500-graph-walks.sh
> @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
>
>  test_perf_large_repo
>
> +contains_ref_limit=8192
> +
>  test_expect_success 'setup' '
>  	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
>  	sort -r allrefs | head -n 50 >refs &&
> @@ -32,10 +34,25 @@ test_expect_success 'setup' '
>  		echo "X:$line" >>test-tool-tags || return 1
>  	done &&
>
> +	git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> +	contains_ref_count=$(wc -l <contains-commits) &&
> +	test "$contains_ref_count" -gt 0 &&
> +	contains_base=$(tail -n 1 contains-commits) &&
> +	export contains_base &&
> +	awk "{
> +		printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> +	}" contains-commits |
> +		git update-ref --stdin &&
> +	git pack-refs --include "refs/contains-perf/*" &&
> +
>  	commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
>  	git update-ref refs/heads/disjoint-base $commit &&
>
> -	git commit-graph write --reachable
> +	git commit-graph write --reachable &&
> +
> +	git for-each-ref --contains="$contains_base" \
> +		refs/contains-perf/ >actual &&
> +	test_line_count = $contains_ref_count actual

Shouldn't this be a separate test and not a part of the setup?

>  '
>
>  test_perf 'ahead-behind counts: git for-each-ref' '
> @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
>  	xargs git tag --merged=HEAD <tags
>  '
>
> +test_perf 'contains: git for-each-ref --contains' '
> +	git for-each-ref --contains="$contains_base" \
> +		refs/contains-perf/ >/dev/null
> +'
> +

Ah! we also have this, so perhaps moving the `test_line_count` here and
dropping it above would be better.

>  test_perf 'is-base check: test-tool reach (refs)' '
>  	test-tool reach get_branch_base_for_tip <test-tool-refs
>  '
> diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> index e06feb06e9..169cc70c23 100755
> --- a/t/t6301-for-each-ref-errors.sh
> +++ b/t/t6301-for-each-ref-errors.sh
> @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
>  	test_must_be_empty brief-err
>  '
>
> +test_expect_success 'missing ancestors are reported by contains filters' '
> +	test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> +	{
> +		echo "tree $(git rev-parse HEAD^{tree})" &&
> +		echo "parent $MISSING" &&
> +		git cat-file commit HEAD |
> +			sed -n -e "/^author /p" -e "/^committer /p" &&
> +		echo &&
> +		echo "missing parent"
> +	} > commit &&
> +	broken=$(git hash-object -t commit -w commit) &&
> +	git update-ref refs/heads/missing-parent "$broken" &&
> +	test_must_fail git for-each-ref --contains=HEAD \
> +		refs/heads/missing-parent >out 2>err &&
> +	test_must_be_empty out &&
> +	test_grep "unable to parse commit $MISSING" err
> +'
> +
>  test_expect_success 'ahead-behind requires an argument' '
>  	test_must_fail git for-each-ref \
>  		--format="%(ahead-behind)" 2>err &&
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 7f060d97bf..423505d1fb 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'contains handles cyclic replacement histories' '
> +	one=$(git rev-parse one) &&
> +	three=$(git rev-parse three) &&
> +	test_when_finished "
> +		git replace -d $one
> +		git replace -d $three
> +		git tag -d cycle-a cycle-b
> +	" &&
> +	git tag cycle-a "$one" &&
> +	git tag cycle-b "$three" &&
> +	git replace --graft "$one" "$three" two &&
> +	git replace --graft "$three" "$one" &&
> +	cat >expect <<-\EOF &&
> +	refs/tags/cycle-a
> +	refs/tags/cycle-b
> +	EOF
> +	git for-each-ref --format="%(refname)" --contains=two \
> +		"refs/tags/cycle-*" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success '%(color) must fail' '
>  	test_must_fail git for-each-ref --format="%(color)%(refname)"
>  '
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index b5b314e570..1ecc2571c2 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
>  	X:commit-9-3
>  	EOF
>  	echo "commit_contains(_,A,X,_):1" >expect &&
> -	test_all_modes commit_contains &&
> -	test_all_modes commit_contains --tag
> +	test_all_modes commit_contains
>  '
>
>  test_expect_success 'commit_contains:miss' '
> @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
>  	X:commit-9-3
>  	EOF
>  	echo "commit_contains(_,A,X,_):0" >expect &&
> -	test_all_modes commit_contains &&
> -	test_all_modes commit_contains --tag
> +	test_all_modes commit_contains
>  '
>
>  test_expect_success 'rev-list: basic topo-order' '
>
> ---
> base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
>
> Best regards,
> --
> Tamir Duberstein <tamird@gmail.com>

Overall the patch looks great. The perf improvements are also very
welcome. Some small nits from me.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [PATCH 2/3] config: add GIT_CONFIG_INCLUDES
From: Derrick Stolee @ 2026-06-08 19:38 UTC (permalink / raw)
  To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
In-Reply-To: <aibTAOrcSvTOtv78@pks.im>

On 6/8/2026 10:34 AM, Patrick Steinhardt wrote:
> On Mon, Jun 08, 2026 at 01:57:05PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The config keys 'include.path' and 'includeIf.*' allow users to specify
>> config stored in a location outside of the typical list of config files
>> (system, global, local, etc.). For example, users who accept the risk
>> can specify helpful aliases via a file checked into the repo by pointing
>> 'include.path' to the position of that file in the working directory.
>> This is dangerous, but people do it.
> 
> Huh, I never even considered this use case. But of course, this is
> possible, even though it's quite scary.
> 
>> What becomes tricky is that this modifies all Git behavior, including
>> operations that are intended to be limited in activity or sandboxed in
>> some way. These include directives can provide surprising changes to
>> behavior, especially when expecting a specific list of allowed file
>> accesses. This could lead to failed builds, for instance.
>>
>> To allow for these user-desired features when they are running commands,
>> add a new GIT_CONFIG_INCLUDES environment variable that disables these
>> redirections of config when set to zero. This variable can be set by
>> automation, such as build tooling, to avoid these strange behaviors.
>> This could be considered a recommended option for tools executing Git
>> commands, the same as GIT_ADVICE=0.
> 
> I don't know about this part though. I could see use cases where the
> tools _should_ read the project-relative configuration. It might also be
> the case that the user may want to evaluate some includes, but not all
> of them.

True. I'm not confident that we should recommend this environment in all
cases.

> That raises the question whether we can introduce the configuration in a
> way that it allows a bit more flexibility than just "yes"/"no", like for
> example an allow-list of locations that should be evaluated. But maybe
> I'm overthinking this.
I see. So we can say "avoid including into the repository worktree" but
that will probably be incomplete.

There is room for nuance in future expansions, if we can find a creative
way to handle that nuance. For now, I think I would still want an ability
to turn the entire feature off, at least for certain tools that care.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH 3/3] replay: offer an option to linearize the commit topology
From: Junio C Hamano @ 2026-06-08 19:29 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Johannes Schindelin
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-3-e3ee71fce7b4@iotcl.com>

Toon Claes <toon@iotcl.com> writes:

> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>
> One of the stated goals of git-replay(1) is to allow implementing the
> git-rebase(1) functionality on the server side.
>
> The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
> was given. This mode drops merge commits instead of replaying them, and
> linearized the commit history into a sequence of the
> regular (single-parent) commits.

"linearized" -> "linearizes"?

>
> Add option `--linearize` to git-replay(1) do the same.

"do the same" -> "to do the same"?

> Co-authored-by: Toon Claes <toon@iotcl.com>

There is no sign-off by any of the authors?

> @@ -430,12 +435,20 @@ int replay_revisions(struct rev_info *revs,
>  	while ((commit = get_revision(revs))) {
>  		const struct name_decoration *decoration;
>  
> -		if (commit->parents && commit->parents->next)
> +		if (opts->linearize && (!commit->parents || commit->parents->next))
> +			; /* map current commit to the same as the previous commit */

This uses the same treatment on either root commits or merge
commits?  If this were a mistake and this wants to handle merges but
not roots, shouldn't it be more like

		if (opts->linearize && (commit->parents && commit->parents->next))
			; /* map the merge to the previous */

> +		else if (commit->parents && commit->parents->next)
>  			die(_("replaying merge commits is not supported yet!"));

And because the next one is also about merges, perhaps the early
part of this if/else if cascade can be written

		if (commit->parents && commit->parents->next) {
			/* We have a merge */
			if (!opts->linearize)
				die(_("can't replay a merge (yet)"));
			; /* map current to the previous */
		} else {
			...

wouldn't it?

If the "map current to prev" is applicable to root, any root are
mapped to the last_commit in the above, and if we saw a root as the
first thing in the loop, last_commit is NULL, we do not do anything
here, and after the if/else if/else cascade, we see last_commit is
NULL and break out of the loop.

> +		else {
> +			struct commit *to_pick = reverse ? last_commit : onto;
> +			last_commit =
> +				pick_regular_commit(revs->repo, commit,
> +						    replayed_commits, to_pick,
> +						    &merge_opt, &result,
> +						    opts->linearize ? last_commit : NULL,
> +						    reverse, opts->empty);
> +		}
>  
> -		last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
> -						  reverse ? last_commit : onto,
> -						  &merge_opt, &result, reverse, opts->empty);
>  		if (!last_commit)
>  			break;

> diff --git a/replay.h b/replay.h
> index 1851a07705..07e6fdcca3 100644
> --- a/replay.h
> +++ b/replay.h
> @@ -62,6 +62,11 @@ struct replay_revisions_options {
>  	 * Defaults to REPLAY_EMPTY_COMMIT_DROP.
>  	 */
>  	enum replay_empty_commit_action empty;
> +
> +	/*
> +	 * Whether to linearize the commits (i.e. drop merge commits).
> +	 */
> +	int linearize;
>  };
>  
>  /* This struct is used as an out-parameter by `replay_revisions()`. */
> diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
> index 3353bc4a4d..c781a3bb1b 100755
> --- a/t/t3650-replay-basics.sh
> +++ b/t/t3650-replay-basics.sh
> @@ -565,4 +565,26 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
>  	test_grep "cannot be used with multiple revision ranges" err
>  '
>  
> +test_expect_success 'linearize the commit topology' '
> +	test_tick &&
> +	N=$(git commit-tree -m N -p L -p I L:) &&
> +	N=$(git commit-tree -m N-child -p $N L:) &&
> +	git update-ref refs/heads/N $N &&
> +
> +	git replay --ref-action=print --linearize \
> +		--onto A B..refs/heads/N >out &&
> +
> +	test_line_count = 1 out &&
> +	read N1 N2 N3 N4 <out &&
> +
> +	cat >expect <<-EOF &&
> +	* N-child
> +	* I
> +	* L
> +	o A
> +	EOF
> +	git log --format=%s --graph --boundary A...$N3 >actual &&
> +	test_cmp expect actual
> +'

Perhaps we would want to have a test that replays all the way down
to the root commit?

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-08 19:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Patrick Steinhardt
In-Reply-To: <xmqqa4t5yyee.fsf@gitster.g>

On Mon, Jun 8, 2026 at 6:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> > show_files() checks whether each index entry is deleted or modified
> > before show_ce() applies the pathspec. prune_index() avoids most of this
> > work for pathspecs with a common directory prefix, but a top-level name
> > or leading wildcard leaves every entry to be checked.
> >
> > Match the pathspec before lstat() for the deleted and modified modes.
> > Keep the later match in show_ce() so --error-unmatch is satisfied only
> > by entries that are actually shown.
>
> Adding an extra early `match_pathspec()` check before making slow
> system calls like `lstat()` makes sense, especially when most of the
> index entries need to be skipped.  But if most of them would match,
> then we would end up doing the same match_pathspec() calls twice for
> each path, and run lstat() anyway, so you may also be able to
> construct a perf test that demonstrates a case where this approach
> is not a clear win (or even degradation), perhaps?

Yes. I added an all-matching pathspec case to p3010 and ran:

    hyperfine --warmup 0 --runs 3 \
        'git -c core.fsmonitor=false ls-files --deleted -- "*"'

On a checkout with 859,940 index entries, I ran the parent and patched
binaries in both orders:

                         parent          this commit
  parent first elapsed    56.807 s        64.618 s
               user        1.256 s         1.270 s
               system     10.633 s        11.068 s
  patched first elapsed   63.361 s        64.316 s
                user       1.238 s         1.280 s
                system    10.296 s        11.864 s

The added match costs 14-42 ms of user time in this case. Elapsed time
varies by several seconds with command order, obscuring that CPU cost.

The later match in show_ce() is reached only for entries actually found
deleted or modified. This case therefore exercises the extra match for
every index entry while still performing every lstat().

>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index e1a22b41b9..702c607183 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -450,6 +450,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >                       continue;
> >               if (ce_skip_worktree(ce))
> >                       continue;
> > +             /* Only entries shown by show_ce() satisfy --error-unmatch. */
> > +             if (pathspec.nr &&
> > +                 !match_pathspec(repo->index, &pathspec, fullname.buf,
> > +                                 fullname.len, max_prefix_len, NULL,
> > +                                 S_ISDIR(ce->ce_mode) ||
> > +                                 S_ISGITLINK(ce->ce_mode)))
> > +                     continue;
> >               stat_err = lstat(fullname.buf, &st);
> >               if (stat_err && (errno != ENOENT && errno != ENOTDIR))
> >                       error_errno("cannot lstat '%s'", fullname.buf);
>
> Hmph.  In the current code, because there is no such pre-filtering,
> show_ce() would unconditionally recurse into active submodules when
> told to with the "--recurse-submodules" flag, even if your pathspec
> coes not match the submodule.  With this change, such a submodule
> whose path does not match the pathspec would not even be seen by
> show_ce().  Would it cause a change in behaviour?

This path cannot affect --recurse-submodules. cmd_ls_files() rejects
--recurse-submodules together with either --deleted or --modified before
calling show_files(), and the new check is reached only for those two
modes. Cached and stage output continue to call show_ce() before the new
check. t3007 already verifies that both combinations are rejected.

Given the 60.742 s to 1.061 s improvement for a selective pathspec, I
think this small CPU cost for an all-matching pathspec is a worthwhile
tradeoff. What do you think?

Thanks for the review!
Tamir

^ permalink raw reply

* [PATCH v4 2/2] prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2140.v4.git.1780945851.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Defer the actual removal in prio_queue_get() until the next
operation.  If that next operation is a prio_queue_put(), the
removal and insertion are fused into a single replace — writing
the new element at the root and sifting it down — which avoids
a full remove-rebalance-insert cycle.

This matches the dominant usage pattern in git's commit traversal:
get a commit, then put its parents.  The first parent insertion
after each get is now a replace operation automatically.

This generalizes the lazy_queue pattern from builtin/describe.c
(introduced in 08bb69d70f) into prio_queue itself.  Three callers
independently implemented the same get+put fusion:

  - builtin/describe.c had a full lazy_queue wrapper
  - commit.c:pop_most_recent_commit() used peek+replace
  - builtin/show-branch.c:join_revs() used peek+replace

All three now collapse to plain _get() and _put(), with the data
structure handling the fusion internally.  This simplifies callers
and means every prio_queue user gets the optimization for free
without needing to implement it manually.

Remove prio_queue_replace() since no external callers remain.

Benchmarked on a 1.8M-commit monorepo (30 interleaved runs,
paired t-test, Xeon @ 2.20GHz):

Code paths that previously did eager get+put (new optimization):

  Command                       base    patched  change      p
  merge-base --all A A~1000     3828ms  3725ms   -2.69%  0.0001
  rev-list --count A~1000..A    3055ms  2986ms   -2.27%  0.0601
  log --oneline A~1000..A       3408ms  3350ms   -1.71%  0.0482

Code paths that already had manual get+put fusion (expect
neutral — the optimization moves into prio_queue but the number
of heap operations stays the same):

  Command                       base    patched  change      p
  show-branch A A~1000          9156ms  9127ms   -0.32%  0.3470
  describe (4751 revs, 81K repo) 1983ms 1963ms  -1.02%  <0.001

No regressions in any scenario.

Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 builtin/describe.c          | 75 +++++++-----------------------
 builtin/show-branch.c       | 11 ++---
 commit.c                    | 11 +----
 prio-queue.c                | 92 ++++++++++++++++++++-----------------
 prio-queue.h                | 15 ++----
 t/unit-tests/u-prio-queue.c |  6 +--
 6 files changed, 78 insertions(+), 132 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 8e88bdeea6..64424543ef 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -251,61 +251,19 @@ static int compare_pt(const void *a_, const void *b_)
 	return 0;
 }
 
-struct lazy_queue {
-	struct prio_queue queue;
-	bool get_pending;
-};
-
-#define LAZY_QUEUE_INIT { { compare_commits_by_commit_date }, false }
-
-static void *lazy_queue_get(struct lazy_queue *queue)
-{
-	if (queue->get_pending)
-		prio_queue_get(&queue->queue);
-	else
-		queue->get_pending = true;
-	return prio_queue_peek(&queue->queue);
-}
-
-static void lazy_queue_put(struct lazy_queue *queue, void *thing)
-{
-	if (queue->get_pending)
-		prio_queue_replace(&queue->queue, thing);
-	else
-		prio_queue_put(&queue->queue, thing);
-	queue->get_pending = false;
-}
-
-static bool lazy_queue_empty(const struct lazy_queue *queue)
-{
-	return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
-}
-
-static void lazy_queue_clear(struct lazy_queue *queue)
-{
-	clear_prio_queue(&queue->queue);
-	queue->get_pending = false;
-}
-
-static unsigned long finish_depth_computation(struct lazy_queue *queue,
+static unsigned long finish_depth_computation(struct prio_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
 	struct oidset unflagged = OIDSET_INIT;
-	struct commit *commit;
-	int skip = queue->get_pending ? 1 : 0;
+	struct commit *c;
 
-	prio_queue_for_each(&queue->queue, commit) {
-		if (skip) {
-			skip = 0;
-			continue;
-		}
-		if (!(commit->object.flags & best->flag_within))
-			oidset_insert(&unflagged, &commit->object.oid);
+	prio_queue_for_each(queue, c) {
+		if (!(c->object.flags & best->flag_within))
+			oidset_insert(&unflagged, &c->object.oid);
 	}
 
-	while (!lazy_queue_empty(queue)) {
-		struct commit *c = lazy_queue_get(queue);
+	while ((c = prio_queue_get(queue))) {
 		struct commit_list *parents = c->parents;
 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
@@ -321,7 +279,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 			repo_parse_commit(the_repository, p);
 			seen = p->object.flags & SEEN;
 			if (!seen)
-				lazy_queue_put(queue, p);
+				prio_queue_put(queue, p);
 			flag_before = p->object.flags & best->flag_within;
 			p->object.flags |= c->object.flags;
 			flag_after = p->object.flags & best->flag_within;
@@ -369,8 +327,8 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
 
 static void describe_commit(struct commit *cmit, struct strbuf *dst)
 {
-	struct commit *gave_up_on = NULL;
-	struct lazy_queue queue = LAZY_QUEUE_INIT;
+	struct commit *c, *gave_up_on = NULL;
+	struct prio_queue queue = { compare_commits_by_commit_date };
 	struct commit_name *n;
 	struct possible_tag all_matches[MAX_TAGS];
 	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
@@ -412,9 +370,8 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 	}
 
 	cmit->object.flags = SEEN;
-	lazy_queue_put(&queue, cmit);
-	while (!lazy_queue_empty(&queue)) {
-		struct commit *c = lazy_queue_get(&queue);
+	prio_queue_put(&queue, cmit);
+	while ((c = prio_queue_get(&queue))) {
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
@@ -448,7 +405,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 				t->depth++;
 		}
 		/* Stop if last remaining path already covered by best candidate(s) */
-		if (annotated_cnt && lazy_queue_empty(&queue)) {
+		if (annotated_cnt && !prio_queue_size(&queue)) {
 			int best_depth = INT_MAX;
 			unsigned best_within = 0;
 			for (cur_match = 0; cur_match < match_cnt; cur_match++) {
@@ -471,7 +428,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 			struct commit *p = parents->item;
 			repo_parse_commit(the_repository, p);
 			if (!(p->object.flags & SEEN))
-				lazy_queue_put(&queue, p);
+				prio_queue_put(&queue, p);
 			p->object.flags |= c->object.flags;
 			parents = parents->next;
 
@@ -486,7 +443,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 			strbuf_add_unique_abbrev(dst, cmit_oid, abbrev);
 			if (suffix)
 				strbuf_addstr(dst, suffix);
-			lazy_queue_clear(&queue);
+			clear_prio_queue(&queue);
 			return;
 		}
 		if (unannotated_cnt)
@@ -502,11 +459,11 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 	QSORT(all_matches, match_cnt, compare_pt);
 
 	if (gave_up_on) {
-		lazy_queue_put(&queue, gave_up_on);
+		prio_queue_put(&queue, gave_up_on);
 		seen_commits--;
 	}
 	seen_commits += finish_depth_computation(&queue, &all_matches[0]);
-	lazy_queue_clear(&queue);
+	clear_prio_queue(&queue);
 
 	if (debug) {
 		static int label_width = -1;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 8846f2376f..2435e8aeda 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -232,12 +232,13 @@ static void join_revs(struct prio_queue *queue,
 	while ((commit = prio_queue_peek(queue))) {
 		struct commit_list *parents;
 		int still_interesting = !!interesting(queue);
-		bool get_pending = true;
 		int flags = commit->object.flags & all_mask;
 
 		if (!still_interesting && extra <= 0)
 			break;
 
+		prio_queue_get(queue);
+
 		mark_seen(commit, seen_p);
 		if ((flags & all_revs) == all_revs)
 			flags |= UNINTERESTING;
@@ -253,14 +254,8 @@ static void join_revs(struct prio_queue *queue,
 			if (mark_seen(p, seen_p) && !still_interesting)
 				extra--;
 			p->object.flags |= flags;
-			if (get_pending)
-				prio_queue_replace(queue, p);
-			else
-				prio_queue_put(queue, p);
-			get_pending = false;
+			prio_queue_put(queue, p);
 		}
-		if (get_pending)
-			prio_queue_get(queue);
 	}
 
 	/*
diff --git a/commit.c b/commit.c
index fd8723502e..976bfc4618 100644
--- a/commit.c
+++ b/commit.c
@@ -795,24 +795,17 @@ void commit_list_sort_by_date(struct commit_list **list)
 struct commit *pop_most_recent_commit(struct prio_queue *queue,
 				      unsigned int mark)
 {
-	struct commit *ret = prio_queue_peek(queue);
-	int get_pending = 1;
+	struct commit *ret = prio_queue_get(queue);
 	struct commit_list *parents = ret->parents;
 
 	while (parents) {
 		struct commit *commit = parents->item;
 		if (!repo_parse_commit(the_repository, commit) && !(commit->object.flags & mark)) {
 			commit->object.flags |= mark;
-			if (get_pending)
-				prio_queue_replace(queue, commit);
-			else
-				prio_queue_put(queue, commit);
-			get_pending = 0;
+			prio_queue_put(queue, commit);
 		}
 		parents = parents->next;
 	}
-	if (get_pending)
-		prio_queue_get(queue);
 	return ret;
 }
 
diff --git a/prio-queue.c b/prio-queue.c
index ead4faf4bb..199775d5af 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -34,12 +34,48 @@ void clear_prio_queue(struct prio_queue *queue)
 	queue->nr_ = 0;
 	queue->alloc = 0;
 	queue->insertion_ctr = 0;
+	queue->get_pending = 0;
+}
+
+static void sift_down_root(struct prio_queue *queue)
+{
+	size_t ix, child;
+
+	/* Push down the one at the root */
+	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
+		child = ix * 2 + 1; /* left */
+		if (child + 1 < queue->nr_ &&
+		    compare(queue, child, child + 1) >= 0)
+			child++; /* use right child */
+
+		if (compare(queue, ix, child) <= 0)
+			break;
+
+		swap(queue, child, ix);
+	}
+}
+
+static inline void flush_get(struct prio_queue *queue)
+{
+	if (!queue->get_pending)
+		return;
+	queue->get_pending = 0;
+	queue->array[0] = queue->array[--queue->nr_];
+	sift_down_root(queue);
 }
 
 void prio_queue_put(struct prio_queue *queue, void *thing)
 {
 	size_t ix, parent;
 
+	if (queue->get_pending) {
+		queue->get_pending = 0;
+		queue->array[0].ctr = queue->insertion_ctr++;
+		queue->array[0].data = thing;
+		sift_down_root(queue);
+		return;
+	}
+
 	/* Append at the end */
 	ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
 	queue->array[queue->nr_].ctr = queue->insertion_ctr++;
@@ -58,61 +94,33 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
 	}
 }
 
-static void sift_down_root(struct prio_queue *queue)
-{
-	size_t ix, child;
-
-	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
-		child = ix * 2 + 1; /* left */
-		if (child + 1 < queue->nr_ &&
-		    compare(queue, child, child + 1) >= 0)
-			child++; /* use right child */
-
-		if (compare(queue, ix, child) <= 0)
-			break;
-
-		swap(queue, child, ix);
-	}
-}
-
 void *prio_queue_get(struct prio_queue *queue)
 {
-	void *result;
-
-	if (!queue->nr_)
+	if (queue->nr_ <= queue->get_pending) {
+		queue->nr_ = 0;
+		queue->get_pending = 0;
 		return NULL;
+	}
 	if (!queue->compare)
 		return queue->array[--queue->nr_].data; /* LIFO */
 
-	result = queue->array[0].data;
-	if (!--queue->nr_)
-		return result;
+	flush_get(queue);
 
-	queue->array[0] = queue->array[queue->nr_];
-	sift_down_root(queue);
-	return result;
+	queue->get_pending = 1;
+	return queue->array[0].data;
 }
 
 void *prio_queue_peek(struct prio_queue *queue)
 {
-	if (!queue->nr_)
+	if (queue->nr_ <= queue->get_pending) {
+		queue->nr_ = 0;
+		queue->get_pending = 0;
 		return NULL;
+	}
 	if (!queue->compare)
 		return queue->array[queue->nr_ - 1].data;
-	return queue->array[0].data;
-}
 
-void prio_queue_replace(struct prio_queue *queue, void *thing)
-{
-	if (!queue->nr_) {
-		prio_queue_put(queue, thing);
-	} else if (!queue->compare) {
-		queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
-		queue->array[queue->nr_ - 1].data = thing;
-	} else {
-		queue->array[0].ctr = queue->insertion_ctr++;
-		queue->array[0].data = thing;
-		sift_down_root(queue);
-	}
+	flush_get(queue);
+
+	return queue->array[0].data;
 }
diff --git a/prio-queue.h b/prio-queue.h
index 7f2aa986b1..570b48e648 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -30,8 +30,9 @@ struct prio_queue {
 	prio_queue_compare_fn compare;
 	size_t insertion_ctr;
 	void *cb_data;
-	size_t alloc, nr_;
+	size_t alloc, nr_; /* use prio_queue_size() for logical count */
 	struct prio_queue_entry *array;
+	unsigned get_pending;
 };
 
 /*
@@ -54,22 +55,14 @@ void *prio_queue_peek(struct prio_queue *);
 
 static inline size_t prio_queue_size(const struct prio_queue *queue)
 {
-	return queue->nr_;
+	return queue->nr_ - queue->get_pending;
 }
 
 #define prio_queue_for_each(queue, it) \
-	for (size_t pq_ix_ = 0; \
+	for (size_t pq_ix_ = (queue)->get_pending; \
 	     pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
 	     pq_ix_++)
 
-/*
- * Replace the "thing" that compares the smallest with a new "thing",
- * like prio_queue_get()+prio_queue_put() would do, but in a more
- * efficient way.  Does the same as prio_queue_put() if the queue is
- * empty.
- */
-void prio_queue_replace(struct prio_queue *queue, void *thing);
-
 void clear_prio_queue(struct prio_queue *);
 
 /* Reverse the LIFO elements */
diff --git a/t/unit-tests/u-prio-queue.c b/t/unit-tests/u-prio-queue.c
index 63e58114ae..af3e0b8598 100644
--- a/t/unit-tests/u-prio-queue.c
+++ b/t/unit-tests/u-prio-queue.c
@@ -53,13 +53,13 @@ static void test_prio_queue(int *input, size_t input_size,
 			prio_queue_reverse(&pq);
 			break;
 		case REPLACE:
-			peek = prio_queue_peek(&pq);
+			get = prio_queue_get(&pq);
 			cl_assert(i + 1 < input_size);
 			cl_assert(input[i + 1] >= 0);
 			cl_assert(j < result_size);
-			cl_assert_equal_i(result[j], show(peek));
+			cl_assert_equal_i(result[j], show(get));
 			j++;
-			prio_queue_replace(&pq, &input[++i]);
+			prio_queue_put(&pq, &input[++i]);
 			break;
 		default:
 			prio_queue_put(&pq, &input[i]);
-- 
gitgitgadget

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox