Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH RFC 2/2] builtin/history: print feedback after successful reword
From: Ben Knoble @ 2026-06-08 16:47 UTC (permalink / raw)
  To: Pablo Sabater; +Cc: Junio C Hamano, git, Patrick Steinhardt, Kaartic Sivaraam
In-Reply-To: <CAN5EUNQNj86Q+hi6PouOZNWo1T4QTQ6sE5Hs9USZXWpkTedTcw@mail.gmail.com>


> Le 8 juin 2026 à 09:29, Pablo Sabater <pabloosabaterr@gmail.com> a écrit :
> 
> El lun, 8 jun 2026 a las 14:16, Junio C Hamano (<gitster@pobox.com>) escribió:
>> 
>> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>> 
>>> Unlike `git commit --amend` and `git rebase -i`, `git history reword`
>>> doesn't print anything, this makes it feel empty for a porcelain command
>>> and hard to tell if the command did anything without using other
>>> commands like `git log <commit>` to check if the reword was done.
>>> 
>>> Print a message on successful rewords so the user has feedback about it.
>>> 
>>> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
>>> ---
>>> builtin/history.c         |  4 ++++
>>> t/t3451-history-reword.sh | 14 ++++++++++++++
>>> 2 files changed, 18 insertions(+)
>>> 
>>> diff --git a/builtin/history.c b/builtin/history.c
>>> index 51a22a9a1c..0f1ba3b531 100644
>>> --- a/builtin/history.c
>>> +++ b/builtin/history.c
>>> @@ -739,6 +739,10 @@ static int cmd_history_reword(int argc,
>>>              goto out;
>>>      }
>>> 
>>> +     fprintf(stderr, _("Successfully reworded commit %s to %s\n"),
>>> +             repo_find_unique_abbrev(repo, &original->object.oid, DEFAULT_ABBREV),
>>> +             repo_find_unique_abbrev(repo, &rewritten->object.oid, DEFAULT_ABBREV));
>>> +
>>>      ret = 0;
>>> 
>>> out:
>> 
>> Do other commands in "git history" (split is in 'master', drop and
>> fixup are cooking) behave with similar verbosity?  Consistency within
>> the same "history" umbrella matters more than being similar with
>> other commands that can be used for similar purposes.
> 
> They do not, they are thought with the rule of silence in mind.
> However I think that this output is valuable information I might have
> explained myself better at [1] but my thought is:
> 
> git history reword aabb
> 
> Now that I have my commit aabb rewritten I want to check it again just
> to make sure I did what I wanted correctly,

Some thoughts:

- If the rewritten commit is an ancestor of HEAD, look at the log of HEAD@{1} or the log between HEAD and the aforementioned reflog entry. (git-range-diff may also be helpful there.)
- Similarly, if the rewritten commit is reachable from some ref R, check R@{1} etc. 

> but git log aabb is still
> the old commit, the rewritten one has a different hash which I do not
> know unless I search for it, if it's far from HEAD I'd have to git log
> --oneline, get the hash and then git log new_hash. I think that git
> history reword that does have the information about the new hash
> should print it to avoid this search.
> What I want is something like:
> 
> git history reword aabb
> Successfully reworded aabb to ccdd
> 
> So I can just git log ccdd without having to search.
> 
> I want to say I haven't looked as much as I'd like to split, drop and
> fixup, but I think it would be a good addition for them also. On [1]
> Patrick wrote about a --verbose for git history, I think that the
> basic information i.e. at reword which is the new hash should be
> always printed but if it's preferred it could go there.
> 
> For split it can print the hashes of the new commits like:
> "...split into ccdd and eeff."
> For fixup the commit hash also changes, so the same as reword.
> The one that will have more friction would be drop is the one that
> doesn't end up with new commits.
> 
> [1]: https://lore.kernel.org/git/CAN5EUNSAOMRvmLGVfzQiwWoOn9VGNVU5rVMZizOryn_q2fbCNA@mail.gmail.com/

^ permalink raw reply

* Re: [PATCH] worktree: record creation time and free-form note
From: Junio C Hamano @ 2026-06-08 16:59 UTC (permalink / raw)
  To: Kiesel, Norbert; +Cc: git
In-Reply-To: <CAPGaHktHLPUeSuhETwyBo+jE2fMu40jHW284PN+2oY1YJ2j0Yw@mail.gmail.com>

"Kiesel, Norbert" <norbert.kiesel@creditkarma.com> writes:

> I looked at the usage of `.git/description` and I could not find any
> usage.

GitWeb shows it.

^ permalink raw reply

* Re: [PATCH 2/2] builtin/add: use die_for_required_opt() helper
From: Junio C Hamano @ 2026-06-08 17:00 UTC (permalink / raw)
  To: Siddharth Shrimali; +Cc: git, christian.couder, toon, jn.avila
In-Reply-To: <20260603111044.39116-3-r.siddharth.shrimali@gmail.com>

Siddharth Shrimali <r.siddharth.shrimali@gmail.com> writes:

> -	if (!show_only && ignore_missing)
> -		die(_("the option '%s' requires '%s'"), "--ignore-missing", "--dry-run");
> +	die_for_required_opt(ignore_missing, "--ignore-missing", show_only, "--dry-run");

As builtin_add_options[] knows that ignore_missing (variable) comes
from the use of "--ignore-missing" (option), and similarly the value
of show_only (variable) is tightly linked to "--dry-run" (option),
it feels quite wasteful having to pass both.

I wonder if we can do this more declaratively, perhaps by
introducing extra types of elements in struct option[] that tells
"--ignore-missing" requires "--dry-run", so that the client code
does not have to do anything more than calling parse_options() to
implement this?

A possible counter-argument may be that the value of, say,
ignore_missing may be different at this point in the code from what
was set by parse_options() when the command line was processed, but
then it means that the message (with or without your patch) is
misleading, so I am not sure if that counter-argument is valid.

>  	if (chmod_arg && ((chmod_arg[0] != '-' && chmod_arg[0] != '+') ||
>  			  chmod_arg[1] != 'x' || chmod_arg[2]))
> @@ -462,6 +461,8 @@ int cmd_add(int argc,
>  		       PATHSPEC_SYMLINK_LEADING_PATH,
>  		       prefix, argv);
>  
> +	die_for_required_opt(pathspec_file_nul, "--pathspec-file-nul",
> +				!!pathspec_from_file, "--pathspec-from-file");
>  	if (pathspec_from_file) {
>  		if (pathspec.nr)
>  			die(_("'%s' and pathspec arguments cannot be used together"), "--pathspec-from-file");
> @@ -470,8 +471,6 @@ int cmd_add(int argc,
>  				    PATHSPEC_PREFER_FULL |
>  				    PATHSPEC_SYMLINK_LEADING_PATH,
>  				    prefix, pathspec_from_file, pathspec_file_nul);
> -	} else if (pathspec_file_nul) {
> -		die(_("the option '%s' requires '%s'"), "--pathspec-file-nul", "--pathspec-from-file");
>  	}
>  
>  	if (require_pathspec && pathspec.nr == 0) {

^ permalink raw reply

* Re: [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process
From: Junio C Hamano @ 2026-06-08 17:19 UTC (permalink / raw)
  To: Michael Montalbo
  Cc: Johannes Schindelin, Michael Montalbo via GitGitGadget, git
In-Reply-To: <CAC2QwmKNA6wv-jG07fgJj7Xj2J+dzzWEiqV5Q+8HJpjA_GtkFw@mail.gmail.com>

Michael Montalbo <mmontalbo@gmail.com> writes:

>> So the conscious project direction has been: fold pkt-line test backends
>> into `test-tool` and drop the scripting-language prereq. Reintroducing the
>> same shape in Python would walk this back.
>>
>> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
>> 2025-04-29) has been another clear sign that the Git project is actively
>> _removing_ scripting-language dependencies from the build and test
>> infrastructure, not adding new ones.
>
> Now I wonder if the extension / addition of more Perl test infra with my other
> series:
>
> https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/
>
> also goes against the project direction w.r.t. removing scripting languages.
> Maybe I should also re-evaluate the usage of Perl there. I am leveraging
> existing shell parsing logic written in Perl, but maybe the preference for
> Perl-based lint rules is a mistake and should be avoided.

That sounds prudent (even though it is outside the scope of _this_
topic, of course).

Thanks.

^ permalink raw reply

* Re: [GSoC PATCH v2 1/4] path: introduce format_path() for centralized path formatting
From: Justin Tobler @ 2026-06-08 17:28 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-2-jayatheerthkulkarni2005@gmail.com>

On 26/06/05 10:00PM, K Jayatheerth wrote:
> The path-formatting logic inside `builtin/rev-parse.c` handles absolute,
> canonical, and relative formatting rules based on user-supplied options.
> However, this logic is tightly coupled to `rev-parse` and writes directly
> to stdout.
> 
> To allow other builtins (such as the upcoming `git repo` path keys) to
> re-use this logic, extract the core path-formatting algorithm into a centralized
> helper function, `format_path()`, in `path.c`.

Makes sense.

> Expose a single, streamlined `path_format` enum in `path.h` to let callers
> explicitly declare their formatting strategy (UNMODIFIED, RELATIVE,
> RELATIVE_IF_SHARED, or CANONICAL). This decouples the core algorithm from
> the localized fallback mechanics specific to `rev-parse`.

Ok, so rev-parse has its own logic to select the formatting strategy
used when printing paths that either relies on what the user provides or
a designated fallback format that is specific to the type of path. Since
that is specific to rev-parse, it makes to factor it out of the generic
helper function here.

> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> ---
>  path.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  path.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+)
> 
> diff --git a/path.c b/path.c
> index d7e17bf174..2fcd24c5eb 100644
> --- a/path.c
> +++ b/path.c
> @@ -1579,6 +1579,64 @@ char *xdg_cache_home(const char *filename)
>  	return NULL;
>  }
>  
> +void format_path(struct strbuf *buf, const char *path,
> +		 const char *prefix, enum path_format format)
> +{
> +	if (format == PATH_FORMAT_UNMODIFIED) {
> +		strbuf_addstr(buf, path);
> +		return;
> +	}
> +
> +	if (format == PATH_FORMAT_RELATIVE) {

nit: we could just continue the "else if" chain here instead of
restarting it.

> +		struct strbuf relative_buf = STRBUF_INIT;
> +		struct strbuf real_path = STRBUF_INIT;
> +		struct strbuf real_prefix = STRBUF_INIT;
> +		char *cwd = NULL;
> +
> +		/*
> +		 * We don't ever produce a relative path if prefix is NULL,
> +		 * so set the prefix to the current directory so that we can
> +		 * produce a relative path whenever possible.
> +		 */
> +		if (!prefix)
> +			prefix = cwd = xgetcwd();
> +
> +		if (!is_absolute_path(path)) {
> +			strbuf_realpath_forgiving(&real_path, path, 1);
> +			path = real_path.buf;
> +		}
> +		if (!is_absolute_path(prefix)) {
> +			strbuf_realpath_forgiving(&real_prefix, prefix, 1);
> +			prefix = real_prefix.buf;
> +		}
> +
> +		strbuf_addstr(buf, relative_path(path, prefix, &relative_buf));
> +
> +		strbuf_release(&relative_buf);
> +		strbuf_release(&real_path);
> +		strbuf_release(&real_prefix);
> +		free(cwd);
> +	} else if (format == PATH_FORMAT_RELATIVE_IF_SHARED) {
> +		struct strbuf relative_buf = STRBUF_INIT;
> +
> +		/*
> +		 * If we're using RELATIVE_IF_SHARED mode, then we want an
> +		 * absolute path unless the two share a common prefix, so don't
> +		 * default the prefix to the current working directory. Doing so
> +		 * would cause a relative path to always be produced if possible.
> +		 */
> +		strbuf_addstr(buf, relative_path(path, prefix, &relative_buf));
> +		strbuf_release(&relative_buf);
> +	} else if (format == PATH_FORMAT_CANONICAL) {
> +		struct strbuf canonical_buf = STRBUF_INIT;
> +
> +		strbuf_realpath_forgiving(&canonical_buf, path, 1);
> +		strbuf_addbuf(buf, &canonical_buf);

Do we need `canonical_buf` here? Can we just add the path to `buf`
directly?

> +
> +		strbuf_release(&canonical_buf);
> +	}
> +}
> +
>  REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
>  REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
>  REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
> diff --git a/path.h b/path.h
> index 0434ba5e07..a78e0fc141 100644
> --- a/path.h
> +++ b/path.h
> @@ -262,6 +262,36 @@ enum scld_error safe_create_leading_directories_no_share(char *path);
>  int safe_create_file_with_leading_directories(struct repository *repo,
>  					      const char *path);
>  
> +/**
> + * The formatting strategy to apply when writing a path into a buffer.
> + */
> +enum path_format {
> +	/* Output the path exactly as-is without any modifications. */
> +	PATH_FORMAT_UNMODIFIED,
> +
> +	/* Output a path relative to the provided directory prefix. */
> +	PATH_FORMAT_RELATIVE,
> +
> +	/* Output a relative path only if the path shares a root with the prefix. */
> +	PATH_FORMAT_RELATIVE_IF_SHARED,
> +
> +	/* Output a fully resolved, absolute canonical path. */
> +	PATH_FORMAT_CANONICAL
> +};
> +
> +/**
> + * Format a path according to the specified formatting strategy and append
> + * the result to the given strbuf.
> + *
> + * `buf`    : The string buffer to append the formatted path to.
> + * `path`   : The path string that needs to be formatted.
> + * `prefix` : The directory prefix to calculate relative offsets against.
> + * Pass NULL to default to the current working directory where applicable.
> + * `format` : The formatting behavior rule to execute.
> + */
> +void format_path(struct strbuf *buf, const char *path,
> +		 const char *prefix, enum path_format format);
> +

Ok so in this patch we are just adding the new path formatting
interface and will integrate it in the next one. Overall the direction
of this patch looks good to me.

-Justin

^ permalink raw reply

* Re: [GSoC PATCH v2 2/4] rev-parse: use format_path for path formatting
From: Justin Tobler @ 2026-06-08 17:54 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-3-jayatheerthkulkarni2005@gmail.com>

On 26/06/05 10:00PM, K Jayatheerth wrote:
> Now that the core path-formatting logic has been abstracted into
> format_path() inside path.c, remove the localized duplicate formatting
> mechanics from builtin/rev-parse.c.
> 
> Drop the usage of the old local format_type and default_type enums,
> and update print_path() to act as a light wrapper around the new shared
> engine. Resolve user-provided formatting flags directly within rev-parse
> to pass the final determined path_format to format_path().

So if the format isn't explicitly set by the user via the
`--path-format` option, the default formatting strategy used depends on
the path being printed. IOW, there is no consistent default path format
here.

> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> ---
>  builtin/rev-parse.c | 103 ++++++++++----------------------------------
>  1 file changed, 23 insertions(+), 80 deletions(-)
> 
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index 218b5f34d6..c78bdc04c1 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -632,73 +632,16 @@ static void handle_ref_opt(const char *pattern, const char *prefix)
>  	clear_ref_exclusions(&ref_excludes);
>  }
>  
> -enum format_type {
> -	/* We would like a relative path. */
> -	FORMAT_RELATIVE,
> -	/* We would like a canonical absolute path. */
> -	FORMAT_CANONICAL,
> -	/* We would like the default behavior. */
> -	FORMAT_DEFAULT,
> -};
> -
> -enum default_type {
> -	/* Our default is a relative path. */
> -	DEFAULT_RELATIVE,
> -	/* Our default is a relative path if there's a shared root. */
> -	DEFAULT_RELATIVE_IF_SHARED,
> -	/* Our default is a canonical absolute path. */
> -	DEFAULT_CANONICAL,
> -	/* Our default is not to modify the item. */
> -	DEFAULT_UNMODIFIED,
> -};
> -
> -static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
> +static void print_path(const char *path, const char *prefix,
> +		       int arg_path_format, enum path_format def_format)
>  {
[snip]
> +	struct strbuf sb = STRBUF_INIT;
> +	enum path_format fmt = (arg_path_format != -1) ? arg_path_format : def_format;

hmmm, so `arg_path_format` specifies what the user-provided format and
acts as a sentinel to signal there is no value provided and the fallback
format needs to be used. This feels a tad bit awkward to me.

I wonder if we should introduce a PATH_FORMAT_DEFAULT to the
`path_format` enum that maps to one of the existing enum values in
`path.c:format_path()`. Here in `print_path()`, we could then intercept
a PATH_FORMAT_DEFAULT value and override it to the specified
`def_format`. I'm not sure if this is ultimately that much better
though.

-Justin

^ permalink raw reply

* Re: inconsistent order of --diff-algorithm variants in man pages
From: Junio C Hamano @ 2026-06-08 17:56 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: git
In-Reply-To: <20260608112656.GI1082778@cventin.lip.ens-lyon.fr>

Vincent Lefevre <vincent@vinc17.net> writes:

> In Documentation/diff-algorithm-option.adoc, which is used by the
> git-blame(1) and git-diff(1) man pages:
>
> `--diff-algorithm=(patience|minimal|histogram|myers)`::
>         Choose a diff algorithm. The variants are as follows:
> +
> --
>    `default`;;
>    `myers`;;
>         The basic greedy diff algorithm. Currently, this is the default.
>    `minimal`;;
>         Spend extra time to make sure the smallest possible diff is
>         produced.
>    `patience`;;
>         Use "patience diff" algorithm when generating patches.
>    `histogram`;;
>         This algorithm extends the patience algorithm to "support
>         low-occurrence common elements".
> --
>
> I think that using the same order in the --diff-algorithm line and
> in the description that follows would be better, i.e.
>
>   --diff-algorithm=(myers|minimal|patience|histogram)
>
> FYI, the text was added in 07924d4d50e5304fb53eb60aaba8aef31d4c4e5e
> in 2013, but without any explanation on this difference.

I think this is meant to list them as equals without any precedence
or preference order, so it is understandable that nobody paid much
attention.  Until now, that is.

I agree that being consistent between these two places makes tons of
sense.  I just do not know what the right ordering should be.  When
listing a set of equals without any precedence or preference order,
the most easy to see to new readers is alphabetical, except that the
built-in default (myers) is a head above among other equals, so it
does make sense to present it first.

^ permalink raw reply

* [PATCH 0/3] Teach git-replay(1) to linearize merge commits
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Johannes Schindelin

As an alternative to dscho's patch series to replay merges[1], add
option to git-replay(1) to linearize merges. This mimics wath
git-rebase(1) does too with --no-rebase-merges (the default).

The first two patches do some refactoring. The third patch implements
the actual change. I was kindly helped by dscho to implement this
change.

The --linearize option is only added to git-replay(1) and not to
git-history(1) because in my opinion doesn't make much sense to do so,
but I'm happy to hear if anyone disagrees.

This series might conflict with Kristoffer's series to make
documentation changes[2], but should be trivial to resolve. And I don't
think there's a conflict with Patrick's series on adding "drop" to
git-history(1)[3].

dscho's series to replay merges[1] need a bit of rework to fit on top of
this, but I'm happy to help figuring that out.

[1]: <pull.2106.git.1778107405.gitgitgadget@gmail.com>
[2]: <V2_CV_doc_replay_config.767@msgid.xyz>
[3]: <20260603-b4-pks-history-drop-v2-0-742cb5b5176d@pks.im>

Signed-off-by: Toon Claes <toon@iotcl.com>
---
Johannes Schindelin (1):
      replay: offer an option to linearize the commit topology

Toon Claes (2):
      replay: refactor enum replay_mode into a bool
      replay: add helper to put entry into mapped_commits

 Documentation/git-replay.adoc |   5 ++
 builtin/replay.c              |   4 ++
 replay.c                      | 109 +++++++++++++++++++++++-------------------
 replay.h                      |   5 ++
 t/t3650-replay-basics.sh      |  22 +++++++++
 5 files changed, 97 insertions(+), 48 deletions(-)

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260604-toon-git-replay-drop-merges-807fa008d395

^ permalink raw reply

* [PATCH 1/3] replay: refactor enum replay_mode into a bool
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
  To: git; +Cc: Toon Claes
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-0-e3ee71fce7b4@iotcl.com>

In 2760ee4983 (replay: add --revert mode to reverse commit changes,
2026-03-26) the enum `replay_mode` was introduced. This has two possible
values:

 - The value `REPLAY_MODE_REVERT` is used when option `--revert` is
   passed to git-replay(1). When using this value the commits are
   possible in reverse order and the inverse of the changes are applied.

 - The value `REPLAY_MODE_PICK` is used when either option `--onto` or
   `--advance` is used. In both cases the commits are pocessed in normal
   order, and the changes are applied as-is.

Since there are only two possible values of this enum, simplify the code
by converting the enum into a bool. This avoid adding code paths that
check for invalid vaues of the enum, and shortens code where the value
is checked with a ternary operator.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 replay.c | 59 +++++++++++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/replay.c b/replay.c
index 4ef8abb607..1f8e5b083b 100644
--- a/replay.c
+++ b/replay.c
@@ -18,11 +18,6 @@
  */
 #define the_repository DO_NOT_USE_THE_REPOSITORY
 
-enum replay_mode {
-	REPLAY_MODE_PICK,
-	REPLAY_MODE_REVERT,
-};
-
 static const char *short_commit_name(struct repository *repo,
 				     struct commit *commit)
 {
@@ -81,7 +76,7 @@ static struct commit *create_commit(struct repository *repo,
 				    struct tree *tree,
 				    struct commit *based_on,
 				    struct commit *parent,
-				    enum replay_mode mode)
+				    bool reverse)
 {
 	struct object_id ret;
 	struct object *obj = NULL;
@@ -98,15 +93,13 @@ static struct commit *create_commit(struct repository *repo,
 
 	commit_list_insert(parent, &parents);
 	extra = read_commit_extra_headers(based_on, exclude_gpgsig);
-	if (mode == REPLAY_MODE_REVERT) {
+	if (reverse) {
 		generate_revert_message(&msg, based_on, repo);
 		/* For revert, use current user as author (NULL = use default) */
-	} else if (mode == REPLAY_MODE_PICK) {
+	} else {
 		find_commit_subject(message, &orig_message);
 		strbuf_addstr(&msg, orig_message);
 		author = get_author(message);
-	} else {
-		BUG("unexpected replay mode %d", mode);
 	}
 	reset_ident_date();
 	if (commit_tree_extended(msg.buf, msg.len, &tree->object.oid, parents,
@@ -269,7 +262,7 @@ static struct commit *pick_regular_commit(struct repository *repo,
 					  struct commit *onto,
 					  struct merge_options *merge_opt,
 					  struct merge_result *result,
-					  enum replay_mode mode,
+					  bool reverse,
 					  enum replay_empty_commit_action empty)
 {
 	struct commit *base, *replayed_base;
@@ -287,7 +280,21 @@ static struct commit *pick_regular_commit(struct repository *repo,
 	replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
 	pickme_tree = repo_get_commit_tree(repo, pickme);
 
-	if (mode == REPLAY_MODE_PICK) {
+	if (reverse) {
+		/* Revert: swap base and pickme to reverse the diff */
+		const char *pickme_name = short_commit_name(repo, pickme);
+		merge_opt->branch1 = short_commit_name(repo, replayed_base);
+		merge_opt->branch2 = xstrfmt("parent of %s", pickme_name);
+		merge_opt->ancestor = pickme_name;
+
+		merge_incore_nonrecursive(merge_opt,
+					  pickme_tree,
+					  replayed_base_tree,
+					  base_tree,
+					  result);
+
+		free((char *)merge_opt->branch2);
+	} else {
 		/* Cherry-pick: normal order */
 		merge_opt->branch1 = short_commit_name(repo, replayed_base);
 		merge_opt->branch2 = short_commit_name(repo, pickme);
@@ -303,22 +310,6 @@ static struct commit *pick_regular_commit(struct repository *repo,
 					  result);
 
 		free((char *)merge_opt->ancestor);
-	} else if (mode == REPLAY_MODE_REVERT) {
-		/* Revert: swap base and pickme to reverse the diff */
-		const char *pickme_name = short_commit_name(repo, pickme);
-		merge_opt->branch1 = short_commit_name(repo, replayed_base);
-		merge_opt->branch2 = xstrfmt("parent of %s", pickme_name);
-		merge_opt->ancestor = pickme_name;
-
-		merge_incore_nonrecursive(merge_opt,
-					  pickme_tree,
-					  replayed_base_tree,
-					  base_tree,
-					  result);
-
-		free((char *)merge_opt->branch2);
-	} else {
-		BUG("unexpected replay mode %d", mode);
 	}
 	merge_opt->ancestor = NULL;
 	merge_opt->branch2 = NULL;
@@ -341,7 +332,7 @@ static struct commit *pick_regular_commit(struct repository *repo,
 		}
 	}
 
-	return create_commit(repo, result->tree, pickme, replayed_base, mode);
+	return create_commit(repo, result->tree, pickme, replayed_base, reverse);
 }
 
 void replay_result_release(struct replay_result *result)
@@ -381,13 +372,13 @@ int replay_revisions(struct rev_info *revs,
 	char *revert;
 	const char *ref;
 	struct object_id old_oid;
-	enum replay_mode mode = REPLAY_MODE_PICK;
+	bool reverse;
 	int ret;
 
 	advance = xstrdup_or_null(opts->advance);
 	revert = xstrdup_or_null(opts->revert);
-	if (revert)
-		mode = REPLAY_MODE_REVERT;
+	reverse = !!revert;
+
 	set_up_replay_mode(revs->repo, &revs->cmdline, opts->onto,
 			   &detached_head, &advance, &revert, &onto, &update_refs);
 
@@ -430,8 +421,8 @@ int replay_revisions(struct rev_info *revs,
 			die(_("replaying merge commits is not supported yet!"));
 
 		last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
-						  mode == REPLAY_MODE_REVERT ? last_commit : onto,
-						  &merge_opt, &result, mode, opts->empty);
+						  reverse ? last_commit : onto,
+						  &merge_opt, &result, reverse, opts->empty);
 		if (!last_commit)
 			break;
 

-- 
2.53.0.1323.g189a785ab5


^ permalink raw reply related

* [PATCH 2/3] replay: add helper to put entry into mapped_commits
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
  To: git; +Cc: Toon Claes
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-0-e3ee71fce7b4@iotcl.com>

The function replay_revisions() in replay.c is rather lengthy. Extract
the logic to put commit entry into mapped_commits into a helper
function.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 replay.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/replay.c b/replay.c
index 1f8e5b083b..7921d7dba3 100644
--- a/replay.c
+++ b/replay.c
@@ -243,9 +243,9 @@ static void set_up_replay_mode(struct repository *repo,
 	strset_clear(&rinfo.positive_refs);
 }
 
-static struct commit *mapped_commit(kh_oid_map_t *replayed_commits,
-				    struct commit *commit,
-				    struct commit *fallback)
+static struct commit *get_mapped_commit(kh_oid_map_t *replayed_commits,
+					struct commit *commit,
+					struct commit *fallback)
 {
 	khint_t pos;
 	if (!commit)
@@ -256,6 +256,21 @@ static struct commit *mapped_commit(kh_oid_map_t *replayed_commits,
 	return kh_value(replayed_commits, pos);
 }
 
+static void put_mapped_commit(kh_oid_map_t *replayed_commits,
+			      struct commit *commit,
+			      struct commit *new_commit)
+{
+	khint_t pos;
+	int ret;
+
+	pos = kh_put_oid_map(replayed_commits, commit->object.oid, &ret);
+	if (ret == 0)
+		BUG("Duplicate rewritten commit: %s\n",
+		    oid_to_hex(&commit->object.oid));
+
+	kh_value(replayed_commits, pos) = new_commit;
+}
+
 static struct commit *pick_regular_commit(struct repository *repo,
 					  struct commit *pickme,
 					  kh_oid_map_t *replayed_commits,
@@ -276,7 +291,7 @@ static struct commit *pick_regular_commit(struct repository *repo,
 		base_tree = lookup_tree(repo, repo->hash_algo->empty_tree);
 	}
 
-	replayed_base = mapped_commit(replayed_commits, base, onto);
+	replayed_base = get_mapped_commit(replayed_commits, base, onto);
 	replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
 	pickme_tree = repo_get_commit_tree(repo, pickme);
 
@@ -414,8 +429,6 @@ int replay_revisions(struct rev_info *revs,
 	replayed_commits = kh_init_oid_map();
 	while ((commit = get_revision(revs))) {
 		const struct name_decoration *decoration;
-		khint_t pos;
-		int hr;
 
 		if (commit->parents && commit->parents->next)
 			die(_("replaying merge commits is not supported yet!"));
@@ -427,11 +440,7 @@ int replay_revisions(struct rev_info *revs,
 			break;
 
 		/* Record commit -> last_commit mapping */
-		pos = kh_put_oid_map(replayed_commits, commit->object.oid, &hr);
-		if (hr == 0)
-			BUG("Duplicate rewritten commit: %s\n",
-			    oid_to_hex(&commit->object.oid));
-		kh_value(replayed_commits, pos) = last_commit;
+		put_mapped_commit(replayed_commits, commit, last_commit);
 
 		/* Update any necessary branches */
 		if (ref)

-- 
2.53.0.1323.g189a785ab5


^ permalink raw reply related

* [PATCH 3/3] replay: offer an option to linearize the commit topology
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Johannes Schindelin
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-0-e3ee71fce7b4@iotcl.com>

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

One of the stated goals of git-replay(1) is to allow implementing the
git-rebase(1) functionality on the server side.

The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
was given. This mode drops merge commits instead of replaying them, and
linearized the commit history into a sequence of the
regular (single-parent) commits.

Add option `--linearize` to git-replay(1) do the same.

Co-authored-by: Toon Claes <toon@iotcl.com>
---
 Documentation/git-replay.adoc |  5 +++++
 builtin/replay.c              |  4 ++++
 replay.c                      | 25 +++++++++++++++++++------
 replay.h                      |  5 +++++
 t/t3650-replay-basics.sh      | 22 ++++++++++++++++++++++
 5 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc
index a32f72aead..41c96c7061 100644
--- a/Documentation/git-replay.adoc
+++ b/Documentation/git-replay.adoc
@@ -88,6 +88,11 @@ incompatible with `--contained` (which is a modifier for `--onto` only).
 +
 The default mode can be configured via the `replay.refAction` configuration variable.
 
+--linearize::
+	In this mode, `git replay` imitates `git rebase --no-rebase-merges`,
+	i.e. it cherry-picks only non-merge commits, each one on top of the
+	previous one.
+
 <revision-range>::
 	Range of commits to replay; see "Specifying Ranges" in
 	linkgit:git-rev-parse[1]. In `--advance=<branch>` or
diff --git a/builtin/replay.c b/builtin/replay.c
index 39e3a86f6c..fedfe46dc6 100644
--- a/builtin/replay.c
+++ b/builtin/replay.c
@@ -111,6 +111,8 @@ int cmd_replay(int argc,
 			     N_("mode"),
 			     N_("control ref update behavior (update|print)"),
 			     PARSE_OPT_NONEG),
+		OPT_BOOL(0, "linearize", &opts.linearize,
+			 N_("ignore merge commits instead of replaying them")),
 		OPT_END()
 	};
 
@@ -132,6 +134,8 @@ int cmd_replay(int argc,
 				  opts.contained, "--contained");
 	die_for_incompatible_opt2(!!opts.ref, "--ref",
 				  !!opts.contained, "--contained");
+	die_for_incompatible_opt2(!!opts.revert, "--revert",
+				  opts.linearize, "--linearize");
 
 	/* Parse ref action mode from command line or config */
 	ref_mode = get_ref_action_mode(repo, ref_action);
diff --git a/replay.c b/replay.c
index 7921d7dba3..3e36908131 100644
--- a/replay.c
+++ b/replay.c
@@ -277,12 +277,16 @@ static struct commit *pick_regular_commit(struct repository *repo,
 					  struct commit *onto,
 					  struct merge_options *merge_opt,
 					  struct merge_result *result,
+					  struct commit *replayed_base,
 					  bool reverse,
 					  enum replay_empty_commit_action empty)
 {
-	struct commit *base, *replayed_base;
+	struct commit *base;
 	struct tree *pickme_tree, *base_tree, *replayed_base_tree;
 
+	if (replayed_base && reverse)
+		BUG("Linearizing commits is not supported when replaying in reverse");
+
 	if (pickme->parents) {
 		base = pickme->parents->item;
 		base_tree = repo_get_commit_tree(repo, base);
@@ -291,7 +295,8 @@ static struct commit *pick_regular_commit(struct repository *repo,
 		base_tree = lookup_tree(repo, repo->hash_algo->empty_tree);
 	}
 
-	replayed_base = get_mapped_commit(replayed_commits, base, onto);
+	if (!replayed_base)
+		replayed_base = get_mapped_commit(replayed_commits, base, onto);
 	replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
 	pickme_tree = repo_get_commit_tree(repo, pickme);
 
@@ -430,12 +435,20 @@ int replay_revisions(struct rev_info *revs,
 	while ((commit = get_revision(revs))) {
 		const struct name_decoration *decoration;
 
-		if (commit->parents && commit->parents->next)
+		if (opts->linearize && (!commit->parents || commit->parents->next))
+			; /* map current commit to the same as the previous commit */
+		else if (commit->parents && commit->parents->next)
 			die(_("replaying merge commits is not supported yet!"));
+		else {
+			struct commit *to_pick = reverse ? last_commit : onto;
+			last_commit =
+				pick_regular_commit(revs->repo, commit,
+						    replayed_commits, to_pick,
+						    &merge_opt, &result,
+						    opts->linearize ? last_commit : NULL,
+						    reverse, opts->empty);
+		}
 
-		last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
-						  reverse ? last_commit : onto,
-						  &merge_opt, &result, reverse, opts->empty);
 		if (!last_commit)
 			break;
 
diff --git a/replay.h b/replay.h
index 1851a07705..07e6fdcca3 100644
--- a/replay.h
+++ b/replay.h
@@ -62,6 +62,11 @@ struct replay_revisions_options {
 	 * Defaults to REPLAY_EMPTY_COMMIT_DROP.
 	 */
 	enum replay_empty_commit_action empty;
+
+	/*
+	 * Whether to linearize the commits (i.e. drop merge commits).
+	 */
+	int linearize;
 };
 
 /* This struct is used as an out-parameter by `replay_revisions()`. */
diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
index 3353bc4a4d..c781a3bb1b 100755
--- a/t/t3650-replay-basics.sh
+++ b/t/t3650-replay-basics.sh
@@ -565,4 +565,26 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
 	test_grep "cannot be used with multiple revision ranges" err
 '
 
+test_expect_success 'linearize the commit topology' '
+	test_tick &&
+	N=$(git commit-tree -m N -p L -p I L:) &&
+	N=$(git commit-tree -m N-child -p $N L:) &&
+	git update-ref refs/heads/N $N &&
+
+	git replay --ref-action=print --linearize \
+		--onto A B..refs/heads/N >out &&
+
+	test_line_count = 1 out &&
+	read N1 N2 N3 N4 <out &&
+
+	cat >expect <<-EOF &&
+	* N-child
+	* I
+	* L
+	o A
+	EOF
+	git log --format=%s --graph --boundary A...$N3 >actual &&
+	test_cmp expect actual
+'
+
 test_done

-- 
2.53.0.1323.g189a785ab5


^ permalink raw reply related

* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Justin Tobler @ 2026-06-08 18:50 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, kumarayushjha123, lucasseikioshiro,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-4-jayatheerthkulkarni2005@gmail.com>

On 26/06/05 10:00PM, K Jayatheerth wrote:
> Scripts often need to locate the `.git` directory. While `git rev-parse`
> provides this, it relies on command-line flags to dictate path formatting.
> 
> Introduce `path.gitdir.absolute` and `path.gitdir.relative` keys to
> `git repo info`. Exposing separate format-specific keys instead of a base
> `path.gitdir` key avoids default fallbacks and requires callers to state
> their format requirements explicitly. Both keys use `format_path()` to
> resolve paths.

Makes sense.

> To test these keys, introduce the `test_repo_info_path` helper in
> `t/t1900-repo-info.sh`. The helper evaluates paths dynamically and accepts
> environment variable prefixes. This prepares the test suite for future path
> keys that depend on environment overrides, such as `commondir`.
> 
> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> ---
>  Documentation/git-repo.adoc |  6 ++++++
>  builtin/repo.c              | 26 ++++++++++++++++++++++++++
>  t/t1900-repo-info.sh        | 33 +++++++++++++++++++++++++++++++++
>  3 files changed, 65 insertions(+)
> 
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 42262c1983..a0dca7ce88 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -104,6 +104,12 @@ values that they return:
>  `object.format`::
>  	The object format (hash algorithm) used in the repository.
>  
> +`path.gitdir.absolute`::
> +	The canonical absolute path to the Git repository directory (the `.git` directory).
> +
> +`path.gitdir.relative`::
> +	The path to the Git repository directory relative to the current working directory.
> +
>  `references.format`::
>  	The reference storage format. The valid values are:
>  +
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 71a5c1c29c..6e97f6a0e4 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -7,12 +7,14 @@
>  #include "hex.h"
>  #include "odb.h"
>  #include "parse-options.h"
> +#include "path.h"
>  #include "path-walk.h"
>  #include "progress.h"
>  #include "quote.h"
>  #include "ref-filter.h"
>  #include "refs.h"
>  #include "revision.h"
> +#include "setup.h"
>  #include "strbuf.h"
>  #include "string-list.h"
>  #include "shallow.h"
> @@ -75,6 +77,28 @@ static int get_object_format(struct repository *repo, struct strbuf *buf)
>  	return 0;
>  }
>  
> +static int get_path_gitdir_absolute(struct repository *repo, struct strbuf *buf)
> +{
> +	const char *git_dir = repo_get_git_dir(repo);
> +
> +	if (!git_dir)
> +		return error(_("unable to get git directory"));
> +
> +	format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_CANONICAL);

For absolute paths, I don't think we actually need the prefix, but
providing it doesn't probably matter too much either way.

> +	return 0;
> +}
> +
> +static int get_path_gitdir_relative(struct repository *repo, struct strbuf *buf)
> +{
> +	const char *git_dir = repo_get_git_dir(repo);
> +
> +	if (!git_dir)
> +		return error(_("unable to get git directory"));
> +
> +	format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_RELATIVE);
> +	return 0;
> +}

Looks good.

> +
>  static int get_references_format(struct repository *repo, struct strbuf *buf)
>  {
>  	strbuf_addstr(buf,
> @@ -87,6 +111,8 @@ static const struct repo_info_field repo_info_field[] = {
>  	{ "layout.bare", get_layout_bare },
>  	{ "layout.shallow", get_layout_shallow },
>  	{ "object.format", get_object_format },
> +	{ "path.gitdir.absolute", get_path_gitdir_absolute },
> +	{ "path.gitdir.relative", get_path_gitdir_relative },
>  	{ "references.format", get_references_format },
>  };
>  
> diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
> index 39bb77dda0..0660b00bbc 100755
> --- a/t/t1900-repo-info.sh
> +++ b/t/t1900-repo-info.sh
> @@ -155,4 +155,37 @@ test_expect_success 'git repo info -h shows only repo info usage' '
>  	test_grep ! "git repo structure" actual
>  '
>  
> +test_repo_info_path () {
> +	field_name=$1
> +	expect_absolute_eval=$2
> +	expect_relative=$3
> +	env_prefix=$4

nit: I was a bit uncertain regarding the purpose of env_prefix here.
Since the env_prefix is not used by any tests yet, I wonder if it we
should delay adding it until the next patch. If we want to reduce churn
though, I think we could also swap the order of patch 3 and 4.

> +
> +	test_expect_success "query individual key: path.$field_name.absolute${env_prefix:+ ($env_prefix)}" '
> +		(
> +			cd test-repo/sub &&
> +			expect_absolute=$(eval "$expect_absolute_eval") &&

Can we just compute `expect_absolute` prior to passing it instead of
using eval here?

> +			echo "path.$field_name.absolute=$expect_absolute" >expect &&
> +			eval "${env_prefix:+$env_prefix }git repo info \"path.$field_name.absolute\"" >actual &&
> +			test_cmp expect actual
> +		)
> +	'
> +
> +	test_expect_success "query individual key: path.$field_name.relative${env_prefix:+ ($env_prefix)}" '
> +		(
> +			cd test-repo/sub &&
> +			echo "path.$field_name.relative=$expect_relative" >expect &&
> +			eval "${env_prefix:+$env_prefix }git repo info \"path.$field_name.relative\"" >actual &&
> +			test_cmp expect actual
> +		)
> +	'
> +}
> +
> +test_expect_success 'setup test repository layout for path fields' '
> +	git init test-repo &&
> +	mkdir -p test-repo/sub
> +'
> +
> +test_repo_info_path 'gitdir' 'echo "$(cd .. && pwd)/.git"' '../.git'

hmmm, do we expect the path suffix to be the same between relative and
absolute paths for all test cases? If so, we could just have a single
`expect_path_suffix` argument and let the helper compute the appropriate
absolute and relative paths internally.

-Justin

^ permalink raw reply

* [PATCH v4 0/2] prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson
In-Reply-To: <pull.2140.v3.git.1780832592.gitgitgadget@gmail.com>

Rene's lazy_queue wrapper in describe.c was a clever optimization -- by
deferring the get, a following put becomes a simple replace, avoiding a full
remove-rebalance-insert cycle.

It turns out this pattern is so common in git's traversal code that it makes
sense to fold it into prio_queue itself. Gets and puts are interleaved in
virtually every commit walk, so the fusion is essentially always a win.

This is mostly a code simplification -- three callers had independently
reimplemented the same optimization, and they all collapse to plain get+put
now. The 1.7-2.7% speedup on traversal-heavy workloads is a nice bonus.

More details and benchmark numbers in the commit message.

Related to but independent of the cascade sift-down work in
kk/prio-queue-cascade-sift -- the two can land in either order.

Changes in v4:

 * Thanks Junio for review, applied all suggestions.

 * Renamed .nr_internal to .nr_

 * Restored flush_get() as a static inline helper instead of inlining the
   flush logic into get() and peek().

 * Guard empty-queue check with nr_ <= get_pending.

 * Flipped commit order: the rename/accessor commit is now first, and the
   behavioral fusion change is second. This was partly messy -- the first
   rename commit introduces some ugly intermediate code (e.g. describe.c's
   prio_queue_for_each with a skip variable) that gets cleaned up in commit
   2 when the lazy get makes it unnecessary.

Changes in v3:

 * Adopted Rene's suggestion to move the flush logic below the LIFO
   early-return (LIFO mode never sets get_pending, so flushing there is a
   no-op).

 * Went a step further and inlined the flush logic directly into get() and
   peek(), eliminating the flush_get() helper and its forward declaration of
   sift_down_root().

 * Updated benchmark numbers with more rigorous methodology: 30 interleaved
   runs with paired t-test on an idle server. Split results into code paths
   that already had manual fusion (neutral) vs code paths that benefit from
   the new automatic fusion (1.7-2.7% improvement).

Changes in v2:

 * Added a second commit that renames .nr to .nr_internal so that direct
   access from outside prio-queue.c is a compile error. Verified that after
   the rename, only prio-queue.c references nr_internal.

 * Added prio_queue_for_each() macro for callers that need to walk all
   elements (describe.c, show-branch.c, commit-reach.c, revision.c,
   negotiator/skipping.c).

 * Converted remaining .nr loop conditions to use
   prio_queue_get()/prio_queue_peek() as the loop condition, or
   prio_queue_size() where get/peek isn't suitable.

 * Fixed several callers missed in v1 (object-name.c, fetch-pack.c,
   path-walk.c, pack-bitmap-write.c, negotiator/default.c,
   negotiator/skipping.c, revision.c, builtin/last-modified.c).

Kristofer Karlsson (2):
  prio-queue: rename .nr to .nr_ and add accessor helpers
  prio-queue: fold lazy_queue into prio_queue for automatic get+put
    fusion

 builtin/describe.c          |  70 ++++++-----------------
 builtin/last-modified.c     |   7 +--
 builtin/show-branch.c       |  24 +++-----
 commit-reach.c              |  14 ++---
 commit.c                    |  11 +---
 fetch-pack.c                |   4 +-
 negotiator/default.c        |   4 +-
 negotiator/skipping.c       |  12 ++--
 object-name.c               |   2 +-
 pack-bitmap-write.c         |  10 ++--
 path-walk.c                 |   8 +--
 prio-queue.c                | 110 +++++++++++++++++++-----------------
 prio-queue.h                |  19 ++++---
 revision.c                  |  16 +++---
 t/unit-tests/u-prio-queue.c |   6 +-
 walker.c                    |   4 +-
 16 files changed, 141 insertions(+), 180 deletions(-)


base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2140%2Fspkrka%2Flazy-prio-queue-pr-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2140/spkrka/lazy-prio-queue-pr-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/2140

Range-diff vs v3:

 2:  033215e304 ! 1:  d0f2294661 prio-queue: rename .nr to .nr_internal to prevent direct access
     @@ Metadata
      Author: Kristofer Karlsson <krka@spotify.com>
      
       ## Commit message ##
     -    prio-queue: rename .nr to .nr_internal to prevent direct access
     +    prio-queue: rename .nr to .nr_ and add accessor helpers
      
     -    Rename the .nr member to .nr_internal so that callers outside
     -    prio-queue.c that directly reference .nr get a compilation error.
     -    This catches both existing misuse and future in-flight topics.
     +    Rename the .nr member to .nr_ so that callers outside prio-queue.c
     +    that directly reference .nr get a compilation error.  This catches
     +    both existing misuse and future in-flight topics.
      
     -    Add prio_queue_for_each() macro for callers that need to walk all
     -    elements in the queue, accounting for the get_pending offset.
     +    Add prio_queue_size() for callers that need to know the element count
     +    and prio_queue_for_each() for callers that need to walk all elements.
      
          Convert all external .nr users:
           - Loop conditions: use prio_queue_size(), prio_queue_get(), or
     @@ Commit message
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
       ## builtin/describe.c ##
     -@@ builtin/describe.c: static unsigned long finish_depth_computation(struct prio_queue *queue,
     - 	struct oidset unflagged = OIDSET_INIT;
     - 	struct commit *c;
     +@@ builtin/describe.c: static void lazy_queue_put(struct lazy_queue *queue, void *thing)
       
     --	for (size_t i = queue->get_pending; i < queue->nr; i++) {
     --		struct commit *commit = queue->array[i].data;
     --		if (!(commit->object.flags & best->flag_within))
     --			oidset_insert(&unflagged, &commit->object.oid);
     -+	prio_queue_for_each(queue, c) {
     -+		if (!(c->object.flags & best->flag_within))
     -+			oidset_insert(&unflagged, &c->object.oid);
     - 	}
     + static bool lazy_queue_empty(const struct lazy_queue *queue)
     + {
     +-	return queue->queue.nr == (queue->get_pending ? 1 : 0);
     ++	return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
     + }
       
     - 	while ((c = prio_queue_get(queue))) {
     + static void lazy_queue_clear(struct lazy_queue *queue)
     +@@ builtin/describe.c: static unsigned long finish_depth_computation(struct lazy_queue *queue,
     + {
     + 	unsigned long seen_commits = 0;
     + 	struct oidset unflagged = OIDSET_INIT;
     ++	struct commit *commit;
     ++	int skip = queue->get_pending ? 1 : 0;
     + 
     +-	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
     +-		struct commit *commit = queue->queue.array[i].data;
     ++	prio_queue_for_each(&queue->queue, commit) {
     ++		if (skip) {
     ++			skip = 0;
     ++			continue;
     ++		}
     + 		if (!(commit->object.flags & best->flag_within))
     + 			oidset_insert(&unflagged, &commit->object.oid);
     + 	}
      
       ## builtin/last-modified.c ##
      @@ builtin/last-modified.c: static void process_parent(struct last_modified *lm,
       static int last_modified_run(struct last_modified *lm)
       {
       	int max_count, queue_popped = 0;
     --	struct commit *c;
      +	struct commit *c, *n;
       	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
       	struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
       	struct commit_list *list;
     +@@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
     + 		}
     + 	}
     + 
     +-	while (queue.nr) {
     ++	while ((c = prio_queue_get(&queue))) {
     + 		int parent_i;
     + 		struct commit_list *p;
     +-		struct commit *c = prio_queue_get(&queue);
     + 		struct bitmap *active_c = active_paths_for(lm, c);
     + 
     + 		if ((0 <= max_count && max_count < ++queue_popped) ||
      @@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
       		 */
       		repo_parse_commit(lm->rev.repo, c);
     @@ builtin/show-branch.c: static const char *get_color_reset_code(void)
       
       static struct commit *interesting(struct prio_queue *queue)
       {
     --	for (size_t i = queue->get_pending; i < queue->nr; i++) {
     +-	for (size_t i = 0; i < queue->nr; i++) {
      -		struct commit *commit = queue->array[i].data;
      -		if (commit->object.flags & UNINTERESTING)
      -			continue;
     @@ builtin/show-branch.c: static const char *get_color_reset_code(void)
       	}
       	return NULL;
       }
     +@@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
     + {
     + 	int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
     + 	int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
     ++	struct commit *commit;
     + 
     +-	while (queue->nr) {
     ++	while ((commit = prio_queue_peek(queue))) {
     + 		struct commit_list *parents;
     + 		int still_interesting = !!interesting(queue);
     +-		struct commit *commit = prio_queue_peek(queue);
     + 		bool get_pending = true;
     + 		int flags = commit->object.flags & all_mask;
     + 
      
       ## commit-reach.c ##
      @@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
     @@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b
       			return 1;
       	}
      @@ commit-reach.c: void ahead_behind(struct repository *r,
     - 		  struct commit **commits, size_t commits_nr,
       		  struct ahead_behind_count *counts, size_t counts_nr)
       {
     -+	struct commit *c;
       	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
     ++	void *entry;
       	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
       
     -@@ commit-reach.c: void ahead_behind(struct repository *r,
     - 	init_bit_arrays(&bit_arrays);
     - 
     - 	for (size_t i = 0; i < commits_nr; i++) {
     --		struct commit *c = commits[i];
     --		struct bitmap *bitmap = get_bit_array(c, width);
     -+		struct bitmap *bitmap;
     -+		c = commits[i];
     -+		bitmap = get_bit_array(c, width);
     - 
     - 		bitmap_set(bitmap, i);
     - 		insert_no_dup(&queue, c);
     - 	}
     - 
     - 	while (queue_has_nonstale(&queue)) {
     --		struct commit *c = prio_queue_get(&queue);
     - 		struct commit_list *p;
     --		struct bitmap *bitmap_c = get_bit_array(c, width);
     -+		struct bitmap *bitmap_c;
     -+		c = prio_queue_get(&queue);
     -+		bitmap_c = get_bit_array(c, width);
     - 
     - 		for (size_t i = 0; i < counts_nr; i++) {
     - 			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
     + 	if (!commits_nr || !counts_nr)
      @@ commit-reach.c: void ahead_behind(struct repository *r,
       
       	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
       	repo_clear_commit_marks(r, PARENT2 | STALE);
      -	for (size_t i = 0; i < queue.nr; i++)
      -		free_bit_array(queue.array[i].data);
     -+	prio_queue_for_each(&queue, c)
     -+		free_bit_array(c);
     ++	prio_queue_for_each(&queue, entry)
     ++		free_bit_array(entry);
       	clear_bit_arrays(&bit_arrays);
       	clear_prio_queue(&queue);
       }
     +@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
     + 			    size_t bases_nr)
     + {
     + 	int best_index = -1;
     +-	struct commit *branch_point = NULL;
     ++	struct commit *c, *branch_point = NULL;
     + 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
     + 	int found_missing_gen = 0;
     + 
     +@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
     + 		prio_queue_put(&queue, c);
     + 	}
     + 
     +-	while (queue.nr) {
     +-		struct commit *c = prio_queue_get(&queue);
     ++	while ((c = prio_queue_get(&queue))) {
     + 		int best_for_c = get_best(c);
     + 		int best_for_p, positive;
     + 		struct commit *parent;
      
       ## fetch-pack.c ##
      @@ fetch-pack.c: static int mark_complete_oid(const struct reference *ref, void *cb_data UNUSED)
     @@ object-name.c: static int get_oid_oneline(struct repository *r,
      
       ## pack-bitmap-write.c ##
      @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
     + 			      struct bitmap_index *old_bitmap,
       			      const uint32_t *mapping)
       {
     - 	struct commit *c;
     ++	struct commit *c;
      +	struct tree *tree;
       	int found;
       	uint32_t pos;
       	if (!ent->bitmap)
      @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
     + 
     + 	prio_queue_put(queue, commit);
     + 
     +-	while (queue->nr) {
     ++	while ((c = prio_queue_get(queue))) {
     + 		struct commit_list *p;
     +-		struct commit *c = prio_queue_get(queue);
     + 
     + 		if (old_bitmap && mapping) {
     + 			struct ewah_bitmap *old;
     +@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
       		}
       	}
       
     @@ prio-queue.c: void prio_queue_reverse(struct prio_queue *queue)
       	if (queue->compare)
       		BUG("prio_queue_reverse() on non-LIFO queue");
      -	if (!queue->nr)
     -+	if (!queue->nr_internal)
     ++	if (!queue->nr_)
       		return;
      -	for (i = 0; i < (j = (queue->nr - 1) - i); i++)
     -+	for (i = 0; i < (j = (queue->nr_internal - 1) - i); i++)
     ++	for (i = 0; i < (j = (queue->nr_ - 1) - i); i++)
       		swap(queue, i, j);
       }
       
     @@ prio-queue.c: void prio_queue_reverse(struct prio_queue *queue)
       {
       	FREE_AND_NULL(queue->array);
      -	queue->nr = 0;
     -+	queue->nr_internal = 0;
     ++	queue->nr_ = 0;
       	queue->alloc = 0;
       	queue->insertion_ctr = 0;
     - 	queue->get_pending = 0;
     -@@ prio-queue.c: static void sift_down_root(struct prio_queue *queue)
     - {
     - 	size_t ix, child;
     - 
     --	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
     --		child = ix * 2 + 1;
     --		if (child + 1 < queue->nr &&
     -+	/* Push down the one at the root */
     -+	for (ix = 0; ix * 2 + 1 < queue->nr_internal; ix = child) {
     -+		child = ix * 2 + 1; /* left */
     -+		if (child + 1 < queue->nr_internal &&
     - 		    compare(queue, child, child + 1) >= 0)
     --			child++;
     -+			child++; /* use right child */
     -+
     - 		if (compare(queue, ix, child) <= 0)
     - 			break;
     -+
     - 		swap(queue, child, ix);
     - 	}
       }
      @@ prio-queue.c: void prio_queue_put(struct prio_queue *queue, void *thing)
     - 		return;
     - 	}
     + 	size_t ix, parent;
       
     + 	/* Append at the end */
      -	ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
      -	queue->array[queue->nr].ctr = queue->insertion_ctr++;
      -	queue->array[queue->nr].data = thing;
      -	queue->nr++;
     -+	/* Append at the end */
     -+	ALLOC_GROW(queue->array, queue->nr_internal + 1, queue->alloc);
     -+	queue->array[queue->nr_internal].ctr = queue->insertion_ctr++;
     -+	queue->array[queue->nr_internal].data = thing;
     -+	queue->nr_internal++;
     ++	ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
     ++	queue->array[queue->nr_].ctr = queue->insertion_ctr++;
     ++	queue->array[queue->nr_].data = thing;
     ++	queue->nr_++;
       	if (!queue->compare)
     --		return;
     -+		return; /* LIFO */
     + 		return; /* LIFO */
       
     + 	/* Bubble up the new one */
      -	for (ix = queue->nr - 1; ix; ix = parent) {
     -+	/* Bubble up the new one */
     -+	for (ix = queue->nr_internal - 1; ix; ix = parent) {
     ++	for (ix = queue->nr_ - 1; ix; ix = parent) {
       		parent = (ix - 1) / 2;
       		if (compare(queue, parent, ix) <= 0)
       			break;
     -+
     - 		swap(queue, parent, ix);
     - 	}
     - }
     +@@ prio-queue.c: static void sift_down_root(struct prio_queue *queue)
     + 	size_t ix, child;
     + 
     + 	/* Push down the one at the root */
     +-	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
     ++	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
     + 		child = ix * 2 + 1; /* left */
     +-		if (child + 1 < queue->nr &&
     ++		if (child + 1 < queue->nr_ &&
     + 		    compare(queue, child, child + 1) >= 0)
     + 			child++; /* use right child */
       
     - void *prio_queue_get(struct prio_queue *queue)
     +@@ prio-queue.c: void *prio_queue_get(struct prio_queue *queue)
       {
     + 	void *result;
     + 
      -	if (!queue->nr)
     -+	if (!queue->nr_internal)
     ++	if (!queue->nr_)
       		return NULL;
       	if (!queue->compare)
     --		return queue->array[--queue->nr].data;
     -+		return queue->array[--queue->nr_internal].data; /* LIFO */
     - 
     - 	if (queue->get_pending) {
     --		if (!--queue->nr) {
     -+		if (!--queue->nr_internal) {
     - 			queue->get_pending = 0;
     - 			return NULL;
     - 		}
     --		queue->array[0] = queue->array[queue->nr];
     -+		queue->array[0] = queue->array[queue->nr_internal];
     - 		sift_down_root(queue);
     - 	}
     - 
     -@@ prio-queue.c: void *prio_queue_get(struct prio_queue *queue)
     +-		return queue->array[--queue->nr].data; /* LIFO */
     ++		return queue->array[--queue->nr_].data; /* LIFO */
     + 
     + 	result = queue->array[0].data;
     +-	if (!--queue->nr)
     ++	if (!--queue->nr_)
     + 		return result;
     + 
     +-	queue->array[0] = queue->array[queue->nr];
     ++	queue->array[0] = queue->array[queue->nr_];
     + 	sift_down_root(queue);
     + 	return result;
     + }
       
       void *prio_queue_peek(struct prio_queue *queue)
       {
      -	if (!queue->nr)
     -+	if (!queue->nr_internal)
     ++	if (!queue->nr_)
       		return NULL;
       	if (!queue->compare)
      -		return queue->array[queue->nr - 1].data;
     -+		return queue->array[queue->nr_internal - 1].data;
     - 
     - 	if (queue->get_pending) {
     - 		queue->get_pending = 0;
     --		if (!--queue->nr)
     -+		if (!--queue->nr_internal)
     - 			return NULL;
     --		queue->array[0] = queue->array[queue->nr];
     -+		queue->array[0] = queue->array[queue->nr_internal];
     - 		sift_down_root(queue);
     - 	}
     ++		return queue->array[queue->nr_ - 1].data;
     + 	return queue->array[0].data;
     + }
       
     + void prio_queue_replace(struct prio_queue *queue, void *thing)
     + {
     +-	if (!queue->nr) {
     ++	if (!queue->nr_) {
     + 		prio_queue_put(queue, thing);
     + 	} else if (!queue->compare) {
     +-		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
     +-		queue->array[queue->nr - 1].data = thing;
     ++		queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
     ++		queue->array[queue->nr_ - 1].data = thing;
     + 	} else {
     + 		queue->array[0].ctr = queue->insertion_ctr++;
     + 		queue->array[0].data = thing;
      
       ## prio-queue.h ##
      @@ prio-queue.h: struct prio_queue {
     @@ prio-queue.h: struct prio_queue {
       	size_t insertion_ctr;
       	void *cb_data;
      -	size_t alloc, nr;
     -+	size_t alloc, nr_internal; /* use prio_queue_size() for logical count */
     ++	size_t alloc, nr_;
       	struct prio_queue_entry *array;
     - 	unsigned get_pending;
       };
     -@@ prio-queue.h: void *prio_queue_peek(struct prio_queue *);
       
     - static inline size_t prio_queue_size(struct prio_queue *queue)
     - {
     --	return queue->nr - queue->get_pending;
     -+	return queue->nr_internal - queue->get_pending;
     - }
     +@@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
     +  */
     + void *prio_queue_peek(struct prio_queue *);
       
     ++static inline size_t prio_queue_size(const struct prio_queue *queue)
     ++{
     ++	return queue->nr_;
     ++}
     ++
      +#define prio_queue_for_each(queue, it) \
     -+	for (size_t pq_ix_ = (queue)->get_pending; \
     -+	     pq_ix_ < (queue)->nr_internal && ((it) = (queue)->array[pq_ix_].data, 1); \
     ++	for (size_t pq_ix_ = 0; \
     ++	     pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
      +	     pq_ix_++)
      +
     - void clear_prio_queue(struct prio_queue *);
     - 
     - /* Reverse the LIFO elements */
     + /*
     +  * Replace the "thing" that compares the smallest with a new "thing",
     +  * like prio_queue_get()+prio_queue_put() would do, but in a more
      
       ## revision.c ##
      @@ revision.c: static struct commit *handle_commit(struct rev_info *revs,
     @@ revision.c: static struct commit *handle_commit(struct rev_info *revs,
       		if (commit->object.flags & UNINTERESTING)
       			continue;
       
     +@@ revision.c: static int limit_list(struct rev_info *revs)
     + 	struct commit_list *original_list = revs->commits;
     + 	struct commit_list *newlist = NULL;
     + 	struct commit_list **p = &newlist;
     +-	struct commit *interesting_cache = NULL;
     ++	struct commit *commit, *interesting_cache = NULL;
     + 	struct prio_queue queue = { .compare = compare_commits_by_commit_date };
     + 
     + 	if (revs->ancestry_path_implicit_bottoms) {
     +@@ revision.c: static int limit_list(struct rev_info *revs)
     + 		prio_queue_put(&queue, commit);
     + 	}
     + 
     +-	while (queue.nr) {
     +-		struct commit *commit = prio_queue_get(&queue);
     ++	while ((commit = prio_queue_get(&queue))) {
     + 		struct object *obj = &commit->object;
     + 
     + 		if (commit == interesting_cache)
      @@ revision.c: static enum rewrite_result rewrite_one_1(struct rev_info *revs,
       
       static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list)
     @@ revision.c: static enum rewrite_result rewrite_one_1(struct rev_info *revs,
       		struct commit_list *p = *list;
       
       		if (p && p->item->date >= item->date)
     +
     + ## walker.c ##
     +@@ walker.c: static struct prio_queue complete = { compare_commits_by_commit_date };
     + static int process_commit(struct walker *walker, struct commit *commit)
     + {
     + 	struct commit_list *parents;
     ++	struct commit *item;
     + 
     + 	if (repo_parse_commit(the_repository, commit))
     + 		return -1;
     + 
     +-	while (complete.nr) {
     +-		struct commit *item = prio_queue_peek(&complete);
     ++	while ((item = prio_queue_peek(&complete))) {
     + 		if (item->date < commit->date)
     + 			break;
     + 		pop_most_recent_commit(&complete, COMPLETE);
 1:  e882206d29 ! 2:  a3f4cb57f2 prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
     @@ Commit message
      
          Defer the actual removal in prio_queue_get() until the next
          operation.  If that next operation is a prio_queue_put(), the
     -    removal and insertion are fused into a single replace, writing
     -    the new element at the root and sifting it down which avoids
     +    removal and insertion are fused into a single replace — writing
     +    the new element at the root and sifting it down — which avoids
          a full remove-rebalance-insert cycle.
      
          This matches the dominant usage pattern in git's commit traversal:
     -    get a commit, then put its parents. The first parent insertion
     +    get a commit, then put its parents.  The first parent insertion
          after each get is now a replace operation automatically.
      
          This generalizes the lazy_queue pattern from builtin/describe.c
     -    (introduced in 08bb69d70f) into prio_queue itself. Three callers
     +    (introduced in 08bb69d70f) into prio_queue itself.  Three callers
          independently implemented the same get+put fusion:
      
            - builtin/describe.c had a full lazy_queue wrapper
     @@ Commit message
            - builtin/show-branch.c:join_revs() used peek+replace
      
          All three now collapse to plain _get() and _put(), with the data
     -    structure handling the fusion internally. This simplifies callers
     +    structure handling the fusion internally.  This simplifies callers
          and means every prio_queue user gets the optimization for free
          without needing to implement it manually.
      
          Remove prio_queue_replace() since no external callers remain.
     -    Add prio_queue_size() for callers that need the logical element
     -    count, since the physical nr may temporarily include a
     -    pending-removal element.
      
     -    Benchmarked on a large monorepo (30 interleaved runs,
     +    Benchmarked on a 1.8M-commit monorepo (30 interleaved runs,
          paired t-test, Xeon @ 2.20GHz):
      
          Code paths that previously did eager get+put (new optimization):
      
     -      Command                       base    patched  change  p
     +      Command                       base    patched  change      p
            merge-base --all A A~1000     3828ms  3725ms   -2.69%  0.0001
            rev-list --count A~1000..A    3055ms  2986ms   -2.27%  0.0601
            log --oneline A~1000..A       3408ms  3350ms   -1.71%  0.0482
      
          Code paths that already had manual get+put fusion (expect
     -    neutral - the optimization moves into prio_queue but the number
     +    neutral — the optimization moves into prio_queue but the number
          of heap operations stays the same):
      
     -      Command                base   patched  change  p
     -      show-branch A A~1000   9156ms  9127ms  -0.32%  0.3470
     -      describe (git.git)     1983ms  1963ms  -1.02%  <0.001
     +      Command                       base    patched  change      p
     +      show-branch A A~1000          9156ms  9127ms   -0.32%  0.3470
     +      describe (4751 revs, 81K repo) 1983ms 1963ms  -1.02%  <0.001
      
          No regressions in any scenario.
      
     @@ builtin/describe.c: static int compare_pt(const void *a_, const void *b_)
      -
      -static bool lazy_queue_empty(const struct lazy_queue *queue)
      -{
     --	return queue->queue.nr == (queue->get_pending ? 1 : 0);
     +-	return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
      -}
      -
      -static void lazy_queue_clear(struct lazy_queue *queue)
     @@ builtin/describe.c: static int compare_pt(const void *a_, const void *b_)
       {
       	unsigned long seen_commits = 0;
       	struct oidset unflagged = OIDSET_INIT;
     +-	struct commit *commit;
     +-	int skip = queue->get_pending ? 1 : 0;
      +	struct commit *c;
       
     --	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
     --		struct commit *commit = queue->queue.array[i].data;
     -+	for (size_t i = queue->get_pending; i < queue->nr; i++) {
     -+		struct commit *commit = queue->array[i].data;
     - 		if (!(commit->object.flags & best->flag_within))
     - 			oidset_insert(&unflagged, &commit->object.oid);
     +-	prio_queue_for_each(&queue->queue, commit) {
     +-		if (skip) {
     +-			skip = 0;
     +-			continue;
     +-		}
     +-		if (!(commit->object.flags & best->flag_within))
     +-			oidset_insert(&unflagged, &commit->object.oid);
     ++	prio_queue_for_each(queue, c) {
     ++		if (!(c->object.flags & best->flag_within))
     ++			oidset_insert(&unflagged, &c->object.oid);
       	}
       
      -	while (!lazy_queue_empty(queue)) {
     @@ builtin/describe.c: static void describe_commit(struct commit *cmit, struct strb
       	if (debug) {
       		static int label_width = -1;
      
     - ## builtin/last-modified.c ##
     -@@ builtin/last-modified.c: static void process_parent(struct last_modified *lm,
     - static int last_modified_run(struct last_modified *lm)
     - {
     - 	int max_count, queue_popped = 0;
     -+	struct commit *c;
     - 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
     - 	struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
     - 	struct commit_list *list;
     -@@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
     - 		}
     - 	}
     - 
     --	while (queue.nr) {
     -+	while ((c = prio_queue_get(&queue))) {
     - 		int parent_i;
     - 		struct commit_list *p;
     --		struct commit *c = prio_queue_get(&queue);
     - 		struct bitmap *active_c = active_paths_for(lm, c);
     - 
     - 		if ((0 <= max_count && max_count < ++queue_popped) ||
     -
       ## builtin/show-branch.c ##
     -@@ builtin/show-branch.c: static const char *get_color_reset_code(void)
     - 
     - static struct commit *interesting(struct prio_queue *queue)
     - {
     --	for (size_t i = 0; i < queue->nr; i++) {
     -+	for (size_t i = queue->get_pending; i < queue->nr; i++) {
     - 		struct commit *commit = queue->array[i].data;
     - 		if (commit->object.flags & UNINTERESTING)
     - 			continue;
      @@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
     - {
     - 	int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
     - 	int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
     -+	struct commit *commit;
     - 
     --	while (queue->nr) {
     -+	while ((commit = prio_queue_peek(queue))) {
     + 	while ((commit = prio_queue_peek(queue))) {
       		struct commit_list *parents;
       		int still_interesting = !!interesting(queue);
     --		struct commit *commit = prio_queue_peek(queue);
      -		bool get_pending = true;
       		int flags = commit->object.flags & all_mask;
       
     @@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
       
       	/*
      
     - ## commit-reach.c ##
     -@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
     - 			    size_t bases_nr)
     - {
     - 	int best_index = -1;
     --	struct commit *branch_point = NULL;
     -+	struct commit *c, *branch_point = NULL;
     - 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
     - 	int found_missing_gen = 0;
     - 
     -@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
     - 		prio_queue_put(&queue, c);
     - 	}
     - 
     --	while (queue.nr) {
     --		struct commit *c = prio_queue_get(&queue);
     -+	while ((c = prio_queue_get(&queue))) {
     - 		int best_for_c = get_best(c);
     - 		int best_for_p, positive;
     - 		struct commit *parent;
     -
       ## commit.c ##
      @@ commit.c: void commit_list_sort_by_date(struct commit_list **list)
       struct commit *pop_most_recent_commit(struct prio_queue *queue,
     @@ commit.c: void commit_list_sort_by_date(struct commit_list **list)
       }
       
      
     - ## pack-bitmap-write.c ##
     -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
     - 			      struct bitmap_index *old_bitmap,
     - 			      const uint32_t *mapping)
     - {
     -+	struct commit *c;
     - 	int found;
     - 	uint32_t pos;
     - 	if (!ent->bitmap)
     -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
     - 
     - 	prio_queue_put(queue, commit);
     - 
     --	while (queue->nr) {
     -+	while ((c = prio_queue_get(queue))) {
     - 		struct commit_list *p;
     --		struct commit *c = prio_queue_get(queue);
     - 
     - 		if (old_bitmap && mapping) {
     - 			struct ewah_bitmap *old;
     -
       ## prio-queue.c ##
      @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
     - 	queue->nr = 0;
     + 	queue->nr_ = 0;
       	queue->alloc = 0;
       	queue->insertion_ctr = 0;
      +	queue->get_pending = 0;
     @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
      +{
      +	size_t ix, child;
      +
     -+	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
     -+		child = ix * 2 + 1;
     -+		if (child + 1 < queue->nr &&
     ++	/* Push down the one at the root */
     ++	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
     ++		child = ix * 2 + 1; /* left */
     ++		if (child + 1 < queue->nr_ &&
      +		    compare(queue, child, child + 1) >= 0)
     -+			child++;
     ++			child++; /* use right child */
     ++
      +		if (compare(queue, ix, child) <= 0)
      +			break;
     ++
      +		swap(queue, child, ix);
      +	}
     ++}
     ++
     ++static inline void flush_get(struct prio_queue *queue)
     ++{
     ++	if (!queue->get_pending)
     ++		return;
     ++	queue->get_pending = 0;
     ++	queue->array[0] = queue->array[--queue->nr_];
     ++	sift_down_root(queue);
       }
       
       void prio_queue_put(struct prio_queue *queue, void *thing)
       {
       	size_t ix, parent;
       
     --	/* Append at the end */
      +	if (queue->get_pending) {
      +		queue->get_pending = 0;
      +		queue->array[0].ctr = queue->insertion_ctr++;
     @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
      +		return;
      +	}
      +
     - 	ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
     - 	queue->array[queue->nr].ctr = queue->insertion_ctr++;
     - 	queue->array[queue->nr].data = thing;
     - 	queue->nr++;
     - 	if (!queue->compare)
     --		return; /* LIFO */
     -+		return;
     - 
     --	/* Bubble up the new one */
     - 	for (ix = queue->nr - 1; ix; ix = parent) {
     - 		parent = (ix - 1) / 2;
     - 		if (compare(queue, parent, ix) <= 0)
     - 			break;
     --
     - 		swap(queue, parent, ix);
     + 	/* Append at the end */
     + 	ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
     + 	queue->array[queue->nr_].ctr = queue->insertion_ctr++;
     +@@ prio-queue.c: void prio_queue_put(struct prio_queue *queue, void *thing)
       	}
       }
       
     @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
      -	size_t ix, child;
      -
      -	/* Push down the one at the root */
     --	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
     +-	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
      -		child = ix * 2 + 1; /* left */
     --		if (child + 1 < queue->nr &&
     +-		if (child + 1 < queue->nr_ &&
      -		    compare(queue, child, child + 1) >= 0)
      -			child++; /* use right child */
      -
     @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
       {
      -	void *result;
      -
     - 	if (!queue->nr)
     +-	if (!queue->nr_)
     ++	if (queue->nr_ <= queue->get_pending) {
     ++		queue->nr_ = 0;
     ++		queue->get_pending = 0;
       		return NULL;
     ++	}
       	if (!queue->compare)
     --		return queue->array[--queue->nr].data; /* LIFO */
     --
     + 		return queue->array[--queue->nr_].data; /* LIFO */
     + 
      -	result = queue->array[0].data;
     --	if (!--queue->nr)
     +-	if (!--queue->nr_)
      -		return result;
     -+		return queue->array[--queue->nr].data;
     -+
     -+	if (queue->get_pending) {
     -+		if (!--queue->nr) {
     -+			queue->get_pending = 0;
     -+			return NULL;
     -+		}
     -+		queue->array[0] = queue->array[queue->nr];
     -+		sift_down_root(queue);
     -+	}
     ++	flush_get(queue);
       
     --	queue->array[0] = queue->array[queue->nr];
     +-	queue->array[0] = queue->array[queue->nr_];
      -	sift_down_root(queue);
      -	return result;
      +	queue->get_pending = 1;
     @@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
       }
       
       void *prio_queue_peek(struct prio_queue *queue)
     -@@ prio-queue.c: void *prio_queue_peek(struct prio_queue *queue)
     + {
     +-	if (!queue->nr_)
     ++	if (queue->nr_ <= queue->get_pending) {
     ++		queue->nr_ = 0;
     ++		queue->get_pending = 0;
       		return NULL;
     ++	}
       	if (!queue->compare)
     - 		return queue->array[queue->nr - 1].data;
     + 		return queue->array[queue->nr_ - 1].data;
      -	return queue->array[0].data;
      -}
       
      -void prio_queue_replace(struct prio_queue *queue, void *thing)
      -{
     --	if (!queue->nr) {
     +-	if (!queue->nr_) {
      -		prio_queue_put(queue, thing);
      -	} else if (!queue->compare) {
     --		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
     --		queue->array[queue->nr - 1].data = thing;
     +-		queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
     +-		queue->array[queue->nr_ - 1].data = thing;
      -	} else {
      -		queue->array[0].ctr = queue->insertion_ctr++;
      -		queue->array[0].data = thing;
     -+	if (queue->get_pending) {
     -+		queue->get_pending = 0;
     -+		if (!--queue->nr)
     -+			return NULL;
     -+		queue->array[0] = queue->array[queue->nr];
     - 		sift_down_root(queue);
     - 	}
     +-		sift_down_root(queue);
     +-	}
     ++	flush_get(queue);
      +
      +	return queue->array[0].data;
       }
      
       ## prio-queue.h ##
      @@ prio-queue.h: struct prio_queue {
     + 	prio_queue_compare_fn compare;
     + 	size_t insertion_ctr;
       	void *cb_data;
     - 	size_t alloc, nr;
     +-	size_t alloc, nr_;
     ++	size_t alloc, nr_; /* use prio_queue_size() for logical count */
       	struct prio_queue_entry *array;
      +	unsigned get_pending;
       };
       
       /*
     -@@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
     -  */
     - void *prio_queue_peek(struct prio_queue *);
     +@@ prio-queue.h: void *prio_queue_peek(struct prio_queue *);
     + 
     + static inline size_t prio_queue_size(const struct prio_queue *queue)
     + {
     +-	return queue->nr_;
     ++	return queue->nr_ - queue->get_pending;
     + }
     + 
     + #define prio_queue_for_each(queue, it) \
     +-	for (size_t pq_ix_ = 0; \
     ++	for (size_t pq_ix_ = (queue)->get_pending; \
     + 	     pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
     + 	     pq_ix_++)
       
      -/*
      - * Replace the "thing" that compares the smallest with a new "thing",
     @@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
      - * empty.
      - */
      -void prio_queue_replace(struct prio_queue *queue, void *thing);
     -+static inline size_t prio_queue_size(struct prio_queue *queue)
     -+{
     -+	return queue->nr - queue->get_pending;
     -+}
     - 
     +-
       void clear_prio_queue(struct prio_queue *);
       
     -
     - ## revision.c ##
     -@@ revision.c: static int limit_list(struct rev_info *revs)
     - 	struct commit_list *original_list = revs->commits;
     - 	struct commit_list *newlist = NULL;
     - 	struct commit_list **p = &newlist;
     --	struct commit *interesting_cache = NULL;
     -+	struct commit *commit, *interesting_cache = NULL;
     - 	struct prio_queue queue = { .compare = compare_commits_by_commit_date };
     - 
     - 	if (revs->ancestry_path_implicit_bottoms) {
     -@@ revision.c: static int limit_list(struct rev_info *revs)
     - 		prio_queue_put(&queue, commit);
     - 	}
     - 
     --	while (queue.nr) {
     --		struct commit *commit = prio_queue_get(&queue);
     -+	while ((commit = prio_queue_get(&queue))) {
     - 		struct object *obj = &commit->object;
     - 
     - 		if (commit == interesting_cache)
     + /* Reverse the LIFO elements */
      
       ## t/unit-tests/u-prio-queue.c ##
      @@ t/unit-tests/u-prio-queue.c: static void test_prio_queue(int *input, size_t input_size,
     @@ t/unit-tests/u-prio-queue.c: static void test_prio_queue(int *input, size_t inpu
       			break;
       		default:
       			prio_queue_put(&pq, &input[i]);
     -
     - ## walker.c ##
     -@@ walker.c: static struct prio_queue complete = { compare_commits_by_commit_date };
     - static int process_commit(struct walker *walker, struct commit *commit)
     - {
     - 	struct commit_list *parents;
     -+	struct commit *item;
     - 
     - 	if (repo_parse_commit(the_repository, commit))
     - 		return -1;
     - 
     --	while (complete.nr) {
     --		struct commit *item = prio_queue_peek(&complete);
     -+	while ((item = prio_queue_peek(&complete))) {
     - 		if (item->date < commit->date)
     - 			break;
     - 		pop_most_recent_commit(&complete, COMPLETE);

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v4 1/2] prio-queue: rename .nr to .nr_ and add accessor helpers
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2140.v4.git.1780945851.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Rename the .nr member to .nr_ so that callers outside prio-queue.c
that directly reference .nr get a compilation error.  This catches
both existing misuse and future in-flight topics.

Add prio_queue_size() for callers that need to know the element count
and prio_queue_for_each() for callers that need to walk all elements.

Convert all external .nr users:
 - Loop conditions: use prio_queue_size(), prio_queue_get(), or
   prio_queue_peek() as the loop condition
 - Array iterations: use prio_queue_for_each()

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 builtin/describe.c      | 11 ++++++++---
 builtin/last-modified.c |  7 +++----
 builtin/show-branch.c   | 13 ++++++-------
 commit-reach.c          | 14 +++++++-------
 fetch-pack.c            |  4 ++--
 negotiator/default.c    |  4 +++-
 negotiator/skipping.c   | 12 +++++++-----
 object-name.c           |  2 +-
 pack-bitmap-write.c     | 10 +++++-----
 path-walk.c             |  8 ++++----
 prio-queue.c            | 38 +++++++++++++++++++-------------------
 prio-queue.h            | 12 +++++++++++-
 revision.c              | 16 +++++++---------
 walker.c                |  4 ++--
 14 files changed, 85 insertions(+), 70 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..8e88bdeea6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -278,7 +278,7 @@ static void lazy_queue_put(struct lazy_queue *queue, void *thing)
 
 static bool lazy_queue_empty(const struct lazy_queue *queue)
 {
-	return queue->queue.nr == (queue->get_pending ? 1 : 0);
+	return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
 }
 
 static void lazy_queue_clear(struct lazy_queue *queue)
@@ -292,9 +292,14 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 {
 	unsigned long seen_commits = 0;
 	struct oidset unflagged = OIDSET_INIT;
+	struct commit *commit;
+	int skip = queue->get_pending ? 1 : 0;
 
-	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
-		struct commit *commit = queue->queue.array[i].data;
+	prio_queue_for_each(&queue->queue, commit) {
+		if (skip) {
+			skip = 0;
+			continue;
+		}
 		if (!(commit->object.flags & best->flag_within))
 			oidset_insert(&unflagged, &commit->object.oid);
 	}
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 8900ceece1..5478182f2e 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -344,6 +344,7 @@ static void process_parent(struct last_modified *lm,
 static int last_modified_run(struct last_modified *lm)
 {
 	int max_count, queue_popped = 0;
+	struct commit *c, *n;
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
 	struct commit_list *list;
@@ -389,10 +390,9 @@ static int last_modified_run(struct last_modified *lm)
 		}
 	}
 
-	while (queue.nr) {
+	while ((c = prio_queue_get(&queue))) {
 		int parent_i;
 		struct commit_list *p;
-		struct commit *c = prio_queue_get(&queue);
 		struct bitmap *active_c = active_paths_for(lm, c);
 
 		if ((0 <= max_count && max_count < ++queue_popped) ||
@@ -416,9 +416,8 @@ static int last_modified_run(struct last_modified *lm)
 		 */
 		repo_parse_commit(lm->rev.repo, c);
 
-		while (not_queue.nr) {
+		while ((n = prio_queue_get(&not_queue))) {
 			struct commit_list *np;
-			struct commit *n = prio_queue_get(&not_queue);
 
 			repo_parse_commit(lm->rev.repo, n);
 
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index f02831b085..8846f2376f 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -62,11 +62,10 @@ static const char *get_color_reset_code(void)
 
 static struct commit *interesting(struct prio_queue *queue)
 {
-	for (size_t i = 0; i < queue->nr; i++) {
-		struct commit *commit = queue->array[i].data;
-		if (commit->object.flags & UNINTERESTING)
-			continue;
-		return commit;
+	struct commit *commit;
+	prio_queue_for_each(queue, commit) {
+		if (!(commit->object.flags & UNINTERESTING))
+			return commit;
 	}
 	return NULL;
 }
@@ -228,11 +227,11 @@ static void join_revs(struct prio_queue *queue,
 {
 	int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
 	int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
+	struct commit *commit;
 
-	while (queue->nr) {
+	while ((commit = prio_queue_peek(queue))) {
 		struct commit_list *parents;
 		int still_interesting = !!interesting(queue);
-		struct commit *commit = prio_queue_peek(queue);
 		bool get_pending = true;
 		int flags = commit->object.flags & all_mask;
 
diff --git a/commit-reach.c b/commit-reach.c
index 9b3ea46d6f..a849de653e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -41,8 +41,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 
 static int queue_has_nonstale(struct prio_queue *queue)
 {
-	for (size_t i = 0; i < queue->nr; i++) {
-		struct commit *commit = queue->array[i].data;
+	struct commit *commit;
+	prio_queue_for_each(queue, commit) {
 		if (!(commit->object.flags & STALE))
 			return 1;
 	}
@@ -1070,6 +1070,7 @@ void ahead_behind(struct repository *r,
 		  struct ahead_behind_count *counts, size_t counts_nr)
 {
 	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
+	void *entry;
 	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
 
 	if (!commits_nr || !counts_nr)
@@ -1135,8 +1136,8 @@ void ahead_behind(struct repository *r,
 
 	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
 	repo_clear_commit_marks(r, PARENT2 | STALE);
-	for (size_t i = 0; i < queue.nr; i++)
-		free_bit_array(queue.array[i].data);
+	prio_queue_for_each(&queue, entry)
+		free_bit_array(entry);
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
@@ -1269,7 +1270,7 @@ int get_branch_base_for_tip(struct repository *r,
 			    size_t bases_nr)
 {
 	int best_index = -1;
-	struct commit *branch_point = NULL;
+	struct commit *c, *branch_point = NULL;
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	int found_missing_gen = 0;
 
@@ -1322,8 +1323,7 @@ int get_branch_base_for_tip(struct repository *r,
 		prio_queue_put(&queue, c);
 	}
 
-	while (queue.nr) {
-		struct commit *c = prio_queue_get(&queue);
+	while ((c = prio_queue_get(&queue))) {
 		int best_for_c = get_best(c);
 		int best_for_p, positive;
 		struct commit *parent;
diff --git a/fetch-pack.c b/fetch-pack.c
index 120e01f3cf..29c41132ee 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -662,8 +662,8 @@ static int mark_complete_oid(const struct reference *ref, void *cb_data UNUSED)
 static void mark_recent_complete_commits(struct fetch_pack_args *args,
 					 timestamp_t cutoff)
 {
-	while (complete.nr) {
-		struct commit *item = prio_queue_peek(&complete);
+	struct commit *item;
+	while ((item = prio_queue_peek(&complete))) {
 		if (item->date < cutoff)
 			break;
 		print_verbose(args, _("Marking %s as complete"),
diff --git a/negotiator/default.c b/negotiator/default.c
index 78d58d57ce..19cdf3808c 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -113,10 +113,12 @@ static const struct object_id *get_rev(struct negotiation_state *ns)
 		unsigned int mark;
 		struct commit_list *parents;
 
-		if (ns->rev_list.nr == 0 || ns->non_common_revs == 0)
+		if (ns->non_common_revs == 0)
 			return NULL;
 
 		commit = prio_queue_get(&ns->rev_list);
+		if (!commit)
+			return NULL;
 		repo_parse_commit(the_repository, commit);
 		parents = commit->parents;
 
diff --git a/negotiator/skipping.c b/negotiator/skipping.c
index 68c9b3b997..db90fa77b5 100644
--- a/negotiator/skipping.c
+++ b/negotiator/skipping.c
@@ -143,8 +143,7 @@ static int push_parent(struct data *data, struct entry *entry,
 		/*
 		 * Find the existing entry and use it.
 		 */
-		for (size_t i = 0; i < data->rev_list.nr; i++) {
-			parent_entry = data->rev_list.array[i].data;
+		prio_queue_for_each(&data->rev_list, parent_entry) {
 			if (parent_entry->commit == to_push)
 				goto parent_found;
 		}
@@ -181,10 +180,12 @@ static const struct object_id *get_rev(struct data *data)
 		struct commit_list *p;
 		int parent_pushed = 0;
 
-		if (data->rev_list.nr == 0 || data->non_common_revs == 0)
+		if (data->non_common_revs == 0)
 			return NULL;
 
 		entry = prio_queue_get(&data->rev_list);
+		if (!entry)
+			return NULL;
 		commit = entry->commit;
 		commit->object.flags |= POPPED;
 		if (!(commit->object.flags & COMMON))
@@ -253,8 +254,9 @@ static void have_sent(struct fetch_negotiator *n, struct commit *c)
 static void release(struct fetch_negotiator *n)
 {
 	struct data *data = n->data;
-	for (size_t i = 0; i < data->rev_list.nr; i++)
-		free(data->rev_list.array[i].data);
+	void *entry;
+	prio_queue_for_each(&data->rev_list, entry)
+		free(entry);
 	clear_prio_queue(&data->rev_list);
 	FREE_AND_NULL(data);
 }
diff --git a/object-name.c b/object-name.c
index 9ac86f19c7..2fedfe1761 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1208,7 +1208,7 @@ static int get_oid_oneline(struct repository *r,
 		l->item->object.flags |= ONELINE_SEEN;
 		prio_queue_put(&copy, l->item);
 	}
-	while (copy.nr) {
+	while (prio_queue_size(&copy)) {
 		const char *p, *buf;
 		struct commit *commit;
 		int matches;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 1c8070f99c..ed9714b135 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -513,6 +513,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 			      struct bitmap_index *old_bitmap,
 			      const uint32_t *mapping)
 {
+	struct commit *c;
+	struct tree *tree;
 	int found;
 	uint32_t pos;
 	if (!ent->bitmap)
@@ -520,9 +522,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 
 	prio_queue_put(queue, commit);
 
-	while (queue->nr) {
+	while ((c = prio_queue_get(queue))) {
 		struct commit_list *p;
-		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
 			struct ewah_bitmap *old;
@@ -574,9 +575,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		}
 	}
 
-	while (tree_queue->nr) {
-		if (fill_bitmap_tree(writer, ent->bitmap,
-				     prio_queue_get(tree_queue)) < 0)
+	while ((tree = prio_queue_get(tree_queue))) {
+		if (fill_bitmap_tree(writer, ent->bitmap, tree) < 0)
 			return -1;
 	}
 	return 0;
diff --git a/path-walk.c b/path-walk.c
index 94ff90bd15..cf3b2d0765 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -699,6 +699,7 @@ int walk_objects_by_path(struct path_walk_info *info)
 	int ret;
 	size_t commits_nr = 0, paths_nr = 0;
 	struct commit *c;
+	char *path;
 	struct type_and_oid_list *root_tree_list;
 	struct type_and_oid_list *commit_list;
 	struct path_walk_context ctx = {
@@ -808,8 +809,7 @@ int walk_objects_by_path(struct path_walk_info *info)
 	free(commit_list);
 
 	trace2_region_enter("path-walk", "path-walk", info->revs->repo);
-	while (!ret && ctx.path_stack.nr) {
-		char *path = prio_queue_get(&ctx.path_stack);
+	while (!ret && (path = prio_queue_get(&ctx.path_stack))) {
 		paths_nr++;
 
 		ret = walk_path(&ctx, path);
@@ -821,12 +821,12 @@ int walk_objects_by_path(struct path_walk_info *info)
 	if (!strmap_empty(&ctx.paths_to_lists)) {
 		struct hashmap_iter iter;
 		struct strmap_entry *entry;
+		char *path;
 
 		strmap_for_each_entry(&ctx.paths_to_lists, &iter, entry)
 			push_to_stack(&ctx, entry->key);
 
-		while (!ret && ctx.path_stack.nr) {
-			char *path = prio_queue_get(&ctx.path_stack);
+		while (!ret && (path = prio_queue_get(&ctx.path_stack))) {
 			paths_nr++;
 
 			ret = walk_path(&ctx, path);
diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..ead4faf4bb 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -22,16 +22,16 @@ void prio_queue_reverse(struct prio_queue *queue)
 
 	if (queue->compare)
 		BUG("prio_queue_reverse() on non-LIFO queue");
-	if (!queue->nr)
+	if (!queue->nr_)
 		return;
-	for (i = 0; i < (j = (queue->nr - 1) - i); i++)
+	for (i = 0; i < (j = (queue->nr_ - 1) - i); i++)
 		swap(queue, i, j);
 }
 
 void clear_prio_queue(struct prio_queue *queue)
 {
 	FREE_AND_NULL(queue->array);
-	queue->nr = 0;
+	queue->nr_ = 0;
 	queue->alloc = 0;
 	queue->insertion_ctr = 0;
 }
@@ -41,15 +41,15 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
 	size_t ix, parent;
 
 	/* Append at the end */
-	ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
-	queue->array[queue->nr].ctr = queue->insertion_ctr++;
-	queue->array[queue->nr].data = thing;
-	queue->nr++;
+	ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
+	queue->array[queue->nr_].ctr = queue->insertion_ctr++;
+	queue->array[queue->nr_].data = thing;
+	queue->nr_++;
 	if (!queue->compare)
 		return; /* LIFO */
 
 	/* Bubble up the new one */
-	for (ix = queue->nr - 1; ix; ix = parent) {
+	for (ix = queue->nr_ - 1; ix; ix = parent) {
 		parent = (ix - 1) / 2;
 		if (compare(queue, parent, ix) <= 0)
 			break;
@@ -63,9 +63,9 @@ static void sift_down_root(struct prio_queue *queue)
 	size_t ix, child;
 
 	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
+	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
 		child = ix * 2 + 1; /* left */
-		if (child + 1 < queue->nr &&
+		if (child + 1 < queue->nr_ &&
 		    compare(queue, child, child + 1) >= 0)
 			child++; /* use right child */
 
@@ -80,36 +80,36 @@ void *prio_queue_get(struct prio_queue *queue)
 {
 	void *result;
 
-	if (!queue->nr)
+	if (!queue->nr_)
 		return NULL;
 	if (!queue->compare)
-		return queue->array[--queue->nr].data; /* LIFO */
+		return queue->array[--queue->nr_].data; /* LIFO */
 
 	result = queue->array[0].data;
-	if (!--queue->nr)
+	if (!--queue->nr_)
 		return result;
 
-	queue->array[0] = queue->array[queue->nr];
+	queue->array[0] = queue->array[queue->nr_];
 	sift_down_root(queue);
 	return result;
 }
 
 void *prio_queue_peek(struct prio_queue *queue)
 {
-	if (!queue->nr)
+	if (!queue->nr_)
 		return NULL;
 	if (!queue->compare)
-		return queue->array[queue->nr - 1].data;
+		return queue->array[queue->nr_ - 1].data;
 	return queue->array[0].data;
 }
 
 void prio_queue_replace(struct prio_queue *queue, void *thing)
 {
-	if (!queue->nr) {
+	if (!queue->nr_) {
 		prio_queue_put(queue, thing);
 	} else if (!queue->compare) {
-		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
-		queue->array[queue->nr - 1].data = thing;
+		queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
+		queue->array[queue->nr_ - 1].data = thing;
 	} else {
 		queue->array[0].ctr = queue->insertion_ctr++;
 		queue->array[0].data = thing;
diff --git a/prio-queue.h b/prio-queue.h
index da7fad2f1f..7f2aa986b1 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -30,7 +30,7 @@ struct prio_queue {
 	prio_queue_compare_fn compare;
 	size_t insertion_ctr;
 	void *cb_data;
-	size_t alloc, nr;
+	size_t alloc, nr_;
 	struct prio_queue_entry *array;
 };
 
@@ -52,6 +52,16 @@ void *prio_queue_get(struct prio_queue *);
  */
 void *prio_queue_peek(struct prio_queue *);
 
+static inline size_t prio_queue_size(const struct prio_queue *queue)
+{
+	return queue->nr_;
+}
+
+#define prio_queue_for_each(queue, it) \
+	for (size_t pq_ix_ = 0; \
+	     pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
+	     pq_ix_++)
+
 /*
  * Replace the "thing" that compares the smallest with a new "thing",
  * like prio_queue_get()+prio_queue_put() would do, but in a more
diff --git a/revision.c b/revision.c
index 5693618be4..34e2d146f4 100644
--- a/revision.c
+++ b/revision.c
@@ -476,16 +476,15 @@ static struct commit *handle_commit(struct rev_info *revs,
 static int everybody_uninteresting(struct prio_queue *orig,
 				   struct commit **interesting_cache)
 {
-	size_t i;
+	struct commit *commit;
 
 	if (*interesting_cache) {
-		struct commit *commit = *interesting_cache;
+		commit = *interesting_cache;
 		if (!(commit->object.flags & UNINTERESTING))
 			return 0;
 	}
 
-	for (i = 0; i < orig->nr; i++) {
-		struct commit *commit = orig->array[i].data;
+	prio_queue_for_each(orig, commit) {
 		if (commit->object.flags & UNINTERESTING)
 			continue;
 
@@ -1446,7 +1445,7 @@ static int limit_list(struct rev_info *revs)
 	struct commit_list *original_list = revs->commits;
 	struct commit_list *newlist = NULL;
 	struct commit_list **p = &newlist;
-	struct commit *interesting_cache = NULL;
+	struct commit *commit, *interesting_cache = NULL;
 	struct prio_queue queue = { .compare = compare_commits_by_commit_date };
 
 	if (revs->ancestry_path_implicit_bottoms) {
@@ -1461,8 +1460,7 @@ static int limit_list(struct rev_info *revs)
 		prio_queue_put(&queue, commit);
 	}
 
-	while (queue.nr) {
-		struct commit *commit = prio_queue_get(&queue);
+	while ((commit = prio_queue_get(&queue))) {
 		struct object *obj = &commit->object;
 
 		if (commit == interesting_cache)
@@ -4028,8 +4026,8 @@ static enum rewrite_result rewrite_one_1(struct rev_info *revs,
 
 static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list)
 {
-	while (q->nr) {
-		struct commit *item = prio_queue_peek(q);
+	struct commit *item;
+	while ((item = prio_queue_peek(q))) {
 		struct commit_list *p = *list;
 
 		if (p && p->item->date >= item->date)
diff --git a/walker.c b/walker.c
index e98eb6da53..e3de77f092 100644
--- a/walker.c
+++ b/walker.c
@@ -84,12 +84,12 @@ static struct prio_queue complete = { compare_commits_by_commit_date };
 static int process_commit(struct walker *walker, struct commit *commit)
 {
 	struct commit_list *parents;
+	struct commit *item;
 
 	if (repo_parse_commit(the_repository, commit))
 		return -1;
 
-	while (complete.nr) {
-		struct commit *item = prio_queue_peek(&complete);
+	while ((item = prio_queue_peek(&complete))) {
 		if (item->date < commit->date)
 			break;
 		pop_most_recent_commit(&complete, COMPLETE);
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 2/2] prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson,
	Kristofer Karlsson
In-Reply-To: <pull.2140.v4.git.1780945851.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Defer the actual removal in prio_queue_get() until the next
operation.  If that next operation is a prio_queue_put(), the
removal and insertion are fused into a single replace — writing
the new element at the root and sifting it down — which avoids
a full remove-rebalance-insert cycle.

This matches the dominant usage pattern in git's commit traversal:
get a commit, then put its parents.  The first parent insertion
after each get is now a replace operation automatically.

This generalizes the lazy_queue pattern from builtin/describe.c
(introduced in 08bb69d70f) into prio_queue itself.  Three callers
independently implemented the same get+put fusion:

  - builtin/describe.c had a full lazy_queue wrapper
  - commit.c:pop_most_recent_commit() used peek+replace
  - builtin/show-branch.c:join_revs() used peek+replace

All three now collapse to plain _get() and _put(), with the data
structure handling the fusion internally.  This simplifies callers
and means every prio_queue user gets the optimization for free
without needing to implement it manually.

Remove prio_queue_replace() since no external callers remain.

Benchmarked on a 1.8M-commit monorepo (30 interleaved runs,
paired t-test, Xeon @ 2.20GHz):

Code paths that previously did eager get+put (new optimization):

  Command                       base    patched  change      p
  merge-base --all A A~1000     3828ms  3725ms   -2.69%  0.0001
  rev-list --count A~1000..A    3055ms  2986ms   -2.27%  0.0601
  log --oneline A~1000..A       3408ms  3350ms   -1.71%  0.0482

Code paths that already had manual get+put fusion (expect
neutral — the optimization moves into prio_queue but the number
of heap operations stays the same):

  Command                       base    patched  change      p
  show-branch A A~1000          9156ms  9127ms   -0.32%  0.3470
  describe (4751 revs, 81K repo) 1983ms 1963ms  -1.02%  <0.001

No regressions in any scenario.

Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 builtin/describe.c          | 75 +++++++-----------------------
 builtin/show-branch.c       | 11 ++---
 commit.c                    | 11 +----
 prio-queue.c                | 92 ++++++++++++++++++++-----------------
 prio-queue.h                | 15 ++----
 t/unit-tests/u-prio-queue.c |  6 +--
 6 files changed, 78 insertions(+), 132 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 8e88bdeea6..64424543ef 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -251,61 +251,19 @@ static int compare_pt(const void *a_, const void *b_)
 	return 0;
 }
 
-struct lazy_queue {
-	struct prio_queue queue;
-	bool get_pending;
-};
-
-#define LAZY_QUEUE_INIT { { compare_commits_by_commit_date }, false }
-
-static void *lazy_queue_get(struct lazy_queue *queue)
-{
-	if (queue->get_pending)
-		prio_queue_get(&queue->queue);
-	else
-		queue->get_pending = true;
-	return prio_queue_peek(&queue->queue);
-}
-
-static void lazy_queue_put(struct lazy_queue *queue, void *thing)
-{
-	if (queue->get_pending)
-		prio_queue_replace(&queue->queue, thing);
-	else
-		prio_queue_put(&queue->queue, thing);
-	queue->get_pending = false;
-}
-
-static bool lazy_queue_empty(const struct lazy_queue *queue)
-{
-	return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
-}
-
-static void lazy_queue_clear(struct lazy_queue *queue)
-{
-	clear_prio_queue(&queue->queue);
-	queue->get_pending = false;
-}
-
-static unsigned long finish_depth_computation(struct lazy_queue *queue,
+static unsigned long finish_depth_computation(struct prio_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
 	struct oidset unflagged = OIDSET_INIT;
-	struct commit *commit;
-	int skip = queue->get_pending ? 1 : 0;
+	struct commit *c;
 
-	prio_queue_for_each(&queue->queue, commit) {
-		if (skip) {
-			skip = 0;
-			continue;
-		}
-		if (!(commit->object.flags & best->flag_within))
-			oidset_insert(&unflagged, &commit->object.oid);
+	prio_queue_for_each(queue, c) {
+		if (!(c->object.flags & best->flag_within))
+			oidset_insert(&unflagged, &c->object.oid);
 	}
 
-	while (!lazy_queue_empty(queue)) {
-		struct commit *c = lazy_queue_get(queue);
+	while ((c = prio_queue_get(queue))) {
 		struct commit_list *parents = c->parents;
 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
@@ -321,7 +279,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 			repo_parse_commit(the_repository, p);
 			seen = p->object.flags & SEEN;
 			if (!seen)
-				lazy_queue_put(queue, p);
+				prio_queue_put(queue, p);
 			flag_before = p->object.flags & best->flag_within;
 			p->object.flags |= c->object.flags;
 			flag_after = p->object.flags & best->flag_within;
@@ -369,8 +327,8 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
 
 static void describe_commit(struct commit *cmit, struct strbuf *dst)
 {
-	struct commit *gave_up_on = NULL;
-	struct lazy_queue queue = LAZY_QUEUE_INIT;
+	struct commit *c, *gave_up_on = NULL;
+	struct prio_queue queue = { compare_commits_by_commit_date };
 	struct commit_name *n;
 	struct possible_tag all_matches[MAX_TAGS];
 	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
@@ -412,9 +370,8 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 	}
 
 	cmit->object.flags = SEEN;
-	lazy_queue_put(&queue, cmit);
-	while (!lazy_queue_empty(&queue)) {
-		struct commit *c = lazy_queue_get(&queue);
+	prio_queue_put(&queue, cmit);
+	while ((c = prio_queue_get(&queue))) {
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
@@ -448,7 +405,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 				t->depth++;
 		}
 		/* Stop if last remaining path already covered by best candidate(s) */
-		if (annotated_cnt && lazy_queue_empty(&queue)) {
+		if (annotated_cnt && !prio_queue_size(&queue)) {
 			int best_depth = INT_MAX;
 			unsigned best_within = 0;
 			for (cur_match = 0; cur_match < match_cnt; cur_match++) {
@@ -471,7 +428,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 			struct commit *p = parents->item;
 			repo_parse_commit(the_repository, p);
 			if (!(p->object.flags & SEEN))
-				lazy_queue_put(&queue, p);
+				prio_queue_put(&queue, p);
 			p->object.flags |= c->object.flags;
 			parents = parents->next;
 
@@ -486,7 +443,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 			strbuf_add_unique_abbrev(dst, cmit_oid, abbrev);
 			if (suffix)
 				strbuf_addstr(dst, suffix);
-			lazy_queue_clear(&queue);
+			clear_prio_queue(&queue);
 			return;
 		}
 		if (unannotated_cnt)
@@ -502,11 +459,11 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 	QSORT(all_matches, match_cnt, compare_pt);
 
 	if (gave_up_on) {
-		lazy_queue_put(&queue, gave_up_on);
+		prio_queue_put(&queue, gave_up_on);
 		seen_commits--;
 	}
 	seen_commits += finish_depth_computation(&queue, &all_matches[0]);
-	lazy_queue_clear(&queue);
+	clear_prio_queue(&queue);
 
 	if (debug) {
 		static int label_width = -1;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 8846f2376f..2435e8aeda 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -232,12 +232,13 @@ static void join_revs(struct prio_queue *queue,
 	while ((commit = prio_queue_peek(queue))) {
 		struct commit_list *parents;
 		int still_interesting = !!interesting(queue);
-		bool get_pending = true;
 		int flags = commit->object.flags & all_mask;
 
 		if (!still_interesting && extra <= 0)
 			break;
 
+		prio_queue_get(queue);
+
 		mark_seen(commit, seen_p);
 		if ((flags & all_revs) == all_revs)
 			flags |= UNINTERESTING;
@@ -253,14 +254,8 @@ static void join_revs(struct prio_queue *queue,
 			if (mark_seen(p, seen_p) && !still_interesting)
 				extra--;
 			p->object.flags |= flags;
-			if (get_pending)
-				prio_queue_replace(queue, p);
-			else
-				prio_queue_put(queue, p);
-			get_pending = false;
+			prio_queue_put(queue, p);
 		}
-		if (get_pending)
-			prio_queue_get(queue);
 	}
 
 	/*
diff --git a/commit.c b/commit.c
index fd8723502e..976bfc4618 100644
--- a/commit.c
+++ b/commit.c
@@ -795,24 +795,17 @@ void commit_list_sort_by_date(struct commit_list **list)
 struct commit *pop_most_recent_commit(struct prio_queue *queue,
 				      unsigned int mark)
 {
-	struct commit *ret = prio_queue_peek(queue);
-	int get_pending = 1;
+	struct commit *ret = prio_queue_get(queue);
 	struct commit_list *parents = ret->parents;
 
 	while (parents) {
 		struct commit *commit = parents->item;
 		if (!repo_parse_commit(the_repository, commit) && !(commit->object.flags & mark)) {
 			commit->object.flags |= mark;
-			if (get_pending)
-				prio_queue_replace(queue, commit);
-			else
-				prio_queue_put(queue, commit);
-			get_pending = 0;
+			prio_queue_put(queue, commit);
 		}
 		parents = parents->next;
 	}
-	if (get_pending)
-		prio_queue_get(queue);
 	return ret;
 }
 
diff --git a/prio-queue.c b/prio-queue.c
index ead4faf4bb..199775d5af 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -34,12 +34,48 @@ void clear_prio_queue(struct prio_queue *queue)
 	queue->nr_ = 0;
 	queue->alloc = 0;
 	queue->insertion_ctr = 0;
+	queue->get_pending = 0;
+}
+
+static void sift_down_root(struct prio_queue *queue)
+{
+	size_t ix, child;
+
+	/* Push down the one at the root */
+	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
+		child = ix * 2 + 1; /* left */
+		if (child + 1 < queue->nr_ &&
+		    compare(queue, child, child + 1) >= 0)
+			child++; /* use right child */
+
+		if (compare(queue, ix, child) <= 0)
+			break;
+
+		swap(queue, child, ix);
+	}
+}
+
+static inline void flush_get(struct prio_queue *queue)
+{
+	if (!queue->get_pending)
+		return;
+	queue->get_pending = 0;
+	queue->array[0] = queue->array[--queue->nr_];
+	sift_down_root(queue);
 }
 
 void prio_queue_put(struct prio_queue *queue, void *thing)
 {
 	size_t ix, parent;
 
+	if (queue->get_pending) {
+		queue->get_pending = 0;
+		queue->array[0].ctr = queue->insertion_ctr++;
+		queue->array[0].data = thing;
+		sift_down_root(queue);
+		return;
+	}
+
 	/* Append at the end */
 	ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
 	queue->array[queue->nr_].ctr = queue->insertion_ctr++;
@@ -58,61 +94,33 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
 	}
 }
 
-static void sift_down_root(struct prio_queue *queue)
-{
-	size_t ix, child;
-
-	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
-		child = ix * 2 + 1; /* left */
-		if (child + 1 < queue->nr_ &&
-		    compare(queue, child, child + 1) >= 0)
-			child++; /* use right child */
-
-		if (compare(queue, ix, child) <= 0)
-			break;
-
-		swap(queue, child, ix);
-	}
-}
-
 void *prio_queue_get(struct prio_queue *queue)
 {
-	void *result;
-
-	if (!queue->nr_)
+	if (queue->nr_ <= queue->get_pending) {
+		queue->nr_ = 0;
+		queue->get_pending = 0;
 		return NULL;
+	}
 	if (!queue->compare)
 		return queue->array[--queue->nr_].data; /* LIFO */
 
-	result = queue->array[0].data;
-	if (!--queue->nr_)
-		return result;
+	flush_get(queue);
 
-	queue->array[0] = queue->array[queue->nr_];
-	sift_down_root(queue);
-	return result;
+	queue->get_pending = 1;
+	return queue->array[0].data;
 }
 
 void *prio_queue_peek(struct prio_queue *queue)
 {
-	if (!queue->nr_)
+	if (queue->nr_ <= queue->get_pending) {
+		queue->nr_ = 0;
+		queue->get_pending = 0;
 		return NULL;
+	}
 	if (!queue->compare)
 		return queue->array[queue->nr_ - 1].data;
-	return queue->array[0].data;
-}
 
-void prio_queue_replace(struct prio_queue *queue, void *thing)
-{
-	if (!queue->nr_) {
-		prio_queue_put(queue, thing);
-	} else if (!queue->compare) {
-		queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
-		queue->array[queue->nr_ - 1].data = thing;
-	} else {
-		queue->array[0].ctr = queue->insertion_ctr++;
-		queue->array[0].data = thing;
-		sift_down_root(queue);
-	}
+	flush_get(queue);
+
+	return queue->array[0].data;
 }
diff --git a/prio-queue.h b/prio-queue.h
index 7f2aa986b1..570b48e648 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -30,8 +30,9 @@ struct prio_queue {
 	prio_queue_compare_fn compare;
 	size_t insertion_ctr;
 	void *cb_data;
-	size_t alloc, nr_;
+	size_t alloc, nr_; /* use prio_queue_size() for logical count */
 	struct prio_queue_entry *array;
+	unsigned get_pending;
 };
 
 /*
@@ -54,22 +55,14 @@ void *prio_queue_peek(struct prio_queue *);
 
 static inline size_t prio_queue_size(const struct prio_queue *queue)
 {
-	return queue->nr_;
+	return queue->nr_ - queue->get_pending;
 }
 
 #define prio_queue_for_each(queue, it) \
-	for (size_t pq_ix_ = 0; \
+	for (size_t pq_ix_ = (queue)->get_pending; \
 	     pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
 	     pq_ix_++)
 
-/*
- * Replace the "thing" that compares the smallest with a new "thing",
- * like prio_queue_get()+prio_queue_put() would do, but in a more
- * efficient way.  Does the same as prio_queue_put() if the queue is
- * empty.
- */
-void prio_queue_replace(struct prio_queue *queue, void *thing);
-
 void clear_prio_queue(struct prio_queue *);
 
 /* Reverse the LIFO elements */
diff --git a/t/unit-tests/u-prio-queue.c b/t/unit-tests/u-prio-queue.c
index 63e58114ae..af3e0b8598 100644
--- a/t/unit-tests/u-prio-queue.c
+++ b/t/unit-tests/u-prio-queue.c
@@ -53,13 +53,13 @@ static void test_prio_queue(int *input, size_t input_size,
 			prio_queue_reverse(&pq);
 			break;
 		case REPLACE:
-			peek = prio_queue_peek(&pq);
+			get = prio_queue_get(&pq);
 			cl_assert(i + 1 < input_size);
 			cl_assert(input[i + 1] >= 0);
 			cl_assert(j < result_size);
-			cl_assert_equal_i(result[j], show(peek));
+			cl_assert_equal_i(result[j], show(get));
 			j++;
-			prio_queue_replace(&pq, &input[++i]);
+			prio_queue_put(&pq, &input[++i]);
 			break;
 		default:
 			prio_queue_put(&pq, &input[i]);
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-08 19:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Patrick Steinhardt
In-Reply-To: <xmqqa4t5yyee.fsf@gitster.g>

On Mon, Jun 8, 2026 at 6:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> > show_files() checks whether each index entry is deleted or modified
> > before show_ce() applies the pathspec. prune_index() avoids most of this
> > work for pathspecs with a common directory prefix, but a top-level name
> > or leading wildcard leaves every entry to be checked.
> >
> > Match the pathspec before lstat() for the deleted and modified modes.
> > Keep the later match in show_ce() so --error-unmatch is satisfied only
> > by entries that are actually shown.
>
> Adding an extra early `match_pathspec()` check before making slow
> system calls like `lstat()` makes sense, especially when most of the
> index entries need to be skipped.  But if most of them would match,
> then we would end up doing the same match_pathspec() calls twice for
> each path, and run lstat() anyway, so you may also be able to
> construct a perf test that demonstrates a case where this approach
> is not a clear win (or even degradation), perhaps?

Yes. I added an all-matching pathspec case to p3010 and ran:

    hyperfine --warmup 0 --runs 3 \
        'git -c core.fsmonitor=false ls-files --deleted -- "*"'

On a checkout with 859,940 index entries, I ran the parent and patched
binaries in both orders:

                         parent          this commit
  parent first elapsed    56.807 s        64.618 s
               user        1.256 s         1.270 s
               system     10.633 s        11.068 s
  patched first elapsed   63.361 s        64.316 s
                user       1.238 s         1.280 s
                system    10.296 s        11.864 s

The added match costs 14-42 ms of user time in this case. Elapsed time
varies by several seconds with command order, obscuring that CPU cost.

The later match in show_ce() is reached only for entries actually found
deleted or modified. This case therefore exercises the extra match for
every index entry while still performing every lstat().

>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index e1a22b41b9..702c607183 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -450,6 +450,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >                       continue;
> >               if (ce_skip_worktree(ce))
> >                       continue;
> > +             /* Only entries shown by show_ce() satisfy --error-unmatch. */
> > +             if (pathspec.nr &&
> > +                 !match_pathspec(repo->index, &pathspec, fullname.buf,
> > +                                 fullname.len, max_prefix_len, NULL,
> > +                                 S_ISDIR(ce->ce_mode) ||
> > +                                 S_ISGITLINK(ce->ce_mode)))
> > +                     continue;
> >               stat_err = lstat(fullname.buf, &st);
> >               if (stat_err && (errno != ENOENT && errno != ENOTDIR))
> >                       error_errno("cannot lstat '%s'", fullname.buf);
>
> Hmph.  In the current code, because there is no such pre-filtering,
> show_ce() would unconditionally recurse into active submodules when
> told to with the "--recurse-submodules" flag, even if your pathspec
> coes not match the submodule.  With this change, such a submodule
> whose path does not match the pathspec would not even be seen by
> show_ce().  Would it cause a change in behaviour?

This path cannot affect --recurse-submodules. cmd_ls_files() rejects
--recurse-submodules together with either --deleted or --modified before
calling show_files(), and the new check is reached only for those two
modes. Cached and stage output continue to call show_ce() before the new
check. t3007 already verifies that both combinations are rejected.

Given the 60.742 s to 1.061 s improvement for a selective pathspec, I
think this small CPU cost for an all-matching pathspec is a worthwhile
tradeoff. What do you think?

Thanks for the review!
Tamir

^ permalink raw reply

* Re: [PATCH 3/3] replay: offer an option to linearize the commit topology
From: Junio C Hamano @ 2026-06-08 19:29 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Johannes Schindelin
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-3-e3ee71fce7b4@iotcl.com>

Toon Claes <toon@iotcl.com> writes:

> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>
> One of the stated goals of git-replay(1) is to allow implementing the
> git-rebase(1) functionality on the server side.
>
> The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
> was given. This mode drops merge commits instead of replaying them, and
> linearized the commit history into a sequence of the
> regular (single-parent) commits.

"linearized" -> "linearizes"?

>
> Add option `--linearize` to git-replay(1) do the same.

"do the same" -> "to do the same"?

> Co-authored-by: Toon Claes <toon@iotcl.com>

There is no sign-off by any of the authors?

> @@ -430,12 +435,20 @@ int replay_revisions(struct rev_info *revs,
>  	while ((commit = get_revision(revs))) {
>  		const struct name_decoration *decoration;
>  
> -		if (commit->parents && commit->parents->next)
> +		if (opts->linearize && (!commit->parents || commit->parents->next))
> +			; /* map current commit to the same as the previous commit */

This uses the same treatment on either root commits or merge
commits?  If this were a mistake and this wants to handle merges but
not roots, shouldn't it be more like

		if (opts->linearize && (commit->parents && commit->parents->next))
			; /* map the merge to the previous */

> +		else if (commit->parents && commit->parents->next)
>  			die(_("replaying merge commits is not supported yet!"));

And because the next one is also about merges, perhaps the early
part of this if/else if cascade can be written

		if (commit->parents && commit->parents->next) {
			/* We have a merge */
			if (!opts->linearize)
				die(_("can't replay a merge (yet)"));
			; /* map current to the previous */
		} else {
			...

wouldn't it?

If the "map current to prev" is applicable to root, any root are
mapped to the last_commit in the above, and if we saw a root as the
first thing in the loop, last_commit is NULL, we do not do anything
here, and after the if/else if/else cascade, we see last_commit is
NULL and break out of the loop.

> +		else {
> +			struct commit *to_pick = reverse ? last_commit : onto;
> +			last_commit =
> +				pick_regular_commit(revs->repo, commit,
> +						    replayed_commits, to_pick,
> +						    &merge_opt, &result,
> +						    opts->linearize ? last_commit : NULL,
> +						    reverse, opts->empty);
> +		}
>  
> -		last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
> -						  reverse ? last_commit : onto,
> -						  &merge_opt, &result, reverse, opts->empty);
>  		if (!last_commit)
>  			break;

> diff --git a/replay.h b/replay.h
> index 1851a07705..07e6fdcca3 100644
> --- a/replay.h
> +++ b/replay.h
> @@ -62,6 +62,11 @@ struct replay_revisions_options {
>  	 * Defaults to REPLAY_EMPTY_COMMIT_DROP.
>  	 */
>  	enum replay_empty_commit_action empty;
> +
> +	/*
> +	 * Whether to linearize the commits (i.e. drop merge commits).
> +	 */
> +	int linearize;
>  };
>  
>  /* This struct is used as an out-parameter by `replay_revisions()`. */
> diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
> index 3353bc4a4d..c781a3bb1b 100755
> --- a/t/t3650-replay-basics.sh
> +++ b/t/t3650-replay-basics.sh
> @@ -565,4 +565,26 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
>  	test_grep "cannot be used with multiple revision ranges" err
>  '
>  
> +test_expect_success 'linearize the commit topology' '
> +	test_tick &&
> +	N=$(git commit-tree -m N -p L -p I L:) &&
> +	N=$(git commit-tree -m N-child -p $N L:) &&
> +	git update-ref refs/heads/N $N &&
> +
> +	git replay --ref-action=print --linearize \
> +		--onto A B..refs/heads/N >out &&
> +
> +	test_line_count = 1 out &&
> +	read N1 N2 N3 N4 <out &&
> +
> +	cat >expect <<-EOF &&
> +	* N-child
> +	* I
> +	* L
> +	o A
> +	EOF
> +	git log --format=%s --graph --boundary A...$N3 >actual &&
> +	test_cmp expect actual
> +'

Perhaps we would want to have a test that replays all the way down
to the root commit?

^ permalink raw reply

* Re: [PATCH 2/3] config: add GIT_CONFIG_INCLUDES
From: Derrick Stolee @ 2026-06-08 19:38 UTC (permalink / raw)
  To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
In-Reply-To: <aibTAOrcSvTOtv78@pks.im>

On 6/8/2026 10:34 AM, Patrick Steinhardt wrote:
> On Mon, Jun 08, 2026 at 01:57:05PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The config keys 'include.path' and 'includeIf.*' allow users to specify
>> config stored in a location outside of the typical list of config files
>> (system, global, local, etc.). For example, users who accept the risk
>> can specify helpful aliases via a file checked into the repo by pointing
>> 'include.path' to the position of that file in the working directory.
>> This is dangerous, but people do it.
> 
> Huh, I never even considered this use case. But of course, this is
> possible, even though it's quite scary.
> 
>> What becomes tricky is that this modifies all Git behavior, including
>> operations that are intended to be limited in activity or sandboxed in
>> some way. These include directives can provide surprising changes to
>> behavior, especially when expecting a specific list of allowed file
>> accesses. This could lead to failed builds, for instance.
>>
>> To allow for these user-desired features when they are running commands,
>> add a new GIT_CONFIG_INCLUDES environment variable that disables these
>> redirections of config when set to zero. This variable can be set by
>> automation, such as build tooling, to avoid these strange behaviors.
>> This could be considered a recommended option for tools executing Git
>> commands, the same as GIT_ADVICE=0.
> 
> I don't know about this part though. I could see use cases where the
> tools _should_ read the project-relative configuration. It might also be
> the case that the user may want to evaluate some includes, but not all
> of them.

True. I'm not confident that we should recommend this environment in all
cases.

> That raises the question whether we can introduce the configuration in a
> way that it allows a bit more flexibility than just "yes"/"no", like for
> example an allow-list of locations that should be evaluated. But maybe
> I'm overthinking this.
I see. So we can say "avoid including into the repository worktree" but
that will probably be incomplete.

There is room for nuance in future expansions, if we can find a creative
way to handle that nuance. For now, I think I would still want an ability
to turn the entire feature off, at least for certain tools that care.

Thanks,
-Stolee

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Karthik Nayak @ 2026-06-08 21:18 UTC (permalink / raw)
  To: Tamir Duberstein, git
  Cc: Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 17907 bytes --]

Tamir Duberstein <tamird@gmail.com> writes:

> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
>
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
>

Nicely explained. We should've merged this long ago, so this is a
worthwhile change.

> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.
>
> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.
>
> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.
>
> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
>
> On a checkout with 62,174 remote-tracking refs, I ran:
>
>     hyperfine --warmup 0 --runs 3 \
>         --command-name parent \
>         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
>         --command-name this-commit \
>         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
>
> The results were:
>
>              parent       this commit
>   elapsed    104.365 s     467.7 ms
>   user        93.702 s     220.2 ms
>   system       0.723 s     182.7 ms
>
> The wall-time standard deviations were 11.356 seconds and 133.8
> milliseconds, respectively, for a 223x speedup. Both commands produced
> output with SHA-256
> 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
>
> Both revisions were rebuilt with the default -O2 flags using Apple clang
> 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> GB RAM.
>
> Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
>  builtin/tag.c                  |  1 -
>  commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
>  commit-reach.h                 | 15 ++++++++++----
>  ref-filter.c                   |  6 ++++--
>  ref-filter.h                   |  7 +++----
>  t/helper/test-reach.c          | 10 ++--------
>  t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
>  t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
>  t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
>  t/t6600-test-reach.sh          |  6 ++----
>  10 files changed, 117 insertions(+), 36 deletions(-)
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index d51c2e3349..9f34d948d4 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>
>  	if (verify_ref_format(format))
>  		die(_("unable to parse format string"));
> -	filter->with_commit_tag_algo = 1;

We were selectively using the algo for `git tag`, like mentioned I guess
we'll entirely remove `with_commit_tag_algo` below somewhere

>  	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
>
>  	free(to_free);
> diff --git a/commit-reach.c b/commit-reach.c
> index 9b3ea46d6f..6e599a3670 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -6,7 +6,6 @@
>  #include "decorate.h"
>  #include "hex.h"
>  #include "prio-queue.h"
> -#include "ref-filter.h"
>  #include "revision.h"
>  #include "tag.h"
>  #include "commit-reach.h"
> @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
>
>  /*
>   * Test whether the candidate is contained in the list.
> - * Do not recurse to find out, though, but return -1 if inconclusive.
> + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> + * inconclusive.

Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
at 0, so this makes sense.

>   */
>  static enum contains_result contains_test(struct commit *candidate,
>  					  const struct commit_list *want,
> @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
>  	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
>  }
>
> -static enum contains_result contains_tag_algo(struct commit *candidate,
> -					      const struct commit_list *want,
> -					      struct contains_cache *cache)
> +static enum contains_result contains_algo(struct commit *candidate,
> +					  struct commit_list *want,
> +					  struct contains_cache *cache)
>  {
>  	struct contains_stack contains_stack = { 0, 0, NULL };
>  	enum contains_result result;
> @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  	if (result != CONTAINS_UNKNOWN)
>  		return result;
>
> +	*contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
>  	push_to_contains_stack(candidate, &contains_stack);
>  	while (contains_stack.nr) {
>  		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  			contains_stack.nr--;
>  		}
>  		/*
> -		 * If we just popped the stack, parents->item has been marked,
> -		 * therefore contains_test will return a meaningful yes/no.
> +		 * A parent may have just been popped and marked, or may still
> +		 * be active when replacement refs create a cycle.
>  		 */
>  		else switch (contains_test(parents->item, want, cache, cutoff)) {
>  		case CONTAINS_YES:
> @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  		case CONTAINS_NO:
>  			entry->parents = parents->next;
>  			break;
> +		case CONTAINS_IN_PROGRESS:
> +			/*
> +			 * Partial negative answers are not safe across a cycle.
> +			 * Discard them and use the cycle-safe reachability walk.
> +			 */
> +			goto cycle;
>

If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
showcase a commit for which we still don't have a result. Since any
commit with UNKNOWN will start a recursive search through its parents.

So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
this would indicate that we hit a cyclic graph and so go to the fallback
of using repo_is_descendant_of(). So this avoids an infinite recursion
in such instances. Makes sense.

>  		case CONTAINS_UNKNOWN:
> +			*contains_cache_at(cache, parents->item) =
> +				CONTAINS_IN_PROGRESS;
>  			push_to_contains_stack(parents->item, &contains_stack);
>  			break;
>  		}
>  	}
>  	free(contains_stack.contains_stack);
>  	return contains_test(candidate, want, cache, cutoff);
> +
> +cycle:
> +	free(contains_stack.contains_stack);
> +	clear_contains_cache(cache);
> +	init_contains_cache(cache);
> +
> +	result = repo_is_descendant_of(the_repository, candidate, want);
> +	if (result < 0)
> +		exit(128);
> +	*contains_cache_at(cache, candidate) =
> +		result ? CONTAINS_YES : CONTAINS_NO;
> +	return result ? CONTAINS_YES : CONTAINS_NO;
>  }
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> -		    struct commit_list *list, struct contains_cache *cache)
> +int commit_contains(struct commit *commit, struct commit_list *list,
> +		    struct contains_cache *cache)
>  {
> -	if (filter->with_commit_tag_algo)
> -		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> -	return repo_is_descendant_of(the_repository, commit, list);
> +	if (!list)
> +		return 1;
> +	return contains_algo(commit, list, cache) == CONTAINS_YES;
>  }
>
>  int can_all_from_reach_with_flag(struct object_array *from,
> diff --git a/commit-reach.h b/commit-reach.h
> index 3f3a563d8a..144dc56275 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -5,7 +5,6 @@
>  #include "commit-slab.h"
>
>  struct commit_list;
> -struct ref_filter;
>  struct object_id;
>  struct object_array;
>
> @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
>  enum contains_result {
>  	CONTAINS_UNKNOWN = 0,
>  	CONTAINS_NO,
> -	CONTAINS_YES
> +	CONTAINS_YES,
> +	CONTAINS_IN_PROGRESS
>  };
>
>  define_commit_slab(contains_cache, enum contains_result);
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> -		    struct commit_list *list, struct contains_cache *cache);
> +/*
> + * Return whether "commit" is a descendant of any commit in "list". An empty
> + * list matches.
> + *
> + * "cache" records answers for one fixed "list". Clear it before changing the
> + * list.
> + */
> +int commit_contains(struct commit *commit, struct commit_list *list,
> +		    struct contains_cache *cache);
>
>  /*
>   * Determine if every commit in 'from' can reach at least one commit
> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..7788147959 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
>  			return NULL;
>  		/* We perform the filtering for the '--contains' option... */
>  		if (filter->with_commit &&
> -		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> +		    !commit_contains(commit, filter->with_commit,
> +				     &filter->internal.contains_cache))
>  			return NULL;
>  		/* ...or for the `--no-contains' option */
>  		if (filter->no_commit &&
> -		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> +		    commit_contains(commit, filter->no_commit,
> +				    &filter->internal.no_contains_cache))
>  			return NULL;
>  	}
>
> diff --git a/ref-filter.h b/ref-filter.h
> index 120221b47f..9e14afca9c 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -73,10 +73,9 @@ struct ref_filter {
>  	struct commit_list *reachable_from;
>  	struct commit_list *unreachable_from;
>
> -	unsigned int with_commit_tag_algo : 1,
> -		match_as_path : 1,
> -		ignore_case : 1,
> -		detached : 1;
> +	unsigned int match_as_path : 1,
> +		     ignore_case : 1,
> +		     detached : 1;
>  	unsigned int kind,
>  		lines;
>  	int abbrev,

Nit: With the changes above. I do wish it was split into two commits.
1. Fix cyclic recursions in the algo.
2. Use the algo for all filter types.

> diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> index 5d86a96c17..82235f713e 100644
> --- a/t/helper/test-reach.c
> +++ b/t/helper/test-reach.c
> @@ -6,7 +6,6 @@
>  #include "gettext.h"
>  #include "hex.h"
>  #include "object-name.h"
> -#include "ref-filter.h"
>  #include "setup.h"
>  #include "string-list.h"
>  #include "tag.h"
> @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
>
>  		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
>  	} else if (!strcmp(av[1], "commit_contains")) {
> -		struct ref_filter filter = REF_FILTER_INIT;
>  		struct contains_cache cache;
>  		init_contains_cache(&cache);
>
> -		if (ac > 2 && !strcmp(av[2], "--tag"))
> -			filter.with_commit_tag_algo = 1;
> -		else
> -			filter.with_commit_tag_algo = 0;
> -
> -		printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> +		printf("%s(_,A,X,_):%d\n", av[1],
> +		       commit_contains(A, X, &cache));
>  		clear_contains_cache(&cache);
>  	} else if (!strcmp(av[1], "get_reachable_subset")) {
>  		const int reachable_flag = 1;

Makes sense.

> diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> index 5b23ce5db9..ac68fdbacd 100755
> --- a/t/perf/p1500-graph-walks.sh
> +++ b/t/perf/p1500-graph-walks.sh
> @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
>
>  test_perf_large_repo
>
> +contains_ref_limit=8192
> +
>  test_expect_success 'setup' '
>  	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
>  	sort -r allrefs | head -n 50 >refs &&
> @@ -32,10 +34,25 @@ test_expect_success 'setup' '
>  		echo "X:$line" >>test-tool-tags || return 1
>  	done &&
>
> +	git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> +	contains_ref_count=$(wc -l <contains-commits) &&
> +	test "$contains_ref_count" -gt 0 &&
> +	contains_base=$(tail -n 1 contains-commits) &&
> +	export contains_base &&
> +	awk "{
> +		printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> +	}" contains-commits |
> +		git update-ref --stdin &&
> +	git pack-refs --include "refs/contains-perf/*" &&
> +
>  	commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
>  	git update-ref refs/heads/disjoint-base $commit &&
>
> -	git commit-graph write --reachable
> +	git commit-graph write --reachable &&
> +
> +	git for-each-ref --contains="$contains_base" \
> +		refs/contains-perf/ >actual &&
> +	test_line_count = $contains_ref_count actual

Shouldn't this be a separate test and not a part of the setup?

>  '
>
>  test_perf 'ahead-behind counts: git for-each-ref' '
> @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
>  	xargs git tag --merged=HEAD <tags
>  '
>
> +test_perf 'contains: git for-each-ref --contains' '
> +	git for-each-ref --contains="$contains_base" \
> +		refs/contains-perf/ >/dev/null
> +'
> +

Ah! we also have this, so perhaps moving the `test_line_count` here and
dropping it above would be better.

>  test_perf 'is-base check: test-tool reach (refs)' '
>  	test-tool reach get_branch_base_for_tip <test-tool-refs
>  '
> diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> index e06feb06e9..169cc70c23 100755
> --- a/t/t6301-for-each-ref-errors.sh
> +++ b/t/t6301-for-each-ref-errors.sh
> @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
>  	test_must_be_empty brief-err
>  '
>
> +test_expect_success 'missing ancestors are reported by contains filters' '
> +	test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> +	{
> +		echo "tree $(git rev-parse HEAD^{tree})" &&
> +		echo "parent $MISSING" &&
> +		git cat-file commit HEAD |
> +			sed -n -e "/^author /p" -e "/^committer /p" &&
> +		echo &&
> +		echo "missing parent"
> +	} > commit &&
> +	broken=$(git hash-object -t commit -w commit) &&
> +	git update-ref refs/heads/missing-parent "$broken" &&
> +	test_must_fail git for-each-ref --contains=HEAD \
> +		refs/heads/missing-parent >out 2>err &&
> +	test_must_be_empty out &&
> +	test_grep "unable to parse commit $MISSING" err
> +'
> +
>  test_expect_success 'ahead-behind requires an argument' '
>  	test_must_fail git for-each-ref \
>  		--format="%(ahead-behind)" 2>err &&
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 7f060d97bf..423505d1fb 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'contains handles cyclic replacement histories' '
> +	one=$(git rev-parse one) &&
> +	three=$(git rev-parse three) &&
> +	test_when_finished "
> +		git replace -d $one
> +		git replace -d $three
> +		git tag -d cycle-a cycle-b
> +	" &&
> +	git tag cycle-a "$one" &&
> +	git tag cycle-b "$three" &&
> +	git replace --graft "$one" "$three" two &&
> +	git replace --graft "$three" "$one" &&
> +	cat >expect <<-\EOF &&
> +	refs/tags/cycle-a
> +	refs/tags/cycle-b
> +	EOF
> +	git for-each-ref --format="%(refname)" --contains=two \
> +		"refs/tags/cycle-*" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success '%(color) must fail' '
>  	test_must_fail git for-each-ref --format="%(color)%(refname)"
>  '
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index b5b314e570..1ecc2571c2 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
>  	X:commit-9-3
>  	EOF
>  	echo "commit_contains(_,A,X,_):1" >expect &&
> -	test_all_modes commit_contains &&
> -	test_all_modes commit_contains --tag
> +	test_all_modes commit_contains
>  '
>
>  test_expect_success 'commit_contains:miss' '
> @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
>  	X:commit-9-3
>  	EOF
>  	echo "commit_contains(_,A,X,_):0" >expect &&
> -	test_all_modes commit_contains &&
> -	test_all_modes commit_contains --tag
> +	test_all_modes commit_contains
>  '
>
>  test_expect_success 'rev-list: basic topo-order' '
>
> ---
> base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
>
> Best regards,
> --
> Tamir Duberstein <tamird@gmail.com>

Overall the patch looks great. The perf improvements are also very
welcome. Some small nits from me.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [PATCH] ref-filter: restore prefix-scoped iteration
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu
In-Reply-To: <20260605-fix-git-branch-regression-v1-1-02f40ad40929@gmail.com>

Tamir Duberstein <tamird@gmail.com> writes:

> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..2388a57b39 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
>  		prefix = "refs/tags/";
>  
>  	if (prefix) {

Below, adding an extra call to get_main_ref_store(the_repository)
makes one line unnecessarily split and harder to read.  How about
doing

		struct ref_store *store = get_main_ref_store(the_repository);

upfront here, and then use that to replace these two calls of
get_main_ref_store(the_repository)?

> +		if (filter->start_after) {
> +			struct ref_iterator *iter;
>  
> +			iter = refs_ref_iterator_begin(
> +				get_main_ref_store(the_repository), "", NULL, 0,
> +				0);
>  
>  			ret = start_ref_iterator_after(iter, filter->start_after);
> +			if (!ret)
> +				ret = do_for_each_ref_iterator(iter, fn,
> +							       cb_data);
> +		} else {
> +			/*
> +			 * Pass the prefix during construction because the files
> +			 * backend primes loose refs before a later seek can
> +			 * narrow the iterator.
> +			 */
> +			struct refs_for_each_ref_options opts = {
> +				.prefix = prefix,
> +			};
>  
> +			ret = refs_for_each_ref_ext(
> +				get_main_ref_store(the_repository), fn, cb_data,
> +				&opts);
> +		}
>  	} else if (filter->kind & FILTER_REFS_REGULAR) {

^ permalink raw reply

* Re: [PATCH 0/6] t: add lint-style.pl and convert grep to test_grep
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget
  Cc: git, D. Ben Knoble, Eric Sunshine, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The test suite has a test_grep wrapper that prints file contents on
> assertion failure, making debugging easier. Many tests still use bare 'grep'
> for assertions, which silently swallows context on failure.
>
> This series adds a lint tool (lint-style.pl) to mechanically detect and
> convert these, then applies it across the test suite.

I do not think we want an automated tool that rewrites the source
files.  I was hoping that we would get a patch or two that _adds_ to
existing test-lint framework (i.e., 'test-grep' that 'test-lint'
target depends on in t/Makefile) that gives diagnosis in a similar
fashion as test-lint-shell-syntax and test-chainlint do.

Also some existing uses of "grep" are not end-user facing and should
not be rewritten to "test_grep".

^ permalink raw reply

* Re: [PATCH] docs: fix typos
From: Junio C Hamano @ 2026-06-08 22:16 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: git
In-Reply-To: <20260604131457.19215-1-taahol@utu.fi>

Tuomas Ahola <taahol@utu.fi> writes:

> Fix some typos and grammar errors in comments and documentation files.
>
> Signed-off-by: Tuomas Ahola <taahol@utu.fi>
> ---

Thanks, all changes make sense.  Will queue.


^ permalink raw reply

* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-08 22:17 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, a3205153416, gitster, jltobler, kumarayushjha123,
	phillip.wood, sandals
In-Reply-To: <20260605163012.181089-4-jayatheerthkulkarni2005@gmail.com>


> +test_repo_info_path () {
> + field_name=$1
> + expect_absolute_eval=$2
> + expect_relative=$3
> + env_prefix=$4

This helper function needs a documentation.

> + test_expect_success "query individual key: path.$field_name.absolute${env_prefix:+ ($env_prefix)}" '

This makes the output polluted. What about changing it by something like:

	test_expect_success "absolute: $label' '...'
	test_expect_success "relative: $label' '...'

with a custom label?

> +
> +test_expect_success 'setup test repository layout for path fields' '
> + git init test-repo &&
> + mkdir -p test-repo/sub
> +'

The helper function `test_repo_info_path` is relying too much on the
existence of the `test-repo`. I think it would be better to add a new
parameter `repo_name` (or similar) because:

1. You could move this creation to the helper function and
   you won't need to place the test after that creation

2. You could use different for each (test_repo_info_path call, path format)
   pair. Currently, if more than one test fails, its result is overwritten
   and the `expect` and `actual` files from the trash directory will be
   the last of the broken tests.

3. You won't need to use the hacky 'echo "$(cd .. && pwd)' 

This applies my suggestions (feel free to use, adapt or discard it):

test_repo_info_path () {
        label=$1
	field_name=$2
	repo_name=$3
	expect_absolute=$4
	expect_relative=$5
	init_command=$6

	absolute_root="$repo_name"-absolute
	relative_root="$repo_name"-relative

	expect_absolute="$PWD"/"$absolute_root"/"$expect_absolute"

	test_expect_success 'setup test repository layout for path fields' '
		git init "$absolute_root" &&
		git init "$relative_root" &&
	       	mkdir -p "$absolute_root"/sub "$relative_root"/sub
	'

	test_expect_success "absolute: $label" '
		(
			export ROOT="$PWD"/"$absolute_root" &&
			cd "$absolute_root"/sub &&
			eval "$init_command" &&
			echo "path.$field_name.absolute=$expect_absolute" >expect &&
			git repo info path.$field_name.absolute >actual &&
			test_cmp expect actual
		)
	'

	test_expect_success "relative: $label" '
		(
			export ROOT="$PWD"/"$relative_root" &&
			cd "$relative_root"/sub &&
			eval "$init_command" &&
			echo "path.$field_name.relative=$expect_relative" >expect &&
			git repo info path.$field_name.relative >actual &&
			test_cmp expect actual
		)
	'
}

test_repo_info_path 'gitdir' 'gitdir' 'gitdir' '.git' '../.git'



^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08 22:30 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren
In-Reply-To: <CAOLa=ZS_U+u43SV9ELSEU6AT7rzEQ44BuHPAi1BAHEGQAnPoPw@mail.gmail.com>

On Mon, Jun 8, 2026 at 2:18 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > git branch and git for-each-ref call repo_is_descendant_of() for each
> > candidate selected by --contains or --no-contains. Each call starts a
> > new graph walk, so refs with shared history repeatedly traverse the same
> > commits.
> >
> > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> > the tag traversal that caches positive and negative answers across
> > candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> > 2015-07-07) preserved the branch and tag implementations when ref-filter
> > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> > the '--contains' option, 2015-10-18) noted that they should be unified.
> >
>
> Nicely explained. We should've merged this long ago, so this is a
> worthwhile change.
>
> > Use the memoized traversal for every ref-filter contains check and
> > remove the implementation selector. The cache records answers for one
> > fixed target list, so document that callers must clear it before
> > changing the list.
> >
> > The memoized depth-first walk assumes acyclic ancestry, but replacement
> > refs can create cycles. Track commits while they are on the walk. If a
> > cycle is found, discard partial cache entries and use
> > repo_is_descendant_of() for that candidate.
> >
> > The branch and for-each-ref path passed repo_is_descendant_of() through
> > a Boolean interface. In configurations where it returned -1 for missing
> > ancestry, ref-filter treated the error as "contains". The memoized path
> > instead fails when ancestry cannot be parsed, as git tag already did.
> > During review of the 2018 reachability series, making parse failures
> > fatal was explicitly deferred because that series was intended to
> > preserve behavior. Unifying the implementations now makes all callers
> > fail consistently instead of preserving that accidental Boolean
> > interpretation.
> >
> > The added p1500 case uses up to 8,192 packed refs along one first-parent
> > history. It improves from 0.68 to 0.03 seconds.
> >
> > On a checkout with 62,174 remote-tracking refs, I ran:
> >
> >     hyperfine --warmup 0 --runs 3 \
> >         --command-name parent \
> >         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> >         --command-name this-commit \
> >         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> >
> > The results were:
> >
> >              parent       this commit
> >   elapsed    104.365 s     467.7 ms
> >   user        93.702 s     220.2 ms
> >   system       0.723 s     182.7 ms
> >
> > The wall-time standard deviations were 11.356 seconds and 133.8
> > milliseconds, respectively, for a 223x speedup. Both commands produced
> > output with SHA-256
> > 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
> >
> > Both revisions were rebuilt with the default -O2 flags using Apple clang
> > 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> > 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> > GB RAM.
> >
> > Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> > Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> > Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> > Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> > ---
> >  builtin/tag.c                  |  1 -
> >  commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
> >  commit-reach.h                 | 15 ++++++++++----
> >  ref-filter.c                   |  6 ++++--
> >  ref-filter.h                   |  7 +++----
> >  t/helper/test-reach.c          | 10 ++--------
> >  t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
> >  t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
> >  t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
> >  t/t6600-test-reach.sh          |  6 ++----
> >  10 files changed, 117 insertions(+), 36 deletions(-)
> >
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d51c2e3349..9f34d948d4 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >
> >       if (verify_ref_format(format))
> >               die(_("unable to parse format string"));
> > -     filter->with_commit_tag_algo = 1;
>
> We were selectively using the algo for `git tag`, like mentioned I guess
> we'll entirely remove `with_commit_tag_algo` below somewhere
>
> >       filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
> >
> >       free(to_free);
> > diff --git a/commit-reach.c b/commit-reach.c
> > index 9b3ea46d6f..6e599a3670 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -6,7 +6,6 @@
> >  #include "decorate.h"
> >  #include "hex.h"
> >  #include "prio-queue.h"
> > -#include "ref-filter.h"
> >  #include "revision.h"
> >  #include "tag.h"
> >  #include "commit-reach.h"
> > @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
> >
> >  /*
> >   * Test whether the candidate is contained in the list.
> > - * Do not recurse to find out, though, but return -1 if inconclusive.
> > + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> > + * inconclusive.
>
> Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
> at 0, so this makes sense.
>
> >   */
> >  static enum contains_result contains_test(struct commit *candidate,
> >                                         const struct commit_list *want,
> > @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
> >       contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
> >  }
> >
> > -static enum contains_result contains_tag_algo(struct commit *candidate,
> > -                                           const struct commit_list *want,
> > -                                           struct contains_cache *cache)
> > +static enum contains_result contains_algo(struct commit *candidate,
> > +                                       struct commit_list *want,
> > +                                       struct contains_cache *cache)
> >  {
> >       struct contains_stack contains_stack = { 0, 0, NULL };
> >       enum contains_result result;
> > @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >       if (result != CONTAINS_UNKNOWN)
> >               return result;
> >
> > +     *contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
> >       push_to_contains_stack(candidate, &contains_stack);
> >       while (contains_stack.nr) {
> >               struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> > @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >                       contains_stack.nr--;
> >               }
> >               /*
> > -              * If we just popped the stack, parents->item has been marked,
> > -              * therefore contains_test will return a meaningful yes/no.
> > +              * A parent may have just been popped and marked, or may still
> > +              * be active when replacement refs create a cycle.
> >                */
> >               else switch (contains_test(parents->item, want, cache, cutoff)) {
> >               case CONTAINS_YES:
> > @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> >               case CONTAINS_NO:
> >                       entry->parents = parents->next;
> >                       break;
> > +             case CONTAINS_IN_PROGRESS:
> > +                     /*
> > +                      * Partial negative answers are not safe across a cycle.
> > +                      * Discard them and use the cycle-safe reachability walk.
> > +                      */
> > +                     goto cycle;
> >
>
> If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
> showcase a commit for which we still don't have a result. Since any
> commit with UNKNOWN will start a recursive search through its parents.
>
> So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
> this would indicate that we hit a cyclic graph and so go to the fallback
> of using repo_is_descendant_of(). So this avoids an infinite recursion
> in such instances. Makes sense.
>
> >               case CONTAINS_UNKNOWN:
> > +                     *contains_cache_at(cache, parents->item) =
> > +                             CONTAINS_IN_PROGRESS;
> >                       push_to_contains_stack(parents->item, &contains_stack);
> >                       break;
> >               }
> >       }
> >       free(contains_stack.contains_stack);
> >       return contains_test(candidate, want, cache, cutoff);
> > +
> > +cycle:
> > +     free(contains_stack.contains_stack);
> > +     clear_contains_cache(cache);
> > +     init_contains_cache(cache);
> > +
> > +     result = repo_is_descendant_of(the_repository, candidate, want);
> > +     if (result < 0)
> > +             exit(128);
> > +     *contains_cache_at(cache, candidate) =
> > +             result ? CONTAINS_YES : CONTAINS_NO;
> > +     return result ? CONTAINS_YES : CONTAINS_NO;
> >  }
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > -                 struct commit_list *list, struct contains_cache *cache)
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > +                 struct contains_cache *cache)
> >  {
> > -     if (filter->with_commit_tag_algo)
> > -             return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> > -     return repo_is_descendant_of(the_repository, commit, list);
> > +     if (!list)
> > +             return 1;
> > +     return contains_algo(commit, list, cache) == CONTAINS_YES;
> >  }
> >
> >  int can_all_from_reach_with_flag(struct object_array *from,
> > diff --git a/commit-reach.h b/commit-reach.h
> > index 3f3a563d8a..144dc56275 100644
> > --- a/commit-reach.h
> > +++ b/commit-reach.h
> > @@ -5,7 +5,6 @@
> >  #include "commit-slab.h"
> >
> >  struct commit_list;
> > -struct ref_filter;
> >  struct object_id;
> >  struct object_array;
> >
> > @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
> >  enum contains_result {
> >       CONTAINS_UNKNOWN = 0,
> >       CONTAINS_NO,
> > -     CONTAINS_YES
> > +     CONTAINS_YES,
> > +     CONTAINS_IN_PROGRESS
> >  };
> >
> >  define_commit_slab(contains_cache, enum contains_result);
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > -                 struct commit_list *list, struct contains_cache *cache);
> > +/*
> > + * Return whether "commit" is a descendant of any commit in "list". An empty
> > + * list matches.
> > + *
> > + * "cache" records answers for one fixed "list". Clear it before changing the
> > + * list.
> > + */
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > +                 struct contains_cache *cache);
> >
> >  /*
> >   * Determine if every commit in 'from' can reach at least one commit
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..7788147959 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
> >                       return NULL;
> >               /* We perform the filtering for the '--contains' option... */
> >               if (filter->with_commit &&
> > -                 !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> > +                 !commit_contains(commit, filter->with_commit,
> > +                                  &filter->internal.contains_cache))
> >                       return NULL;
> >               /* ...or for the `--no-contains' option */
> >               if (filter->no_commit &&
> > -                 commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> > +                 commit_contains(commit, filter->no_commit,
> > +                                 &filter->internal.no_contains_cache))
> >                       return NULL;
> >       }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 120221b47f..9e14afca9c 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -73,10 +73,9 @@ struct ref_filter {
> >       struct commit_list *reachable_from;
> >       struct commit_list *unreachable_from;
> >
> > -     unsigned int with_commit_tag_algo : 1,
> > -             match_as_path : 1,
> > -             ignore_case : 1,
> > -             detached : 1;
> > +     unsigned int match_as_path : 1,
> > +                  ignore_case : 1,
> > +                  detached : 1;
> >       unsigned int kind,
> >               lines;
> >       int abbrev,
>
> Nit: With the changes above. I do wish it was split into two commits.
> 1. Fix cyclic recursions in the algo.
> 2. Use the algo for all filter types.

Makes sense. I split v2 accordingly. The first patch now fixes the
existing git tag --contains traversal and tests that path directly.
The second patch then uses the traversal for the other ref-filter
callers.

>
> > diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> > index 5d86a96c17..82235f713e 100644
> > --- a/t/helper/test-reach.c
> > +++ b/t/helper/test-reach.c
> > @@ -6,7 +6,6 @@
> >  #include "gettext.h"
> >  #include "hex.h"
> >  #include "object-name.h"
> > -#include "ref-filter.h"
> >  #include "setup.h"
> >  #include "string-list.h"
> >  #include "tag.h"
> > @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
> >
> >               printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
> >       } else if (!strcmp(av[1], "commit_contains")) {
> > -             struct ref_filter filter = REF_FILTER_INIT;
> >               struct contains_cache cache;
> >               init_contains_cache(&cache);
> >
> > -             if (ac > 2 && !strcmp(av[2], "--tag"))
> > -                     filter.with_commit_tag_algo = 1;
> > -             else
> > -                     filter.with_commit_tag_algo = 0;
> > -
> > -             printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> > +             printf("%s(_,A,X,_):%d\n", av[1],
> > +                    commit_contains(A, X, &cache));
> >               clear_contains_cache(&cache);
> >       } else if (!strcmp(av[1], "get_reachable_subset")) {
> >               const int reachable_flag = 1;
>
> Makes sense.
>
> > diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> > index 5b23ce5db9..ac68fdbacd 100755
> > --- a/t/perf/p1500-graph-walks.sh
> > +++ b/t/perf/p1500-graph-walks.sh
> > @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
> >
> >  test_perf_large_repo
> >
> > +contains_ref_limit=8192
> > +
> >  test_expect_success 'setup' '
> >       git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
> >       sort -r allrefs | head -n 50 >refs &&
> > @@ -32,10 +34,25 @@ test_expect_success 'setup' '
> >               echo "X:$line" >>test-tool-tags || return 1
> >       done &&
> >
> > +     git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> > +     contains_ref_count=$(wc -l <contains-commits) &&
> > +     test "$contains_ref_count" -gt 0 &&
> > +     contains_base=$(tail -n 1 contains-commits) &&
> > +     export contains_base &&
> > +     awk "{
> > +             printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> > +     }" contains-commits |
> > +             git update-ref --stdin &&
> > +     git pack-refs --include "refs/contains-perf/*" &&
> > +
> >       commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
> >       git update-ref refs/heads/disjoint-base $commit &&
> >
> > -     git commit-graph write --reachable
> > +     git commit-graph write --reachable &&
> > +
> > +     git for-each-ref --contains="$contains_base" \
> > +             refs/contains-perf/ >actual &&
> > +     test_line_count = $contains_ref_count actual
>
> Shouldn't this be a separate test and not a part of the setup?
>
> >  '
> >
> >  test_perf 'ahead-behind counts: git for-each-ref' '
> > @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
> >       xargs git tag --merged=HEAD <tags
> >  '
> >
> > +test_perf 'contains: git for-each-ref --contains' '
> > +     git for-each-ref --contains="$contains_base" \
> > +             refs/contains-perf/ >/dev/null
> > +'
> > +
>
> Ah! we also have this, so perhaps moving the `test_line_count` here and
> dropping it above would be better.

Makes sense! Done in v2.

>
> >  test_perf 'is-base check: test-tool reach (refs)' '
> >       test-tool reach get_branch_base_for_tip <test-tool-refs
> >  '
> > diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> > index e06feb06e9..169cc70c23 100755
> > --- a/t/t6301-for-each-ref-errors.sh
> > +++ b/t/t6301-for-each-ref-errors.sh
> > @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
> >       test_must_be_empty brief-err
> >  '
> >
> > +test_expect_success 'missing ancestors are reported by contains filters' '
> > +     test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> > +     {
> > +             echo "tree $(git rev-parse HEAD^{tree})" &&
> > +             echo "parent $MISSING" &&
> > +             git cat-file commit HEAD |
> > +                     sed -n -e "/^author /p" -e "/^committer /p" &&
> > +             echo &&
> > +             echo "missing parent"
> > +     } > commit &&
> > +     broken=$(git hash-object -t commit -w commit) &&
> > +     git update-ref refs/heads/missing-parent "$broken" &&
> > +     test_must_fail git for-each-ref --contains=HEAD \
> > +             refs/heads/missing-parent >out 2>err &&
> > +     test_must_be_empty out &&
> > +     test_grep "unable to parse commit $MISSING" err
> > +'
> > +
> >  test_expect_success 'ahead-behind requires an argument' '
> >       test_must_fail git for-each-ref \
> >               --format="%(ahead-behind)" 2>err &&
> > diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> > index 7f060d97bf..423505d1fb 100755
> > --- a/t/t6302-for-each-ref-filter.sh
> > +++ b/t/t6302-for-each-ref-filter.sh
> > @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
> >       test_cmp expect actual
> >  '
> >
> > +test_expect_success 'contains handles cyclic replacement histories' '
> > +     one=$(git rev-parse one) &&
> > +     three=$(git rev-parse three) &&
> > +     test_when_finished "
> > +             git replace -d $one
> > +             git replace -d $three
> > +             git tag -d cycle-a cycle-b
> > +     " &&
> > +     git tag cycle-a "$one" &&
> > +     git tag cycle-b "$three" &&
> > +     git replace --graft "$one" "$three" two &&
> > +     git replace --graft "$three" "$one" &&
> > +     cat >expect <<-\EOF &&
> > +     refs/tags/cycle-a
> > +     refs/tags/cycle-b
> > +     EOF
> > +     git for-each-ref --format="%(refname)" --contains=two \
> > +             "refs/tags/cycle-*" >actual &&
> > +     test_cmp expect actual
> > +'
> > +
> >  test_expect_success '%(color) must fail' '
> >       test_must_fail git for-each-ref --format="%(color)%(refname)"
> >  '
> > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> > index b5b314e570..1ecc2571c2 100755
> > --- a/t/t6600-test-reach.sh
> > +++ b/t/t6600-test-reach.sh
> > @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
> >       X:commit-9-3
> >       EOF
> >       echo "commit_contains(_,A,X,_):1" >expect &&
> > -     test_all_modes commit_contains &&
> > -     test_all_modes commit_contains --tag
> > +     test_all_modes commit_contains
> >  '
> >
> >  test_expect_success 'commit_contains:miss' '
> > @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
> >       X:commit-9-3
> >       EOF
> >       echo "commit_contains(_,A,X,_):0" >expect &&
> > -     test_all_modes commit_contains &&
> > -     test_all_modes commit_contains --tag
> > +     test_all_modes commit_contains
> >  '
> >
> >  test_expect_success 'rev-list: basic topo-order' '
> >
> > ---
> > base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> > change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
> >
> > Best regards,
> > --
> > Tamir Duberstein <tamird@gmail.com>
>
> Overall the patch looks great. The perf improvements are also very
> welcome. Some small nits from me.

Thanks a lot for the review!

^ permalink raw reply

* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Jeff King @ 2026-06-08 22:34 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
	Elijah Newren, Kristofer Karlsson
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>

On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote:

> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
> 
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
> 
> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.

The subject line obfuscated the intent here (at least for me). I think a
more clear subject would just be: "ref-filter: always use
contains_tag_algo" or something.

But more importantly, I think the analysis above is missing a key point
about why we didn't make the tag algo the default in the first place: it
is depth first, and thus slower when the merge base can be found quickly
by the breadth-first traversal. For tags, you tend to have to look at
all of history anyway (because you have at least one old tag that
requires walking back that far), but that is often not true for
branches.

We are able to get the best of both worlds if we can cut off the
depth-first traversal early using generation numbers.

So I think a better rule here is to tweak the selection in
commit_contains() to select the depth-first algorithm when we have
generation numbers enabled. There's a patch in an old thread, which was
revived a week or two ago by Kristofer (cc'd):

  https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/

> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.

I can believe that the depth-first code doesn't handle cycles well. But
if that's the case, then it's already a problem for "git tag
--contains". And we should fix it as a separate patch from enabling that
algorithm in more cases.

I'm not quite sure how ancestry should be defined in a cycle. How does
the algorithm behave now when it sees a cycle? If it loops infinitely,
we definitely would want to fix that. If not, then to some degree I
don't care too much what answer is provided, since the input is somewhat
nonsense in the first place. And if it is expensive to track, it might
not be worth inflicting that penalty on the sane cases. But it looks
like your solution is just setting an extra flag value in the slab,
which should be pretty cheap.

> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.

I think that's a good outcome.

> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
> 
> On a checkout with 62,174 remote-tracking refs, I ran:
> 
>     hyperfine --warmup 0 --runs 3 \
>         --command-name parent \
>         '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
>         --command-name this-commit \
>         '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> 
> The results were:
> 
>              parent       this commit
>   elapsed    104.365 s     467.7 ms
>   user        93.702 s     220.2 ms
>   system       0.723 s     182.7 ms

I didn't time it, but the probable regression case is something like
this: a very deep history with a small number of branches diverging only
a few commits away. Without a commit-graph file (or one without
generation numbers), that probably makes "git branch --contains" slower.

-Peff

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox