* [PATCH 2/3] replay: add helper to put entry into mapped_commits
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
To: git; +Cc: Toon Claes
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-0-e3ee71fce7b4@iotcl.com>
The function replay_revisions() in replay.c is rather lengthy. Extract
the logic to put commit entry into mapped_commits into a helper
function.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
replay.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff --git a/replay.c b/replay.c
index 1f8e5b083b..7921d7dba3 100644
--- a/replay.c
+++ b/replay.c
@@ -243,9 +243,9 @@ static void set_up_replay_mode(struct repository *repo,
strset_clear(&rinfo.positive_refs);
}
-static struct commit *mapped_commit(kh_oid_map_t *replayed_commits,
- struct commit *commit,
- struct commit *fallback)
+static struct commit *get_mapped_commit(kh_oid_map_t *replayed_commits,
+ struct commit *commit,
+ struct commit *fallback)
{
khint_t pos;
if (!commit)
@@ -256,6 +256,21 @@ static struct commit *mapped_commit(kh_oid_map_t *replayed_commits,
return kh_value(replayed_commits, pos);
}
+static void put_mapped_commit(kh_oid_map_t *replayed_commits,
+ struct commit *commit,
+ struct commit *new_commit)
+{
+ khint_t pos;
+ int ret;
+
+ pos = kh_put_oid_map(replayed_commits, commit->object.oid, &ret);
+ if (ret == 0)
+ BUG("Duplicate rewritten commit: %s\n",
+ oid_to_hex(&commit->object.oid));
+
+ kh_value(replayed_commits, pos) = new_commit;
+}
+
static struct commit *pick_regular_commit(struct repository *repo,
struct commit *pickme,
kh_oid_map_t *replayed_commits,
@@ -276,7 +291,7 @@ static struct commit *pick_regular_commit(struct repository *repo,
base_tree = lookup_tree(repo, repo->hash_algo->empty_tree);
}
- replayed_base = mapped_commit(replayed_commits, base, onto);
+ replayed_base = get_mapped_commit(replayed_commits, base, onto);
replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
pickme_tree = repo_get_commit_tree(repo, pickme);
@@ -414,8 +429,6 @@ int replay_revisions(struct rev_info *revs,
replayed_commits = kh_init_oid_map();
while ((commit = get_revision(revs))) {
const struct name_decoration *decoration;
- khint_t pos;
- int hr;
if (commit->parents && commit->parents->next)
die(_("replaying merge commits is not supported yet!"));
@@ -427,11 +440,7 @@ int replay_revisions(struct rev_info *revs,
break;
/* Record commit -> last_commit mapping */
- pos = kh_put_oid_map(replayed_commits, commit->object.oid, &hr);
- if (hr == 0)
- BUG("Duplicate rewritten commit: %s\n",
- oid_to_hex(&commit->object.oid));
- kh_value(replayed_commits, pos) = last_commit;
+ put_mapped_commit(replayed_commits, commit, last_commit);
/* Update any necessary branches */
if (ref)
--
2.53.0.1323.g189a785ab5
^ permalink raw reply related
* [PATCH 3/3] replay: offer an option to linearize the commit topology
From: Toon Claes @ 2026-06-08 18:37 UTC (permalink / raw)
To: git; +Cc: Toon Claes, Johannes Schindelin
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-0-e3ee71fce7b4@iotcl.com>
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
One of the stated goals of git-replay(1) is to allow implementing the
git-rebase(1) functionality on the server side.
The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
was given. This mode drops merge commits instead of replaying them, and
linearized the commit history into a sequence of the
regular (single-parent) commits.
Add option `--linearize` to git-replay(1) do the same.
Co-authored-by: Toon Claes <toon@iotcl.com>
---
Documentation/git-replay.adoc | 5 +++++
builtin/replay.c | 4 ++++
replay.c | 25 +++++++++++++++++++------
replay.h | 5 +++++
t/t3650-replay-basics.sh | 22 ++++++++++++++++++++++
5 files changed, 55 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc
index a32f72aead..41c96c7061 100644
--- a/Documentation/git-replay.adoc
+++ b/Documentation/git-replay.adoc
@@ -88,6 +88,11 @@ incompatible with `--contained` (which is a modifier for `--onto` only).
+
The default mode can be configured via the `replay.refAction` configuration variable.
+--linearize::
+ In this mode, `git replay` imitates `git rebase --no-rebase-merges`,
+ i.e. it cherry-picks only non-merge commits, each one on top of the
+ previous one.
+
<revision-range>::
Range of commits to replay; see "Specifying Ranges" in
linkgit:git-rev-parse[1]. In `--advance=<branch>` or
diff --git a/builtin/replay.c b/builtin/replay.c
index 39e3a86f6c..fedfe46dc6 100644
--- a/builtin/replay.c
+++ b/builtin/replay.c
@@ -111,6 +111,8 @@ int cmd_replay(int argc,
N_("mode"),
N_("control ref update behavior (update|print)"),
PARSE_OPT_NONEG),
+ OPT_BOOL(0, "linearize", &opts.linearize,
+ N_("ignore merge commits instead of replaying them")),
OPT_END()
};
@@ -132,6 +134,8 @@ int cmd_replay(int argc,
opts.contained, "--contained");
die_for_incompatible_opt2(!!opts.ref, "--ref",
!!opts.contained, "--contained");
+ die_for_incompatible_opt2(!!opts.revert, "--revert",
+ opts.linearize, "--linearize");
/* Parse ref action mode from command line or config */
ref_mode = get_ref_action_mode(repo, ref_action);
diff --git a/replay.c b/replay.c
index 7921d7dba3..3e36908131 100644
--- a/replay.c
+++ b/replay.c
@@ -277,12 +277,16 @@ static struct commit *pick_regular_commit(struct repository *repo,
struct commit *onto,
struct merge_options *merge_opt,
struct merge_result *result,
+ struct commit *replayed_base,
bool reverse,
enum replay_empty_commit_action empty)
{
- struct commit *base, *replayed_base;
+ struct commit *base;
struct tree *pickme_tree, *base_tree, *replayed_base_tree;
+ if (replayed_base && reverse)
+ BUG("Linearizing commits is not supported when replaying in reverse");
+
if (pickme->parents) {
base = pickme->parents->item;
base_tree = repo_get_commit_tree(repo, base);
@@ -291,7 +295,8 @@ static struct commit *pick_regular_commit(struct repository *repo,
base_tree = lookup_tree(repo, repo->hash_algo->empty_tree);
}
- replayed_base = get_mapped_commit(replayed_commits, base, onto);
+ if (!replayed_base)
+ replayed_base = get_mapped_commit(replayed_commits, base, onto);
replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
pickme_tree = repo_get_commit_tree(repo, pickme);
@@ -430,12 +435,20 @@ int replay_revisions(struct rev_info *revs,
while ((commit = get_revision(revs))) {
const struct name_decoration *decoration;
- if (commit->parents && commit->parents->next)
+ if (opts->linearize && (!commit->parents || commit->parents->next))
+ ; /* map current commit to the same as the previous commit */
+ else if (commit->parents && commit->parents->next)
die(_("replaying merge commits is not supported yet!"));
+ else {
+ struct commit *to_pick = reverse ? last_commit : onto;
+ last_commit =
+ pick_regular_commit(revs->repo, commit,
+ replayed_commits, to_pick,
+ &merge_opt, &result,
+ opts->linearize ? last_commit : NULL,
+ reverse, opts->empty);
+ }
- last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
- reverse ? last_commit : onto,
- &merge_opt, &result, reverse, opts->empty);
if (!last_commit)
break;
diff --git a/replay.h b/replay.h
index 1851a07705..07e6fdcca3 100644
--- a/replay.h
+++ b/replay.h
@@ -62,6 +62,11 @@ struct replay_revisions_options {
* Defaults to REPLAY_EMPTY_COMMIT_DROP.
*/
enum replay_empty_commit_action empty;
+
+ /*
+ * Whether to linearize the commits (i.e. drop merge commits).
+ */
+ int linearize;
};
/* This struct is used as an out-parameter by `replay_revisions()`. */
diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
index 3353bc4a4d..c781a3bb1b 100755
--- a/t/t3650-replay-basics.sh
+++ b/t/t3650-replay-basics.sh
@@ -565,4 +565,26 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
test_grep "cannot be used with multiple revision ranges" err
'
+test_expect_success 'linearize the commit topology' '
+ test_tick &&
+ N=$(git commit-tree -m N -p L -p I L:) &&
+ N=$(git commit-tree -m N-child -p $N L:) &&
+ git update-ref refs/heads/N $N &&
+
+ git replay --ref-action=print --linearize \
+ --onto A B..refs/heads/N >out &&
+
+ test_line_count = 1 out &&
+ read N1 N2 N3 N4 <out &&
+
+ cat >expect <<-EOF &&
+ * N-child
+ * I
+ * L
+ o A
+ EOF
+ git log --format=%s --graph --boundary A...$N3 >actual &&
+ test_cmp expect actual
+'
+
test_done
--
2.53.0.1323.g189a785ab5
^ permalink raw reply related
* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Justin Tobler @ 2026-06-08 18:50 UTC (permalink / raw)
To: K Jayatheerth
Cc: git, a3205153416, gitster, kumarayushjha123, lucasseikioshiro,
phillip.wood, sandals
In-Reply-To: <20260605163012.181089-4-jayatheerthkulkarni2005@gmail.com>
On 26/06/05 10:00PM, K Jayatheerth wrote:
> Scripts often need to locate the `.git` directory. While `git rev-parse`
> provides this, it relies on command-line flags to dictate path formatting.
>
> Introduce `path.gitdir.absolute` and `path.gitdir.relative` keys to
> `git repo info`. Exposing separate format-specific keys instead of a base
> `path.gitdir` key avoids default fallbacks and requires callers to state
> their format requirements explicitly. Both keys use `format_path()` to
> resolve paths.
Makes sense.
> To test these keys, introduce the `test_repo_info_path` helper in
> `t/t1900-repo-info.sh`. The helper evaluates paths dynamically and accepts
> environment variable prefixes. This prepares the test suite for future path
> keys that depend on environment overrides, such as `commondir`.
>
> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> ---
> Documentation/git-repo.adoc | 6 ++++++
> builtin/repo.c | 26 ++++++++++++++++++++++++++
> t/t1900-repo-info.sh | 33 +++++++++++++++++++++++++++++++++
> 3 files changed, 65 insertions(+)
>
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> index 42262c1983..a0dca7ce88 100644
> --- a/Documentation/git-repo.adoc
> +++ b/Documentation/git-repo.adoc
> @@ -104,6 +104,12 @@ values that they return:
> `object.format`::
> The object format (hash algorithm) used in the repository.
>
> +`path.gitdir.absolute`::
> + The canonical absolute path to the Git repository directory (the `.git` directory).
> +
> +`path.gitdir.relative`::
> + The path to the Git repository directory relative to the current working directory.
> +
> `references.format`::
> The reference storage format. The valid values are:
> +
> diff --git a/builtin/repo.c b/builtin/repo.c
> index 71a5c1c29c..6e97f6a0e4 100644
> --- a/builtin/repo.c
> +++ b/builtin/repo.c
> @@ -7,12 +7,14 @@
> #include "hex.h"
> #include "odb.h"
> #include "parse-options.h"
> +#include "path.h"
> #include "path-walk.h"
> #include "progress.h"
> #include "quote.h"
> #include "ref-filter.h"
> #include "refs.h"
> #include "revision.h"
> +#include "setup.h"
> #include "strbuf.h"
> #include "string-list.h"
> #include "shallow.h"
> @@ -75,6 +77,28 @@ static int get_object_format(struct repository *repo, struct strbuf *buf)
> return 0;
> }
>
> +static int get_path_gitdir_absolute(struct repository *repo, struct strbuf *buf)
> +{
> + const char *git_dir = repo_get_git_dir(repo);
> +
> + if (!git_dir)
> + return error(_("unable to get git directory"));
> +
> + format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_CANONICAL);
For absolute paths, I don't think we actually need the prefix, but
providing it doesn't probably matter too much either way.
> + return 0;
> +}
> +
> +static int get_path_gitdir_relative(struct repository *repo, struct strbuf *buf)
> +{
> + const char *git_dir = repo_get_git_dir(repo);
> +
> + if (!git_dir)
> + return error(_("unable to get git directory"));
> +
> + format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_RELATIVE);
> + return 0;
> +}
Looks good.
> +
> static int get_references_format(struct repository *repo, struct strbuf *buf)
> {
> strbuf_addstr(buf,
> @@ -87,6 +111,8 @@ static const struct repo_info_field repo_info_field[] = {
> { "layout.bare", get_layout_bare },
> { "layout.shallow", get_layout_shallow },
> { "object.format", get_object_format },
> + { "path.gitdir.absolute", get_path_gitdir_absolute },
> + { "path.gitdir.relative", get_path_gitdir_relative },
> { "references.format", get_references_format },
> };
>
> diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
> index 39bb77dda0..0660b00bbc 100755
> --- a/t/t1900-repo-info.sh
> +++ b/t/t1900-repo-info.sh
> @@ -155,4 +155,37 @@ test_expect_success 'git repo info -h shows only repo info usage' '
> test_grep ! "git repo structure" actual
> '
>
> +test_repo_info_path () {
> + field_name=$1
> + expect_absolute_eval=$2
> + expect_relative=$3
> + env_prefix=$4
nit: I was a bit uncertain regarding the purpose of env_prefix here.
Since the env_prefix is not used by any tests yet, I wonder if it we
should delay adding it until the next patch. If we want to reduce churn
though, I think we could also swap the order of patch 3 and 4.
> +
> + test_expect_success "query individual key: path.$field_name.absolute${env_prefix:+ ($env_prefix)}" '
> + (
> + cd test-repo/sub &&
> + expect_absolute=$(eval "$expect_absolute_eval") &&
Can we just compute `expect_absolute` prior to passing it instead of
using eval here?
> + echo "path.$field_name.absolute=$expect_absolute" >expect &&
> + eval "${env_prefix:+$env_prefix }git repo info \"path.$field_name.absolute\"" >actual &&
> + test_cmp expect actual
> + )
> + '
> +
> + test_expect_success "query individual key: path.$field_name.relative${env_prefix:+ ($env_prefix)}" '
> + (
> + cd test-repo/sub &&
> + echo "path.$field_name.relative=$expect_relative" >expect &&
> + eval "${env_prefix:+$env_prefix }git repo info \"path.$field_name.relative\"" >actual &&
> + test_cmp expect actual
> + )
> + '
> +}
> +
> +test_expect_success 'setup test repository layout for path fields' '
> + git init test-repo &&
> + mkdir -p test-repo/sub
> +'
> +
> +test_repo_info_path 'gitdir' 'echo "$(cd .. && pwd)/.git"' '../.git'
hmmm, do we expect the path suffix to be the same between relative and
absolute paths for all test cases? If so, we could just have a single
`expect_path_suffix` argument and let the helper compute the appropriate
absolute and relative paths internally.
-Justin
^ permalink raw reply
* [PATCH v4 0/2] prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
To: git; +Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson
In-Reply-To: <pull.2140.v3.git.1780832592.gitgitgadget@gmail.com>
Rene's lazy_queue wrapper in describe.c was a clever optimization -- by
deferring the get, a following put becomes a simple replace, avoiding a full
remove-rebalance-insert cycle.
It turns out this pattern is so common in git's traversal code that it makes
sense to fold it into prio_queue itself. Gets and puts are interleaved in
virtually every commit walk, so the fusion is essentially always a win.
This is mostly a code simplification -- three callers had independently
reimplemented the same optimization, and they all collapse to plain get+put
now. The 1.7-2.7% speedup on traversal-heavy workloads is a nice bonus.
More details and benchmark numbers in the commit message.
Related to but independent of the cascade sift-down work in
kk/prio-queue-cascade-sift -- the two can land in either order.
Changes in v4:
* Thanks Junio for review, applied all suggestions.
* Renamed .nr_internal to .nr_
* Restored flush_get() as a static inline helper instead of inlining the
flush logic into get() and peek().
* Guard empty-queue check with nr_ <= get_pending.
* Flipped commit order: the rename/accessor commit is now first, and the
behavioral fusion change is second. This was partly messy -- the first
rename commit introduces some ugly intermediate code (e.g. describe.c's
prio_queue_for_each with a skip variable) that gets cleaned up in commit
2 when the lazy get makes it unnecessary.
Changes in v3:
* Adopted Rene's suggestion to move the flush logic below the LIFO
early-return (LIFO mode never sets get_pending, so flushing there is a
no-op).
* Went a step further and inlined the flush logic directly into get() and
peek(), eliminating the flush_get() helper and its forward declaration of
sift_down_root().
* Updated benchmark numbers with more rigorous methodology: 30 interleaved
runs with paired t-test on an idle server. Split results into code paths
that already had manual fusion (neutral) vs code paths that benefit from
the new automatic fusion (1.7-2.7% improvement).
Changes in v2:
* Added a second commit that renames .nr to .nr_internal so that direct
access from outside prio-queue.c is a compile error. Verified that after
the rename, only prio-queue.c references nr_internal.
* Added prio_queue_for_each() macro for callers that need to walk all
elements (describe.c, show-branch.c, commit-reach.c, revision.c,
negotiator/skipping.c).
* Converted remaining .nr loop conditions to use
prio_queue_get()/prio_queue_peek() as the loop condition, or
prio_queue_size() where get/peek isn't suitable.
* Fixed several callers missed in v1 (object-name.c, fetch-pack.c,
path-walk.c, pack-bitmap-write.c, negotiator/default.c,
negotiator/skipping.c, revision.c, builtin/last-modified.c).
Kristofer Karlsson (2):
prio-queue: rename .nr to .nr_ and add accessor helpers
prio-queue: fold lazy_queue into prio_queue for automatic get+put
fusion
builtin/describe.c | 70 ++++++-----------------
builtin/last-modified.c | 7 +--
builtin/show-branch.c | 24 +++-----
commit-reach.c | 14 ++---
commit.c | 11 +---
fetch-pack.c | 4 +-
negotiator/default.c | 4 +-
negotiator/skipping.c | 12 ++--
object-name.c | 2 +-
pack-bitmap-write.c | 10 ++--
path-walk.c | 8 +--
prio-queue.c | 110 +++++++++++++++++++-----------------
prio-queue.h | 19 ++++---
revision.c | 16 +++---
t/unit-tests/u-prio-queue.c | 6 +-
walker.c | 4 +-
16 files changed, 141 insertions(+), 180 deletions(-)
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2140%2Fspkrka%2Flazy-prio-queue-pr-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2140/spkrka/lazy-prio-queue-pr-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/2140
Range-diff vs v3:
2: 033215e304 ! 1: d0f2294661 prio-queue: rename .nr to .nr_internal to prevent direct access
@@ Metadata
Author: Kristofer Karlsson <krka@spotify.com>
## Commit message ##
- prio-queue: rename .nr to .nr_internal to prevent direct access
+ prio-queue: rename .nr to .nr_ and add accessor helpers
- Rename the .nr member to .nr_internal so that callers outside
- prio-queue.c that directly reference .nr get a compilation error.
- This catches both existing misuse and future in-flight topics.
+ Rename the .nr member to .nr_ so that callers outside prio-queue.c
+ that directly reference .nr get a compilation error. This catches
+ both existing misuse and future in-flight topics.
- Add prio_queue_for_each() macro for callers that need to walk all
- elements in the queue, accounting for the get_pending offset.
+ Add prio_queue_size() for callers that need to know the element count
+ and prio_queue_for_each() for callers that need to walk all elements.
Convert all external .nr users:
- Loop conditions: use prio_queue_size(), prio_queue_get(), or
@@ Commit message
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
## builtin/describe.c ##
-@@ builtin/describe.c: static unsigned long finish_depth_computation(struct prio_queue *queue,
- struct oidset unflagged = OIDSET_INIT;
- struct commit *c;
+@@ builtin/describe.c: static void lazy_queue_put(struct lazy_queue *queue, void *thing)
-- for (size_t i = queue->get_pending; i < queue->nr; i++) {
-- struct commit *commit = queue->array[i].data;
-- if (!(commit->object.flags & best->flag_within))
-- oidset_insert(&unflagged, &commit->object.oid);
-+ prio_queue_for_each(queue, c) {
-+ if (!(c->object.flags & best->flag_within))
-+ oidset_insert(&unflagged, &c->object.oid);
- }
+ static bool lazy_queue_empty(const struct lazy_queue *queue)
+ {
+- return queue->queue.nr == (queue->get_pending ? 1 : 0);
++ return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
+ }
- while ((c = prio_queue_get(queue))) {
+ static void lazy_queue_clear(struct lazy_queue *queue)
+@@ builtin/describe.c: static unsigned long finish_depth_computation(struct lazy_queue *queue,
+ {
+ unsigned long seen_commits = 0;
+ struct oidset unflagged = OIDSET_INIT;
++ struct commit *commit;
++ int skip = queue->get_pending ? 1 : 0;
+
+- for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
+- struct commit *commit = queue->queue.array[i].data;
++ prio_queue_for_each(&queue->queue, commit) {
++ if (skip) {
++ skip = 0;
++ continue;
++ }
+ if (!(commit->object.flags & best->flag_within))
+ oidset_insert(&unflagged, &commit->object.oid);
+ }
## builtin/last-modified.c ##
@@ builtin/last-modified.c: static void process_parent(struct last_modified *lm,
static int last_modified_run(struct last_modified *lm)
{
int max_count, queue_popped = 0;
-- struct commit *c;
+ struct commit *c, *n;
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
struct commit_list *list;
+@@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
+ }
+ }
+
+- while (queue.nr) {
++ while ((c = prio_queue_get(&queue))) {
+ int parent_i;
+ struct commit_list *p;
+- struct commit *c = prio_queue_get(&queue);
+ struct bitmap *active_c = active_paths_for(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
@@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
*/
repo_parse_commit(lm->rev.repo, c);
@@ builtin/show-branch.c: static const char *get_color_reset_code(void)
static struct commit *interesting(struct prio_queue *queue)
{
-- for (size_t i = queue->get_pending; i < queue->nr; i++) {
+- for (size_t i = 0; i < queue->nr; i++) {
- struct commit *commit = queue->array[i].data;
- if (commit->object.flags & UNINTERESTING)
- continue;
@@ builtin/show-branch.c: static const char *get_color_reset_code(void)
}
return NULL;
}
+@@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
+ {
+ int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
+ int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
++ struct commit *commit;
+
+- while (queue->nr) {
++ while ((commit = prio_queue_peek(queue))) {
+ struct commit_list *parents;
+ int still_interesting = !!interesting(queue);
+- struct commit *commit = prio_queue_peek(queue);
+ bool get_pending = true;
+ int flags = commit->object.flags & all_mask;
+
## commit-reach.c ##
@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b
return 1;
}
@@ commit-reach.c: void ahead_behind(struct repository *r,
- struct commit **commits, size_t commits_nr,
struct ahead_behind_count *counts, size_t counts_nr)
{
-+ struct commit *c;
struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
++ void *entry;
size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
-@@ commit-reach.c: void ahead_behind(struct repository *r,
- init_bit_arrays(&bit_arrays);
-
- for (size_t i = 0; i < commits_nr; i++) {
-- struct commit *c = commits[i];
-- struct bitmap *bitmap = get_bit_array(c, width);
-+ struct bitmap *bitmap;
-+ c = commits[i];
-+ bitmap = get_bit_array(c, width);
-
- bitmap_set(bitmap, i);
- insert_no_dup(&queue, c);
- }
-
- while (queue_has_nonstale(&queue)) {
-- struct commit *c = prio_queue_get(&queue);
- struct commit_list *p;
-- struct bitmap *bitmap_c = get_bit_array(c, width);
-+ struct bitmap *bitmap_c;
-+ c = prio_queue_get(&queue);
-+ bitmap_c = get_bit_array(c, width);
-
- for (size_t i = 0; i < counts_nr; i++) {
- int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
+ if (!commits_nr || !counts_nr)
@@ commit-reach.c: void ahead_behind(struct repository *r,
/* STALE is used here, PARENT2 is used by insert_no_dup(). */
repo_clear_commit_marks(r, PARENT2 | STALE);
- for (size_t i = 0; i < queue.nr; i++)
- free_bit_array(queue.array[i].data);
-+ prio_queue_for_each(&queue, c)
-+ free_bit_array(c);
++ prio_queue_for_each(&queue, entry)
++ free_bit_array(entry);
clear_bit_arrays(&bit_arrays);
clear_prio_queue(&queue);
}
+@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
+ size_t bases_nr)
+ {
+ int best_index = -1;
+- struct commit *branch_point = NULL;
++ struct commit *c, *branch_point = NULL;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ int found_missing_gen = 0;
+
+@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
+ prio_queue_put(&queue, c);
+ }
+
+- while (queue.nr) {
+- struct commit *c = prio_queue_get(&queue);
++ while ((c = prio_queue_get(&queue))) {
+ int best_for_c = get_best(c);
+ int best_for_p, positive;
+ struct commit *parent;
## fetch-pack.c ##
@@ fetch-pack.c: static int mark_complete_oid(const struct reference *ref, void *cb_data UNUSED)
@@ object-name.c: static int get_oid_oneline(struct repository *r,
## pack-bitmap-write.c ##
@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
+ struct bitmap_index *old_bitmap,
const uint32_t *mapping)
{
- struct commit *c;
++ struct commit *c;
+ struct tree *tree;
int found;
uint32_t pos;
if (!ent->bitmap)
@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
+
+ prio_queue_put(queue, commit);
+
+- while (queue->nr) {
++ while ((c = prio_queue_get(queue))) {
+ struct commit_list *p;
+- struct commit *c = prio_queue_get(queue);
+
+ if (old_bitmap && mapping) {
+ struct ewah_bitmap *old;
+@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
}
}
@@ prio-queue.c: void prio_queue_reverse(struct prio_queue *queue)
if (queue->compare)
BUG("prio_queue_reverse() on non-LIFO queue");
- if (!queue->nr)
-+ if (!queue->nr_internal)
++ if (!queue->nr_)
return;
- for (i = 0; i < (j = (queue->nr - 1) - i); i++)
-+ for (i = 0; i < (j = (queue->nr_internal - 1) - i); i++)
++ for (i = 0; i < (j = (queue->nr_ - 1) - i); i++)
swap(queue, i, j);
}
@@ prio-queue.c: void prio_queue_reverse(struct prio_queue *queue)
{
FREE_AND_NULL(queue->array);
- queue->nr = 0;
-+ queue->nr_internal = 0;
++ queue->nr_ = 0;
queue->alloc = 0;
queue->insertion_ctr = 0;
- queue->get_pending = 0;
-@@ prio-queue.c: static void sift_down_root(struct prio_queue *queue)
- {
- size_t ix, child;
-
-- for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
-- child = ix * 2 + 1;
-- if (child + 1 < queue->nr &&
-+ /* Push down the one at the root */
-+ for (ix = 0; ix * 2 + 1 < queue->nr_internal; ix = child) {
-+ child = ix * 2 + 1; /* left */
-+ if (child + 1 < queue->nr_internal &&
- compare(queue, child, child + 1) >= 0)
-- child++;
-+ child++; /* use right child */
-+
- if (compare(queue, ix, child) <= 0)
- break;
-+
- swap(queue, child, ix);
- }
}
@@ prio-queue.c: void prio_queue_put(struct prio_queue *queue, void *thing)
- return;
- }
+ size_t ix, parent;
+ /* Append at the end */
- ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
- queue->array[queue->nr].ctr = queue->insertion_ctr++;
- queue->array[queue->nr].data = thing;
- queue->nr++;
-+ /* Append at the end */
-+ ALLOC_GROW(queue->array, queue->nr_internal + 1, queue->alloc);
-+ queue->array[queue->nr_internal].ctr = queue->insertion_ctr++;
-+ queue->array[queue->nr_internal].data = thing;
-+ queue->nr_internal++;
++ ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
++ queue->array[queue->nr_].ctr = queue->insertion_ctr++;
++ queue->array[queue->nr_].data = thing;
++ queue->nr_++;
if (!queue->compare)
-- return;
-+ return; /* LIFO */
+ return; /* LIFO */
+ /* Bubble up the new one */
- for (ix = queue->nr - 1; ix; ix = parent) {
-+ /* Bubble up the new one */
-+ for (ix = queue->nr_internal - 1; ix; ix = parent) {
++ for (ix = queue->nr_ - 1; ix; ix = parent) {
parent = (ix - 1) / 2;
if (compare(queue, parent, ix) <= 0)
break;
-+
- swap(queue, parent, ix);
- }
- }
+@@ prio-queue.c: static void sift_down_root(struct prio_queue *queue)
+ size_t ix, child;
+
+ /* Push down the one at the root */
+- for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
++ for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
+ child = ix * 2 + 1; /* left */
+- if (child + 1 < queue->nr &&
++ if (child + 1 < queue->nr_ &&
+ compare(queue, child, child + 1) >= 0)
+ child++; /* use right child */
- void *prio_queue_get(struct prio_queue *queue)
+@@ prio-queue.c: void *prio_queue_get(struct prio_queue *queue)
{
+ void *result;
+
- if (!queue->nr)
-+ if (!queue->nr_internal)
++ if (!queue->nr_)
return NULL;
if (!queue->compare)
-- return queue->array[--queue->nr].data;
-+ return queue->array[--queue->nr_internal].data; /* LIFO */
-
- if (queue->get_pending) {
-- if (!--queue->nr) {
-+ if (!--queue->nr_internal) {
- queue->get_pending = 0;
- return NULL;
- }
-- queue->array[0] = queue->array[queue->nr];
-+ queue->array[0] = queue->array[queue->nr_internal];
- sift_down_root(queue);
- }
-
-@@ prio-queue.c: void *prio_queue_get(struct prio_queue *queue)
+- return queue->array[--queue->nr].data; /* LIFO */
++ return queue->array[--queue->nr_].data; /* LIFO */
+
+ result = queue->array[0].data;
+- if (!--queue->nr)
++ if (!--queue->nr_)
+ return result;
+
+- queue->array[0] = queue->array[queue->nr];
++ queue->array[0] = queue->array[queue->nr_];
+ sift_down_root(queue);
+ return result;
+ }
void *prio_queue_peek(struct prio_queue *queue)
{
- if (!queue->nr)
-+ if (!queue->nr_internal)
++ if (!queue->nr_)
return NULL;
if (!queue->compare)
- return queue->array[queue->nr - 1].data;
-+ return queue->array[queue->nr_internal - 1].data;
-
- if (queue->get_pending) {
- queue->get_pending = 0;
-- if (!--queue->nr)
-+ if (!--queue->nr_internal)
- return NULL;
-- queue->array[0] = queue->array[queue->nr];
-+ queue->array[0] = queue->array[queue->nr_internal];
- sift_down_root(queue);
- }
++ return queue->array[queue->nr_ - 1].data;
+ return queue->array[0].data;
+ }
+ void prio_queue_replace(struct prio_queue *queue, void *thing)
+ {
+- if (!queue->nr) {
++ if (!queue->nr_) {
+ prio_queue_put(queue, thing);
+ } else if (!queue->compare) {
+- queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
+- queue->array[queue->nr - 1].data = thing;
++ queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
++ queue->array[queue->nr_ - 1].data = thing;
+ } else {
+ queue->array[0].ctr = queue->insertion_ctr++;
+ queue->array[0].data = thing;
## prio-queue.h ##
@@ prio-queue.h: struct prio_queue {
@@ prio-queue.h: struct prio_queue {
size_t insertion_ctr;
void *cb_data;
- size_t alloc, nr;
-+ size_t alloc, nr_internal; /* use prio_queue_size() for logical count */
++ size_t alloc, nr_;
struct prio_queue_entry *array;
- unsigned get_pending;
};
-@@ prio-queue.h: void *prio_queue_peek(struct prio_queue *);
- static inline size_t prio_queue_size(struct prio_queue *queue)
- {
-- return queue->nr - queue->get_pending;
-+ return queue->nr_internal - queue->get_pending;
- }
+@@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
+ */
+ void *prio_queue_peek(struct prio_queue *);
++static inline size_t prio_queue_size(const struct prio_queue *queue)
++{
++ return queue->nr_;
++}
++
+#define prio_queue_for_each(queue, it) \
-+ for (size_t pq_ix_ = (queue)->get_pending; \
-+ pq_ix_ < (queue)->nr_internal && ((it) = (queue)->array[pq_ix_].data, 1); \
++ for (size_t pq_ix_ = 0; \
++ pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
+ pq_ix_++)
+
- void clear_prio_queue(struct prio_queue *);
-
- /* Reverse the LIFO elements */
+ /*
+ * Replace the "thing" that compares the smallest with a new "thing",
+ * like prio_queue_get()+prio_queue_put() would do, but in a more
## revision.c ##
@@ revision.c: static struct commit *handle_commit(struct rev_info *revs,
@@ revision.c: static struct commit *handle_commit(struct rev_info *revs,
if (commit->object.flags & UNINTERESTING)
continue;
+@@ revision.c: static int limit_list(struct rev_info *revs)
+ struct commit_list *original_list = revs->commits;
+ struct commit_list *newlist = NULL;
+ struct commit_list **p = &newlist;
+- struct commit *interesting_cache = NULL;
++ struct commit *commit, *interesting_cache = NULL;
+ struct prio_queue queue = { .compare = compare_commits_by_commit_date };
+
+ if (revs->ancestry_path_implicit_bottoms) {
+@@ revision.c: static int limit_list(struct rev_info *revs)
+ prio_queue_put(&queue, commit);
+ }
+
+- while (queue.nr) {
+- struct commit *commit = prio_queue_get(&queue);
++ while ((commit = prio_queue_get(&queue))) {
+ struct object *obj = &commit->object;
+
+ if (commit == interesting_cache)
@@ revision.c: static enum rewrite_result rewrite_one_1(struct rev_info *revs,
static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list)
@@ revision.c: static enum rewrite_result rewrite_one_1(struct rev_info *revs,
struct commit_list *p = *list;
if (p && p->item->date >= item->date)
+
+ ## walker.c ##
+@@ walker.c: static struct prio_queue complete = { compare_commits_by_commit_date };
+ static int process_commit(struct walker *walker, struct commit *commit)
+ {
+ struct commit_list *parents;
++ struct commit *item;
+
+ if (repo_parse_commit(the_repository, commit))
+ return -1;
+
+- while (complete.nr) {
+- struct commit *item = prio_queue_peek(&complete);
++ while ((item = prio_queue_peek(&complete))) {
+ if (item->date < commit->date)
+ break;
+ pop_most_recent_commit(&complete, COMPLETE);
1: e882206d29 ! 2: a3f4cb57f2 prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
@@ Commit message
Defer the actual removal in prio_queue_get() until the next
operation. If that next operation is a prio_queue_put(), the
- removal and insertion are fused into a single replace, writing
- the new element at the root and sifting it down which avoids
+ removal and insertion are fused into a single replace — writing
+ the new element at the root and sifting it down — which avoids
a full remove-rebalance-insert cycle.
This matches the dominant usage pattern in git's commit traversal:
- get a commit, then put its parents. The first parent insertion
+ get a commit, then put its parents. The first parent insertion
after each get is now a replace operation automatically.
This generalizes the lazy_queue pattern from builtin/describe.c
- (introduced in 08bb69d70f) into prio_queue itself. Three callers
+ (introduced in 08bb69d70f) into prio_queue itself. Three callers
independently implemented the same get+put fusion:
- builtin/describe.c had a full lazy_queue wrapper
@@ Commit message
- builtin/show-branch.c:join_revs() used peek+replace
All three now collapse to plain _get() and _put(), with the data
- structure handling the fusion internally. This simplifies callers
+ structure handling the fusion internally. This simplifies callers
and means every prio_queue user gets the optimization for free
without needing to implement it manually.
Remove prio_queue_replace() since no external callers remain.
- Add prio_queue_size() for callers that need the logical element
- count, since the physical nr may temporarily include a
- pending-removal element.
- Benchmarked on a large monorepo (30 interleaved runs,
+ Benchmarked on a 1.8M-commit monorepo (30 interleaved runs,
paired t-test, Xeon @ 2.20GHz):
Code paths that previously did eager get+put (new optimization):
- Command base patched change p
+ Command base patched change p
merge-base --all A A~1000 3828ms 3725ms -2.69% 0.0001
rev-list --count A~1000..A 3055ms 2986ms -2.27% 0.0601
log --oneline A~1000..A 3408ms 3350ms -1.71% 0.0482
Code paths that already had manual get+put fusion (expect
- neutral - the optimization moves into prio_queue but the number
+ neutral — the optimization moves into prio_queue but the number
of heap operations stays the same):
- Command base patched change p
- show-branch A A~1000 9156ms 9127ms -0.32% 0.3470
- describe (git.git) 1983ms 1963ms -1.02% <0.001
+ Command base patched change p
+ show-branch A A~1000 9156ms 9127ms -0.32% 0.3470
+ describe (4751 revs, 81K repo) 1983ms 1963ms -1.02% <0.001
No regressions in any scenario.
@@ builtin/describe.c: static int compare_pt(const void *a_, const void *b_)
-
-static bool lazy_queue_empty(const struct lazy_queue *queue)
-{
-- return queue->queue.nr == (queue->get_pending ? 1 : 0);
+- return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
-}
-
-static void lazy_queue_clear(struct lazy_queue *queue)
@@ builtin/describe.c: static int compare_pt(const void *a_, const void *b_)
{
unsigned long seen_commits = 0;
struct oidset unflagged = OIDSET_INIT;
+- struct commit *commit;
+- int skip = queue->get_pending ? 1 : 0;
+ struct commit *c;
-- for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
-- struct commit *commit = queue->queue.array[i].data;
-+ for (size_t i = queue->get_pending; i < queue->nr; i++) {
-+ struct commit *commit = queue->array[i].data;
- if (!(commit->object.flags & best->flag_within))
- oidset_insert(&unflagged, &commit->object.oid);
+- prio_queue_for_each(&queue->queue, commit) {
+- if (skip) {
+- skip = 0;
+- continue;
+- }
+- if (!(commit->object.flags & best->flag_within))
+- oidset_insert(&unflagged, &commit->object.oid);
++ prio_queue_for_each(queue, c) {
++ if (!(c->object.flags & best->flag_within))
++ oidset_insert(&unflagged, &c->object.oid);
}
- while (!lazy_queue_empty(queue)) {
@@ builtin/describe.c: static void describe_commit(struct commit *cmit, struct strb
if (debug) {
static int label_width = -1;
- ## builtin/last-modified.c ##
-@@ builtin/last-modified.c: static void process_parent(struct last_modified *lm,
- static int last_modified_run(struct last_modified *lm)
- {
- int max_count, queue_popped = 0;
-+ struct commit *c;
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
- struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
- struct commit_list *list;
-@@ builtin/last-modified.c: static int last_modified_run(struct last_modified *lm)
- }
- }
-
-- while (queue.nr) {
-+ while ((c = prio_queue_get(&queue))) {
- int parent_i;
- struct commit_list *p;
-- struct commit *c = prio_queue_get(&queue);
- struct bitmap *active_c = active_paths_for(lm, c);
-
- if ((0 <= max_count && max_count < ++queue_popped) ||
-
## builtin/show-branch.c ##
-@@ builtin/show-branch.c: static const char *get_color_reset_code(void)
-
- static struct commit *interesting(struct prio_queue *queue)
- {
-- for (size_t i = 0; i < queue->nr; i++) {
-+ for (size_t i = queue->get_pending; i < queue->nr; i++) {
- struct commit *commit = queue->array[i].data;
- if (commit->object.flags & UNINTERESTING)
- continue;
@@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
- {
- int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
- int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
-+ struct commit *commit;
-
-- while (queue->nr) {
-+ while ((commit = prio_queue_peek(queue))) {
+ while ((commit = prio_queue_peek(queue))) {
struct commit_list *parents;
int still_interesting = !!interesting(queue);
-- struct commit *commit = prio_queue_peek(queue);
- bool get_pending = true;
int flags = commit->object.flags & all_mask;
@@ builtin/show-branch.c: static void join_revs(struct prio_queue *queue,
/*
- ## commit-reach.c ##
-@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
- size_t bases_nr)
- {
- int best_index = -1;
-- struct commit *branch_point = NULL;
-+ struct commit *c, *branch_point = NULL;
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
- int found_missing_gen = 0;
-
-@@ commit-reach.c: int get_branch_base_for_tip(struct repository *r,
- prio_queue_put(&queue, c);
- }
-
-- while (queue.nr) {
-- struct commit *c = prio_queue_get(&queue);
-+ while ((c = prio_queue_get(&queue))) {
- int best_for_c = get_best(c);
- int best_for_p, positive;
- struct commit *parent;
-
## commit.c ##
@@ commit.c: void commit_list_sort_by_date(struct commit_list **list)
struct commit *pop_most_recent_commit(struct prio_queue *queue,
@@ commit.c: void commit_list_sort_by_date(struct commit_list **list)
}
- ## pack-bitmap-write.c ##
-@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
- struct bitmap_index *old_bitmap,
- const uint32_t *mapping)
- {
-+ struct commit *c;
- int found;
- uint32_t pos;
- if (!ent->bitmap)
-@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
-
- prio_queue_put(queue, commit);
-
-- while (queue->nr) {
-+ while ((c = prio_queue_get(queue))) {
- struct commit_list *p;
-- struct commit *c = prio_queue_get(queue);
-
- if (old_bitmap && mapping) {
- struct ewah_bitmap *old;
-
## prio-queue.c ##
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
- queue->nr = 0;
+ queue->nr_ = 0;
queue->alloc = 0;
queue->insertion_ctr = 0;
+ queue->get_pending = 0;
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
+{
+ size_t ix, child;
+
-+ for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
-+ child = ix * 2 + 1;
-+ if (child + 1 < queue->nr &&
++ /* Push down the one at the root */
++ for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
++ child = ix * 2 + 1; /* left */
++ if (child + 1 < queue->nr_ &&
+ compare(queue, child, child + 1) >= 0)
-+ child++;
++ child++; /* use right child */
++
+ if (compare(queue, ix, child) <= 0)
+ break;
++
+ swap(queue, child, ix);
+ }
++}
++
++static inline void flush_get(struct prio_queue *queue)
++{
++ if (!queue->get_pending)
++ return;
++ queue->get_pending = 0;
++ queue->array[0] = queue->array[--queue->nr_];
++ sift_down_root(queue);
}
void prio_queue_put(struct prio_queue *queue, void *thing)
{
size_t ix, parent;
-- /* Append at the end */
+ if (queue->get_pending) {
+ queue->get_pending = 0;
+ queue->array[0].ctr = queue->insertion_ctr++;
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
+ return;
+ }
+
- ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
- queue->array[queue->nr].ctr = queue->insertion_ctr++;
- queue->array[queue->nr].data = thing;
- queue->nr++;
- if (!queue->compare)
-- return; /* LIFO */
-+ return;
-
-- /* Bubble up the new one */
- for (ix = queue->nr - 1; ix; ix = parent) {
- parent = (ix - 1) / 2;
- if (compare(queue, parent, ix) <= 0)
- break;
--
- swap(queue, parent, ix);
+ /* Append at the end */
+ ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
+ queue->array[queue->nr_].ctr = queue->insertion_ctr++;
+@@ prio-queue.c: void prio_queue_put(struct prio_queue *queue, void *thing)
}
}
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
- size_t ix, child;
-
- /* Push down the one at the root */
-- for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
+- for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
- child = ix * 2 + 1; /* left */
-- if (child + 1 < queue->nr &&
+- if (child + 1 < queue->nr_ &&
- compare(queue, child, child + 1) >= 0)
- child++; /* use right child */
-
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
{
- void *result;
-
- if (!queue->nr)
+- if (!queue->nr_)
++ if (queue->nr_ <= queue->get_pending) {
++ queue->nr_ = 0;
++ queue->get_pending = 0;
return NULL;
++ }
if (!queue->compare)
-- return queue->array[--queue->nr].data; /* LIFO */
--
+ return queue->array[--queue->nr_].data; /* LIFO */
+
- result = queue->array[0].data;
-- if (!--queue->nr)
+- if (!--queue->nr_)
- return result;
-+ return queue->array[--queue->nr].data;
-+
-+ if (queue->get_pending) {
-+ if (!--queue->nr) {
-+ queue->get_pending = 0;
-+ return NULL;
-+ }
-+ queue->array[0] = queue->array[queue->nr];
-+ sift_down_root(queue);
-+ }
++ flush_get(queue);
-- queue->array[0] = queue->array[queue->nr];
+- queue->array[0] = queue->array[queue->nr_];
- sift_down_root(queue);
- return result;
+ queue->get_pending = 1;
@@ prio-queue.c: void clear_prio_queue(struct prio_queue *queue)
}
void *prio_queue_peek(struct prio_queue *queue)
-@@ prio-queue.c: void *prio_queue_peek(struct prio_queue *queue)
+ {
+- if (!queue->nr_)
++ if (queue->nr_ <= queue->get_pending) {
++ queue->nr_ = 0;
++ queue->get_pending = 0;
return NULL;
++ }
if (!queue->compare)
- return queue->array[queue->nr - 1].data;
+ return queue->array[queue->nr_ - 1].data;
- return queue->array[0].data;
-}
-void prio_queue_replace(struct prio_queue *queue, void *thing)
-{
-- if (!queue->nr) {
+- if (!queue->nr_) {
- prio_queue_put(queue, thing);
- } else if (!queue->compare) {
-- queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
-- queue->array[queue->nr - 1].data = thing;
+- queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
+- queue->array[queue->nr_ - 1].data = thing;
- } else {
- queue->array[0].ctr = queue->insertion_ctr++;
- queue->array[0].data = thing;
-+ if (queue->get_pending) {
-+ queue->get_pending = 0;
-+ if (!--queue->nr)
-+ return NULL;
-+ queue->array[0] = queue->array[queue->nr];
- sift_down_root(queue);
- }
+- sift_down_root(queue);
+- }
++ flush_get(queue);
+
+ return queue->array[0].data;
}
## prio-queue.h ##
@@ prio-queue.h: struct prio_queue {
+ prio_queue_compare_fn compare;
+ size_t insertion_ctr;
void *cb_data;
- size_t alloc, nr;
+- size_t alloc, nr_;
++ size_t alloc, nr_; /* use prio_queue_size() for logical count */
struct prio_queue_entry *array;
+ unsigned get_pending;
};
/*
-@@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
- */
- void *prio_queue_peek(struct prio_queue *);
+@@ prio-queue.h: void *prio_queue_peek(struct prio_queue *);
+
+ static inline size_t prio_queue_size(const struct prio_queue *queue)
+ {
+- return queue->nr_;
++ return queue->nr_ - queue->get_pending;
+ }
+
+ #define prio_queue_for_each(queue, it) \
+- for (size_t pq_ix_ = 0; \
++ for (size_t pq_ix_ = (queue)->get_pending; \
+ pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
+ pq_ix_++)
-/*
- * Replace the "thing" that compares the smallest with a new "thing",
@@ prio-queue.h: void *prio_queue_get(struct prio_queue *);
- * empty.
- */
-void prio_queue_replace(struct prio_queue *queue, void *thing);
-+static inline size_t prio_queue_size(struct prio_queue *queue)
-+{
-+ return queue->nr - queue->get_pending;
-+}
-
+-
void clear_prio_queue(struct prio_queue *);
-
- ## revision.c ##
-@@ revision.c: static int limit_list(struct rev_info *revs)
- struct commit_list *original_list = revs->commits;
- struct commit_list *newlist = NULL;
- struct commit_list **p = &newlist;
-- struct commit *interesting_cache = NULL;
-+ struct commit *commit, *interesting_cache = NULL;
- struct prio_queue queue = { .compare = compare_commits_by_commit_date };
-
- if (revs->ancestry_path_implicit_bottoms) {
-@@ revision.c: static int limit_list(struct rev_info *revs)
- prio_queue_put(&queue, commit);
- }
-
-- while (queue.nr) {
-- struct commit *commit = prio_queue_get(&queue);
-+ while ((commit = prio_queue_get(&queue))) {
- struct object *obj = &commit->object;
-
- if (commit == interesting_cache)
+ /* Reverse the LIFO elements */
## t/unit-tests/u-prio-queue.c ##
@@ t/unit-tests/u-prio-queue.c: static void test_prio_queue(int *input, size_t input_size,
@@ t/unit-tests/u-prio-queue.c: static void test_prio_queue(int *input, size_t inpu
break;
default:
prio_queue_put(&pq, &input[i]);
-
- ## walker.c ##
-@@ walker.c: static struct prio_queue complete = { compare_commits_by_commit_date };
- static int process_commit(struct walker *walker, struct commit *commit)
- {
- struct commit_list *parents;
-+ struct commit *item;
-
- if (repo_parse_commit(the_repository, commit))
- return -1;
-
-- while (complete.nr) {
-- struct commit *item = prio_queue_peek(&complete);
-+ while ((item = prio_queue_peek(&complete))) {
- if (item->date < commit->date)
- break;
- pop_most_recent_commit(&complete, COMPLETE);
--
gitgitgadget
^ permalink raw reply
* [PATCH v4 1/2] prio-queue: rename .nr to .nr_ and add accessor helpers
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
To: git
Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2140.v4.git.1780945851.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Rename the .nr member to .nr_ so that callers outside prio-queue.c
that directly reference .nr get a compilation error. This catches
both existing misuse and future in-flight topics.
Add prio_queue_size() for callers that need to know the element count
and prio_queue_for_each() for callers that need to walk all elements.
Convert all external .nr users:
- Loop conditions: use prio_queue_size(), prio_queue_get(), or
prio_queue_peek() as the loop condition
- Array iterations: use prio_queue_for_each()
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
builtin/describe.c | 11 ++++++++---
builtin/last-modified.c | 7 +++----
builtin/show-branch.c | 13 ++++++-------
commit-reach.c | 14 +++++++-------
fetch-pack.c | 4 ++--
negotiator/default.c | 4 +++-
negotiator/skipping.c | 12 +++++++-----
object-name.c | 2 +-
pack-bitmap-write.c | 10 +++++-----
path-walk.c | 8 ++++----
prio-queue.c | 38 +++++++++++++++++++-------------------
prio-queue.h | 12 +++++++++++-
revision.c | 16 +++++++---------
walker.c | 4 ++--
14 files changed, 85 insertions(+), 70 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..8e88bdeea6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -278,7 +278,7 @@ static void lazy_queue_put(struct lazy_queue *queue, void *thing)
static bool lazy_queue_empty(const struct lazy_queue *queue)
{
- return queue->queue.nr == (queue->get_pending ? 1 : 0);
+ return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
}
static void lazy_queue_clear(struct lazy_queue *queue)
@@ -292,9 +292,14 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
{
unsigned long seen_commits = 0;
struct oidset unflagged = OIDSET_INIT;
+ struct commit *commit;
+ int skip = queue->get_pending ? 1 : 0;
- for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
- struct commit *commit = queue->queue.array[i].data;
+ prio_queue_for_each(&queue->queue, commit) {
+ if (skip) {
+ skip = 0;
+ continue;
+ }
if (!(commit->object.flags & best->flag_within))
oidset_insert(&unflagged, &commit->object.oid);
}
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 8900ceece1..5478182f2e 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -344,6 +344,7 @@ static void process_parent(struct last_modified *lm,
static int last_modified_run(struct last_modified *lm)
{
int max_count, queue_popped = 0;
+ struct commit *c, *n;
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
struct commit_list *list;
@@ -389,10 +390,9 @@ static int last_modified_run(struct last_modified *lm)
}
}
- while (queue.nr) {
+ while ((c = prio_queue_get(&queue))) {
int parent_i;
struct commit_list *p;
- struct commit *c = prio_queue_get(&queue);
struct bitmap *active_c = active_paths_for(lm, c);
if ((0 <= max_count && max_count < ++queue_popped) ||
@@ -416,9 +416,8 @@ static int last_modified_run(struct last_modified *lm)
*/
repo_parse_commit(lm->rev.repo, c);
- while (not_queue.nr) {
+ while ((n = prio_queue_get(¬_queue))) {
struct commit_list *np;
- struct commit *n = prio_queue_get(¬_queue);
repo_parse_commit(lm->rev.repo, n);
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index f02831b085..8846f2376f 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -62,11 +62,10 @@ static const char *get_color_reset_code(void)
static struct commit *interesting(struct prio_queue *queue)
{
- for (size_t i = 0; i < queue->nr; i++) {
- struct commit *commit = queue->array[i].data;
- if (commit->object.flags & UNINTERESTING)
- continue;
- return commit;
+ struct commit *commit;
+ prio_queue_for_each(queue, commit) {
+ if (!(commit->object.flags & UNINTERESTING))
+ return commit;
}
return NULL;
}
@@ -228,11 +227,11 @@ static void join_revs(struct prio_queue *queue,
{
int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
+ struct commit *commit;
- while (queue->nr) {
+ while ((commit = prio_queue_peek(queue))) {
struct commit_list *parents;
int still_interesting = !!interesting(queue);
- struct commit *commit = prio_queue_peek(queue);
bool get_pending = true;
int flags = commit->object.flags & all_mask;
diff --git a/commit-reach.c b/commit-reach.c
index 9b3ea46d6f..a849de653e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -41,8 +41,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
static int queue_has_nonstale(struct prio_queue *queue)
{
- for (size_t i = 0; i < queue->nr; i++) {
- struct commit *commit = queue->array[i].data;
+ struct commit *commit;
+ prio_queue_for_each(queue, commit) {
if (!(commit->object.flags & STALE))
return 1;
}
@@ -1070,6 +1070,7 @@ void ahead_behind(struct repository *r,
struct ahead_behind_count *counts, size_t counts_nr)
{
struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
+ void *entry;
size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
if (!commits_nr || !counts_nr)
@@ -1135,8 +1136,8 @@ void ahead_behind(struct repository *r,
/* STALE is used here, PARENT2 is used by insert_no_dup(). */
repo_clear_commit_marks(r, PARENT2 | STALE);
- for (size_t i = 0; i < queue.nr; i++)
- free_bit_array(queue.array[i].data);
+ prio_queue_for_each(&queue, entry)
+ free_bit_array(entry);
clear_bit_arrays(&bit_arrays);
clear_prio_queue(&queue);
}
@@ -1269,7 +1270,7 @@ int get_branch_base_for_tip(struct repository *r,
size_t bases_nr)
{
int best_index = -1;
- struct commit *branch_point = NULL;
+ struct commit *c, *branch_point = NULL;
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
int found_missing_gen = 0;
@@ -1322,8 +1323,7 @@ int get_branch_base_for_tip(struct repository *r,
prio_queue_put(&queue, c);
}
- while (queue.nr) {
- struct commit *c = prio_queue_get(&queue);
+ while ((c = prio_queue_get(&queue))) {
int best_for_c = get_best(c);
int best_for_p, positive;
struct commit *parent;
diff --git a/fetch-pack.c b/fetch-pack.c
index 120e01f3cf..29c41132ee 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -662,8 +662,8 @@ static int mark_complete_oid(const struct reference *ref, void *cb_data UNUSED)
static void mark_recent_complete_commits(struct fetch_pack_args *args,
timestamp_t cutoff)
{
- while (complete.nr) {
- struct commit *item = prio_queue_peek(&complete);
+ struct commit *item;
+ while ((item = prio_queue_peek(&complete))) {
if (item->date < cutoff)
break;
print_verbose(args, _("Marking %s as complete"),
diff --git a/negotiator/default.c b/negotiator/default.c
index 78d58d57ce..19cdf3808c 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -113,10 +113,12 @@ static const struct object_id *get_rev(struct negotiation_state *ns)
unsigned int mark;
struct commit_list *parents;
- if (ns->rev_list.nr == 0 || ns->non_common_revs == 0)
+ if (ns->non_common_revs == 0)
return NULL;
commit = prio_queue_get(&ns->rev_list);
+ if (!commit)
+ return NULL;
repo_parse_commit(the_repository, commit);
parents = commit->parents;
diff --git a/negotiator/skipping.c b/negotiator/skipping.c
index 68c9b3b997..db90fa77b5 100644
--- a/negotiator/skipping.c
+++ b/negotiator/skipping.c
@@ -143,8 +143,7 @@ static int push_parent(struct data *data, struct entry *entry,
/*
* Find the existing entry and use it.
*/
- for (size_t i = 0; i < data->rev_list.nr; i++) {
- parent_entry = data->rev_list.array[i].data;
+ prio_queue_for_each(&data->rev_list, parent_entry) {
if (parent_entry->commit == to_push)
goto parent_found;
}
@@ -181,10 +180,12 @@ static const struct object_id *get_rev(struct data *data)
struct commit_list *p;
int parent_pushed = 0;
- if (data->rev_list.nr == 0 || data->non_common_revs == 0)
+ if (data->non_common_revs == 0)
return NULL;
entry = prio_queue_get(&data->rev_list);
+ if (!entry)
+ return NULL;
commit = entry->commit;
commit->object.flags |= POPPED;
if (!(commit->object.flags & COMMON))
@@ -253,8 +254,9 @@ static void have_sent(struct fetch_negotiator *n, struct commit *c)
static void release(struct fetch_negotiator *n)
{
struct data *data = n->data;
- for (size_t i = 0; i < data->rev_list.nr; i++)
- free(data->rev_list.array[i].data);
+ void *entry;
+ prio_queue_for_each(&data->rev_list, entry)
+ free(entry);
clear_prio_queue(&data->rev_list);
FREE_AND_NULL(data);
}
diff --git a/object-name.c b/object-name.c
index 9ac86f19c7..2fedfe1761 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1208,7 +1208,7 @@ static int get_oid_oneline(struct repository *r,
l->item->object.flags |= ONELINE_SEEN;
prio_queue_put(©, l->item);
}
- while (copy.nr) {
+ while (prio_queue_size(©)) {
const char *p, *buf;
struct commit *commit;
int matches;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 1c8070f99c..ed9714b135 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -513,6 +513,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
struct bitmap_index *old_bitmap,
const uint32_t *mapping)
{
+ struct commit *c;
+ struct tree *tree;
int found;
uint32_t pos;
if (!ent->bitmap)
@@ -520,9 +522,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
prio_queue_put(queue, commit);
- while (queue->nr) {
+ while ((c = prio_queue_get(queue))) {
struct commit_list *p;
- struct commit *c = prio_queue_get(queue);
if (old_bitmap && mapping) {
struct ewah_bitmap *old;
@@ -574,9 +575,8 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
}
}
- while (tree_queue->nr) {
- if (fill_bitmap_tree(writer, ent->bitmap,
- prio_queue_get(tree_queue)) < 0)
+ while ((tree = prio_queue_get(tree_queue))) {
+ if (fill_bitmap_tree(writer, ent->bitmap, tree) < 0)
return -1;
}
return 0;
diff --git a/path-walk.c b/path-walk.c
index 94ff90bd15..cf3b2d0765 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -699,6 +699,7 @@ int walk_objects_by_path(struct path_walk_info *info)
int ret;
size_t commits_nr = 0, paths_nr = 0;
struct commit *c;
+ char *path;
struct type_and_oid_list *root_tree_list;
struct type_and_oid_list *commit_list;
struct path_walk_context ctx = {
@@ -808,8 +809,7 @@ int walk_objects_by_path(struct path_walk_info *info)
free(commit_list);
trace2_region_enter("path-walk", "path-walk", info->revs->repo);
- while (!ret && ctx.path_stack.nr) {
- char *path = prio_queue_get(&ctx.path_stack);
+ while (!ret && (path = prio_queue_get(&ctx.path_stack))) {
paths_nr++;
ret = walk_path(&ctx, path);
@@ -821,12 +821,12 @@ int walk_objects_by_path(struct path_walk_info *info)
if (!strmap_empty(&ctx.paths_to_lists)) {
struct hashmap_iter iter;
struct strmap_entry *entry;
+ char *path;
strmap_for_each_entry(&ctx.paths_to_lists, &iter, entry)
push_to_stack(&ctx, entry->key);
- while (!ret && ctx.path_stack.nr) {
- char *path = prio_queue_get(&ctx.path_stack);
+ while (!ret && (path = prio_queue_get(&ctx.path_stack))) {
paths_nr++;
ret = walk_path(&ctx, path);
diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..ead4faf4bb 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -22,16 +22,16 @@ void prio_queue_reverse(struct prio_queue *queue)
if (queue->compare)
BUG("prio_queue_reverse() on non-LIFO queue");
- if (!queue->nr)
+ if (!queue->nr_)
return;
- for (i = 0; i < (j = (queue->nr - 1) - i); i++)
+ for (i = 0; i < (j = (queue->nr_ - 1) - i); i++)
swap(queue, i, j);
}
void clear_prio_queue(struct prio_queue *queue)
{
FREE_AND_NULL(queue->array);
- queue->nr = 0;
+ queue->nr_ = 0;
queue->alloc = 0;
queue->insertion_ctr = 0;
}
@@ -41,15 +41,15 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
size_t ix, parent;
/* Append at the end */
- ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
- queue->array[queue->nr].ctr = queue->insertion_ctr++;
- queue->array[queue->nr].data = thing;
- queue->nr++;
+ ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
+ queue->array[queue->nr_].ctr = queue->insertion_ctr++;
+ queue->array[queue->nr_].data = thing;
+ queue->nr_++;
if (!queue->compare)
return; /* LIFO */
/* Bubble up the new one */
- for (ix = queue->nr - 1; ix; ix = parent) {
+ for (ix = queue->nr_ - 1; ix; ix = parent) {
parent = (ix - 1) / 2;
if (compare(queue, parent, ix) <= 0)
break;
@@ -63,9 +63,9 @@ static void sift_down_root(struct prio_queue *queue)
size_t ix, child;
/* Push down the one at the root */
- for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
+ for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
child = ix * 2 + 1; /* left */
- if (child + 1 < queue->nr &&
+ if (child + 1 < queue->nr_ &&
compare(queue, child, child + 1) >= 0)
child++; /* use right child */
@@ -80,36 +80,36 @@ void *prio_queue_get(struct prio_queue *queue)
{
void *result;
- if (!queue->nr)
+ if (!queue->nr_)
return NULL;
if (!queue->compare)
- return queue->array[--queue->nr].data; /* LIFO */
+ return queue->array[--queue->nr_].data; /* LIFO */
result = queue->array[0].data;
- if (!--queue->nr)
+ if (!--queue->nr_)
return result;
- queue->array[0] = queue->array[queue->nr];
+ queue->array[0] = queue->array[queue->nr_];
sift_down_root(queue);
return result;
}
void *prio_queue_peek(struct prio_queue *queue)
{
- if (!queue->nr)
+ if (!queue->nr_)
return NULL;
if (!queue->compare)
- return queue->array[queue->nr - 1].data;
+ return queue->array[queue->nr_ - 1].data;
return queue->array[0].data;
}
void prio_queue_replace(struct prio_queue *queue, void *thing)
{
- if (!queue->nr) {
+ if (!queue->nr_) {
prio_queue_put(queue, thing);
} else if (!queue->compare) {
- queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
- queue->array[queue->nr - 1].data = thing;
+ queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
+ queue->array[queue->nr_ - 1].data = thing;
} else {
queue->array[0].ctr = queue->insertion_ctr++;
queue->array[0].data = thing;
diff --git a/prio-queue.h b/prio-queue.h
index da7fad2f1f..7f2aa986b1 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -30,7 +30,7 @@ struct prio_queue {
prio_queue_compare_fn compare;
size_t insertion_ctr;
void *cb_data;
- size_t alloc, nr;
+ size_t alloc, nr_;
struct prio_queue_entry *array;
};
@@ -52,6 +52,16 @@ void *prio_queue_get(struct prio_queue *);
*/
void *prio_queue_peek(struct prio_queue *);
+static inline size_t prio_queue_size(const struct prio_queue *queue)
+{
+ return queue->nr_;
+}
+
+#define prio_queue_for_each(queue, it) \
+ for (size_t pq_ix_ = 0; \
+ pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
+ pq_ix_++)
+
/*
* Replace the "thing" that compares the smallest with a new "thing",
* like prio_queue_get()+prio_queue_put() would do, but in a more
diff --git a/revision.c b/revision.c
index 5693618be4..34e2d146f4 100644
--- a/revision.c
+++ b/revision.c
@@ -476,16 +476,15 @@ static struct commit *handle_commit(struct rev_info *revs,
static int everybody_uninteresting(struct prio_queue *orig,
struct commit **interesting_cache)
{
- size_t i;
+ struct commit *commit;
if (*interesting_cache) {
- struct commit *commit = *interesting_cache;
+ commit = *interesting_cache;
if (!(commit->object.flags & UNINTERESTING))
return 0;
}
- for (i = 0; i < orig->nr; i++) {
- struct commit *commit = orig->array[i].data;
+ prio_queue_for_each(orig, commit) {
if (commit->object.flags & UNINTERESTING)
continue;
@@ -1446,7 +1445,7 @@ static int limit_list(struct rev_info *revs)
struct commit_list *original_list = revs->commits;
struct commit_list *newlist = NULL;
struct commit_list **p = &newlist;
- struct commit *interesting_cache = NULL;
+ struct commit *commit, *interesting_cache = NULL;
struct prio_queue queue = { .compare = compare_commits_by_commit_date };
if (revs->ancestry_path_implicit_bottoms) {
@@ -1461,8 +1460,7 @@ static int limit_list(struct rev_info *revs)
prio_queue_put(&queue, commit);
}
- while (queue.nr) {
- struct commit *commit = prio_queue_get(&queue);
+ while ((commit = prio_queue_get(&queue))) {
struct object *obj = &commit->object;
if (commit == interesting_cache)
@@ -4028,8 +4026,8 @@ static enum rewrite_result rewrite_one_1(struct rev_info *revs,
static void merge_queue_into_list(struct prio_queue *q, struct commit_list **list)
{
- while (q->nr) {
- struct commit *item = prio_queue_peek(q);
+ struct commit *item;
+ while ((item = prio_queue_peek(q))) {
struct commit_list *p = *list;
if (p && p->item->date >= item->date)
diff --git a/walker.c b/walker.c
index e98eb6da53..e3de77f092 100644
--- a/walker.c
+++ b/walker.c
@@ -84,12 +84,12 @@ static struct prio_queue complete = { compare_commits_by_commit_date };
static int process_commit(struct walker *walker, struct commit *commit)
{
struct commit_list *parents;
+ struct commit *item;
if (repo_parse_commit(the_repository, commit))
return -1;
- while (complete.nr) {
- struct commit *item = prio_queue_peek(&complete);
+ while ((item = prio_queue_peek(&complete))) {
if (item->date < commit->date)
break;
pop_most_recent_commit(&complete, COMPLETE);
--
gitgitgadget
^ permalink raw reply related
* [PATCH v4 2/2] prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
From: Kristofer Karlsson via GitGitGadget @ 2026-06-08 19:10 UTC (permalink / raw)
To: git
Cc: René Scharfe, Kristofer Karlsson, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2140.v4.git.1780945851.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Defer the actual removal in prio_queue_get() until the next
operation. If that next operation is a prio_queue_put(), the
removal and insertion are fused into a single replace — writing
the new element at the root and sifting it down — which avoids
a full remove-rebalance-insert cycle.
This matches the dominant usage pattern in git's commit traversal:
get a commit, then put its parents. The first parent insertion
after each get is now a replace operation automatically.
This generalizes the lazy_queue pattern from builtin/describe.c
(introduced in 08bb69d70f) into prio_queue itself. Three callers
independently implemented the same get+put fusion:
- builtin/describe.c had a full lazy_queue wrapper
- commit.c:pop_most_recent_commit() used peek+replace
- builtin/show-branch.c:join_revs() used peek+replace
All three now collapse to plain _get() and _put(), with the data
structure handling the fusion internally. This simplifies callers
and means every prio_queue user gets the optimization for free
without needing to implement it manually.
Remove prio_queue_replace() since no external callers remain.
Benchmarked on a 1.8M-commit monorepo (30 interleaved runs,
paired t-test, Xeon @ 2.20GHz):
Code paths that previously did eager get+put (new optimization):
Command base patched change p
merge-base --all A A~1000 3828ms 3725ms -2.69% 0.0001
rev-list --count A~1000..A 3055ms 2986ms -2.27% 0.0601
log --oneline A~1000..A 3408ms 3350ms -1.71% 0.0482
Code paths that already had manual get+put fusion (expect
neutral — the optimization moves into prio_queue but the number
of heap operations stays the same):
Command base patched change p
show-branch A A~1000 9156ms 9127ms -0.32% 0.3470
describe (4751 revs, 81K repo) 1983ms 1963ms -1.02% <0.001
No regressions in any scenario.
Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
builtin/describe.c | 75 +++++++-----------------------
builtin/show-branch.c | 11 ++---
commit.c | 11 +----
prio-queue.c | 92 ++++++++++++++++++++-----------------
prio-queue.h | 15 ++----
t/unit-tests/u-prio-queue.c | 6 +--
6 files changed, 78 insertions(+), 132 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index 8e88bdeea6..64424543ef 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -251,61 +251,19 @@ static int compare_pt(const void *a_, const void *b_)
return 0;
}
-struct lazy_queue {
- struct prio_queue queue;
- bool get_pending;
-};
-
-#define LAZY_QUEUE_INIT { { compare_commits_by_commit_date }, false }
-
-static void *lazy_queue_get(struct lazy_queue *queue)
-{
- if (queue->get_pending)
- prio_queue_get(&queue->queue);
- else
- queue->get_pending = true;
- return prio_queue_peek(&queue->queue);
-}
-
-static void lazy_queue_put(struct lazy_queue *queue, void *thing)
-{
- if (queue->get_pending)
- prio_queue_replace(&queue->queue, thing);
- else
- prio_queue_put(&queue->queue, thing);
- queue->get_pending = false;
-}
-
-static bool lazy_queue_empty(const struct lazy_queue *queue)
-{
- return prio_queue_size(&queue->queue) == (queue->get_pending ? 1 : 0);
-}
-
-static void lazy_queue_clear(struct lazy_queue *queue)
-{
- clear_prio_queue(&queue->queue);
- queue->get_pending = false;
-}
-
-static unsigned long finish_depth_computation(struct lazy_queue *queue,
+static unsigned long finish_depth_computation(struct prio_queue *queue,
struct possible_tag *best)
{
unsigned long seen_commits = 0;
struct oidset unflagged = OIDSET_INIT;
- struct commit *commit;
- int skip = queue->get_pending ? 1 : 0;
+ struct commit *c;
- prio_queue_for_each(&queue->queue, commit) {
- if (skip) {
- skip = 0;
- continue;
- }
- if (!(commit->object.flags & best->flag_within))
- oidset_insert(&unflagged, &commit->object.oid);
+ prio_queue_for_each(queue, c) {
+ if (!(c->object.flags & best->flag_within))
+ oidset_insert(&unflagged, &c->object.oid);
}
- while (!lazy_queue_empty(queue)) {
- struct commit *c = lazy_queue_get(queue);
+ while ((c = prio_queue_get(queue))) {
struct commit_list *parents = c->parents;
seen_commits++;
if (c->object.flags & best->flag_within) {
@@ -321,7 +279,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
repo_parse_commit(the_repository, p);
seen = p->object.flags & SEEN;
if (!seen)
- lazy_queue_put(queue, p);
+ prio_queue_put(queue, p);
flag_before = p->object.flags & best->flag_within;
p->object.flags |= c->object.flags;
flag_after = p->object.flags & best->flag_within;
@@ -369,8 +327,8 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
static void describe_commit(struct commit *cmit, struct strbuf *dst)
{
- struct commit *gave_up_on = NULL;
- struct lazy_queue queue = LAZY_QUEUE_INIT;
+ struct commit *c, *gave_up_on = NULL;
+ struct prio_queue queue = { compare_commits_by_commit_date };
struct commit_name *n;
struct possible_tag all_matches[MAX_TAGS];
unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
@@ -412,9 +370,8 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
}
cmit->object.flags = SEEN;
- lazy_queue_put(&queue, cmit);
- while (!lazy_queue_empty(&queue)) {
- struct commit *c = lazy_queue_get(&queue);
+ prio_queue_put(&queue, cmit);
+ while ((c = prio_queue_get(&queue))) {
struct commit_list *parents = c->parents;
struct commit_name **slot;
@@ -448,7 +405,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
t->depth++;
}
/* Stop if last remaining path already covered by best candidate(s) */
- if (annotated_cnt && lazy_queue_empty(&queue)) {
+ if (annotated_cnt && !prio_queue_size(&queue)) {
int best_depth = INT_MAX;
unsigned best_within = 0;
for (cur_match = 0; cur_match < match_cnt; cur_match++) {
@@ -471,7 +428,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
struct commit *p = parents->item;
repo_parse_commit(the_repository, p);
if (!(p->object.flags & SEEN))
- lazy_queue_put(&queue, p);
+ prio_queue_put(&queue, p);
p->object.flags |= c->object.flags;
parents = parents->next;
@@ -486,7 +443,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
strbuf_add_unique_abbrev(dst, cmit_oid, abbrev);
if (suffix)
strbuf_addstr(dst, suffix);
- lazy_queue_clear(&queue);
+ clear_prio_queue(&queue);
return;
}
if (unannotated_cnt)
@@ -502,11 +459,11 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
QSORT(all_matches, match_cnt, compare_pt);
if (gave_up_on) {
- lazy_queue_put(&queue, gave_up_on);
+ prio_queue_put(&queue, gave_up_on);
seen_commits--;
}
seen_commits += finish_depth_computation(&queue, &all_matches[0]);
- lazy_queue_clear(&queue);
+ clear_prio_queue(&queue);
if (debug) {
static int label_width = -1;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 8846f2376f..2435e8aeda 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -232,12 +232,13 @@ static void join_revs(struct prio_queue *queue,
while ((commit = prio_queue_peek(queue))) {
struct commit_list *parents;
int still_interesting = !!interesting(queue);
- bool get_pending = true;
int flags = commit->object.flags & all_mask;
if (!still_interesting && extra <= 0)
break;
+ prio_queue_get(queue);
+
mark_seen(commit, seen_p);
if ((flags & all_revs) == all_revs)
flags |= UNINTERESTING;
@@ -253,14 +254,8 @@ static void join_revs(struct prio_queue *queue,
if (mark_seen(p, seen_p) && !still_interesting)
extra--;
p->object.flags |= flags;
- if (get_pending)
- prio_queue_replace(queue, p);
- else
- prio_queue_put(queue, p);
- get_pending = false;
+ prio_queue_put(queue, p);
}
- if (get_pending)
- prio_queue_get(queue);
}
/*
diff --git a/commit.c b/commit.c
index fd8723502e..976bfc4618 100644
--- a/commit.c
+++ b/commit.c
@@ -795,24 +795,17 @@ void commit_list_sort_by_date(struct commit_list **list)
struct commit *pop_most_recent_commit(struct prio_queue *queue,
unsigned int mark)
{
- struct commit *ret = prio_queue_peek(queue);
- int get_pending = 1;
+ struct commit *ret = prio_queue_get(queue);
struct commit_list *parents = ret->parents;
while (parents) {
struct commit *commit = parents->item;
if (!repo_parse_commit(the_repository, commit) && !(commit->object.flags & mark)) {
commit->object.flags |= mark;
- if (get_pending)
- prio_queue_replace(queue, commit);
- else
- prio_queue_put(queue, commit);
- get_pending = 0;
+ prio_queue_put(queue, commit);
}
parents = parents->next;
}
- if (get_pending)
- prio_queue_get(queue);
return ret;
}
diff --git a/prio-queue.c b/prio-queue.c
index ead4faf4bb..199775d5af 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -34,12 +34,48 @@ void clear_prio_queue(struct prio_queue *queue)
queue->nr_ = 0;
queue->alloc = 0;
queue->insertion_ctr = 0;
+ queue->get_pending = 0;
+}
+
+static void sift_down_root(struct prio_queue *queue)
+{
+ size_t ix, child;
+
+ /* Push down the one at the root */
+ for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
+ child = ix * 2 + 1; /* left */
+ if (child + 1 < queue->nr_ &&
+ compare(queue, child, child + 1) >= 0)
+ child++; /* use right child */
+
+ if (compare(queue, ix, child) <= 0)
+ break;
+
+ swap(queue, child, ix);
+ }
+}
+
+static inline void flush_get(struct prio_queue *queue)
+{
+ if (!queue->get_pending)
+ return;
+ queue->get_pending = 0;
+ queue->array[0] = queue->array[--queue->nr_];
+ sift_down_root(queue);
}
void prio_queue_put(struct prio_queue *queue, void *thing)
{
size_t ix, parent;
+ if (queue->get_pending) {
+ queue->get_pending = 0;
+ queue->array[0].ctr = queue->insertion_ctr++;
+ queue->array[0].data = thing;
+ sift_down_root(queue);
+ return;
+ }
+
/* Append at the end */
ALLOC_GROW(queue->array, queue->nr_ + 1, queue->alloc);
queue->array[queue->nr_].ctr = queue->insertion_ctr++;
@@ -58,61 +94,33 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
}
}
-static void sift_down_root(struct prio_queue *queue)
-{
- size_t ix, child;
-
- /* Push down the one at the root */
- for (ix = 0; ix * 2 + 1 < queue->nr_; ix = child) {
- child = ix * 2 + 1; /* left */
- if (child + 1 < queue->nr_ &&
- compare(queue, child, child + 1) >= 0)
- child++; /* use right child */
-
- if (compare(queue, ix, child) <= 0)
- break;
-
- swap(queue, child, ix);
- }
-}
-
void *prio_queue_get(struct prio_queue *queue)
{
- void *result;
-
- if (!queue->nr_)
+ if (queue->nr_ <= queue->get_pending) {
+ queue->nr_ = 0;
+ queue->get_pending = 0;
return NULL;
+ }
if (!queue->compare)
return queue->array[--queue->nr_].data; /* LIFO */
- result = queue->array[0].data;
- if (!--queue->nr_)
- return result;
+ flush_get(queue);
- queue->array[0] = queue->array[queue->nr_];
- sift_down_root(queue);
- return result;
+ queue->get_pending = 1;
+ return queue->array[0].data;
}
void *prio_queue_peek(struct prio_queue *queue)
{
- if (!queue->nr_)
+ if (queue->nr_ <= queue->get_pending) {
+ queue->nr_ = 0;
+ queue->get_pending = 0;
return NULL;
+ }
if (!queue->compare)
return queue->array[queue->nr_ - 1].data;
- return queue->array[0].data;
-}
-void prio_queue_replace(struct prio_queue *queue, void *thing)
-{
- if (!queue->nr_) {
- prio_queue_put(queue, thing);
- } else if (!queue->compare) {
- queue->array[queue->nr_ - 1].ctr = queue->insertion_ctr++;
- queue->array[queue->nr_ - 1].data = thing;
- } else {
- queue->array[0].ctr = queue->insertion_ctr++;
- queue->array[0].data = thing;
- sift_down_root(queue);
- }
+ flush_get(queue);
+
+ return queue->array[0].data;
}
diff --git a/prio-queue.h b/prio-queue.h
index 7f2aa986b1..570b48e648 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -30,8 +30,9 @@ struct prio_queue {
prio_queue_compare_fn compare;
size_t insertion_ctr;
void *cb_data;
- size_t alloc, nr_;
+ size_t alloc, nr_; /* use prio_queue_size() for logical count */
struct prio_queue_entry *array;
+ unsigned get_pending;
};
/*
@@ -54,22 +55,14 @@ void *prio_queue_peek(struct prio_queue *);
static inline size_t prio_queue_size(const struct prio_queue *queue)
{
- return queue->nr_;
+ return queue->nr_ - queue->get_pending;
}
#define prio_queue_for_each(queue, it) \
- for (size_t pq_ix_ = 0; \
+ for (size_t pq_ix_ = (queue)->get_pending; \
pq_ix_ < (queue)->nr_ && ((it) = (queue)->array[pq_ix_].data, 1); \
pq_ix_++)
-/*
- * Replace the "thing" that compares the smallest with a new "thing",
- * like prio_queue_get()+prio_queue_put() would do, but in a more
- * efficient way. Does the same as prio_queue_put() if the queue is
- * empty.
- */
-void prio_queue_replace(struct prio_queue *queue, void *thing);
-
void clear_prio_queue(struct prio_queue *);
/* Reverse the LIFO elements */
diff --git a/t/unit-tests/u-prio-queue.c b/t/unit-tests/u-prio-queue.c
index 63e58114ae..af3e0b8598 100644
--- a/t/unit-tests/u-prio-queue.c
+++ b/t/unit-tests/u-prio-queue.c
@@ -53,13 +53,13 @@ static void test_prio_queue(int *input, size_t input_size,
prio_queue_reverse(&pq);
break;
case REPLACE:
- peek = prio_queue_peek(&pq);
+ get = prio_queue_get(&pq);
cl_assert(i + 1 < input_size);
cl_assert(input[i + 1] >= 0);
cl_assert(j < result_size);
- cl_assert_equal_i(result[j], show(peek));
+ cl_assert_equal_i(result[j], show(get));
j++;
- prio_queue_replace(&pq, &input[++i]);
+ prio_queue_put(&pq, &input[++i]);
break;
default:
prio_queue_put(&pq, &input[i]);
--
gitgitgadget
^ permalink raw reply related
* Re: [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-08 19:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, René Scharfe, Patrick Steinhardt
In-Reply-To: <xmqqa4t5yyee.fsf@gitster.g>
On Mon, Jun 8, 2026 at 6:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> > show_files() checks whether each index entry is deleted or modified
> > before show_ce() applies the pathspec. prune_index() avoids most of this
> > work for pathspecs with a common directory prefix, but a top-level name
> > or leading wildcard leaves every entry to be checked.
> >
> > Match the pathspec before lstat() for the deleted and modified modes.
> > Keep the later match in show_ce() so --error-unmatch is satisfied only
> > by entries that are actually shown.
>
> Adding an extra early `match_pathspec()` check before making slow
> system calls like `lstat()` makes sense, especially when most of the
> index entries need to be skipped. But if most of them would match,
> then we would end up doing the same match_pathspec() calls twice for
> each path, and run lstat() anyway, so you may also be able to
> construct a perf test that demonstrates a case where this approach
> is not a clear win (or even degradation), perhaps?
Yes. I added an all-matching pathspec case to p3010 and ran:
hyperfine --warmup 0 --runs 3 \
'git -c core.fsmonitor=false ls-files --deleted -- "*"'
On a checkout with 859,940 index entries, I ran the parent and patched
binaries in both orders:
parent this commit
parent first elapsed 56.807 s 64.618 s
user 1.256 s 1.270 s
system 10.633 s 11.068 s
patched first elapsed 63.361 s 64.316 s
user 1.238 s 1.280 s
system 10.296 s 11.864 s
The added match costs 14-42 ms of user time in this case. Elapsed time
varies by several seconds with command order, obscuring that CPU cost.
The later match in show_ce() is reached only for entries actually found
deleted or modified. This case therefore exercises the extra match for
every index entry while still performing every lstat().
>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index e1a22b41b9..702c607183 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -450,6 +450,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> > continue;
> > if (ce_skip_worktree(ce))
> > continue;
> > + /* Only entries shown by show_ce() satisfy --error-unmatch. */
> > + if (pathspec.nr &&
> > + !match_pathspec(repo->index, &pathspec, fullname.buf,
> > + fullname.len, max_prefix_len, NULL,
> > + S_ISDIR(ce->ce_mode) ||
> > + S_ISGITLINK(ce->ce_mode)))
> > + continue;
> > stat_err = lstat(fullname.buf, &st);
> > if (stat_err && (errno != ENOENT && errno != ENOTDIR))
> > error_errno("cannot lstat '%s'", fullname.buf);
>
> Hmph. In the current code, because there is no such pre-filtering,
> show_ce() would unconditionally recurse into active submodules when
> told to with the "--recurse-submodules" flag, even if your pathspec
> coes not match the submodule. With this change, such a submodule
> whose path does not match the pathspec would not even be seen by
> show_ce(). Would it cause a change in behaviour?
This path cannot affect --recurse-submodules. cmd_ls_files() rejects
--recurse-submodules together with either --deleted or --modified before
calling show_files(), and the new check is reached only for those two
modes. Cached and stage output continue to call show_ce() before the new
check. t3007 already verifies that both combinations are rejected.
Given the 60.742 s to 1.061 s improvement for a selective pathspec, I
think this small CPU cost for an all-matching pathspec is a worthwhile
tradeoff. What do you think?
Thanks for the review!
Tamir
^ permalink raw reply
* Re: [PATCH 3/3] replay: offer an option to linearize the commit topology
From: Junio C Hamano @ 2026-06-08 19:29 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Johannes Schindelin
In-Reply-To: <20260608-toon-git-replay-drop-merges-v1-3-e3ee71fce7b4@iotcl.com>
Toon Claes <toon@iotcl.com> writes:
> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>
> One of the stated goals of git-replay(1) is to allow implementing the
> git-rebase(1) functionality on the server side.
>
> The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
> was given. This mode drops merge commits instead of replaying them, and
> linearized the commit history into a sequence of the
> regular (single-parent) commits.
"linearized" -> "linearizes"?
>
> Add option `--linearize` to git-replay(1) do the same.
"do the same" -> "to do the same"?
> Co-authored-by: Toon Claes <toon@iotcl.com>
There is no sign-off by any of the authors?
> @@ -430,12 +435,20 @@ int replay_revisions(struct rev_info *revs,
> while ((commit = get_revision(revs))) {
> const struct name_decoration *decoration;
>
> - if (commit->parents && commit->parents->next)
> + if (opts->linearize && (!commit->parents || commit->parents->next))
> + ; /* map current commit to the same as the previous commit */
This uses the same treatment on either root commits or merge
commits? If this were a mistake and this wants to handle merges but
not roots, shouldn't it be more like
if (opts->linearize && (commit->parents && commit->parents->next))
; /* map the merge to the previous */
> + else if (commit->parents && commit->parents->next)
> die(_("replaying merge commits is not supported yet!"));
And because the next one is also about merges, perhaps the early
part of this if/else if cascade can be written
if (commit->parents && commit->parents->next) {
/* We have a merge */
if (!opts->linearize)
die(_("can't replay a merge (yet)"));
; /* map current to the previous */
} else {
...
wouldn't it?
If the "map current to prev" is applicable to root, any root are
mapped to the last_commit in the above, and if we saw a root as the
first thing in the loop, last_commit is NULL, we do not do anything
here, and after the if/else if/else cascade, we see last_commit is
NULL and break out of the loop.
> + else {
> + struct commit *to_pick = reverse ? last_commit : onto;
> + last_commit =
> + pick_regular_commit(revs->repo, commit,
> + replayed_commits, to_pick,
> + &merge_opt, &result,
> + opts->linearize ? last_commit : NULL,
> + reverse, opts->empty);
> + }
>
> - last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
> - reverse ? last_commit : onto,
> - &merge_opt, &result, reverse, opts->empty);
> if (!last_commit)
> break;
> diff --git a/replay.h b/replay.h
> index 1851a07705..07e6fdcca3 100644
> --- a/replay.h
> +++ b/replay.h
> @@ -62,6 +62,11 @@ struct replay_revisions_options {
> * Defaults to REPLAY_EMPTY_COMMIT_DROP.
> */
> enum replay_empty_commit_action empty;
> +
> + /*
> + * Whether to linearize the commits (i.e. drop merge commits).
> + */
> + int linearize;
> };
>
> /* This struct is used as an out-parameter by `replay_revisions()`. */
> diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
> index 3353bc4a4d..c781a3bb1b 100755
> --- a/t/t3650-replay-basics.sh
> +++ b/t/t3650-replay-basics.sh
> @@ -565,4 +565,26 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
> test_grep "cannot be used with multiple revision ranges" err
> '
>
> +test_expect_success 'linearize the commit topology' '
> + test_tick &&
> + N=$(git commit-tree -m N -p L -p I L:) &&
> + N=$(git commit-tree -m N-child -p $N L:) &&
> + git update-ref refs/heads/N $N &&
> +
> + git replay --ref-action=print --linearize \
> + --onto A B..refs/heads/N >out &&
> +
> + test_line_count = 1 out &&
> + read N1 N2 N3 N4 <out &&
> +
> + cat >expect <<-EOF &&
> + * N-child
> + * I
> + * L
> + o A
> + EOF
> + git log --format=%s --graph --boundary A...$N3 >actual &&
> + test_cmp expect actual
> +'
Perhaps we would want to have a test that replays all the way down
to the root commit?
^ permalink raw reply
* Re: [PATCH 2/3] config: add GIT_CONFIG_INCLUDES
From: Derrick Stolee @ 2026-06-08 19:38 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
In-Reply-To: <aibTAOrcSvTOtv78@pks.im>
On 6/8/2026 10:34 AM, Patrick Steinhardt wrote:
> On Mon, Jun 08, 2026 at 01:57:05PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The config keys 'include.path' and 'includeIf.*' allow users to specify
>> config stored in a location outside of the typical list of config files
>> (system, global, local, etc.). For example, users who accept the risk
>> can specify helpful aliases via a file checked into the repo by pointing
>> 'include.path' to the position of that file in the working directory.
>> This is dangerous, but people do it.
>
> Huh, I never even considered this use case. But of course, this is
> possible, even though it's quite scary.
>
>> What becomes tricky is that this modifies all Git behavior, including
>> operations that are intended to be limited in activity or sandboxed in
>> some way. These include directives can provide surprising changes to
>> behavior, especially when expecting a specific list of allowed file
>> accesses. This could lead to failed builds, for instance.
>>
>> To allow for these user-desired features when they are running commands,
>> add a new GIT_CONFIG_INCLUDES environment variable that disables these
>> redirections of config when set to zero. This variable can be set by
>> automation, such as build tooling, to avoid these strange behaviors.
>> This could be considered a recommended option for tools executing Git
>> commands, the same as GIT_ADVICE=0.
>
> I don't know about this part though. I could see use cases where the
> tools _should_ read the project-relative configuration. It might also be
> the case that the user may want to evaluate some includes, but not all
> of them.
True. I'm not confident that we should recommend this environment in all
cases.
> That raises the question whether we can introduce the configuration in a
> way that it allows a bit more flexibility than just "yes"/"no", like for
> example an allow-list of locations that should be evaluated. But maybe
> I'm overthinking this.
I see. So we can say "avoid including into the repository worktree" but
that will probably be incomplete.
There is room for nuance in future expansions, if we can find a creative
way to handle that nuance. For now, I think I would still want an ability
to turn the entire feature off, at least for certain tools that care.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Karthik Nayak @ 2026-06-08 21:18 UTC (permalink / raw)
To: Tamir Duberstein, git
Cc: Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
Elijah Newren
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 17907 bytes --]
Tamir Duberstein <tamird@gmail.com> writes:
> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
>
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
>
Nicely explained. We should've merged this long ago, so this is a
worthwhile change.
> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.
>
> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.
>
> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.
>
> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
>
> On a checkout with 62,174 remote-tracking refs, I ran:
>
> hyperfine --warmup 0 --runs 3 \
> --command-name parent \
> '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> --command-name this-commit \
> '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
>
> The results were:
>
> parent this commit
> elapsed 104.365 s 467.7 ms
> user 93.702 s 220.2 ms
> system 0.723 s 182.7 ms
>
> The wall-time standard deviations were 11.356 seconds and 133.8
> milliseconds, respectively, for a 223x speedup. Both commands produced
> output with SHA-256
> 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
>
> Both revisions were rebuilt with the default -O2 flags using Apple clang
> 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> GB RAM.
>
> Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
> builtin/tag.c | 1 -
> commit-reach.c | 45 +++++++++++++++++++++++++++++++-----------
> commit-reach.h | 15 ++++++++++----
> ref-filter.c | 6 ++++--
> ref-filter.h | 7 +++----
> t/helper/test-reach.c | 10 ++--------
> t/perf/p1500-graph-walks.sh | 24 +++++++++++++++++++++-
> t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
> t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
> t/t6600-test-reach.sh | 6 ++----
> 10 files changed, 117 insertions(+), 36 deletions(-)
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index d51c2e3349..9f34d948d4 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>
> if (verify_ref_format(format))
> die(_("unable to parse format string"));
> - filter->with_commit_tag_algo = 1;
We were selectively using the algo for `git tag`, like mentioned I guess
we'll entirely remove `with_commit_tag_algo` below somewhere
> filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
>
> free(to_free);
> diff --git a/commit-reach.c b/commit-reach.c
> index 9b3ea46d6f..6e599a3670 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -6,7 +6,6 @@
> #include "decorate.h"
> #include "hex.h"
> #include "prio-queue.h"
> -#include "ref-filter.h"
> #include "revision.h"
> #include "tag.h"
> #include "commit-reach.h"
> @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
>
> /*
> * Test whether the candidate is contained in the list.
> - * Do not recurse to find out, though, but return -1 if inconclusive.
> + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> + * inconclusive.
Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
at 0, so this makes sense.
> */
> static enum contains_result contains_test(struct commit *candidate,
> const struct commit_list *want,
> @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
> contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
> }
>
> -static enum contains_result contains_tag_algo(struct commit *candidate,
> - const struct commit_list *want,
> - struct contains_cache *cache)
> +static enum contains_result contains_algo(struct commit *candidate,
> + struct commit_list *want,
> + struct contains_cache *cache)
> {
> struct contains_stack contains_stack = { 0, 0, NULL };
> enum contains_result result;
> @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> if (result != CONTAINS_UNKNOWN)
> return result;
>
> + *contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
> push_to_contains_stack(candidate, &contains_stack);
> while (contains_stack.nr) {
> struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> contains_stack.nr--;
> }
> /*
> - * If we just popped the stack, parents->item has been marked,
> - * therefore contains_test will return a meaningful yes/no.
> + * A parent may have just been popped and marked, or may still
> + * be active when replacement refs create a cycle.
> */
> else switch (contains_test(parents->item, want, cache, cutoff)) {
> case CONTAINS_YES:
> @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> case CONTAINS_NO:
> entry->parents = parents->next;
> break;
> + case CONTAINS_IN_PROGRESS:
> + /*
> + * Partial negative answers are not safe across a cycle.
> + * Discard them and use the cycle-safe reachability walk.
> + */
> + goto cycle;
>
If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
showcase a commit for which we still don't have a result. Since any
commit with UNKNOWN will start a recursive search through its parents.
So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
this would indicate that we hit a cyclic graph and so go to the fallback
of using repo_is_descendant_of(). So this avoids an infinite recursion
in such instances. Makes sense.
> case CONTAINS_UNKNOWN:
> + *contains_cache_at(cache, parents->item) =
> + CONTAINS_IN_PROGRESS;
> push_to_contains_stack(parents->item, &contains_stack);
> break;
> }
> }
> free(contains_stack.contains_stack);
> return contains_test(candidate, want, cache, cutoff);
> +
> +cycle:
> + free(contains_stack.contains_stack);
> + clear_contains_cache(cache);
> + init_contains_cache(cache);
> +
> + result = repo_is_descendant_of(the_repository, candidate, want);
> + if (result < 0)
> + exit(128);
> + *contains_cache_at(cache, candidate) =
> + result ? CONTAINS_YES : CONTAINS_NO;
> + return result ? CONTAINS_YES : CONTAINS_NO;
> }
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> - struct commit_list *list, struct contains_cache *cache)
> +int commit_contains(struct commit *commit, struct commit_list *list,
> + struct contains_cache *cache)
> {
> - if (filter->with_commit_tag_algo)
> - return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> - return repo_is_descendant_of(the_repository, commit, list);
> + if (!list)
> + return 1;
> + return contains_algo(commit, list, cache) == CONTAINS_YES;
> }
>
> int can_all_from_reach_with_flag(struct object_array *from,
> diff --git a/commit-reach.h b/commit-reach.h
> index 3f3a563d8a..144dc56275 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -5,7 +5,6 @@
> #include "commit-slab.h"
>
> struct commit_list;
> -struct ref_filter;
> struct object_id;
> struct object_array;
>
> @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
> enum contains_result {
> CONTAINS_UNKNOWN = 0,
> CONTAINS_NO,
> - CONTAINS_YES
> + CONTAINS_YES,
> + CONTAINS_IN_PROGRESS
> };
>
> define_commit_slab(contains_cache, enum contains_result);
>
> -int commit_contains(struct ref_filter *filter, struct commit *commit,
> - struct commit_list *list, struct contains_cache *cache);
> +/*
> + * Return whether "commit" is a descendant of any commit in "list". An empty
> + * list matches.
> + *
> + * "cache" records answers for one fixed "list". Clear it before changing the
> + * list.
> + */
> +int commit_contains(struct commit *commit, struct commit_list *list,
> + struct contains_cache *cache);
>
> /*
> * Determine if every commit in 'from' can reach at least one commit
> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..7788147959 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
> return NULL;
> /* We perform the filtering for the '--contains' option... */
> if (filter->with_commit &&
> - !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> + !commit_contains(commit, filter->with_commit,
> + &filter->internal.contains_cache))
> return NULL;
> /* ...or for the `--no-contains' option */
> if (filter->no_commit &&
> - commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> + commit_contains(commit, filter->no_commit,
> + &filter->internal.no_contains_cache))
> return NULL;
> }
>
> diff --git a/ref-filter.h b/ref-filter.h
> index 120221b47f..9e14afca9c 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -73,10 +73,9 @@ struct ref_filter {
> struct commit_list *reachable_from;
> struct commit_list *unreachable_from;
>
> - unsigned int with_commit_tag_algo : 1,
> - match_as_path : 1,
> - ignore_case : 1,
> - detached : 1;
> + unsigned int match_as_path : 1,
> + ignore_case : 1,
> + detached : 1;
> unsigned int kind,
> lines;
> int abbrev,
Nit: With the changes above. I do wish it was split into two commits.
1. Fix cyclic recursions in the algo.
2. Use the algo for all filter types.
> diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> index 5d86a96c17..82235f713e 100644
> --- a/t/helper/test-reach.c
> +++ b/t/helper/test-reach.c
> @@ -6,7 +6,6 @@
> #include "gettext.h"
> #include "hex.h"
> #include "object-name.h"
> -#include "ref-filter.h"
> #include "setup.h"
> #include "string-list.h"
> #include "tag.h"
> @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
>
> printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
> } else if (!strcmp(av[1], "commit_contains")) {
> - struct ref_filter filter = REF_FILTER_INIT;
> struct contains_cache cache;
> init_contains_cache(&cache);
>
> - if (ac > 2 && !strcmp(av[2], "--tag"))
> - filter.with_commit_tag_algo = 1;
> - else
> - filter.with_commit_tag_algo = 0;
> -
> - printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> + printf("%s(_,A,X,_):%d\n", av[1],
> + commit_contains(A, X, &cache));
> clear_contains_cache(&cache);
> } else if (!strcmp(av[1], "get_reachable_subset")) {
> const int reachable_flag = 1;
Makes sense.
> diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> index 5b23ce5db9..ac68fdbacd 100755
> --- a/t/perf/p1500-graph-walks.sh
> +++ b/t/perf/p1500-graph-walks.sh
> @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
>
> test_perf_large_repo
>
> +contains_ref_limit=8192
> +
> test_expect_success 'setup' '
> git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
> sort -r allrefs | head -n 50 >refs &&
> @@ -32,10 +34,25 @@ test_expect_success 'setup' '
> echo "X:$line" >>test-tool-tags || return 1
> done &&
>
> + git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> + contains_ref_count=$(wc -l <contains-commits) &&
> + test "$contains_ref_count" -gt 0 &&
> + contains_base=$(tail -n 1 contains-commits) &&
> + export contains_base &&
> + awk "{
> + printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> + }" contains-commits |
> + git update-ref --stdin &&
> + git pack-refs --include "refs/contains-perf/*" &&
> +
> commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
> git update-ref refs/heads/disjoint-base $commit &&
>
> - git commit-graph write --reachable
> + git commit-graph write --reachable &&
> +
> + git for-each-ref --contains="$contains_base" \
> + refs/contains-perf/ >actual &&
> + test_line_count = $contains_ref_count actual
Shouldn't this be a separate test and not a part of the setup?
> '
>
> test_perf 'ahead-behind counts: git for-each-ref' '
> @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
> xargs git tag --merged=HEAD <tags
> '
>
> +test_perf 'contains: git for-each-ref --contains' '
> + git for-each-ref --contains="$contains_base" \
> + refs/contains-perf/ >/dev/null
> +'
> +
Ah! we also have this, so perhaps moving the `test_line_count` here and
dropping it above would be better.
> test_perf 'is-base check: test-tool reach (refs)' '
> test-tool reach get_branch_base_for_tip <test-tool-refs
> '
> diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> index e06feb06e9..169cc70c23 100755
> --- a/t/t6301-for-each-ref-errors.sh
> +++ b/t/t6301-for-each-ref-errors.sh
> @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
> test_must_be_empty brief-err
> '
>
> +test_expect_success 'missing ancestors are reported by contains filters' '
> + test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> + {
> + echo "tree $(git rev-parse HEAD^{tree})" &&
> + echo "parent $MISSING" &&
> + git cat-file commit HEAD |
> + sed -n -e "/^author /p" -e "/^committer /p" &&
> + echo &&
> + echo "missing parent"
> + } > commit &&
> + broken=$(git hash-object -t commit -w commit) &&
> + git update-ref refs/heads/missing-parent "$broken" &&
> + test_must_fail git for-each-ref --contains=HEAD \
> + refs/heads/missing-parent >out 2>err &&
> + test_must_be_empty out &&
> + test_grep "unable to parse commit $MISSING" err
> +'
> +
> test_expect_success 'ahead-behind requires an argument' '
> test_must_fail git for-each-ref \
> --format="%(ahead-behind)" 2>err &&
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 7f060d97bf..423505d1fb 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
> test_cmp expect actual
> '
>
> +test_expect_success 'contains handles cyclic replacement histories' '
> + one=$(git rev-parse one) &&
> + three=$(git rev-parse three) &&
> + test_when_finished "
> + git replace -d $one
> + git replace -d $three
> + git tag -d cycle-a cycle-b
> + " &&
> + git tag cycle-a "$one" &&
> + git tag cycle-b "$three" &&
> + git replace --graft "$one" "$three" two &&
> + git replace --graft "$three" "$one" &&
> + cat >expect <<-\EOF &&
> + refs/tags/cycle-a
> + refs/tags/cycle-b
> + EOF
> + git for-each-ref --format="%(refname)" --contains=two \
> + "refs/tags/cycle-*" >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success '%(color) must fail' '
> test_must_fail git for-each-ref --format="%(color)%(refname)"
> '
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index b5b314e570..1ecc2571c2 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
> X:commit-9-3
> EOF
> echo "commit_contains(_,A,X,_):1" >expect &&
> - test_all_modes commit_contains &&
> - test_all_modes commit_contains --tag
> + test_all_modes commit_contains
> '
>
> test_expect_success 'commit_contains:miss' '
> @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
> X:commit-9-3
> EOF
> echo "commit_contains(_,A,X,_):0" >expect &&
> - test_all_modes commit_contains &&
> - test_all_modes commit_contains --tag
> + test_all_modes commit_contains
> '
>
> test_expect_success 'rev-list: basic topo-order' '
>
> ---
> base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
>
> Best regards,
> --
> Tamir Duberstein <tamird@gmail.com>
Overall the patch looks great. The perf improvements are also very
welcome. Some small nits from me.
Thanks
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply
* Re: [PATCH] ref-filter: restore prefix-scoped iteration
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
To: Tamir Duberstein
Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu
In-Reply-To: <20260605-fix-git-branch-regression-v1-1-02f40ad40929@gmail.com>
Tamir Duberstein <tamird@gmail.com> writes:
> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..2388a57b39 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
> prefix = "refs/tags/";
>
> if (prefix) {
Below, adding an extra call to get_main_ref_store(the_repository)
makes one line unnecessarily split and harder to read. How about
doing
struct ref_store *store = get_main_ref_store(the_repository);
upfront here, and then use that to replace these two calls of
get_main_ref_store(the_repository)?
> + if (filter->start_after) {
> + struct ref_iterator *iter;
>
> + iter = refs_ref_iterator_begin(
> + get_main_ref_store(the_repository), "", NULL, 0,
> + 0);
>
> ret = start_ref_iterator_after(iter, filter->start_after);
> + if (!ret)
> + ret = do_for_each_ref_iterator(iter, fn,
> + cb_data);
> + } else {
> + /*
> + * Pass the prefix during construction because the files
> + * backend primes loose refs before a later seek can
> + * narrow the iterator.
> + */
> + struct refs_for_each_ref_options opts = {
> + .prefix = prefix,
> + };
>
> + ret = refs_for_each_ref_ext(
> + get_main_ref_store(the_repository), fn, cb_data,
> + &opts);
> + }
> } else if (filter->kind & FILTER_REFS_REGULAR) {
^ permalink raw reply
* Re: [PATCH 0/6] t: add lint-style.pl and convert grep to test_grep
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
To: Michael Montalbo via GitGitGadget
Cc: git, D. Ben Knoble, Eric Sunshine, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>
"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The test suite has a test_grep wrapper that prints file contents on
> assertion failure, making debugging easier. Many tests still use bare 'grep'
> for assertions, which silently swallows context on failure.
>
> This series adds a lint tool (lint-style.pl) to mechanically detect and
> convert these, then applies it across the test suite.
I do not think we want an automated tool that rewrites the source
files. I was hoping that we would get a patch or two that _adds_ to
existing test-lint framework (i.e., 'test-grep' that 'test-lint'
target depends on in t/Makefile) that gives diagnosis in a similar
fashion as test-lint-shell-syntax and test-chainlint do.
Also some existing uses of "grep" are not end-user facing and should
not be rewritten to "test_grep".
^ permalink raw reply
* Re: [PATCH] docs: fix typos
From: Junio C Hamano @ 2026-06-08 22:16 UTC (permalink / raw)
To: Tuomas Ahola; +Cc: git
In-Reply-To: <20260604131457.19215-1-taahol@utu.fi>
Tuomas Ahola <taahol@utu.fi> writes:
> Fix some typos and grammar errors in comments and documentation files.
>
> Signed-off-by: Tuomas Ahola <taahol@utu.fi>
> ---
Thanks, all changes make sense. Will queue.
^ permalink raw reply
* Re: [GSoC PATCH v2 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-08 22:17 UTC (permalink / raw)
To: K Jayatheerth
Cc: git, a3205153416, gitster, jltobler, kumarayushjha123,
phillip.wood, sandals
In-Reply-To: <20260605163012.181089-4-jayatheerthkulkarni2005@gmail.com>
> +test_repo_info_path () {
> + field_name=$1
> + expect_absolute_eval=$2
> + expect_relative=$3
> + env_prefix=$4
This helper function needs a documentation.
> + test_expect_success "query individual key: path.$field_name.absolute${env_prefix:+ ($env_prefix)}" '
This makes the output polluted. What about changing it by something like:
test_expect_success "absolute: $label' '...'
test_expect_success "relative: $label' '...'
with a custom label?
> +
> +test_expect_success 'setup test repository layout for path fields' '
> + git init test-repo &&
> + mkdir -p test-repo/sub
> +'
The helper function `test_repo_info_path` is relying too much on the
existence of the `test-repo`. I think it would be better to add a new
parameter `repo_name` (or similar) because:
1. You could move this creation to the helper function and
you won't need to place the test after that creation
2. You could use different for each (test_repo_info_path call, path format)
pair. Currently, if more than one test fails, its result is overwritten
and the `expect` and `actual` files from the trash directory will be
the last of the broken tests.
3. You won't need to use the hacky 'echo "$(cd .. && pwd)'
This applies my suggestions (feel free to use, adapt or discard it):
test_repo_info_path () {
label=$1
field_name=$2
repo_name=$3
expect_absolute=$4
expect_relative=$5
init_command=$6
absolute_root="$repo_name"-absolute
relative_root="$repo_name"-relative
expect_absolute="$PWD"/"$absolute_root"/"$expect_absolute"
test_expect_success 'setup test repository layout for path fields' '
git init "$absolute_root" &&
git init "$relative_root" &&
mkdir -p "$absolute_root"/sub "$relative_root"/sub
'
test_expect_success "absolute: $label" '
(
export ROOT="$PWD"/"$absolute_root" &&
cd "$absolute_root"/sub &&
eval "$init_command" &&
echo "path.$field_name.absolute=$expect_absolute" >expect &&
git repo info path.$field_name.absolute >actual &&
test_cmp expect actual
)
'
test_expect_success "relative: $label" '
(
export ROOT="$PWD"/"$relative_root" &&
cd "$relative_root"/sub &&
eval "$init_command" &&
echo "path.$field_name.relative=$expect_relative" >expect &&
git repo info path.$field_name.relative >actual &&
test_cmp expect actual
)
'
}
test_repo_info_path 'gitdir' 'gitdir' 'gitdir' '.git' '../.git'
^ permalink raw reply
* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08 22:30 UTC (permalink / raw)
To: Karthik Nayak
Cc: git, Jeff King, Junio C Hamano, Victoria Dye, Derrick Stolee,
Elijah Newren
In-Reply-To: <CAOLa=ZS_U+u43SV9ELSEU6AT7rzEQ44BuHPAi1BAHEGQAnPoPw@mail.gmail.com>
On Mon, Jun 8, 2026 at 2:18 PM Karthik Nayak <karthik.188@gmail.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > git branch and git for-each-ref call repo_is_descendant_of() for each
> > candidate selected by --contains or --no-contains. Each call starts a
> > new graph walk, so refs with shared history repeatedly traverse the same
> > commits.
> >
> > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> > the tag traversal that caches positive and negative answers across
> > candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> > 2015-07-07) preserved the branch and tag implementations when ref-filter
> > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> > the '--contains' option, 2015-10-18) noted that they should be unified.
> >
>
> Nicely explained. We should've merged this long ago, so this is a
> worthwhile change.
>
> > Use the memoized traversal for every ref-filter contains check and
> > remove the implementation selector. The cache records answers for one
> > fixed target list, so document that callers must clear it before
> > changing the list.
> >
> > The memoized depth-first walk assumes acyclic ancestry, but replacement
> > refs can create cycles. Track commits while they are on the walk. If a
> > cycle is found, discard partial cache entries and use
> > repo_is_descendant_of() for that candidate.
> >
> > The branch and for-each-ref path passed repo_is_descendant_of() through
> > a Boolean interface. In configurations where it returned -1 for missing
> > ancestry, ref-filter treated the error as "contains". The memoized path
> > instead fails when ancestry cannot be parsed, as git tag already did.
> > During review of the 2018 reachability series, making parse failures
> > fatal was explicitly deferred because that series was intended to
> > preserve behavior. Unifying the implementations now makes all callers
> > fail consistently instead of preserving that accidental Boolean
> > interpretation.
> >
> > The added p1500 case uses up to 8,192 packed refs along one first-parent
> > history. It improves from 0.68 to 0.03 seconds.
> >
> > On a checkout with 62,174 remote-tracking refs, I ran:
> >
> > hyperfine --warmup 0 --runs 3 \
> > --command-name parent \
> > '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> > --command-name this-commit \
> > '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> >
> > The results were:
> >
> > parent this commit
> > elapsed 104.365 s 467.7 ms
> > user 93.702 s 220.2 ms
> > system 0.723 s 182.7 ms
> >
> > The wall-time standard deviations were 11.356 seconds and 133.8
> > milliseconds, respectively, for a 223x speedup. Both commands produced
> > output with SHA-256
> > 2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.
> >
> > Both revisions were rebuilt with the default -O2 flags using Apple clang
> > 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
> > 16-core Apple M4 Max (12 performance and four efficiency cores) and 128
> > GB RAM.
> >
> > Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
> > Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
> > Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
> > Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> > ---
> > builtin/tag.c | 1 -
> > commit-reach.c | 45 +++++++++++++++++++++++++++++++-----------
> > commit-reach.h | 15 ++++++++++----
> > ref-filter.c | 6 ++++--
> > ref-filter.h | 7 +++----
> > t/helper/test-reach.c | 10 ++--------
> > t/perf/p1500-graph-walks.sh | 24 +++++++++++++++++++++-
> > t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
> > t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
> > t/t6600-test-reach.sh | 6 ++----
> > 10 files changed, 117 insertions(+), 36 deletions(-)
> >
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d51c2e3349..9f34d948d4 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >
> > if (verify_ref_format(format))
> > die(_("unable to parse format string"));
> > - filter->with_commit_tag_algo = 1;
>
> We were selectively using the algo for `git tag`, like mentioned I guess
> we'll entirely remove `with_commit_tag_algo` below somewhere
>
> > filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
> >
> > free(to_free);
> > diff --git a/commit-reach.c b/commit-reach.c
> > index 9b3ea46d6f..6e599a3670 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -6,7 +6,6 @@
> > #include "decorate.h"
> > #include "hex.h"
> > #include "prio-queue.h"
> > -#include "ref-filter.h"
> > #include "revision.h"
> > #include "tag.h"
> > #include "commit-reach.h"
> > @@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
> >
> > /*
> > * Test whether the candidate is contained in the list.
> > - * Do not recurse to find out, though, but return -1 if inconclusive.
> > + * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
> > + * inconclusive.
>
> Okay so the code does return CONTAINS_UNKNOWN which is an enum beginning
> at 0, so this makes sense.
>
> > */
> > static enum contains_result contains_test(struct commit *candidate,
> > const struct commit_list *want,
> > @@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
> > contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
> > }
> >
> > -static enum contains_result contains_tag_algo(struct commit *candidate,
> > - const struct commit_list *want,
> > - struct contains_cache *cache)
> > +static enum contains_result contains_algo(struct commit *candidate,
> > + struct commit_list *want,
> > + struct contains_cache *cache)
> > {
> > struct contains_stack contains_stack = { 0, 0, NULL };
> > enum contains_result result;
> > @@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> > if (result != CONTAINS_UNKNOWN)
> > return result;
> >
> > + *contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
> > push_to_contains_stack(candidate, &contains_stack);
> > while (contains_stack.nr) {
> > struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
> > @@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> > contains_stack.nr--;
> > }
> > /*
> > - * If we just popped the stack, parents->item has been marked,
> > - * therefore contains_test will return a meaningful yes/no.
> > + * A parent may have just been popped and marked, or may still
> > + * be active when replacement refs create a cycle.
> > */
> > else switch (contains_test(parents->item, want, cache, cutoff)) {
> > case CONTAINS_YES:
> > @@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> > case CONTAINS_NO:
> > entry->parents = parents->next;
> > break;
> > + case CONTAINS_IN_PROGRESS:
> > + /*
> > + * Partial negative answers are not safe across a cycle.
> > + * Discard them and use the cycle-safe reachability walk.
> > + */
> > + goto cycle;
> >
>
> If I understand this correctly, we now use CONTAINS_IN_PROGRESS to
> showcase a commit for which we still don't have a result. Since any
> commit with UNKNOWN will start a recursive search through its parents.
>
> So with this if we encounter a CONTAINS_IN_PROGRESS while recursion,
> this would indicate that we hit a cyclic graph and so go to the fallback
> of using repo_is_descendant_of(). So this avoids an infinite recursion
> in such instances. Makes sense.
>
> > case CONTAINS_UNKNOWN:
> > + *contains_cache_at(cache, parents->item) =
> > + CONTAINS_IN_PROGRESS;
> > push_to_contains_stack(parents->item, &contains_stack);
> > break;
> > }
> > }
> > free(contains_stack.contains_stack);
> > return contains_test(candidate, want, cache, cutoff);
> > +
> > +cycle:
> > + free(contains_stack.contains_stack);
> > + clear_contains_cache(cache);
> > + init_contains_cache(cache);
> > +
> > + result = repo_is_descendant_of(the_repository, candidate, want);
> > + if (result < 0)
> > + exit(128);
> > + *contains_cache_at(cache, candidate) =
> > + result ? CONTAINS_YES : CONTAINS_NO;
> > + return result ? CONTAINS_YES : CONTAINS_NO;
> > }
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > - struct commit_list *list, struct contains_cache *cache)
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > + struct contains_cache *cache)
> > {
> > - if (filter->with_commit_tag_algo)
> > - return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> > - return repo_is_descendant_of(the_repository, commit, list);
> > + if (!list)
> > + return 1;
> > + return contains_algo(commit, list, cache) == CONTAINS_YES;
> > }
> >
> > int can_all_from_reach_with_flag(struct object_array *from,
> > diff --git a/commit-reach.h b/commit-reach.h
> > index 3f3a563d8a..144dc56275 100644
> > --- a/commit-reach.h
> > +++ b/commit-reach.h
> > @@ -5,7 +5,6 @@
> > #include "commit-slab.h"
> >
> > struct commit_list;
> > -struct ref_filter;
> > struct object_id;
> > struct object_array;
> >
> > @@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
> > enum contains_result {
> > CONTAINS_UNKNOWN = 0,
> > CONTAINS_NO,
> > - CONTAINS_YES
> > + CONTAINS_YES,
> > + CONTAINS_IN_PROGRESS
> > };
> >
> > define_commit_slab(contains_cache, enum contains_result);
> >
> > -int commit_contains(struct ref_filter *filter, struct commit *commit,
> > - struct commit_list *list, struct contains_cache *cache);
> > +/*
> > + * Return whether "commit" is a descendant of any commit in "list". An empty
> > + * list matches.
> > + *
> > + * "cache" records answers for one fixed "list". Clear it before changing the
> > + * list.
> > + */
> > +int commit_contains(struct commit *commit, struct commit_list *list,
> > + struct contains_cache *cache);
> >
> > /*
> > * Determine if every commit in 'from' can reach at least one commit
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..7788147959 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
> > return NULL;
> > /* We perform the filtering for the '--contains' option... */
> > if (filter->with_commit &&
> > - !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
> > + !commit_contains(commit, filter->with_commit,
> > + &filter->internal.contains_cache))
> > return NULL;
> > /* ...or for the `--no-contains' option */
> > if (filter->no_commit &&
> > - commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
> > + commit_contains(commit, filter->no_commit,
> > + &filter->internal.no_contains_cache))
> > return NULL;
> > }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 120221b47f..9e14afca9c 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -73,10 +73,9 @@ struct ref_filter {
> > struct commit_list *reachable_from;
> > struct commit_list *unreachable_from;
> >
> > - unsigned int with_commit_tag_algo : 1,
> > - match_as_path : 1,
> > - ignore_case : 1,
> > - detached : 1;
> > + unsigned int match_as_path : 1,
> > + ignore_case : 1,
> > + detached : 1;
> > unsigned int kind,
> > lines;
> > int abbrev,
>
> Nit: With the changes above. I do wish it was split into two commits.
> 1. Fix cyclic recursions in the algo.
> 2. Use the algo for all filter types.
Makes sense. I split v2 accordingly. The first patch now fixes the
existing git tag --contains traversal and tests that path directly.
The second patch then uses the traversal for the other ref-filter
callers.
>
> > diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
> > index 5d86a96c17..82235f713e 100644
> > --- a/t/helper/test-reach.c
> > +++ b/t/helper/test-reach.c
> > @@ -6,7 +6,6 @@
> > #include "gettext.h"
> > #include "hex.h"
> > #include "object-name.h"
> > -#include "ref-filter.h"
> > #include "setup.h"
> > #include "string-list.h"
> > #include "tag.h"
> > @@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
> >
> > printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
> > } else if (!strcmp(av[1], "commit_contains")) {
> > - struct ref_filter filter = REF_FILTER_INIT;
> > struct contains_cache cache;
> > init_contains_cache(&cache);
> >
> > - if (ac > 2 && !strcmp(av[2], "--tag"))
> > - filter.with_commit_tag_algo = 1;
> > - else
> > - filter.with_commit_tag_algo = 0;
> > -
> > - printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
> > + printf("%s(_,A,X,_):%d\n", av[1],
> > + commit_contains(A, X, &cache));
> > clear_contains_cache(&cache);
> > } else if (!strcmp(av[1], "get_reachable_subset")) {
> > const int reachable_flag = 1;
>
> Makes sense.
>
> > diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> > index 5b23ce5db9..ac68fdbacd 100755
> > --- a/t/perf/p1500-graph-walks.sh
> > +++ b/t/perf/p1500-graph-walks.sh
> > @@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
> >
> > test_perf_large_repo
> >
> > +contains_ref_limit=8192
> > +
> > test_expect_success 'setup' '
> > git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
> > sort -r allrefs | head -n 50 >refs &&
> > @@ -32,10 +34,25 @@ test_expect_success 'setup' '
> > echo "X:$line" >>test-tool-tags || return 1
> > done &&
> >
> > + git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
> > + contains_ref_count=$(wc -l <contains-commits) &&
> > + test "$contains_ref_count" -gt 0 &&
> > + contains_base=$(tail -n 1 contains-commits) &&
> > + export contains_base &&
> > + awk "{
> > + printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
> > + }" contains-commits |
> > + git update-ref --stdin &&
> > + git pack-refs --include "refs/contains-perf/*" &&
> > +
> > commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
> > git update-ref refs/heads/disjoint-base $commit &&
> >
> > - git commit-graph write --reachable
> > + git commit-graph write --reachable &&
> > +
> > + git for-each-ref --contains="$contains_base" \
> > + refs/contains-perf/ >actual &&
> > + test_line_count = $contains_ref_count actual
>
> Shouldn't this be a separate test and not a part of the setup?
>
> > '
> >
> > test_perf 'ahead-behind counts: git for-each-ref' '
> > @@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
> > xargs git tag --merged=HEAD <tags
> > '
> >
> > +test_perf 'contains: git for-each-ref --contains' '
> > + git for-each-ref --contains="$contains_base" \
> > + refs/contains-perf/ >/dev/null
> > +'
> > +
>
> Ah! we also have this, so perhaps moving the `test_line_count` here and
> dropping it above would be better.
Makes sense! Done in v2.
>
> > test_perf 'is-base check: test-tool reach (refs)' '
> > test-tool reach get_branch_base_for_tip <test-tool-refs
> > '
> > diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
> > index e06feb06e9..169cc70c23 100755
> > --- a/t/t6301-for-each-ref-errors.sh
> > +++ b/t/t6301-for-each-ref-errors.sh
> > @@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
> > test_must_be_empty brief-err
> > '
> >
> > +test_expect_success 'missing ancestors are reported by contains filters' '
> > + test_when_finished "git update-ref -d refs/heads/missing-parent" &&
> > + {
> > + echo "tree $(git rev-parse HEAD^{tree})" &&
> > + echo "parent $MISSING" &&
> > + git cat-file commit HEAD |
> > + sed -n -e "/^author /p" -e "/^committer /p" &&
> > + echo &&
> > + echo "missing parent"
> > + } > commit &&
> > + broken=$(git hash-object -t commit -w commit) &&
> > + git update-ref refs/heads/missing-parent "$broken" &&
> > + test_must_fail git for-each-ref --contains=HEAD \
> > + refs/heads/missing-parent >out 2>err &&
> > + test_must_be_empty out &&
> > + test_grep "unable to parse commit $MISSING" err
> > +'
> > +
> > test_expect_success 'ahead-behind requires an argument' '
> > test_must_fail git for-each-ref \
> > --format="%(ahead-behind)" 2>err &&
> > diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> > index 7f060d97bf..423505d1fb 100755
> > --- a/t/t6302-for-each-ref-filter.sh
> > +++ b/t/t6302-for-each-ref-filter.sh
> > @@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
> > test_cmp expect actual
> > '
> >
> > +test_expect_success 'contains handles cyclic replacement histories' '
> > + one=$(git rev-parse one) &&
> > + three=$(git rev-parse three) &&
> > + test_when_finished "
> > + git replace -d $one
> > + git replace -d $three
> > + git tag -d cycle-a cycle-b
> > + " &&
> > + git tag cycle-a "$one" &&
> > + git tag cycle-b "$three" &&
> > + git replace --graft "$one" "$three" two &&
> > + git replace --graft "$three" "$one" &&
> > + cat >expect <<-\EOF &&
> > + refs/tags/cycle-a
> > + refs/tags/cycle-b
> > + EOF
> > + git for-each-ref --format="%(refname)" --contains=two \
> > + "refs/tags/cycle-*" >actual &&
> > + test_cmp expect actual
> > +'
> > +
> > test_expect_success '%(color) must fail' '
> > test_must_fail git for-each-ref --format="%(color)%(refname)"
> > '
> > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> > index b5b314e570..1ecc2571c2 100755
> > --- a/t/t6600-test-reach.sh
> > +++ b/t/t6600-test-reach.sh
> > @@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
> > X:commit-9-3
> > EOF
> > echo "commit_contains(_,A,X,_):1" >expect &&
> > - test_all_modes commit_contains &&
> > - test_all_modes commit_contains --tag
> > + test_all_modes commit_contains
> > '
> >
> > test_expect_success 'commit_contains:miss' '
> > @@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
> > X:commit-9-3
> > EOF
> > echo "commit_contains(_,A,X,_):0" >expect &&
> > - test_all_modes commit_contains &&
> > - test_all_modes commit_contains --tag
> > + test_all_modes commit_contains
> > '
> >
> > test_expect_success 'rev-list: basic topo-order' '
> >
> > ---
> > base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
> > change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1
> >
> > Best regards,
> > --
> > Tamir Duberstein <tamird@gmail.com>
>
> Overall the patch looks great. The perf improvements are also very
> welcome. Some small nits from me.
Thanks a lot for the review!
^ permalink raw reply
* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Jeff King @ 2026-06-08 22:34 UTC (permalink / raw)
To: Tamir Duberstein
Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
Elijah Newren, Kristofer Karlsson
In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com>
On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote:
> git branch and git for-each-ref call repo_is_descendant_of() for each
> candidate selected by --contains or --no-contains. Each call starts a
> new graph walk, so refs with shared history repeatedly traverse the same
> commits.
>
> ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> the tag traversal that caches positive and negative answers across
> candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> 2015-07-07) preserved the branch and tag implementations when ref-filter
> learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> the '--contains' option, 2015-10-18) noted that they should be unified.
>
> Use the memoized traversal for every ref-filter contains check and
> remove the implementation selector. The cache records answers for one
> fixed target list, so document that callers must clear it before
> changing the list.
The subject line obfuscated the intent here (at least for me). I think a
more clear subject would just be: "ref-filter: always use
contains_tag_algo" or something.
But more importantly, I think the analysis above is missing a key point
about why we didn't make the tag algo the default in the first place: it
is depth first, and thus slower when the merge base can be found quickly
by the breadth-first traversal. For tags, you tend to have to look at
all of history anyway (because you have at least one old tag that
requires walking back that far), but that is often not true for
branches.
We are able to get the best of both worlds if we can cut off the
depth-first traversal early using generation numbers.
So I think a better rule here is to tweak the selection in
commit_contains() to select the depth-first algorithm when we have
generation numbers enabled. There's a patch in an old thread, which was
revived a week or two ago by Kristofer (cc'd):
https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/
> The memoized depth-first walk assumes acyclic ancestry, but replacement
> refs can create cycles. Track commits while they are on the walk. If a
> cycle is found, discard partial cache entries and use
> repo_is_descendant_of() for that candidate.
I can believe that the depth-first code doesn't handle cycles well. But
if that's the case, then it's already a problem for "git tag
--contains". And we should fix it as a separate patch from enabling that
algorithm in more cases.
I'm not quite sure how ancestry should be defined in a cycle. How does
the algorithm behave now when it sees a cycle? If it loops infinitely,
we definitely would want to fix that. If not, then to some degree I
don't care too much what answer is provided, since the input is somewhat
nonsense in the first place. And if it is expensive to track, it might
not be worth inflicting that penalty on the sane cases. But it looks
like your solution is just setting an extra flag value in the slab,
which should be pretty cheap.
> The branch and for-each-ref path passed repo_is_descendant_of() through
> a Boolean interface. In configurations where it returned -1 for missing
> ancestry, ref-filter treated the error as "contains". The memoized path
> instead fails when ancestry cannot be parsed, as git tag already did.
> During review of the 2018 reachability series, making parse failures
> fatal was explicitly deferred because that series was intended to
> preserve behavior. Unifying the implementations now makes all callers
> fail consistently instead of preserving that accidental Boolean
> interpretation.
I think that's a good outcome.
> The added p1500 case uses up to 8,192 packed refs along one first-parent
> history. It improves from 0.68 to 0.03 seconds.
>
> On a checkout with 62,174 remote-tracking refs, I ran:
>
> hyperfine --warmup 0 --runs 3 \
> --command-name parent \
> '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> --command-name this-commit \
> '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
>
> The results were:
>
> parent this commit
> elapsed 104.365 s 467.7 ms
> user 93.702 s 220.2 ms
> system 0.723 s 182.7 ms
I didn't time it, but the probable regression case is something like
this: a very deep history with a small number of branches diverging only
a few commits away. Without a commit-graph file (or one without
generation numbers), that probably makes "git branch --contains" slower.
-Peff
^ permalink raw reply
* Re: [GSoC PATCH v2 0/4] teach git repo info to handle path keys
From: Junio C Hamano @ 2026-06-08 22:36 UTC (permalink / raw)
To: K Jayatheerth
Cc: git, a3205153416, jltobler, kumarayushjha123, lucasseikioshiro,
phillip.wood, sandals
In-Reply-To: <20260605163012.181089-1-jayatheerthkulkarni2005@gmail.com>
K Jayatheerth <jayatheerthkulkarni2005@gmail.com> writes:
> 2. Should we consider a default option?
> Currently we have path.gitdir.absolute. Should we consider an
> option where a plain `path.gitdir` returns some default?
Probably not. It will invite folks wanting to tweak the default
between absolute and relative, rendering this feature useless for
robust scripting. You do not necessarily want to save typing in
plumbing interface. You want to reduce ambiguity by reducing more
than one ways to do a thing down to just one way, and as long as
that one way is not overly verbose, you are fine.
^ permalink raw reply
* Re: [PATCH] ref-filter: restore prefix-scoped iteration
From: Tamir Duberstein @ 2026-06-08 22:39 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu
In-Reply-To: <xmqqpl20vhni.fsf@gitster.g>
On Mon, Jun 8, 2026 at 2:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..2388a57b39 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
> > prefix = "refs/tags/";
> >
> > if (prefix) {
>
> Below, adding an extra call to get_main_ref_store(the_repository)
> makes one line unnecessarily split and harder to read. How about
> doing
>
> struct ref_store *store = get_main_ref_store(the_repository);
>
> upfront here, and then use that to replace these two calls of
> get_main_ref_store(the_repository)?
Yep, done in v2.
Thanks for the review!
By the way, how long should I wait before sending new versions of my
patches? I have 4 outstanding at the moment.
^ permalink raw reply
* Re: [GSoC PATCH v2 4/4] repo: add path.commondir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-08 22:40 UTC (permalink / raw)
To: K Jayatheerth
Cc: git, a3205153416, gitster, jltobler, kumarayushjha123,
phillip.wood, sandals
In-Reply-To: <20260605163012.181089-5-jayatheerthkulkarni2005@gmail.com>
This patch looks really straightforward after the previous one.
I hope the rest of the path.* series will be just like that.
> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/.git"' '../.git'
> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/custom-common"' '../custom-common' 'GIT_COMMON_DIR="$(cd .. && pwd)/custom-common" GIT_DIR=../.git'
> +test_repo_info_path 'commondir' 'echo "$(cd .. && pwd)/.git"' '../.git' 'GIT_DIR=../.git'
If you use the test_repo_info_path that I suggested in the
other answer, this would be:
test_repo_info_path 'commondir without env vars' 'commondir' 'common-no-env' \
'.git' '../.git'
test_repo_info_path 'commondir with GIT_COMMON_DIR and GIT_DIR' 'commondir' \
'commondir-envs' 'custom-common' '../custom-common'\
'export GIT_COMMON_DIR="$ROOT/custom-common" &&
export GIT_DIR="../.git" &&
git init --bare "$ROOT/custom-common"
'
test_repo_info_path 'commondir with only GIT_DIR' 'commondir' \
'commondir-only-gitdir' '.git' '../.git' 'GIT_DIR=../.git'
^ permalink raw reply
* Re: [PATCH 0/3] config: allow disabling config includes
From: Jeff King @ 2026-06-08 22:51 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
In-Reply-To: <pull.2139.git.1780927027.gitgitgadget@gmail.com>
On Mon, Jun 08, 2026 at 01:57:03PM +0000, Derrick Stolee via GitGitGadget wrote:
> This series introduces a new way to ignore config include directives via two
> mechanisms:
>
> * GIT_CONFIG_INCLUDES=0 in the environment.
> * git --no-includes ... in the command line.
>
> My motivation is from a tricky situation where users want to do the risky
> thing and include a repo-tracked file for sharing aliases and other
> recommended config. They are then struggling in a later build step that is
> running Git commands (under a tool we don't control and can't change) that
> then cause filesystem accesses outside of the build system's sandbox.
I'm not opposed to global control of includes, but this is just one way
in which config can escape the sandbox. They can always point to files
(e.g., core.attributesFile) or even commands that totally leave the
sandbox (e.g., ext diff or textconv commands). Fundamentally git config
is equivalent to arbitrary code execution, so pointing an include at a
repo-tracked file carries the risk of confusion both malicious and
accidental.
So I dunno. From the described motivation, this feels like a band-aid
that fixes only one narrow instance of a greater problem.
The notion of enabling/disabling includes per-command is itself a
flexible building block. So it's possible that it has other uses in
general. But it's also a fairly broad hammer that covers more than your
use case. If you're planning to use "git --no-includes" in some script,
then it breaks the config of anybody who uses includes in their
user-level ~/.gitconfig file.
So you may need some more directed limiting.
> One thing I do worry about is whether or not this would cause a significant
> break in behavior, or if this is a relatively safe thing to allow.
Yeah. Consider something like:
$ cat ~/.gitconfig
[user]
name = My Name
email = me@personal.example.com
[includeIf "gitdir:/path/to/work/stuff"]
path = .gitconfig-work
$ cat ~/.gitconfig-work
[user]
email = me@work.example.com
Using "git --no-include" will silently use the wrong user.email value.
That's OK if the user is asking for it, but if you are planning to
sprinkle "--no-include" inside scripts, that's likely to cause
confusion.
-Peff
^ permalink raw reply
* Re: [PATCH] ls-files: filter pathspec before lstat
From: Jeff King @ 2026-06-08 23:03 UTC (permalink / raw)
To: Junio C Hamano
Cc: Tamir Duberstein, git, René Scharfe, Patrick Steinhardt
In-Reply-To: <xmqqa4t5yyee.fsf@gitster.g>
On Mon, Jun 08, 2026 at 06:06:33AM -0700, Junio C Hamano wrote:
> On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> > show_files() checks whether each index entry is deleted or modified
> > before show_ce() applies the pathspec. prune_index() avoids most of this
> > work for pathspecs with a common directory prefix, but a top-level name
> > or leading wildcard leaves every entry to be checked.
> >
> > Match the pathspec before lstat() for the deleted and modified modes.
> > Keep the later match in show_ce() so --error-unmatch is satisfied only
> > by entries that are actually shown.
>
> Adding an extra early `match_pathspec()` check before making slow
> system calls like `lstat()` makes sense, especially when most of the
> index entries need to be skipped. But if most of them would match,
> then we would end up doing the same match_pathspec() calls twice for
> each path, and run lstat() anyway, so you may also be able to
> construct a perf test that demonstrates a case where this approach
> is not a clear win (or even degradation), perhaps?
The patchspec matching is linear in the number of pathspecs, so it's
easy to get quadratic-ish results by just asking about:
git ls-files -- $(git ls-files)
So that probably provides an easy regression demonstration for this
patch.
I don't know how much it matters in the real world. That command is
_already_ slow, and mostly people don't care that much. Long ago I had
patches to build a trie of literal pathspecs, with the intent that
blame-tree/last-modified could use the pathspec mechanism, but
ultimately we took the code in a different direction. And nobody really
complained about it since. ;)
-Peff
^ permalink raw reply
* Re: limiting git branch --contains
From: Jeff King @ 2026-06-08 23:12 UTC (permalink / raw)
To: Kristofer Karlsson; +Cc: git, Derrick Stolee
In-Reply-To: <20260527070510.3510836-1-krka@spotify.com>
On Wed, May 27, 2026 at 09:05:09AM +0200, Kristofer Karlsson wrote:
> With generation numbers (commit-graph), the cached algorithm is
> strictly at least as good as the uncached one. Both have the same
> walk floor -- the generation cutoff in contains_tag_algo is equivalent
> to the STALE boundary in paint_down_to_common. The cache then provides
> a pure win: O(total unique commits) instead of O(N * commits per ref).
>
> In my benchmarks, the improvement is significant and scales with the
> number of refs and the depth of the target commit:
Yeah, I think this is worth pursuing. Looks like somebody else also
generated a similar patch recently:
https://lore.kernel.org/git/20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com/
But I think the approach in this thread (to use depth-first only when we
have generation numbers) makes more sense.
I cc'd you on that thread.
-Peff
^ permalink raw reply
* Re: [PATCH] ls-files: filter pathspec before lstat
From: Jeff King @ 2026-06-08 23:25 UTC (permalink / raw)
To: Junio C Hamano
Cc: Tamir Duberstein, git, René Scharfe, Patrick Steinhardt
In-Reply-To: <20260608230315.GC340696@coredump.intra.peff.net>
On Mon, Jun 08, 2026 at 07:03:15PM -0400, Jeff King wrote:
> > Adding an extra early `match_pathspec()` check before making slow
> > system calls like `lstat()` makes sense, especially when most of the
> > index entries need to be skipped. But if most of them would match,
> > then we would end up doing the same match_pathspec() calls twice for
> > each path, and run lstat() anyway, so you may also be able to
> > construct a perf test that demonstrates a case where this approach
> > is not a clear win (or even degradation), perhaps?
>
> The patchspec matching is linear in the number of pathspecs, so it's
> easy to get quadratic-ish results by just asking about:
>
> git ls-files -- $(git ls-files)
>
> So that probably provides an easy regression demonstration for this
> patch.
Ah, yeah, it is easy to demonstrate. Making a repo of size $n like this:
n=10000
git init
for i in $(seq $n); do
echo $i >file$i
done
git add .
git commit -m foo
If we then run:
time git ls-files -- $(git ls-files) >/dev/null
then n=1000 takes ~15ms for me, but n=10000 takes ~800ms. So that shows
the slowdown of the existing pathspec code as the number of pathspecs
grows.
With this patch, starting with n=10000 and adding in "-m" (which
triggers the code in this patch), like:
time git ls-files -m -- $(git ls-files) >/dev/null
the time goes from ~15ms (without the patch) to ~800ms with it. Which
makes sense. Nothing is modified, so the current code which puts the
lstat() check first eliminates each entry before we even consider
pathspecs. So it doesn't hit the slow case at all.
But after the patch, we do a preliminary pathspec match and
pay the cost.
So it really is a question of how many items are actually modified, the
cost of lstat(), and the cost of pathspec matching (which varies with
the size of the pathspec).
But like I said, this is kind of a silly case. If it actually starts to
matter in the real world, I think it may be more productive to make the
pathspec code scale better.
-Peff
^ permalink raw reply
* Re: [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08 23:35 UTC (permalink / raw)
To: Jeff King
Cc: git, Karthik Nayak, Junio C Hamano, Victoria Dye, Derrick Stolee,
Elijah Newren, Kristofer Karlsson
In-Reply-To: <20260608223430.GA340696@coredump.intra.peff.net>
On Mon, Jun 8, 2026 at 3:34 PM Jeff King <peff@peff.net> wrote:
>
> On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote:
>
> > git branch and git for-each-ref call repo_is_descendant_of() for each
> > candidate selected by --contains or --no-contains. Each call starts a
> > new graph walk, so refs with shared history repeatedly traverse the same
> > commits.
> >
> > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
> > the tag traversal that caches positive and negative answers across
> > candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
> > 2015-07-07) preserved the branch and tag implementations when ref-filter
> > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
> > the '--contains' option, 2015-10-18) noted that they should be unified.
> >
> > Use the memoized traversal for every ref-filter contains check and
> > remove the implementation selector. The cache records answers for one
> > fixed target list, so document that callers must clear it before
> > changing the list.
>
> The subject line obfuscated the intent here (at least for me). I think a
> more clear subject would just be: "ref-filter: always use
> contains_tag_algo" or something.
Ack, changed to "ref-filter: memoize --contains with generations" in v2 draft.
>
> But more importantly, I think the analysis above is missing a key point
> about why we didn't make the tag algo the default in the first place: it
> is depth first, and thus slower when the merge base can be found quickly
> by the breadth-first traversal. For tags, you tend to have to look at
> all of history anyway (because you have at least one old tag that
> requires walking back that far), but that is often not true for
> branches.
>
> We are able to get the best of both worlds if we can cut off the
> depth-first traversal early using generation numbers.
>
> So I think a better rule here is to tweak the selection in
> commit_contains() to select the depth-first algorithm when we have
> generation numbers enabled. There's a patch in an old thread, which was
> revived a week or two ago by Kristofer (cc'd):
>
> https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/
Very good catch, thank you. I reproduced the regression with a
100,000-commit history and generation numbers disabled. The parent
took 13.0 ms, the unconditional depth-first version took 238.4 ms, and
the generation-aware version took 9.1 ms.
I didn't find a patch in that thread, so I will reroll using the
memoized walk for tags or when generation numbers are enabled, while
retaining the breadth-first walk otherwise. If someone else would
prefer to send that patch, that is fine by me as well.
>
> > The memoized depth-first walk assumes acyclic ancestry, but replacement
> > refs can create cycles. Track commits while they are on the walk. If a
> > cycle is found, discard partial cache entries and use
> > repo_is_descendant_of() for that candidate.
>
> I can believe that the depth-first code doesn't handle cycles well. But
> if that's the case, then it's already a problem for "git tag
> --contains". And we should fix it as a separate patch from enabling that
> algorithm in more cases.
Agreed, and Karthik flagged the same. The cycle handling is now a
separate first patch.
>
> I'm not quite sure how ancestry should be defined in a cycle. How does
> the algorithm behave now when it sees a cycle? If it loops infinitely,
> we definitely would want to fix that. If not, then to some degree I
> don't care too much what answer is provided, since the input is somewhat
> nonsense in the first place. And if it is expensive to track, it might
> not be worth inflicting that penalty on the sane cases. But it looks
> like your solution is just setting an extra flag value in the slab,
> which should be pretty cheap.
>
> > The branch and for-each-ref path passed repo_is_descendant_of() through
> > a Boolean interface. In configurations where it returned -1 for missing
> > ancestry, ref-filter treated the error as "contains". The memoized path
> > instead fails when ancestry cannot be parsed, as git tag already did.
> > During review of the 2018 reachability series, making parse failures
> > fatal was explicitly deferred because that series was intended to
> > preserve behavior. Unifying the implementations now makes all callers
> > fail consistently instead of preserving that accidental Boolean
> > interpretation.
>
> I think that's a good outcome.
>
> > The added p1500 case uses up to 8,192 packed refs along one first-parent
> > history. It improves from 0.68 to 0.03 seconds.
> >
> > On a checkout with 62,174 remote-tracking refs, I ran:
> >
> > hyperfine --warmup 0 --runs 3 \
> > --command-name parent \
> > '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
> > --command-name this-commit \
> > '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'
> >
> > The results were:
> >
> > parent this commit
> > elapsed 104.365 s 467.7 ms
> > user 93.702 s 220.2 ms
> > system 0.723 s 182.7 ms
>
> I didn't time it, but the probable regression case is something like
> this: a very deep history with a small number of branches diverging only
> a few commits away. Without a commit-graph file (or one without
> generation numbers), that probably makes "git branch --contains" slower.
>
> -Peff
^ permalink raw reply
* git-diff in a worktree is an order of magnitude slower?
From: D. Ben Knoble @ 2026-06-08 23:36 UTC (permalink / raw)
To: Git
Hello all,
I'd like to report and offer to help fix what I view as a serious performance
bug:
"git diff --no-ext-diff --quiet" performs about ~10x slower in a secondary
worktree than in the main worktree.
Fortunately, this doesn't seem to extend to "--cached" (and "--no-ext-diff" and
"--quiet" are probably both red-herrings, since it _does_ extend to plain "git
diff").
Here's a short demo in Git:
# git switch -d v2.54.0
# ninja -C build # where my meson-generated build dir is
# git worktree add --detach ../perf-test v2.54.0
# hyperfine -N --warmup 10 './build/bin-wrappers/git diff'
Benchmark 1: ./build/bin-wrappers/git diff
Time (mean ± σ): 3.4 ms ± 0.5 ms [User: 4.2 ms, System: 3.9 ms]
Range (min … max): 2.5 ms … 5.8 ms 677 runs
# pushd ../perf-test
# hyperfine -N --warmup 10 '../git/build/bin-wrappers/git diff'
Benchmark 1: ../git/build/bin-wrappers/git diff
Time (mean ± σ): 223.3 ms ± 10.5 ms [User: 210.4 ms,
System: 19.1 ms]
Range (min … max): 213.5 ms … 243.9 ms 13 runs
I've had a similar experience at $DAYJOB, where a large repo takes ~6ms for the
former and ~650ms for the latter. I noticed because the Bash prompt functions
execute "git diff --no-ext-diff --quiet", and that was (AFAICT) the largest
culprit for a slow shell prompt in a worktree. To squelch that from the prompt,
I have to go down the rabbit hole of the worktree config extension, so I figured
better to fix the slow diff if possible anyway.
2 questions:
1. Is this known, and if so is anybody working on it?
2. How can I help identify problem areas?
A little more
-------------
I've reproduced this as far back as v2.50.0, which is as far back as I could get
the meson build to work with little effort (so I can't rule out that this is an
old regression).
Using "perf record -F 99 -g -- <bin-wrappers/git> diff" in both trees and then
"perf report":
- it looks like the main worktree spends most of it's time in preload_thread,
threaded_has_symlink_leading_path, lstat_cache…
- the worktree spends a lot more time in ie_match_stat, ce_modified_check_fs,
ce_compare_data, index_fd, would_convert_to_git_filter_fd…
Here's the relevant "perf stat":
main tree:
Performance counter stats for './build/bin-wrappers/git diff':
0 context-switches:u # 0,0
cs/sec cs_per_second
0 cpu-migrations:u # 0,0
migrations/sec migrations_per_second
967 page-faults:u # 65036,4
faults/sec page_faults_per_second
14,87 msec task-clock:u # 0,3
CPUs CPUs_utilized
48 616 branch-misses:u # 3,2 %
branch_miss_rate (57,19%)
3 571 630 branches:u # 240,2
M/sec branch_frequency
13 635 411 cpu-cycles:u # 0,9
GHz cycles_frequency
22 120 068 instructions:u # 1,9
instructions insn_per_cycle (85,61%)
3 634 065 stalled-cycles-frontend:u # 0,28
frontend_cycles_idle (9,56%)
0,006860098 seconds time elapsed
0,001364000 seconds user
0,015157000 seconds sys
worktree:
Performance counter stats for '../git/build/bin-wrappers/git diff':
0 context-switches:u # 0,0
cs/sec cs_per_second
0 cpu-migrations:u # 0,0
migrations/sec migrations_per_second
1 585 page-faults:u # 5058,0
faults/sec page_faults_per_second
313,37 msec task-clock:u # 0,9
CPUs CPUs_utilized
2 481 188 branch-misses:u # 1,5 %
branch_miss_rate (48,94%)
168 664 155 branches:u # 538,2
M/sec branch_frequency (51,21%)
1 004 095 217 cpu-cycles:u # 3,2
GHz cycles_frequency (67,74%)
3 864 851 223 instructions:u # 3,9
instructions insn_per_cycle (52,73%)
70 755 234 stalled-cycles-frontend:u # 0,07
frontend_cycles_idle (49,29%)
0,306707634 seconds time elapsed
0,269027000 seconds user
0,045512000 seconds sys
My observations:
- the worktree has ~twice as many page faults and
- executes ~150 times as many instructions (3.8b compared to 23m).
(When I try to run some "perf" stats as root to access other counters, like
syscalls, "git diff" in the worktree says "not a git repository", so I'm not
counting the actual behavior. Ditto with DTrace.)
PS I almost CC'd Peff and Patrick, whose names stood out in "git
shortlog builtin/{worktree,diff}* object-file* | sort -t\( -k2 -g",
but decided they'd be their own best judge of whether they can
understand what's going on? :)
--
D. Ben Knoble
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox