* [PATCH 0/3] git for-each-ref: is-base atom and base branches
@ 2024-08-01 22:10 Derrick Stolee via GitGitGadget
2024-08-01 22:10 ` [PATCH 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-01 22:10 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very
similar way to the 'ahead-behind' atom. As detailed carefully in the first
change, this is motivated by the need to detect the concept of a "base
branch" in a repository with multiple long-lived branches.
This change is motivated by a third-party tool that performs this detection
with the same optimization criterion, but uses a much slower technique
because the Git CLI does not expose this information. The existing algorithm
runs git rev-list --first-parent -<N> in batches for the collection of
considered references, compares those lists, and increases <N> as needed
until finding a collision. This new use of 'git for-each-ref' allows
determining the base branch within a single process while walking a minimal
number of commits.
There are benefits to users both on client-side and server-side. In an
internal monorepo, this base branch detection algorithm is used to determine
a long-lived branch based on the HEAD commit, mapping to a group within the
organizational structure of the repository, which determines a set of
projects that the user will likely need to build; this leads to
automatically selecting an initial sparse-checkout definition based on the
build dependencies required. An upcoming feature in Azure Repos will use
this algorithm to automatically create a pull request against the correct
target branch, reducing user pain from needing to select a different branch
after a large commit diff is rendered against the default branch. This atom
unlocks that ability for Git hosting services that use Git in their backend.
Thanks, -Stolee
Derrick Stolee (3):
commit-reach: add get_branch_base_for_tip
for-each-ref: add 'is-base' token
p1500: add is-base performance tests
commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++
commit-reach.h | 17 ++++++
ref-filter.c | 78 +++++++++++++++++++++++-
ref-filter.h | 15 +++++
t/helper/test-reach.c | 2 +
t/perf/p1500-graph-walks.sh | 31 ++++++++++
t/t6600-test-reach.sh | 94 ++++++++++++++++++++++++++++
7 files changed, 354 insertions(+), 1 deletion(-)
base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1768
--
gitgitgadget
* [PATCH 1/3] commit-reach: add get_branch_base_for_tip
From: Derrick Stolee via GitGitGadget @ 2024-08-01 22:10 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of such a branch can be recovered by
following the first-parent history in most cases. Most pull requests are
completed using no-fast-forward merges, though squash merges are quite
common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing). Even in this case, the previous commit commonly appears in the
first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
Among a list of candidate base branches, select the candidate that
minimizes the number of commits in the first-parent history of the source
that are not in the first-parent history of the candidate.
Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
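For illustration only, here is a minimal sketch of that list-comparison
approach on a toy graph. It is not code from this series. It assumes the grid
of commits (i,j) from t6600-test-reach.sh reduced to first-parent edges, where
(i,j) has first parent (i,j-1) and (i,1) has first parent (i-1,1). The names
COMMIT(), toy_first_parent(), exclusive_count() and brute_force_base() are
hypothetical.

```c
#include <stddef.h>

#define GRID 10 /* toy graph: commits (i,j) with 1 <= i,j <= GRID */
#define COMMIT(i, j) (((i) - 1) * GRID + ((j) - 1))

/* Toy first-parent rule: (i,j) -> (i,j-1), and (i,1) -> (i-1,1). */
static int toy_first_parent(int id)
{
	int i = id / GRID + 1, j = id % GRID + 1;

	if (j > 1)
		return COMMIT(i, j - 1);
	return i > 1 ? COMMIT(i - 1, 1) : -1;
}

/* Is 'target' in the first-parent history of 'from' (inclusive)? */
static int in_first_parent_history(int from, int target)
{
	for (int c = from; c >= 0; c = toy_first_parent(c))
		if (c == target)
			return 1;
	return 0;
}

/*
 * Count commits in the first-parent history of 'source' that are not in
 * the first-parent history of 'base'. Returns -1 if they never intersect.
 */
static int exclusive_count(int source, int base)
{
	int count = 0;

	for (int c = source; c >= 0; c = toy_first_parent(c)) {
		/* Once the chains meet, they share everything below. */
		if (in_first_parent_history(base, c))
			return count;
		count++;
	}
	return -1;
}

/* Pick the candidate with the smallest count; the lowest index wins ties. */
static int brute_force_base(int source, const int *bases, size_t bases_nr)
{
	int best = -1, best_count = -1;

	for (size_t k = 0; k < bases_nr; k++) {
		int count = exclusive_count(source, bases[k]);

		if (count < 0)
			continue;
		if (best < 0 || count < best_count) {
			best = (int)k;
			best_count = count;
		}
	}
	return best;
}
```

With source (2,3) and candidates (1,2), (1,4), (4,4), (8,4), (10,4), this
should select index 2, since (4,4) is the first candidate whose first-parent
history reaches (2,1). Every candidate costs its own pair of walks here, which
is the overhead a single commit-graph walk avoids.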
Given current command-line interface options, this optimization criterion is
not easy to compute directly. Even using the command
git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as their first parent, leading to multiple
updates of best_index. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
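As a rough illustration of the slab bookkeeping described above, the sketch
below replays the same walk on a toy graph. It is a simplified model and not
the code in this patch: plain arrays stand in for the commit slab and the
priority queue, the graph is the (i,j) grid from t6600-test-reach.sh reduced
to first-parent edges, and i + j - 1 stands in for the generation number. All
toy_* names are hypothetical.

```c
#include <stddef.h>

#define GRID 10 /* toy graph: commits (i,j) with 1 <= i,j <= GRID */
#define COMMIT(i, j) (((i) - 1) * GRID + ((j) - 1))
#define NR_COMMITS (GRID * GRID)

/* Toy first-parent rule: (i,j) -> (i,j-1), and (i,1) -> (i-1,1). */
static int toy_first_parent(int id)
{
	int i = id / GRID + 1, j = id % GRID + 1;

	if (j > 1)
		return COMMIT(i, j - 1);
	return i > 1 ? COMMIT(i - 1, 1) : -1;
}

/* Stand-in for a generation number: i + j - 1, larger than the parent's. */
static int toy_generation(int id)
{
	return id / GRID + id % GRID + 1;
}

/* Remove and return the queued commit with the highest generation, or -1. */
static int toy_pop(int *queued)
{
	int c = -1;

	for (int id = 0; id < NR_COMMITS; id++)
		if (queued[id] && (c < 0 || toy_generation(id) > toy_generation(c)))
			c = id;
	if (c >= 0)
		queued[c] = 0;
	return c;
}

static int toy_branch_base(int tip, const int *bases, size_t bases_nr)
{
	/* 0: not seen, -1: reachable from tip, k + 1: bases[k] is best. */
	int best[NR_COMMITS] = { 0 }, queued[NR_COMMITS] = { 0 };
	int best_index = -1, branch_point = -1, c;

	best[tip] = -1;
	queued[tip] = 1;
	for (size_t k = 0; k < bases_nr; k++) {
		if (best[bases[k]])
			continue; /* same commit as the tip or an earlier base */
		best[bases[k]] = (int)k + 1;
		queued[bases[k]] = 1;
	}

	while ((c = toy_pop(queued)) >= 0 && c != branch_point) {
		int p = toy_first_parent(c), positive;

		if (p < 0)
			continue; /* root commit */
		if (!best[p]) {
			/* New commit: inherit the marker and keep walking. */
			best[p] = best[c];
			queued[p] = 1;
			continue;
		}
		if (best[p] > 0 && best[c] > 0) {
			/* Two bases meet: keep the smaller index. */
			if (best[c] < best[p])
				best[p] = best[c];
			continue;
		}
		/* The tip history meets a base history at 'p'. */
		positive = best[c] < 0 ? best[p] : best[c];
		if (best_index < 0 || positive < best_index)
			best_index = positive;
		best[p] = -1;
		branch_point = p;
	}
	return best_index > 0 ? best_index - 1 : -1;
}
```

The walk stops as soon as the recorded branch point is popped, so commits
below the first collision are never visited. Because the queue is ordered by
generation, every commit that has the branch point as its first parent is
processed before the branch point itself, which is what allows 'best_index'
to settle on the minimum.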
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++++++++
commit-reach.h | 17 ++++++
t/helper/test-reach.c | 2 +
t/t6600-test-reach.sh | 47 +++++++++++++++++
4 files changed, 184 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index 8f9b008f876..1b56fb081a6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1222,3 +1222,121 @@ done:
free(commits);
repo_clear_commit_marks(r, SEEN);
}
+
+/*
+ * This slab initializes integers to zero, so use "-1" for "tip is best" and
+ * "i + 1" for "bases[i] is best".
+ */
+define_commit_slab(best_branch_base, int);
+static struct best_branch_base best_branch_base;
+#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
+
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr)
+{
+ int best_index = -1;
+ struct commit *branch_point = NULL;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ int found_missing_gen = 0;
+
+ if (!bases_nr)
+ return -1;
+
+ repo_parse_commit(r, tip);
+ if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+
+ /* Check for missing generation numbers. */
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+ repo_parse_commit(r, c);
+ if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+ }
+
+ if (found_missing_gen) {
+ struct commit **commits;
+ size_t commits_nr = bases_nr + 1;
+
+ CALLOC_ARRAY(commits, commits_nr);
+ COPY_ARRAY(commits, bases, bases_nr);
+ commits[bases_nr] = tip;
+ ensure_generations_valid(r, commits, commits_nr);
+ free(commits);
+ }
+
+ /* Initialize queue and slab now that generations are guaranteed. */
+ init_best_branch_base(&best_branch_base);
+ set_best(tip, -1);
+ prio_queue_put(&queue, tip);
+
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+
+ /* Has this already been marked as best by another commit? */
+ if (get_best(c))
+ continue;
+
+ set_best(c, i + 1);
+ prio_queue_put(&queue, c);
+ }
+
+ while (queue.nr) {
+ struct commit *c = prio_queue_get(&queue);
+ int best_for_c = get_best(c);
+ int best_for_p, positive;
+ struct commit *parent;
+
+ /* Have we reached a known branch point? It's optimal. */
+ if (c == branch_point)
+ break;
+
+ repo_parse_commit(r, c);
+ if (!c->parents)
+ continue;
+
+ parent = c->parents->item;
+ repo_parse_commit(r, parent);
+ best_for_p = get_best(parent);
+
+ if (!best_for_p) {
+ /* 'parent' is new, so pass along best_for_c. */
+ set_best(parent, best_for_c);
+ prio_queue_put(&queue, parent);
+ continue;
+ }
+
+ if (best_for_p > 0 && best_for_c > 0) {
+ /* Collision among bases. Minimize. */
+ if (best_for_c < best_for_p)
+ set_best(parent, best_for_c);
+ continue;
+ }
+
+ /*
+ * At this point, we have reached a commit that is reachable
+ * from the tip, either from 'c' or from an earlier commit to
+ * have 'parent' as its first parent.
+ *
+ * Update 'best_index' to match the minimum of all base indices
+ * to reach 'parent'.
+ */
+
+ /* Exactly one is positive due to initial conditions. */
+ positive = (best_for_c < 0) ? best_for_p : best_for_c;
+
+ if (best_index < 0 || positive < best_index)
+ best_index = positive;
+
+ /* No matter what, track that the parent is reachable from tip. */
+ set_best(parent, -1);
+ branch_point = parent;
+ }
+
+ clear_best_branch_base(&best_branch_base);
+ clear_prio_queue(&queue);
+ return best_index > 0 ? best_index - 1 : -1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index bf63cc468fd..9a745b7e176 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -139,4 +139,21 @@ void tips_reachable_from_bases(struct repository *r,
struct commit **tips, size_t tips_nr,
int mark);
+/*
+ * Given a 'tip' commit and a list of potential 'bases', return the index 'i'
+ * that minimizes the number of commits in the first-parent history of 'tip'
+ * and not in the first-parent history of 'bases[i]'.
+ *
+ * Among a list of long-lived branches that are updated only by merges (with the
+ * first parent being the previous position of the branch), this would inform
+ * which branch was used to create the tip reference.
+ *
+ * Returns -1 if no common point is found in first-parent histories, which is
+ * rare, but possible with multiple root commits.
+ */
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr);
+
#endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 1e3b431e3e7..8579b607aa5 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -114,6 +114,8 @@ int cmd__reach(int ac, const char **av)
repo_in_merge_bases_many(the_repository, A, X_nr, X_array, 0));
else if (!strcmp(av[1], "is_descendant_of"))
printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X));
+ else if (!strcmp(av[1], "get_branch_base_for_tip"))
+ printf("%s(A,X):%d\n", av[1], get_branch_base_for_tip(r, A, X_array, X_nr));
else if (!strcmp(av[1], "get_merge_bases_many")) {
struct commit_list *list = NULL;
if (repo_get_merge_bases_many(the_repository,
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b330945f497..3069efc8601 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -612,4 +612,51 @@ test_expect_success 'for-each-ref merged:none' '
--format="%(refname)" --stdin
'
+# For get_branch_base_for_tip, we only care about
+# first-parent history. Here is the test graph with
+# second parents removed:
+#
+#               (10,10)
+#              /
+#         (10,9)    (9,10)
+#        /         /
+#    (10,8)    (9,9)      (8,10)
+#   /         /          /
+#         ( continued...)
+#   \        /          /
+#   (3,1)   (2,2)      (1,3)
+#       \   /          /
+#       (2,1)      (1,2)
+#          \       /
+#            (1,1)
+#
+# In short, for a commit (i,j), the first-parent history
+# walks all commits (i, k) with k from j to 1, then the
+# commits (l, 1) with l from i to 1.
+
+test_expect_success 'get_branch_base_for_tip: none reach' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
+ A:commit-2-3
+ X:commit-1-2
+ X:commit-1-4
+ X:commit-4-4
+ X:commit-8-4
+ X:commit-10-4
+ EOF
+ echo "get_branch_base_for_tip(A,X):2" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
+test_expect_success 'get_branch_base_for_tip: all reach tip' '
+ # (4,1) is in the first-parent history of every tip in X; index 0 wins the tie
+ cat >input <<-\EOF &&
+ A:commit-4-1
+ X:commit-4-2
+ X:commit-5-1
+ EOF
+ echo "get_branch_base_for_tip(A,X):0" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
test_done
--
gitgitgadget
* [PATCH 2/3] for-each-ref: add 'is-base' token
From: Derrick Stolee via GitGitGadget @ 2024-08-01 22:10 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. The motivation of that change was to use a heuristic to
determine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
change in 49abcd21da6 (for-each-ref: add ahead-behind format atom,
2023-03-20).
Introduce the 'is-base:<source>' atom, which requests that the algorithm be
run with <source> as the tip and the listed refs as the candidate bases. The
ref chosen as the best base reports an indicator of the form '(<source>)',
and every other ref reports an empty string. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.
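To make the output shape concrete, here is a small hypothetical sketch of how
one output line looks for a format such as "%(refname):%(is-base:<source>)".
It only models the string that is printed and is not the ref-filter code;
toy_format_line() and its parameters are invented for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "<refname>:(<source>)" when this ref was
 * chosen as the best base for <source>, and "<refname>:" otherwise.
 */
static void toy_format_line(char *out, size_t out_len, const char *refname,
			    const char *source, int is_best_base)
{
	if (is_best_base)
		snprintf(out, out_len, "%s:(%s)", refname, source);
	else
		snprintf(out, out_len, "%s:", refname);
}
```

Under these assumptions, a ref selected as the base for 'commit-2-3' would
print as 'refs/heads/commit-4-2:(commit-2-3)', and any other ref would print
only its name followed by the colon.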
Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refspecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.
It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f9774e1
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.
In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
ref-filter.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
ref-filter.h | 15 +++++++++
t/t6600-test-reach.sh | 47 ++++++++++++++++++++++++++
3 files changed, 139 insertions(+), 1 deletion(-)
diff --git a/ref-filter.c b/ref-filter.c
index 59ad6f54ddb..59689672da1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -167,6 +167,7 @@ enum atom_type {
ATOM_ELSE,
ATOM_REST,
ATOM_AHEADBEHIND,
+ ATOM_ISBASE,
};
/*
@@ -889,6 +890,23 @@ static int ahead_behind_atom_parser(struct ref_format *format,
return 0;
}
+static int is_base_atom_parser(struct ref_format *format,
+ struct used_atom *atom UNUSED,
+ const char *arg, struct strbuf *err)
+{
+ struct string_list_item *item;
+
+ if (!arg)
+ return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:<committish>)"));
+
+ item = string_list_append(&format->is_base_tips, arg);
+ item->util = lookup_commit_reference_by_name(arg);
+ if (!item->util)
+ die("failed to find '%s'", arg);
+
+ return 0;
+}
+
static int head_atom_parser(struct ref_format *format UNUSED,
struct used_atom *atom,
const char *arg, struct strbuf *err)
@@ -952,6 +970,7 @@ static struct {
[ATOM_ELSE] = { "else", SOURCE_NONE },
[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
+ [ATOM_ISBASE] = { "is-base", SOURCE_OTHER, FIELD_STR, is_base_atom_parser },
/*
* Please update $__git_ref_fieldlist in git-completion.bash
* when you add new atoms
@@ -2334,6 +2353,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
int i;
struct object_info empty = OBJECT_INFO_INIT;
int ahead_behind_atoms = 0;
+ int is_base_atoms = 0;
CALLOC_ARRAY(ref->value, used_atom_cnt);
@@ -2475,6 +2495,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
v->s = xstrdup("");
}
continue;
+ } else if (atom_type == ATOM_ISBASE) {
+ if (ref->is_base && ref->is_base[is_base_atoms]) {
+ v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
+ free(ref->is_base[is_base_atoms]);
+ } else {
+ /* Not a commit. */
+ v->s = xstrdup("");
+ }
+ is_base_atoms++;
+ continue;
} else
continue;
@@ -2876,6 +2906,7 @@ static void free_array_item(struct ref_array_item *item)
free(item->value);
}
free(item->counts);
+ free(item->is_base);
free(item);
}
@@ -3040,6 +3071,49 @@ void filter_ahead_behind(struct repository *r,
free(commits);
}
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array)
+{
+ struct commit **bases;
+ size_t bases_nr = 0;
+ struct ref_array_item **back_index;
+
+ if (!format->is_base_tips.nr || !array->nr)
+ return;
+
+ CALLOC_ARRAY(back_index, array->nr);
+ CALLOC_ARRAY(bases, array->nr);
+
+ for (size_t i = 0; i < array->nr; i++) {
+ const char *name = array->items[i]->refname;
+ struct commit *c = lookup_commit_reference_by_name(name);
+
+ CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+
+ if (!c)
+ continue;
+
+ back_index[bases_nr] = array->items[i];
+ bases[bases_nr] = c;
+ bases_nr++;
+ }
+
+ for (size_t i = 0; i < format->is_base_tips.nr; i++) {
+ struct commit *tip = format->is_base_tips.items[i].util;
+ int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
+
+ if (base_index < 0)
+ continue;
+
+ /* Store the string for use in output later. */
+ back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string);
+ }
+
+ free(back_index);
+ free(bases);
+}
+
static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
{
int ret = 0;
@@ -3126,7 +3200,8 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
return !(filter->reachable_from ||
filter->unreachable_from ||
sorting ||
- format->bases.nr);
+ format->bases.nr ||
+ format->is_base_tips.nr);
}
void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
@@ -3150,6 +3225,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
struct ref_array array = { 0 };
filter_refs(&array, filter, type);
filter_ahead_behind(the_repository, format, &array);
+ filter_is_base(the_repository, format, &array);
ref_array_sort(sorting, &array);
print_formatted_ref_array(&array, format);
ref_array_clear(&array);
diff --git a/ref-filter.h b/ref-filter.h
index 0ca28d2bba6..20419a56218 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -48,6 +48,7 @@ struct ref_array_item {
struct commit *commit;
struct atom_value *value;
struct ahead_behind_count **counts;
+ char **is_base;
char refname[FLEX_ARRAY];
};
@@ -101,6 +102,9 @@ struct ref_format {
/* List of bases for ahead-behind counts. */
struct string_list bases;
+ /* List of bases for is-base indicators. */
+ struct string_list is_base_tips;
+
struct {
int max_count;
int omit_empty;
@@ -114,6 +118,7 @@ struct ref_format {
#define REF_FORMAT_INIT { \
.use_color = -1, \
.bases = STRING_LIST_INIT_DUP, \
+ .is_base_tips = STRING_LIST_INIT_DUP, \
}
/* Macros for checking --merged and --no-merged options */
@@ -203,6 +208,16 @@ void filter_ahead_behind(struct repository *r,
struct ref_format *format,
struct ref_array *array);
+/*
+ * If the provided format includes is-base atoms, then compute the base checks
+ * for those tips against all refs.
+ *
+ * If this is not called, then any is-base atoms will be blank.
+ */
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array);
+
void ref_filter_init(struct ref_filter *filter);
void ref_filter_clear(struct ref_filter *filter);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 3069efc8601..6c7f92bcb38 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -659,4 +659,51 @@ test_expect_success 'get_branch_base_for_tip: all reach tip' '
test_all_modes get_branch_base_for_tip
'
+test_expect_success 'for-each-ref is-base: none reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1:
+ refs/heads/commit-4-2:(commit-2-3)
+ refs/heads/commit-4-4:
+ refs/heads/commit-8-4:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-2-3)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base: all reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-4-2
+ refs/heads/commit-5-1
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-4-2:(commit-4-1)
+ refs/heads/commit-5-1:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-4-1)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base:multiple' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1[-]
+ refs/heads/commit-4-2[(commit-2-3)-]
+ refs/heads/commit-4-4[-]
+ refs/heads/commit-8-4[-(commit-6-5)]
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin
+'
+
test_done
--
gitgitgadget
* [PATCH 3/3] p1500: add is-base performance tests
From: Derrick Stolee via GitGitGadget @ 2024-08-01 22:10 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.
This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.
The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test that scenario. Do create a new root commit in an
effort to test the scenario of parallel first-parent histories.
Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:
Test
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref 0.12
1500.3: ahead-behind counts: git branch 0.12
1500.4: ahead-behind counts: git tag 0.12
1500.5: contains: git for-each-ref --merged 0.04
1500.6: contains: git branch --merged 0.04
1500.7: contains: git tag --merged 0.04
1500.8: is-base check: test-tool reach (refs) 0.03
1500.9: is-base check: test-tool reach (tags) 0.03
1500.10: is-base check: git for-each-ref 0.03
1500.11: is-base check: git for-each-ref (disjoint-base) 0.07
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/perf/p1500-graph-walks.sh | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index e14e7620cce..5b23ce5db93 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -20,6 +20,21 @@ test_expect_success 'setup' '
echo tag-$ref ||
return 1
done >tags &&
+
+ echo "A:HEAD" >test-tool-refs &&
+ for line in $(cat refs)
+ do
+ echo "X:$line" >>test-tool-refs || return 1
+ done &&
+ echo "A:HEAD" >test-tool-tags &&
+ for line in $(cat tags)
+ do
+ echo "X:$line" >>test-tool-tags || return 1
+ done &&
+
+ commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
+ git update-ref refs/heads/disjoint-base $commit &&
+
git commit-graph write --reachable
'
@@ -47,4 +62,20 @@ test_perf 'contains: git tag --merged' '
xargs git tag --merged=HEAD <tags
'
+test_perf 'is-base check: test-tool reach (refs)' '
+ test-tool reach get_branch_base_for_tip <test-tool-refs
+'
+
+test_perf 'is-base check: test-tool reach (tags)' '
+ test-tool reach get_branch_base_for_tip <test-tool-tags
+'
+
+test_perf 'is-base check: git for-each-ref' '
+ git for-each-ref --format="%(is-base:HEAD)" --stdin <refs
+'
+
+test_perf 'is-base check: git for-each-ref (disjoint-base)' '
+ git for-each-ref --format="%(is-base:refs/heads/disjoint-base)" --stdin <refs
+'
+
test_done
--
gitgitgadget
* Re: [PATCH 0/3] git for-each-ref: is-base atom and base branches
From: Junio C Hamano @ 2024-08-01 23:06 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, vdye, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Derrick Stolee (3):
> commit-reach: add get_branch_base_for_tip
> for-each-ref: add 'is-base' token
> p1500: add is-base performance tests
>
> commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++
> commit-reach.h | 17 ++++++
> ref-filter.c | 78 +++++++++++++++++++++++-
> ref-filter.h | 15 +++++
> t/helper/test-reach.c | 2 +
> t/perf/p1500-graph-walks.sh | 31 ++++++++++
> t/t6600-test-reach.sh | 94 ++++++++++++++++++++++++++++
> 7 files changed, 354 insertions(+), 1 deletion(-)
I was expecting to see a documentation update to for-each-ref (and
probably branch and tag) that explains what this new atom means. Is it
that %(is-base:<commit>) interpolates to <commit> for a ref that is an
ancestor of <commit>, and interpolates to an empty string for a ref
that is not, or something?
Thanks.
* Re: [PATCH 0/3] git for-each-ref: is-base atom and base branches
2024-08-01 23:06 ` [PATCH 0/3] git for-each-ref: is-base atom and base branches Junio C Hamano
@ 2024-08-02 14:32 ` Derrick Stolee
2024-08-02 16:55 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2024-08-02 14:32 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, vdye
On 8/1/24 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Derrick Stolee (3):
>> commit-reach: add get_branch_base_for_tip
>> for-each-ref: add 'is-base' token
>> p1500: add is-base performance tests
>>
>> commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++
>> commit-reach.h | 17 ++++++
>> ref-filter.c | 78 +++++++++++++++++++++++-
>> ref-filter.h | 15 +++++
>> t/helper/test-reach.c | 2 +
>> t/perf/p1500-graph-walks.sh | 31 ++++++++++
>> t/t6600-test-reach.sh | 94 ++++++++++++++++++++++++++++
>> 7 files changed, 354 insertions(+), 1 deletion(-)
>
> I was expecting to see a documentation update to for-each-ref (and
> probably branch and tag) that explains what this new atom means. Is it
> that %(is-base:<commit>) interpolates to <commit> for a ref that is an
> ancestor of <commit>, and interpolates to an empty string for a ref
> that is not, or something?
You are absolutely right that I missed this crucial detail. I will
eventually send a v2 with this oversight corrected. For now, please
consider this documentation diff, and I look forward to other review
comments that I can use to improve this series before sending v2.
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..5154ba3e2a7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,16 @@ ahead-behind:<committish>::
commits ahead and behind, respectively, when comparing the output
ref to the `<committish>` specified in the format.
+is-base:<committish>::
+ In at most one row, `(<committish>)` will appear to indicate the ref
+ that minimizes the number of commits in the first-parent history of
+ `<committish>` and not in the first-parent history of the ref. Ties
+ are broken by favoring the earliest ref in the list. Note that this
+ token will not appear if the first-parent history of `<committish>`
+ does not intersect the first-parent histories of the filtered refs.
+ This can be used as a heuristic to guess which of the filtered refs
+ was used as the base of the branch that produced `<committish>`.
+
describe[:options]::
A human-readable name, like linkgit:git-describe[1];
empty string for undescribable commits. The `describe` string may
* Re: [PATCH 0/3] git for-each-ref: is-base atom and base branches
2024-08-02 14:32 ` Derrick Stolee
@ 2024-08-02 16:55 ` Junio C Hamano
2024-08-02 17:30 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2024-08-02 16:55 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, vdye
Derrick Stolee <stolee@gmail.com> writes:
> consider this documentation diff, and I look forward to other review
> comments that I can use to improve this series before sending v2.
>
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index c1dd12b93cf..5154ba3e2a7 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -264,6 +264,16 @@ ahead-behind:<committish>::
> commits ahead and behind, respectively, when comparing the output
> ref to the `<committish>` specified in the format.
>
> +is-base:<committish>::
> + In at most one row, `(<committish>)` will appear to indicate the ref
> + that minimizes the number of commits in the first-parent history of
> + `<committish>` and not in the first-parent history of the ref. Ties
> + are broken by favoring the earliest ref in the list. Note that this
> + token will not appear if the first-parent history of `<committish>`
> + does not intersect the first-parent histories of the filtered refs.
> + This can be used as a heuristic to guess which of the filtered refs
> + was used as the base of the branch that produced `<committish>`.
> +
OK. Knowing what definition you used is crucial when reading the
implementation, as we cannot tell what you wanted to implement
without it ;-)
Thanks.
* Re: [PATCH 0/3] git for-each-ref: is-base atom and base branches
2024-08-02 16:55 ` Junio C Hamano
@ 2024-08-02 17:30 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2024-08-02 17:30 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, vdye
Junio C Hamano <gitster@pobox.com> writes:
>> +is-base:<committish>::
>> + In at most one row, `(<committish>)` will appear to indicate the ref
>> + that minimizes the number of commits in the first-parent history of
>> + `<committish>` and not in the first-parent history of the ref.
This was a bit too dense for me to grok. So if I have a <commit>
that is at the tip of a branch B forked from 'master', and then
'master' advanced by a lot since the branch forked, the number this
is minimizing for 'master' is the commits on the branch B, but when
showing 'maint', then even though the branch B may have the tip of
'maint' as an ancestor, the number for 'maint' would be a lot more
than the number for 'master'. If there were another branch C that
was forked from 'master' and shared some (or all) commits that are
near the tip of branch B, e.g.
    ---o---o---o---o---o---o---o---o---o 'master'
            \
             o---o---o---o 'C'
                  \
                   o---o---o---o 'B'
then the number may be even smaller for branch 'C' than 'master'.
And for at most one ref, %(is-base:<commit>) becomes "(<commit>)";
for all other refs, it becomes an empty string.
OK.
> OK. Knowing what definition you used is crucial when reading the
> implementation, as we cannot tell what you wanted to implement
> without it ;-)
>
> Thanks.
* [PATCH v2 0/3] git for-each-ref: is-base atom and base branches
2024-08-01 22:10 [PATCH 0/3] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2024-08-01 23:06 ` [PATCH 0/3] git for-each-ref: is-base atom and base branches Junio C Hamano
@ 2024-08-11 17:34 ` Derrick Stolee via GitGitGadget
2024-08-11 17:34 ` [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
` (3 more replies)
4 siblings, 4 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-11 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very
similar way to the 'ahead-behind' atom. As detailed carefully in the first
change, this is motivated by the need to detect the concept of a "base
branch" in a repository with multiple long-lived branches.
This change is motivated by a third-party tool that performs this
detection using the same optimization criterion, but with a much slower
technique because the Git CLI does not expose this information. The
existing algorithm runs 'git rev-list --first-parent -<N>' in batches for
the collection of considered references, compares those lists, and
increases <N> as needed until finding a collision. This new use of
'git for-each-ref' allows computing the result within a single process
while walking a minimal number of commits.
There are benefits to users both on client-side and server-side. In an
internal monorepo, this base branch detection algorithm is used to determine
a long-lived branch based on the HEAD commit, mapping to a group within the
organizational structure of the repository, which determines a set of
projects that the user will likely need to build; this leads to
automatically selecting an initial sparse-checkout definition based on the
build dependencies required. An upcoming feature in Azure Repos will use
this algorithm to automatically create a pull request against the correct
target branch, reducing user pain from needing to select a different branch
after a large commit diff is rendered against the default branch. This atom
unlocks that ability for Git hosting services that use Git in their backend.
Thanks, -Stolee
Updates in v2
=============
* I had forgotten to include a documentation change in v1. My attempt to
create a succinct doc change in a follow-up hunk continued to be
confusing. This version includes a more expanded version of the
documentation blurb for the is-base token.
Derrick Stolee (3):
commit-reach: add get_branch_base_for_tip
for-each-ref: add 'is-base' token
p1500: add is-base performance tests
Documentation/git-for-each-ref.txt | 42 ++++++++++
commit-reach.c | 118 +++++++++++++++++++++++++++++
commit-reach.h | 17 +++++
ref-filter.c | 78 ++++++++++++++++++-
ref-filter.h | 15 ++++
t/helper/test-reach.c | 2 +
t/perf/p1500-graph-walks.sh | 31 ++++++++
t/t6600-test-reach.sh | 94 +++++++++++++++++++++++
8 files changed, 396 insertions(+), 1 deletion(-)
base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1768
Range-diff vs v1:
1: 580026f910d = 1: 580026f910d commit-reach: add get_branch_base_for_tip
2: a1fbdca374f ! 2: 13341e7e512 for-each-ref: add 'is-base' token
@@ Commit message
Signed-off-by: Derrick Stolee <stolee@gmail.com>
+ ## Documentation/git-for-each-ref.txt ##
+@@ Documentation/git-for-each-ref.txt: ahead-behind:<committish>::
+ commits ahead and behind, respectively, when comparing the output
+ ref to the `<committish>` specified in the format.
+
++is-base:<committish>::
++ In at most one row, `(<committish>)` will appear to indicate the ref
++ that is most likely the ref used as a starting point for the branch
++ that produced `<committish>`. This choice is made using a heuristic:
++ choose the ref that minimizes the number of commits in the
++ first-parent history of `<committish>` and not in the first-parent
++ history of the ref.
+++
++For example, consider the following figure of first-parent histories of
++several refs:
+++
++----
++*--*--*--*--*--* refs/heads/A
++\
++ \
++  *--*--*--* refs/heads/B
++   \     \
++    \     \
++     *     * refs/heads/C
++      \
++       \
++        *--* refs/heads/D
++----
+++
++Here, if `A`, `B`, and `C` are the filtered references, and the format
++string is `%(refname):%(is-base:D)`, then the output would be
+++
++----
++refs/heads/A:
++refs/heads/B:(D)
++refs/heads/C:
++----
+++
++This is because the first-parent history of `D` has its earliest
++intersection with the first-parent histories of the filtered refs at a
++common first-parent ancestor of `B` and `C` and ties are broken by the
++earliest ref in the sorted order.
+++
++Note that this token will not appear if the first-parent history of
++`<committish>` does not intersect the first-parent histories of the
++filtered refs.
++
+ describe[:options]::
+ A human-readable name, like linkgit:git-describe[1];
+ empty string for undescribable commits. The `describe` string may
+
## ref-filter.c ##
@@ ref-filter.c: enum atom_type {
ATOM_ELSE,
3: db87434e146 = 3: 757c20090db p1500: add is-base performance tests
--
gitgitgadget
* [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip
2024-08-11 17:34 ` [PATCH v2 " Derrick Stolee via GitGitGadget
@ 2024-08-11 17:34 ` Derrick Stolee via GitGitGadget
2024-08-12 20:30 ` Junio C Hamano
2024-08-11 17:34 ` [PATCH v2 2/3] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
3 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-11 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one
or more "protected" branches, the history of such a branch can be recovered
by following the first-parent history in most cases. Most pull requests are
completed using no-fast-forward merges, though squash merges are quite
common. Less common is rebase-and-merge, which still validates this
assumption. Finally, the case that breaks this assumption is the
fast-forward update (with potential rebasing). Even in this case, the
previous commit commonly appears in the first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
Among a list of candidate base branches, select the candidate that
minimizes the number of commits in the first-parent history of the source
that are not in the first-parent history of the candidate.
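As a rough illustration of this heuristic (not part of the patch), a
brute-force version can be sketched on a toy model. The model is an
assumption made only for this example: commits are small integers and
'first_parent[c]' is the first parent of commit c, or -1 for a root commit.
All function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if 'target' is in the first-parent history of 'start'. */
static int in_first_parent_history(const int *first_parent,
				   int start, int target)
{
	for (int c = start; c >= 0; c = first_parent[c])
		if (c == target)
			return 1;
	return 0;
}

/*
 * Count the commits in the first-parent history of 'source' that are not
 * in the first-parent history of 'candidate'. Once the two histories meet
 * they stay merged, so it is enough to count up to the first shared commit.
 * Returns -1 if the two histories never intersect.
 */
static int first_parent_distance(const int *first_parent,
				 int source, int candidate)
{
	int count = 0;

	for (int c = source; c >= 0; c = first_parent[c]) {
		if (in_first_parent_history(first_parent, candidate, c))
			return count;
		count++;
	}
	return -1;
}

/*
 * Return the index of the candidate with the smallest distance, preferring
 * the earliest candidate on ties, or -1 if no candidate intersects.
 */
static int pick_base_brute_force(const int *first_parent, int source,
				 const int *candidates, size_t nr)
{
	int best = -1, best_count = -1;

	assert(nr == 0 || candidates != NULL);
	for (size_t i = 0; i < nr; i++) {
		int count = first_parent_distance(first_parent, source,
						  candidates[i]);
		if (count < 0)
			continue;
		if (best < 0 || count < best_count) {
			best = (int)i;
			best_count = count;
		}
	}
	return best;
}
```

This sketch rescans a whole history for every candidate, which is roughly
the cost profile of comparing extracted first-parent lists. It is meant
only to pin down the quantity being minimized.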
Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
Given current command-line interface options, this optimization criterion
is not easy to detect directly. Even using the command
git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have 'branch_point' as their first parent, leading to multiple
updates of 'best_index'. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
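To make the walk and the slab encoding concrete, here is a minimal sketch
on a toy model; it is not the code in this patch. It assumes commits are
integers in [0, nr_commits) with 'first_parent[c] < c' for every non-root
commit, so that scanning ids in decreasing order stands in for the
generation-ordered priority queue. The name 'toy_branch_base_for_tip' and
the MAX_COMMITS bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMMITS 64

/*
 * best[c] mirrors the integer slab: 0 means unseen, -1 means "reachable
 * from the tip", and i + 1 means "bases[i] is the best base to reach c".
 */
static int toy_branch_base_for_tip(const int *first_parent, int nr_commits,
				   int tip, const int *bases, size_t bases_nr)
{
	int best[MAX_COMMITS];
	int queued[MAX_COMMITS];
	int best_index = -1, branch_point = -1;

	assert(nr_commits <= MAX_COMMITS);
	memset(best, 0, sizeof(best));
	memset(queued, 0, sizeof(queued));

	best[tip] = -1;
	queued[tip] = 1;
	for (size_t i = 0; i < bases_nr; i++) {
		if (best[bases[i]])
			continue; /* already marked by the tip or an earlier base */
		best[bases[i]] = (int)i + 1;
		queued[bases[i]] = 1;
	}

	/* Decreasing id order visits every child before its first parent. */
	for (int c = nr_commits - 1; c >= 0; c--) {
		int parent, best_for_c, best_for_p, positive;

		if (!queued[c])
			continue;
		if (c == branch_point)
			break; /* every commit that could reach it was visited */

		parent = first_parent[c];
		if (parent < 0)
			continue;
		best_for_c = best[c];
		best_for_p = best[parent];

		if (!best_for_p) {
			/* 'parent' is new, so pass along best_for_c. */
			best[parent] = best_for_c;
			queued[parent] = 1;
			continue;
		}
		if (best_for_p > 0 && best_for_c > 0) {
			/* Collision among bases: keep the smaller index. */
			if (best_for_c < best_for_p)
				best[parent] = best_for_c;
			continue;
		}

		/* The tip side and a base side meet at 'parent'. */
		positive = (best_for_c < 0) ? best_for_p : best_for_c;
		if (best_index < 0 || positive < best_index)
			best_index = positive;
		best[parent] = -1;
		branch_point = parent;
	}

	return best_index > 0 ? best_index - 1 : -1;
}
```

The fixed-size arrays replace the commit slab and priority queue purely to
keep the example self-contained; the control flow is intended to follow
the description above.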
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++++++++
commit-reach.h | 17 ++++++
t/helper/test-reach.c | 2 +
t/t6600-test-reach.sh | 47 +++++++++++++++++
4 files changed, 184 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index 8f9b008f876..1b56fb081a6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1222,3 +1222,121 @@ done:
free(commits);
repo_clear_commit_marks(r, SEEN);
}
+
+/*
+ * This slab initializes integers to zero, so use "-1" for "tip is best" and
+ * "i + 1" for "bases[i] is best".
+ */
+define_commit_slab(best_branch_base, int);
+static struct best_branch_base best_branch_base;
+#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
+
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr)
+{
+ int best_index = -1;
+ struct commit *branch_point = NULL;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ int found_missing_gen = 0;
+
+ if (!bases_nr)
+ return -1;
+
+ repo_parse_commit(r, tip);
+ if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+
+ /* Check for missing generation numbers. */
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+ repo_parse_commit(r, c);
+ if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+ }
+
+ if (found_missing_gen) {
+ struct commit **commits;
+ size_t commits_nr = bases_nr + 1;
+
+ CALLOC_ARRAY(commits, commits_nr);
+ COPY_ARRAY(commits, bases, bases_nr);
+ commits[bases_nr] = tip;
+ ensure_generations_valid(r, commits, commits_nr);
+ free(commits);
+ }
+
+ /* Initialize queue and slab now that generations are guaranteed. */
+ init_best_branch_base(&best_branch_base);
+ set_best(tip, -1);
+ prio_queue_put(&queue, tip);
+
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+
+ /* Has this already been marked as best by another commit? */
+ if (get_best(c))
+ continue;
+
+ set_best(c, i + 1);
+ prio_queue_put(&queue, c);
+ }
+
+ while (queue.nr) {
+ struct commit *c = prio_queue_get(&queue);
+ int best_for_c = get_best(c);
+ int best_for_p, positive;
+ struct commit *parent;
+
+ /* Have we reached a known branch point? It's optimal. */
+ if (c == branch_point)
+ break;
+
+ repo_parse_commit(r, c);
+ if (!c->parents)
+ continue;
+
+ parent = c->parents->item;
+ repo_parse_commit(r, parent);
+ best_for_p = get_best(parent);
+
+ if (!best_for_p) {
+ /* 'parent' is new, so pass along best_for_c. */
+ set_best(parent, best_for_c);
+ prio_queue_put(&queue, parent);
+ continue;
+ }
+
+ if (best_for_p > 0 && best_for_c > 0) {
+ /* Collision among bases. Minimize. */
+ if (best_for_c < best_for_p)
+ set_best(parent, best_for_c);
+ continue;
+ }
+
+ /*
+ * At this point, we have reached a commit that is reachable
+ * from the tip, either from 'c' or from an earlier commit to
+ * have 'parent' as its first parent.
+ *
+ * Update 'best_index' to match the minimum of all base indices
+ * to reach 'parent'.
+ */
+
+ /* Exactly one is positive due to initial conditions. */
+ positive = (best_for_c < 0) ? best_for_p : best_for_c;
+
+ if (best_index < 0 || positive < best_index)
+ best_index = positive;
+
+ /* No matter what, track that the parent is reachable from tip. */
+ set_best(parent, -1);
+ branch_point = parent;
+ }
+
+ clear_best_branch_base(&best_branch_base);
+ clear_prio_queue(&queue);
+ return best_index > 0 ? best_index - 1 : -1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index bf63cc468fd..9a745b7e176 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -139,4 +139,21 @@ void tips_reachable_from_bases(struct repository *r,
struct commit **tips, size_t tips_nr,
int mark);
+/*
+ * Given a 'tip' commit and a list of potential 'bases', return the index 'i' that
+ * minimizes the number of commits in the first-parent history of 'tip' and not
+ * in the first-parent history of 'bases[i]'.
+ *
+ * Among a list of long-lived branches that are updated only by merges (with the
+ * first parent being the previous position of the branch), this would inform
+ * which branch was used to create the tip reference.
+ *
+ * Returns -1 if no common point is found in first-parent histories, which is
+ * rare, but possible with multiple root commits.
+ */
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr);
+
#endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 1e3b431e3e7..8579b607aa5 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -114,6 +114,8 @@ int cmd__reach(int ac, const char **av)
repo_in_merge_bases_many(the_repository, A, X_nr, X_array, 0));
else if (!strcmp(av[1], "is_descendant_of"))
printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X));
+ else if (!strcmp(av[1], "get_branch_base_for_tip"))
+ printf("%s(A,X):%d\n", av[1], get_branch_base_for_tip(r, A, X_array, X_nr));
else if (!strcmp(av[1], "get_merge_bases_many")) {
struct commit_list *list = NULL;
if (repo_get_merge_bases_many(the_repository,
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b330945f497..3069efc8601 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -612,4 +612,51 @@ test_expect_success 'for-each-ref merged:none' '
--format="%(refname)" --stdin
'
+# For get_branch_base_for_tip, we only care about
+# first-parent history. Here is the test graph with
+# second parents removed:
+#
+# (10,10)
+# /
+# (10,9) (9,10)
+# / /
+# (10,8) (9,9) (8,10)
+# / / /
+# ( continued...)
+# \ / / /
+# (3,1) (2,2) (1,3)
+# \ / /
+# (2,1) (1,2)
+# \ /
+# (1,1)
+#
+# In short, for a commit (i,j), the first-parent history
+# walks all commits (i, k) with k from j to 1, then the
+# commits (l, 1) with l from i to 1.
+
+test_expect_success 'get_branch_base_for_tip: none reach' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
+ A:commit-2-3
+ X:commit-1-2
+ X:commit-1-4
+ X:commit-4-4
+ X:commit-8-4
+ X:commit-10-4
+ EOF
+ echo "get_branch_base_for_tip(A,X):2" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
+test_expect_success 'get_branch_base_for_tip: all reach tip' '
+ # (4,1) is the first parent of both bases in X; the tie goes to index 0
+ cat >input <<-\EOF &&
+ A:commit-4-1
+ X:commit-4-2
+ X:commit-5-1
+ EOF
+ echo "get_branch_base_for_tip(A,X):0" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
test_done
--
gitgitgadget
* [PATCH v2 2/3] for-each-ref: add 'is-base' token
2024-08-11 17:34 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-08-11 17:34 ` [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
@ 2024-08-11 17:34 ` Derrick Stolee via GitGitGadget
2024-08-12 21:05 ` Junio C Hamano
2024-08-11 17:34 ` [PATCH v2 3/3] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
3 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-11 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. The motivation for that change was to use a heuristic to
determine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
change in 49abcd21da6 (for-each-ref: add ahead-behind format atom,
2023-03-20).
Introduce the 'is-base:<source>' atom, which will indicate that the
algorithm should be computed and the result of the algorithm is reported
using an indicator of the form '(<source>)'. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.
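As a small sketch of that output rule (not the ref-filter.c code), the
value of the atom for one ref can be modeled as below. The helper name
'format_is_base' and its parameters are hypothetical: 'ref_index' is the
position of the ref being formatted and 'base_index' is the index chosen
by the algorithm, or -1 if none was chosen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "(<source>)" into 'out' when this ref is the chosen base, and an
 * empty string for every other ref.
 */
static void format_is_base(char *out, size_t out_len, const char *source,
			   int ref_index, int base_index)
{
	assert(out_len > 0);
	if (base_index >= 0 && ref_index == base_index) {
		/* Two parentheses plus the terminating NUL must fit. */
		assert(strlen(source) + 3 <= out_len);
		snprintf(out, out_len, "(%s)", source);
	} else {
		out[0] = '\0';
	}
}
```

With a source of "HEAD" and a chosen index of 1, only the ref at position
1 receives "(HEAD)"; all other refs receive an empty string.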
Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refspecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.
It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f9774e1
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.
In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-for-each-ref.txt | 42 ++++++++++++++++
ref-filter.c | 78 +++++++++++++++++++++++++++++-
ref-filter.h | 15 ++++++
t/t6600-test-reach.sh | 47 ++++++++++++++++++
4 files changed, 181 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..d3764401a23 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,48 @@ ahead-behind:<committish>::
commits ahead and behind, respectively, when comparing the output
ref to the `<committish>` specified in the format.
+is-base:<committish>::
+ In at most one row, `(<committish>)` will appear to indicate the ref
+ that is most likely the ref used as a starting point for the branch
+ that produced `<committish>`. This choice is made using a heuristic:
+ choose the ref that minimizes the number of commits in the
+ first-parent history of `<committish>` and not in the first-parent
+ history of the ref.
++
+For example, consider the following figure of first-parent histories of
+several refs:
++
+----
+*--*--*--*--*--* refs/heads/A
+\
+ \
+  *--*--*--* refs/heads/B
+   \     \
+    \     \
+     *     * refs/heads/C
+      \
+       \
+        *--* refs/heads/D
+----
++
+Here, if `A`, `B`, and `C` are the filtered references, and the format
+string is `%(refname):%(is-base:D)`, then the output would be
++
+----
+refs/heads/A:
+refs/heads/B:(D)
+refs/heads/C:
+----
++
+This is because the first-parent history of `D` has its earliest
+intersection with the first-parent histories of the filtered refs at a
+common first-parent ancestor of `B` and `C` and ties are broken by the
+earliest ref in the sorted order.
++
+Note that this token will not appear if the first-parent history of
+`<committish>` does not intersect the first-parent histories of the
+filtered refs.
+
describe[:options]::
A human-readable name, like linkgit:git-describe[1];
empty string for undescribable commits. The `describe` string may
diff --git a/ref-filter.c b/ref-filter.c
index 59ad6f54ddb..59689672da1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -167,6 +167,7 @@ enum atom_type {
ATOM_ELSE,
ATOM_REST,
ATOM_AHEADBEHIND,
+ ATOM_ISBASE,
};
/*
@@ -889,6 +890,23 @@ static int ahead_behind_atom_parser(struct ref_format *format,
return 0;
}
+static int is_base_atom_parser(struct ref_format *format,
+ struct used_atom *atom UNUSED,
+ const char *arg, struct strbuf *err)
+{
+ struct string_list_item *item;
+
+ if (!arg)
+ return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:<committish>)"));
+
+ item = string_list_append(&format->is_base_tips, arg);
+ item->util = lookup_commit_reference_by_name(arg);
+ if (!item->util)
+ die("failed to find '%s'", arg);
+
+ return 0;
+}
+
static int head_atom_parser(struct ref_format *format UNUSED,
struct used_atom *atom,
const char *arg, struct strbuf *err)
@@ -952,6 +970,7 @@ static struct {
[ATOM_ELSE] = { "else", SOURCE_NONE },
[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
+ [ATOM_ISBASE] = { "is-base", SOURCE_OTHER, FIELD_STR, is_base_atom_parser },
/*
* Please update $__git_ref_fieldlist in git-completion.bash
* when you add new atoms
@@ -2334,6 +2353,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
int i;
struct object_info empty = OBJECT_INFO_INIT;
int ahead_behind_atoms = 0;
+ int is_base_atoms = 0;
CALLOC_ARRAY(ref->value, used_atom_cnt);
@@ -2475,6 +2495,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
v->s = xstrdup("");
}
continue;
+ } else if (atom_type == ATOM_ISBASE) {
+ if (ref->is_base && ref->is_base[is_base_atoms]) {
+ v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
+ free(ref->is_base[is_base_atoms]);
+ } else {
+ /* Not a commit. */
+ v->s = xstrdup("");
+ }
+ is_base_atoms++;
+ continue;
} else
continue;
@@ -2876,6 +2906,7 @@ static void free_array_item(struct ref_array_item *item)
free(item->value);
}
free(item->counts);
+ free(item->is_base);
free(item);
}
@@ -3040,6 +3071,49 @@ void filter_ahead_behind(struct repository *r,
free(commits);
}
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array)
+{
+ struct commit **bases;
+ size_t bases_nr = 0;
+ struct ref_array_item **back_index;
+
+ if (!format->is_base_tips.nr || !array->nr)
+ return;
+
+ CALLOC_ARRAY(back_index, array->nr);
+ CALLOC_ARRAY(bases, array->nr);
+
+ for (size_t i = 0; i < array->nr; i++) {
+ const char *name = array->items[i]->refname;
+ struct commit *c = lookup_commit_reference_by_name(name);
+
+ CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+
+ if (!c)
+ continue;
+
+ back_index[bases_nr] = array->items[i];
+ bases[bases_nr] = c;
+ bases_nr++;
+ }
+
+ for (size_t i = 0; i < format->is_base_tips.nr; i++) {
+ struct commit *tip = format->is_base_tips.items[i].util;
+ int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
+
+ if (base_index < 0)
+ continue;
+
+ /* Store the string for use in output later. */
+ back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string);
+ }
+
+ free(back_index);
+ free(bases);
+}
+
static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
{
int ret = 0;
@@ -3126,7 +3200,8 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
return !(filter->reachable_from ||
filter->unreachable_from ||
sorting ||
- format->bases.nr);
+ format->bases.nr ||
+ format->is_base_tips.nr);
}
void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
@@ -3150,6 +3225,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
struct ref_array array = { 0 };
filter_refs(&array, filter, type);
filter_ahead_behind(the_repository, format, &array);
+ filter_is_base(the_repository, format, &array);
ref_array_sort(sorting, &array);
print_formatted_ref_array(&array, format);
ref_array_clear(&array);
diff --git a/ref-filter.h b/ref-filter.h
index 0ca28d2bba6..20419a56218 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -48,6 +48,7 @@ struct ref_array_item {
struct commit *commit;
struct atom_value *value;
struct ahead_behind_count **counts;
+ char **is_base;
char refname[FLEX_ARRAY];
};
@@ -101,6 +102,9 @@ struct ref_format {
/* List of bases for ahead-behind counts. */
struct string_list bases;
+ /* List of bases for is-base indicators. */
+ struct string_list is_base_tips;
+
struct {
int max_count;
int omit_empty;
@@ -114,6 +118,7 @@ struct ref_format {
#define REF_FORMAT_INIT { \
.use_color = -1, \
.bases = STRING_LIST_INIT_DUP, \
+ .is_base_tips = STRING_LIST_INIT_DUP, \
}
/* Macros for checking --merged and --no-merged options */
@@ -203,6 +208,16 @@ void filter_ahead_behind(struct repository *r,
struct ref_format *format,
struct ref_array *array);
+/*
+ * If the provided format includes is-base atoms, then compute the base checks
+ * for those tips against all refs.
+ *
+ * If this is not called, then any is-base atoms will be blank.
+ */
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array);
+
void ref_filter_init(struct ref_filter *filter);
void ref_filter_clear(struct ref_filter *filter);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 3069efc8601..6c7f92bcb38 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -659,4 +659,51 @@ test_expect_success 'get_branch_base_for_tip: all reach tip' '
test_all_modes get_branch_base_for_tip
'
+test_expect_success 'for-each-ref is-base: none reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1:
+ refs/heads/commit-4-2:(commit-2-3)
+ refs/heads/commit-4-4:
+ refs/heads/commit-8-4:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-2-3)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base: all reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-4-2
+ refs/heads/commit-5-1
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-4-2:(commit-4-1)
+ refs/heads/commit-5-1:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-4-1)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base:multiple' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1[-]
+ refs/heads/commit-4-2[(commit-2-3)-]
+ refs/heads/commit-4-4[-]
+ refs/heads/commit-8-4[-(commit-6-5)]
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin
+'
+
test_done
--
gitgitgadget
* [PATCH v2 3/3] p1500: add is-base performance tests
2024-08-11 17:34 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-08-11 17:34 ` [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
2024-08-11 17:34 ` [PATCH v2 2/3] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
@ 2024-08-11 17:34 ` Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
3 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-11 17:34 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.
This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.
The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test that scenario. Do create a new root commit in an
effort to test the scenario of parallel first-parent histories.
Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:
Test                                                      Time (s)
------------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref             0.12
1500.3: ahead-behind counts: git branch                   0.12
1500.4: ahead-behind counts: git tag                      0.12
1500.5: contains: git for-each-ref --merged               0.04
1500.6: contains: git branch --merged                     0.04
1500.7: contains: git tag --merged                        0.04
1500.8: is-base check: test-tool reach (refs)             0.03
1500.9: is-base check: test-tool reach (tags)             0.03
1500.10: is-base check: git for-each-ref                  0.03
1500.11: is-base check: git for-each-ref (disjoint-base)  0.07
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/perf/p1500-graph-walks.sh | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index e14e7620cce..5b23ce5db93 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -20,6 +20,21 @@ test_expect_success 'setup' '
echo tag-$ref ||
return 1
done >tags &&
+
+ echo "A:HEAD" >test-tool-refs &&
+ for line in $(cat refs)
+ do
+ echo "X:$line" >>test-tool-refs || return 1
+ done &&
+ echo "A:HEAD" >test-tool-tags &&
+ for line in $(cat tags)
+ do
+ echo "X:$line" >>test-tool-tags || return 1
+ done &&
+
+ commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
+ git update-ref refs/heads/disjoint-base $commit &&
+
git commit-graph write --reachable
'
@@ -47,4 +62,20 @@ test_perf 'contains: git tag --merged' '
xargs git tag --merged=HEAD <tags
'
+test_perf 'is-base check: test-tool reach (refs)' '
+ test-tool reach get_branch_base_for_tip <test-tool-refs
+'
+
+test_perf 'is-base check: test-tool reach (tags)' '
+ test-tool reach get_branch_base_for_tip <test-tool-tags
+'
+
+test_perf 'is-base check: git for-each-ref' '
+ git for-each-ref --format="%(is-base:HEAD)" --stdin <refs
+'
+
+test_perf 'is-base check: git for-each-ref (disjoint-base)' '
+ git for-each-ref --format="%(is-base:refs/heads/disjoint-base)" --stdin <refs
+'
+
test_done
--
gitgitgadget
* Re: [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip
2024-08-11 17:34 ` [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
@ 2024-08-12 20:30 ` Junio C Hamano
2024-08-13 13:39 ` Derrick Stolee
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2024-08-12 20:30 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, vdye, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> Add a new reachability algorithm that intends to discover (from a heuristic)
> which branch was used as the starting point for a given commit. Add focused
> tests using the 'test-tool reach' command.
>
> Repositories that use pull requests (or merge requests) to advance one or
> more "protected" branches, the history of that reference can be recovered by
> following the first-parent history in most cases.
I cannot quite parse it, but perhaps "Repositories that" -> "In
repositories that"?
> Most are completed using
> no-fast-forward merges, though squash merges are quite common. Less common
> is rebase-and-merge, which still validates this assumption. Finally, the
> case that breaks this assumption is the fast-forward update (with potential
> rebasing). Even in this case, the previous commit commonly appears in the
> first-parent history of the branch.
> Given current command-line interface options, this optimization criteria is
> not easy to detect directly. Even using the command
>
> git rev-list --count --first-parent <base>..<source>
>
> does not measure this count, as it uses full reachability from <base> to
> determine which commits to remove from the range '<base>..<source>'.
Makes me wonder if "--ancestry-path" would help.
> The trickiest part of the integer slab is what happens when reaching a
> collision among the histories of the bases and the history of the source.
> This is noticed when viewing the first parent and seeing that it has a slab
> value that differs in sign (negative or positive). In this case, the
> collision commit is stored in the method variable 'branch_point' and its
> slab value is set to -1. The index of the best base (so far) is stored in
> the method variable 'best_index'. It is possible that there are multiple
> commits that have the branch_point as its first parent, leading to multiple
> updates of best_index. The result is determined when 'branch_point' is
> visited in the commit walk, giving the guarantee that all commits that could
> reach 'branch_point' were visited.
OK.
> +/*
> + * This slab initializes integers to zero, so use "-1" for "tip is best" and
> + * "i + 1" for "bases[i] is best".
> + */
> +define_commit_slab(best_branch_base, int);
> +static struct best_branch_base best_branch_base;
> +#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
> +#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
Micronit. Prepare for macro arguments to be expressions, even if
current callers don't use anything more complex, i.e., something
like
(*best_branch_base_at(&best_branch_base, (c)))
(*best_branch_base_at(&best_branch_base, (c)) = (v))
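To illustrate the general hazard behind this micronit, here is a minimal sketch using two hypothetical macros (ENCODE_BASE_UNSAFE and ENCODE_BASE_SAFE are made up for this example and do not appear in the patch). In get_best() and set_best() the arguments happen to land in a function-argument position and on the right-hand side of an assignment, so the parentheses there are defensive; the sketch shows what can go wrong when a macro body is spliced into a larger expression.

```c
#include <assert.h>

/*
 * Hypothetical macros, not from the patch: encode base index i as the
 * slab value "i + 1", once without and once with defensive parentheses.
 */
#define ENCODE_BASE_UNSAFE(i) i + 1
#define ENCODE_BASE_SAFE(i)   ((i) + 1)

/* Expands to 2 + 1 * 10: the multiplication binds tighter than the addition. */
static int scaled_unsafe(void)
{
	return ENCODE_BASE_UNSAFE(2) * 10;
}

/* Expands to ((2) + 1) * 10, which is what the caller meant. */
static int scaled_safe(void)
{
	return ENCODE_BASE_SAFE(2) * 10;
}
```

Under these definitions scaled_unsafe() evaluates to 12 while scaled_safe() evaluates to 30, which is why parenthesizing both the arguments and the whole body is the usual habit.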
> + if (found_missing_gen) {
> + struct commit **commits;
> + size_t commits_nr = bases_nr + 1;
> +
> + CALLOC_ARRAY(commits, commits_nr);
> + COPY_ARRAY(commits, bases, bases_nr);
> + commits[bases_nr] = tip;
> + ensure_generations_valid(r, commits, commits_nr);
> + free(commits);
> + }
It would have been very unfortunate if this copying were done only
because commits and tip are not in the same array, but the called
function mutates the given array of commits so we cannot avoid
passing a copy anyway. Given these constraints, this is the
cleanest implementation, probably.
> +
> + /* Initialize queue and slab now that generations are guaranteed. */
> + init_best_branch_base(&best_branch_base);
> + set_best(tip, -1);
> + prio_queue_put(&queue, tip);
> +
> + for (size_t i = 0; i < bases_nr; i++) {
> + struct commit *c = bases[i];
> +
> + /* Has this already been marked as best by another commit? */
> + if (get_best(c))
> + continue;
Oh, so this defines the tie-breaking behaviour, but simply removing
it is a wrong solution if we wanted our tie-breaking to work as
"last one wins", as we still do not want to put it in the queue, so
this "if best is already found, skip the rest" is serving dual
purposes. Good.
> + set_best(c, i + 1);
> + prio_queue_put(&queue, c);
> + }
> +
> + while (queue.nr) {
> + struct commit *c = prio_queue_get(&queue);
> + int best_for_c = get_best(c);
> + int best_for_p, positive;
> + struct commit *parent;
> +
> + /* Have we reached a known branch point? It's optimal. */
> + if (c == branch_point)
> + break;
> +
> + repo_parse_commit(r, c);
> + if (!c->parents)
> + continue;
> +
> + parent = c->parents->item;
> + repo_parse_commit(r, parent);
> + best_for_p = get_best(parent);
> +
> + if (!best_for_p) {
> + /* 'parent' is new, so pass along best_for_c. */
> + set_best(parent, best_for_c);
> + prio_queue_put(&queue, parent);
> + continue;
> + }
> +
> + if (best_for_p > 0 && best_for_c > 0) {
> + /* Collision among bases. Minimize. */
> + if (best_for_c < best_for_p)
> + set_best(parent, best_for_c);
> + continue;
> + }
> +
> + /*
> + * At this point, we have reached a commit that is reachable
> + * from the tip, either from 'c' or from an earlier commit to
> + * have 'parent' as its first parent.
> + *
> + * Update 'best_index' to match the minimum of all base indices
> + * to reach 'parent'.
> + */
> +
> + /* Exactly one is positive due to initial conditions. */
> + positive = (best_for_c < 0) ? best_for_p : best_for_c;
> +
> + if (best_index < 0 || positive < best_index)
> + best_index = positive;
> +
> + /* No matter what, track that the parent is reachable from tip. */
> + set_best(parent, -1);
> + branch_point = parent;
> + }
> +
> + clear_best_branch_base(&best_branch_base);
> + clear_prio_queue(&queue);
OK. We get rid of the slab and prio-queue once we are done.
Nice.
Thanks.
* Re: [PATCH v2 2/3] for-each-ref: add 'is-base' token
2024-08-11 17:34 ` [PATCH v2 2/3] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
@ 2024-08-12 21:05 ` Junio C Hamano
2024-08-13 13:44 ` Derrick Stolee
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2024-08-12 21:05 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, vdye, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +is-base:<committish>::
> + In at most one row, `(<committish>)` will appear to indicate the ref
> + that is most likely the ref used as a starting point for the branch
> + that produced `<committish>`. This choice is made using a heuristic:
> + choose the ref that minimizes the number of commits in the
> + first-parent history of `<committish>` and not in the first-parent
> + history of the ref.
Very nicely described.
Giving the end-user oriented "purpose/meaning" first makes it easier
to understand for readers when they want to use it, and giving the
heuristics to compute the result (and the example) next allows them
to verify that the feature matches what they are looking for.
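As a rough illustration of the heuristic in that description, the following is a naive reference sketch on a toy history, not the algorithm from the patch (which walks all histories at once with a priority queue and a commit slab). It assumes commits are small integers and that first_parent[c] gives the first parent of commit c, or -1 for a root; toy_branch_base() and its helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if 'target' lies on the first-parent chain starting at 'start'. */
static int on_first_parent_chain(const int *first_parent, int start, int target)
{
	for (int c = start; c >= 0; c = first_parent[c])
		if (c == target)
			return 1;
	return 0;
}

/*
 * Return the index of the base whose first-parent history excludes the
 * fewest commits of the tip's first-parent history, or -1 if no base
 * intersects. Ties go to the earliest base in the list.
 */
static int toy_branch_base(const int *first_parent, int tip,
			   const int *bases, size_t bases_nr)
{
	int best_index = -1;
	long best_count = -1;

	assert(bases_nr == 0 || bases);

	for (size_t i = 0; i < bases_nr; i++) {
		long count = 0;
		int c;

		/* Count tip-side commits until the two chains meet. */
		for (c = tip; c >= 0; c = first_parent[c]) {
			if (on_first_parent_chain(first_parent, bases[i], c))
				break;
			count++;
		}
		if (c < 0)
			continue; /* histories never intersect */

		/* Strict '<' keeps the earliest base on a tie. */
		if (best_index < 0 || count < best_count) {
			best_index = (int)i;
			best_count = count;
		}
	}
	return best_index;
}
```

This version is quadratic in the length of the histories, so it only serves to pin down what "minimizes the number of commits" means; it says nothing about how the real walk achieves that cheaply.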
> @@ -2475,6 +2495,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
> v->s = xstrdup("");
> }
> continue;
> + } else if (atom_type == ATOM_ISBASE) {
> + if (ref->is_base && ref->is_base[is_base_atoms]) {
> + v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
> + free(ref->is_base[is_base_atoms]);
> + } else {
> + /* Not a commit. */
This is unexpected. I thought that most of the branches except at
most one that gets annotated with "Yeah, this is forked from branch
B" would take the "else" side. They are still commits, no?
> + v->s = xstrdup("");
> + }
> + is_base_atoms++;
> + continue;
> } else
> continue;
>
> @@ -2876,6 +2906,7 @@ static void free_array_item(struct ref_array_item *item)
> free(item->value);
> }
> free(item->counts);
> + free(item->is_base);
> free(item);
> }
>
> @@ -3040,6 +3071,49 @@ void filter_ahead_behind(struct repository *r,
> free(commits);
> }
>
> +void filter_is_base(struct repository *r,
> + struct ref_format *format,
> + struct ref_array *array)
> +{
> + struct commit **bases;
> + size_t bases_nr = 0;
> + struct ref_array_item **back_index;
> +
> + if (!format->is_base_tips.nr || !array->nr)
> + return;
> +
> + CALLOC_ARRAY(back_index, array->nr);
> + CALLOC_ARRAY(bases, array->nr);
> +
> + for (size_t i = 0; i < array->nr; i++) {
> + const char *name = array->items[i]->refname;
> + struct commit *c = lookup_commit_reference_by_name(name);
> +
> + CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
> +
> + if (!c)
> + continue;
Hmph, wouldn't we want to leave array->items[i]->is_base NULL if
"name" looked up to "c" happens to be non-commit (i.e. NULL)?
> + back_index[bases_nr] = array->items[i];
> + bases[bases_nr] = c;
> + bases_nr++;
> + }
Thanks.
* Re: [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip
2024-08-12 20:30 ` Junio C Hamano
@ 2024-08-13 13:39 ` Derrick Stolee
0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2024-08-13 13:39 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, vdye
On 8/12/24 4:30 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> Repositories that use pull requests (or merge requests) to advance one or
>> more "protected" branches, the history of that reference can be recovered by
>> following the first-parent history in most cases.
>
> I cannot quite parse it, but perhaps "Repositories that" -> "In
> repositories that"?
That is an improvement, thanks.
>> Most are completed using
>> no-fast-forward merges, though squash merges are quite common. Less common
>> is rebase-and-merge, which still validates this assumption. Finally, the
>> case that breaks this assumption is the fast-forward update (with potential
>> rebasing). Even in this case, the previous commit commonly appears in the
>> first-parent history of the branch.
>
>> Given current command-line interface options, this optimization criteria is
>> not easy to detect directly. Even using the command
>>
>> git rev-list --count --first-parent <base>..<source>
>>
>> does not measure this count, as it uses full reachability from <base> to
>> determine which commits to remove from the range '<base>..<source>'.
>
> Makes me wonder if "--ancestry-path" would help.
One difficulty here is that we don't know the "first-parent merge base"
to supply to the --ancestry-path argument. You could first find this by
running
git rev-list --first-parent --boundary --reverse A...B
and pulling out the first boundary commit 'C'. Then, that could be used in
git rev-list --first-parent --count --ancestry-path=C B
I believe that this two-process-per-ref approach would provide an
existing way to compute these results.
>> The trickiest part of the integer slab is what happens when reaching a
>> collision among the histories of the bases and the history of the source.
>> This is noticed when viewing the first parent and seeing that it has a slab
>> value that differs in sign (negative or positive). In this case, the
>> collision commit is stored in the method variable 'branch_point' and its
>> slab value is set to -1. The index of the best base (so far) is stored in
>> the method variable 'best_index'. It is possible that there are multiple
>> commits that have the branch_point as its first parent, leading to multiple
>> updates of best_index. The result is determined when 'branch_point' is
>> visited in the commit walk, giving the guarantee that all commits that could
>> reach 'branch_point' were visited.
>
> OK.
>
>> +/*
>> + * This slab initializes integers to zero, so use "-1" for "tip is best" and
>> + * "i + 1" for "bases[i] is best".
>> + */
>> +define_commit_slab(best_branch_base, int);
>> +static struct best_branch_base best_branch_base;
>> +#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
>> +#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
>
> Micronit. Prepare for macro arguments to be expressions, even if
> current callers don't use anything more complex, i.e., something
> like
>
> (*best_branch_base_at(&best_branch_base, (c)))
> (*best_branch_base_at(&best_branch_base, (c)) = (v))
Thanks. I should have caught this myself.
>> +
>> + /* Initialize queue and slab now that generations are guaranteed. */
>> + init_best_branch_base(&best_branch_base);
>> + set_best(tip, -1);
>> + prio_queue_put(&queue, tip);
>> +
>> + for (size_t i = 0; i < bases_nr; i++) {
>> + struct commit *c = bases[i];
>> +
>> + /* Has this already been marked as best by another commit? */
>> + if (get_best(c))
>> + continue;
>
> Oh, so this defines the tie-breaking behaviour, but simply removing
> it is a wrong solution if we wanted our tie-breaking to work as
> "last one wins", as we still do not want to put it in the queue, so
> this "if best is already found, skip the rest" is serving dual
> purposes. Good.
When trying to make a test case for the for-each-ref behavior around
non-commits, I noticed a bug here. If get_best(c) is -1, then 'c' is
equal to the base and should be selected. I will update the logic here
and add an appropriate test in this patch.
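A minimal sketch of the seeding step with that fix, using a plain int array in place of the commit slab. The names toy_seed_slab() and TOY_COMMITS are hypothetical, and commits are assumed to be small integers; the encoding follows the quoted comment (0 for "unset", -1 for "tip is best", i + 1 for "bases[i] is best").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_COMMITS 16

/*
 * Seed the toy slab. Returns i + 1 for the first base equal to the tip
 * (the caller can stop immediately), or 0 if the walk should proceed.
 */
static int toy_seed_slab(int slab[TOY_COMMITS], int tip,
			 const int *bases, size_t bases_nr)
{
	assert(tip >= 0 && tip < TOY_COMMITS);

	memset(slab, 0, sizeof(int) * TOY_COMMITS);
	slab[tip] = -1;

	for (size_t i = 0; i < bases_nr; i++) {
		int best = slab[bases[i]];

		if (best == -1)
			return (int)i + 1; /* base is the tip itself */
		if (best)
			continue; /* duplicate of an earlier base; earliest wins */

		slab[bases[i]] = (int)i + 1;
	}
	return 0;
}
```

With bases {1, 3, 1, 5} and tip 5, the function returns 4, that is, index 3 after the final "best_index - 1" conversion.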
Thanks,
-Stolee
* Re: [PATCH v2 2/3] for-each-ref: add 'is-base' token
2024-08-12 21:05 ` Junio C Hamano
@ 2024-08-13 13:44 ` Derrick Stolee
0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2024-08-13 13:44 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, vdye
On 8/12/24 5:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> + } else if (atom_type == ATOM_ISBASE) {
>> + if (ref->is_base && ref->is_base[is_base_atoms]) {
>> + v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
>> + free(ref->is_base[is_base_atoms]);
>> + } else {
>> + /* Not a commit. */
>
> This is unexpected. I thought that most of the branches except at
> most one that gets annotated with "Yeah, this is forked from branch
> B" would take the "else" side. They are still commits, no?
You are correct. This is leftover from copy-pasting the ahead-behind section.
Will remove.
>> + for (size_t i = 0; i < array->nr; i++) {
>> + const char *name = array->items[i]->refname;
>> + struct commit *c = lookup_commit_reference_by_name(name);
>> +
>> + CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
>> +
>> + if (!c)
>> + continue;
>
> Hmph, wouldn't we want to leave array->items[i]->is_base NULL if
> "name" looked up to "c" happens to be non-commit (i.e. NULL)?
Your comment initially made me second-guess the logic here, but...
>> + back_index[bases_nr] = array->items[i];
>> + bases[bases_nr] = c;
>> + bases_nr++;
This array of "back_index" is intended to allow the array being passed to
get_branch_base_for_tip() to have no gaps with NULL commits. The indices
are then translated back to the original array items when scanning the
results.
This matches the behavior of the ahead-behind code, giving an existing
behavior. The alternative would be to allow get_branch_base_for_tip() to
be sensitive to NULL commits in the 'bases' array. But since we need to
create an array of commit pointers (different from the array of ref
items that we start with) this is likely the simplest approach.
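The compaction idea can be sketched in isolation as follows. This is a simplified stand-in, not the ref-filter code: items are plain ints, a negative value plays the role of a ref that does not resolve to a commit, and compact_with_back_index() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the usable entries of 'items' into 'compact' with no gaps, and
 * record in 'back_index' which original position each one came from.
 * Returns the number of entries written.
 */
static size_t compact_with_back_index(const int *items, size_t nr,
				      int *compact, size_t *back_index)
{
	size_t compact_nr = 0;

	assert(items && compact && back_index);

	for (size_t i = 0; i < nr; i++) {
		if (items[i] < 0)
			continue; /* stands in for a non-commit ref */

		back_index[compact_nr] = i;
		compact[compact_nr] = items[i];
		compact_nr++;
	}
	return compact_nr;
}
```

A result index k reported against the compact array then maps back to the original position back_index[k], so the consumer of the compact array never has to handle gaps.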
You did inspire me to double-check that this code works in the presence
of non-commit refs, so I'll update some things and send a v3 with a new
test. It will also include some things to make error messages quieter
for that case.
Thanks,
-Stolee
* [PATCH v3 0/4] git for-each-ref: is-base atom and base branches
2024-08-11 17:34 ` [PATCH v2 " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2024-08-11 17:34 ` [PATCH v2 3/3] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
@ 2024-08-14 10:31 ` Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 1/4] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
` (4 more replies)
3 siblings, 5 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-14 10:31 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very
similar way to the 'ahead-behind' atom. As detailed carefully in the first
change, this is motivated by the need to detect the concept of a "base
branch" in a repository with multiple long-lived branches.
This change is motivated by a third-party tool created to make this
detection with the same optimization mechanism, but using a much slower
technique due to the limitations of the Git CLI not presenting this
information. The existing algorithm involves using git rev-list
--first-parent -<N> in batches for the collection of considered references,
comparing those lists, and increasing <N> as needed until finding a
collision. This new use of 'git for-each-ref' will allow making this
determination within a single process while walking a minimal number of commits.
There are benefits to users both on client-side and server-side. In an
internal monorepo, this base branch detection algorithm is used to determine
a long-lived branch based on the HEAD commit, mapping to a group within the
organizational structure of the repository, which determines a set of
projects that the user will likely need to build; this leads to
automatically selecting an initial sparse-checkout definition based on the
build dependencies required. An upcoming feature in Azure Repos will use
this algorithm to automatically create a pull request against the correct
target branch, reducing user pain from needing to select a different branch
after a large commit diff is rendered against the default branch. This atom
unlocks that ability for Git hosting services that use Git in their backend.
Thanks, -Stolee
Updates in v2
=============
* I had forgotten to include a documentation change in v1. My attempt to
create a succinct doc change in a follow-up hunk continued to be
confusing. This version includes a more expanded version of the
documentation blurb for the is-base token.
Updates in v3
=============
* Corrected some grammar in a commit message.
* Fixed (and tested for) a bug where the source branch is equal to a
candidate ref.
* Added a test in t6300-for-each-ref.sh to cover some non-commit refs and
some broken objects.
* Motivated by the test in t6300, added a new patch that adds a ..._gently()
method to reduce error noise for non-commit refs.
Derrick Stolee (4):
commit-reach: add get_branch_base_for_tip
commit: add gentle reference lookup method
for-each-ref: add 'is-base' token
p1500: add is-base performance tests
Documentation/git-for-each-ref.txt | 42 ++++++++++
commit-reach.c | 126 +++++++++++++++++++++++++++++
commit-reach.h | 17 ++++
commit.c | 8 +-
commit.h | 2 +
ref-filter.c | 77 +++++++++++++++++-
ref-filter.h | 15 ++++
t/helper/test-reach.c | 2 +
t/perf/p1500-graph-walks.sh | 31 +++++++
t/t6300-for-each-ref.sh | 9 +++
t/t6600-test-reach.sh | 121 +++++++++++++++++++++++++++
11 files changed, 448 insertions(+), 2 deletions(-)
base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1768
Range-diff vs v2:
1: 580026f910d ! 1: f93d642c8d9 commit-reach: add get_branch_base_for_tip
@@ Commit message
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
- Repositories that use pull requests (or merge requests) to advance one or
+ In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
@@ commit-reach.c: done:
+ */
+define_commit_slab(best_branch_base, int);
+static struct best_branch_base best_branch_base;
-+#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
-+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
++#define get_best(c) (*best_branch_base_at(&best_branch_base, (c)))
++#define set_best(c,v) (*best_branch_base_at(&best_branch_base, (c)) = (v))
+
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
@@ commit-reach.c: done:
+
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
++ int best = get_best(c);
+
+ /* Has this already been marked as best by another commit? */
-+ if (get_best(c))
++ if (best) {
++ if (best == -1) {
++ /* We agree at this position. Stop now. */
++ best_index = i + 1;
++ goto cleanup;
++ }
+ continue;
++ }
+
+ set_best(c, i + 1);
+ prio_queue_put(&queue, c);
@@ commit-reach.c: done:
+ branch_point = parent;
+ }
+
++cleanup:
+ clear_best_branch_base(&best_branch_base);
+ clear_prio_queue(&queue);
+ return best_index > 0 ? best_index - 1 : -1;
@@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref merged:none' '
+ test_all_modes get_branch_base_for_tip
+'
+
++test_expect_success 'get_branch_base_for_tip: equal to tip' '
++ # (2,3) branched from the first tip (i,4) in X with i > 2
++ cat >input <<-\EOF &&
++ A:commit-8-4
++ X:commit-1-2
++ X:commit-1-4
++ X:commit-4-4
++ X:commit-8-4
++ X:commit-10-4
++ EOF
++ echo "get_branch_base_for_tip(A,X):3" >expect &&
++ test_all_modes get_branch_base_for_tip
++'
++
+test_expect_success 'get_branch_base_for_tip: all reach tip' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
-: ----------- > 2: 5240c2a7b32 commit: add gentle reference lookup method
2: 13341e7e512 ! 3: df05cee6003 for-each-ref: add 'is-base' token
@@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbu
+ v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
+ free(ref->is_base[is_base_atoms]);
+ } else {
-+ /* Not a commit. */
+ v->s = xstrdup("");
+ }
+ is_base_atoms++;
@@ ref-filter.c: void filter_ahead_behind(struct repository *r,
+
+ for (size_t i = 0; i < array->nr; i++) {
+ const char *name = array->items[i]->refname;
-+ struct commit *c = lookup_commit_reference_by_name(name);
++ struct commit *c = lookup_commit_reference_by_name_gently(name, 1);
+
+ CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+
@@ ref-filter.h: void filter_ahead_behind(struct repository *r,
void ref_filter_clear(struct ref_filter *filter);
+ ## t/t6300-for-each-ref.sh ##
+@@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with nested tags' '
+ test_cmp expect actual
+ '
+
++test_expect_success 'is-base atom with non-commits' '
++ git for-each-ref --format="%(is-base:HEAD) %(refname)" >out 2>err &&
++ grep "(HEAD) refs/heads/main" out &&
++
++ test_line_count = 2 err &&
++ grep "error: object .* is a commit, not a blob" err &&
++ grep "error: bad tag pointer to" err
++'
++
+ GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
+ TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
+
+
## t/t6600-test-reach.sh ##
@@ t/t6600-test-reach.sh: test_expect_success 'get_branch_base_for_tip: all reach tip' '
test_all_modes get_branch_base_for_tip
@@ t/t6600-test-reach.sh: test_expect_success 'get_branch_base_for_tip: all reach t
+ --format="%(refname):%(is-base:commit-4-1)" --stdin
+'
+
++test_expect_success 'for-each-ref is-base: equal to tip' '
++ cat >input <<-\EOF &&
++ refs/heads/commit-4-2
++ refs/heads/commit-5-1
++ EOF
++ cat >expect <<-\EOF &&
++ refs/heads/commit-4-2:(commit-4-2)
++ refs/heads/commit-5-1:
++ EOF
++ run_all_modes git for-each-ref \
++ --format="%(refname):%(is-base:commit-4-2)" --stdin
++'
++
+test_expect_success 'for-each-ref is-base:multiple' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
3: 757c20090db = 4: cce9921bbd8 p1500: add is-base performance tests
--
gitgitgadget
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3 1/4] commit-reach: add get_branch_base_for_tip
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
@ 2024-08-14 10:31 ` Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 2/4] commit: add gentle reference lookup method Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-14 10:31 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most pull requests are
completed using no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing). Even in this case, the previous commit commonly appears in the
first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
    Among a list of candidate base branches, select the candidate that
    minimizes the number of commits in the first-parent history of the source
    that are not in the first-parent history of the candidate.
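
The heuristic can be restated as a brute-force computation on a toy graph.
The following is only a sketch of the optimization criterion, not the walk
implemented in this patch; the array-based graph and the toy_* names are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy model (not Git's data structures): commits are the
 * integers 0..n-1 and first_parent[c] is the first parent of commit c,
 * or -1 when c is a root commit.
 */
#define TOY_MAX_COMMITS 64

/*
 * Count the commits in the first-parent history of 'source' that are
 * not in the first-parent history of 'candidate'. Returns -1 when the
 * two first-parent histories never intersect.
 */
static int toy_first_parent_distance(const int *first_parent, size_t n,
				     int source, int candidate)
{
	char in_candidate[TOY_MAX_COMMITS] = { 0 };
	int count = 0;

	assert(n <= TOY_MAX_COMMITS);

	/* Mark every commit on the candidate's first-parent chain. */
	for (int c = candidate; c >= 0; c = first_parent[c])
		in_candidate[c] = 1;

	/* Walk the source's chain until it meets the candidate's chain. */
	for (int c = source; c >= 0; c = first_parent[c]) {
		if (in_candidate[c])
			return count;
		count++;
	}
	return -1;
}

/*
 * Return the index of the candidate with the smallest distance, using
 * the lowest index to break ties, or -1 if no candidate intersects.
 */
static int toy_pick_base(const int *first_parent, size_t n, int source,
			 const int *candidates, size_t candidates_nr)
{
	int best_index = -1, best_count = -1;

	for (size_t i = 0; i < candidates_nr; i++) {
		int count = toy_first_parent_distance(first_parent, n,
						      source, candidates[i]);
		if (count < 0)
			continue;
		if (best_index < 0 || count < best_count) {
			best_index = (int)i;
			best_count = count;
		}
	}
	return best_index;
}
```

For example, with first_parent = { -1, 0, 1, 1, 3, 3 }, commit 2 is the tip
of a trunk (2, 1, 0), commit 4 is the tip of a branch (4, 3, 1, 0), and
commit 5 was created on top of commit 3. Choosing between the candidates
{ 2, 4 } for source 5 selects index 1 (commit 4), since only one commit of
the source's first-parent history is missing from that candidate's history.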
Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
Given current command-line interface options, this optimization criterion is
not easy to detect directly. Even using the command
    git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
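
As a small illustration of this encoding, the sketch below spells out the
three kinds of slab values and the conversion back to a caller-visible
index. The toy_* names are hypothetical; the patch itself reads and writes
these values through the get_best() and set_best() macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two special slab values. */
enum {
	TOY_UNSEEN = 0,    /* slab default: commit not reached yet */
	TOY_FROM_TIP = -1, /* commit is reachable from the source tip */
};

/* Slab value meaning "bases[i] is the best base to reach this commit". */
static int toy_encode_base(size_t i)
{
	return (int)i + 1;
}

/* When two bases reach the same commit, keep the smaller index. */
static int toy_min_base(int a, int b)
{
	assert(a > 0 && b > 0);
	return a < b ? a : b;
}

/* Convert the final slab-style value back to an index into bases[]. */
static int toy_decode_result(int best_index)
{
	return best_index > 0 ? best_index - 1 : -1;
}
```

Because zero is reserved for "not seen", shifting base indices by one lets
the sign of a slab value distinguish the source (negative) from any base
(positive) without a second slab.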
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as their first parent, leading to multiple
updates of best_index. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
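
The expected values in those tests can be checked by hand against the grid
model described in the comment added to t6600-test-reach.sh: the
first-parent history of commit (i,j) walks (i,k) for k from j down to 1,
then (l,1) for l from i down to 1. The sketch below is a brute-force
restatement of the heuristic on that model, assuming ties go to the lowest
index; it is not the commit walk from this patch, and the toy_* names are
hypothetical.

```c
#include <stddef.h>

/* A commit in the test grid, named commit-<i>-<j> in the test script. */
struct toy_commit {
	int i, j;
};

/* Step 'c' to its first parent; return 0 at the root (1,1). */
static int toy_first_parent(struct toy_commit *c)
{
	if (c->j > 1) {
		c->j--;
		return 1;
	}
	if (c->i > 1) {
		c->i--;
		return 1;
	}
	return 0;
}

/* Is 'target' in the first-parent history of 'start'? */
static int toy_in_history(struct toy_commit start, struct toy_commit target)
{
	do {
		if (start.i == target.i && start.j == target.j)
			return 1;
	} while (toy_first_parent(&start));
	return 0;
}

/* Commits in tip's first-parent history that are not in base's. */
static int toy_distance(struct toy_commit tip, struct toy_commit base)
{
	int count = 0;

	do {
		if (toy_in_history(base, tip))
			return count;
		count++;
	} while (toy_first_parent(&tip));
	return -1;
}

/* Index of the base minimizing the distance; lowest index wins ties. */
static int toy_grid_base(struct toy_commit tip,
			 const struct toy_commit *bases, size_t bases_nr)
{
	int best_index = -1, best_count = -1;

	for (size_t k = 0; k < bases_nr; k++) {
		int count = toy_distance(tip, bases[k]);
		if (count < 0)
			continue;
		if (best_index < 0 || count < best_count) {
			best_index = (int)k;
			best_count = count;
		}
	}
	return best_index;
}
```

Under this model, the three get_branch_base_for_tip inputs in the test
script give indices 2, 3, and 0 respectively, which agrees with the
'expect' files written by those tests.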
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
commit-reach.c | 126 ++++++++++++++++++++++++++++++++++++++++++
commit-reach.h | 17 ++++++
t/helper/test-reach.c | 2 +
t/t6600-test-reach.sh | 61 ++++++++++++++++++++
4 files changed, 206 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index 8f9b008f876..4753197ec88 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1222,3 +1222,129 @@ done:
free(commits);
repo_clear_commit_marks(r, SEEN);
}
+
+/*
+ * This slab initializes integers to zero, so use "-1" for "tip is best" and
+ * "i + 1" for "bases[i] is best".
+ */
+define_commit_slab(best_branch_base, int);
+static struct best_branch_base best_branch_base;
+#define get_best(c) (*best_branch_base_at(&best_branch_base, (c)))
+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, (c)) = (v))
+
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr)
+{
+ int best_index = -1;
+ struct commit *branch_point = NULL;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ int found_missing_gen = 0;
+
+ if (!bases_nr)
+ return -1;
+
+ repo_parse_commit(r, tip);
+ if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+
+ /* Check for missing generation numbers. */
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+ repo_parse_commit(r, c);
+ if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY)
+ found_missing_gen = 1;
+ }
+
+ if (found_missing_gen) {
+ struct commit **commits;
+ size_t commits_nr = bases_nr + 1;
+
+ CALLOC_ARRAY(commits, commits_nr);
+ COPY_ARRAY(commits, bases, bases_nr);
+ commits[bases_nr] = tip;
+ ensure_generations_valid(r, commits, commits_nr);
+ free(commits);
+ }
+
+ /* Initialize queue and slab now that generations are guaranteed. */
+ init_best_branch_base(&best_branch_base);
+ set_best(tip, -1);
+ prio_queue_put(&queue, tip);
+
+ for (size_t i = 0; i < bases_nr; i++) {
+ struct commit *c = bases[i];
+ int best = get_best(c);
+
+ /* Has this already been marked as best by another commit? */
+ if (best) {
+ if (best == -1) {
+ /* We agree at this position. Stop now. */
+ best_index = i + 1;
+ goto cleanup;
+ }
+ continue;
+ }
+
+ set_best(c, i + 1);
+ prio_queue_put(&queue, c);
+ }
+
+ while (queue.nr) {
+ struct commit *c = prio_queue_get(&queue);
+ int best_for_c = get_best(c);
+ int best_for_p, positive;
+ struct commit *parent;
+
+ /* Have we reached a known branch point? It's optimal. */
+ if (c == branch_point)
+ break;
+
+ repo_parse_commit(r, c);
+ if (!c->parents)
+ continue;
+
+ parent = c->parents->item;
+ repo_parse_commit(r, parent);
+ best_for_p = get_best(parent);
+
+ if (!best_for_p) {
+ /* 'parent' is new, so pass along best_for_c. */
+ set_best(parent, best_for_c);
+ prio_queue_put(&queue, parent);
+ continue;
+ }
+
+ if (best_for_p > 0 && best_for_c > 0) {
+ /* Collision among bases. Minimize. */
+ if (best_for_c < best_for_p)
+ set_best(parent, best_for_c);
+ continue;
+ }
+
+ /*
+ * At this point, we have reached a commit that is reachable
+ * from the tip, either from 'c' or from an earlier commit to
+ * have 'parent' as its first parent.
+ *
+ * Update 'best_index' to match the minimum of all base indices
+ * to reach 'parent'.
+ */
+
+ /* Exactly one is positive due to initial conditions. */
+ positive = (best_for_c < 0) ? best_for_p : best_for_c;
+
+ if (best_index < 0 || positive < best_index)
+ best_index = positive;
+
+ /* No matter what, track that the parent is reachable from tip. */
+ set_best(parent, -1);
+ branch_point = parent;
+ }
+
+cleanup:
+ clear_best_branch_base(&best_branch_base);
+ clear_prio_queue(&queue);
+ return best_index > 0 ? best_index - 1 : -1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index bf63cc468fd..9a745b7e176 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -139,4 +139,21 @@ void tips_reachable_from_bases(struct repository *r,
struct commit **tips, size_t tips_nr,
int mark);
+/*
+ * Given a 'tip' commit and a list of potential 'bases', return the index 'i' that
+ * minimizes the number of commits in the first-parent history of 'tip' and not
+ * in the first-parent history of 'bases[i]'.
+ *
+ * Among a list of long-lived branches that are updated only by merges (with the
+ * first parent being the previous position of the branch), this would inform
+ * which branch was used to create the tip reference.
+ *
+ * Returns -1 if no common point is found in first-parent histories, which is
+ * rare, but possible with multiple root commits.
+ */
+int get_branch_base_for_tip(struct repository *r,
+ struct commit *tip,
+ struct commit **bases,
+ size_t bases_nr);
+
#endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 1e3b431e3e7..8579b607aa5 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -114,6 +114,8 @@ int cmd__reach(int ac, const char **av)
repo_in_merge_bases_many(the_repository, A, X_nr, X_array, 0));
else if (!strcmp(av[1], "is_descendant_of"))
printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X));
+ else if (!strcmp(av[1], "get_branch_base_for_tip"))
+ printf("%s(A,X):%d\n", av[1], get_branch_base_for_tip(r, A, X_array, X_nr));
else if (!strcmp(av[1], "get_merge_bases_many")) {
struct commit_list *list = NULL;
if (repo_get_merge_bases_many(the_repository,
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b330945f497..e789a4720c1 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -612,4 +612,65 @@ test_expect_success 'for-each-ref merged:none' '
--format="%(refname)" --stdin
'
+# For get_branch_base_for_tip, we only care about
+# first-parent history. Here is the test graph with
+# second parents removed:
+#
+# (10,10)
+# /
+# (10,9) (9,10)
+# / /
+# (10,8) (9,9) (8,10)
+# / / /
+# ( continued...)
+# \ / / /
+# (3,1) (2,2) (1,3)
+# \ / /
+# (2,1) (1,2)
+# \ /
+# (1,1)
+#
+# In short, for a commit (i,j), the first-parent history
+# walks all commits (i, k) with k from j to 1, then the
+# commits (l, 1) with l from i to 1.
+
+test_expect_success 'get_branch_base_for_tip: none reach' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
+ A:commit-2-3
+ X:commit-1-2
+ X:commit-1-4
+ X:commit-4-4
+ X:commit-8-4
+ X:commit-10-4
+ EOF
+ echo "get_branch_base_for_tip(A,X):2" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
+test_expect_success 'get_branch_base_for_tip: equal to tip' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
+ A:commit-8-4
+ X:commit-1-2
+ X:commit-1-4
+ X:commit-4-4
+ X:commit-8-4
+ X:commit-10-4
+ EOF
+ echo "get_branch_base_for_tip(A,X):3" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
+test_expect_success 'get_branch_base_for_tip: all reach tip' '
+ # (2,3) branched from the first tip (i,4) in X with i > 2
+ cat >input <<-\EOF &&
+ A:commit-4-1
+ X:commit-4-2
+ X:commit-5-1
+ EOF
+ echo "get_branch_base_for_tip(A,X):0" >expect &&
+ test_all_modes get_branch_base_for_tip
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v3 2/4] commit: add gentle reference lookup method
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 1/4] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
@ 2024-08-14 10:31 ` Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 3/4] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-14 10:31 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <derrickstolee@github.com>
The lookup_commit_reference_by_name() method uses lookup_commit_reference()
without an option to use lookup_commit_reference_gently(). Create a gentle
version of the method so it can be used in locations where non-commits may
be found but error messages should be silenced.
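
The shape of this change is the common "gentle variant plus thin wrapper"
pattern. A minimal, self-contained sketch of that pattern follows; the
toy_* functions and the "commit-" naming rule are hypothetical stand-ins
and do not call into Git's object lookup.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an object lookup: only names starting with
 * "commit-" resolve. When 'quiet' is nonzero, failures are reported to
 * the caller through the NULL return value only.
 */
static const char *toy_lookup_gently(const char *name, int quiet)
{
	if (!strncmp(name, "commit-", 7))
		return name;
	if (!quiet)
		fprintf(stderr, "error: '%s' is not a commit\n", name);
	return NULL;
}

/* The existing entry point keeps its behavior by passing quiet = 0. */
static const char *toy_lookup(const char *name)
{
	return toy_lookup_gently(name, 0);
}
```

Existing callers keep seeing error messages, while a caller that expects
to encounter non-commits can pass a nonzero 'quiet' and simply skip NULL
results.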
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
commit.c | 8 +++++++-
commit.h | 2 ++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/commit.c b/commit.c
index 1a479a997c4..ed49be8dce5 100644
--- a/commit.c
+++ b/commit.c
@@ -82,13 +82,19 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
}
struct commit *lookup_commit_reference_by_name(const char *name)
+{
+ return lookup_commit_reference_by_name_gently(name, 0);
+}
+
+struct commit *lookup_commit_reference_by_name_gently(const char *name,
+ int quiet)
{
struct object_id oid;
struct commit *commit;
if (repo_get_oid_committish(the_repository, name, &oid))
return NULL;
- commit = lookup_commit_reference(the_repository, &oid);
+ commit = lookup_commit_reference_gently(the_repository, &oid, quiet);
if (repo_parse_commit(the_repository, commit))
return NULL;
return commit;
diff --git a/commit.h b/commit.h
index 62fe0d77a70..ef17668cc69 100644
--- a/commit.h
+++ b/commit.h
@@ -81,6 +81,8 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
const struct object_id *oid,
int quiet);
struct commit *lookup_commit_reference_by_name(const char *name);
+struct commit *lookup_commit_reference_by_name_gently(const char *name,
+ int quiet);
/*
* Look up object named by "oid", dereference tag as necessary,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v3 3/4] for-each-ref: add 'is-base' token
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 1/4] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 2/4] commit: add gentle reference lookup method Derrick Stolee via GitGitGadget
@ 2024-08-14 10:31 ` Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 4/4] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
2024-08-19 19:52 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Junio C Hamano
4 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-14 10:31 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. That change was motivated by the need for a heuristic to
determine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
change in 49abcd21da6 (for-each-ref: add ahead-behind format atom,
2023-03-20).
Introduce the 'is-base:<source>' atom, which will indicate that the
algorithm should be computed and the result of the algorithm is reported
using an indicator of the form '(<source>)'. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.
Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refspecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.
It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f9774e1
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.
In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-for-each-ref.txt | 42 ++++++++++++++++
ref-filter.c | 77 +++++++++++++++++++++++++++++-
ref-filter.h | 15 ++++++
t/t6300-for-each-ref.sh | 9 ++++
t/t6600-test-reach.sh | 60 +++++++++++++++++++++++
5 files changed, 202 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..d3764401a23 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,48 @@ ahead-behind:<committish>::
commits ahead and behind, respectively, when comparing the output
ref to the `<committish>` specified in the format.
+is-base:<committish>::
+ In at most one row, `(<committish>)` will appear to indicate the ref
+ that is most likely the ref used as a starting point for the branch
+ that produced `<committish>`. This choice is made using a heuristic:
+ choose the ref that minimizes the number of commits in the
+ first-parent history of `<committish>` and not in the first-parent
+ history of the ref.
++
+For example, consider the following figure of first-parent histories of
+several refs:
++
+----
+*--*--*--*--*--* refs/heads/A
+\
+ \
+ *--*--*--* refs/heads/B
+ \ \
+ \ \
+ * * refs/heads/C
+ \
+ \
+ *--* refs/heads/D
+----
++
+Here, if `A`, `B`, and `C` are the filtered references, and the format
+string is `%(refname):%(is-base:D)`, then the output would be
++
+----
+refs/heads/A:
+refs/heads/B:(D)
+refs/heads/C:
+----
++
+This is because the first-parent history of `D` has its earliest
+intersection with the first-parent histories of the filtered refs at a
+common first-parent ancestor of `B` and `C` and ties are broken by the
+earliest ref in the sorted order.
++
+Note that this token will not appear if the first-parent history of
+`<committish>` does not intersect the first-parent histories of the
+filtered refs.
+
describe[:options]::
A human-readable name, like linkgit:git-describe[1];
empty string for undescribable commits. The `describe` string may
diff --git a/ref-filter.c b/ref-filter.c
index 59ad6f54ddb..3d598f6b6e6 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -167,6 +167,7 @@ enum atom_type {
ATOM_ELSE,
ATOM_REST,
ATOM_AHEADBEHIND,
+ ATOM_ISBASE,
};
/*
@@ -889,6 +890,23 @@ static int ahead_behind_atom_parser(struct ref_format *format,
return 0;
}
+static int is_base_atom_parser(struct ref_format *format,
+ struct used_atom *atom UNUSED,
+ const char *arg, struct strbuf *err)
+{
+ struct string_list_item *item;
+
+ if (!arg)
+ return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:<committish>)"));
+
+ item = string_list_append(&format->is_base_tips, arg);
+ item->util = lookup_commit_reference_by_name(arg);
+ if (!item->util)
+ die("failed to find '%s'", arg);
+
+ return 0;
+}
+
static int head_atom_parser(struct ref_format *format UNUSED,
struct used_atom *atom,
const char *arg, struct strbuf *err)
@@ -952,6 +970,7 @@ static struct {
[ATOM_ELSE] = { "else", SOURCE_NONE },
[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
+ [ATOM_ISBASE] = { "is-base", SOURCE_OTHER, FIELD_STR, is_base_atom_parser },
/*
* Please update $__git_ref_fieldlist in git-completion.bash
* when you add new atoms
@@ -2334,6 +2353,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
int i;
struct object_info empty = OBJECT_INFO_INIT;
int ahead_behind_atoms = 0;
+ int is_base_atoms = 0;
CALLOC_ARRAY(ref->value, used_atom_cnt);
@@ -2475,6 +2495,15 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
v->s = xstrdup("");
}
continue;
+ } else if (atom_type == ATOM_ISBASE) {
+ if (ref->is_base && ref->is_base[is_base_atoms]) {
+ v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
+ free(ref->is_base[is_base_atoms]);
+ } else {
+ v->s = xstrdup("");
+ }
+ is_base_atoms++;
+ continue;
} else
continue;
@@ -2876,6 +2905,7 @@ static void free_array_item(struct ref_array_item *item)
free(item->value);
}
free(item->counts);
+ free(item->is_base);
free(item);
}
@@ -3040,6 +3070,49 @@ void filter_ahead_behind(struct repository *r,
free(commits);
}
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array)
+{
+ struct commit **bases;
+ size_t bases_nr = 0;
+ struct ref_array_item **back_index;
+
+ if (!format->is_base_tips.nr || !array->nr)
+ return;
+
+ CALLOC_ARRAY(back_index, array->nr);
+ CALLOC_ARRAY(bases, array->nr);
+
+ for (size_t i = 0; i < array->nr; i++) {
+ const char *name = array->items[i]->refname;
+ struct commit *c = lookup_commit_reference_by_name_gently(name, 1);
+
+ CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+
+ if (!c)
+ continue;
+
+ back_index[bases_nr] = array->items[i];
+ bases[bases_nr] = c;
+ bases_nr++;
+ }
+
+ for (size_t i = 0; i < format->is_base_tips.nr; i++) {
+ struct commit *tip = format->is_base_tips.items[i].util;
+ int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
+
+ if (base_index < 0)
+ continue;
+
+ /* Store the string for use in output later. */
+ back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string);
+ }
+
+ free(back_index);
+ free(bases);
+}
+
static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
{
int ret = 0;
@@ -3126,7 +3199,8 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
return !(filter->reachable_from ||
filter->unreachable_from ||
sorting ||
- format->bases.nr);
+ format->bases.nr ||
+ format->is_base_tips.nr);
}
void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
@@ -3150,6 +3224,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
struct ref_array array = { 0 };
filter_refs(&array, filter, type);
filter_ahead_behind(the_repository, format, &array);
+ filter_is_base(the_repository, format, &array);
ref_array_sort(sorting, &array);
print_formatted_ref_array(&array, format);
ref_array_clear(&array);
diff --git a/ref-filter.h b/ref-filter.h
index 0ca28d2bba6..20419a56218 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -48,6 +48,7 @@ struct ref_array_item {
struct commit *commit;
struct atom_value *value;
struct ahead_behind_count **counts;
+ char **is_base;
char refname[FLEX_ARRAY];
};
@@ -101,6 +102,9 @@ struct ref_format {
/* List of bases for ahead-behind counts. */
struct string_list bases;
+ /* List of bases for is-base indicators. */
+ struct string_list is_base_tips;
+
struct {
int max_count;
int omit_empty;
@@ -114,6 +118,7 @@ struct ref_format {
#define REF_FORMAT_INIT { \
.use_color = -1, \
.bases = STRING_LIST_INIT_DUP, \
+ .is_base_tips = STRING_LIST_INIT_DUP, \
}
/* Macros for checking --merged and --no-merged options */
@@ -203,6 +208,16 @@ void filter_ahead_behind(struct repository *r,
struct ref_format *format,
struct ref_array *array);
+/*
+ * If the provided format includes is-base atoms, then compute the base checks
+ * for those tips against all refs.
+ *
+ * If this is not called, then any is-base atoms will be blank.
+ */
+void filter_is_base(struct repository *r,
+ struct ref_format *format,
+ struct ref_array *array);
+
void ref_filter_init(struct ref_filter *filter);
void ref_filter_clear(struct ref_filter *filter);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index eb6c8204e8b..8d15713cc67 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1907,6 +1907,15 @@ test_expect_success 'git for-each-ref with nested tags' '
test_cmp expect actual
'
+test_expect_success 'is-base atom with non-commits' '
+ git for-each-ref --format="%(is-base:HEAD) %(refname)" >out 2>err &&
+ grep "(HEAD) refs/heads/main" out &&
+
+ test_line_count = 2 err &&
+ grep "error: object .* is a commit, not a blob" err &&
+ grep "error: bad tag pointer to" err
+'
+
GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index e789a4720c1..2591f8b8b39 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -673,4 +673,64 @@ test_expect_success 'get_branch_base_for_tip: all reach tip' '
test_all_modes get_branch_base_for_tip
'
+test_expect_success 'for-each-ref is-base: none reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1:
+ refs/heads/commit-4-2:(commit-2-3)
+ refs/heads/commit-4-4:
+ refs/heads/commit-8-4:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-2-3)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base: all reach' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-4-2
+ refs/heads/commit-5-1
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-4-2:(commit-4-1)
+ refs/heads/commit-5-1:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-4-1)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base: equal to tip' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-4-2
+ refs/heads/commit-5-1
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-4-2:(commit-4-2)
+ refs/heads/commit-5-1:
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname):%(is-base:commit-4-2)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base:multiple' '
+ cat >input <<-\EOF &&
+ refs/heads/commit-1-1
+ refs/heads/commit-4-2
+ refs/heads/commit-4-4
+ refs/heads/commit-8-4
+ EOF
+ cat >expect <<-\EOF &&
+ refs/heads/commit-1-1[-]
+ refs/heads/commit-4-2[(commit-2-3)-]
+ refs/heads/commit-4-4[-]
+ refs/heads/commit-8-4[-(commit-6-5)]
+ EOF
+ run_all_modes git for-each-ref \
+ --format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v3 4/4] p1500: add is-base performance tests
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2024-08-14 10:31 ` [PATCH v3 3/4] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
@ 2024-08-14 10:31 ` Derrick Stolee via GitGitGadget
2024-08-19 19:52 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Junio C Hamano
4 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-08-14 10:31 UTC (permalink / raw)
To: git; +Cc: gitster, vdye, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.
This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.
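To make the notion of a "collision" concrete, here is a minimal shell sketch for a single tip/base pair. It is not the implementation in commit-reach.c: it lists both first-parent histories in full, whereas the walk described above is meant to stop early. The function name first_parent_collision and its temporary file are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): print the first commit on the
# first-parent history of <tip> that also appears on the first-parent
# history of <base>. Returns non-zero if the histories never meet.
first_parent_collision () {
	tip=$1
	base=$2
	base_list=$(mktemp) || return 1

	# Record every commit on the first-parent history of the base.
	if ! git rev-list --first-parent "$base" >"$base_list"
	then
		rm -f "$base_list"
		return 1
	fi

	# rev-list prints newest commits first, so the first exact match
	# is the collision closest to the tip.
	found=$(git rev-list --first-parent "$tip" |
		grep -F -x -f "$base_list" |
		head -n 1)
	rm -f "$base_list"

	test -n "$found" && echo "$found"
}
```

For example, `first_parent_collision HEAD refs/heads/main` (with refs/heads/main standing in for any candidate base) prints the commit where the two first-parent histories meet, and fails when no such commit exists, which is the slow case described above.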
The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test the first scenario. We do, however, create a new
root commit in order to test the second scenario, where parallel
first-parent histories never collide.
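As an illustration of why a new root commit gives a history with no collision, the sketch below creates one outside the perf harness. The function name make_disjoint_root and the commit message are assumptions made for this example; unlike the setup in the diff, it passes -m so that commit-tree does not read the message from standard input.

```shell
#!/bin/sh
# Hypothetical helper: point <ref> at a brand-new root commit that
# reuses the tree of HEAD but has no parents, so its history is
# disjoint from every existing branch.
make_disjoint_root () {
	ref=$1
	tree=$(git rev-parse "HEAD^{tree}") &&
	commit=$(git commit-tree -m "disjoint root" "$tree") &&
	git update-ref "$ref" "$commit"
}
```

After `make_disjoint_root refs/heads/disjoint-base`, running `git merge-base HEAD refs/heads/disjoint-base` exits with a non-zero status because the two commits share no ancestor, so a first-parent walk from any other ref has to continue all the way to its root commit.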
Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:
Test                                                       Time (s)
--------------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref              0.12
1500.3: ahead-behind counts: git branch                    0.12
1500.4: ahead-behind counts: git tag                       0.12
1500.5: contains: git for-each-ref --merged                0.04
1500.6: contains: git branch --merged                      0.04
1500.7: contains: git tag --merged                         0.04
1500.8: is-base check: test-tool reach (refs)              0.03
1500.9: is-base check: test-tool reach (tags)              0.03
1500.10: is-base check: git for-each-ref                   0.03
1500.11: is-base check: git for-each-ref (disjoint-base)   0.07
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/perf/p1500-graph-walks.sh | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index e14e7620cce..5b23ce5db93 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -20,6 +20,21 @@ test_expect_success 'setup' '
echo tag-$ref ||
return 1
done >tags &&
+
+ echo "A:HEAD" >test-tool-refs &&
+ for line in $(cat refs)
+ do
+ echo "X:$line" >>test-tool-refs || return 1
+ done &&
+ echo "A:HEAD" >test-tool-tags &&
+ for line in $(cat tags)
+ do
+ echo "X:$line" >>test-tool-tags || return 1
+ done &&
+
+ commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
+ git update-ref refs/heads/disjoint-base $commit &&
+
git commit-graph write --reachable
'
@@ -47,4 +62,20 @@ test_perf 'contains: git tag --merged' '
xargs git tag --merged=HEAD <tags
'
+test_perf 'is-base check: test-tool reach (refs)' '
+ test-tool reach get_branch_base_for_tip <test-tool-refs
+'
+
+test_perf 'is-base check: test-tool reach (tags)' '
+ test-tool reach get_branch_base_for_tip <test-tool-tags
+'
+
+test_perf 'is-base check: git for-each-ref' '
+ git for-each-ref --format="%(is-base:HEAD)" --stdin <refs
+'
+
+test_perf 'is-base check: git for-each-ref (disjoint-base)' '
+ git for-each-ref --format="%(is-base:refs/heads/disjoint-base)" --stdin <refs
+'
+
test_done
--
gitgitgadget
* Re: [PATCH v3 0/4] git for-each-ref: is-base atom and base branches
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2024-08-14 10:31 ` [PATCH v3 4/4] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
@ 2024-08-19 19:52 ` Junio C Hamano
2024-08-20 1:33 ` Derrick Stolee
4 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2024-08-19 19:52 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, vdye, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> There are benefits to users both on client-side and server-side. In an
> internal monorepo, this base branch detection algorithm is used to determine
> a long-lived branch based on the HEAD commit, mapping to a group within the
> organizational structure of the repository, which determines a set of
> projects that the user will likely need to build; this leads to
> automatically selecting an initial sparse-checkout definition based on the
> build dependencies required. An upcoming feature in Azure Repos will use
> this algorithm to automatically create a pull request against the correct
> target branch, reducing user pain from needing to select a different branch
> after a large commit diff is rendered against the default branch. This atom
> unlocks that ability for Git hosting services that use Git in their backend.
Thanks for an update. This iteration looks good to me.
* Re: [PATCH v3 0/4] git for-each-ref: is-base atom and base branches
2024-08-19 19:52 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Junio C Hamano
@ 2024-08-20 1:33 ` Derrick Stolee
0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2024-08-20 1:33 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, vdye
On 8/19/24 3:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> There are benefits to users both on client-side and server-side. In an
>> internal monorepo, this base branch detection algorithm is used to determine
>> a long-lived branch based on the HEAD commit, mapping to a group within the
>> organizational structure of the repository, which determines a set of
>> projects that the user will likely need to build; this leads to
>> automatically selecting an initial sparse-checkout definition based on the
>> build dependencies required. An upcoming feature in Azure Repos will use
>> this algorithm to automatically create a pull request against the correct
>> target branch, reducing user pain from needing to select a different branch
>> after a large commit diff is rendered against the default branch. This atom
>> unlocks that ability for Git hosting services that use Git in their backend.
>
> Thanks for an update. This iteration looks good to me.
Thank you for your careful review.
-Stolee
Thread overview: 23+ messages
2024-08-01 22:10 [PATCH 0/3] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
2024-08-01 22:10 ` [PATCH 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
2024-08-01 22:10 ` [PATCH 2/3] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
2024-08-01 22:10 ` [PATCH 3/3] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
2024-08-01 23:06 ` [PATCH 0/3] git for-each-ref: is-base atom and base branches Junio C Hamano
2024-08-02 14:32 ` Derrick Stolee
2024-08-02 16:55 ` Junio C Hamano
2024-08-02 17:30 ` Junio C Hamano
2024-08-11 17:34 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-08-11 17:34 ` [PATCH v2 1/3] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
2024-08-12 20:30 ` Junio C Hamano
2024-08-13 13:39 ` Derrick Stolee
2024-08-11 17:34 ` [PATCH v2 2/3] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
2024-08-12 21:05 ` Junio C Hamano
2024-08-13 13:44 ` Derrick Stolee
2024-08-11 17:34 ` [PATCH v2 3/3] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 1/4] commit-reach: add get_branch_base_for_tip Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 2/4] commit: add gentle reference lookup method Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 3/4] for-each-ref: add 'is-base' token Derrick Stolee via GitGitGadget
2024-08-14 10:31 ` [PATCH v3 4/4] p1500: add is-base performance tests Derrick Stolee via GitGitGadget
2024-08-19 19:52 ` [PATCH v3 0/4] git for-each-ref: is-base atom and base branches Junio C Hamano
2024-08-20 1:33 ` Derrick Stolee