public inbox for git@vger.kernel.org
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, Johannes Sixt <j6t@kdbg.org>,
	Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2] revision: add --maximal-only option
Date: Thu, 22 Jan 2026 16:05:58 +0000
Message-ID: <pull.2032.v2.git.1769097958549.gitgitgadget@gmail.com>
In-Reply-To: <pull.2032.git.1768703645125.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

When inspecting a range of commits from some set of starting references, it
is sometimes useful to learn which commits are not reachable from any other
commits in the selected range.

One such application is in the creation of a sequence of bundles for the
bundle URI feature. Creating a stack of bundles representing different
slices of time requires choosing which references each bundle includes.
Using all references may be overwhelming or redundant. Instead,
selecting the commits that are maximal within the range could help
define a smaller reference set to use in the bundle header.

Add a new '--maximal-only' option to restrict the output of a revision range
to be only the commits that are not reachable from any other commit in the
range, based on the reachability definition of the walk.

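This is accomplished by adding a new flag at bit 28, CHILD_VISITED, which
the walk sets on every parent it follows from a commit being processed.
This extends the flag range in object.h (FLAG_BITS grows from 28 to 29),
but reusing an earlier bit could collide with another feature's use of it.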
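The following C sketch models the idea on a toy commit graph. It is not Git's code: struct toy_commit, the TOY_* flags, and toy_maximal_only() are hypothetical names. It also simplifies the walk by marking the negative side first with a depth-first pass, whereas Git propagates UNINTERESTING through its date-ordered queue, and it filters only after the whole walk has finished, as a limited walk does. The first_parent argument roughly mimics --first-parent combined with --exclude-first-parent-only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags mirroring the roles of UNINTERESTING, SEEN and
 * CHILD_VISITED in revision.h; the values are local to this sketch. */
#define TOY_UNINTERESTING (1u << 0)
#define TOY_SEEN          (1u << 1)
#define TOY_CHILD_VISITED (1u << 2)

struct toy_commit {
	unsigned flags;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

/* How many parents the walk follows from c. */
static size_t toy_nr_followed(const struct toy_commit *c, int first_parent)
{
	assert(c->nr_parents <= 2);
	return first_parent && c->nr_parents > 1 ? 1 : c->nr_parents;
}

/* Negative side: mark everything reachable from a "^ref" tip. */
static void toy_mark_uninteresting(struct toy_commit *c, int first_parent)
{
	size_t i;

	if (c->flags & TOY_UNINTERESTING)
		return;
	c->flags |= TOY_UNINTERESTING;
	for (i = 0; i < toy_nr_followed(c, first_parent); i++)
		toy_mark_uninteresting(c->parents[i], first_parent);
}

/* Positive side: record each interesting commit once and, roughly like
 * process_parents(), flag every followed parent as having a visited child. */
static void toy_walk(struct toy_commit *c, int first_parent,
		     struct toy_commit **seen, size_t *nr_seen)
{
	size_t i;

	if (c->flags & (TOY_SEEN | TOY_UNINTERESTING))
		return;
	c->flags |= TOY_SEEN;
	seen[(*nr_seen)++] = c;
	for (i = 0; i < toy_nr_followed(c, first_parent); i++) {
		c->parents[i]->flags |= TOY_CHILD_VISITED;
		toy_walk(c->parents[i], first_parent, seen, nr_seen);
	}
}

/* out[] must have room for every commit reachable from the tips; it is
 * used as scratch space and then compacted to the maximal commits. */
static size_t toy_maximal_only(struct toy_commit **tips, const int *negative,
			       size_t nr_tips, int first_parent,
			       struct toy_commit **out)
{
	size_t i, nr_seen = 0, nr_out = 0;

	for (i = 0; i < nr_tips; i++)
		if (negative[i])
			toy_mark_uninteresting(tips[i], first_parent);
	for (i = 0; i < nr_tips; i++)
		if (!negative[i])
			toy_walk(tips[i], first_parent, out, &nr_seen);

	/* Roughly like get_commit_action(): skip commits with a visited child. */
	for (i = 0; i < nr_seen; i++)
		if (!(out[i]->flags & TOY_CHILD_VISITED))
			out[nr_out++] = out[i];
	return nr_out;
}
```

In this model every uninteresting parent of a walked commit also receives TOY_CHILD_VISITED, so a boundary commit could never pass the filter.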

The tests demonstrate the behavior of the feature with a positive-only
range, ranges with negative references, and walk-modifying flags like
--first-parent and --exclude-first-parent-only.

Since boundary commits are parents of commits in the range, and thus never
maximal, the --boundary option would not add any results when used with
the --maximal-only option; mark the two as incompatible.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
    revision: add --maximal-only option
    
    My motivation for this feature is very similar to the bundle URI
    application. I can work around its absence with a tool that runs
    'git rev-list --parents', collects every listed parent in a hashset,
    and filters out any commit that ever appears as a parent. It would be
    more efficient to do this within Git's native revision walk.
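A minimal C sketch of that workaround, assuming input in the format 'git rev-list --parents' prints: one commit ID per line, followed by its parent IDs, separated by spaces. It swaps the hashset for a sorted array searched with bsearch() to stay within the standard library; struct tok, MAX_TOKS, next_line(), and maximal_commits() are hypothetical names. Because history simplification and parent rewriting can change which parents are printed, treat its result as an approximation of --maximal-only rather than an equivalent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A token is a slice of the input text: no copies, no NUL terminator. */
struct tok {
	const char *p;
	size_t len;
};

#define MAX_TOKS 64 /* tokens kept per line; extra parents are dropped */

static int tok_cmp(const void *a, const void *b)
{
	const struct tok *x = a, *y = b;
	size_t n = x->len < y->len ? x->len : y->len;
	int c = memcmp(x->p, y->p, n);

	if (c)
		return c;
	return (x->len > y->len) - (x->len < y->len);
}

/* Split the line at *pos into space-separated tokens and advance *pos
 * past its newline. Returns the number of tokens stored. */
static size_t next_line(const char **pos, struct tok *toks)
{
	const char *s = *pos;
	size_t n = 0;

	while (*s && *s != '\n') {
		const char *start;

		if (*s == ' ') {
			s++;
			continue;
		}
		start = s;
		while (*s && *s != ' ' && *s != '\n')
			s++;
		if (n < MAX_TOKS) {
			toks[n].p = start;
			toks[n].len = (size_t)(s - start);
			n++;
		}
	}
	if (*s == '\n')
		s++;
	*pos = s;
	return n;
}

/* Store (up to out_cap) every commit that never appears as a parent.
 * Returns the total number found, or (size_t)-1 on allocation failure. */
static size_t maximal_commits(const char *text, struct tok *out, size_t out_cap)
{
	struct tok line[MAX_TOKS], *parents = NULL, *grown;
	size_t nr = 0, alloc = 0, nr_out = 0, n, i;
	const char *pos;

	assert(text != NULL);

	/* Pass 1: collect every parent ID (tokens after the first). */
	for (pos = text; *pos;) {
		n = next_line(&pos, line);
		for (i = 1; i < n; i++) {
			if (nr == alloc) {
				alloc = alloc ? 2 * alloc : 16;
				grown = realloc(parents, alloc * sizeof(*parents));
				if (!grown) {
					free(parents);
					return (size_t)-1;
				}
				parents = grown;
			}
			parents[nr++] = line[i];
		}
	}
	if (nr)
		qsort(parents, nr, sizeof(*parents), tok_cmp);

	/* Pass 2: keep each listed commit that no line names as a parent. */
	for (pos = text; *pos;) {
		n = next_line(&pos, line);
		if (!n || (nr && bsearch(&line[0], parents, nr,
					 sizeof(*parents), tok_cmp)))
			continue;
		if (nr_out < out_cap)
			out[nr_out] = line[0];
		nr_out++;
	}
	free(parents);
	return nr_out;
}
```

For example, given the four lines "ccc bbb aaa", "bbb aaa", "aaa", and "ddd aaa" (abbreviated IDs), it reports ccc and ddd, since aaa and bbb are both named as parents.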
    
    This pushes the bit-fields of struct object past a 32-bit boundary:
    before this patch, 28 flag bits, 3 type bits, and a parsed bit filled
    exactly 32 bits, and the new flag brings the total to 33. Spending a
    new flag bit is the biggest concern I have about this update, and I
    would understand if this feature is not worth running out of room for
    extensions there.
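The bit counting behind this concern can be checked with a small C sketch. The two structs copy only the widths of struct object's parsed, type, and flags bit-fields; the names obj_flags28, obj_flags29, SKETCH_TYPE_BITS, and the helper functions are local to the sketch. Whether the 33-bit layout actually needs an extra storage unit is up to the compiler's ABI, so sizeof is deliberately not checked.

```c
#include <assert.h>

/* Width of the type bit-field, matching TYPE_BITS (3) in object.h. */
#define SKETCH_TYPE_BITS 3

struct obj_flags28 {		/* FLAG_BITS before this patch */
	unsigned parsed : 1;
	unsigned type : SKETCH_TYPE_BITS;
	unsigned flags : 28;
};

struct obj_flags29 {		/* FLAG_BITS after this patch */
	unsigned parsed : 1;
	unsigned type : SKETCH_TYPE_BITS;
	unsigned flags : 29;
};

enum {
	BITS_BEFORE = 1 + SKETCH_TYPE_BITS + 28,	/* 32 */
	BITS_AFTER = 1 + SKETCH_TYPE_BITS + 29,		/* 33 */
};

/* Return 1 if setting `bit` in the flags field survives the store.
 * Unsigned bit-fields keep only their low bits, so 1u << 28 (the new
 * CHILD_VISITED value) vanishes in a 28-bit field. */
static int bit_survives28(unsigned bit)
{
	struct obj_flags28 o = { 0, 0, 0 };

	o.flags |= bit;
	return (o.flags & bit) != 0;
}

static int bit_survives29(unsigned bit)
{
	struct obj_flags29 o = { 0, 0, 0 };

	o.flags |= bit;
	return (o.flags & bit) != 0;
}
```

This is why FLAG_BITS has to grow to 29 for a flag at bit 28: with the old width, the bit would be silently dropped rather than rejected.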
    
    I considered the earlier bit positions to see the impact of an overlap,
    but each of them looked risky to reuse.
    
    I wonder if anyone else has thought about this as a useful technique.
    For instance, it could be part of a strategy for choosing commits for
    reachability bitmaps.
    
    
    Updates in v2
    =============
    
     * The option is now called --maximal-only.
     * The documentation moved to the commit-filtering options, out of the
       walk-altering options.
     * --boundary and --maximal-only are marked as incompatible.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2032%2Fderrickstolee%2Fmaximal-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2032/derrickstolee/maximal-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2032

Range-diff vs v1:

 1:  889a2737bc ! 1:  54fbf36a1f revision: add --maximal option
     @@ Metadata
      Author: Derrick Stolee <stolee@gmail.com>
      
       ## Commit message ##
     -    revision: add --maximal option
     +    revision: add --maximal-only option
      
          When inspecting a range of commits from some set of starting references, it
          is sometimes useful to learn which commits are not reachable from any other
     @@ Commit message
          selecting commits that are maximal to the range could help defining a
          smaller reference set to use in the bundle header.
      
     -    Add a new '--maximal' option to restrict the output of a revision range to
     -    be only the commits that are not reachable from any other commit in the
     +    Add a new '--maximal-only' option to restrict the output of a revision range
     +    to be only the commits that are not reachable from any other commit in the
          range, based on the reachability definition of the walk.
      
          This is accomplished by adding a new 28th bit flag, CHILD_VISITED, that is
     @@ Commit message
          range, ranges with negative references, and walk-modifying flags like
          --first-parent and --exclude-first-parent-only.
      
     +    Since the --boundary option would not increase any results when used with
     +    the --maximal-only option, mark them as incompatible.
     +
          Signed-off-by: Derrick Stolee <stolee@gmail.com>
      
       ## Documentation/rev-list-options.adoc ##
     -@@ Documentation/rev-list-options.adoc: The following options affect the way the simplification is performed:
     - 	times; if so, a commit is included if it is any of the commits
     - 	given or if it is an ancestor or descendant of one of them.
     +@@ Documentation/rev-list-options.adoc: endif::git-log[]
     + 	from the point where it diverged from the remote branch, given
     + 	that arbitrary merges can be valid topic branch changes.
       
     -+`--maximal`::
     ++`--maximal-only`::
      +	Restrict the output commits to be those that are not reachable
      +	from any other commits in the revision range.
      +
     - A more detailed explanation follows.
     - 
     - Suppose you specified `foo` as the _<paths>_.  We shall call commits
     + `--not`::
     + 	Reverses the meaning of the '{caret}' prefix (or lack thereof)
     + 	for all following revision specifiers, up to the next `--not`.
      
       ## object.h ##
      @@ object.h: void object_array_init(struct object_array *array);
     @@ revision.c: static int process_parents(struct rev_info *revs, struct commit *com
       			p->object.flags |= (SEEN | NOT_USER_GIVEN);
       			if (list)
      @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
     + 	} else if ((argcount = parse_long_opt("until", argv, &optarg))) {
     + 		revs->min_age = approxidate(optarg);
     + 		return argcount;
     ++	} else if (!strcmp(arg, "--maximal-only")) {
     ++		revs->maximal_only = 1;
     + 	} else if (!strcmp(arg, "--first-parent")) {
       		revs->first_parent_only = 1;
       	} else if (!strcmp(arg, "--exclude-first-parent-only")) {
     - 		revs->exclude_first_parent_only = 1;
     -+	} else if (!strcmp(arg, "--maximal")) {
     -+		revs->maximal = 1;
     - 	} else if (!strcmp(arg, "--ancestry-path")) {
     - 		revs->ancestry_path = 1;
     - 		revs->simplify_history = 0;
     +@@ revision.c: int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
     + 				  !!revs->reverse, "--reverse",
     + 				  !!revs->reflog_info, "--walk-reflogs");
     + 
     ++	die_for_incompatible_opt2(!!revs->boundary, "--boundary",
     ++				  !!revs->maximal_only, "--maximal-only");
     ++
     + 	if (revs->no_walk && revs->graph)
     + 		die(_("options '%s' and '%s' cannot be used together"), "--no-walk", "--graph");
     + 	if (!revs->reflog_info && revs->grep_filter.use_reflog_filter)
      @@ revision.c: enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
       {
       	if (commit->object.flags & SHOWN)
       		return commit_ignore;
     -+	if (revs->maximal && (commit->object.flags & CHILD_VISITED))
     ++	if (revs->maximal_only && (commit->object.flags & CHILD_VISITED))
      +		return commit_ignore;
       	if (revs->unpacked && has_object_pack(revs->repo, &commit->object.oid))
       		return commit_ignore;
     @@ revision.h
       #define DECORATE_SHORT_REFS	1
       #define DECORATE_FULL_REFS	2
      @@ revision.h: struct rev_info {
     - 			cherry_mark:1,
     - 			bisect:1,
     - 			ancestry_path:1,
     -+			maximal:1,
     + 			left_right:1,
     + 			left_only:1,
     + 			right_only:1,
     ++			maximal_only:1,
     + 			rewrite_parents:1,
     + 			print_parents:1,
     + 			show_decorations:1,
     +
     + ## t/t6000-rev-list-misc.sh ##
     +@@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list -z --boundary' '
     + 	test_cmp expect actual
     + '
       
     - 			/* True if --ancestry-path was specified without an
     - 			 * argument. The bottom revisions are implicitly
     ++test_expect_success 'rev-list --boundary incompatible with --maximal-only' '
     ++	test_when_finished rm -rf repo &&
     ++
     ++	git init repo &&
     ++	test_commit -C repo 1 &&
     ++	test_commit -C repo 2 &&
     ++
     ++	oid1=$(git -C repo rev-parse HEAD~) &&
     ++	oid2=$(git -C repo rev-parse HEAD) &&
     ++
     ++	test_must_fail git -C repo rev-list --boundary --maximal-only \
     ++		HEAD~1..HEAD 2>err &&
     ++	test_grep "cannot be used together" err
     ++'
     ++
     + test_done
      
       ## t/t6600-test-reach.sh ##
      @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
       		--sort=refname --sort=-is-base:commit-2-3
       '
       
     -+test_expect_success 'rev-list --maximal (all positive)' '
     ++test_expect_success 'rev-list --maximal-only (all positive)' '
      +	# Only one maximal.
      +	cat >input <<-\EOF &&
      +	refs/heads/commit-1-1
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
      +	cat >expect <<-EOF &&
      +	$(git rev-parse refs/heads/commit-8-4)
      +	EOF
     -+	run_all_modes git rev-list --maximal --stdin &&
     ++	run_all_modes git rev-list --maximal-only --stdin &&
      +
      +	# All maximal.
      +	cat >input <<-\EOF &&
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
      +	$(git rev-parse refs/heads/commit-3-4)
      +	$(git rev-parse refs/heads/commit-2-5)
      +	EOF
     -+	run_all_modes git rev-list --maximal --stdin &&
     ++	run_all_modes git rev-list --maximal-only --stdin &&
      +
      +	# Mix of both.
      +	cat >input <<-\EOF &&
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
      +	$(git rev-parse refs/heads/commit-5-2)
      +	$(git rev-parse refs/heads/commit-2-5)
      +	EOF
     -+	run_all_modes git rev-list --maximal --stdin
     ++	run_all_modes git rev-list --maximal-only --stdin
      +'
      +
     -+test_expect_success 'rev-list --maximal (range)' '
     ++test_expect_success 'rev-list --maximal-only (range)' '
      +	cat >input <<-\EOF &&
      +	refs/heads/commit-1-1
      +	refs/heads/commit-2-5
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
      +	cat >expect <<-EOF &&
      +	$(git rev-parse refs/heads/commit-6-4)
      +	EOF
     -+	run_all_modes git rev-list --maximal --stdin &&
     ++	run_all_modes git rev-list --maximal-only --stdin &&
      +
      +	# first-parent changes reachability: the first parent
      +	# reduces the second coordinate to 1 before reducing the
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref is-base: --sort' '
      +	$(git rev-parse refs/heads/commit-6-4)
      +	$(git rev-parse refs/heads/commit-2-5)
      +	EOF
     -+	run_all_modes git rev-list --maximal --stdin \
     ++	run_all_modes git rev-list --maximal-only --stdin \
      +		--first-parent --exclude-first-parent-only
      +'
      +


 Documentation/rev-list-options.adoc |  4 ++
 object.h                            |  4 +-
 revision.c                          | 12 ++++-
 revision.h                          |  5 +-
 t/t6000-rev-list-misc.sh            | 15 ++++++
 t/t6600-test-reach.sh               | 75 +++++++++++++++++++++++++++++
 6 files changed, 110 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 453ec59057..a39cf88bbc 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -148,6 +148,10 @@ endif::git-log[]
 	from the point where it diverged from the remote branch, given
 	that arbitrary merges can be valid topic branch changes.
 
+`--maximal-only`::
+	Restrict the output commits to be those that are not reachable
+	from any other commits in the revision range.
+
 `--not`::
 	Reverses the meaning of the '{caret}' prefix (or lack thereof)
 	for all following revision specifiers, up to the next `--not`.
diff --git a/object.h b/object.h
index 4bca957b8d..dfe7a1f0ea 100644
--- a/object.h
+++ b/object.h
@@ -64,7 +64,7 @@ void object_array_init(struct object_array *array);
 
 /*
  * object flag allocation:
- * revision.h:               0---------10         15               23------27
+ * revision.h:               0---------10         15               23--------28
  * fetch-pack.c:             01    67
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
@@ -86,7 +86,7 @@ void object_array_init(struct object_array *array);
  * builtin/unpack-objects.c:                                 2021
  * pack-bitmap.h:                                              2122
  */
-#define FLAG_BITS  28
+#define FLAG_BITS  29
 
 #define TYPE_BITS 3
 
diff --git a/revision.c b/revision.c
index 1858e093ee..2dee78b838 100644
--- a/revision.c
+++ b/revision.c
@@ -1150,7 +1150,8 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 			struct commit *p = parent->item;
 			parent = parent->next;
 			if (p)
-				p->object.flags |= UNINTERESTING;
+				p->object.flags |= UNINTERESTING |
+						   CHILD_VISITED;
 			if (repo_parse_commit_gently(revs->repo, p, 1) < 0)
 				continue;
 			if (p->parents)
@@ -1204,7 +1205,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 			if (!*slot)
 				*slot = *revision_sources_at(revs->sources, commit);
 		}
-		p->object.flags |= pass_flags;
+		p->object.flags |= pass_flags | CHILD_VISITED;
 		if (!(p->object.flags & SEEN)) {
 			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
@@ -2377,6 +2378,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if ((argcount = parse_long_opt("until", argv, &optarg))) {
 		revs->min_age = approxidate(optarg);
 		return argcount;
+	} else if (!strcmp(arg, "--maximal-only")) {
+		revs->maximal_only = 1;
 	} else if (!strcmp(arg, "--first-parent")) {
 		revs->first_parent_only = 1;
 	} else if (!strcmp(arg, "--exclude-first-parent-only")) {
@@ -3147,6 +3150,9 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 				  !!revs->reverse, "--reverse",
 				  !!revs->reflog_info, "--walk-reflogs");
 
+	die_for_incompatible_opt2(!!revs->boundary, "--boundary",
+				  !!revs->maximal_only, "--maximal-only");
+
 	if (revs->no_walk && revs->graph)
 		die(_("options '%s' and '%s' cannot be used together"), "--no-walk", "--graph");
 	if (!revs->reflog_info && revs->grep_filter.use_reflog_filter)
@@ -4125,6 +4131,8 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
 {
 	if (commit->object.flags & SHOWN)
 		return commit_ignore;
+	if (revs->maximal_only && (commit->object.flags & CHILD_VISITED))
+		return commit_ignore;
 	if (revs->unpacked && has_object_pack(revs->repo, &commit->object.oid))
 		return commit_ignore;
 	if (revs->no_kept_objects) {
diff --git a/revision.h b/revision.h
index b36acfc2d9..69242ecb18 100644
--- a/revision.h
+++ b/revision.h
@@ -52,7 +52,9 @@
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
 #define ANCESTRY_PATH	(1u<<27)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | PULL_MERGE)
+#define CHILD_VISITED	(1u<<28)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR \
+				      | PULL_MERGE | CHILD_VISITED)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
@@ -189,6 +191,7 @@ struct rev_info {
 			left_right:1,
 			left_only:1,
 			right_only:1,
+			maximal_only:1,
 			rewrite_parents:1,
 			print_parents:1,
 			show_decorations:1,
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index fec16448cf..d0a2a86610 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -248,4 +248,19 @@ test_expect_success 'rev-list -z --boundary' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list --boundary incompatible with --maximal-only' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD~) &&
+	oid2=$(git -C repo rev-parse HEAD) &&
+
+	test_must_fail git -C repo rev-list --boundary --maximal-only \
+		HEAD~1..HEAD 2>err &&
+	test_grep "cannot be used together" err
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 6638d1aa1d..2613075894 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -762,4 +762,79 @@ test_expect_success 'for-each-ref is-base: --sort' '
 		--sort=refname --sort=-is-base:commit-2-3
 '
 
+test_expect_success 'rev-list --maximal-only (all positive)' '
+	# Only one maximal.
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	EOF
+
+	cat >expect <<-EOF &&
+	$(git rev-parse refs/heads/commit-8-4)
+	EOF
+	run_all_modes git rev-list --maximal-only --stdin &&
+
+	# All maximal.
+	cat >input <<-\EOF &&
+	refs/heads/commit-5-2
+	refs/heads/commit-4-3
+	refs/heads/commit-3-4
+	refs/heads/commit-2-5
+	EOF
+
+	cat >expect <<-EOF &&
+	$(git rev-parse refs/heads/commit-5-2)
+	$(git rev-parse refs/heads/commit-4-3)
+	$(git rev-parse refs/heads/commit-3-4)
+	$(git rev-parse refs/heads/commit-2-5)
+	EOF
+	run_all_modes git rev-list --maximal-only --stdin &&
+
+	# Mix of both.
+	cat >input <<-\EOF &&
+	refs/heads/commit-5-2
+	refs/heads/commit-3-2
+	refs/heads/commit-2-5
+	EOF
+
+	cat >expect <<-EOF &&
+	$(git rev-parse refs/heads/commit-5-2)
+	$(git rev-parse refs/heads/commit-2-5)
+	EOF
+	run_all_modes git rev-list --maximal-only --stdin
+'
+
+test_expect_success 'rev-list --maximal-only (range)' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-5
+	refs/heads/commit-6-4
+	^refs/heads/commit-4-5
+	EOF
+
+	cat >expect <<-EOF &&
+	$(git rev-parse refs/heads/commit-6-4)
+	EOF
+	run_all_modes git rev-list --maximal-only --stdin &&
+
+	# first-parent changes reachability: the first parent
+	# reduces the second coordinate to 1 before reducing the
+	# first coordinate.
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-5
+	refs/heads/commit-6-4
+	^refs/heads/commit-4-5
+	EOF
+
+	cat >expect <<-EOF &&
+	$(git rev-parse refs/heads/commit-6-4)
+	$(git rev-parse refs/heads/commit-2-5)
+	EOF
+	run_all_modes git rev-list --maximal-only --stdin \
+		--first-parent --exclude-first-parent-only
+'
+
 test_done

base-commit: b5c409c40f1595e3e590760c6f14a16b6683e22c
-- 
gitgitgadget
