* [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible
@ 2025-04-11 23:26 Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
                   ` (12 more replies)
  0 siblings, 13 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

This is a short-ish series I wrote today while thinking through an idea
that Peff and I were talking about yesterday: avoiding MIDX'ing any
cruft pack(s) in a repository when repacking.

The core of the idea is to introduce a variant of the '--stdin-packs'
option in 'pack-objects'. The existing behavior is to create a pack
whose contents are the set difference between the specified included and
excluded packs. The new mode (which I'm calling --stdin-packs=follow)
tweaks the namehash traversal we do at the end of --stdin-packs to also
pick up and pack objects which are reachable from commits in the above
set difference, but don't appear in any included or excluded pack.

If you repack consistently using this strategy, you can guarantee that
the union of geometrically-repacked packs is closed under reachability
without having to keep track of any cruft pack(s) in the MIDX.
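
A toy model may make the difference between the two modes easier to
see. The sketch below is not the pack-objects implementation: it assumes
each object lives in exactly one pack, has at most one outgoing edge
('parent'), and that the edges form no cycles. The names
'toy_object', 'pack_state', and 'toy_stdin_packs()' are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* State of the pack that holds an object, as seen by '--stdin-packs'. */
enum pack_state { PACK_UNKNOWN, PACK_INCLUDED, PACK_EXCLUDED };

/* Hypothetical toy object: one outgoing edge, one containing pack. */
struct toy_object {
	int parent;            /* index of the object this one reaches, or -1 */
	enum pack_state state; /* state of the pack holding this object */
};

/*
 * Set out[i] to 1 for every object that would land in the new pack and
 * return how many were selected.
 *
 * Without 'follow', only objects from included packs are selected. With
 * 'follow', we also walk from each included object and pick up anything
 * we reach, stopping as soon as we hit an object in an excluded pack.
 */
static size_t toy_stdin_packs(const struct toy_object *objs, size_t nr,
			      int follow, unsigned char *out)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		out[i] = objs[i].state == PACK_INCLUDED;

	if (follow) {
		for (i = 0; i < nr; i++) {
			int cur;

			if (objs[i].state != PACK_INCLUDED)
				continue;

			for (cur = objs[i].parent; cur >= 0;
			     cur = objs[cur].parent) {
				assert((size_t)cur < nr);
				if (objs[cur].state == PACK_EXCLUDED)
					break; /* never walk past excluded packs */
				out[cur] = 1;
			}
		}
	}

	for (i = 0; i < nr; i++)
		count += out[i];
	return count;
}
```

With a chain A <- B <- C <- D where B and D are included, C is excluded,
and A is unknown, the plain mode selects {B, D} and the follow mode
selects {A, B, D}. That is the same shape as the test added in patch
6/8.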

I'm pretty sure that this is all sound, having played with it for the
better part of the day and not being able to come up with any
counter-examples. I'm sending this as an RFC because I'm not sure if
there's an obvious case that I am missing that makes this whole idea
bogus.

Code-review is welcome, but I think at this stage it may be more useful
to center the discussion around whether or not the idea makes sense
first.

Thanks in advance :-).

Taylor Blau (8):
  pack-objects: use standard option incompatibility functions
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: introduce '--stdin-packs=follow'
  repack: keep track of existing MIDX'd packs
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/git-pack-objects.adoc |   8 +-
 builtin/pack-objects.c              | 193 +++++++++++++++++-----------
 builtin/repack.c                    |  97 +++++++++++---
 t/t5331-pack-objects-stdin.sh       | 103 ++++++++++++++-
 t/t7704-repack-cruft.sh             |  70 ++++++++++
 5 files changed, 376 insertions(+), 95 deletions(-)


base-commit: 485f5f863615e670fd97ae40af744e14072cfe18
-- 
2.49.0.229.g19b69c1246


* [RFC PATCH 1/8] pack-objects: use standard option incompatibility functions
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 2/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
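
For readers unfamiliar with the helper, here is a rough sketch of the
check it centralizes. This is a hypothetical stand-in, not Git's
die_for_incompatible_opt2(): instead of dying it formats the message
into a caller-supplied buffer so that the behavior can be inspected.
The function name and the buffer parameters are assumptions; the
message text is the one that appears in the removed die() calls below.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a two-option incompatibility check.
 *
 * If both options are set, write the standard message into 'buf' and
 * return its length. Otherwise leave 'buf' empty and return 0.
 */
static int format_incompatible_opt2(int opt1, const char *opt1_name,
				    int opt2, const char *opt2_name,
				    char *buf, size_t len)
{
	assert(buf && len > 0);
	buf[0] = '\0';

	if (!opt1 || !opt2)
		return 0; /* at most one option given: nothing to report */

	return snprintf(buf, len,
			"options '%s' and '%s' cannot be used together",
			opt1_name, opt2_name);
}
```

Because every caller goes through one formatter, the wording no longer
depends on which pair of options happened to collide, which is why the
t5331 expectation changes in this patch.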
 builtin/pack-objects.c        | 19 ++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b06d159d2..aaea968ed2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4651,9 +4651,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -4691,13 +4692,13 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(filter_options.choice, "--filter",
+				  stdin_packs, "--stdin-packs");
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -4705,8 +4706,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..4f5e2733a2 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--filter. and .--stdin-packs. cannot be used together" err
 	)
 '
 
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 2/8] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 3/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

add_object_entry_from_pack() handles objects from identified packs by
checking their type, before adding commit objects as pending in the
subsequent traversal used by `--stdin-packs`.

There are a couple of quality-of-life refactorings that I noticed while
working in this area:

  - We declare 'revs' (given to us through the miscellaneous context
    argument) earlier in the "if (p)" conditional than is necessary.

  - The 'struct object_info' can use a designated initializer to fill in
    the structure's type pointer, since that is the only field that we
    care about.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
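
As a reminder of why the two forms are interchangeable: in C, members
omitted from a designated initializer are zero-initialized. The sketch
below uses a made-up, cut-down 'struct toy_object_info' (the real
'struct object_info' has more fields) and assumes the init macro simply
zero-fills the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for 'struct object_info'. */
struct toy_object_info {
	int *typep;
	unsigned long *sizep;
	void **contentp;
};

/* Assumed to zero-fill, like a typical *_INIT macro. */
#define TOY_OBJECT_INFO_INIT { 0 }

/*
 * Build the struct both ways and report whether the results agree:
 * returns 1 if every field matches and the unnamed fields are NULL.
 */
static int same_setup(int *type)
{
	/* Old style: zero-fill, then assign the one field we need. */
	struct toy_object_info a = TOY_OBJECT_INFO_INIT;
	/* New style: name the field; the rest are zeroed implicitly. */
	struct toy_object_info b = { .typep = type };

	assert(type != NULL);
	a.typep = type;

	return a.typep == b.typep &&
	       a.sizep == b.sizep &&
	       a.contentp == b.contentp &&
	       b.sizep == NULL && b.contentp == NULL;
}
```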
 builtin/pack-objects.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index aaea968ed2..540e5eba9e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3490,14 +3490,12 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
-		struct object_info oi = OBJECT_INFO_INIT;
-
-		oi.typep = &type;
+		struct object_info oi = { .typep = &type };
 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 3/8] pack-objects: factor out handling '--stdin-packs'
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 2/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 4/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 540e5eba9e..793d245721 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3672,6 +3672,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -3767,7 +3778,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -4773,11 +4783,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 4/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (2 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 3/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 5/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (neither included nor excluded) pack.

Extract the code which initializes and performs the reachability
traversal to take place in the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 793d245721..1689cddd3a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3556,7 +3556,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3564,24 +3564,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3651,10 +3633,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3666,21 +3682,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 5/8] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (3 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 4/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 6/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
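
The pattern used here is an optional context pointer threaded through a
per-object callback: callers that want traversal tips pass a collector,
and everyone else passes NULL. A minimal sketch follows, with made-up
types ('toy_type', 'toy_tips') standing in for the object type and for
'struct rev_info'; it illustrates the shape of the change and is not
the pack-objects code.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

/* Hypothetical stand-in for the pending list inside 'struct rev_info'. */
struct toy_tips {
	int ids[16];
	size_t nr;
};

/*
 * Per-object callback. 'data' is NULL when the caller does not want
 * loose commits recorded as traversal tips.
 */
static int toy_add_loose(int id, enum toy_type type, void *data)
{
	struct toy_tips *tips = data;

	/* (The real callback would also add the object to the pack list.) */

	if (tips && type == TOY_COMMIT) {
		assert(tips->nr < sizeof(tips->ids) / sizeof(tips->ids[0]));
		tips->ids[tips->nr++] = id;
	}
	return 0;
}

/* Invoke the callback on each object, forwarding the optional context. */
static void toy_for_each_loose(const int *ids, const enum toy_type *types,
			       size_t nr, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++)
		toy_add_loose(ids[i], types[i], data);
}
```

Passing NULL keeps the existing callers (the cruft and
'--pack-loose-unreachable' paths in the diff below) behaving exactly as
before.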
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1689cddd3a..2aa12da4af 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3642,7 +3642,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3669,7 +3669,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -3788,7 +3788,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4066,8 +4066,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4088,6 +4089,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4096,11 +4101,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4356,7 +4360,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 6/8] pack-objects: introduce '--stdin-packs=follow'
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (4 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 5/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 7/8] repack: keep track of existing MIDX'd packs Taylor Blau
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

Introduce a new mode, '--stdin-packs=follow', which additionally packs
objects that are reachable from the included packs but are stored in
packs that were not listed. This prepares us for new repacking behavior
which will "resurrect" objects found in cruft or otherwise unspecified
packs when generating new packs. In the context of geometric repacking,
this may be used to maintain a sequence of geometrically-repacked packs,
the union of which is closed under reachability, even in the case
described earlier.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
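
The option changes from a boolean to a three-state value. A standalone
sketch of that parsing logic is below. It mirrors the callback in the
diff but uses hypothetical 'toy_' names and returns -1 instead of dying
on an unknown mode, so that it can be exercised outside of Git.

```c
#include <assert.h>
#include <string.h>

enum toy_stdin_packs_mode {
	TOY_MODE_NONE,     /* option absent or negated */
	TOY_MODE_STANDARD, /* bare '--stdin-packs' */
	TOY_MODE_FOLLOW,   /* '--stdin-packs=follow' */
};

/*
 * Map an optional argument onto a mode. Returns 0 on success, or -1
 * (leaving *mode untouched) when 'arg' names an unrecognized mode.
 */
static int toy_parse_mode(const char *arg, int unset,
			  enum toy_stdin_packs_mode *mode)
{
	assert(mode != NULL);

	if (unset)
		*mode = TOY_MODE_NONE;
	else if (!arg || !*arg)
		*mode = TOY_MODE_STANDARD;
	else if (!strcmp(arg, "follow"))
		*mode = TOY_MODE_FOLLOW;
	else
		return -1;

	return 0;
}
```

Keeping the "none" state as the zero value means that existing
truthiness checks such as "if (stdin_packs)" keep working once the
variable becomes an enum.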
 Documentation/git-pack-objects.adoc |   8 ++-
 builtin/pack-objects.c              |  89 +++++++++++++++++-------
 t/t5331-pack-objects-stdin.sh       | 101 ++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855..c894582799 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,19 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", pack objects which are reachable from objects
+in the included packs, but appear in packs that are not listed.
+Reachable objects which appear in excluded packs are not packed. Useful
+for resurrecting once-cruft objects to generate packs which are closed
+under reachability up to the excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2aa12da4af..6406f4a5b1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -272,6 +272,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3511,32 +3517,43 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
+
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't
+		 * have a non-zero hash field that you would typically
+		 * pick up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash
+		 * and ->no_try_delta here using a now in order to
+		 * perhaps improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
+
+		stdin_packs_hints_nr++;
+	}
+}
+
+static void show_commit_pack_hint(struct commit *commit, void *data)
+{
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
 		return;
+	}
+	/* nothing to do; commits don't have a namehash */
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
-
-	stdin_packs_hints_nr++;
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3644,7 +3661,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3676,7 +3693,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4467,6 +4484,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4478,7 +4512,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4533,6 +4567,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -4788,7 +4825,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 4f5e2733a2..f97d2d1b71 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,105 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+packdir=.git/objects/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success 'setup for --stdin-packs=follow' '
+	git init stdin-packs--follow &&
+	(
+		cd stdin-packs--follow &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed
+	)
+'
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		test_commit E &&
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 7/8] repack: keep track of existing MIDX'd packs
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (5 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 6/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-11 23:26 ` [RFC PATCH 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

The following commit will want to condition whether or not it generates
a pack during geometric repacking with '--stdin-packs=follow' based on
whether or not the existing MIDX has a cruft pack in it.

Keep track of that in the 'existing_packs' struct by adding an
additional flag bit to denote which packs appear in a MIDX.
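
As a rough illustration of the flag-bit approach outside of Git, here is
a minimal standalone sketch. 'struct item' and the item_*() helpers are
hypothetical stand-ins for 'struct string_list_item' and the pack_*()
helpers in builtin/repack.c; only the three flag values are taken from
this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in builtin/repack.c after this patch. */
#define DELETE_PACK 1
#define RETAIN_PACK 2
#define PACK_IN_MIDX 4

/* Hypothetical stand-in for 'struct string_list_item'. */
struct item {
	const char *string;
	void *util; /* never dereferenced; holds flag bits only */
};

/* Set one flag bit without disturbing the bits that are already set. */
static void item_set_flag(struct item *it, uintptr_t flag)
{
	it->util = (void *)((uintptr_t)it->util | flag);
}

/* Return 1 if the given flag bit is set on the item, 0 otherwise. */
static int item_has_flag(const struct item *it, uintptr_t flag)
{
	return ((uintptr_t)it->util & flag) != 0;
}
```

Because each flag occupies its own bit, an item can be marked as both
retained and present in the MIDX at the same time, and neither mark
disturbs the other.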

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index f3330ade7b..bc47bede7b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -31,6 +31,7 @@
 
 #define DELETE_PACK 1
 #define RETAIN_PACK 2
+#define PACK_IN_MIDX 4
 
 static int pack_everything;
 static int delta_base_offset = 1;
@@ -161,6 +162,11 @@ static int pack_is_retained(struct string_list_item *item)
 	return (uintptr_t)item->util & RETAIN_PACK;
 }
 
+static void pack_mark_in_midx(struct string_list_item *item)
+{
+	item->util = (void*)((uintptr_t)item->util | PACK_IN_MIDX);
+}
+
 static void mark_packs_for_deletion_1(struct string_list *names,
 				      struct string_list *list)
 {
@@ -264,6 +270,7 @@ static void collect_pack_filenames(struct existing_packs *existing,
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		int i;
 		const char *base;
+		struct string_list_item *item;
 
 		if (!p->pack_local)
 			continue;
@@ -279,11 +286,17 @@ static void collect_pack_filenames(struct existing_packs *existing,
 		strbuf_strip_suffix(&buf, ".pack");
 
 		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
-			string_list_append(&existing->kept_packs, buf.buf);
+			item = string_list_append(&existing->kept_packs,
+						  buf.buf);
 		else if (p->is_cruft)
-			string_list_append(&existing->cruft_packs, buf.buf);
+			item = string_list_append(&existing->cruft_packs,
+						  buf.buf);
 		else
-			string_list_append(&existing->non_kept_packs, buf.buf);
+			item = string_list_append(&existing->non_kept_packs,
+						  buf.buf);
+
+		if (p->multi_pack_index)
+			pack_mark_in_midx(item);
 	}
 
 	string_list_sort(&existing->kept_packs);
-- 
2.49.0.229.g19b69c1246



* [RFC PATCH 8/8] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (6 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 7/8] repack: keep track of existing MIDX'd packs Taylor Blau
@ 2025-04-11 23:26 ` Taylor Blau
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

Suppose you have a once-unreachable object packed in a cruft pack, which
later on becomes reachable from one or more objects in a geometrically
repacked pack. That once-unreachable object *won't* appear in the new
pack, since the cruft pack was specified as neither included nor
excluded to 'pack-objects --stdin-packs'. If the bitmap selection
process picks one or more commits which reach the once-unreachable
objects, commit ddee3703b3 ensures that the MIDX will be closed under
reachability. Without it, we would fail to generate a MIDX bitmap.

ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects, which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read, but object reads that go through the
MIDX have to search through them anyway.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because the union of a sequence of geometrically-repacked packs that
were all generated with '--stdin-packs=follow' is guaranteed to be
closed under reachability.
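
To make "closed under reachability" concrete, here is a toy sketch (not
code from this series). It assumes a tiny object graph in which each
object is a bit position and 'edges[i]' is a bitmask of the objects that
object i refers to directly; 'NR_OBJECTS' and 'is_closed()' are
hypothetical names. A set is closed if none of its members refers to an
object outside of the set. Checking direct references is enough: if
every direct reference stays inside the set, every longer path does too.

```c
#include <stdint.h>

#define NR_OBJECTS 4

/*
 * Return 1 if every object in 'set' only refers to objects that are
 * also in 'set', and 0 otherwise.
 */
static int is_closed(const uint32_t edges[NR_OBJECTS], uint32_t set)
{
	for (int i = 0; i < NR_OBJECTS; i++) {
		if (!(set & (1u << i)))
			continue; /* object i is not in the set */
		if (edges[i] & ~set)
			return 0; /* object i points outside of the set */
	}
	return 1;
}
```

In this model, a new commit that refers to a once-unreachable commit
living only in a cruft pack makes the set of MIDX'd objects non-closed
unless either the cruft pack is in the MIDX or the once-unreachable
objects were copied into a MIDX'd pack.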

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with following as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with following is to see if there is a cruft
pack already in the MIDX.

If there is, then starting to generate packs with following during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

Detect when this is the case and avoid including cruft packs in the MIDX
where possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c        | 78 ++++++++++++++++++++++++++++++++---------
 t/t7704-repack-cruft.sh | 70 ++++++++++++++++++++++++++++++++++++
 2 files changed, 131 insertions(+), 17 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index bc47bede7b..d0b88f12f6 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -167,6 +167,21 @@ static void pack_mark_in_midx(struct string_list_item *item)
 	item->util = (void*)((uintptr_t)item->util | PACK_IN_MIDX);
 }
 
+static int pack_is_in_midx(struct string_list_item *item)
+{
+	return (uintptr_t)item->util & PACK_IN_MIDX;
+}
+
+static int existing_has_cruft_in_midx(struct existing_packs *existing)
+{
+	struct string_list_item *item;
+	for_each_string_list_item(item, &existing->cruft_packs) {
+		if (pack_is_in_midx(item))
+			return 1;
+	}
+	return 0;
+}
+
 static void mark_packs_for_deletion_1(struct string_list *names,
 				      struct string_list *list)
 {
@@ -821,26 +836,52 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (existing_has_cruft_in_midx(existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If we had one or more cruft pack(s) present in the
+		 * MIDX before the repack, keep them as they may be
+		 * required to form a reachability closure if the MIDX
+		 * is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * A cruft pack can be required to form a reachability
+		 * closure if the MIDX is bitmapped and one or more of
+		 * its selected commits reaches a once-cruft object that
+		 * was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 * since no packs are marked as pending
+			 * deletion (since we only call
+			 * `mark_packs_for_deletion()` when doing an
+			 * all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1369,7 +1410,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (existing_has_cruft_in_midx(&existing))
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..33ac58a3a5 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,74 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	git init exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+
+		git reflog expire --all --expire=all &&
+
+		git repack --cruft -d &&
+		ls $packdir/pack-*.idx | sort >packs.before &&
+
+		git merge $two &&
+		test_commit four &&
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+		ls $packdir/pack-*.idx | sort >packs.after &&
+
+		comm -13 packs.before packs.after >packs.new &&
+		test_line_count = 1 packs.new &&
+
+		git rev-list --objects --no-object-names one..four >expect.raw &&
+		sort expect.raw >expect &&
+
+		git show-index <$(cat packs.new) >actual.raw &&
+		cut -d" " -f2 actual.raw | sort >actual &&
+
+		test_cmp expect actual &&
+
+		test-tool read-midx --show-objects $objdir >actual.raw &&
+		grep "\.pack$" actual.raw | cut -d" " -f1 | sort >actual.objects &&
+		git rev-list --objects --no-object-names HEAD >expect.raw &&
+		sort expect.raw >expect.objects &&
+
+		test_cmp expect.objects actual.objects &&
+
+		cruft="$(basename $(ls $packdir/*.mtimes))" &&
+		grep "^pack-" actual.raw >actual.packs &&
+		! test_grep "${cruft%.mtimes}.idx" actual.packs
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	(
+		cd exclude-cruft-when-possible &&
+
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		grep -o "pack-.*\.idx$" packs.all >in &&
+
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_commit five &&
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >actual.raw &&
+		grep "\.pack$" actual.raw | cut -d" " -f1 | sort >actual.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects actual.objects &&
+
+		grep "^pack-" actual.raw >actual.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" actual.packs
+	)
+'
+
 test_done
-- 
2.49.0.229.g19b69c1246


* [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (7 preceding siblings ...)
  2025-04-11 23:26 ` [RFC PATCH 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-04-14 20:06 ` Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
                     ` (8 more replies)
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                   ` (3 subsequent siblings)
  12 siblings, 9 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Here is a non-RFC version of my series to explore creating MIDXs while
repacking that don't include the cruft pack.

The core idea behind this approach is to ensure that packs generated via
geometric repacking traverse through objects that appear in packs which
are neither included nor excluded. Then if some commit (for example) in
a pack reaches some once-unreachable object stored in a cruft pack, the
pack generated via geometric repacking will pick up and write a copy of
that object during its traversal.

If you repack consistently using this strategy, you can guarantee that
the union of geometrically-repacked packs is closed under reachability
without having to keep track of any cruft pack(s) in the MIDX.
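
The traversal can be sketched on a toy model (this is not the
pack-objects implementation). The sketch assumes each object is a bit
position, 'edges[i]' is a bitmask of the objects that object i refers to
directly, and 'included' and 'excluded' are bitmasks of the objects in
the included and excluded packs. 'NR_OBJECTS' and 'follow()' are
hypothetical names. The walk passes through excluded objects without
packing them, which is the behavior the t5331 tests in this series
expect.

```c
#include <stdint.h>

#define NR_OBJECTS 5

/*
 * Start from the objects in included-but-not-excluded packs, then walk
 * everything reachable from them. Objects in excluded packs are walked
 * through but never added to the result.
 */
static uint32_t follow(const uint32_t edges[NR_OBJECTS],
		       uint32_t included, uint32_t excluded)
{
	uint32_t result = included & ~excluded;
	uint32_t seen = result;
	uint32_t pending = result;

	while (pending) {
		uint32_t next = 0;

		/* Collect everything the pending objects refer to. */
		for (int i = 0; i < NR_OBJECTS; i++) {
			if (pending & (1u << i))
				next |= edges[i];
		}

		next &= ~seen;              /* visit each object once */
		seen |= next;
		result |= next & ~excluded; /* pack unless excluded */
		pending = next;
	}
	return result;
}
```

Plain '--stdin-packs' corresponds to stopping after the first line of
the function, i.e. 'included & ~excluded'. With a linear history
A <- B <- C <- D where each commit sits in its own pack, including B and
D while excluding C yields A, B, and D in this model, because the walk
from B reaches A even though pack A was never mentioned.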

This version has a couple of minor changes from the RFC:

  - Before using a designated initializer to setup a 'struct
    object_info', add a new preparatory commit to explain that such
    designated initializers rely on the default value for
    non-initialized fields to be zero'd.

  - Less cruft-pack specific reasoning for when repack can use this new
    mode (thanks to a helpful discussion with Peff while thinking
    through and talking about these changes).

  - A new 'repack.midxMustContainCruft' configuration knob to opt in to
    this new behavior.

  - More readable (IMHO) test scripts.

I think this version is sufficiently ready for review. I'm going to
deploy a copy of this within GitHub's infrastructure and see how it
behaves on a single replica of an internal repository over a ~week and
report back.

Thanks in advance for any review in the meantime :-).

Taylor Blau (8):
  pack-objects: use standard option incompatibility functions
  object-store-ll.h: add note about designated initializers
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: introduce '--stdin-packs=follow'
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/config/repack.adoc    |   7 +
 Documentation/git-pack-objects.adoc |   8 +-
 builtin/pack-objects.c              | 193 +++++++++++++++++-----------
 builtin/repack.c                    | 162 ++++++++++++++++++++---
 object-store-ll.h                   |   8 ++
 t/t5331-pack-objects-stdin.sh       | 103 ++++++++++++++-
 t/t7704-repack-cruft.sh             |  90 +++++++++++++
 7 files changed, 478 insertions(+), 93 deletions(-)

Range-diff against v1:
1:  63fb4dab30 = 1:  65bc7e4630 pack-objects: use standard option incompatibility functions
-:  ---------- > 2:  920c91eb1e object-store-ll.h: add note about designated initializers
2:  6357633f6d = 3:  f8ac36b110 pack-objects: limit scope in 'add_object_entry_from_pack()'
3:  43e889b157 = 4:  5e03b482ba pack-objects: factor out handling '--stdin-packs'
4:  07a91be3ec = 5:  bccbac2ec5 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
5:  241f7c87e5 = 6:  0bc2183dc3 pack-objects: perform name-hash traversal for unpacked objects
6:  a0318321ec = 7:  697a337cb1 pack-objects: introduce '--stdin-packs=follow'
7:  ef0bc38cf0 < -:  ---------- repack: keep track of existing MIDX'd packs
8:  19b69c1246 ! 8:  a2ec1b826c repack: exclude cruft pack(s) from the MIDX where possible
    @@ Commit message
         Note that you cannot guarantee that a collection of packs is closed
         under reachability if not all of them were generated with following as
         above. One tell-tale sign that not all geometrically-repacked packs in
    -    the MIDX were generated with following is to see if there is a cruft
    -    pack already in the MIDX.
    +    the MIDX were generated with following is to see if there is a pack in
    +    the existing MIDX that is not going to be somehow represented (either
    +    verbatim or as part of a geometric rollup) in the new MIDX.
     
         If there is, then starting to generate packs with following during
         geometric repacking won't work, since it's open to the same race as
    @@ Commit message
         under reachability.
     
         Detect when this is the case and avoid including cruft packs in the MIDX
    -    where possible.
    +    where possible. The existing behavior remains the default, and the new
    +    behavior is available with the config 'repack.midxMustContainCruft' set
    +    to 'false'.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## Documentation/config/repack.adoc ##
    +@@ Documentation/config/repack.adoc: repack.cruftThreads::
    + 	a cruft pack and the respective parameters are not given over
    + 	the command line. See similarly named `pack.*` configuration
    + 	variables for defaults and meaning.
    ++
    ++repack.midxMustContainCruft::
    ++	When set to true, linkgit:git-repack[1] will unconditionally include
    ++	cruft pack(s), if any, in the multi-pack index when invoked with
    ++	`--write-midx`. When false, cruft packs are only included in the MIDX
    ++	when necessary (e.g., because they might be required to form a
    ++	reachability closure with MIDX bitmaps). Defaults to true.
    +
      ## builtin/repack.c ##
    -@@ builtin/repack.c: static void pack_mark_in_midx(struct string_list_item *item)
    - 	item->util = (void*)((uintptr_t)item->util | PACK_IN_MIDX);
    +@@ builtin/repack.c: static int write_bitmaps = -1;
    + static int use_delta_islands;
    + static int run_update_server_info = 1;
    + static char *packdir, *packtmp_name, *packtmp;
    ++static int midx_must_contain_cruft = 1;
    + 
    + static const char *const git_repack_usage[] = {
    + 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
    +@@ builtin/repack.c: static int repack_config(const char *var, const char *value,
    + 		free(cruft_po_args->threads);
    + 		return git_config_string(&cruft_po_args->threads, var, value);
    + 	}
    ++	if (!strcmp(var, "repack.midxmustcontaincruft")) {
    ++		midx_must_contain_cruft = git_config_bool(var, value);
    ++		return 0;
    ++	}
    + 	return git_default_config(var, value, ctx, cb);
    + }
    + 
    +@@ builtin/repack.c: static void free_pack_geometry(struct pack_geometry *geometry)
    + 	free(geometry->pack);
      }
      
    -+static int pack_is_in_midx(struct string_list_item *item)
    ++static int midx_has_unknown_packs(char **midx_pack_names,
    ++				  size_t midx_pack_names_nr,
    ++				  struct string_list *include,
    ++				  struct pack_geometry *geometry,
    ++				  struct existing_packs *existing)
     +{
    -+	return (uintptr_t)item->util & PACK_IN_MIDX;
    -+}
    ++	size_t i;
     +
    -+static int existing_has_cruft_in_midx(struct existing_packs *existing)
    -+{
    -+	struct string_list_item *item;
    -+	for_each_string_list_item(item, &existing->cruft_packs) {
    -+		if (pack_is_in_midx(item))
    -+			return 1;
    ++	string_list_sort(include);
    ++
    ++	for (i = 0; i < midx_pack_names_nr; i++) {
    ++		const char *pack_name = midx_pack_names[i];
    ++
    ++		/*
    ++		 * Determine whether or not each MIDX'd pack from the existing
    ++		 * MIDX (if any) is represented in the new MIDX. For each pack
    ++		 * in the MIDX, it must either be:
    ++		 *
    ++		 *  - In the "include" list of packs to be included in the new
    ++		 *    MIDX. Note this function is called before the include
    ++		 *    list is populated with any cruft pack(s).
    ++		 *
    ++		 *  - Below the geometric split line (if using pack geometry),
    ++		 *    indicating that the pack won't be included in the new
    ++		 *    MIDX, but its contents were rolled up as part of the
    ++		 *    geometric repack.
    ++		 *
    ++		 *  - In the existing non-kept packs list (if not using pack
    ++		 *    geometry), and marked as non-deleted.
    ++		 */
    ++		if (string_list_has_string(include, pack_name)) {
    ++			continue;
    ++		} else if (geometry) {
    ++			struct strbuf buf = STRBUF_INIT;
    ++			uint32_t j;
    ++
    ++			for (j = 0; j < geometry->split; j++) {
    ++				strbuf_reset(&buf);
    ++				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
    ++				strbuf_strip_suffix(&buf, ".pack");
    ++				strbuf_addstr(&buf, ".idx");
    ++
    ++				if (!strcmp(pack_name, buf.buf)) {
    ++					strbuf_release(&buf);
    ++					break;
    ++				}
    ++			}
    ++
    ++			strbuf_release(&buf);
    ++
    ++			if (j < geometry->split)
    ++				continue;
    ++		} else {
    ++			struct string_list_item *item;
    ++
    ++			item = string_list_lookup(&existing->non_kept_packs,
    ++						  pack_name);
    ++			if (item && !pack_is_marked_for_deletion(item))
    ++				continue;
    ++		}
    ++
    ++		/*
    ++		 * If we got to this point, the MIDX includes some pack that we
    ++		 * don't know about.
    ++		 */
    ++		return 1;
     +	}
    ++
     +	return 0;
     +}
     +
    - static void mark_packs_for_deletion_1(struct string_list *names,
    - 				      struct string_list *list)
    + struct midx_snapshot_ref_data {
    + 	struct tempfile *f;
    + 	struct oidset seen;
    +@@ builtin/repack.c: static void midx_snapshot_refs(struct tempfile *f)
    + 
    + static void midx_included_packs(struct string_list *include,
    + 				struct existing_packs *existing,
    ++				char **midx_pack_names,
    ++				size_t midx_pack_names_nr,
    + 				struct string_list *names,
    + 				struct pack_geometry *geometry)
      {
     @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
      		}
      	}
      
     -	for_each_string_list_item(item, &existing->cruft_packs) {
    -+	if (existing_has_cruft_in_midx(existing)) {
    ++	if (midx_must_contain_cruft ||
    ++	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
    ++				   include, geometry, existing)) {
      		/*
     -		 * When doing a --geometric repack, there is no need to check
     -		 * for deleted packs, since we're by definition not doing an
     -		 * ALL_INTO_ONE repack (hence no packs will be deleted).
     -		 * Otherwise we must check for and exclude any packs which are
     -		 * enqueued for deletion.
    -+		 * If we had one or more cruft pack(s) present in the
    -+		 * MIDX before the repack, keep them as they may be
    -+		 * required to form a reachability closure if the MIDX
    -+		 * is bitmapped.
    ++		 * If there are one or more unknown pack(s) present (see
    ++		 * midx_has_unknown_packs() for what makes a pack
    ++		 * "unknown") in the MIDX before the repack, keep them
    ++		 * as they may be required to form a reachability
    ++		 * closure if the MIDX is bitmapped.
      		 *
     -		 * So we could omit the conditional below in the --geometric
     -		 * case, but doing so is unnecessary since no packs are marked
     -		 * as pending deletion (since we only call
     -		 * `mark_packs_for_deletion()` when doing an all-into-one
     -		 * repack).
    -+		 * A cruft pack can be required to form a reachability
    -+		 * closure if the MIDX is bitmapped and one or more of
    -+		 * its selected commits reaches a once-cruft object that
    -+		 * was later made reachable.
    ++		 * For example, a cruft pack can be required to form a
    ++		 * reachability closure if the MIDX is bitmapped and one
    ++		 * or more of its selected commits reaches a once-cruft
    ++		 * object that was later made reachable.
      		 */
     -		if (pack_is_marked_for_deletion(item))
     -			continue;
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
      	}
      
      	strbuf_release(&buf);
    +@@ builtin/repack.c: int cmd_repack(int argc,
    + 	struct tempfile *refs_snapshot = NULL;
    + 	int i, ext, ret;
    + 	int show_progress;
    ++	char **midx_pack_names = NULL;
    ++	size_t midx_pack_names_nr = 0;
    + 
    + 	/* variables to be filled by option parsing */
    + 	int delete_redundant = 0;
     @@ builtin/repack.c: int cmd_repack(int argc,
      		    !(pack_everything & PACK_CRUFT))
      			strvec_push(&cmd.args, "--pack-loose-unreachable");
      	} else if (geometry.split_factor) {
     -		strvec_push(&cmd.args, "--stdin-packs");
    -+		if (existing_has_cruft_in_midx(&existing))
    ++		if (midx_must_contain_cruft)
     +			strvec_push(&cmd.args, "--stdin-packs");
     +		else
     +			strvec_push(&cmd.args, "--stdin-packs=follow");
      		strvec_push(&cmd.args, "--unpacked");
      	} else {
      		strvec_push(&cmd.args, "--unpacked");
    +@@ builtin/repack.c: int cmd_repack(int argc,
    + 
    + 	string_list_sort(&names);
    + 
    ++	if (get_local_multi_pack_index(the_repository)) {
    ++		uint32_t i;
    ++		struct multi_pack_index *m =
    ++			get_local_multi_pack_index(the_repository);
    ++
    ++		ALLOC_ARRAY(midx_pack_names, m->num_packs);
    ++		for (i = 0; i < m->num_packs; i++)
    ++			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
    ++	}
    ++
    + 	close_object_store(the_repository->objects);
    + 
    + 	/*
    +@@ builtin/repack.c: int cmd_repack(int argc,
    + 
    + 	if (write_midx) {
    + 		struct string_list include = STRING_LIST_INIT_DUP;
    +-		midx_included_packs(&include, &existing, &names, &geometry);
    ++		midx_included_packs(&include, &existing, midx_pack_names,
    ++				    midx_pack_names_nr, &names, &geometry);
    + 
    + 		ret = write_midx_included_packs(&include, &geometry, &names,
    + 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
    +@@ builtin/repack.c: int cmd_repack(int argc,
    + 	string_list_clear(&names, 1);
    + 	existing_packs_release(&existing);
    + 	free_pack_geometry(&geometry);
    ++	for (size_t i = 0; i < midx_pack_names_nr; i++)
    ++		free(midx_pack_names[i]);
    ++	free(midx_pack_names);
    + 	pack_objects_args_release(&po_args);
    + 	pack_objects_args_release(&cruft_po_args);
    + 
     
      ## t/t7704-repack-cruft.sh ##
     @@ t/t7704-repack-cruft.sh: test_expect_success 'cruft repack respects --quiet' '
      	)
      '
      
    -+test_expect_success 'repack --write-midx excludes cruft where possible' '
    -+	git init exclude-cruft-when-possible &&
    ++setup_cruft_exclude_tests() {
    ++	git init "$1" &&
     +	(
    -+		cd exclude-cruft-when-possible &&
    ++		cd "$1" &&
    ++
    ++		git config repack.midxMustContainCruft false &&
     +
     +		test_commit one &&
     +
    @@ t/t7704-repack-cruft.sh: test_expect_success 'cruft repack respects --quiet' '
     +		test_commit --no-tag three &&
     +		three="$(git rev-parse HEAD)" &&
     +		git reset --hard one &&
    -+
     +		git reflog expire --all --expire=all &&
     +
    -+		git repack --cruft -d &&
    -+		ls $packdir/pack-*.idx | sort >packs.before &&
    ++		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
     +
     +		git merge $two &&
    -+		test_commit four &&
    ++		test_commit four
    ++	)
    ++}
    ++
    ++test_expect_success 'repack --write-midx excludes cruft where possible' '
    ++	setup_cruft_exclude_tests exclude-cruft-when-possible &&
    ++	(
    ++		cd exclude-cruft-when-possible &&
    ++
    ++		GIT_TEST_MULTI_PACK_INDEX=0 \
     +		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
    -+		ls $packdir/pack-*.idx | sort >packs.after &&
     +
    -+		comm -13 packs.before packs.after >packs.new &&
    -+		test_line_count = 1 packs.new &&
    ++		test-tool read-midx --show-objects $objdir >midx &&
    ++		cruft="$(ls $packdir/*.mtimes)" &&
    ++		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
     +
    -+		git rev-list --objects --no-object-names one..four >expect.raw &&
    -+		sort expect.raw >expect &&
    ++		git rev-list --all --objects --no-object-names >reachable.raw &&
    ++		sort reachable.raw >reachable.objects &&
    ++		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
     +
    -+		git show-index <$(cat packs.new) >actual.raw &&
    -+		cut -d" " -f2 actual.raw | sort >actual &&
    ++		test_cmp reachable.objects midx.objects
    ++	)
    ++'
     +
    -+		test_cmp expect actual &&
    ++test_expect_success 'repack --write-midx includes cruft when instructed' '
    ++	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
    ++	(
    ++		cd exclude-cruft-when-instructed &&
     +
    -+		test-tool read-midx --show-objects $objdir >actual.raw &&
    -+		grep "\.pack$" actual.raw | cut -d" " -f1 | sort >actual.objects &&
    -+		git rev-list --objects --no-object-names HEAD >expect.raw &&
    -+		sort expect.raw >expect.objects &&
    ++		GIT_TEST_MULTI_PACK_INDEX=0 \
    ++		git -c repack.midxMustContainCruft=true repack \
    ++			-d --geometric=2 --write-midx --write-bitmap-index &&
     +
    -+		test_cmp expect.objects actual.objects &&
    ++		test-tool read-midx --show-objects $objdir >midx &&
    ++		cruft="$(ls $packdir/*.mtimes)" &&
    ++		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
     +
    -+		cruft="$(basename $(ls $packdir/*.mtimes))" &&
    -+		grep "^pack-" actual.raw >actual.packs &&
    -+		! test_grep "${cruft%.mtimes}.idx" actual.packs
    ++		git cat-file --batch-check="%(objectname)" --batch-all-objects \
    ++			>all.objects &&
    ++		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
    ++
    ++		test_cmp all.objects midx.objects
     +	)
     +'
     +
     +test_expect_success 'repack --write-midx includes cruft when necessary' '
    ++	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
     +	(
    -+		cd exclude-cruft-when-possible &&
    ++		cd exclude-cruft-when-necessary &&
     +
    ++		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
     +		ls $packdir/pack-*.idx | sort >packs.all &&
     +		grep -o "pack-.*\.idx$" packs.all >in &&
     +
     +		git multi-pack-index write --stdin-packs --bitmap <in &&
     +
     +		test_commit five &&
    ++		GIT_TEST_MULTI_PACK_INDEX=0 \
     +		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
     +
    -+		test-tool read-midx --show-objects $objdir >actual.raw &&
    -+		grep "\.pack$" actual.raw | cut -d" " -f1 | sort >actual.objects &&
    ++		test-tool read-midx --show-objects $objdir >midx &&
    ++		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
     +		git cat-file --batch-all-objects --batch-check="%(objectname)" \
     +			>expect.objects &&
    -+		test_cmp expect.objects actual.objects &&
    ++		test_cmp expect.objects midx.objects &&
     +
    -+		grep "^pack-" actual.raw >actual.packs &&
    -+		test_line_count = "$(($(wc -l <packs.all) + 1))" actual.packs
    ++		grep "^pack-" midx >midx.packs &&
    ++		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
     +	)
     +'
     +

base-commit: 485f5f863615e670fd97ae40af744e14072cfe18
-- 
2.49.0.229.gc267761125.dirty


* [PATCH v2 1/8] pack-objects: use standard option incompatibility functions
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-14 20:41     ` Junio C Hamano
  2025-04-14 20:06   ` [PATCH v2 2/8] object-store-ll.h: add note about designated initializers Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.
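
For readers unfamiliar with the helper, here is a rough, self-contained
sketch of the pattern. It is not the implementation in parse-options.h:
incompatible_opt2() below is a hypothetical stand-in that formats the
message into a caller-supplied buffer instead of dying, using the
message format that the updated t5331 test expects.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for die_for_incompatible_opt2(). Rather than
 * dying, it writes the message into 'buf' and returns 1 when both
 * options are set; otherwise it leaves 'buf' empty and returns 0.
 */
int incompatible_opt2(int opt1, const char *opt1_name,
		      int opt2, const char *opt2_name,
		      char *buf, size_t len)
{
	if (len)
		buf[0] = '\0';
	if (!opt1 || !opt2)
		return 0;
	snprintf(buf, len, "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

Every call site passes its two flags and their names, so the wording
lives in one place and each pair of options reports in the same format.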

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c        | 19 ++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b06d159d2..aaea968ed2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4651,9 +4651,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -4691,13 +4692,13 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(filter_options.choice, "--filter",
+				  stdin_packs, "--stdin-packs");
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -4705,8 +4706,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..4f5e2733a2 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--filter. and .--stdin-packs. cannot be used together" err
 	)
 '
 
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 2/8] object-store-ll.h: add note about designated initializers
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-14 21:07     ` Junio C Hamano
  2025-04-15  2:57     ` Elijah Newren
  2025-04-14 20:06   ` [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 2 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

The following commit will use a designated initializer to initialize a
'struct object_info'. This depends on the rest of the fields having a
default value of zero, since unspecified fields in a designated
initializer are zero'd out.

Before writing that designated initializer, I wondered if there were
other spots that also use designated initializers to set up object_info
structs, and there are a handful.

To prevent potential breakage against future object_info changes that
would introduce/change a field to have a non-zero default value, note
this dependency in a comment near the OBJECT_INFO_INIT macro.
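
As a small illustration of the C rule being relied on (a sketch only:
'struct toy_object_info' is a hypothetical, trimmed-down stand-in and
not the real structure's layout), naming one field in a designated
initializer leaves every other field zero-initialized:

```c
#include <stddef.h>

/* Hypothetical analogue of 'struct object_info' with three request slots. */
struct toy_object_info {
	int *typep;
	unsigned long *sizep;
	void **contentp;
};

/* Count how many fields the caller asked to have filled in. */
int toy_requested_fields(const struct toy_object_info *oi)
{
	return (oi->typep != NULL) + (oi->sizep != NULL) +
	       (oi->contentp != NULL);
}

/* Only '.sizep' is named, so 'typep' and 'contentp' start out NULL. */
int toy_demo(void)
{
	unsigned long sz = 0;
	struct toy_object_info oi = { .sizep = &sz };

	return toy_requested_fields(&oi);
}
```

A field whose "unset" state was anything other than zero would break
callers written this way without any compiler warning, which is the
hazard the new comment calls out.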

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 object-store-ll.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/object-store-ll.h b/object-store-ll.h
index cd3bd5bd99..7ff180d7f2 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -337,6 +337,14 @@ struct object_info {
 /*
  * Initializer for a "struct object_info" that wants no items. You may
  * also memset() the memory to all-zeroes.
+ *
+ * NOTE: callers expect the initial value of an object_info struct to
+ * be zero'd out. Designated initializers like
+ *
+ *     struct object_info oi = { .sizep = &sz };
+ *
+ * depend on this behavior, so consider strongly before adding new
+ * fields that have a non-zero default value.
  */
 #define OBJECT_INFO_INIT { 0 }
 
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 2/8] object-store-ll.h: add note about designated initializers Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-15  3:10     ` Elijah Newren
  2025-04-14 20:06   ` [PATCH v2 4/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

add_object_entry_from_pack() handles objects from identified packs by
checking their type, before adding commit objects as pending in the
subsequent traversal used by `--stdin-packs`.

There are a couple of quality-of-life refactorings that I noticed while
working in this area:

  - We declare 'revs' (given to us through the miscellaneous context
    argument) earlier in the "if (p)" conditional than is necessary.

  - The 'struct object_info' can use a designated initializer to fill in
    the structure's type pointer, since that is the only field that we
    care about.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index aaea968ed2..540e5eba9e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3490,14 +3490,12 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
-		struct object_info oi = OBJECT_INFO_INIT;
-
-		oi.typep = &type;
+		struct object_info oi = { .typep = &type };
 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 4/8] pack-objects: factor out handling '--stdin-packs'
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (2 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 5/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 540e5eba9e..793d245721 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3672,6 +3672,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -3767,7 +3778,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -4773,11 +4783,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 5/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (3 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 4/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-14 20:06   ` [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (neither included nor excluded) pack.

Extract the code which initializes and performs the reachability
traversal to take place in the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 793d245721..1689cddd3a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3556,7 +3556,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3564,24 +3564,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3651,10 +3633,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3666,21 +3682,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (4 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 5/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-15  3:10     ` Elijah Newren
  2025-04-14 20:06   ` [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.
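
The callback shape used by add_loose_object() in the diff below can be
sketched in isolation. This is a toy model, not the patch's code:
'struct toy_revs', 'enum toy_type' and toy_add_loose_object() are
hypothetical names, and "packing" is reduced to bumping a counter. The
point is only that the context pointer is optional, and that commits
are queued as traversal tips only when one is supplied.

```c
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };
enum { TOY_MAX_PENDING = 16 };

/* Hypothetical stand-in for the pending list kept in 'struct rev_info'. */
struct toy_revs {
	int pending[TOY_MAX_PENDING];
	size_t nr;
};

/*
 * Every loose object is "packed" (counted in *packed). Commits are
 * additionally queued as tips, but only if 'data' is non-NULL.
 */
int toy_add_loose_object(int id, enum toy_type type, size_t *packed,
			 void *data)
{
	struct toy_revs *revs = data;

	(*packed)++;
	if (revs && type == TOY_COMMIT && revs->nr < TOY_MAX_PENDING)
		revs->pending[revs->nr++] = id;
	return 0;
}
```

Callers that have no traversal to feed (the cruft and
'--pack-loose-unreachable' paths in the diff) pass NULL and keep their
existing behavior.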

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1689cddd3a..2aa12da4af 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3642,7 +3642,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3669,7 +3669,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -3788,7 +3788,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4066,8 +4066,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4088,6 +4089,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4096,11 +4101,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4356,7 +4360,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.49.0.229.gc267761125.dirty



* [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow'
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (5 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-15  3:11     ` Elijah Newren
  2025-04-14 20:06   ` [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  2025-04-15  2:57   ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Elijah Newren
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is that if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

Introduce a '--stdin-packs=follow' mode, which prepares us for new
repacking behavior that will "resurrect" objects found in cruft or
otherwise unspecified packs when generating new packs. In the context of
geometric repacking, this may be used to maintain a sequence of
geometrically-repacked packs, the union of which is closed under
reachability, even in the case described earlier.
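
To make the difference between the two modes concrete, here is a toy
model (an illustration only, not how pack-objects is implemented). It
assumes a linear history A <- B <- C <- D where each object lives in
exactly one pack of the same name, mirroring the setup in the new t5331
test. All names in the sketch are hypothetical.

```c
#include <stddef.h>

enum pack_state { PACK_UNKNOWN, PACK_INCLUDED, PACK_EXCLUDED };

/* Toy history A <- B <- C <- D; object i lives only in pack i. */
enum { OBJ_A, OBJ_B, OBJ_C, OBJ_D, NR_OBJS };
static const int parent[NR_OBJS] = { -1, OBJ_A, OBJ_B, OBJ_C };

/*
 * Set out[i] to 1 if object i ends up in the output pack. Standard
 * mode takes only objects in included packs. Follow mode also walks
 * the ancestry of each included object and picks up anything that is
 * not in an excluded pack.
 */
void select_objects(const enum pack_state *state, int follow, int *out)
{
	int i, o;

	for (i = 0; i < NR_OBJS; i++)
		out[i] = (state[i] == PACK_INCLUDED);
	if (!follow)
		return;

	for (i = 0; i < NR_OBJS; i++) {
		if (state[i] != PACK_INCLUDED)
			continue;
		for (o = parent[i]; o >= 0; o = parent[o])
			if (state[o] != PACK_EXCLUDED)
				out[o] = 1;
	}
}
```

With B and D included, C excluded, and A unlisted, standard mode selects
{B, D} while follow mode selects {A, B, D}, which matches what the test
below expects from the real command.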

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |   8 ++-
 builtin/pack-objects.c              |  89 +++++++++++++++++-------
 t/t5331-pack-objects-stdin.sh       | 101 ++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855..c894582799 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,19 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", also pack objects which are reachable from
+objects in the included packs, but appear in packs that are not listed.
+Reachable objects which appear in excluded packs are not packed. Useful
+for resurrecting once-cruft objects to generate packs which are closed
+under reachability up to the excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2aa12da4af..6406f4a5b1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -272,6 +272,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3511,32 +3517,43 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
+
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't
+		 * have a non-zero hash field that you would typically
+		 * pick up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash
+		 * and ->no_try_delta here using a now in order to
+		 * perhaps improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
+
+		stdin_packs_hints_nr++;
+	}
+}
+
+static void show_commit_pack_hint(struct commit *commit, void *data)
+{
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
 		return;
+	}
+	/* nothing to do; commits don't have a namehash */
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
-
-	stdin_packs_hints_nr++;
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3644,7 +3661,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3676,7 +3693,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4467,6 +4484,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4478,7 +4512,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4533,6 +4567,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -4788,7 +4825,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 4f5e2733a2..f97d2d1b71 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,105 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+packdir=.git/objects/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success 'setup for --stdin-packs=follow' '
+	git init stdin-packs--follow &&
+	(
+		cd stdin-packs--follow &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed
+	)
+'
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		test_commit E &&
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.229.gc267761125.dirty


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (6 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-04-14 20:06   ` Taylor Blau
  2025-04-15  3:11     ` Elijah Newren
  2025-04-15  2:57   ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Elijah Newren
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-14 20:06 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

Suppose you have a once-unreachable object packed in a cruft pack, which
later on becomes reachable from one or more objects in a geometrically
repacked pack. That once-unreachable object *won't* appear in the new
pack, since the cruft pack was specified as neither included nor
excluded to 'pack-objects --stdin-packs'. If the bitmap selection
process picks one or more commits which reach the once-unreachable
objects, commit ddee3703b3 ensures that the MIDX will be closed under
reachability. Without it, we would fail to generate a MIDX bitmap.

ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read, but object reads that go through the
MIDX have to search through them anyway.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This works
because the union of a sequence of geometrically-repacked packs that
were all generated with '--stdin-packs=follow' is guaranteed to be
closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with following as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with following is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with following during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustContainCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 162 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
 3 files changed, 241 insertions(+), 18 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index f3330ade7b..ee43a4f4c1 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -107,6 +108,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -687,6 +692,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -755,6 +831,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of its selected commits reaches a once-cruft
+		 * object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 *  since no packs are marked as pending
+			 *  deletion (since we only call
+			 *  `mark_packs_for_deletion()` when doing an
+			 *  all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1142,6 +1249,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1356,7 +1465,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1478,6 +1590,16 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		uint32_t i;
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names, m->num_packs);
+		for (i = 0; i < m->num_packs; i++)
+			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1519,7 +1641,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1570,6 +1693,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..2b0a55f8fd 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,94 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		grep -o "pack-.*\.idx$" packs.all >in &&
+
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
 test_done
-- 
2.49.0.229.gc267761125.dirty

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 1/8] pack-objects: use standard option incompatibility functions
  2025-04-14 20:06   ` [PATCH v2 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-04-14 20:41     ` Junio C Hamano
  2025-04-15 19:32       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-04-14 20:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> pack-objects has a handful of explicit checks for pairs of command-line
> options which are mutually incompatible. Many of these pre-date
> a699367bb8 (i18n: factorize more 'incompatible options' messages,
> 2022-01-31).
>
> Convert the explicit checks into die_for_incompatible_opt2() calls,
> which simplifies the implementation and standardizes pack-objects'
> output when given incompatible options (e.g., --stdin-packs with
> --filter gives different output than --keep-unreachable with
> --unpack-unreachable).

Makes sense.

> -	if (stdin_packs && filter_options.choice)
> -		die(_("cannot use --filter with --stdin-packs"));
> +	die_for_incompatible_opt2(filter_options.choice, "--filter",
> +				  stdin_packs, "--stdin-packs");

The order of check is now reversed (which does not make any
difference to correctness or performance), but this way, we list the
options in the same order in the message as before, which is nice.

>  		test_must_fail git pack-objects --stdin-packs --stdout \
>  			--filter=blob:none </dev/null 2>err &&
> -		test_grep "cannot use --filter with --stdin-packs" err
> +		test_grep "options .--filter. and .--stdin-packs. cannot be used together" err
>  	)
>  '

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 2/8] object-store-ll.h: add note about designated initializers
  2025-04-14 20:06   ` [PATCH v2 2/8] object-store-ll.h: add note about designated initializers Taylor Blau
@ 2025-04-14 21:07     ` Junio C Hamano
  2025-04-15 19:51       ` Taylor Blau
  2025-04-15  2:57     ` Elijah Newren
  1 sibling, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-04-14 21:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> @@ -337,6 +337,14 @@ struct object_info {
>  /*
>   * Initializer for a "struct object_info" that wants no items. You may
>   * also memset() the memory to all-zeroes.
> + *
> + * NOTE: callers expect the initial value of an object_info struct to
> + * be zero'd out. Designated initializers like
> + *
> + *     struct object_info oi = { .sizep = &sz };
> + *
> + * depend on this behavior, so consider strongly before adding new
> + * fields that have a non-zero default value.
>   */
>  #define OBJECT_INFO_INIT { 0 }

Hmph, after thinking hard enough, if a developer cannot come up with
a way to avoid non-zero default value, the callers could just work
if they instead did

	struct object_info oi = OBJECT_INFO_INIT;
	oi.sizep = &sz;

and the member of non-zero default value can be dealt with by
updating the default initializer, perhaps like

	#define OBJECT_INFO_INIT { .enabled = 1 }

So I am not sure how the advice in the new comment really helps the
intended audiences.  Shouldn't the advice be more like

    NOTE: when a structure foo has FOO_INIT macro to initialize,
    *never* use your own initialization like so:

	struct foo foo_instance = { .member_i_care_about = 13 };

    Instead, use the _INIT macro and then assign to the member you
    care about, like so:

	struct foo foo_instance = FOO_INIT;
	foo_instance.member_i_care_about = 13;

    This is because there may be members of "struct foo" whose
    default value is not zero, or there will later be added such
    members to the structure.

perhaps?

I can buy a counter-proposal that does not forbid a custom
designated initializers that depends on "all zero" default IF it
gives a piece of advice to the readers that is more usable than
"consider strongly before adding", as "consider strongly before
adding" there smells like just a euphemism for "never add", though.

Thanks.





^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (7 preceding siblings ...)
  2025-04-14 20:06   ` [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-04-15  2:57   ` Elijah Newren
  2025-04-15 22:05     ` Taylor Blau
  8 siblings, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  2:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Here is a non-RFC version of my series to explore creating MIDXs while
> repacking that don't include the cruft pack.
>
> The core idea behind this approach is to ensure that packs generated via
> geometric repacking traverse through objects that appear in packs which
> are neither included nor excluded.

This phrasing feels confusing -- what does it mean for packs to be
neither included nor excluded?  Maybe:

"The core idea behind this approach is to allow some (most) of the
objects in a pack to be excluded, while still including some subset of
objects from that pack as part of the repack.  In particular, we
include the objects in that pack which are reachable from the other
objects we repack.  This is different from our current handling which
either entirely includes or entirely excludes all objects from a given
pack."

> Then if some commit (for example) in
> a pack reaches some once-unreachable object stored in a cruft pack, the
> pack generated via geometric repacking will pick up and write a copy of
> that object during its traversal.
>
> If you repack consistently using this strategy, you can guarantee that
> the union of geometrically-repacked packs are closed under reachability
> without having to keep track of any cruft pack(s) in the MIDX.

Also, if you do a single non-geometric repack with this strategy, you
are also closed under reachability, right?  Is that the suggested
transition plan for those that want to use this...first do a
non-geometric repack, and then ensure that subsequent geometric
repacks are done with this strategy?

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 2/8] object-store-ll.h: add note about designated initializers
  2025-04-14 20:06   ` [PATCH v2 2/8] object-store-ll.h: add note about designated initializers Taylor Blau
  2025-04-14 21:07     ` Junio C Hamano
@ 2025-04-15  2:57     ` Elijah Newren
  2025-04-15 19:47       ` Taylor Blau
  1 sibling, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  2:57 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> The following commit will use a designated initializer to initialize a
> 'struct object_info'. This obviously depends on having the rest of the
> fields having a default value of zero, since unspecified fields in a
> designated initializer are zero'd out.
>
> Before writing that designated initializer, I wondered if there were
> other spots that also use designated initializers to set up object_info
> structs, and there are a handful.
>
> To prevent potential breakage against future object_info changes that
> would introduce/change a field to have a non-zero default value, note
> this dependency in a comment near the OBJECT_INFO_INIT macro.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  object-store-ll.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/object-store-ll.h b/object-store-ll.h
> index cd3bd5bd99..7ff180d7f2 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -337,6 +337,14 @@ struct object_info {
>  /*
>   * Initializer for a "struct object_info" that wants no items. You may
>   * also memset() the memory to all-zeroes.
> + *
> + * NOTE: callers expect the initial value of an object_info struct to
> + * be zero'd out. Designated initializers like
> + *
> + *     struct object_info oi = { .sizep = &sz };
> + *
> + * depend on this behavior, so consider strongly before adding new
> + * fields that have a non-zero default value.
>   */
>  #define OBJECT_INFO_INIT { 0 }

There are 46 #define'd designated initializers in the code base, from
DIR_INIT to OIDMAP_INIT and everything in-between.  The logic used in
your comment to suggest not using an all-zeroes initializer doesn't
seem to depend in any way on something specific to object_info, yet
none of those other 46 cases in my quick scanning have such a warning.
And 29 of the 46 define some kind of initial value for some fields
instead of using all zeroes.  That would suggest that one of the
following is true: (a) those 29 cases are buggy and shouldn't be doing
that, (b) those 29 are all special cases someone has thought through
carefully but perhaps someone should add the same warning you have
here to those 29 other cases to keep less carefully considered cases
from being added, (c) there is something specific about object_info that
you didn't call out here, or (d) this warning you add is unnecessary.

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-14 20:06   ` [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-04-15  3:10     ` Elijah Newren
  0 siblings, 0 replies; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  3:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> add_object_entry_from_pack() handles objects from identified packs by
> checking their type, before adding commit objects as pending in the
> subsequent traversal used by `--stdin-packs`.
>
> There are a couple of quality-of-life refactorings that I noticed while
> working in this area:
>
>   - We declare 'revs' (given to us through the miscellaneous context
>     argument) earlier in the "if (p)" conditional than is necessary.

Fair enough.

>   - The 'struct object_info' can use a designated initializer to fill in
>     the structure's type pointer, since that is the only field that we
>     care about.

I prefer the original; it's more future-proof against someone making
OBJECT_INFO_INIT be something other than an all-zero initializer.

>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index aaea968ed2..540e5eba9e 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3490,14 +3490,12 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>                 return 0;
>
>         if (p) {
> -               struct rev_info *revs = _data;
> -               struct object_info oi = OBJECT_INFO_INIT;
> -
> -               oi.typep = &type;
> +               struct object_info oi = { .typep = &type };
>                 if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>                         die(_("could not get type of object %s in pack %s"),
>                             oid_to_hex(oid), p->pack_name);
>                 } else if (type == OBJ_COMMIT) {
> +                       struct rev_info *revs = _data;
>                         /*
>                          * commits in included packs are used as starting points for the
>                          * subsequent revision walk
> --
> 2.49.0.229.gc267761125.dirty

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-14 20:06   ` [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-04-15  3:10     ` Elijah Newren
  2025-04-15 19:57       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  3:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> With '--unpacked', pack-objects adds loose objects (which don't appear
> in any of the excluded packs from '--stdin-packs') to the output pack
> without considering them as reachability tips for the name-hash
> traversal.
>
> This was an oversight in the original implementation of '--stdin-packs',
> since the code which enumerates and adds loose objects to the output
> pack (`add_unreachable_loose_objects()`) did not have access to the
> 'rev_info' struct found in `read_packs_list_from_stdin()`.
>
> Excluding unpacked objects from that traversal doesn't effect the

s/effect/affect/ ?

> correctness of the resulting pack, but it does make it harder to
> discover good deltas for loose objects.
>
> Now that the 'rev_info' struct is declared outside of
> `read_packs_list_from_stdin()`, we can pass it to
> `add_objects_in_unpacked_packs()` and add any loose objects as tips to
> the above-mentioned traversal, in theory producing slightly tighter
> packs as a result.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 1689cddd3a..2aa12da4af 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3642,7 +3642,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
>         string_list_clear(&exclude_packs, 0);
>  }
>
> -static void add_unreachable_loose_objects(void);
> +static void add_unreachable_loose_objects(struct rev_info *revs);
>
>  static void read_stdin_packs(int rev_list_unpacked)
>  {
> @@ -3669,7 +3669,7 @@ static void read_stdin_packs(int rev_list_unpacked)
>         ignore_packed_keep_in_core = 1;
>         read_packs_list_from_stdin(&revs);
>         if (rev_list_unpacked)
> -               add_unreachable_loose_objects();
> +               add_unreachable_loose_objects(&revs);
>
>         if (prepare_revision_walk(&revs))
>                 die(_("revision walk setup failed"));
> @@ -3788,7 +3788,7 @@ static void enumerate_cruft_objects(void)
>                                                 _("Enumerating cruft objects"), 0);
>
>         add_objects_in_unpacked_packs();
> -       add_unreachable_loose_objects();
> +       add_unreachable_loose_objects(NULL);
>
>         stop_progress(&progress_state);
>  }
> @@ -4066,8 +4066,9 @@ static void add_objects_in_unpacked_packs(void)
>  }
>
>  static int add_loose_object(const struct object_id *oid, const char *path,
> -                           void *data UNUSED)
> +                           void *data)
>  {
> +       struct rev_info *revs = data;
>         enum object_type type = oid_object_info(the_repository, oid, NULL);
>
>         if (type < 0) {
> @@ -4088,6 +4089,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>         } else {
>                 add_object_entry(oid, type, "", 0);
>         }
> +
> +       if (revs && type == OBJ_COMMIT)
> +               add_pending_oid(revs, NULL, oid, 0);
> +
>         return 0;
>  }
>
> @@ -4096,11 +4101,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>   * add_object_entry will weed out duplicates, so we just add every
>   * loose object we find.
>   */
> -static void add_unreachable_loose_objects(void)
> +static void add_unreachable_loose_objects(struct rev_info *revs)
>  {
>         for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
> -                                     add_loose_object,
> -                                     NULL, NULL, NULL);
> +                                     add_loose_object, NULL, NULL, revs);
>  }
>
>  static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
> @@ -4356,7 +4360,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
>         if (keep_unreachable)
>                 add_objects_in_unpacked_packs();
>         if (pack_loose_unreachable)
> -               add_unreachable_loose_objects();
> +               add_unreachable_loose_objects(NULL);
>         if (unpack_unreachable)
>                 loosen_unused_packed_objects();
>
> --
> 2.49.0.229.gc267761125.dirty

Should this patch have some tests demonstrating the difference in
which objects are included?


* Re: [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow'
  2025-04-14 20:06   ` [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-04-15  3:11     ` Elijah Newren
  2025-04-15 20:45       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  3:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> When invoked with '--stdin-packs', pack-objects will generate a pack
> which contains the objects found in the "included" packs, less any
> objects from "excluded" packs.
>
> Packs that exist in the repository but weren't specified as either
> included or excluded are in practice treated like the latter, at least
> in the sense that pack-objects won't include objects from those packs.
> This behavior forces us to include any cruft pack(s) in a repository's
> multi-pack index for the reasons described in ddee3703b3
> (builtin/repack.c: add cruft packs to MIDX during geometric repack,
> 2022-05-20).
>
> The full details are in ddee3703b3, but the gist is if you
> have a once-unreachable object in a cruft pack which later becomes
> reachable via one or more commits in a pack generated with
> '--stdin-packs', you *have* to include that object in the MIDX via the
> copy in the cruft pack, otherwise we cannot generate reachability
> bitmaps for any commits which reach that object.
>
> This prepares us for new repacking behavior which will "resurrect"
> objects found in cruft or otherwise unspecified packs when generating
> new packs. In the context of geometric repacking, this may be used to
> maintain a sequence of geometrically-repacked packs, the union of which
> is closed under reachability, even in the case described earlier.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-pack-objects.adoc |   8 ++-
>  builtin/pack-objects.c              |  89 +++++++++++++++++-------
>  t/t5331-pack-objects-stdin.sh       | 101 ++++++++++++++++++++++++++++
>  3 files changed, 171 insertions(+), 27 deletions(-)
>
> diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
> index 7f69ae4855..c894582799 100644
> --- a/Documentation/git-pack-objects.adoc
> +++ b/Documentation/git-pack-objects.adoc
> @@ -87,13 +87,19 @@ base-name::
>         reference was included in the resulting packfile.  This
>         can be useful to send new tags to native Git clients.
>
> ---stdin-packs::
> +--stdin-packs[=<mode>]::
>         Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
>         from the standard input, instead of object names or revision
>         arguments. The resulting pack contains all objects listed in the
>         included packs (those not beginning with `^`), excluding any
>         objects listed in the excluded packs (beginning with `^`).
>  +
> +When `mode` is "follow", pack objects which are reachable from objects
> +in the included packs, but appear in packs that are not listed.
> +Reachable objects which appear in excluded packs are not packed. Useful
> +for resurrecting once-cruft objects to generate packs which are closed
> +under reachability up to the excluded packs.

Maybe:

When `mode` is "follow", objects from packs not listed on stdin
receive special treatment.  Objects within unlisted packs will be
included if those objects (1) are reachable from the included packs,
and (2) are not also found in any of the excluded packs.  This mode is
useful for resurrecting once-cruft objects to generate packs which are
closed under reachability up to the boundary set by the excluded
packs.
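
To sanity-check that wording against the test input further down (pack
B and pack D included, pack C excluded, pack A not listed), here is a
toy model. It is only a sketch, not Git code: each pack is assumed to
hold a single commit, the commits form the chain D -> C -> B -> A, and
`select_objects()`, `select_mask()` and the `pack_role` enum are names
made up for illustration. It also assumes the walk continues past
excluded objects rather than stopping at them, which is what the
test's expected output implies.

```c
#include <assert.h>

/* Toy model (hypothetical names): one object per pack, chain D -> C -> B -> A. */
enum { OBJ_A, OBJ_B, OBJ_C, OBJ_D, NR_OBJECTS };
enum pack_role { UNLISTED, INCLUDED, EXCLUDED };

/* parent[o] is the single object that o reaches directly, or -1 for the root. */
static const int parent[NR_OBJECTS] = { -1, OBJ_A, OBJ_B, OBJ_C };

/* Set out[o] to 1 for every object the output pack would contain. */
static void select_objects(const enum pack_role *role, int follow, int *out)
{
	int i, o;

	assert(role && out);

	/* Standard behavior: objects from included packs only. */
	for (i = 0; i < NR_OBJECTS; i++)
		out[i] = (role[i] == INCLUDED);

	if (!follow)
		return;

	/*
	 * "follow": walk from each included object and also pick up
	 * anything reachable that is not in an excluded pack. Excluded
	 * objects are skipped, but (by assumption) still walked through.
	 */
	for (i = 0; i < NR_OBJECTS; i++) {
		if (role[i] != INCLUDED)
			continue;
		for (o = parent[i]; o >= 0; o = parent[o])
			if (role[o] != EXCLUDED)
				out[o] = 1;
	}
}

/* Bitmask of selected objects for the input "pack B, ^pack C, pack D". */
static unsigned select_mask(int follow)
{
	const enum pack_role role[NR_OBJECTS] = {
		UNLISTED, INCLUDED, EXCLUDED, INCLUDED
	};
	int out[NR_OBJECTS];
	unsigned mask = 0;
	int i;

	select_objects(role, follow, out);
	for (i = 0; i < NR_OBJECTS; i++)
		if (out[i])
			mask |= 1u << i;
	return mask;
}
```

With those roles, the standard mode selects B and D, while the follow
mode selects A, B and D, which matches what the test expects.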

> ++
>  Incompatible with `--revs`, or options that imply `--revs` (such as
>  `--all`), with the exception of `--unpacked`, which is compatible.
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2aa12da4af..6406f4a5b1 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -272,6 +272,12 @@ static struct oidmap configured_exclusions;
>  static struct oidset excluded_by_config;
>  static int name_hash_version = -1;
>
> +enum stdin_packs_mode {
> +       STDIN_PACKS_MODE_NONE,
> +       STDIN_PACKS_MODE_STANDARD,
> +       STDIN_PACKS_MODE_FOLLOW,
> +};
> +
>  /**
>   * Check whether the name_hash_version chosen by user input is appropriate,
>   * and also validate whether it is compatible with other features.
> @@ -3511,32 +3517,43 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>         return 0;
>  }
>
> -static void show_commit_pack_hint(struct commit *commit UNUSED,
> -                                 void *data UNUSED)
> -{
> -       /* nothing to do; commits don't have a namehash */
> -}
> -
>  static void show_object_pack_hint(struct object *object, const char *name,
> -                                 void *data UNUSED)
> +                                 void *data)
>  {
> -       struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> -       if (!oe)
> +       enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +       if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +               add_object_entry(&object->oid, object->type, name, 0);
> +       } else {
> +               struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> +               if (!oe)
> +                       return;
> +
> +               /*
> +                * Our 'to_pack' list was constructed by iterating all
> +                * objects packed in included packs, and so doesn't
> +                * have a non-zero hash field that you would typically
> +                * pick up during a reachability traversal.
> +                *
> +                * Make a best-effort attempt to fill in the ->hash
> +                * and ->no_try_delta here using a now in order to
> +                * perhaps improve the delta selection process.
> +                */

I know you just moved this paragraph from below...but it doesn't parse
for me.  "using a now in order to perhaps"?  What does that mean?

> +               oe->hash = pack_name_hash_fn(name);
> +               oe->no_try_delta = name && no_try_delta(name);
> +
> +               stdin_packs_hints_nr++;
> +       }
> +}
> +
> +static void show_commit_pack_hint(struct commit *commit, void *data)
> +{
> +       enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +       if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +               show_object_pack_hint((struct object *)commit, "", data);
>                 return;
> +       }
> +       /* nothing to do; commits don't have a namehash */
>
> -       /*
> -        * Our 'to_pack' list was constructed by iterating all objects packed in
> -        * included packs, and so doesn't have a non-zero hash field that you
> -        * would typically pick up during a reachability traversal.
> -        *
> -        * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
> -        * here using a now in order to perhaps improve the delta selection
> -        * process.
> -        */
> -       oe->hash = pack_name_hash_fn(name);
> -       oe->no_try_delta = name && no_try_delta(name);
> -
> -       stdin_packs_hints_nr++;
>  }

It might be worth swapping the order of functions as a preparatory
patch (both here and when you've done it elsewhere in this series),
just because it'll make the diff so much easier to read when we can
see the changes to the function without having to also deal with the
order swapping (since order swapping looks like a large deletion and
large addition of one of the two functions).

>  static int pack_mtime_cmp(const void *_a, const void *_b)
> @@ -3644,7 +3661,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
>
>  static void add_unreachable_loose_objects(struct rev_info *revs);
>
> -static void read_stdin_packs(int rev_list_unpacked)
> +static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
>  {
>         struct rev_info revs;
>
> @@ -3676,7 +3693,7 @@ static void read_stdin_packs(int rev_list_unpacked)
>         traverse_commit_list(&revs,
>                              show_commit_pack_hint,
>                              show_object_pack_hint,
> -                            NULL);
> +                            &mode);
>
>         trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
>                            stdin_packs_found_nr);
> @@ -4467,6 +4484,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
>         return is_not_in_promisor_pack_obj((struct object *) commit, data);
>  }
>
> +static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
> +                                 int unset)
> +{
> +       enum stdin_packs_mode *mode = opt->value;
> +
> +       if (unset)
> +               *mode = STDIN_PACKS_MODE_NONE;
> +       else if (!arg || !*arg)
> +               *mode = STDIN_PACKS_MODE_STANDARD;

I don't understand why you have both a None mode and a Standard mode,
especially since the implementation seems to only care about whether
or not the Follow mode has been set.  Shouldn't these both be setting
mode to the same value?
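
To make the question concrete, here is a standalone sketch of the
three states the callback can produce. It is not the parse-options
machinery itself: `parse_mode()` is a hypothetical stand-in that
returns -1 on an unknown mode where the patch calls die(). The enum is
copied from the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum stdin_packs_mode {
	STDIN_PACKS_MODE_NONE,     /* option absent, or negated */
	STDIN_PACKS_MODE_STANDARD, /* bare --stdin-packs */
	STDIN_PACKS_MODE_FOLLOW,   /* --stdin-packs=follow */
};

/*
 * Hypothetical stand-in for the callback in the patch. Returns 0 on
 * success and -1 for an unrecognized mode string.
 */
static int parse_mode(const char *arg, int unset, enum stdin_packs_mode *mode)
{
	assert(mode);

	if (unset)
		*mode = STDIN_PACKS_MODE_NONE;
	else if (!arg || !*arg)
		*mode = STDIN_PACKS_MODE_STANDARD;
	else if (!strcmp(arg, "follow"))
		*mode = STDIN_PACKS_MODE_FOLLOW;
	else
		return -1;

	return 0;
}
```

The one difference I can see is that NONE is zero, which is what the
later `if (stdin_packs)` check in cmd_pack_objects() tests, while
STANDARD and FOLLOW are both non-zero.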

> +       else if (!strcmp(arg, "follow"))
> +               *mode = STDIN_PACKS_MODE_FOLLOW;
> +       else
> +               die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
> +
> +       return 0;
> +}
> +
>  int cmd_pack_objects(int argc,
>                      const char **argv,
>                      const char *prefix,
> @@ -4478,7 +4512,7 @@ int cmd_pack_objects(int argc,
>         struct strvec rp = STRVEC_INIT;
>         int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
>         int rev_list_index = 0;
> -       int stdin_packs = 0;
> +       enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
>         struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
>         struct list_objects_filter_options filter_options =
>                 LIST_OBJECTS_FILTER_INIT;
> @@ -4533,6 +4567,9 @@ int cmd_pack_objects(int argc,
>                 OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
>                               N_("include objects referred to by the index"),
>                               1, PARSE_OPT_NONEG),
> +               OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
> +                            N_("read packs from stdin"),
> +                            PARSE_OPT_OPTARG, parse_stdin_packs_mode),
>                 OPT_BOOL(0, "stdin-packs", &stdin_packs,
>                          N_("read packs from stdin")),
>                 OPT_BOOL(0, "stdout", &pack_to_stdout,
> @@ -4788,7 +4825,7 @@ int cmd_pack_objects(int argc,
>                 progress_state = start_progress(the_repository,
>                                                 _("Enumerating objects"), 0);
>         if (stdin_packs) {
> -               read_stdin_packs(rev_list_unpacked);
> +               read_stdin_packs(stdin_packs, rev_list_unpacked);
>         } else if (cruft) {
>                 read_cruft_objects();
>         } else if (!use_internal_rev_list) {
> diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
> index 4f5e2733a2..f97d2d1b71 100755
> --- a/t/t5331-pack-objects-stdin.sh
> +++ b/t/t5331-pack-objects-stdin.sh
> @@ -236,4 +236,105 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
>         test_cmp expected-objects actual-objects
>  '
>
> +packdir=.git/objects/pack
> +
> +objects_in_packs () {
> +       for p in "$@"
> +       do
> +               git show-index <"$packdir/pack-$p.idx" || return 1
> +       done >objects.raw &&
> +
> +       cut -d' ' -f2 objects.raw | sort &&
> +       rm -f objects.raw
> +}
> +
> +test_expect_success 'setup for --stdin-packs=follow' '
> +       git init stdin-packs--follow &&
> +       (
> +               cd stdin-packs--follow &&
> +
> +               for c in A B C D
> +               do
> +                       test_commit "$c" || return 1
> +               done &&
> +
> +               A="$(echo A | git pack-objects --revs $packdir/pack)" &&
> +               B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
> +               C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
> +               D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
> +
> +               git prune-packed
> +       )
> +'
> +
> +test_expect_success '--stdin-packs=follow walks into unknown packs' '
> +       test_when_finished "rm -fr repo" &&
> +
> +       git init repo &&
> +       (
> +               cd repo &&
> +
> +               for c in A B C D
> +               do
> +                       test_commit "$c" || return 1
> +               done &&
> +
> +               A="$(echo A | git pack-objects --revs $packdir/pack)" &&
> +               B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
> +               C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
> +               D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
> +
> +               git prune-packed &&
> +
> +               cat >in <<-EOF &&
> +               pack-$B.pack
> +               ^pack-$C.pack
> +               pack-$D.pack
> +               EOF
> +
> +               # With just --stdin-packs, pack "A" is unknown to us, so
> +               # only objects from packs "B" and "D" are included in
> +               # the output pack.
> +               P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
> +               objects_in_packs $B $D >expect &&
> +               objects_in_packs $P >actual &&
> +               test_cmp expect actual &&
> +
> +               # But with --stdin-packs=follow, objects from both
> +               # included packs reach objects from the unknown pack, so
> +               # objects from pack "A" is included in the output pack
> +               # in addition to the above.
> +               P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
> +               objects_in_packs $A $B $D >expect &&
> +               objects_in_packs $P >actual &&
> +               test_cmp expect actual &&
> +
> +               test_commit E &&
> +               # And with --unpacked, we will pick up objects from unknown
> +               # packs that are reachable from loose objects. Loose object E
> +               # reaches objects in pack A, but there are three excluded packs
> +               # in between.
> +               #
> +               # The resulting pack should include objects reachable from E
> +               # that are not present in packs B, C, or D, along with those
> +               # present in pack A.
> +               cat >in <<-EOF &&
> +               ^pack-$B.pack
> +               ^pack-$C.pack
> +               ^pack-$D.pack
> +               EOF
> +
> +               P=$(git pack-objects --stdin-packs=follow --unpacked \
> +                       $packdir/pack <in) &&
> +
> +               {
> +                       objects_in_packs $A &&
> +                       git rev-list --objects --no-object-names D..E
> +               }>expect.raw &&
> +               sort expect.raw >expect &&
> +               objects_in_packs $P >actual &&
> +               test_cmp expect actual
> +       )
> +'
> +
>  test_done
> --
> 2.49.0.229.gc267761125.dirty

I like the tests -- normal --stdin-packs, then --stdin-packs=follow,
then --stdin-packs=follow + --unpacked.

However, would it be worthwhile to create commit E immediately after
creating the packs?

Currently, the third test shows us that unpacked objects are included
when --unpacked is passed.  But the tests don't let us know whether
that flag is necessary, i.e. whether unpacked objects will just be
included anyway.  If you move the creation of commit E as a loose
object immediately after the pack creation and before all the tests,
then these same tests demonstrate this additional bit of information.


* Re: [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-14 20:06   ` [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-04-15  3:11     ` Elijah Newren
  2025-04-15 20:51       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-15  3:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> MIDX with '--write-midx' to ensure that the resulting MIDX was always
> closed under reachability in order to generate reachability bitmaps.
>
> Suppose you have a once-unreachable object packed in a cruft pack, which
> later on becomes reachable from one or more objects in a geometrically
> repacked pack. That once-unreachable object *won't* appear in the new
> pack, since the cruft pack was specified as neither included nor
> excluded to 'pack-objects --stdin-packs'.

I believe you are talking about the state before your series (i.e.,
this is carrying on from the previous paragraph), but it reads as
though you are talking about the state after the first seven patches
of this series.  Some kind of connection wording to clarify would
really help here.

> If the bitmap selection
> process picks one or more commits which reach the once-unreachable
> objects, commit ddee3703b3 ensures that the MIDX will be closed under
> reachability. Without it, we would fail to generate a MIDX bitmap.

After reading this part, I had to go back and re-read and figure out
what point in time everything was referring to.

> ddee3703b3 alludes to the fact that this is sub-optimal by saying
>
>     [...] it's desirable to avoid including cruft packs in the MIDX
>     because it causes the MIDX to store a bunch of objects which are
>     likely to get thrown away.
>
> , which is true, but hides an even larger problem. If repositories
> rarely prune their unreachable objects and/or have many of them, the
> MIDX must keep track of a large number of objects which bloats the MIDX
> and slows down object lookup.
>
> This is doubly unfortunate because the vast majority of objects in cruft
> pack(s) are unlikely to be read, but object reads that go through the
> MIDX have to search through them anyway.

"have to search through them"?  That could be read to suggest those
individual objects are read, rather than just traversed over.  Maybe
"...unlikely to be read, so the enlarged MIDX mostly tracks objects
that are known to be likely irrelevant", or something like that?

> This patch causes geometrically-repacked packs to contain a copy of any
> once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
> allowing us to avoid including any cruft packs in the MIDX. This is
> because a sequence of geometrically-repacked packs that were all
> generated with '--stdin-packs=follow' are guaranteed to have their union
> be closed under reachability.
>
> Note that you cannot guarantee that a collection of packs is closed
> under reachability if not all of them were generated with following as

maybe: ...with "follow" as above.  "follow" or "following" feels like
it needs quotes so the reader understands it's meant as the name of a
mode, rather than a verb in the sentence.

> above. One tell-tale sign that not all geometrically-repacked packs in
> the MIDX were generated with following is to see if there is a pack in

same here with "following"...and below.

> the existing MIDX that is not going to be somehow represented (either
> verbatim or as part of a geometric rollup) in the new MIDX.
>
> If there is, then starting to generate packs with following during
> geometric repacking won't work, since it's open to the same race as
> described above.
>
> But if you're starting from scratch (e.g., building the first MIDX after
> an all-into-one '--cruft' repack), then you can guarantee that the union
> of subsequently generated packs from geometric repacking *is* closed
> under reachability.
>
> Detect when this is the case and avoid including cruft packs in the MIDX
> where possible. The existing behavior remains the default, and the new
> behavior is available with the config 'repack.midxMustIncludeCruft' set
> to 'false'.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/config/repack.adoc |   7 ++
>  builtin/repack.c                 | 162 +++++++++++++++++++++++++++----
>  t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
>  3 files changed, 241 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
> index c79af6d7b8..e9e78dcb19 100644
> --- a/Documentation/config/repack.adoc
> +++ b/Documentation/config/repack.adoc
> @@ -39,3 +39,10 @@ repack.cruftThreads::
>         a cruft pack and the respective parameters are not given over
>         the command line. See similarly named `pack.*` configuration
>         variables for defaults and meaning.
> +
> +repack.midxMustContainCruft::
> +       When set to true, linkgit:git-repack[1] will unconditionally include
> +       cruft pack(s), if any, in the multi-pack index when invoked with
> +       `--write-midx`. When false, cruft packs are only included in the MIDX
> +       when necessary (e.g., because they might be required to form a
> +       reachability closure with MIDX bitmaps). Defaults to true.
> diff --git a/builtin/repack.c b/builtin/repack.c
> index f3330ade7b..ee43a4f4c1 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -39,6 +39,7 @@ static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static int run_update_server_info = 1;
>  static char *packdir, *packtmp_name, *packtmp;
> +static int midx_must_contain_cruft = 1;
>
>  static const char *const git_repack_usage[] = {
>         N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
> @@ -107,6 +108,10 @@ static int repack_config(const char *var, const char *value,
>                 free(cruft_po_args->threads);
>                 return git_config_string(&cruft_po_args->threads, var, value);
>         }
> +       if (!strcmp(var, "repack.midxmustcontaincruft")) {
> +               midx_must_contain_cruft = git_config_bool(var, value);
> +               return 0;
> +       }
>         return git_default_config(var, value, ctx, cb);
>  }
>
> @@ -687,6 +692,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
>         free(geometry->pack);
>  }
>
> +static int midx_has_unknown_packs(char **midx_pack_names,
> +                                 size_t midx_pack_names_nr,
> +                                 struct string_list *include,
> +                                 struct pack_geometry *geometry,
> +                                 struct existing_packs *existing)
> +{
> +       size_t i;
> +
> +       string_list_sort(include);
> +
> +       for (i = 0; i < midx_pack_names_nr; i++) {
> +               const char *pack_name = midx_pack_names[i];
> +
> +               /*
> +                * Determine whether or not each MIDX'd pack from the existing
> +                * MIDX (if any) is represented in the new MIDX. For each pack
> +                * in the MIDX, it must either be:
> +                *
> +                *  - In the "include" list of packs to be included in the new
> +                *    MIDX. Note this function is called before the include
> +                *    list is populated with any cruft pack(s).
> +                *
> +                *  - Below the geometric split line (if using pack geometry),
> +                *    indicating that the pack won't be included in the new
> +                *    MIDX, but its contents were rolled up as part of the
> +                *    geometric repack.
> +                *
> +                *  - In the existing non-kept packs list (if not using pack
> +                *    geometry), and marked as non-deleted.
> +                */
> +               if (string_list_has_string(include, pack_name)) {
> +                       continue;
> +               } else if (geometry) {
> +                       struct strbuf buf = STRBUF_INIT;
> +                       uint32_t j;
> +
> +                       for (j = 0; j < geometry->split; j++) {
> +                               strbuf_reset(&buf);
> +                               strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
> +                               strbuf_strip_suffix(&buf, ".pack");
> +                               strbuf_addstr(&buf, ".idx");
> +
> +                               if (!strcmp(pack_name, buf.buf)) {
> +                                       strbuf_release(&buf);
> +                                       break;
> +                               }
> +                       }
> +
> +                       strbuf_release(&buf);
> +
> +                       if (j < geometry->split)
> +                               continue;
> +               } else {
> +                       struct string_list_item *item;
> +
> +                       item = string_list_lookup(&existing->non_kept_packs,
> +                                                 pack_name);
> +                       if (item && !pack_is_marked_for_deletion(item))
> +                               continue;
> +               }
> +
> +               /*
> +                * If we got to this point, the MIDX includes some pack that we
> +                * don't know about.
> +                */
> +               return 1;
> +       }
> +
> +       return 0;
> +}
> +
>  struct midx_snapshot_ref_data {
>         struct tempfile *f;
>         struct oidset seen;
> @@ -755,6 +831,8 @@ static void midx_snapshot_refs(struct tempfile *f)
>
>  static void midx_included_packs(struct string_list *include,
>                                 struct existing_packs *existing,
> +                               char **midx_pack_names,
> +                               size_t midx_pack_names_nr,
>                                 struct string_list *names,
>                                 struct pack_geometry *geometry)
>  {
> @@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include,
>                 }
>         }
>
> -       for_each_string_list_item(item, &existing->cruft_packs) {
> +       if (midx_must_contain_cruft ||
> +           midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
> +                                  include, geometry, existing)) {
>                 /*
> -                * When doing a --geometric repack, there is no need to check
> -                * for deleted packs, since we're by definition not doing an
> -                * ALL_INTO_ONE repack (hence no packs will be deleted).
> -                * Otherwise we must check for and exclude any packs which are
> -                * enqueued for deletion.
> +                * If there are one or more unknown pack(s) present (see
> +                * midx_has_unknown_packs() for what makes a pack
> +                * "unknown") in the MIDX before the repack, keep them
> +                * as they may be required to form a reachability
> +                * closure if the MIDX is bitmapped.
>                  *
> -                * So we could omit the conditional below in the --geometric
> -                * case, but doing so is unnecessary since no packs are marked
> -                * as pending deletion (since we only call
> -                * `mark_packs_for_deletion()` when doing an all-into-one
> -                * repack).
> +                * For example, a cruft pack can be required to form a
> +                * reachability closure if the MIDX is bitmapped and one
> +                * or more of its selected commits reaches a once-cruft
> +                * object that was later made reachable.

The antecedent of "its" is unclear here; just spell it out to reduce
how much thinking the reader needs to do?

>                  */
> -               if (pack_is_marked_for_deletion(item))
> -                       continue;
> +               for_each_string_list_item(item, &existing->cruft_packs) {
> +                       /*
> +                        * When doing a --geometric repack, there is no
> +                        * need to check for deleted packs, since we're
> +                        * by definition not doing an ALL_INTO_ONE
> +                        * repack (hence no packs will be deleted).
> +                        * Otherwise we must check for and exclude any
> +                        * packs which are enqueued for deletion.
> +                        *
> +                        * So we could omit the conditional below in the
> +                        * --geometric case, but doing so is unnecessary
> +                        *  since no packs are marked as pending
> +                        *  deletion (since we only call
> +                        *  `mark_packs_for_deletion()` when doing an
> +                        *  all-into-one repack).
> +                        */
> +                       if (pack_is_marked_for_deletion(item))
> +                               continue;
>
> -               strbuf_reset(&buf);
> -               strbuf_addf(&buf, "%s.idx", item->string);
> -               string_list_insert(include, buf.buf);
> +                       strbuf_reset(&buf);
> +                       strbuf_addf(&buf, "%s.idx", item->string);
> +                       string_list_insert(include, buf.buf);
> +               }
> +       } else {
> +               /*
> +                * Modern versions of Git will write new copies of
> +                * once-cruft objects when doing a --geometric repack.

"Modern versions of Git" -> "Modern versions of Git with the
appropriate config setting" ?


> +                *
> +                * If the MIDX has no cruft pack, new packs written
> +                * during a --geometric repack will not rely on the
> +                * cruft pack to form a reachability closure, so we can
> +                * avoid including them in the MIDX in that case.
> +                */
> +               ;
>         }
>
>         strbuf_release(&buf);
> @@ -1142,6 +1249,8 @@ int cmd_repack(int argc,
>         struct tempfile *refs_snapshot = NULL;
>         int i, ext, ret;
>         int show_progress;
> +       char **midx_pack_names = NULL;
> +       size_t midx_pack_names_nr = 0;
>
>         /* variables to be filled by option parsing */
>         int delete_redundant = 0;
> @@ -1356,7 +1465,10 @@ int cmd_repack(int argc,
>                     !(pack_everything & PACK_CRUFT))
>                         strvec_push(&cmd.args, "--pack-loose-unreachable");
>         } else if (geometry.split_factor) {
> -               strvec_push(&cmd.args, "--stdin-packs");
> +               if (midx_must_contain_cruft)
> +                       strvec_push(&cmd.args, "--stdin-packs");
> +               else
> +                       strvec_push(&cmd.args, "--stdin-packs=follow");
>                 strvec_push(&cmd.args, "--unpacked");
>         } else {
>                 strvec_push(&cmd.args, "--unpacked");
> @@ -1478,6 +1590,16 @@ int cmd_repack(int argc,
>
>         string_list_sort(&names);
>
> +       if (get_local_multi_pack_index(the_repository)) {
> +               uint32_t i;
> +               struct multi_pack_index *m =
> +                       get_local_multi_pack_index(the_repository);
> +
> +               ALLOC_ARRAY(midx_pack_names, m->num_packs);
> +               for (i = 0; i < m->num_packs; i++)
> +                       midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
> +       }
> +
>         close_object_store(the_repository->objects);
>
>         /*
> @@ -1519,7 +1641,8 @@ int cmd_repack(int argc,
>
>         if (write_midx) {
>                 struct string_list include = STRING_LIST_INIT_DUP;
> -               midx_included_packs(&include, &existing, &names, &geometry);
> +               midx_included_packs(&include, &existing, midx_pack_names,
> +                                   midx_pack_names_nr, &names, &geometry);
>
>                 ret = write_midx_included_packs(&include, &geometry, &names,
>                                                 refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
> @@ -1570,6 +1693,9 @@ int cmd_repack(int argc,
>         string_list_clear(&names, 1);
>         existing_packs_release(&existing);
>         free_pack_geometry(&geometry);
> +       for (size_t i = 0; i < midx_pack_names_nr; i++)
> +               free(midx_pack_names[i]);
> +       free(midx_pack_names);
>         pack_objects_args_release(&po_args);
>         pack_objects_args_release(&cruft_po_args);
>
> diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
> index 8aebfb45f5..2b0a55f8fd 100755
> --- a/t/t7704-repack-cruft.sh
> +++ b/t/t7704-repack-cruft.sh
> @@ -724,4 +724,94 @@ test_expect_success 'cruft repack respects --quiet' '
>         )
>  '
>
> +setup_cruft_exclude_tests() {
> +       git init "$1" &&
> +       (
> +               cd "$1" &&
> +
> +               git config repack.midxMustContainCruft false &&
> +
> +               test_commit one &&
> +
> +               test_commit --no-tag two &&
> +               two="$(git rev-parse HEAD)" &&
> +               test_commit --no-tag three &&
> +               three="$(git rev-parse HEAD)" &&
> +               git reset --hard one &&
> +               git reflog expire --all --expire=all &&
> +
> +               GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
> +
> +               git merge $two &&
> +               test_commit four
> +       )
> +}
> +
> +test_expect_success 'repack --write-midx excludes cruft where possible' '
> +       setup_cruft_exclude_tests exclude-cruft-when-possible &&
> +       (
> +               cd exclude-cruft-when-possible &&
> +
> +               GIT_TEST_MULTI_PACK_INDEX=0 \
> +               git repack -d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +               test-tool read-midx --show-objects $objdir >midx &&
> +               cruft="$(ls $packdir/*.mtimes)" &&
> +               test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
> +
> +               git rev-list --all --objects --no-object-names >reachable.raw &&
> +               sort reachable.raw >reachable.objects &&
> +               awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +
> +               test_cmp reachable.objects midx.objects
> +       )
> +'
> +
> +test_expect_success 'repack --write-midx includes cruft when instructed' '
> +       setup_cruft_exclude_tests exclude-cruft-when-instructed &&
> +       (
> +               cd exclude-cruft-when-instructed &&
> +
> +               GIT_TEST_MULTI_PACK_INDEX=0 \
> +               git -c repack.midxMustContainCruft=true repack \
> +                       -d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +               test-tool read-midx --show-objects $objdir >midx &&
> +               cruft="$(ls $packdir/*.mtimes)" &&
> +               test_grep "$(basename "$cruft" .mtimes).idx" midx &&
> +
> +               git cat-file --batch-check="%(objectname)" --batch-all-objects \
> +                       >all.objects &&
> +               awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +
> +               test_cmp all.objects midx.objects
> +       )
> +'
> +
> +test_expect_success 'repack --write-midx includes cruft when necessary' '
> +       setup_cruft_exclude_tests exclude-cruft-when-necessary &&
> +       (
> +               cd exclude-cruft-when-necessary &&
> +
> +               test_path_is_file $(ls $packdir/pack-*.mtimes) &&
> +               ls $packdir/pack-*.idx | sort >packs.all &&
> +               grep -o "pack-.*\.idx$" packs.all >in &&
> +
> +               git multi-pack-index write --stdin-packs --bitmap <in &&
> +
> +               test_commit five &&
> +               GIT_TEST_MULTI_PACK_INDEX=0 \
> +               git repack -d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +               test-tool read-midx --show-objects $objdir >midx &&
> +               awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +               git cat-file --batch-all-objects --batch-check="%(objectname)" \
> +                       >expect.objects &&
> +               test_cmp expect.objects midx.objects &&
> +
> +               grep "^pack-" midx >midx.packs &&
> +               test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
> +       )
> +'
> +
>  test_done
> --
> 2.49.0.229.gc267761125.dirty


* Re: [PATCH v2 1/8] pack-objects: use standard option incompatibility functions
  2025-04-14 20:41     ` Junio C Hamano
@ 2025-04-15 19:32       ` Taylor Blau
  2025-04-15 19:48         ` Junio C Hamano
  0 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 19:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Mon, Apr 14, 2025 at 01:41:27PM -0700, Junio C Hamano wrote:
> > -	if (stdin_packs && filter_options.choice)
> > -		die(_("cannot use --filter with --stdin-packs"));
> > +	die_for_incompatible_opt2(filter_options.choice, "--filter",
> > +				  stdin_packs, "--stdin-packs");
>
> The order of check is now reversed (which does not make any
> difference to correctness or performance), but this way, we list the
> options in the same order in the message as before, which is nice.

Now I can't un-see it ;-). Even though it's not a correctness issue as
you note, the whole thing leaves a bad taste in my mouth. I'll swap the
ordering to match the original in the next round.
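
To illustrate why the argument order is visible to users at all: the
names appear in the message in the order they are passed. Below is a
minimal sketch using a hypothetical stand-in, `incompatible_opt2_msg()`,
which is not a Git function. It formats the message into a buffer
instead of dying, and the message text is an assumption modeled on
Git's wording.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a two-option incompatibility check.
 * Writes the message into 'buf' and returns 1 when both options were
 * given; otherwise leaves 'buf' empty and returns 0.
 */
int incompatible_opt2_msg(char *buf, size_t len,
			  int opt1, const char *opt1_name,
			  int opt2, const char *opt2_name)
{
	assert(buf && len > 0);
	buf[0] = '\0';
	if (!opt1 || !opt2)
		return 0;
	/* The option names are printed in argument order. */
	snprintf(buf, len, "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

Swapping the two (value, name) pairs changes only the order of the
names in the message, not whether the check fires.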

Thanks,
Taylor


* Re: [PATCH v2 2/8] object-store-ll.h: add note about designated initializers
  2025-04-15  2:57     ` Elijah Newren
@ 2025-04-15 19:47       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 19:47 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 07:57:59PM -0700, Elijah Newren wrote:
> There are 46 #define'd designated initializers in the code base, from
> DIR_INIT to OIDMAP_INIT and everything in-between.  The logic used in
> your comment to suggest not using an all-zeroes initializer doesn't
> seem to depend in any way on something specific to object_info, yet
> none of those other 46 cases in my quick scanning have such a warning.
> And 29 of the 46 define some kind of initial value for some fields
> instead of using all zeroes.  That would suggest that one of the
> following is true: (a) those 29 cases are buggy and shouldn't be doing
> that, (b) those 29 are all special cases someone has thought through
> carefully but perhaps someone should add the same warning you have
> here to those 29 other cases to avoid carelessly considered cases from
> being added, (c) there is something specific about object_info that
> you didn't call out here, or (d) this warning you add is unnecessary.

For (a), I wouldn't say that the _INIT macros are the potentially-buggy
component, but rather the use of a designated initializer in the
presence of an _INIT macro that initializes the struct to something
other than all zeros.

I don't think there is anything specific to the object_info structure
here, which suggests (d), but I don't think that the warning is
unnecessary, it's just overly-specific. I think we should have a
convention in CodingGuidelines that forbids designated initializers in
structures with non-zero _INIT macros that don't otherwise have a
comment about their correctness.

But I think there is some discussion to be had there, and I want to
disentangle that from this series. I'm going to drop this and the
following patch from this series and split them out so we can discuss
them more thoroughly while letting this series move ahead.

Thanks,
Taylor


* Re: [PATCH v2 1/8] pack-objects: use standard option incompatibility functions
  2025-04-15 19:32       ` Taylor Blau
@ 2025-04-15 19:48         ` Junio C Hamano
  2025-04-15 22:27           ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-04-15 19:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Now I can't un-see it ;-). Even though it's not a correctness issue as
> you note, the whole thing leaves a bad taste in my mouth. I'll swap the
> ordering to match the original in the next round.

I do not think we can be completely faithful to the original in this
rewrite, simply because the original is not consistent with what
die_for_incompat() thing produces and you'd need to adjust the test
anyway.  So unless there are other things you need to reroll, I
wouldn't worry about it too much.

Thanks.



* Re: [PATCH v2 2/8] object-store-ll.h: add note about designated initializers
  2025-04-14 21:07     ` Junio C Hamano
@ 2025-04-15 19:51       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 19:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Mon, Apr 14, 2025 at 02:07:27PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -337,6 +337,14 @@ struct object_info {
> >  /*
> >   * Initializer for a "struct object_info" that wants no items. You may
> >   * also memset() the memory to all-zeroes.
> > + *
> > + * NOTE: callers expect the initial value of an object_info struct to
> > + * be zero'd out. Designated initializers like
> > + *
> > + *     struct object_info oi = { .sizep = &sz };
> > + *
> > + * depend on this behavior, so consider strongly before adding new
> > + * fields that have a non-zero default value.
> >   */
> >  #define OBJECT_INFO_INIT { 0 }
>
> Hmph, after thinking hard enough, if a developer cannot come up with
> a way to avoid non-zero default value, the callers could just work
> if they instead did
>
> 	struct object_info oi = OBJECT_INFO_INIT;
>         oi.sizep = &sz;
>
> and the member of non-zero default value can be dealt with by
> updating the default initializer, perhaps like
>
> 	#define OBJECT_INFO_INIT { .enabled = 1 }

Yeah... that's what I was trying to get at with this patch. Basically,
"if you have an _INIT macro with non-zero defaults, don't use a custom
designated initializer and instead assign fields after using the _INIT
macro (which itself is a designated initializer)".

But like I wrote to Elijah in the same sub-thread, I think that there is
probably more to discuss here, so I ejected this patch from my copy of
the series and will re-submit it as its own series later on.
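
The pitfall under discussion can be sketched in isolation. Everything
below is hypothetical (`struct toy_info`, `TOY_INFO_INIT`, and both
helpers are made up for illustration and are not Git code); it only
shows that a hand-written designated initializer bypasses an _INIT
macro's non-zero defaults.

```c
#include <stddef.h>

/* Hypothetical struct whose 'enabled' member should default to 1. */
struct toy_info {
	unsigned long *sizep;
	int enabled;
};

#define TOY_INFO_INIT { .enabled = 1 }

/*
 * Bypasses TOY_INFO_INIT: members that are not named in a designated
 * initializer are zero-initialized, so 'enabled' ends up 0.
 */
struct toy_info toy_info_designated(unsigned long *sizep)
{
	struct toy_info oi = { .sizep = sizep };
	return oi;
}

/* Starts from TOY_INFO_INIT and then assigns, so 'enabled' stays 1. */
struct toy_info toy_info_from_init(unsigned long *sizep)
{
	struct toy_info oi = TOY_INFO_INIT;
	oi.sizep = sizep;
	return oi;
}
```

With an all-zero _INIT macro the two helpers would be equivalent, which
is why the problem only appears once a non-zero default is introduced.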

Thanks,
Taylor


* Re: [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-15  3:10     ` Elijah Newren
@ 2025-04-15 19:57       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 19:57 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 08:10:51PM -0700, Elijah Newren wrote:
> On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > With '--unpacked', pack-objects adds loose objects (which don't appear
> > in any of the excluded packs from '--stdin-packs') to the output pack
> > without considering them as reachability tips for the name-hash
> > traversal.
> >
> > This was an oversight in the original implementation of '--stdin-packs',
> > since the code which enumerates and adds loose objects to the output
> > pack (`add_unreachable_loose_objects()`) did not have access to the
> > 'rev_info' struct found in `read_packs_list_from_stdin()`.
> >
> > Excluding unpacked objects from that traversal doesn't effect the
>
> s/effect/affect/ ?

Oops, yes.

> Should this patch have some tests demonstrating the difference in
> which objects are included?

No; this patch doesn't actually change the set of objects we include
with '--stdin-packs' in conjunction with '--unpacked', it just alters
their name-hash values in an attempt to produce better deltas.
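
For readers unfamiliar with the name-hash, here is a rough sketch of
the idea. `toy_name_hash()` is a hypothetical function modeled on the
classic pack-objects name-hash; the exact shifts are an assumption and
this is not the `pack_name_hash_fn()` used in the patch. The hash is
built so that the last characters of a path weigh the most, which
tends to sort paths with the same ending near each other when looking
for delta candidates.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Toy name-hash: each non-whitespace character is added into the top
 * byte while earlier characters are shifted down, so later characters
 * dominate the result. A NULL name (no path known) hashes to 0.
 */
uint32_t toy_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;	/* whitespace does not contribute */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

An object added without going through a traversal has no path, so in
this model it keeps a hash of 0 and gets no such grouping.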

I don't think we have any tests that check this traversal in the packed
or unpacked case, though we could probably add some. It's not obvious
how we'd test that the traversal actually produced better/different
deltas, but we could at least check that it happened with the trace2
identifier "pack-objects/stdin_packs_hints".

I think it's probably worth doing at some point, though I don't think I
see it as especially urgent, unless you feel strongly otherwise.

Thanks,
Taylor


* Re: [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow'
  2025-04-15  3:11     ` Elijah Newren
@ 2025-04-15 20:45       ` Taylor Blau
  2025-04-16  5:26         ` Elijah Newren
  0 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 20:45 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 08:11:08PM -0700, Elijah Newren wrote:
> > diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
> > index 7f69ae4855..c894582799 100644
> > --- a/Documentation/git-pack-objects.adoc
> > +++ b/Documentation/git-pack-objects.adoc
> > @@ -87,13 +87,19 @@ base-name::
> >         reference was included in the resulting packfile.  This
> >         can be useful to send new tags to native Git clients.
> >
> > ---stdin-packs::
> > +--stdin-packs[=<mode>]::
> >         Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
> >         from the standard input, instead of object names or revision
> >         arguments. The resulting pack contains all objects listed in the
> >         included packs (those not beginning with `^`), excluding any
> >         objects listed in the excluded packs (beginning with `^`).
> >  +
> > +When `mode` is "follow", pack objects which are reachable from objects
> > +in the included packs, but appear in packs that are not listed.
> > +Reachable objects which appear in excluded packs are not packed. Useful
> > +for resurrecting once-cruft objects to generate packs which are closed
> > +under reachability up to the excluded packs.
>
> Maybe:
>
> When `mode` is "follow", objects from packs not listed on stdin
> receive special treatment.  Objects within unlisted packs will be
> included if those objects (1) are reachable from the included packs,
> and (2) are not also found in any of the excluded packs.  This mode is
> useful for resurrecting once-cruft objects to generate packs which are
> closed under reachability up to the boundary set by the excluded
> packs.

I like it. I went with your version with some minor rewording and tweaks
on top.

> > +               /*
> > +                * Our 'to_pack' list was constructed by iterating all
> > +                * objects packed in included packs, and so doesn't
> > +                * have a non-zero hash field that you would typically
> > +                * pick up during a reachability traversal.
> > +                *
> > +                * Make a best-effort attempt to fill in the ->hash
> > +                * and ->no_try_delta here using a now in order to
> > +                * perhaps improve the delta selection process.
> > +                */
>
> I know you just moved this paragraph from below...but it doesn't parse
> for me.  "using a now in order to perhaps"?  What does that mean?

Yeah, this is just bogus, and was so before this patch. The rewording is
minor enough (just dropping "using a now") that I think we can just
squash it in with the movement in this patch.

> > +               oe->hash = pack_name_hash_fn(name);
> > +               oe->no_try_delta = name && no_try_delta(name);
> > +
> > +               stdin_packs_hints_nr++;
> > +       }
> > +}
> > +
> > +static void show_commit_pack_hint(struct commit *commit, void *data)
> > +{
> > +       enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> > +       if (mode == STDIN_PACKS_MODE_FOLLOW) {
> > +               show_object_pack_hint((struct object *)commit, "", data);
> >                 return;
> > +       }
> > +       /* nothing to do; commits don't have a namehash */
> >
> > -       /*
> > -        * Our 'to_pack' list was constructed by iterating all objects packed in
> > -        * included packs, and so doesn't have a non-zero hash field that you
> > -        * would typically pick up during a reachability traversal.
> > -        *
> > -        * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
> > -        * here using a now in order to perhaps improve the delta selection
> > -        * process.
> > -        */
> > -       oe->hash = pack_name_hash_fn(name);
> > -       oe->no_try_delta = name && no_try_delta(name);
> > -
> > -       stdin_packs_hints_nr++;
> >  }
>
> It might be worth swapping the order of functions as a preparatory
> patch (both here and when you've done it elsewhere in this series),
> just because it'll make the diff so much easier to read when we can
> see the changes to the function without having to also deal with the
> order swapping (since order swapping looks like a large deletion and
> large addition of one of the two functions).

Fair enough.

> > @@ -4467,6 +4484,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
> >         return is_not_in_promisor_pack_obj((struct object *) commit, data);
> >  }
> >
> > +static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
> > +                                 int unset)
> > +{
> > +       enum stdin_packs_mode *mode = opt->value;
> > +
> > +       if (unset)
> > +               *mode = STDIN_PACKS_MODE_NONE;
> > +       else if (!arg || !*arg)
> > +               *mode = STDIN_PACKS_MODE_STANDARD;
>
> I don't understand why you have both a None mode and a Standard mode,
> especially since the implementation seems to only care about whether
> or not the Follow mode has been set.  Shouldn't these both be setting
> mode to the same value?

I'm not sure I follow your question... stdin_packs is a tri-state. It
can be off, on in standard/legacy mode, or on in follow mode.
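
A standalone sketch of that tri-state may help. The enum and function
names below are hypothetical (prefixed `toy_`) and only mirror the
shape of the quoted callback; rejecting an unknown mode with -1 is an
assumption of this sketch.

```c
#include <string.h>

enum toy_stdin_packs_mode {
	TOY_MODE_NONE,		/* option absent or negated */
	TOY_MODE_STANDARD,	/* --stdin-packs */
	TOY_MODE_FOLLOW,	/* --stdin-packs=follow */
};

/* Returns 0 on success, -1 for an unrecognized mode argument. */
int toy_parse_stdin_packs_mode(enum toy_stdin_packs_mode *mode,
			       const char *arg, int unset)
{
	if (unset)
		*mode = TOY_MODE_NONE;
	else if (!arg || !*arg)
		*mode = TOY_MODE_STANDARD;
	else if (!strcmp(arg, "follow"))
		*mode = TOY_MODE_FOLLOW;
	else
		return -1;
	return 0;
}
```

Keeping "off" distinct from "standard" lets callers ask both "is
--stdin-packs in effect at all?" and "is it in follow mode?" from one
variable.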

> > +test_expect_success 'setup for --stdin-packs=follow' '
> > +       git init stdin-packs--follow &&
> > +       (
> > +               cd stdin-packs--follow &&
> > +
> > +               for c in A B C D
> > +               do
> > +                       test_commit "$c" || return 1
> > +               done &&
> > +
> > +               A="$(echo A | git pack-objects --revs $packdir/pack)" &&
> > +               B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
> > +               C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
> > +               D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
> > +
> > +               git prune-packed
> > +       )
> > +'

Huh, I have no idea how this snuck in. This "setup" test does nothing
and creates a repository that isn't used later on in the script.
Probably leftover from writing these tests in the first place, but I've
removed it.

> I like the tests -- normal --stdin-packs, then --stdin-packs=follow,
> then --stdin-packs=follow + --unpacked.

I think the normal tests are accidental since we use pack-objects to
write packs A, B, C, and D. But the --stdin-packs vs.
--stdin-packs=follow and --stdin-packs=follow + --unpacked was
definitely intentional.

> However, would it be worthwhile to create commit E immediately after
> creating the packs?

Yeah, I think that is a good suggestion. We already have tests that
exercise --stdin-packs with --unpacked earlier in the same script, but
obviously not with --stdin-packs=follow. Moving the creation of commit E
earlier makes a lot of sense to me, thanks!

Thanks,
Taylor


* Re: [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-15  3:11     ` Elijah Newren
@ 2025-04-15 20:51       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 20:51 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 08:11:22PM -0700, Elijah Newren wrote:
> On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> > MIDX with '--write-midx' to ensure that the resulting MIDX was always
> > closed under reachability in order to generate reachability bitmaps.
> >
> > Suppose you have a once-unreachable object packed in a cruft pack, which
> > later on becomes reachable from one or more objects in a geometrically
> > repacked pack. That once-unreachable object *won't* appear in the new
> > pack, since the cruft pack was specified as neither included nor
> > excluded to 'pack-objects --stdin-packs'.
>
> I believe you are talking about the state before your series (i.e.,
> this is carrying on from the previous paragraph), but it reads as
> though you are talking about the state after the first seven patches
> of this series.  Some kind of connection wording to clarify would
> really help here.

Sure.

> > If the bitmap selection
> > process picks one or more commits which reach the once-unreachable
> > objects, commit ddee3703b3 ensures that the MIDX will be closed under
> > reachability. Without it, we would fail to generate a MIDX bitmap.
>
> After reading this part, I had to go back and re-read and figure out
> what point in time everything was referring to.

Yeah, this is confusing to me too after reading it back. I made some
tweaks that I think clarify things.

> > ddee3703b3 alludes to the fact that this is sub-optimal by saying
> >
> >     [...] it's desirable to avoid including cruft packs in the MIDX
> >     because it causes the MIDX to store a bunch of objects which are
> >     likely to get thrown away.
> >
> > , which is true, but hides an even larger problem. If repositories
> > rarely prune their unreachable objects and/or have many of them, the
> > MIDX must keep track of a large number of objects which bloats the MIDX
> > and slows down object lookup.
> >
> > This is doubly unfortunate because the vast majority of objects in cruft
> > pack(s) are unlikely to be read, but object reads that go through the
> > MIDX have to search through them anyway.
>
> "have to search through them"?  That could be read to suggest those
> individual objects are read, rather than just traversed over.  Maybe
> "...unlikely to be read, so the enlarged MIDX is for mostly tracking
> known-to-likely-be-irrelevant objects", or something like that?

Thanks for pointing that out... I clarified this one as well.

> > This patch causes geometrically-repacked packs to contain a copy of any
> > once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
> > allowing us to avoid including any cruft packs in the MIDX. This is
> > because a sequence of geometrically-repacked packs that were all
> > generated with '--stdin-packs=follow' are guaranteed to have their union
> > be closed under reachability.
> >
> > Note that you cannot guarantee that a collection of packs is closed
> > under reachability if not all of them were generated with following as
>
> maybe: ...with "follow" as above.  "follow" or "following" feels like
> it needs quotes so the reader understands it's meant as the name of a
> mode, rather than a verb in the sentence.
>
> > above. One tell-tale sign that not all geometrically-repacked packs in
> > the MIDX were generated with following is to see if there is a pack in
>
> same here with "following"...and below.

Great calls on both, thanks.

> > @@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include,
> >                 }
> >         }
> >
> > -       for_each_string_list_item(item, &existing->cruft_packs) {
> > +       if (midx_must_contain_cruft ||
> > +           midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
> > +                                  include, geometry, existing)) {
> >                 /*
> > -                * When doing a --geometric repack, there is no need to check
> > -                * for deleted packs, since we're by definition not doing an
> > -                * ALL_INTO_ONE repack (hence no packs will be deleted).
> > -                * Otherwise we must check for and exclude any packs which are
> > -                * enqueued for deletion.
> > +                * If there are one or more unknown pack(s) present (see
> > +                * midx_has_unknown_packs() for what makes a pack
> > +                * "unknown") in the MIDX before the repack, keep them
> > +                * as they may be required to form a reachability
> > +                * closure if the MIDX is bitmapped.
> >                  *
> > -                * So we could omit the conditional below in the --geometric
> > -                * case, but doing so is unnecessary since no packs are marked
> > -                * as pending deletion (since we only call
> > -                * `mark_packs_for_deletion()` when doing an all-into-one
> > -                * repack).
> > +                * For example, a cruft pack can be required to form a
> > +                * reachability closure if the MIDX is bitmapped and one
> > +                * or more of its selected commits reaches a once-cruft
> > +                * object that was later made reachable.
>
> The antecedent of "its" is unclear here; just spell it out to reduce
> how much thinking the reader needs to do?

Eek, good suggestion again. Thanks, I fixed it up and made the
antecedent explicit.

> >                  */
> > -               if (pack_is_marked_for_deletion(item))
> > -                       continue;
> > +               for_each_string_list_item(item, &existing->cruft_packs) {
> > +                       /*
> > +                        * When doing a --geometric repack, there is no
> > +                        * need to check for deleted packs, since we're
> > +                        * by definition not doing an ALL_INTO_ONE
> > +                        * repack (hence no packs will be deleted).
> > +                        * Otherwise we must check for and exclude any
> > +                        * packs which are enqueued for deletion.
> > +                        *
> > +                        * So we could omit the conditional below in the
> > +                        * --geometric case, but doing so is unnecessary
> > +                        *  since no packs are marked as pending
> > +                        *  deletion (since we only call
> > +                        *  `mark_packs_for_deletion()` when doing an
> > +                        *  all-into-one repack).
> > +                        */
> > +                       if (pack_is_marked_for_deletion(item))
> > +                               continue;
> >
> > -               strbuf_reset(&buf);
> > -               strbuf_addf(&buf, "%s.idx", item->string);
> > -               string_list_insert(include, buf.buf);
> > +                       strbuf_reset(&buf);
> > +                       strbuf_addf(&buf, "%s.idx", item->string);
> > +                       string_list_insert(include, buf.buf);
> > +               }
> > +       } else {
> > +               /*
> > +                * Modern versions of Git will write new copies of
> > +                * once-cruft objects when doing a --geometric repack.
>
> "Modern versions of Git" -> "Modern versions of Git with the
> appropriate config setting" ?

Heh. Great catch. Can you tell this part was written before I added the
configuration option? ;-)

Thanks,
Taylor


* Re: [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-15  2:57   ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Elijah Newren
@ 2025-04-15 22:05     ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Mon, Apr 14, 2025 at 07:57:52PM -0700, Elijah Newren wrote:
> On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > Here is a non-RFC version of my series to explore creating MIDXs while
> > repacking that don't include the cruft pack.
> >
> > The core idea behind this approach is to ensure that packs generated via
> > geometric repacking traverse through objects that appear in packs which
> > are neither included nor excluded.
>
> This phrasing feels confusing -- what does it mean for packs to be
> neither included nor excluded?  Maybe:
>
> "The core idea behind this approach is to allow some (most) of the
> objects in a pack to be excluded, while still including some subset of
> objects from that pack as part of the repack.  In particular, we
> include the objects in that pack which are reachable from the other
> objects we repack.  This is different from our current handling which
> either entirely includes or entirely excludes all objects from a given
> pack."

I am admittedly having a little bit of a hard time parsing your version
of this, but I think this part:

    [...] In particular, we include the objects in that pack which are
    reachable from the other objects we repack.

isn't quite right. The output pack doesn't contain every object
reachable from the other objects we repack. It contains only those
reachable objects which don't appear in an excluded pack given as part
of the input.
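
To make the rule concrete, here is a minimal sketch in C of the
per-object decision as described in this thread. The function name and
its three boolean parameters are hypothetical and do not exist in
builtin/pack-objects.c; the real code gets there through a walk over the
included packs followed by a revision traversal, not a single predicate.

```c
#include <stdbool.h>

/*
 * Hypothetical model of '--stdin-packs=follow' (not Git's real code).
 *
 *   in_included: the object appears in a pack listed without '^'
 *   in_excluded: the object appears in a pack listed with '^'
 *   reachable:   the object is reachable from commits in included packs
 *
 * Returns true if the object would be written to the output pack.
 */
bool follow_mode_packs_object(bool in_included, bool in_excluded,
			      bool reachable)
{
	/* Objects found in any excluded pack are never written. */
	if (in_excluded)
		return false;

	/* Objects from included packs are always written. */
	if (in_included)
		return true;

	/*
	 * The object lives only in an unlisted pack (e.g., a cruft
	 * pack): write a copy only if the traversal reached it.
	 */
	return reachable;
}
```

Under this model, a once-cruft object that is now reachable from an
included commit is picked up, while the same object is left alone as
soon as it also appears in an excluded pack.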

> > Then if some commit (for example) in
> > a pack reaches some once-unreachable object stored in a cruft pack, the
> > pack generated via geometric repacking will pick up and write a copy of
> > that object during its traversal.
> >
> > If you repack consistently using this strategy, you can guarantee that
> > the union of geometrically-repacked packs are closed under reachability
> > without having to keep track of any cruft pack(s) in the MIDX.
>
> Also, if you do a single non-geometric repack with this strategy, you
> are also closed under reachability, right?  Is that the suggested
> transition plan for those that want to use this...first do a
> non-geometric repack, and then ensure that subsequent geometric
> repacks are done with this strategy?

Yeah, the last commit gets at this a bit. The property you have to
maintain is that the union of geometrically-repacked packs (which form
the MIDX) is and stays closed under reachability. I am pretty sure that
the way this is constructed, adding new geometrically-repacked packs to
the chain does not violate this property[^1].

But you can't guarantee it part of the way through a sequence of
geometric repacks, which is what midx_has_unknown_packs() is checking
for.

If you do an all-into-one cruft repack first, then there is no MIDX to
begin with, and so no pack in a MIDX that could be unknown. When that
property is met, we can use the new behavior.
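
As a rough illustration of the kind of check described above, here is a
simplified sketch in C. It is not the midx_has_unknown_packs() from the
patch: the function name, the plain string arrays, and the linear search
are all assumptions made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model (not Git's real code): return true if some pack
 * named in the existing MIDX is not represented in the new MIDX,
 * either verbatim or as part of a geometric rollup.
 *
 *   midx_packs:  names of packs in the existing MIDX
 *   represented: names of packs that are kept or rolled up
 */
bool has_unknown_packs(const char *const *midx_packs, size_t midx_nr,
		       const char *const *represented, size_t represented_nr)
{
	for (size_t i = 0; i < midx_nr; i++) {
		bool found = false;

		for (size_t j = 0; j < represented_nr; j++) {
			if (!strcmp(midx_packs[i], represented[j])) {
				found = true;
				break;
			}
		}
		/* A pack in the old MIDX that we know nothing about. */
		if (!found)
			return true;
	}
	/* Also covers the "no MIDX at all" case (midx_nr == 0). */
	return false;
}
```

With no existing MIDX the loop never runs and the function returns
false, which matches the all-into-one case: nothing is unknown, so the
new behavior can be used.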

Thanks,
Taylor

[^1]: So long as you don't drop part of the geometric progression, e.g.,
      if you have some pack that was in the existing MIDX, but wasn't
      repacked or included in the new MIDX.


* Re: [PATCH v2 1/8] pack-objects: use standard option incompatibility functions
  2025-04-15 19:48         ` Junio C Hamano
@ 2025-04-15 22:27           ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Tue, Apr 15, 2025 at 12:48:53PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Now I can't un-see it ;-). Even though it's not a correctness issue as
> > you note, the whole thing leaves a bad taste in my mouth. I'll swap the
> > ordering to match the original in the next round.
>
> I do not think we can be completely faithful to the original in this
> rewrite, simply because the original is not consistent with what
> die_for_incompat() thing produces and you'd need to adjust the test
> anyway.  So unless there are other things you need to reroll, I
> wouldn't worry about it too much.

Yeah, we need to adjust the test either way. I just disliked reading the
patch and seeing:

    if (stdin_packs && filter_options.choice)
      die(_("--stdin-packs and --filter can't be used together"));

turn into

    die_for_incompatible_opt2(filter_options.choice, "--filter",
                              stdin_packs, "--stdin-packs");

since the check is "stdin_packs then filter_options.choice" in the
original, but "filter_options.choice then stdin_packs" in this patch.

Funnily enough, the test that breaks expects output that mentions
"--filter" before "--stdin-packs" here, so preserving the order of the
check in the code reverses the order in which the incompatible arguments
appear in the die() message.
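
To show why the argument order matters, here is a small stand-alone
sketch in C. It is not die_for_incompatible_opt2() itself: the helper
name is made up, and it writes the message into a buffer instead of
dying. The message format is the one visible in the pack-objects diff.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for die_for_incompatible_opt2(): if both
 * options are set, format the message into 'buf' and return 1;
 * otherwise leave 'buf' untouched and return 0.
 */
int format_incompatible_opt2(char *buf, size_t len,
			     int opt1, const char *opt1_name,
			     int opt2, const char *opt2_name)
{
	if (!opt1 || !opt2)
		return 0;

	/* The names appear in the message in argument order. */
	snprintf(buf, len, "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

Passing stdin_packs first yields "options '--stdin-packs' and
'--filter' cannot be used together", so a test that greps for
"--filter" ahead of "--stdin-packs" has to be adjusted either way.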

> Thanks.

Thanks,
Taylor


* [PATCH v3 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (8 preceding siblings ...)
  2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-04-15 22:46 ` Taylor Blau
  2025-04-15 22:46   ` [PATCH v3 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
                     ` (8 more replies)
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                   ` (2 subsequent siblings)
  12 siblings, 9 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Here is a small-ish reroll of my series to explore creating MIDXs while
repacking that don't include the cruft pack.

The bulk of the series is unchanged, save for a few minor points that
I'll call out here (as usual, a complete range-diff is below for
convenience):

  * Dropped the patch adding a warning about using designated
    initializers. I think that we should resurrect this patch soon and
    update the CodingGuidelines, but I'd rather disentangle that from
    this series.

  * Dropped the designated initializer component of the "limit scope"
    patch.

  * Swapped ordering on one of the die_for_incompatible_opt2() checks.

  * Various wording tweaks.

  * Split out some code movement changes into their own patches to make
    substantive patches easier to read/review.

  * Test updates and cleanup.

These changes are all thanks to helpful review from Junio and Elijah.
Thanks, both!

Otherwise the series is unchanged. I still need to deploy it to GitHub's
infrastructure and try it out on some internal repos, but I should be
able to do that tomorrow and report back on my findings a few days after
that.

Thanks in advance for any review :-).

Taylor Blau (9):
  pack-objects: use standard option incompatibility functions
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: fix typo in 'show_object_pack_hint()'
  pack-objects: swap 'show_{object,commit}_pack_hint'
  pack-objects: introduce '--stdin-packs=follow'
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/config/repack.adoc    |   7 +
 Documentation/git-pack-objects.adoc |  10 +-
 builtin/pack-objects.c              | 192 ++++++++++++++++++----------
 builtin/repack.c                    | 163 ++++++++++++++++++++---
 t/t5331-pack-objects-stdin.sh       |  84 +++++++++++-
 t/t7704-repack-cruft.sh             |  90 +++++++++++++
 6 files changed, 456 insertions(+), 90 deletions(-)

Range-diff against v2:
 1:  65bc7e4630 !  1:  f8b31c6a8d pack-objects: use standard option incompatibility functions
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc,
      
     -	if (stdin_packs && filter_options.choice)
     -		die(_("cannot use --filter with --stdin-packs"));
    -+	die_for_incompatible_opt2(filter_options.choice, "--filter",
    -+				  stdin_packs, "--stdin-packs");
    ++	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
    ++				  filter_options.choice, "--filter");
    ++
      
      	if (stdin_packs && use_internal_rev_list)
      		die(_("cannot use internal rev list with --stdin-packs"));
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success '--stdin-packs is incompatibl
      		test_must_fail git pack-objects --stdin-packs --stdout \
      			--filter=blob:none </dev/null 2>err &&
     -		test_grep "cannot use --filter with --stdin-packs" err
    -+		test_grep "options .--filter. and .--stdin-packs. cannot be used together" err
    ++		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
      	)
      '
      
 2:  920c91eb1e <  -:  ---------- object-store-ll.h: add note about designated initializers
 3:  f8ac36b110 !  2:  986bef29b5 pack-objects: limit scope in 'add_object_entry_from_pack()'
    @@ Metadata
      ## Commit message ##
         pack-objects: limit scope in 'add_object_entry_from_pack()'
     
    -    add_object_entry_from_pack() handles objects from identified packs by
    -    checking their type, before adding commit objects as pending in the
    -    subsequent traversal used by `--stdin-packs`.
    -
    -    There are a couple of quality-of-life refactorings that I noticed while
    -    working in this area:
    -
    -      - We declare 'revs' (given to us through the miscellaneous context
    -        argument) earlier in the "if (p)" conditional than is necessary.
    -
    -      - The 'struct object_info' can use a designated initializer to fill in
    -        the structures type pointer, since that is the only field that we
    -        care about.
    +    In add_object_entry_from_pack() we declare 'revs' (given to us through
    +    the miscellaneous context argument) earlier in the "if (p)" conditional
    +    than is necessary.  Move it down as far as it can go to reduce its
    +    scope.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
      
      	if (p) {
     -		struct rev_info *revs = _data;
    --		struct object_info oi = OBJECT_INFO_INIT;
    + 		struct object_info oi = OBJECT_INFO_INIT;
     -
    --		oi.typep = &type;
    -+		struct object_info oi = { .typep = &type };
    + 		oi.typep = &type;
    ++
      		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
      			die(_("could not get type of object %s in pack %s"),
      			    oid_to_hex(oid), p->pack_name);
 4:  5e03b482ba =  3:  6f8fe8a4e1 pack-objects: factor out handling '--stdin-packs'
 5:  bccbac2ec5 =  4:  2a235461a6 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
 6:  0bc2183dc3 !  5:  240e90b68d pack-objects: perform name-hash traversal for unpacked objects
    @@ Commit message
         pack (`add_unreachable_loose_objects()`) did not have access to the
         'rev_info' struct found in `read_packs_list_from_stdin()`.
     
    -    Excluding unpacked objects from that traversal doesn't effect the
    +    Excluding unpacked objects from that traversal doesn't affect the
         correctness of the resulting pack, but it does make it harder to
         discover good deltas for loose objects.
     
 -:  ---------- >  6:  9a18fa2e52 pack-objects: fix typo in 'show_object_pack_hint()'
 -:  ---------- >  7:  6c997853f1 pack-objects: swap 'show_{object,commit}_pack_hint'
 7:  697a337cb1 !  8:  0ff699f056 pack-objects: introduce '--stdin-packs=follow'
    @@ Documentation/git-pack-objects.adoc: base-name::
      	included packs (those not beginning with `^`), excluding any
      	objects listed in the excluded packs (beginning with `^`).
      +
    -+When `mode` is "follow", pack objects which are reachable from objects
    -+in the included packs, but appear in packs that are not listed.
    -+Reachable objects which appear in excluded packs are not packed. Useful
    -+for resurrecting once-cruft objects to generate packs which are closed
    -+under reachability up to the excluded packs.
    ++When `mode` is "follow", objects from packs not listed on stdin receive
    ++special treatment. Objects within unlisted packs will be included if
    ++those objects are (1) reachable from the included packs, and (2) not
    ++found in any excluded packs. This mode is useful, for example, to
    ++resurrect once-unreachable objects found in cruft packs to generate
    ++packs which are closed under reachability up to the boundary set by the
    ++excluded packs.
     ++
      Incompatible with `--revs`, or options that imply `--revs` (such as
      `--all`), with the exception of `--unpacked`, which is compatible.
    @@ builtin/pack-objects.c: static struct oidmap configured_exclusions;
       * Check whether the name_hash_version chosen by user input is appropriate,
       * and also validate whether it is compatible with other features.
     @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
    - 	return 0;
      }
      
    --static void show_commit_pack_hint(struct commit *commit UNUSED,
    --				  void *data UNUSED)
    --{
    --	/* nothing to do; commits don't have a namehash */
    --}
    --
      static void show_object_pack_hint(struct object *object, const char *name,
     -				  void *data UNUSED)
     +				  void *data)
      {
     -	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
     -	if (!oe)
    +-		return;
     +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
     +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
     +		add_object_entry(&object->oid, object->type, name, 0);
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
     +		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
     +		if (!oe)
     +			return;
    -+
    -+		/*
    -+		 * Our 'to_pack' list was constructed by iterating all
    -+		 * objects packed in included packs, and so doesn't
    -+		 * have a non-zero hash field that you would typically
    -+		 * pick up during a reachability traversal.
    -+		 *
    -+		 * Make a best-effort attempt to fill in the ->hash
    -+		 * and ->no_try_delta here using a now in order to
    -+		 * perhaps improve the delta selection process.
    -+		 */
    -+		oe->hash = pack_name_hash_fn(name);
    -+		oe->no_try_delta = name && no_try_delta(name);
    -+
    -+		stdin_packs_hints_nr++;
    -+	}
    -+}
    -+
    -+static void show_commit_pack_hint(struct commit *commit, void *data)
    -+{
    -+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
    -+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
    -+		show_object_pack_hint((struct object *)commit, "", data);
    - 		return;
    -+	}
    -+	/* nothing to do; commits don't have a namehash */
      
     -	/*
     -	 * Our 'to_pack' list was constructed by iterating all objects packed in
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
     -	 * would typically pick up during a reachability traversal.
     -	 *
     -	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
    --	 * here using a now in order to perhaps improve the delta selection
    +-	 * fields here in order to perhaps improve the delta selection
     -	 * process.
     -	 */
     -	oe->hash = pack_name_hash_fn(name);
     -	oe->no_try_delta = name && no_try_delta(name);
    --
    ++		/*
    ++		 * Our 'to_pack' list was constructed by iterating all
    ++		 * objects packed in included packs, and so doesn't have
    ++		 * a non-zero hash field that you would typically pick
    ++		 * up during a reachability traversal.
    ++		 *
    ++		 * Make a best-effort attempt to fill in the ->hash and
    ++		 * ->no_try_delta fields here in order to perhaps
    ++		 * improve the delta selection process.
    ++		 */
    ++		oe->hash = pack_name_hash_fn(name);
    ++		oe->no_try_delta = name && no_try_delta(name);
    + 
     -	stdin_packs_hints_nr++;
    ++		stdin_packs_hints_nr++;
    ++	}
    + }
    + 
    +-static void show_commit_pack_hint(struct commit *commit UNUSED,
    +-				  void *data UNUSED)
    ++static void show_commit_pack_hint(struct commit *commit, void *data)
    + {
    ++	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
    ++
    ++	if (mode == STDIN_PACKS_MODE_FOLLOW) {
    ++		show_object_pack_hint((struct object *)commit, "", data);
    ++		return;
    ++	}
    ++
    + 	/* nothing to do; commits don't have a namehash */
    ++
      }
      
      static int pack_mtime_cmp(const void *_a, const void *_b)
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with pa
     +	rm -f objects.raw
     +}
     +
    -+test_expect_success 'setup for --stdin-packs=follow' '
    -+	git init stdin-packs--follow &&
    -+	(
    -+		cd stdin-packs--follow &&
    -+
    -+		for c in A B C D
    -+		do
    -+			test_commit "$c" || return 1
    -+		done &&
    -+
    -+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
    -+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
    -+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
    -+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
    -+
    -+		git prune-packed
    -+	)
    -+'
    -+
     +test_expect_success '--stdin-packs=follow walks into unknown packs' '
     +	test_when_finished "rm -fr repo" &&
     +
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with pa
     +		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
     +		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
     +		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
    ++		test_commit E &&
     +
     +		git prune-packed &&
     +
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with pa
     +		objects_in_packs $P >actual &&
     +		test_cmp expect actual &&
     +
    -+		test_commit E &&
     +		# And with --unpacked, we will pick up objects from unknown
     +		# packs that are reachable from loose objects. Loose object E
     +		# reaches objects in pack A, but there are three excluded packs
 8:  a2ec1b826c !  9:  58891101f3 repack: exclude cruft pack(s) from the MIDX where possible
    @@ Commit message
         MIDX with '--write-midx' to ensure that the resulting MIDX was always
         closed under reachability in order to generate reachability bitmaps.
     
    -    Suppose you have a once-unreachable object packed in a cruft pack, which
    -    later on becomes reachable from one or more objects in a geometrically
    -    repacked pack. That once-unreachable object *won't* appear in the new
    -    pack, since the cruft pack was specified as neither included nor
    -    excluded to 'pack-objects --stdin-packs'. If the bitmap selection
    -    process picks one or more commits which reach the once-unreachable
    -    objects, commit ddee3703b3 ensures that the MIDX will be closed under
    -    reachability. Without it, we would fail to generate a MIDX bitmap.
    +    Suppose (prior to this patch) you have a once-unreachable object packed
    +    in a cruft pack, which later on becomes reachable from one or more
    +    objects in a geometrically repacked pack. That once-unreachable object
    +    *won't* appear in the new pack, since the cruft pack was specified as
    +    neither included nor excluded to 'pack-objects --stdin-packs'. If the
    +    new pack is included in a MIDX without the cruft pack, then trying to
    +    generate bitmaps for that MIDX may fail. This happens when the bitmap
    +    selection process picks one or more commits which reach the
    +    once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
    +    be closed under reachability. Without it, we would fail to generate a
    +    MIDX bitmap.
     
         ddee3703b3 alludes to the fact that this is sub-optimal by saying
     
    @@ Commit message
         and slows down object lookup.
     
         This is doubly unfortunate because the vast majority of objects in cruft
    -    pack(s) are unlikely to be read, but object reads that go through the
    -    MIDX have to search through them anyway.
    +    pack(s) are unlikely to be read. But any object lookups that go through
    +    the MIDX must binary search over them anyway, slowing down object
    +    lookups using the MIDX.
     
         This patch causes geometrically-repacked packs to contain a copy of any
         once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
    @@ Commit message
         be closed under reachability.
     
         Note that you cannot guarantee that a collection of packs is closed
    -    under reachability if not all of them were generated with following as
    +    under reachability if not all of them were generated with "following" as
         above. One tell-tale sign that not all geometrically-repacked packs in
    -    the MIDX were generated with following is to see if there is a pack in
    +    the MIDX were generated with "following" is to see if there is a pack in
         the existing MIDX that is not going to be somehow represented (either
         verbatim or as part of a geometric rollup) in the new MIDX.
     
    -    If there is, then starting to generate packs with following during
    +    If there is, then starting to generate packs with "following" during
         geometric repacking won't work, since it's open to the same race as
         described above.
     
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
     -		 * repack).
     +		 * For example, a cruft pack can be required to form a
     +		 * reachability closure if the MIDX is bitmapped and one
    -+		 * or more of its selected commits reaches a once-cruft
    -+		 * object that was later made reachable.
    ++		 * or more of the bitmap's selected commits reaches a
    ++		 * once-cruft object that was later made reachable.
      		 */
     -		if (pack_is_marked_for_deletion(item))
     -			continue;
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
     +		}
     +	} else {
     +		/*
    -+		 * Modern versions of Git will write new copies of
    ++		 * Modern versions of Git (with the appropriate
    ++		 * configuration setting) will write new copies of
     +		 * once-cruft objects when doing a --geometric repack.
     +		 *
     +		 * If the MIDX has no cruft pack, new packs written

base-commit: 485f5f863615e670fd97ae40af744e14072cfe18
-- 
2.49.0.230.ga662d77f78


* [PATCH v3 1/9] pack-objects: use standard option incompatibility functions
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
@ 2025-04-15 22:46   ` Taylor Blau
  2025-04-15 22:46   ` [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c        | 20 +++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b06d159d2..20dd870bbf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4651,9 +4651,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -4691,13 +4692,14 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+				  filter_options.choice, "--filter");
+
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -4705,8 +4707,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..8fd07deb8d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
 	)
 '
 
-- 
2.49.0.230.ga662d77f78



* [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
  2025-04-15 22:46   ` [PATCH v3 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-04-15 22:46   ` Taylor Blau
  2025-04-16  0:58     ` Junio C Hamano
  2025-04-16  5:31     ` Elijah Newren
  2025-04-15 22:46   ` [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 2 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 20dd870bbf..4ab695a3aa 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3490,14 +3490,14 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
 		struct object_info oi = OBJECT_INFO_INIT;
-
 		oi.typep = &type;
+
 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.49.0.230.ga662d77f78



* [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs'
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
  2025-04-15 22:46   ` [PATCH v3 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-04-15 22:46   ` [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-04-15 22:46   ` Taylor Blau
  2025-04-16  0:59     ` Junio C Hamano
  2025-04-15 22:46   ` [PATCH v3 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4ab695a3aa..a293267074 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3674,6 +3674,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -3769,7 +3780,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -4776,11 +4786,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.49.0.230.ga662d77f78



* [PATCH v3 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (2 preceding siblings ...)
  2025-04-15 22:46   ` [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-04-15 22:46   ` Taylor Blau
  2025-04-15 22:47   ` [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (not included nor excluded) pack.

Extract the code which initializes and performs the reachability
traversal to take place in the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a293267074..d60cb042c9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3558,7 +3558,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3566,24 +3566,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3653,10 +3635,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3668,21 +3684,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.49.0.230.ga662d77f78


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (3 preceding siblings ...)
  2025-04-15 22:46   ` [PATCH v3 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-04-15 22:47   ` Taylor Blau
  2025-04-16  9:21     ` Junio C Hamano
  2025-04-15 22:47   ` [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:47 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d60cb042c9..eb2a4099cc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3644,7 +3644,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3671,7 +3671,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -3790,7 +3790,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4068,8 +4068,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4090,6 +4091,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4098,11 +4103,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4358,7 +4362,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.49.0.230.ga662d77f78


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()'
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (4 preceding siblings ...)
  2025-04-15 22:47   ` [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-04-15 22:47   ` Taylor Blau
  2025-04-16  5:36     ` Elijah Newren
  2025-04-15 22:47   ` [PATCH v3 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:47 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index eb2a4099cc..f06b359150 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3532,7 +3532,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	 * would typically pick up during a reachability traversal.
 	 *
 	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
+	 * fields here in order to perhaps improve the delta selection
 	 * process.
 	 */
 	oe->hash = pack_name_hash_fn(name);
-- 
2.49.0.230.ga662d77f78


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v3 7/9] pack-objects: swap 'show_{object,commit}_pack_hint'
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (5 preceding siblings ...)
  2025-04-15 22:47   ` [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
@ 2025-04-15 22:47   ` Taylor Blau
  2025-04-15 22:47   ` [PATCH v3 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
  2025-04-15 22:47   ` [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:47 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

show_commit_pack_hint() has heretofore been a noop, so the only
constraint on its position within its compilation unit is that it
appears before its first use.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f06b359150..f4009cd391 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3513,12 +3513,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data UNUSED)
 {
@@ -3541,6 +3535,12 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	stdin_packs_hints_nr++;
 }
 
+static void show_commit_pack_hint(struct commit *commit UNUSED,
+				  void *data UNUSED)
+{
+	/* nothing to do; commits don't have a namehash */
+}
+
 static int pack_mtime_cmp(const void *_a, const void *_b)
 {
 	struct packed_git *a = ((const struct string_list_item*)_a)->util;
-- 
2.49.0.230.ga662d77f78


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v3 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (6 preceding siblings ...)
  2025-04-15 22:47   ` [PATCH v3 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
@ 2025-04-15 22:47   ` Taylor Blau
  2025-04-15 22:47   ` [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:47 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

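Teach pack-objects a new mode, '--stdin-packs=follow', which also packs
objects from unlisted packs when those objects are reachable from the
included packs and are not found in any excluded pack.

This prepares us for new repacking behavior which will "resurrect"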
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.
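
To make the difference between the two modes concrete, here is a toy
model of the selection rule, written as a sketch rather than as a
description of the real implementation. Everything in it is an
assumption made for illustration: objects are array indices, each
object has at most one parent and lives in exactly one pack, and the
names `toy_object`, `pack_role` and `toy_stdin_packs()` do not exist in
Git. The walk continues through objects in excluded packs without
emitting them, which matches the expectations in the accompanying test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical roles a pack can play on stdin (not Git's own types). */
enum pack_role { PACK_UNKNOWN, PACK_INCLUDED, PACK_EXCLUDED };

struct toy_object {
	int parent;          /* index of the single parent, or -1 for none */
	enum pack_role role; /* role of the one pack holding this object */
};

/*
 * Set out[i] to true if object i would land in the output pack.
 * Assumes the parent links form no cycles.
 */
static void toy_stdin_packs(const struct toy_object *objs, size_t nr,
			    bool follow, bool *out)
{
	size_t i;

	assert(objs && out);

	/* Both modes: take every object stored in an included pack. */
	for (i = 0; i < nr; i++)
		out[i] = objs[i].role == PACK_INCLUDED;

	if (!follow)
		return;

	/*
	 * Follow mode: walk the ancestry of each included object and
	 * also take anything reached that is not in an excluded pack.
	 */
	for (i = 0; i < nr; i++) {
		int cur;

		if (objs[i].role != PACK_INCLUDED)
			continue;
		for (cur = objs[i].parent; cur >= 0; cur = objs[cur].parent)
			if (objs[cur].role != PACK_EXCLUDED)
				out[cur] = true;
	}
}
```

With a chain A <- B <- C <- D where B and D are included, C is excluded
and A is unlisted, the toy selects B and D in the standard mode, and A,
B and D in the follow mode.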

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc | 10 +++-
 builtin/pack-objects.c              | 83 +++++++++++++++++++++--------
 t/t5331-pack-objects-stdin.sh       | 82 ++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855..8f0cecaec9 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,21 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", objects from packs not listed on stdin receive
+special treatment. Objects within unlisted packs will be included if
+those objects are (1) reachable from the included packs, and (2) not
+found in any excluded packs. This mode is useful, for example, to
+resurrect once-unreachable objects found in cruft packs to generate
+packs which are closed under reachability up to the boundary set by the
+excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f4009cd391..67a22b2dc4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -272,6 +272,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3514,31 +3520,44 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 }
 
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
-		return;
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * fields here in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't have
+		 * a non-zero hash field that you would typically pick
+		 * up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash and
+		 * ->no_try_delta fields here in order to perhaps
+		 * improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
 
-	stdin_packs_hints_nr++;
+		stdin_packs_hints_nr++;
+	}
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
+static void show_commit_pack_hint(struct commit *commit, void *data)
 {
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
+		return;
+	}
+
 	/* nothing to do; commits don't have a namehash */
+
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3646,7 +3665,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3678,7 +3697,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4469,6 +4488,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4480,7 +4516,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4535,6 +4571,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -4791,7 +4830,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 8fd07deb8d..60a2b4bc07 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,86 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+packdir=.git/objects/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+		test_commit E &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.230.ga662d77f78


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
                     ` (7 preceding siblings ...)
  2025-04-15 22:47   ` [PATCH v3 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-04-15 22:47   ` Taylor Blau
  2025-04-16  5:56     ` Elijah Newren
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-15 22:47 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

Suppose (prior to this patch) you have a once-unreachable object packed
in a cruft pack, which later on becomes reachable from one or more
objects in a geometrically repacked pack. That once-unreachable object
*won't* appear in the new pack, since the cruft pack was specified as
neither included nor excluded to 'pack-objects --stdin-packs'. If the
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects. Commit ddee3703b3 ensures that the MIDX will
be closed under reachability; without it, we would fail to generate a
MIDX bitmap.

ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustContainCruft' set
to 'false'.
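
The detection described above can be sketched as a small standalone
predicate. This is only an illustration under simplifying assumptions:
it covers the geometric case alone, takes plain string arrays instead
of the string_list, pack_geometry and existing_packs structures used in
builtin/repack.c, and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'name' appears in the 'nr'-element array 'list'. */
static bool toy_has_name(const char *const *list, size_t nr,
			 const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(list[i], name))
			return true;
	return false;
}

/*
 * A pack from the old MIDX counts as "known" if it is carried into the
 * new MIDX verbatim ('include') or if its contents were rolled up by
 * the geometric repack ('rolled_up'). Anything else is unknown.
 */
static bool toy_midx_has_unknown_packs(const char *const *old_midx,
				       size_t old_nr,
				       const char *const *include,
				       size_t include_nr,
				       const char *const *rolled_up,
				       size_t rolled_up_nr)
{
	size_t i;

	assert(old_midx || !old_nr);

	for (i = 0; i < old_nr; i++) {
		if (toy_has_name(include, include_nr, old_midx[i]))
			continue;
		if (toy_has_name(rolled_up, rolled_up_nr, old_midx[i]))
			continue;
		return true; /* the old MIDX tracked a pack we cannot place */
	}
	return false;
}

/* Cruft packs go into the new MIDX if configured so, or if unsafe to omit. */
static bool toy_midx_needs_cruft(bool must_contain_cruft,
				 const char *const *old_midx, size_t old_nr,
				 const char *const *include, size_t include_nr,
				 const char *const *rolled_up,
				 size_t rolled_up_nr)
{
	return must_contain_cruft ||
	       toy_midx_has_unknown_packs(old_midx, old_nr,
					  include, include_nr,
					  rolled_up, rolled_up_nr);
}
```

Since the check runs before any cruft pack is added to the include
list, an old MIDX that already tracked a cruft pack makes that pack
"unknown", and cruft packs keep being included even when the config is
set to 'false'.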

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 163 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
 3 files changed, 242 insertions(+), 18 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index f3330ade7b..c9e2e3d04d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -107,6 +108,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -687,6 +692,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -755,6 +831,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -808,26 +886,56 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of the bitmap's selected commits reaches a
+		 * once-cruft object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 * since no packs are marked as pending
+			 * deletion (since we only call
+			 * `mark_packs_for_deletion()` when doing an
+			 * all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git (with the appropriate
+		 * configuration setting) will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1142,6 +1250,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1356,7 +1466,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1478,6 +1591,16 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		uint32_t i;
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names, m->num_packs);
+		for (i = 0; i < m->num_packs; i++)
+			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1519,7 +1642,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1570,6 +1694,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..2b0a55f8fd 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,94 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		grep -o "pack-.*\.idx$" packs.all >in &&
+
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
 test_done
-- 
2.49.0.230.ga662d77f78
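
As a reading aid for the "unknown pack" check in the patch above, here is a
much-simplified sketch. It is not the code from builtin/repack.c: the
struct 'known_pack' and both function names are hypothetical, and the real
function also consults the include list and the pack geometry, which this
model leaves out. It only mirrors the rule visible in the hunk: a pack named
by the pre-repack MIDX counts as known when repack has it on record and has
not marked it for deletion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in repack's list of existing packs. */
struct known_pack {
	const char *idx_name;    /* e.g. "pack-1234.idx" */
	int marked_for_deletion; /* non-zero if the repack will delete it */
};

/* Returns 1 if `name` is a pack we have on record and intend to keep. */
int pack_is_known(const struct known_pack *known, size_t known_nr,
		  const char *name)
{
	for (size_t i = 0; i < known_nr; i++) {
		if (strcmp(known[i].idx_name, name))
			continue;
		return !known[i].marked_for_deletion;
	}
	return 0;
}

/*
 * Returns 1 if the pre-repack MIDX names at least one pack that is
 * not known (or is about to be deleted), and 0 otherwise.
 */
int midx_names_unknown_pack(const char **midx_names, size_t midx_nr,
			    const struct known_pack *known, size_t known_nr)
{
	for (size_t i = 0; i < midx_nr; i++) {
		if (!pack_is_known(known, known_nr, midx_names[i]))
			return 1;
	}
	return 0;
}
```

In this model a non-zero result corresponds to the conservative branch of
the patch, where the cruft pack(s) stay in the MIDX.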

* Re: [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-15 22:46   ` [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-04-16  0:58     ` Junio C Hamano
  2025-04-16 22:07       ` Taylor Blau
  2025-04-16  5:31     ` Elijah Newren
  1 sibling, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-04-16  0:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In add_object_entry_from_pack() we declare 'revs' (given to us through
> the miscellaneous context argument) earlier in the "if (p)" conditional
> than is necessary.  Move it down as far as it can go to reduce its
> scope.

That makes sense, but ...

> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 20dd870bbf..4ab695a3aa 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3490,14 +3490,14 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  		return 0;
>  
>  	if (p) {
> -		struct rev_info *revs = _data;
>  		struct object_info oi = OBJECT_INFO_INIT;
> -
>  		oi.typep = &type;
> +

Isn't this change about spacing around oi's decl and the first
statement in the block strictly worsening the code?  At least it is
an unrelated change.

>  		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>  			die(_("could not get type of object %s in pack %s"),
>  			    oid_to_hex(oid), p->pack_name);
>  		} else if (type == OBJ_COMMIT) {
> +			struct rev_info *revs = _data;
>  			/*
>  			 * commits in included packs are used as starting points for the
>  			 * subsequent revision walk

* Re: [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs'
  2025-04-15 22:46   ` [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-04-16  0:59     ` Junio C Hamano
  0 siblings, 0 replies; 105+ messages in thread
From: Junio C Hamano @ 2025-04-16  0:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> At the bottom of cmd_pack_objects() we check which mode the command is
> running in (e.g., generating a cruft pack, handling '--stdin-packs',
> using the internal rev-list, etc.) and handle the mode appropriately.
>
> The '--stdin-packs' case is handled inline (dating back to its
> introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
> option, 2021-02-22)) since it is relatively short. Extract the body of
> "if (stdin_packs)" into its own function to prepare for the
> implementation to become lengthier in a following commit.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)

Makes sense.

>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 4ab695a3aa..a293267074 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3674,6 +3674,17 @@ static void read_packs_list_from_stdin(void)
>  	string_list_clear(&exclude_packs, 0);
>  }
>  
> +static void add_unreachable_loose_objects(void);
> +
> +static void read_stdin_packs(int rev_list_unpacked)
> +{
> +	/* avoids adding objects in excluded packs */
> +	ignore_packed_keep_in_core = 1;
> +	read_packs_list_from_stdin();
> +	if (rev_list_unpacked)
> +		add_unreachable_loose_objects();
> +}
> +
>  static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
>  				   struct packed_git *pack, off_t offset,
>  				   const char *name, uint32_t mtime)
> @@ -3769,7 +3780,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
>  	}
>  }
>  
> -static void add_unreachable_loose_objects(void);
>  static void add_objects_in_unpacked_packs(void);
>  
>  static void enumerate_cruft_objects(void)
> @@ -4776,11 +4786,7 @@ int cmd_pack_objects(int argc,
>  		progress_state = start_progress(the_repository,
>  						_("Enumerating objects"), 0);
>  	if (stdin_packs) {
> -		/* avoids adding objects in excluded packs */
> -		ignore_packed_keep_in_core = 1;
> -		read_packs_list_from_stdin();
> -		if (rev_list_unpacked)
> -			add_unreachable_loose_objects();
> +		read_stdin_packs(rev_list_unpacked);
>  	} else if (cruft) {
>  		read_cruft_objects();
>  	} else if (!use_internal_rev_list) {

* Re: [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow'
  2025-04-15 20:45       ` Taylor Blau
@ 2025-04-16  5:26         ` Elijah Newren
  0 siblings, 0 replies; 105+ messages in thread
From: Elijah Newren @ 2025-04-16  5:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 1:45 PM Taylor Blau <me@ttaylorr.com> wrote:

> > > @@ -4467,6 +4484,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
> > >         return is_not_in_promisor_pack_obj((struct object *) commit, data);
> > >  }
> > >
> > > +static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
> > > +                                 int unset)
> > > +{
> > > +       enum stdin_packs_mode *mode = opt->value;
> > > +
> > > +       if (unset)
> > > +               *mode = STDIN_PACKS_MODE_NONE;
> > > +       else if (!arg || !*arg)
> > > +               *mode = STDIN_PACKS_MODE_STANDARD;
> >
> > I don't understand why you have both a None mode and a Standard mode,
> > especially since the implementation seems to only care about whether
> > or not the Follow mode has been set.  Shouldn't these both be setting
> > mode to the same value?
>
> I'm not sure I follow your question... stdin_packs is a tri-state. It
> can be off, on in standard/legacy mode, or on in follow mode.

I was just confused.  I looked in the code for
STDIN_PACKS_MODE_{NONE,STANDARD,FOLLOW}, and other than initial setup,
only the _FOLLOW variant was used anywhere.  I overlooked the "if
(stdin_packs" usage, which is what further distinguishes between _NONE
and _STANDARD.  Sorry.
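
For readers following along, the tri-state can be sketched as below. Only
the first two branches appear in the quoted hunk; the "follow" branch, the
-1 error return, and the name 'parse_mode' are assumptions made for this
illustration, not the patch's actual code. Keeping _NONE as the zero value
is what would let a plain "if (stdin_packs" test distinguish off from on.

```c
#include <stddef.h>
#include <string.h>

/* Tri-state using the names quoted above. */
enum stdin_packs_mode {
	STDIN_PACKS_MODE_NONE,     /* option absent or negated */
	STDIN_PACKS_MODE_STANDARD, /* --stdin-packs */
	STDIN_PACKS_MODE_FOLLOW,   /* --stdin-packs=follow */
};

/*
 * Hypothetical stand-in for the option callback. Returns 0 on
 * success and -1 on an unrecognized argument (the error handling
 * is an assumption of this sketch).
 */
int parse_mode(enum stdin_packs_mode *mode, const char *arg, int unset)
{
	if (unset)
		*mode = STDIN_PACKS_MODE_NONE;
	else if (!arg || !*arg)
		*mode = STDIN_PACKS_MODE_STANDARD;
	else if (!strcmp(arg, "follow"))
		*mode = STDIN_PACKS_MODE_FOLLOW;
	else
		return -1;
	return 0;
}
```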

* Re: [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-15 22:46   ` [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
  2025-04-16  0:58     ` Junio C Hamano
@ 2025-04-16  5:31     ` Elijah Newren
  2025-04-16 22:07       ` Taylor Blau
  1 sibling, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-16  5:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 3:46 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> In add_object_entry_from_pack() we declare 'revs' (given to us through
> the miscellaneous context argument) earlier in the "if (p)" conditional
> than is necessary.  Move it down as far as it can go to reduce its
> scope.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 20dd870bbf..4ab695a3aa 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3490,14 +3490,14 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>                 return 0;
>
>         if (p) {
> -               struct rev_info *revs = _data;

This change is half of what you mention in your commit message.

>                 struct object_info oi = OBJECT_INFO_INIT;
> -
>                 oi.typep = &type;
> +

This is an unrelated, distracting change that I think was accidental
and came from trying to back out the other change that was part of
v1/v2 but not quite backing it out completely.

>                 if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>                         die(_("could not get type of object %s in pack %s"),
>                             oid_to_hex(oid), p->pack_name);
>                 } else if (type == OBJ_COMMIT) {
> +                       struct rev_info *revs = _data;

This is the other half of what you mention in the commit message.

>                         /*
>                          * commits in included packs are used as starting points for the
>                          * subsequent revision walk
> --
> 2.49.0.230.ga662d77f78

* Re: [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()'
  2025-04-15 22:47   ` [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
@ 2025-04-16  5:36     ` Elijah Newren
  0 siblings, 0 replies; 105+ messages in thread
From: Elijah Newren @ 2025-04-16  5:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 3:47 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Noticed-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index eb2a4099cc..f06b359150 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3532,7 +3532,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
>          * would typically pick up during a reachability traversal.
>          *
>          * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
> -        * here using a now in order to perhaps improve the delta selection
> +        * fields here in order to perhaps improve the delta selection
>          * process.

Thanks; much improved.

>          */
>         oe->hash = pack_name_hash_fn(name);
> --
> 2.49.0.230.ga662d77f78
>

* Re: [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-15 22:47   ` [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-04-16  5:56     ` Elijah Newren
  2025-04-16 22:16       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Elijah Newren @ 2025-04-16  5:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 3:47 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> MIDX with '--write-midx' to ensure that the resulting MIDX was always
> closed under reachability in order to generate reachability bitmaps.
>
> Suppose (prior to this patch) you have a once-unreachable object packed
> in a cruft pack, which later on becomes reachable from one or more
> objects in a geometrically repacked pack. That once-unreachable object
> *won't* appear in the new pack, since the cruft pack was specified as
> neither included nor excluded to 'pack-objects --stdin-packs'.

But immediately prior to this patch you implemented
--stdin-packs=follow, so the once-unreachable object would actually
appear in the pack if that new option was used.  The "(prior to this
patch)" addition was meant to help clarify here, but to me it doesn't
succeed.  (If it had been "(prior to this series)" it would have
clarified that we aren't yet using the new feature from the previous
patch.)  Perhaps you meant that geometric repacking doesn't use
--stdin-packs=follow currently, and therefore the once-unreachable
object won't be in the new pack, but if so I think it would be helpful
to call that out explicitly so the reader can more easily follow which
hypothetical state you are discussing.

> If the
> new pack is included in a MIDX without the cruft pack, then trying to
> generate bitmaps for that MIDX may fail. This happens when the bitmap
> selection process picks one or more commits which reach the
> once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
> be closed under reachability. Without it, we would fail to generate a
> MIDX bitmap.

The comma between objects and commit seems insufficient.  To me, that
feels like a contrasting thought that should start a new sentence.
Perhaps the last three lines could read something like:

"""
once-unreachable objects.  Commit ddee3703b3 ensures that the MIDX will
be closed under reachability by including cruft pack(s); without them,
we would fail to generate a MIDX bitmap.
"""
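
To make "closed under reachability" concrete, here is a toy model. It is
only a sketch: objects are bit positions in a 32-bit mask, and every name
in it (toy_graph, is_closed, pack_standard, pack_follow) is hypothetical
rather than anything in pack-objects. The two pack_* helpers follow the
cover letter's description of the modes: the standard mode packs the set
difference of included and excluded packs, and the follow mode additionally
picks up objects reachable from that difference that are in neither set.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_OBJECTS 32

/* Toy object graph: edges[i] is a bitmask of objects that i references. */
struct toy_graph {
	size_t nr;
	uint32_t edges[TOY_MAX_OBJECTS];
};

/* Returns 1 if everything referenced from `set` is itself in `set`. */
int is_closed(const struct toy_graph *g, uint32_t set)
{
	for (size_t i = 0; i < g->nr; i++) {
		if ((set & (UINT32_C(1) << i)) && (g->edges[i] & ~set))
			return 0;
	}
	return 1;
}

/* Returns all objects reachable from `tips`, including the tips. */
uint32_t reachable_from(const struct toy_graph *g, uint32_t tips)
{
	uint32_t seen = tips, prev;

	do {
		prev = seen;
		for (size_t i = 0; i < g->nr; i++) {
			if (seen & (UINT32_C(1) << i))
				seen |= g->edges[i];
		}
	} while (seen != prev);
	return seen;
}

/* Standard mode: included packs minus excluded packs. */
uint32_t pack_standard(uint32_t included, uint32_t excluded)
{
	return included & ~excluded;
}

/* Follow mode: also pick up reachable objects found in neither set. */
uint32_t pack_follow(const struct toy_graph *g, uint32_t included,
		     uint32_t excluded)
{
	uint32_t base = included & ~excluded;
	uint32_t extra = reachable_from(g, base) & ~(included | excluded);

	return base | extra;
}
```

With object 0 in an old (excluded) pack, object 1 only in a cruft pack, and
object 2 newly written and referencing both, the standard mode yields a MIDX
of {0, 2}, which is not closed, while the follow mode copies object 1 into
the new pack and the MIDX becomes closed without the cruft pack.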

[...]
> ---
>  Documentation/config/repack.adoc |   7 ++
>  builtin/repack.c                 | 163 +++++++++++++++++++++++++++----
>  t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
>  3 files changed, 242 insertions(+), 18 deletions(-)

You addressed the rest of my feedback with this patch, other than the
two items I highlighted above.  I'm excited to see how this works out.

* Re: [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-04-15 22:47   ` [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-04-16  9:21     ` Junio C Hamano
  0 siblings, 0 replies; 105+ messages in thread
From: Junio C Hamano @ 2025-04-16  9:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> With '--unpacked', pack-objects adds loose objects (which don't appear
> in any of the excluded packs from '--stdin-packs') to the output pack
> without considering them as reachability tips for the name-hash
> traversal.
>
> This was an oversight in the original implementation of '--stdin-packs',
> since the code which enumerates and adds loose objects to the output
> pack (`add_unreachable_loose_objects()`) did not have access to the
> 'rev_info' struct found in `read_packs_list_from_stdin()`.
>
> Excluding unpacked objects from that traversal doesn't affect the
> correctness of the resulting pack, but it does make it harder to
> discover good deltas for loose objects.
>
> Now that the 'rev_info' struct is declared outside of
> `read_packs_list_from_stdin()`, we can pass it to
> `add_objects_in_unpacked_packs()` and add any loose objects as tips to
> the above-mentioned traversal, in theory producing slightly tighter
> packs as a result.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---


Clever.  And the necessary changes are surprisingly small.  I like
it.
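
The pattern in the quoted patch, where a previously unused callback
argument carries an optional context, can be sketched in isolation. The
types and names below (toy_object, toy_revs, add_loose,
for_each_toy_object) are hypothetical stand-ins, not git's API; the point
is only that passing NULL keeps the old behavior while a non-NULL context
additionally records commits as traversal tips.

```c
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

struct toy_object {
	int id;
	enum toy_type type;
};

/* Hypothetical stand-in for the pending tips kept in 'struct rev_info'. */
struct toy_revs {
	int tips[16];
	size_t nr;
};

typedef int (*each_object_fn)(const struct toy_object *obj, void *data);

/*
 * Callback: every object is accepted; commits are also recorded as
 * tips, but only when the caller supplied a context.
 */
int add_loose(const struct toy_object *obj, void *data)
{
	struct toy_revs *revs = data;

	if (revs && obj->type == TOY_COMMIT && revs->nr < 16)
		revs->tips[revs->nr++] = obj->id;
	return 0;
}

/* Calls `fn` on each object, stopping at the first non-zero return. */
int for_each_toy_object(const struct toy_object *objs, size_t nr,
			each_object_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(&objs[i], data);
		if (ret)
			return ret;
	}
	return 0;
}
```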


>  builtin/pack-objects.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index d60cb042c9..eb2a4099cc 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3644,7 +3644,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
>  	string_list_clear(&exclude_packs, 0);
>  }
>  
> -static void add_unreachable_loose_objects(void);
> +static void add_unreachable_loose_objects(struct rev_info *revs);
>  
>  static void read_stdin_packs(int rev_list_unpacked)
>  {
> @@ -3671,7 +3671,7 @@ static void read_stdin_packs(int rev_list_unpacked)
>  	ignore_packed_keep_in_core = 1;
>  	read_packs_list_from_stdin(&revs);
>  	if (rev_list_unpacked)
> -		add_unreachable_loose_objects();
> +		add_unreachable_loose_objects(&revs);
>  
>  	if (prepare_revision_walk(&revs))
>  		die(_("revision walk setup failed"));
> @@ -3790,7 +3790,7 @@ static void enumerate_cruft_objects(void)
>  						_("Enumerating cruft objects"), 0);
>  
>  	add_objects_in_unpacked_packs();
> -	add_unreachable_loose_objects();
> +	add_unreachable_loose_objects(NULL);
>  
>  	stop_progress(&progress_state);
>  }
> @@ -4068,8 +4068,9 @@ static void add_objects_in_unpacked_packs(void)
>  }
>  
>  static int add_loose_object(const struct object_id *oid, const char *path,
> -			    void *data UNUSED)
> +			    void *data)
>  {
> +	struct rev_info *revs = data;
>  	enum object_type type = oid_object_info(the_repository, oid, NULL);
>  
>  	if (type < 0) {
> @@ -4090,6 +4091,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>  	} else {
>  		add_object_entry(oid, type, "", 0);
>  	}
> +
> +	if (revs && type == OBJ_COMMIT)
> +		add_pending_oid(revs, NULL, oid, 0);
> +
>  	return 0;
>  }
>  
> @@ -4098,11 +4103,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>   * add_object_entry will weed out duplicates, so we just add every
>   * loose object we find.
>   */
> -static void add_unreachable_loose_objects(void)
> +static void add_unreachable_loose_objects(struct rev_info *revs)
>  {
>  	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
> -				      add_loose_object,
> -				      NULL, NULL, NULL);
> +				      add_loose_object, NULL, NULL, revs);
>  }
>  
>  static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
> @@ -4358,7 +4362,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
>  	if (keep_unreachable)
>  		add_objects_in_unpacked_packs();
>  	if (pack_loose_unreachable)
> -		add_unreachable_loose_objects();
> +		add_unreachable_loose_objects(NULL);
>  	if (unpack_unreachable)
>  		loosen_unused_packed_objects();

* Re: [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-16  0:58     ` Junio C Hamano
@ 2025-04-16 22:07       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-16 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Tue, Apr 15, 2025 at 05:58:23PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > In add_object_entry_from_pack() we declare 'revs' (given to us through
> > the miscellaneous context argument) earlier in the "if (p)" conditional
> > than is necessary.  Move it down as far as it can go to reduce its
> > scope.
>
> That makes sense, but ...
>
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  builtin/pack-objects.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 20dd870bbf..4ab695a3aa 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3490,14 +3490,14 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> >  		return 0;
> >
> >  	if (p) {
> > -		struct rev_info *revs = _data;
> >  		struct object_info oi = OBJECT_INFO_INIT;
> > -
> >  		oi.typep = &type;
> > +
>
> Isn't this change about spacing around oi's decl and the first
> statement in the block strictly worsening the code?  At least it is
> an unrelated change.

Yeah, this is cruft that I thought I had expunged while rebasing. Here's
a better version of the patch, but I'm happy to send a new round of the
series if it would be more convenient for you:

--- 8< ---

Subject: [PATCH] pack-objects: limit scope in 'add_object_entry_from_pack()'

In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 20dd870bbf..682e80be40 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3490,7 +3490,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;

 	if (p) {
-		struct rev_info *revs = _data;
 		struct object_info oi = OBJECT_INFO_INIT;

 		oi.typep = &type;
@@ -3498,6 +3497,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
--
2.49.0.230.ga662d77f78

Thanks,
Taylor

* Re: [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-04-16  5:31     ` Elijah Newren
@ 2025-04-16 22:07       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-04-16 22:07 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 10:31:26PM -0700, Elijah Newren wrote:
> On Tue, Apr 15, 2025 at 3:46 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In add_object_entry_from_pack() we declare 'revs' (given to us through
> > the miscellaneous context argument) earlier in the "if (p)" conditional
> > than is necessary.  Move it down as far as it can go to reduce its
> > scope.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  builtin/pack-objects.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 20dd870bbf..4ab695a3aa 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3490,14 +3490,14 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> >                 return 0;
> >
> >         if (p) {
> > -               struct rev_info *revs = _data;
>
> This change is half of what you mention in your commit message.

Yeah, Junio noted the same in his review as well. See my response there
for a cleaner version of this patch.

Thanks,
Taylor

* Re: [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-16  5:56     ` Elijah Newren
@ 2025-04-16 22:16       ` Taylor Blau
  2025-05-13  3:34         ` Elijah Newren
  0 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-04-16 22:16 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano

On Tue, Apr 15, 2025 at 10:56:59PM -0700, Elijah Newren wrote:
> On Tue, Apr 15, 2025 at 3:47 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> > MIDX with '--write-midx' to ensure that the resulting MIDX was always
> > closed under reachability in order to generate reachability bitmaps.
> >
> > Suppose (prior to this patch) you have a once-unreachable object packed
> > in a cruft pack, which later on becomes reachable from one or more
> > objects in a geometrically repacked pack. That once-unreachable object
> > *won't* appear in the new pack, since the cruft pack was specified as
> > neither included nor excluded to 'pack-objects --stdin-packs'.
>
> But immediately prior to this patch you implemented
> --stdin-packs=follow, so the once-unreachable object would actually
> appear in the pack if that new option was used.  The "(prior to this
> patch)" addition was meant to help clarify here, but to me it doesn't
> succeed.  (If it had been "(prior to this series)" it would have
> clarified that we aren't yet using the new feature from the previous
> patch.)  Perhaps you meant that geometric repacking doesn't use
> --stdin-packs=follow currently, and therefore the once-unreachable
> object won't be in the new pack, but if so I think it would be helpful
> to call that out explicitly so the reader can more easily follow which
> hypothetical state you are discussing.

Yeah, I am referring to the state of the world from repack's perspective
here. It is the case that prior to this patch (9/9) we don't use
'--stdin-packs=follow' from the repack code when invoking pack-objects,
but I can sympathize with the confusion that this creates since the
distinction between the new mode existing and having real-life callers
from other builtins is subtle.
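
Concretely, the choice repack makes in the geometric case after 9/9 can be
sketched as follows. This mirrors the hunk in builtin/repack.c but is not
that code: the function name and the plain array standing in for the strvec
are assumptions of this illustration.

```c
#include <stddef.h>

/*
 * Fills `out` with the pack-objects arguments a geometric repack
 * would add and returns how many were written (0 if `cap` is too
 * small). Hypothetical helper; repack itself pushes onto a strvec.
 */
size_t geometric_pack_objects_args(int midx_must_contain_cruft,
				   const char **out, size_t cap)
{
	size_t nr = 0;

	if (cap < 2)
		return 0;

	/* Only the non-follow mode relies on cruft packs being in the MIDX. */
	out[nr++] = midx_must_contain_cruft ? "--stdin-packs"
					    : "--stdin-packs=follow";
	out[nr++] = "--unpacked";
	return nr;
}
```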

> > If the
> > new pack is included in a MIDX without the cruft pack, then trying to
> > generate bitmaps for that MIDX may fail. This happens when the bitmap
> > selection process picks one or more commits which reach the
> > once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
> > be closed under reachability. Without it, we would fail to generate a
> > MIDX bitmap.
>
> The comma between objects and commit seems insufficient.  To me, that
> feels like a contrasting thought that should start a new sentence.
> Perhaps the last three lines could read something like:
>
> """
> once-unreachable objects.  Commit ddee3703b3 ensures that the MIDX will
> be closed under reachability by including cruft pack(s); without them,
> we would fail to generate a MIDX bitmap.
> """

I think swapping the comma for a semi-colon would have worked as well.
I'm going to leave it as-is unless you feel strongly about it, if that's
alright with you.

> > ---
> >  Documentation/config/repack.adoc |   7 ++
> >  builtin/repack.c                 | 163 +++++++++++++++++++++++++++----
> >  t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
> >  3 files changed, 242 insertions(+), 18 deletions(-)
>
> You addressed the rest of my feedback with this patch, other than the
> two items I highlighted above.  I'm excited to see how this works out.

Thanks, and thanks for the review! I'm curious to see how it turns out
myself, too ;-).

Thanks,
Taylor


* Re: [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-04-16 22:16       ` Taylor Blau
@ 2025-05-13  3:34         ` Elijah Newren
  0 siblings, 0 replies; 105+ messages in thread
From: Elijah Newren @ 2025-05-13  3:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Wed, Apr 16, 2025 at 3:16 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Tue, Apr 15, 2025 at 10:56:59PM -0700, Elijah Newren wrote:
> > On Tue, Apr 15, 2025 at 3:47 PM Taylor Blau <me@ttaylorr.com> wrote:
> > >
> > > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> > > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> > > MIDX with '--write-midx' to ensure that the resulting MIDX was always
> > > closed under reachability in order to generate reachability bitmaps.
> > >
> > > Suppose (prior to this patch) you have a once-unreachable object packed
> > > in a cruft pack, which later on becomes reachable from one or more
> > > objects in a geometrically repacked pack. That once-unreachable object
> > > *won't* appear in the new pack, since the cruft pack was specified as
> > > neither included nor excluded to 'pack-objects --stdin-packs'.
> >
> > But immediately prior to this patch you implemented
> > --stdin-packs=follow, so the once-unreachable object would actually
> > appear in the pack if that new option was used.  The "(prior to this
> > patch)" addition was meant to help clarify here, but to me it doesn't
> > succeed.  (If it had been "(prior to this series)" it would have
> > clarified that we aren't yet using the new feature from the previous
> > patch.)  Perhaps you meant that geometric repacking doesn't use
> > --stdin-packs=follow currently, and therefore the once-unreachable
> > object won't be in the new pack, but if so I think it would be helpful
> > to call that out explicitly so the reader can more easily follow which
> > hypothetical state you are discussing.
>
> Yeah, I am referring to the state of the world from repack's perspective
> here. It is the case that prior to this patch (9/9) we don't use
> '--stdin-packs=follow' from the repack code when invoking pack-objects,
> but I can sympathize with the confusion that this creates since the
> distinction between the new mode existing and having real-life callers
> from other builtins is subtle.

Okay, but can we simply call out in the commit message that
--stdin-packs=follow is not yet used by default, to make clear that
we're talking about the state without it?

And while we're at it, fix up the end of the paragraph as I mentioned before?

So, something like:

While the previous patch added the --stdin-packs=follow option to
pack-objects, it is not on by default.  Imagine you have a
once-unreachable object packed in a cruft pack, which later on becomes
reachable from one or more objects in a geometrically repacked
pack. That once-unreachable object *won't* appear in the new pack, since
the cruft pack was specified as neither included nor excluded to
'pack-objects --stdin-packs' (and `--stdin-packs=follow` is not on). If
the new pack is included in a MIDX without the cruft pack, then trying
to generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.  Commit ddee3703b3 ensures that the MIDX will
be closed under reachability by including cruft pack(s); without them,
we would fail to generate a MIDX bitmap.
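
To make the failure mode in that paragraph concrete, here is a minimal
sketch of the closure property in a toy model. Everything in it is
hypothetical and not part of Git: the 'toy_object' struct, the pack
bitmasks, midx_is_closed(), and toy_follow(). In particular,
toy_follow() only approximates what '--stdin-packs=follow' is described
as doing (it ignores excluded packs and object types entirely).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pack identifiers, combined into a per-object bitmask. */
#define TOY_PACK_GEOMETRIC (1u << 0)
#define TOY_PACK_CRUFT     (1u << 1)
#define TOY_MAX_REFS 4

struct toy_object {
	unsigned packs;         /* packs that contain this object */
	int refs[TOY_MAX_REFS]; /* indexes of referenced objects, -1 terminated */
};

/*
 * Returns 1 if every object referenced from a MIDX'd pack is itself in
 * some MIDX'd pack. Checking direct references is enough: a set with no
 * outgoing edges is closed under reachability.
 */
static int midx_is_closed(const struct toy_object *objs, size_t nr,
			  unsigned midx_packs)
{
	size_t i;
	int j;

	for (i = 0; i < nr; i++) {
		if (!(objs[i].packs & midx_packs))
			continue; /* object is not covered by the MIDX */
		for (j = 0; j < TOY_MAX_REFS && objs[i].refs[j] >= 0; j++) {
			assert((size_t)objs[i].refs[j] < nr);
			if (!(objs[objs[i].refs[j]].packs & midx_packs))
				return 0; /* edge leaves the MIDX */
		}
	}
	return 1;
}

/* Toy 'follow' mode: copy anything the new pack reaches into the new pack. */
static void toy_follow(struct toy_object *objs, size_t nr)
{
	int changed = 1;
	size_t i;
	int j;

	while (changed) {
		changed = 0;
		for (i = 0; i < nr; i++) {
			if (!(objs[i].packs & TOY_PACK_GEOMETRIC))
				continue;
			for (j = 0; j < TOY_MAX_REFS && objs[i].refs[j] >= 0; j++) {
				struct toy_object *to = &objs[objs[i].refs[j]];
				if (!(to->packs & TOY_PACK_GEOMETRIC)) {
					to->packs |= TOY_PACK_GEOMETRIC;
					changed = 1;
				}
			}
		}
	}
}

/* commit (0) -> tree (1) -> blob (2); the blob lives only in the cruft pack. */
static void toy_scenario(struct toy_object objs[3])
{
	static const struct toy_object init[3] = {
		{ TOY_PACK_GEOMETRIC, { 1, -1 } },
		{ TOY_PACK_GEOMETRIC, { 2, -1 } },
		{ TOY_PACK_CRUFT, { -1 } }
	};
	size_t i;

	for (i = 0; i < 3; i++)
		objs[i] = init[i];
}
```

In this model a MIDX over only the geometric pack is not closed, adding
the cruft pack closes it (the ddee3703b3 approach), and running
toy_follow() first closes it without the cruft pack.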

> >
> > The comma between objects and commit seems insufficient.  To me, that
> > feels like a contrasting thought that should start a new sentence.
> > Perhaps the last three lines could read something like:
> >
> > """
> > once-unreachable objects.  Commit ddee3703b3 ensures that the MIDX will
> > be closed under reachability by including cruft pack(s); without them,
> > we would fail to generate a
> > MIDX bitmap.
> > """
>
> I think swapping the comma for a semi-colon would have worked as well.

I didn't feel as strongly about this second suggestion, but
incorporated it into the alternate wording I provided above for the
full paragraph since I think it's still an improvement.

As a side note, though, it's not just a comma and semicolon swap --
there's also the addition of a full stop and some extra clarifying
words.

> I'm going to leave it as-is unless you feel strongly about it, if that's
> alright with you.

I originally read this thinking that you were only turning down fixing
the second issue I raised, not the first.  Sorry for misunderstanding.
I would like to see a fix for the first issue I raised, at least.

Besides, since Junio didn't take your fixup 2/9 patch, I think you
should re-roll at a minimum to get a correct 2/9 out there (seen still
has the broken one).  And, if you're re-rolling the series, it'd be
really nice to incorporate the suggestions above in 9/9.  If you do
that, I'll put my stamp on the series and encourage Junio to merge it.
:-)


* [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (9 preceding siblings ...)
  2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
@ 2025-05-28 23:20 ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
                     ` (9 more replies)
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  12 siblings, 10 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Here is a tiny reroll of my series to explore creating MIDXs while
repacking that don't include the cruft pack.

The only changes since last time are as follows:

 - removed a stray whitespace change in patch 2/9 caught by Junio and
   Elijah

 - reworded the final commit message based on helpful feedback from
   Elijah.

The rest of the series is unchanged, and a range-diff is included below
as usual for convenience.

A meta-note on something new since last time: I deployed a
cherry-picked version of this series to GitHub's infrastructure a few
weeks ago. The testing repository (whose maintenance failure many years
ago precipitated commit ddee3703b3 (builtin/repack.c: add cruft packs to
MIDX during geometric repack, 2022-05-20)) has been happily running
maintenance tasks without any issues since then, and the cruft pack has
been excluded from the MIDX.

Thanks in advance for any review :-).

Taylor Blau (9):
  pack-objects: use standard option incompatibility functions
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: fix typo in 'show_object_pack_hint()'
  pack-objects: swap 'show_{object,commit}_pack_hint'
  pack-objects: introduce '--stdin-packs=follow'
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/config/repack.adoc    |   7 +
 Documentation/git-pack-objects.adoc |  10 +-
 builtin/pack-objects.c              | 190 ++++++++++++++++++----------
 builtin/repack.c                    | 163 +++++++++++++++++++++---
 t/t5331-pack-objects-stdin.sh       |  84 +++++++++++-
 t/t7704-repack-cruft.sh             |  90 +++++++++++++
 6 files changed, 455 insertions(+), 89 deletions(-)

Range-diff against v3:
 1:  986bef29b5 !  1:  f8b31c6a8d pack-objects: limit scope in 'add_object_entry_from_pack()'
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>

      ## Commit message ##
    -    pack-objects: limit scope in 'add_object_entry_from_pack()'
    +    pack-objects: use standard option incompatibility functions

    -    In add_object_entry_from_pack() we declare 'revs' (given to us through
    -    the miscellaneous context argument) earlier in the "if (p)" conditional
    -    than is necessary.  Move it down as far as it can go to reduce its
    -    scope.
    +    pack-objects has a handful of explicit checks for pairs of command-line
    +    options which are mutually incompatible. Many of these pre-date
    +    a699367bb8 (i18n: factorize more 'incompatible options' messages,
    +    2022-01-31).
    +
    +    Convert the explicit checks into die_for_incompatible_opt2() calls,
    +    which simplifies the implementation and standardizes pack-objects'
    +    output when given incompatible options (e.g., --stdin-packs with
    +    --filter gives different output than --keep-unreachable with
    +    --unpack-unreachable).
    +
    +    There is one minor piece of test fallout in t5331 that expects the old
    +    format, which has been corrected.

         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## builtin/pack-objects.c ##
    -@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
    - 		return 0;
    +@@ builtin/pack-objects.c: int cmd_pack_objects(int argc,
    + 		strvec_push(&rp, "--unpacked");
    + 	}

    - 	if (p) {
    --		struct rev_info *revs = _data;
    - 		struct object_info oi = OBJECT_INFO_INIT;
    --
    - 		oi.typep = &type;
    +-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
    +-		die(_("options '%s' and '%s' cannot be used together"),
    +-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
    ++	die_for_incompatible_opt2(exclude_promisor_objects,
    ++				  "--exclude-promisor-objects",
    ++				  exclude_promisor_objects_best_effort,
    ++				  "--exclude-promisor-objects-best-effort");
    + 	if (exclude_promisor_objects) {
    + 		use_internal_rev_list = 1;
    + 		fetch_if_missing = 0;
    +@@ builtin/pack-objects.c: int cmd_pack_objects(int argc,
    + 	if (!pack_to_stdout && thin)
    + 		die(_("--thin cannot be used to build an indexable pack"));
    +
    +-	if (keep_unreachable && unpack_unreachable)
    +-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
    ++	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
    ++				  unpack_unreachable, "--unpack-unreachable");
    + 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
    + 		unpack_unreachable_expiration = 0;
    +
    +-	if (stdin_packs && filter_options.choice)
    +-		die(_("cannot use --filter with --stdin-packs"));
    ++	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
    ++				  filter_options.choice, "--filter");
     +
    - 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
    - 			die(_("could not get type of object %s in pack %s"),
    - 			    oid_to_hex(oid), p->pack_name);
    - 		} else if (type == OBJ_COMMIT) {
    -+			struct rev_info *revs = _data;
    - 			/*
    - 			 * commits in included packs are used as starting points for the
    - 			 * subsequent revision walk
    +
    + 	if (stdin_packs && use_internal_rev_list)
    + 		die(_("cannot use internal rev list with --stdin-packs"));
    +@@ builtin/pack-objects.c: int cmd_pack_objects(int argc,
    + 	if (cruft) {
    + 		if (use_internal_rev_list)
    + 			die(_("cannot use internal rev list with --cruft"));
    +-		if (stdin_packs)
    +-			die(_("cannot use --stdin-packs with --cruft"));
    ++		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
    ++					  cruft, "--cruft");
    + 	}
    +
    + 	/*
    +
    + ## t/t5331-pack-objects-stdin.sh ##
    +@@ t/t5331-pack-objects-stdin.sh: test_expect_success '--stdin-packs is incompatible with --filter' '
    + 		cd stdin-packs &&
    + 		test_must_fail git pack-objects --stdin-packs --stdout \
    + 			--filter=blob:none </dev/null 2>err &&
    +-		test_grep "cannot use --filter with --stdin-packs" err
    ++		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
    + 	)
    + '
    +
 -:  ---------- >  2:  2753e29648 pack-objects: limit scope in 'add_object_entry_from_pack()'
 2:  6f8fe8a4e1 =  3:  32b49d9073 pack-objects: factor out handling '--stdin-packs'
 3:  2a235461a6 =  4:  a797ff3a83 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
 4:  240e90b68d =  5:  29bf05633a pack-objects: perform name-hash traversal for unpacked objects
 5:  9a18fa2e52 =  6:  0696fa1736 pack-objects: fix typo in 'show_object_pack_hint()'
 6:  6c997853f1 =  7:  1cc45b4472 pack-objects: swap 'show_{object,commit}_pack_hint'
 7:  0ff699f056 =  8:  3e3d929bd0 pack-objects: introduce '--stdin-packs=follow'
 8:  58891101f3 !  9:  52a069ef48 repack: exclude cruft pack(s) from the MIDX where possible
    @@ Commit message
         MIDX with '--write-midx' to ensure that the resulting MIDX was always
         closed under reachability in order to generate reachability bitmaps.

    -    Suppose (prior to this patch) you have a once-unreachable object packed
    -    in a cruft pack, which later on becomes reachable from one or more
    -    objects in a geometrically repacked pack. That once-unreachable object
    -    *won't* appear in the new pack, since the cruft pack was specified as
    -    neither included nor excluded to 'pack-objects --stdin-packs'. If the
    +    While the previous patch added the '--stdin-packs=follow' option to
    +    pack-objects, it is not yet on by default. Given that, suppose you have
    +    a once-unreachable object packed in a cruft pack, which later becomes
    +    reachable from one or more objects in a geometrically repacked pack.
    +    That once-unreachable object *won't* appear in the new pack, since the
    +    cruft pack was not specified as included or excluded when the
    +    geometrically repacked pack was created with 'pack-objects
    +    --stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
         new pack is included in a MIDX without the cruft pack, then trying to
         generate bitmaps for that MIDX may fail. This happens when the bitmap
         selection process picks one or more commits which reach the
    -    once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
    -    be closed under reachability. Without it, we would fail to generate a
    -    MIDX bitmap.
    +    once-unreachable objects.

    +    To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
    +    will be closed under reachability by including cruft pack(s). If cruft
    +    pack(s) were not included, we would fail to generate a MIDX bitmap. But
         ddee3703b3 alludes to the fact that this is sub-optimal by saying

             [...] it's desirable to avoid including cruft packs in the MIDX

base-commit: 485f5f863615e670fd97ae40af744e14072cfe18
--
2.49.0.640.ga4de40e6a8


* [PATCH v4 1/9] pack-objects: use standard option incompatibility functions
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
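
For readers unfamiliar with the helper, a minimal sketch of the check
this patch standardizes on. The function below is a hypothetical,
non-fatal stand-in: it formats the message into a caller-supplied
buffer instead of dying, and its name and signature are assumptions
made for illustration. Only the message format is taken from the
hunks below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for die_for_incompatible_opt2(): returns 1 and
 * fills 'buf' with the standard message when both options are set;
 * returns 0 and leaves 'buf' empty otherwise.
 */
static int incompatible_opt2_message(int opt1, const char *opt1_name,
				     int opt2, const char *opt2_name,
				     char *buf, size_t len)
{
	assert(buf && len > 0);
	buf[0] = '\0';

	if (!opt1 || !opt2)
		return 0; /* at most one option given: nothing to report */

	snprintf(buf, len, "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

Routing every pair through one helper is what makes the wording for
'--stdin-packs' with '--filter' match the wording for the other pairs.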
 builtin/pack-objects.c        | 20 +++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6b06d159d2..20dd870bbf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4651,9 +4651,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -4691,13 +4692,14 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+				  filter_options.choice, "--filter");
+
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -4705,8 +4707,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..8fd07deb8d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
 	)
 '
 
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 20dd870bbf..682e80be40 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3490,7 +3490,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
 		struct object_info oi = OBJECT_INFO_INIT;
 
 		oi.typep = &type;
@@ -3498,6 +3497,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 3/9] pack-objects: factor out handling '--stdin-packs'
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 682e80be40..3f6a7c62e6 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3674,6 +3674,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -3769,7 +3780,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -4776,11 +4786,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (2 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (neither included nor excluded) pack.

Move the code which initializes and performs the reachability traversal
out of the callee and into the caller, which prepares us to share it
with the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
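
For reviewers less familiar with why names matter to this traversal, a
minimal sketch of a path-based name hash. It assumes a scheme in which
later characters weigh most, so that paths sharing a suffix sort near
each other during delta selection. toy_name_hash() is a hypothetical
illustration rather than a copy of Git's implementation, and the exact
mixing step is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy name hash: fold each non-whitespace character into the top byte
 * and shift earlier characters down, so the end of the path dominates.
 * An object with no known name hashes to 0.
 */
static uint32_t toy_name_hash(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

In this sketch an object reached without a path gets hash 0, which
gives delta selection nothing to group on. That is the gap a
best-effort traversal from the included commits tries to fill.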
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3f6a7c62e6..88071b4d8c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3558,7 +3558,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3566,24 +3566,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3653,10 +3635,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3668,21 +3684,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (3 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
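
A minimal sketch of the callback pattern used here, assuming a toy
stand-in for the traversal state. 'struct toy_revs', 'enum toy_type',
and toy_add_loose_object() are hypothetical names for illustration
only. The point is that the context pointer is optional: callers that
pass NULL keep the old behavior.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

#define TOY_MAX_TIPS 8

/* Hypothetical stand-in for the pending tips held by 'struct rev_info'. */
struct toy_revs {
	int tips[TOY_MAX_TIPS];
	size_t nr;
};

static size_t toy_packed_nr; /* objects added to the output pack so far */

/*
 * Every loose object is packed. Only commits become traversal tips,
 * and only when the caller supplied a traversal context in 'data'.
 */
static int toy_add_loose_object(int id, enum toy_type type, void *data)
{
	struct toy_revs *revs = data;

	toy_packed_nr++;

	if (revs && type == TOY_COMMIT) {
		assert(revs->nr < TOY_MAX_TIPS);
		revs->tips[revs->nr++] = id;
	}
	return 0;
}
```

Keeping the context optional is what lets the cruft and
'--pack-loose-unreachable' callers share the same callback without
starting a traversal of their own.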
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 88071b4d8c..359f0c3c30 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3644,7 +3644,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3671,7 +3671,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -3790,7 +3790,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4068,8 +4068,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4090,6 +4091,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4098,11 +4103,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4358,7 +4362,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 6/9] pack-objects: fix typo in 'show_object_pack_hint()'
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (4 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 359f0c3c30..a68451c3d2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3532,7 +3532,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	 * would typically pick up during a reachability traversal.
 	 *
 	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
+	 * fields here in order to perhaps improve the delta selection
 	 * process.
 	 */
 	oe->hash = pack_name_hash_fn(name);
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 7/9] pack-objects: swap 'show_{object,commit}_pack_hint'
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (5 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

show_commit_pack_hint() has heretofore been a noop, so it only needs to
appear before its first use within its compilation unit.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a68451c3d2..d3dfe983c3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3513,12 +3513,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data UNUSED)
 {
@@ -3541,6 +3535,12 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	stdin_packs_hints_nr++;
 }
 
+static void show_commit_pack_hint(struct commit *commit UNUSED,
+				  void *data UNUSED)
+{
+	/* nothing to do; commits don't have a namehash */
+}
+
 static int pack_mtime_cmp(const void *_a, const void *_b)
 {
 	struct packed_git *a = ((const struct string_list_item*)_a)->util;
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (6 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-05-28 23:20   ` [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  2025-05-29  0:07   ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  9 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

Introduce a new '--stdin-packs=follow' mode to prepare for new repacking
behavior which will "resurrect" objects found in cruft or otherwise
unspecified packs when generating new packs. In the context of geometric
repacking, this may be used to maintain a sequence of
geometrically-repacked packs, the union of which is closed under
reachability, even in the case described earlier.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
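To make the difference between the two modes concrete, here is a rough
model of the selection rule as the documentation hunk below states it.
It is a sketch under simplifying assumptions, not the implementation:
every object has at most one parent, every object lives in exactly one
pack, and all `toy_*` names are hypothetical. The real option parser
also handles '--no-stdin-packs' and die()s on an unknown mode; the sketch
returns an "invalid" value instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_mode { TOY_MODE_STANDARD, TOY_MODE_FOLLOW, TOY_MODE_INVALID };

/* Map the optional "=<mode>" argument of the option to a mode. */
enum toy_mode toy_parse_mode(const char *arg)
{
	if (!arg || !*arg)
		return TOY_MODE_STANDARD;
	if (!strcmp(arg, "follow"))
		return TOY_MODE_FOLLOW;
	return TOY_MODE_INVALID;
}

#define TOY_MAX_OBJECTS 32

/*
 * A toy repository: object i has at most one parent (parent[i], or -1
 * for a root) and lives in exactly one pack (pack_of[i], below 32).
 */
struct toy_repo {
	size_t nr;
	int parent[TOY_MAX_OBJECTS];
	int pack_of[TOY_MAX_OBJECTS];
};

/* Is 'obj' stored in one of the packs set in the 'packs' bitmask? */
static int toy_in_packs(const struct toy_repo *r, size_t obj, unsigned packs)
{
	return (packs >> r->pack_of[obj]) & 1u;
}

/*
 * Return a bitmask of the objects that would be packed. 'included' and
 * 'excluded' are bitmasks of pack numbers.
 */
unsigned toy_select_objects(const struct toy_repo *r, enum toy_mode mode,
			    unsigned included, unsigned excluded)
{
	unsigned out = 0;

	assert(r->nr <= TOY_MAX_OBJECTS);

	for (size_t i = 0; i < r->nr; i++) {
		if (!toy_in_packs(r, i, included) ||
		    toy_in_packs(r, i, excluded))
			continue;
		out |= 1u << i;

		if (mode != TOY_MODE_FOLLOW)
			continue;

		/* Walk the ancestry, even through excluded packs. */
		for (int p = r->parent[i]; p >= 0; p = r->parent[p])
			if (!toy_in_packs(r, (size_t)p, excluded))
				out |= 1u << p;
	}
	return out;
}
```

With four objects A..D in four packs, including B and D and excluding C
selects {B, D} in the standard mode and {A, B, D} in the "follow" mode,
which is the shape of the first two cases in the new test.
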
 Documentation/git-pack-objects.adoc | 10 +++-
 builtin/pack-objects.c              | 83 +++++++++++++++++++++--------
 t/t5331-pack-objects-stdin.sh       | 82 ++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855..8f0cecaec9 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,21 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", objects from packs not listed on stdin receive
+special treatment. Objects within unlisted packs will be included if
+those objects are (1) reachable from the included packs, and (2) not
+found in any excluded packs. This mode is useful, for example, to
+resurrect once-unreachable objects found in cruft packs to generate
+packs which are closed under reachability up to the boundary set by the
+excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d3dfe983c3..c6ec346369 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -272,6 +272,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3514,31 +3520,44 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 }
 
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
-		return;
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * fields here in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't have
+		 * a non-zero hash field that you would typically pick
+		 * up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash and
+		 * ->no_try_delta fields here in order to perhaps
+		 * improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
 
-	stdin_packs_hints_nr++;
+		stdin_packs_hints_nr++;
+	}
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
+static void show_commit_pack_hint(struct commit *commit, void *data)
 {
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
+		return;
+	}
+
 	/* nothing to do; commits don't have a namehash */
+
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3646,7 +3665,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3678,7 +3697,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4469,6 +4488,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4480,7 +4516,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4535,6 +4571,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -4791,7 +4830,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 8fd07deb8d..60a2b4bc07 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,86 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+packdir=.git/objects/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+		test_commit E &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.640.ga4de40e6a8



* [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (7 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-05-28 23:20   ` Taylor Blau
  2025-06-19 11:33     ` Carlo Marcelo Arenas Belón
  2025-06-19 13:08     ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
  2025-05-29  0:07   ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  9 siblings, 2 replies; 105+ messages in thread
From: Taylor Blau @ 2025-05-28 23:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read, yet any object lookup that goes through
the MIDX must binary search over them anyway.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustContainCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
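The decision described above boils down to two checks, which can be
sketched roughly as follows. This is a simplified model covering only
the geometric case: the `toy_*` names are hypothetical, pack names are
plain strings, and 'rolled_up' stands for the packs below the geometric
split line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if 'name' appears among the 'nr' strings in 'list'. */
static int toy_has_name(const char *const *list, size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(list[i], name))
			return 1;
	return 0;
}

/*
 * Return 1 if some pack in the old MIDX is neither carried over
 * verbatim ('include') nor rolled up into a new pack ('rolled_up').
 */
int toy_midx_has_unknown_packs(const char *const *old_midx, size_t old_nr,
			       const char *const *include, size_t include_nr,
			       const char *const *rolled_up, size_t rolled_nr)
{
	for (size_t i = 0; i < old_nr; i++) {
		if (toy_has_name(include, include_nr, old_midx[i]))
			continue;
		if (toy_has_name(rolled_up, rolled_nr, old_midx[i]))
			continue;
		return 1;
	}
	return 0;
}

/* Cruft packs are kept if configured so, or if closure is in doubt. */
int toy_midx_needs_cruft(int must_contain_cruft, int has_unknown_packs)
{
	return must_contain_cruft || has_unknown_packs;
}
```

Note that an old MIDX which already contains a cruft pack trips the
"unknown pack" check by itself, since the check runs before any cruft
pack is added to the include list. That is the situation exercised by
the 'includes cruft when necessary' test.
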
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 163 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          |  90 +++++++++++++++++
 3 files changed, 242 insertions(+), 18 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index f3330ade7b..c9e2e3d04d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -107,6 +108,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -687,6 +692,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -755,6 +831,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -808,26 +886,56 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of the bitmap's selected commits reaches a
+		 * once-cruft object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 *  since no packs are marked as pending
+			 *  deletion (since we only call
+			 *  `mark_packs_for_deletion()` when doing an
+			 *  all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git (with the appropriate
+		 * configuration setting) will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1142,6 +1250,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1356,7 +1466,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1478,6 +1591,16 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		uint32_t i;
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names, m->num_packs);
+		for (i = 0; i < m->num_packs; i++)
+			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1519,7 +1642,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1570,6 +1694,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..2b0a55f8fd 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,94 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		grep -o "pack-.*\.idx$" packs.all >in &&
+
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
 test_done
-- 
2.49.0.640.ga4de40e6a8

* Re: [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (8 preceding siblings ...)
  2025-05-28 23:20   ` [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-05-29  0:07   ` Taylor Blau
  2025-05-29  0:15     ` Elijah Newren
  9 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-05-29  0:07 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

On Wed, May 28, 2025 at 07:20:07PM -0400, Taylor Blau wrote:
> Range-diff against v3:

Hmm. My tool for submitting patches botched the range-diff here. The
correct range-diff is:

1:  986bef29b5 ! 1:  2753e29648 pack-objects: limit scope in 'add_object_entry_from_pack()'
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
      	if (p) {
     -		struct rev_info *revs = _data;
      		struct object_info oi = OBJECT_INFO_INIT;
    --
    +
      		oi.typep = &type;
    -+
    - 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
    +@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
      			die(_("could not get type of object %s in pack %s"),
      			    oid_to_hex(oid), p->pack_name);
      		} else if (type == OBJ_COMMIT) {
2:  6f8fe8a4e1 = 2:  32b49d9073 pack-objects: factor out handling '--stdin-packs'
3:  2a235461a6 = 3:  a797ff3a83 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
4:  240e90b68d = 4:  29bf05633a pack-objects: perform name-hash traversal for unpacked objects
5:  9a18fa2e52 = 5:  0696fa1736 pack-objects: fix typo in 'show_object_pack_hint()'
6:  6c997853f1 = 6:  1cc45b4472 pack-objects: swap 'show_{object,commit}_pack_hint'
7:  0ff699f056 = 7:  3e3d929bd0 pack-objects: introduce '--stdin-packs=follow'
8:  58891101f3 ! 8:  52a069ef48 repack: exclude cruft pack(s) from the MIDX where possible
    @@ Commit message
         MIDX with '--write-midx' to ensure that the resulting MIDX was always
         closed under reachability in order to generate reachability bitmaps.

    -    Suppose (prior to this patch) you have a once-unreachable object packed
    -    in a cruft pack, which later on becomes reachable from one or more
    -    objects in a geometrically repacked pack. That once-unreachable object
    -    *won't* appear in the new pack, since the cruft pack was specified as
    -    neither included nor excluded to 'pack-objects --stdin-packs'. If the
    +    While the previous patch added the '--stdin-packs=follow' option to
    +    pack-objects, it is not yet on by default. Given that, suppose you have
    +    a once-unreachable object packed in a cruft pack, which later becomes
    +    reachable from one or more objects in a geometrically repacked pack.
    +    That once-unreachable object *won't* appear in the new pack, since the
    +    cruft pack was not specified as included or excluded when the
    +    geometrically repacked pack was created with 'pack-objects
    +    --stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
         new pack is included in a MIDX without the cruft pack, then trying to
         generate bitmaps for that MIDX may fail. This happens when the bitmap
         selection process picks one or more commits which reach the
    -    once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
    -    be closed under reachability. Without it, we would fail to generate a
    -    MIDX bitmap.
    +    once-unreachable objects.

    +    To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
    +    will be closed under reachability by including cruft pack(s). If cruft
    +    pack(s) were not included, we would fail to generate a MIDX bitmap. But
         ddee3703b3 alludes to the fact that this is sub-optimal by saying

             [...] it's desirable to avoid including cruft packs in the MIDX

Thanks,
Taylor

* Re: [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-05-29  0:07   ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-05-29  0:15     ` Elijah Newren
  0 siblings, 0 replies; 105+ messages in thread
From: Elijah Newren @ 2025-05-29  0:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Wed, May 28, 2025 at 5:07 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, May 28, 2025 at 07:20:07PM -0400, Taylor Blau wrote:
> > Range-diff against v3:
>
> Hmm. My tool for submitting patches botched the range-diff here. The
> correct range-diff is:
>
> 1:  986bef29b5 ! 1:  2753e29648 pack-objects: limit scope in 'add_object_entry_from_pack()'
>     @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
>         if (p) {
>      -          struct rev_info *revs = _data;
>                 struct object_info oi = OBJECT_INFO_INIT;
>     --
>     +
>                 oi.typep = &type;
>     -+
>     -           if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>     +@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
>                         die(_("could not get type of object %s in pack %s"),
>                             oid_to_hex(oid), p->pack_name);
>                 } else if (type == OBJ_COMMIT) {
> 2:  6f8fe8a4e1 = 2:  32b49d9073 pack-objects: factor out handling '--stdin-packs'
> 3:  2a235461a6 = 3:  a797ff3a83 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
> 4:  240e90b68d = 4:  29bf05633a pack-objects: perform name-hash traversal for unpacked objects
> 5:  9a18fa2e52 = 5:  0696fa1736 pack-objects: fix typo in 'show_object_pack_hint()'
> 6:  6c997853f1 = 6:  1cc45b4472 pack-objects: swap 'show_{object,commit}_pack_hint'
> 7:  0ff699f056 = 7:  3e3d929bd0 pack-objects: introduce '--stdin-packs=follow'
> 8:  58891101f3 ! 8:  52a069ef48 repack: exclude cruft pack(s) from the MIDX where possible
>     @@ Commit message
>          MIDX with '--write-midx' to ensure that the resulting MIDX was always
>          closed under reachability in order to generate reachability bitmaps.
>
>     -    Suppose (prior to this patch) you have a once-unreachable object packed
>     -    in a cruft pack, which later on becomes reachable from one or more
>     -    objects in a geometrically repacked pack. That once-unreachable object
>     -    *won't* appear in the new pack, since the cruft pack was specified as
>     -    neither included nor excluded to 'pack-objects --stdin-packs'. If the
>     +    While the previous patch added the '--stdin-packs=follow' option to
>     +    pack-objects, it is not yet on by default. Given that, suppose you have
>     +    a once-unreachable object packed in a cruft pack, which later becomes
>     +    reachable from one or more objects in a geometrically repacked pack.
>     +    That once-unreachable object *won't* appear in the new pack, since the
>     +    cruft pack was not specified as included or excluded when the
>     +    geometrically repacked pack was created with 'pack-objects
>     +    --stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
>          new pack is included in a MIDX without the cruft pack, then trying to
>          generate bitmaps for that MIDX may fail. This happens when the bitmap
>          selection process picks one or more commits which reach the
>     -    once-unreachable objects, commit ddee3703b3 ensures that the MIDX will
>     -    be closed under reachability. Without it, we would fail to generate a
>     -    MIDX bitmap.
>     +    once-unreachable objects.
>
>     +    To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
>     +    will be closed under reachability by including cruft pack(s). If cruft
>     +    pack(s) were not included, we would fail to generate a MIDX bitmap. But
>          ddee3703b3 alludes to the fact that this is sub-optimal by saying
>
>              [...] it's desirable to avoid including cruft packs in the MIDX
>
> Thanks,
> Taylor

Thanks, this version addresses all the issues I brought up; this round
looks good to me.

* Re: [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-05-28 23:20   ` [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-06-19 11:33     ` Carlo Marcelo Arenas Belón
  2025-06-19 13:08     ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
  1 sibling, 0 replies; 105+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2025-06-19 11:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Wed, May 28, 2025 at 07:20:35PM -0400, Taylor Blau wrote:
> diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
> index 8aebfb45f5..2b0a55f8fd 100755
> --- a/t/t7704-repack-cruft.sh
> +++ b/t/t7704-repack-cruft.sh
> @@ -724,4 +724,94 @@ test_expect_success 'cruft repack respects --quiet' '
>  	)
>  '
>  
> +setup_cruft_exclude_tests() {
> +	git init "$1" &&
> +	(
> +		cd "$1" &&
> +
> +		git config repack.midxMustContainCruft false &&
> +
> +		test_commit one &&
> +
> +		test_commit --no-tag two &&
> +		two="$(git rev-parse HEAD)" &&
> +		test_commit --no-tag three &&
> +		three="$(git rev-parse HEAD)" &&
> +		git reset --hard one &&
> +		git reflog expire --all --expire=all &&
> +
> +		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
> +
> +		git merge $two &&
> +		test_commit four
> +	)
> +}
> +
> +test_expect_success 'repack --write-midx excludes cruft where possible' '
> +	setup_cruft_exclude_tests exclude-cruft-when-possible &&
> +	(
> +		cd exclude-cruft-when-possible &&
> +
> +		GIT_TEST_MULTI_PACK_INDEX=0 \
> +		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +		test-tool read-midx --show-objects $objdir >midx &&
> +		cruft="$(ls $packdir/*.mtimes)" &&
> +		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
> +
> +		git rev-list --all --objects --no-object-names >reachable.raw &&
> +		sort reachable.raw >reachable.objects &&
> +		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +
> +		test_cmp reachable.objects midx.objects
> +	)
> +'
> +
> +test_expect_success 'repack --write-midx includes cruft when instructed' '
> +	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
> +	(
> +		cd exclude-cruft-when-instructed &&
> +
> +		GIT_TEST_MULTI_PACK_INDEX=0 \
> +		git -c repack.midxMustContainCruft=true repack \
> +			-d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +		test-tool read-midx --show-objects $objdir >midx &&
> +		cruft="$(ls $packdir/*.mtimes)" &&
> +		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
> +
> +		git cat-file --batch-check="%(objectname)" --batch-all-objects \
> +			>all.objects &&
> +		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +
> +		test_cmp all.objects midx.objects
> +	)
> +'
> +
> +test_expect_success 'repack --write-midx includes cruft when necessary' '
> +	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
> +	(
> +		cd exclude-cruft-when-necessary &&
> +
> +		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
> +		ls $packdir/pack-*.idx | sort >packs.all &&
> +		grep -o "pack-.*\.idx$" packs.all >in &&

this is introducing `grep -o` to our codebase, which is not in POSIX and
therefore will not be portable (ex: AIX)

something like (untested) :

	sed -n 's/.*\(pack-.*\.idx\)$/\1/p' packs.all >in

will likely work the same and be more portable.

Carlo

* [PATCH] fixup! repack: exclude cruft pack(s) from the MIDX where possible
  2025-05-28 23:20   ` [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  2025-06-19 11:33     ` Carlo Marcelo Arenas Belón
@ 2025-06-19 13:08     ` Carlo Marcelo Arenas Belón
  2025-06-19 17:07       ` Junio C Hamano
  1 sibling, 1 reply; 105+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2025-06-19 13:08 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, gitster, newren, peff,
	Carlo Marcelo Arenas Belón

In a previous commit, `grep -o` was introduced as part of t7704.

POSIX does not define that flag, and while it is a popular one, it is
not available in at least the latest release of AIX.

Use a sed equivalent that ought to be more portable.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 t/t7704-repack-cruft.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 2b0a55f8fd..3df6b53cd3 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -795,7 +795,7 @@ test_expect_success 'repack --write-midx includes cruft when necessary' '
 
 		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
 		ls $packdir/pack-*.idx | sort >packs.all &&
-		grep -o "pack-.*\.idx$" packs.all >in &&
+		sed -n "s/.*\(pack-.*\.idx\)$/\1/p" packs.all >in &&
 
 		git multi-pack-index write --stdin-packs --bitmap <in &&
 
-- 
2.50.0.53.g63c9ac04f7


* Re: [PATCH] fixup! repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-19 13:08     ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
@ 2025-06-19 17:07       ` Junio C Hamano
  2025-06-19 23:26         ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-06-19 17:07 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón; +Cc: git, Taylor Blau, newren, peff

Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:

> In a previous commit, `grep -o` was introduced as part of t7704.
>
> POSIX does not define that flag, and while it is a popular one, it is
> not available in at least the latest release of AIX.
>
> Use a sed equivalent that ought to be more portable.

OK.  The patterns are not exactly the same but as long as we know
$packdir does *not* contain a substring "pack-", it should be OK.
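
To make the difference concrete, here is a minimal sketch. The path is
made up for illustration and is not taken from the test suite. It
contrasts the leftmost "pack-" match (what `grep -o` would report,
emulated here with parameter expansion so the sketch itself stays
POSIX) with what the sed expression captures:

```shell
# Hypothetical path: the directory name itself contains "pack-".
path="/tmp/pack-store/objects/pack/pack-1234.idx"

# Leftmost match of "pack-.*\.idx$", emulated with POSIX parameter
# expansion: strip the shortest prefix ending in "pack-", then put
# "pack-" back in front.
leftmost="pack-${path#*pack-}"

# The sed expression: the greedy leading ".*" swallows everything up
# to the last "pack-" that still lets the rest of the pattern match.
greedy=$(printf '%s\n' "$path" | sed -n 's/.*\(pack-.*\.idx\)$/\1/p')

printf 'leftmost: %s\ngreedy:   %s\n' "$leftmost" "$greedy"
```

When the directory part has no "pack-" in it, the two values are the
same.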

As the topic is not even in 'next', perhaps a refresh can squash
this change in?

>  		ls $packdir/pack-*.idx | sort >packs.all &&
> -		grep -o "pack-.*\.idx$" packs.all >in &&
> +		sed -n "s/.*\(pack-.*\.idx\)$/\1/p" packs.all >in &&
>  		git multi-pack-index write --stdin-packs --bitmap <in &&

I do not quite see the need for temporary files or "grep/sed" here,
though.

		(cd "$packdir" && ls pack-*.idx) |
		sort |
		git multi-pack-index write --stdin-packs --bitmap &&


Tangent to this discussion, but I just noticed that

    $ git multi-pack-index -h

lacks quite a lot of information.  Perhaps it needs updating?


Thanks.

* Re: [PATCH] fixup! repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-19 17:07       ` Junio C Hamano
@ 2025-06-19 23:26         ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlo Marcelo Arenas Belón, git, newren, peff

On Thu, Jun 19, 2025 at 10:07:14AM -0700, Junio C Hamano wrote:
> Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:
>
> > In a previous commit, `grep -o` was introduced as part of t7704.
> >
> > POSIX does not define that flag, and while it is a popular one, it is
> > not available in at least the latest release of AIX.
> >
> > Use a sed equivalent that ought to be more portable.
>
> OK.  The patterns are not exactly the same but as long as we know
> $packdir does *not* contain a substring "pack-", it should be OK.

Good catch; thanks, both.

> As the topic is not even in 'next', perhaps a refresh can squash
> this change in?

Yep, I'll squash this in and adjust the other "grep -o" use in this
series.

> >  		ls $packdir/pack-*.idx | sort >packs.all &&
> > -		grep -o "pack-.*\.idx$" packs.all >in &&
> > +		sed -n "s/.*\(pack-.*\.idx\)$/\1/p" packs.all >in &&
> >  		git multi-pack-index write --stdin-packs --bitmap <in &&
>
> I do not quite see the need for temporary files or "grep/sed" here,
> though.
>
> 		(cd "$packdir" && ls pack-*.idx) |
> 		sort |
> 		git multi-pack-index write --stdin-packs --bitmap &&

The grep/sed is unnecessary (I was just trying to be clever and avoid a
sub-shell), but the packs.all temporary file is still needed since we
run "wc -l" against it later on in the test script.
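
For reference, a minimal sketch of that shape, using a throwaway
directory of empty files in place of a real $packdir (the directory
layout and file names are made up for illustration):

```shell
# Throwaway stand-in for $packdir; the file names are hypothetical.
work=$(mktemp -d)
packdir="$work/pack"
mkdir "$packdir"
: >"$packdir/pack-bbb.idx"
: >"$packdir/pack-aaa.idx"
: >"$packdir/pack-aaa.mtimes"

# The cd inside the subshell makes ls print bare file names, so no
# grep/sed post-processing is needed, and packs.all is kept on disk.
(cd "$packdir" && ls pack-*.idx) | sort >"$work/packs.all"

# The saved list can then be read again, e.g. to count the packs.
count=$(wc -l <"$work/packs.all")
echo "packs listed: $((count + 0))"
```

The same file can be fed to a consumer on stdin and still be available
for any later checks in the script.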

> Tangent to this discussion, but I just noticed that
>
>     $ git multi-pack-index -h
>
> lacks quite a lot of information.  Perhaps it needs updating?

Definitely some good #leftoverbits ;-).

Thanks,
Taylor

* [PATCH v5 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (10 preceding siblings ...)
  2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-06-19 23:30 ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
                     ` (8 more replies)
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  12 siblings, 9 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

(Junio: I rebased this onto the tip of current 'master', which is
f9aa0eedb3 (Start 2.51 cycle, the first batch, 2025-06-17) at the time
of writing.)

Here is another small reroll of my series to create MIDXs that do not
include a repository's cruft pack(s).

The bulk of the series is unchanged, save for a couple of points I'll
discuss below. As usual, a complete range-diff is included below for
convenience. There are two tiny but important changes that I included
here as a result of rolling this out to GitHub's infrastructure. They
are:

 - Tolerating missing objects in the follow-on traversal. When missing
   an object in a non "follow" `--stdin-packs` traversal, we didn't
   notice it because that traversal does not add discovered objects to
   the packing list. That is not the case for `--stdin-packs=follow`
   traversals which do. This has been corrected, and is explained in
   detail in the penultimate commit.

 - The behavior when writing a MIDX after a noop repack. There are
   some cases here where we may need to include the cruft pack in the
   MIDX depending on how the existing packs are structured. This is
   corrected and explained in more detail in the final commit.
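
The noop-repack rule in the second point can be modeled roughly as
follows. This is only a sketch of the decision described in the final
commit, not code from the series: the function name and its arguments
are hypothetical, and the case where a MIDX already exists is left to
a separate check (midx_has_unknown_packs() in the patch) that is not
modeled here.

```shell
# Hypothetical model of the noop-repack rule; not code from the series.
#   $1: number of new packs written by this repack
#   $2: "yes" if a MIDX already exists, "no" otherwise
# Succeeds (exit status 0) when the MIDX being written must include
# the cruft pack(s) because of this rule alone.
noop_repack_forces_cruft () {
	test "$1" -eq 0 && test "$2" = no
}

if noop_repack_forces_cruft 0 no
then
	echo "include cruft pack(s) in the MIDX"
fi
```

The intuition is that, with no new packs and no prior MIDX, there is
no way to tell whether the existing non-cruft packs were written with
'--stdin-packs=follow', so the conservative choice is to keep the
cruft pack(s) in the MIDX.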

There is an additional change that Carlo Marcelo Arenas Belón pointed
out[1] that drops the non-portable "grep -o" invocations added by the
previous round.

I promised in the last round that I'd report any interesting findings
after deploying this to GitHub's infrastructure. Here they are:

 - On unreachable object-heavy repositories, I was able to measure
   about a 5% performance improvement across many operations.

 - GitHub's average time to repack a repository went down by about
   20% as a result of rolling this change out everywhere.

Thanks in advance for any review :-).

[1]: <lfrgmt2ukanevmcctzsnc422iv2l2nb3qmiddpsj6jnyvz4m4s@5eohhsm6knw3>

Taylor Blau (9):
  pack-objects: use standard option incompatibility functions
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: fix typo in 'show_object_pack_hint()'
  pack-objects: swap 'show_{object,commit}_pack_hint'
  pack-objects: introduce '--stdin-packs=follow'
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/config/repack.adoc    |   7 +
 Documentation/git-pack-objects.adoc |  10 +-
 builtin/pack-objects.c              | 193 ++++++++++++++++++----------
 builtin/repack.c                    | 184 +++++++++++++++++++++++---
 t/t5331-pack-objects-stdin.sh       | 122 +++++++++++++++++-
 t/t7704-repack-cruft.sh             | 145 +++++++++++++++++++++
 6 files changed, 570 insertions(+), 91 deletions(-)

Range-diff against v4:
 1:  f8b31c6a8d =  1:  ebaf47262a pack-objects: use standard option incompatibility functions
 2:  2753e29648 =  2:  eaa1f41b25 pack-objects: limit scope in 'add_object_entry_from_pack()'
 3:  32b49d9073 =  3:  8d0492a80d pack-objects: factor out handling '--stdin-packs'
 4:  a797ff3a83 =  4:  3d5d3b78b2 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
 5:  29bf05633a =  5:  2a3676cb86 pack-objects: perform name-hash traversal for unpacked objects
 6:  0696fa1736 =  6:  bcbce75695 pack-objects: fix typo in 'show_object_pack_hint()'
 7:  1cc45b4472 =  7:  c8cf316c50 pack-objects: swap 'show_{object,commit}_pack_hint'
 8:  3e3d929bd0 !  8:  b81b6213e8 pack-objects: introduce '--stdin-packs=follow'
    @@ Commit message
         copy in the cruft pack, otherwise we cannot generate reachability
         bitmaps for any commits which reach that object.
     
    +    Note that the traversal here is best-effort, similar to the existing
    +    traversal which provides name-hash hints. This means that the object
    +    traversal may hand us back a blob that does not actually exist. We
    +    *won't* see missing trees/commits with 'ignore_missing_links' because:
    +
    +     - missing commit parents are discarded at the commit traversal stage by
    +       revision.c::process_parents()
    +
    +     - missing tag objects are discarded by revision.c::handle_commit()
    +
    +     - missing tree objects are discarded by the list-objects code in
    +       list-objects.c::process_tree()
    +
    +    But we have to handle potentially-missing blobs specially by making a
    +    separate check to ensure they exist in the repository. Failing to do so
    +    would mean that we'd add an object to the packing list which doesn't
    +    actually exist, rendering us unable to write out the pack.
    +
         This prepares us for new repacking behavior which will "resurrect"
         objects found in cruft or otherwise unspecified packs when generating
         new packs. In the context of geometric repacking, this may be used to
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
     -		return;
     +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
     +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
    ++		if (object->type == OBJ_BLOB && !has_object(the_repository,
    ++							    &object->oid, 0))
    ++			return;
     +		add_object_entry(&object->oid, object->type, name, 0);
     +	} else {
     +		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with pa
      	test_cmp expected-objects actual-objects
      '
      
    -+packdir=.git/objects/pack
    ++objdir=.git/objects
    ++packdir=$objdir/pack
     +
     +objects_in_packs () {
     +	for p in "$@"
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with pa
     +		test_cmp expect actual
     +	)
     +'
    ++
    ++stdin_packs__follow_with_only () {
    ++	rm -fr stdin_packs__follow_with_only &&
    ++	git init stdin_packs__follow_with_only &&
    ++	(
    ++		cd stdin_packs__follow_with_only &&
    ++
    ++		test_commit A &&
    ++		test_commit B &&
    ++
    ++		git rev-parse "$@" >B.objects &&
    ++
    ++		echo A | git pack-objects --revs $packdir/pack &&
    ++		B="$(git pack-objects $packdir/pack <B.objects)" &&
    ++
    ++		git cat-file --batch-check="%(objectname)" --batch-all-objects >objs &&
    ++		for obj in $(cat objs)
    ++		do
    ++			rm -f $objdir/$(test_oid_to_path $obj) || return 1
    ++		done &&
    ++
    ++		( cd $packdir && ls pack-*.pack ) >in &&
    ++		git pack-objects --stdin-packs=follow --stdout >/dev/null <in
    ++	)
    ++}
    ++
    ++test_expect_success '--stdin-packs=follow tolerates missing blobs' '
    ++	stdin_packs__follow_with_only HEAD HEAD^{tree}
    ++'
    ++
    ++test_expect_success '--stdin-packs=follow tolerates missing trees' '
    ++	stdin_packs__follow_with_only HEAD HEAD:B.t
    ++'
    ++
    ++test_expect_success '--stdin-packs=follow tolerates missing commits' '
    ++	stdin_packs__follow_with_only HEAD HEAD^{tree}
    ++'
     +
      test_done
 9:  52a069ef48 !  9:  6487001f64 repack: exclude cruft pack(s) from the MIDX where possible
    @@ Commit message
         of subsequently generated packs from geometric repacking *is* closed
         under reachability.
     
    +    (One exception here is when "starting from scratch" results in a noop
    +    repack, e.g., because the non-cruft pack(s) in a repository already form
    +    a geometric progression. Since we can't tell whether or not those were
    +    generated with '--stdin-packs=follow', they may depend on
    +    once-unreachable objects, so we have to include the cruft pack in the
    +    MIDX in this case.)
    +
         Detect when this is the case and avoid including cruft packs in the MIDX
         where possible. The existing behavior remains the default, and the new
         behavior is available with the config 'repack.midxMustContainCruft' set
    @@ builtin/repack.c: int cmd_repack(int argc,
      	} else {
      		strvec_push(&cmd.args, "--unpacked");
     @@ builtin/repack.c: int cmd_repack(int argc,
    + 	if (ret)
    + 		goto cleanup;
    + 
    +-	if (!names.nr && !po_args.quiet)
    +-		printf_ln(_("Nothing new to pack."));
    ++	if (!names.nr) {
    ++		if (!po_args.quiet)
    ++			printf_ln(_("Nothing new to pack."));
    ++		/*
    ++		 * If we didn't write any new packs, the non-cruft packs
    ++		 * may refer to once-unreachable objects in the cruft
    ++		 * pack(s).
    ++		 *
    ++		 * If there isn't already a MIDX, the one we write
    ++		 * must include the cruft pack(s), in case the
    ++		 * non-cruft pack(s) refer to once-cruft objects.
    ++		 *
    ++		 * If there is already a MIDX, we can punt here, since
    ++		 * midx_has_unknown_packs() will make the decision for
    ++		 * us.
    ++		 */
    ++		if (!get_local_multi_pack_index(the_repository))
    ++			midx_must_contain_cruft = 1;
    ++	}
    + 
    + 	if (pack_everything & PACK_CRUFT) {
    + 		const char *pack_prefix = find_pack_prefix(packdir, packtmp);
    +@@ builtin/repack.c: int cmd_repack(int argc,
      
      	string_list_sort(&names);
      
    @@ t/t7704-repack-cruft.sh: test_expect_success 'cruft repack respects --quiet' '
     +		cd exclude-cruft-when-necessary &&
     +
     +		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
    -+		ls $packdir/pack-*.idx | sort >packs.all &&
    -+		grep -o "pack-.*\.idx$" packs.all >in &&
    -+
    -+		git multi-pack-index write --stdin-packs --bitmap <in &&
    ++		( cd $packdir && ls pack-*.idx ) | sort >packs.all &&
    ++		git multi-pack-index write --stdin-packs --bitmap <packs.all &&
     +
     +		test_commit five &&
     +		GIT_TEST_MULTI_PACK_INDEX=0 \
    @@ t/t7704-repack-cruft.sh: test_expect_success 'cruft repack respects --quiet' '
     +		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
     +	)
     +'
    ++
    ++test_expect_success 'repack --write-midx includes cruft when already geometric' '
    ++	git init repack--write-midx-geometric-noop &&
    ++	(
    ++		cd repack--write-midx-geometric-noop &&
    ++
    ++		git branch -M main &&
    ++		test_commit A &&
    ++		test_commit B &&
    ++
    ++		git checkout -B side &&
    ++		test_commit --no-tag C &&
    ++		C="$(git rev-parse HEAD)" &&
    ++
    ++		git checkout main &&
    ++		git branch -D side &&
    ++		git reflog expire --all --expire=all &&
    ++
    ++		# At this point we have two packs: one containing the
    ++		# objects belonging to commits A and B, and another
    ++		# (cruft) pack containing the objects belonging to
    ++		# commit C.
    ++		git repack --cruft -d &&
    ++
    ++		# Create a third pack which contains a merge commit
    ++		# making commit C reachable again.
    ++		#
    ++		# --no-ff is important here, as it ensures that we
    ++		# actually write a new object and subsequently a new
    ++		# pack to contain it.
    ++		git merge --no-ff $C &&
    ++		git repack -d &&
    ++
    ++		ls $packdir/pack-*.idx | sort >packs.all &&
    ++		cruft="$(ls $packdir/pack-*.mtimes)" &&
    ++		cruft="${cruft%.mtimes}.idx" &&
    ++
    ++		for idx in $(grep -v $cruft <packs.all)
    ++		do
    ++			git show-index <$idx >out &&
    ++			wc -l <out || return 1
    ++		done >sizes.raw &&
    ++
    ++		# Make sure that there are two non-cruft packs, and
    ++		# that one of them contains at least twice as many
    ++		# objects as the other, ensuring that they are already
    ++		# in a geometric progression.
    ++		sort -n sizes.raw >sizes &&
    ++		test_line_count = 2 sizes &&
    ++		s1=$(head -n 1 sizes) &&
    ++		s2=$(tail -n 1 sizes) &&
    ++		test "$s2" -gt "$((2 * $s1))" &&
    ++
    ++		git -c repack.midxMustContainCruft=false repack --geometric=2 \
    ++			--write-midx --write-bitmap-index
    ++	)
    ++'
     +
      test_done

base-commit: f9aa0eedb37eb94d9d3711ef0d565fd7cb3b6148
-- 
2.50.0.61.gf819b10624.dirty

* [PATCH v5 1/9] pack-objects: use standard option incompatibility functions
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c        | 20 +++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 67941c8a60..e7274e0e00 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5010,9 +5010,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -5050,13 +5051,14 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+				  filter_options.choice, "--filter");
+
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -5064,8 +5066,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..8fd07deb8d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
 	)
 '
 
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7274e0e00..d04a36a6bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3725,7 +3725,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
 		struct object_info oi = OBJECT_INFO_INIT;
 
 		oi.typep = &type;
@@ -3733,6 +3732,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 3/9] pack-objects: factor out handling '--stdin-packs'
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d04a36a6bf..7ce04b71dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3909,6 +3909,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -4004,7 +4015,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -5135,11 +5145,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (2 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (neither included nor excluded)
pack.

Move the code which initializes and performs the reachability traversal
out of the callee and into the caller, which prepares us to share this
code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7ce04b71dd..4258ac1792 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3793,7 +3793,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3801,24 +3801,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3888,10 +3870,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3903,21 +3919,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (3 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.
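
The callback shape can be sketched in isolation. Everything below is a
toy stand-in rather than git code: 'struct toy_revs' plays the role of
'rev_info', and toy_add_loose_object() mirrors only the part of
add_loose_object() that matters here. Every loose object is packed, but
only commits become traversal tips, and only when the caller passes a
non-NULL context.

```c
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

/* Toy stand-in for 'struct rev_info': just a list of pending tips. */
struct toy_revs {
	int pending[16];
	size_t nr;
};

/* Number of objects added to the (toy) output pack. */
static size_t toy_packed_nr;

/*
 * Pack every loose object; additionally queue commits as traversal
 * tips when a traversal context was supplied via 'data'.
 */
static int toy_add_loose_object(int id, enum toy_type type, void *data)
{
	struct toy_revs *revs = data;

	toy_packed_nr++;

	if (revs && type == TOY_COMMIT &&
	    revs->nr < sizeof(revs->pending) / sizeof(revs->pending[0]))
		revs->pending[revs->nr++] = id;

	return 0;
}
```

Callers that have no traversal to feed (the cruft and
'--pack-loose-unreachable' paths in the patch below) pass NULL and keep
their existing behavior.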

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4258ac1792..3437dbd7f1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3879,7 +3879,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3906,7 +3906,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -4025,7 +4025,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4303,8 +4303,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4325,6 +4326,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4333,11 +4338,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4684,7 +4688,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 6/9] pack-objects: fix typo in 'show_object_pack_hint()'
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (4 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3437dbd7f1..9580b4ea1a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3767,7 +3767,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	 * would typically pick up during a reachability traversal.
 	 *
 	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
+	 * fields here in order to perhaps improve the delta selection
 	 * process.
 	 */
 	oe->hash = pack_name_hash_fn(name);
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 7/9] pack-objects: swap 'show_{object,commit}_pack_hint'
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (5 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
  2025-06-19 23:30   ` [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

show_commit_pack_hint() has heretofore been a noop, so the only
constraint on its position within its compilation unit is that it
appear before its first use.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9580b4ea1a..f44447a3f9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3748,12 +3748,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data UNUSED)
 {
@@ -3776,6 +3770,12 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	stdin_packs_hints_nr++;
 }
 
+static void show_commit_pack_hint(struct commit *commit UNUSED,
+				  void *data UNUSED)
+{
+	/* nothing to do; commits don't have a namehash */
+}
+
 static int pack_mtime_cmp(const void *_a, const void *_b)
 {
 	struct packed_git *a = ((const struct string_list_item*)_a)->util;
-- 
2.50.0.61.gf819b10624.dirty


* [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (6 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-20 15:27     ` Junio C Hamano
  2025-06-19 23:30   ` [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is that if you have a
once-unreachable object in a cruft pack which later becomes reachable
via one or more commits in a pack generated with '--stdin-packs', you
*have* to include that object in the MIDX via the copy in the cruft
pack; otherwise we cannot generate reachability bitmaps for any commits
which reach that object.

Note that the traversal performed by '--stdin-packs=follow' is
best-effort, similar to the existing traversal which provides name-hash
hints. This means that the object traversal may hand us back a blob
that does not actually exist. We *won't* see missing trees/commits with
'ignore_missing_links' because:

 - missing commit parents are discarded at the commit traversal stage by
   revision.c::process_parents()

 - missing tag objects are discarded by revision.c::handle_commit()

 - missing tree objects are discarded by the list-objects code in
   list-objects.c::process_tree()

But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.
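
The per-object decision described above can be sketched as follows.
This is a toy model, not the code in this patch: 'struct toy_object'
and its 'exists' flag are assumptions standing in for an object and an
existence lookup, and toy_should_add() only answers whether the
traversal callback would add the object to the packing list.

```c
#include <stddef.h>

enum toy_mode { TOY_MODE_STANDARD, TOY_MODE_FOLLOW };
enum toy_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB };

struct toy_object {
	enum toy_kind kind;
	int exists; /* stand-in for an object-existence lookup */
};

/*
 * Returns 1 if the traversal callback should add 'obj' to the packing
 * list, and 0 otherwise.
 */
static int toy_should_add(enum toy_mode mode, const struct toy_object *obj)
{
	if (mode != TOY_MODE_FOLLOW)
		return 0; /* standard mode only refreshes delta hints */
	if (obj->kind == KIND_BLOB && !obj->exists)
		return 0; /* missing blob: skip it rather than fail later */
	return 1;
}
```

Only blobs need the explicit existence check in this sketch, since the
list above explains why missing commits, tags, and trees never reach the
callback.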

This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.
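
To make "closed under reachability" concrete, here is a toy check over
a tiny object graph. The types and the function name are made up for
this sketch and do not exist in git. A set is closed when every edge
that starts inside the set also ends inside it.

```c
#include <stddef.h>

/* A reference from one object to another (e.g. commit -> tree). */
struct toy_edge {
	int from;
	int to;
};

/*
 * Returns 1 if the set is closed under reachability, 0 otherwise.
 * 'in_set' is indexed by object id; non-zero means "in the set".
 */
static int toy_is_closed(const int *in_set,
			 const struct toy_edge *edges, size_t nr_edges)
{
	size_t i;

	for (i = 0; i < nr_edges; i++)
		if (in_set[edges[i].from] && !in_set[edges[i].to])
			return 0;
	return 1;
}
```

With edges commit(0) -> tree(1) -> blob(2), the set {0, 1} is not
closed if the blob lives only in a pack outside the set, which is the
cruft-pack situation described earlier. Adding the blob to the set
makes it closed again.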

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  10 ++-
 builtin/pack-objects.c              |  86 +++++++++++++++-----
 t/t5331-pack-objects-stdin.sh       | 120 ++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index b1c5aa27da..eba014c406 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,21 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", objects from packs not listed on stdin receive
+special treatment. Objects within unlisted packs will be included if
+those objects are (1) reachable from the included packs, and (2) not
+found in any excluded packs. This mode is useful, for example, to
+resurrect once-unreachable objects found in cruft packs to generate
+packs which are closed under reachability up to the boundary set by the
+excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f44447a3f9..d51fe9c820 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -284,6 +284,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3749,31 +3755,47 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 }
 
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
-		return;
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		if (object->type == OBJ_BLOB && !has_object(the_repository,
+							    &object->oid, 0))
+			return;
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * fields here in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't have
+		 * a non-zero hash field that you would typically pick
+		 * up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash and
+		 * ->no_try_delta fields here in order to perhaps
+		 * improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
 
-	stdin_packs_hints_nr++;
+		stdin_packs_hints_nr++;
+	}
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
+static void show_commit_pack_hint(struct commit *commit, void *data)
 {
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
+		return;
+	}
+
 	/* nothing to do; commits don't have a namehash */
+
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3881,7 +3903,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3913,7 +3935,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4795,6 +4817,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4805,7 +4844,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4860,6 +4899,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -5150,7 +5192,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 8fd07deb8d..4a8df5a389 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,124 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+objdir=.git/objects
+packdir=$objdir/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+		test_commit E &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
+stdin_packs__follow_with_only () {
+	rm -fr stdin_packs__follow_with_only &&
+	git init stdin_packs__follow_with_only &&
+	(
+		cd stdin_packs__follow_with_only &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-parse "$@" >B.objects &&
+
+		echo A | git pack-objects --revs $packdir/pack &&
+		B="$(git pack-objects $packdir/pack <B.objects)" &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects >objs &&
+		for obj in $(cat objs)
+		do
+			rm -f $objdir/$(test_oid_to_path $obj) || return 1
+		done &&
+
+		( cd $packdir && ls pack-*.pack ) >in &&
+		git pack-objects --stdin-packs=follow --stdout >/dev/null <in
+	)
+}
+
+test_expect_success '--stdin-packs=follow tolerates missing blobs' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing trees' '
+	stdin_packs__follow_with_only HEAD HEAD:B.t
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing commits' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
 test_done
-- 
2.50.0.61.gf819b10624.dirty



* [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
                     ` (7 preceding siblings ...)
  2025-06-19 23:30   ` [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-06-19 23:30   ` Taylor Blau
  2025-06-21  4:35     ` Jeff King
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-19 23:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.
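
As a rough illustration of what "closed under reachability" means here,
consider the toy sketch below. It is not Git's implementation: objects
are modeled as small integers, `struct edge` and `is_closed()` are
hypothetical names, and `in_set[]` marks which objects appear in the
packs being considered.

```c
#include <stddef.h>

/* Toy model (hypothetical, not Git's data structures): an edge says
 * that object `from` refers directly to object `to`. */
struct edge {
	int from;
	int to;
};

/*
 * Return 1 if the set described by in_set[] is closed under
 * reachability: every object referred to by a member of the set is
 * itself a member. Checking direct references is enough, since
 * closure over single edges implies closure over whole paths.
 */
int is_closed(const struct edge *edges, size_t nr,
	      const unsigned char *in_set)
{
	for (size_t i = 0; i < nr; i++)
		if (in_set[edges[i].from] && !in_set[edges[i].to])
			return 0;
	return 1;
}
```

In the scenario described earlier, the new pack holds an object on the
`from` side of an edge whose `to` side lives only in the cruft pack, so
the check fails for the non-cruft packs alone. With
'--stdin-packs=follow', a copy of the `to` object is written into the
new pack and the check passes.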

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.
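
The tell-tale check described above can be sketched roughly as follows.
This is a simplified, hypothetical model rather than the code in
builtin/repack.c: pack names are plain strings, and `kept` and
`rolled_up` stand in for the packs carried into the new MIDX verbatim
and the packs combined by the geometric repack, respectively.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears in the list `names` of length `nr`. */
int contains(const char *const *names, size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(names[i], name))
			return 1;
	return 0;
}

/*
 * Hypothetical simplification of the check: a pack in the old MIDX is
 * "known" if it is carried over verbatim (kept[]) or was rolled up
 * into a new pack by the geometric repack (rolled_up[]).
 */
int has_unknown_pack(const char *const *old_midx, size_t old_nr,
		     const char *const *kept, size_t kept_nr,
		     const char *const *rolled_up, size_t rolled_nr)
{
	for (size_t i = 0; i < old_nr; i++) {
		if (contains(kept, kept_nr, old_midx[i]))
			continue;
		if (contains(rolled_up, rolled_nr, old_midx[i]))
			continue;
		return 1; /* the old MIDX had a pack we cannot account for */
	}
	return 0;
}
```

A non-zero result corresponds to the case where the cruft pack(s) have
to stay in the MIDX.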

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)
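
For reference, "already form a geometric progression" can be pictured
with the sketch below. It follows the documented description of
'--geometric=<n>' (each successive pack contains at least <n> times as
many objects as the next-largest pack) rather than the exact logic in
builtin/repack.c, and `is_geometric()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the object counts, sorted in ascending order, already
 * form a geometric progression: each pack holds at least `factor`
 * times as many objects as the pack before it.
 */
int is_geometric(const uint32_t *counts, size_t nr, uint32_t factor)
{
	for (size_t i = 1; i < nr; i++)
		if ((uint64_t)counts[i] < (uint64_t)factor * counts[i - 1])
			return 0;
	return 1;
}
```

When a check like this holds for every non-cruft pack, the geometric
repack has nothing to combine, which is the noop case mentioned above.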

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustContainCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 184 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          | 145 ++++++++++++++++++++++++
 3 files changed, 316 insertions(+), 20 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index 5ddc6e7f95..346d44fbcd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -108,6 +109,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -690,6 +695,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -758,6 +834,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -811,26 +889,56 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of the bitmap's selected commits reaches a
+		 * once-cruft object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 *  since no packs are marked as pending
+			 *  deletion (since we only call
+			 *  `mark_packs_for_deletion()` when doing an
+			 *  all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git (with the appropriate
+		 * configuration setting) will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1145,6 +1253,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1361,7 +1471,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1401,8 +1514,25 @@ int cmd_repack(int argc,
 	if (ret)
 		goto cleanup;
 
-	if (!names.nr && !po_args.quiet)
-		printf_ln(_("Nothing new to pack."));
+	if (!names.nr) {
+		if (!po_args.quiet)
+			printf_ln(_("Nothing new to pack."));
+		/*
+		 * If we didn't write any new packs, the non-cruft packs
+		 * may refer to once-unreachable objects in the cruft
+		 * pack(s).
+		 *
+		 * If there isn't already a MIDX, the one we write
+		 * must include the cruft pack(s), in case the
+		 * non-cruft pack(s) refer to once-cruft objects.
+		 *
+		 * If there is already a MIDX, we can punt here, since
+		 * midx_has_unknown_packs() will make the decision for
+		 * us.
+		 */
+		if (!get_local_multi_pack_index(the_repository))
+			midx_must_contain_cruft = 1;
+	}
 
 	if (pack_everything & PACK_CRUFT) {
 		const char *pack_prefix = find_pack_prefix(packdir, packtmp);
@@ -1483,6 +1613,16 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		uint32_t i;
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names, m->num_packs);
+		for (i = 0; i < m->num_packs; i++)
+			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1524,7 +1664,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1575,6 +1716,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..aa2e2e6ad8 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,149 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		( cd $packdir && ls pack-*.idx ) | sort >packs.all &&
+		git multi-pack-index write --stdin-packs --bitmap <packs.all &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when already geometric' '
+	git init repack--write-midx-geometric-noop &&
+	(
+		cd repack--write-midx-geometric-noop &&
+
+		git branch -M main &&
+		test_commit A &&
+		test_commit B &&
+
+		git checkout -B side &&
+		test_commit --no-tag C &&
+		C="$(git rev-parse HEAD)" &&
+
+		git checkout main &&
+		git branch -D side &&
+		git reflog expire --all --expire=all &&
+
+		# At this point we have two packs: one containing the
+		# objects belonging to commits A and B, and another
+		# (cruft) pack containing the objects belonging to
+		# commit C.
+		git repack --cruft -d &&
+
+		# Create a third pack which contains a merge commit
+		# making commit C reachable again.
+		#
+		# --no-ff is important here, as it ensures that we
+		# actually write a new object and subsequently a new
+		# pack to contain it.
+		git merge --no-ff $C &&
+		git repack -d &&
+
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		cruft="$(ls $packdir/pack-*.mtimes)" &&
+		cruft="${cruft%.mtimes}.idx" &&
+
+		for idx in $(grep -v $cruft <packs.all)
+		do
+			git show-index <$idx >out &&
+			wc -l <out || return 1
+		done >sizes.raw &&
+
+		# Make sure that there are two non-cruft packs, and
+		# that one of them contains at least twice as many
+		# objects as the other, ensuring that they are already
+		# in a geometric progression.
+		sort -n sizes.raw >sizes &&
+		test_line_count = 2 sizes &&
+		s1=$(head -n 1 sizes) &&
+		s2=$(tail -n 1 sizes) &&
+		test "$s2" -gt "$((2 * $s1))" &&
+
+		git -c repack.midxMustContainCruft=false repack --geometric=2 \
+			--write-midx --write-bitmap-index
+	)
+'
+
 test_done
-- 
2.50.0.61.gf819b10624.dirty


* Re: [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-06-19 23:30   ` [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-06-20 15:27     ` Junio C Hamano
  0 siblings, 0 replies; 105+ messages in thread
From: Junio C Hamano @ 2025-06-20 15:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +		if (object->type == OBJ_BLOB && !has_object(the_repository,
> +							    &object->oid, 0))
> +			return;

Sorry for making a comment that is not about the contents of the
patch, but since we were discussing clang-format elsewhere, and this
happens to be a case the tool gets it right, the above should read
more like:

		if (object->type == OBJ_BLOB &&
		    !has_object(the_repository, &object->oid, 0))
			return;

cf. Documentation/CodingGuidelines

 - When splitting a long logical line, with everything else being
   equal, it is preferable to split after the operator at higher
   level in the parse tree.




* Re: [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-19 23:30   ` [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
@ 2025-06-21  4:35     ` Jeff King
  2025-06-23 18:47       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Jeff King @ 2025-06-21  4:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Jun 19, 2025 at 07:30:33PM -0400, Taylor Blau wrote:

> +test_expect_success 'repack --write-midx excludes cruft where possible' '
> +	setup_cruft_exclude_tests exclude-cruft-when-possible &&
> +	(
> +		cd exclude-cruft-when-possible &&
> +
> +		GIT_TEST_MULTI_PACK_INDEX=0 \
> +		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
> +
> +		test-tool read-midx --show-objects $objdir >midx &&
> +		cruft="$(ls $packdir/*.mtimes)" &&
> +		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
> +
> +		git rev-list --all --objects --no-object-names >reachable.raw &&
> +		sort reachable.raw >reachable.objects &&
> +		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> +
> +		test_cmp reachable.objects midx.objects
> +	)
> +'

This test (but none of the others) fails when run with:

  GIT_TEST_MULTI_PACK_INDEX=1 \
  GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1 \
  ./t7704-repack-cruft.sh

The culprit is the incremental flag, but you need the first one for the
second to do anything. The issue is that the cruft pack unexpectedly
appears in the midx:

  error: '! grep pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.idx midx' did find a match in:
  header: 4d494458 1 20 6 3
  chunks: pack-names oid-fanout oid-lookup object-offsets
  num_objects: 12
  packs:
  pack-110f8bab659db6e691a75b6462d043214fd1da92.idx
  pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.idx
  pack-91db7b4b856b00c3e675824c5bc5389b6810037a.idx
  object-dir: .git/objects
  07d4aa2eb79f3a92e1dadaee6ef6b883cdbba641 12	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack
  139b20d8e6c5b496de61f033f642d0e3dbff528d 114	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack
  25e18d2c3e3563b690593dcce936302010e6aa7e 12	.git/objects/pack/pack-110f8bab659db6e691a75b6462d043214fd1da92.pack
  2bdf67abb163a4ffb2d7f3f0880c9fe5068ce782 270	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack
  2f00a404aed7e63d867313d504bd0fccea53fd25 285	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack
  5626abf0f72e58d7a153368ba57db4c673c0e171 182	.git/objects/pack/pack-91db7b4b856b00c3e675824c5bc5389b6810037a.pack
  7c7cd714e262561f73f3079dfca4e8724682ac21 358	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack
  8510665149157c2bc901848c3e0b746954e9cbd9 271	.git/objects/pack/pack-110f8bab659db6e691a75b6462d043214fd1da92.pack
  a7cddf35737959e1438bc929b665619e9e79bfee 138	.git/objects/pack/pack-91db7b4b856b00c3e675824c5bc5389b6810037a.pack
  d79ce1670bdcb76e6d1da2ae095e890ccb326ae9 12	.git/objects/pack/pack-91db7b4b856b00c3e675824c5bc5389b6810037a.pack
  db6165b80a148f78daad30f4e29c7b77fe8f04c2 169	.git/objects/pack/pack-110f8bab659db6e691a75b6462d043214fd1da92.pack
  f719efd430d52bcfc8566a43b2eb655688d38871 517	.git/objects/pack/pack-45dcff625845dc0ad702f91d853d0950f9be0eb9.pack

I'm not sure if it's just a funky interaction with the hacky GIT_TEST_*
variables, or if it's a real bug.

-Peff


* Re: [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-21  4:35     ` Jeff King
@ 2025-06-23 18:47       ` Taylor Blau
  2025-06-24 10:54         ` Jeff King
  0 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 18:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Sat, Jun 21, 2025 at 12:35:51AM -0400, Jeff King wrote:
> On Thu, Jun 19, 2025 at 07:30:33PM -0400, Taylor Blau wrote:
>
> > +test_expect_success 'repack --write-midx excludes cruft where possible' '
> > +	setup_cruft_exclude_tests exclude-cruft-when-possible &&
> > +	(
> > +		cd exclude-cruft-when-possible &&
> > +
> > +		GIT_TEST_MULTI_PACK_INDEX=0 \
> > +		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
> > +
> > +		test-tool read-midx --show-objects $objdir >midx &&
> > +		cruft="$(ls $packdir/*.mtimes)" &&
> > +		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
> > +
> > +		git rev-list --all --objects --no-object-names >reachable.raw &&
> > +		sort reachable.raw >reachable.objects &&
> > +		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
> > +
> > +		test_cmp reachable.objects midx.objects
> > +	)
> > +'
>
> This test (but none of the others) fails when run with:
>
>   GIT_TEST_MULTI_PACK_INDEX=1 \
>   GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1 \
>   ./t7704-repack-cruft.sh
>
> The culprit is the incremental flag, but you need the first one for the
> second to do anything. The issue is that the cruft pack unexpectedly
> appears in the midx:
>
> [...]
>
> I'm not sure if it's just a funky interaction with the hacky GIT_TEST_*
> variables, or if it's a real bug.

Thanks for spotting. This is definitely a real bug. The root cause here
is that our loop to gather the set of packs we know are in the MIDX does
not account for multi-layered / incremental MIDXs.

In our example, if there's a cruft pack in any other layer of a MIDX
besides the tip, the proposed implementation here won't realize it, and
will thus (incorrectly) conclude that the cruft pack is not already in
the MIDX and can therefore be omitted.
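
To picture the failure mode, here is a toy model of a layered MIDX
chain. `struct layer` and `collect_pack_names()` are hypothetical
stand-ins, not Git's types; the point is only that stopping at the tip
misses any pack recorded in an older layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of an incremental MIDX chain. */
struct layer {
	const char **pack_names;
	uint32_t num_packs;
	struct layer *base; /* next-older layer, or NULL at the bottom */
};

/*
 * Copy the pack names from every layer, starting at the tip and
 * following `base` pointers, into `out`. Returns the number of names
 * written; `out` must have room for all of them.
 */
size_t collect_pack_names(const struct layer *tip, const char **out)
{
	size_t nr = 0;

	for (const struct layer *l = tip; l; l = l->base)
		for (uint32_t i = 0; i < l->num_packs; i++)
			out[nr++] = l->pack_names[i];
	return nr;
}
```

With a cruft pack recorded only in the base layer, a walk limited to
the tip would return two names in a three-pack chain, and the cruft
pack would look like it was never in the MIDX.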

If we do this on top:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index 346d44fbcd..8d1540a0fd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1614,13 +1614,16 @@ int cmd_repack(int argc,
 	string_list_sort(&names);

 	if (get_local_multi_pack_index(the_repository)) {
-		uint32_t i;
 		struct multi_pack_index *m =
 			get_local_multi_pack_index(the_repository);

-		ALLOC_ARRAY(midx_pack_names, m->num_packs);
-		for (i = 0; i < m->num_packs; i++)
-			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
+		ALLOC_ARRAY(midx_pack_names,
+			    m->num_packs + m->num_packs_in_base);
+
+		for (; m; m = m->base_midx)
+			for (uint32_t i = 0; i < m->num_packs; i++)
+				midx_pack_names[midx_pack_names_nr++] =
+					xstrdup(m->pack_names[i]);
 	}

 	close_object_store(the_repository->objects);
--- >8 ---

(Note that this assumes that 'tb/prepare-midx-pack-cleanup' has not
landed. If it had, this could be simplified a little bit with the
nth_midx_pack_names() function. I kept these two separate so they could
proceed independently.)

Things work as expected. I'll send out a new round with this fix
incorporated, as well as a fix for the style issue that Junio noted
earlier in this thread.

Thanks,
Taylor


* [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) where possible
  2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
                   ` (11 preceding siblings ...)
  2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
@ 2025-06-23 22:32 ` Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
                     ` (8 more replies)
  12 siblings, 9 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Here is an additional reroll of my series to create MIDXs that do not
include a repository's cruft pack(s).

Nearly everything is identical between this version and the previous
(v5), with two exceptions:

 - Adjusted where to split a long line in show_object_pack_hint().

 - Fixed a test failure with GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL.

Thanks to Junio and Peff (respectively) for pointing out each of the
above. As usual, a range-diff is attached for convenience.

Thanks in advance for any review :-).

Taylor Blau (9):
  pack-objects: use standard option incompatibility functions
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: fix typo in 'show_object_pack_hint()'
  pack-objects: swap 'show_{object,commit}_pack_hint'
  pack-objects: introduce '--stdin-packs=follow'
  repack: exclude cruft pack(s) from the MIDX where possible

 Documentation/config/repack.adoc    |   7 +
 Documentation/git-pack-objects.adoc |  10 +-
 builtin/pack-objects.c              | 193 ++++++++++++++++++----------
 builtin/repack.c                    | 187 ++++++++++++++++++++++++---
 t/t5331-pack-objects-stdin.sh       | 122 +++++++++++++++++-
 t/t7704-repack-cruft.sh             | 145 +++++++++++++++++++++
 6 files changed, 573 insertions(+), 91 deletions(-)

Range-diff against v5:
 1:  19fab7a35c =  1:  8e7b2dacc7 pack-objects: use standard option incompatibility functions
 2:  6f2d3f17a4 =  2:  86fb36d317 pack-objects: limit scope in 'add_object_entry_from_pack()'
 3:  c06f5b264a =  3:  19e8c789e9 pack-objects: factor out handling '--stdin-packs'
 4:  40d7d87cb1 =  4:  c9f874eb94 pack-objects: declare 'rev_info' for '--stdin-packs' earlier
 5:  5e2599436c =  5:  6b0149a32d pack-objects: perform name-hash traversal for unpacked objects
 6:  3a5c3f63d8 =  6:  f31dd00a98 pack-objects: fix typo in 'show_object_pack_hint()'
 7:  796e8743f8 =  7:  5d15055985 pack-objects: swap 'show_{object,commit}_pack_hint'
 8:  8830775beb !  8:  3699c25337 pack-objects: introduce '--stdin-packs=follow'
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
     -		return;
     +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
     +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
    -+		if (object->type == OBJ_BLOB && !has_object(the_repository,
    -+							    &object->oid, 0))
    ++		if (object->type == OBJ_BLOB &&
    ++		    !has_object(the_repository, &object->oid, 0))
     +			return;
     +		add_object_entry(&object->oid, object->type, name, 0);
     +	} else {
 9:  8f505179cc !  9:  f519777059 repack: exclude cruft pack(s) from the MIDX where possible
    @@ builtin/repack.c: int cmd_repack(int argc,
      	string_list_sort(&names);
      
     +	if (get_local_multi_pack_index(the_repository)) {
    -+		uint32_t i;
     +		struct multi_pack_index *m =
     +			get_local_multi_pack_index(the_repository);
     +
    -+		ALLOC_ARRAY(midx_pack_names, m->num_packs);
    -+		for (i = 0; i < m->num_packs; i++)
    -+			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
    ++		ALLOC_ARRAY(midx_pack_names,
    ++			    m->num_packs + m->num_packs_in_base);
    ++
    ++		for (; m; m = m->base_midx)
    ++			for (uint32_t i = 0; i < m->num_packs; i++)
    ++				midx_pack_names[midx_pack_names_nr++] =
    ++					xstrdup(m->pack_names[i]);
     +	}
     +
      	close_object_store(the_repository->objects);

base-commit: f9aa0eedb37eb94d9d3711ef0d565fd7cb3b6148
-- 
2.50.0.61.g1981e40f2d

* [PATCH v6 1/9] pack-objects: use standard option incompatibility functions
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-24 15:52     ` Junio C Hamano
  2025-06-23 22:32   ` [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.
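
As a rough illustration of the pattern (not Git's actual
implementation), a two-option incompatibility check can be sketched as
a helper that produces the standardized message whenever both options
are set. The name incompatible_opt2_msg() and its buffer-based
interface are assumptions made for this sketch; the real
die_for_incompatible_opt2() dies rather than returning.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in used only for illustration. Writes the
 * standardized message into 'buf' and returns 1 when both options are
 * set; returns 0 and leaves 'buf' empty otherwise.
 */
int incompatible_opt2_msg(char *buf, size_t len,
			  int opt1, const char *opt1_name,
			  int opt2, const char *opt2_name)
{
	if (len)
		buf[0] = '\0';
	if (!opt1 || !opt2)
		return 0;
	snprintf(buf, len, "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

Routing every pair through one helper is what makes the wording
uniform: the caller supplies only the flags and their names, never the
message text.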

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c        | 20 +++++++++++---------
 t/t5331-pack-objects-stdin.sh |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 67941c8a60..e7274e0e00 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5010,9 +5010,10 @@ int cmd_pack_objects(int argc,
 		strvec_push(&rp, "--unpacked");
 	}
 
-	if (exclude_promisor_objects && exclude_promisor_objects_best_effort)
-		die(_("options '%s' and '%s' cannot be used together"),
-		    "--exclude-promisor-objects", "--exclude-promisor-objects-best-effort");
+	die_for_incompatible_opt2(exclude_promisor_objects,
+				  "--exclude-promisor-objects",
+				  exclude_promisor_objects_best_effort,
+				  "--exclude-promisor-objects-best-effort");
 	if (exclude_promisor_objects) {
 		use_internal_rev_list = 1;
 		fetch_if_missing = 0;
@@ -5050,13 +5051,14 @@ int cmd_pack_objects(int argc,
 	if (!pack_to_stdout && thin)
 		die(_("--thin cannot be used to build an indexable pack"));
 
-	if (keep_unreachable && unpack_unreachable)
-		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
+	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
+				  unpack_unreachable, "--unpack-unreachable");
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (stdin_packs && filter_options.choice)
-		die(_("cannot use --filter with --stdin-packs"));
+	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+				  filter_options.choice, "--filter");
+
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
@@ -5064,8 +5066,8 @@ int cmd_pack_objects(int argc,
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-		if (stdin_packs)
-			die(_("cannot use --stdin-packs with --cruft"));
+		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
+					  cruft, "--cruft");
 	}
 
 	/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index b48c0cbe8f..8fd07deb8d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
 		cd stdin-packs &&
 		test_must_fail git pack-objects --stdin-packs --stdout \
 			--filter=blob:none </dev/null 2>err &&
-		test_grep "cannot use --filter with --stdin-packs" err
+		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err
 	)
 '
 
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 22:49     ` Junio C Hamano
  2025-06-23 22:32   ` [PATCH v6 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7274e0e00..d04a36a6bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3725,7 +3725,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 		return 0;
 
 	if (p) {
-		struct rev_info *revs = _data;
 		struct object_info oi = OBJECT_INFO_INIT;
 
 		oi.typep = &type;
@@ -3733,6 +3732,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 			die(_("could not get type of object %s in pack %s"),
 			    oid_to_hex(oid), p->pack_name);
 		} else if (type == OBJ_COMMIT) {
+			struct rev_info *revs = _data;
 			/*
 			 * commits in included packs are used as starting points for the
 			 * subsequent revision walk
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 3/9] pack-objects: factor out handling '--stdin-packs'
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d04a36a6bf..7ce04b71dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3909,6 +3909,17 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin();
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+}
+
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
 				   struct packed_git *pack, off_t offset,
 				   const char *name, uint32_t mtime)
@@ -4004,7 +4015,6 @@ static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
 	}
 }
 
-static void add_unreachable_loose_objects(void);
 static void add_objects_in_unpacked_packs(void);
 
 static void enumerate_cruft_objects(void)
@@ -5135,11 +5145,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		/* avoids adding objects in excluded packs */
-		ignore_packed_keep_in_core = 1;
-		read_packs_list_from_stdin();
-		if (rev_list_unpacked)
-			add_unreachable_loose_objects();
+		read_stdin_packs(rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (2 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 22:59     ` Junio C Hamano
  2025-06-23 22:32   ` [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (not included nor excluded) pack.

Move the code which initializes and performs the reachability
traversal out of the callee and into the caller, which prepares us to
share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7ce04b71dd..4258ac1792 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3793,7 +3793,7 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
 		return 0;
 }
 
-static void read_packs_list_from_stdin(void)
+static void read_packs_list_from_stdin(struct rev_info *revs)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct string_list include_packs = STRING_LIST_INIT_DUP;
@@ -3801,24 +3801,6 @@ static void read_packs_list_from_stdin(void)
 	struct string_list_item *item = NULL;
 
 	struct packed_git *p;
-	struct rev_info revs;
-
-	repo_init_revisions(the_repository, &revs, NULL);
-	/*
-	 * Use a revision walk to fill in the namehash of objects in the include
-	 * packs. To save time, we'll avoid traversing through objects that are
-	 * in excluded packs.
-	 *
-	 * That may cause us to avoid populating all of the namehash fields of
-	 * all included objects, but our goal is best-effort, since this is only
-	 * an optimization during delta selection.
-	 */
-	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-	revs.blob_objects = 1;
-	revs.tree_objects = 1;
-	revs.tag_objects = 1;
-	revs.ignore_missing_links = 1;
 
 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
@@ -3888,10 +3870,44 @@ static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		for_each_object_in_pack(p,
 					add_object_entry_from_pack,
-					&revs,
+					revs,
 					FOR_EACH_OBJECT_PACK_ORDER);
 	}
 
+	strbuf_release(&buf);
+	string_list_clear(&include_packs, 0);
+	string_list_clear(&exclude_packs, 0);
+}
+
+static void add_unreachable_loose_objects(void);
+
+static void read_stdin_packs(int rev_list_unpacked)
+{
+	struct rev_info revs;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	/*
+	 * Use a revision walk to fill in the namehash of objects in the include
+	 * packs. To save time, we'll avoid traversing through objects that are
+	 * in excluded packs.
+	 *
+	 * That may cause us to avoid populating all of the namehash fields of
+	 * all included objects, but our goal is best-effort, since this is only
+	 * an optimization during delta selection.
+	 */
+	revs.no_kept_objects = 1;
+	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.blob_objects = 1;
+	revs.tree_objects = 1;
+	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;
+
+	/* avoids adding objects in excluded packs */
+	ignore_packed_keep_in_core = 1;
+	read_packs_list_from_stdin(&revs);
+	if (rev_list_unpacked)
+		add_unreachable_loose_objects();
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 	traverse_commit_list(&revs,
@@ -3903,21 +3919,6 @@ static void read_packs_list_from_stdin(void)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
-
-	strbuf_release(&buf);
-	string_list_clear(&include_packs, 0);
-	string_list_clear(&exclude_packs, 0);
-}
-
-static void add_unreachable_loose_objects(void);
-
-static void read_stdin_packs(int rev_list_unpacked)
-{
-	/* avoids adding objects in excluded packs */
-	ignore_packed_keep_in_core = 1;
-	read_packs_list_from_stdin();
-	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (3 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 23:08     ` Junio C Hamano
  2025-06-23 22:32   ` [PATCH v6 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_unreachable_loose_objects()` and add any loose commits as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.
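
The shape of the change can be sketched with a toy callback that takes
an optional context pointer. This is only an illustration under assumed
names (toy_type, toy_tips, toy_add_loose_object() are all hypothetical
and unrelated to Git's real types): every loose object is counted as
packed, but only commits are recorded as traversal tips, and only when
the caller supplied somewhere to record them.

```c
#include <stddef.h>

/* Hypothetical object kinds for this sketch. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

/* Hypothetical fixed-size list of traversal tips. */
struct toy_tips {
	int ids[16];
	size_t nr;
};

/*
 * Count the object as packed. If 'tips' is non-NULL and the object is
 * a commit, also remember it as a starting point for a later walk.
 * Passing NULL keeps the old behavior of packing without recording.
 */
int toy_add_loose_object(int id, enum toy_type type,
			 struct toy_tips *tips, size_t *packed_nr)
{
	(*packed_nr)++;

	if (tips && type == TOY_COMMIT &&
	    tips->nr < sizeof(tips->ids) / sizeof(tips->ids[0]))
		tips->ids[tips->nr++] = id;

	return 0;
}
```

Callers that do not perform a traversal simply pass NULL for the
context, which is the role played by the NULL arguments in the diff
below.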

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4258ac1792..3437dbd7f1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3879,7 +3879,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 	string_list_clear(&exclude_packs, 0);
 }
 
-static void add_unreachable_loose_objects(void);
+static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(int rev_list_unpacked)
 {
@@ -3906,7 +3906,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	ignore_packed_keep_in_core = 1;
 	read_packs_list_from_stdin(&revs);
 	if (rev_list_unpacked)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -4025,7 +4025,7 @@ static void enumerate_cruft_objects(void)
 						_("Enumerating cruft objects"), 0);
 
 	add_objects_in_unpacked_packs();
-	add_unreachable_loose_objects();
+	add_unreachable_loose_objects(NULL);
 
 	stop_progress(&progress_state);
 }
@@ -4303,8 +4303,9 @@ static void add_objects_in_unpacked_packs(void)
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,
-			    void *data UNUSED)
+			    void *data)
 {
+	struct rev_info *revs = data;
 	enum object_type type = oid_object_info(the_repository, oid, NULL);
 
 	if (type < 0) {
@@ -4325,6 +4326,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 	} else {
 		add_object_entry(oid, type, "", 0);
 	}
+
+	if (revs && type == OBJ_COMMIT)
+		add_pending_oid(revs, NULL, oid, 0);
+
 	return 0;
 }
 
@@ -4333,11 +4338,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
  * add_object_entry will weed out duplicates, so we just add every
  * loose object we find.
  */
-static void add_unreachable_loose_objects(void)
+static void add_unreachable_loose_objects(struct rev_info *revs)
 {
 	for_each_loose_file_in_objdir(repo_get_object_directory(the_repository),
-				      add_loose_object,
-				      NULL, NULL, NULL);
+				      add_loose_object, NULL, NULL, revs);
 }
 
 static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid)
@@ -4684,7 +4688,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs();
 	if (pack_loose_unreachable)
-		add_unreachable_loose_objects();
+		add_unreachable_loose_objects(NULL);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects();
 
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 6/9] pack-objects: fix typo in 'show_object_pack_hint()'
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (4 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3437dbd7f1..9580b4ea1a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3767,7 +3767,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	 * would typically pick up during a reachability traversal.
 	 *
 	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * here using a now in order to perhaps improve the delta selection
+	 * fields here in order to perhaps improve the delta selection
 	 * process.
 	 */
 	oe->hash = pack_name_hash_fn(name);
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 7/9] pack-objects: swap 'show_{object,commit}_pack_hint'
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (5 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
  2025-06-23 22:32   ` [PATCH v6 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

show_commit_pack_hint() has heretofore been a noop, so it only needs to
be defined before its first use within its compilation unit.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9580b4ea1a..f44447a3f9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3748,12 +3748,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 	return 0;
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
-{
-	/* nothing to do; commits don't have a namehash */
-}
-
 static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data UNUSED)
 {
@@ -3776,6 +3770,12 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	stdin_packs_hints_nr++;
 }
 
+static void show_commit_pack_hint(struct commit *commit UNUSED,
+				  void *data UNUSED)
+{
+	/* nothing to do; commits don't have a namehash */
+}
+
 static int pack_mtime_cmp(const void *_a, const void *_b)
 {
 	struct packed_git *a = ((const struct string_list_item*)_a)->util;
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (6 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  2025-06-23 23:35     ` Junio C Hamano
  2025-06-23 22:32   ` [PATCH v6 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
  8 siblings, 1 reply; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is that if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack. Otherwise, we cannot generate reachability
bitmaps for any commits which reach that object.

Note that the traversal performed by '--stdin-packs=follow' is
best-effort, similar to the existing traversal which provides name-hash
hints. This means that the object traversal may hand us back a blob that
does not actually exist. We *won't* see missing trees/commits with
'ignore_missing_links' because:

 - missing commit parents are discarded at the commit traversal stage by
   revision.c::process_parents()

 - missing tag objects are discarded by revision.c::handle_commit()

 - missing tree objects are discarded by the list-objects code in
   list-objects.c::process_tree()

But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.

This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.
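
As a toy model of the selection rule (an illustrative sketch only, not
the pack-objects implementation): objects are small integers, each
stored in exactly one pack that is either included, excluded, or
unlisted, and the walk is assumed to stop at objects found in excluded
packs. All names here (toy_pack, toy_repo, toy_select(), ...) are
hypothetical.

```c
#include <stddef.h>

enum toy_pack { TOY_INCLUDED, TOY_EXCLUDED, TOY_UNKNOWN };

#define TOY_MAX 8

struct toy_repo {
	size_t nr;                  /* number of objects, at most TOY_MAX */
	enum toy_pack pack[TOY_MAX]; /* kind of pack holding object i */
	int edge[TOY_MAX][TOY_MAX]; /* edge[a][b] != 0: a references b */
};

/* Depth-first walk that selects what it visits, halting at excluded objects. */
static void toy_walk(const struct toy_repo *r, size_t obj,
		     int *out, int *seen)
{
	size_t i;

	if (seen[obj])
		return;
	seen[obj] = 1;

	if (r->pack[obj] == TOY_EXCLUDED)
		return; /* boundary: neither selected nor walked through */

	out[obj] = 1;
	for (i = 0; i < r->nr; i++)
		if (r->edge[obj][i])
			toy_walk(r, i, out, seen);
}

/*
 * Fill out[i] with 1 for each selected object. Without 'follow', only
 * objects in included packs are selected. With 'follow', objects in
 * unlisted packs that are reachable from included objects are
 * selected as well.
 */
void toy_select(const struct toy_repo *r, int follow, int *out)
{
	int seen[TOY_MAX] = { 0 };
	size_t i;

	for (i = 0; i < r->nr; i++)
		out[i] = (r->pack[i] == TOY_INCLUDED);

	if (!follow)
		return;

	for (i = 0; i < r->nr; i++)
		if (r->pack[i] == TOY_INCLUDED)
			toy_walk(r, i, out, seen);
}
```

With a history A <- B <- C <- D where B and D are included, C is
excluded, and A is unlisted, toy_select() with follow=0 selects B and
D, and with follow=1 it additionally selects A.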

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  10 ++-
 builtin/pack-objects.c              |  86 +++++++++++++++-----
 t/t5331-pack-objects-stdin.sh       | 120 ++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index b1c5aa27da..eba014c406 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -87,13 +87,21 @@ base-name::
 	reference was included in the resulting packfile.  This
 	can be useful to send new tags to native Git clients.
 
---stdin-packs::
+--stdin-packs[=<mode>]::
 	Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
 	from the standard input, instead of object names or revision
 	arguments. The resulting pack contains all objects listed in the
 	included packs (those not beginning with `^`), excluding any
 	objects listed in the excluded packs (beginning with `^`).
 +
+When `mode` is "follow", objects from packs not listed on stdin receive
+special treatment. Objects within unlisted packs will be included if
+those objects are (1) reachable from the included packs, and (2) not
+found in any excluded packs. This mode is useful, for example, to
+resurrect once-unreachable objects found in cruft packs to generate
+packs which are closed under reachability up to the boundary set by the
+excluded packs.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f44447a3f9..4ae52c6a29 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -284,6 +284,12 @@ static struct oidmap configured_exclusions;
 static struct oidset excluded_by_config;
 static int name_hash_version = -1;
 
+enum stdin_packs_mode {
+	STDIN_PACKS_MODE_NONE,
+	STDIN_PACKS_MODE_STANDARD,
+	STDIN_PACKS_MODE_FOLLOW,
+};
+
 /**
  * Check whether the name_hash_version chosen by user input is appropriate,
  * and also validate whether it is compatible with other features.
@@ -3749,31 +3755,47 @@ static int add_object_entry_from_pack(const struct object_id *oid,
 }
 
 static void show_object_pack_hint(struct object *object, const char *name,
-				  void *data UNUSED)
+				  void *data)
 {
-	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
-	if (!oe)
-		return;
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		if (object->type == OBJ_BLOB &&
+		    !has_object(the_repository, &object->oid, 0))
+			return;
+		add_object_entry(&object->oid, object->type, name, 0);
+	} else {
+		struct object_entry *oe = packlist_find(&to_pack, &object->oid);
+		if (!oe)
+			return;
 
-	/*
-	 * Our 'to_pack' list was constructed by iterating all objects packed in
-	 * included packs, and so doesn't have a non-zero hash field that you
-	 * would typically pick up during a reachability traversal.
-	 *
-	 * Make a best-effort attempt to fill in the ->hash and ->no_try_delta
-	 * fields here in order to perhaps improve the delta selection
-	 * process.
-	 */
-	oe->hash = pack_name_hash_fn(name);
-	oe->no_try_delta = name && no_try_delta(name);
+		/*
+		 * Our 'to_pack' list was constructed by iterating all
+		 * objects packed in included packs, and so doesn't have
+		 * a non-zero hash field that you would typically pick
+		 * up during a reachability traversal.
+		 *
+		 * Make a best-effort attempt to fill in the ->hash and
+		 * ->no_try_delta fields here in order to perhaps
+		 * improve the delta selection process.
+		 */
+		oe->hash = pack_name_hash_fn(name);
+		oe->no_try_delta = name && no_try_delta(name);
 
-	stdin_packs_hints_nr++;
+		stdin_packs_hints_nr++;
+	}
 }
 
-static void show_commit_pack_hint(struct commit *commit UNUSED,
-				  void *data UNUSED)
+static void show_commit_pack_hint(struct commit *commit, void *data)
 {
+	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
+
+	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+		show_object_pack_hint((struct object *)commit, "", data);
+		return;
+	}
+
 	/* nothing to do; commits don't have a namehash */
+
 }
 
 static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3881,7 +3903,7 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
 
 static void add_unreachable_loose_objects(struct rev_info *revs);
 
-static void read_stdin_packs(int rev_list_unpacked)
+static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
 	struct rev_info revs;
 
@@ -3913,7 +3935,7 @@ static void read_stdin_packs(int rev_list_unpacked)
 	traverse_commit_list(&revs,
 			     show_commit_pack_hint,
 			     show_object_pack_hint,
-			     NULL);
+			     &mode);
 
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
 			   stdin_packs_found_nr);
@@ -4795,6 +4817,23 @@ static int is_not_in_promisor_pack(struct commit *commit, void *data) {
 	return is_not_in_promisor_pack_obj((struct object *) commit, data);
 }
 
+static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
+				  int unset)
+{
+	enum stdin_packs_mode *mode = opt->value;
+
+	if (unset)
+		*mode = STDIN_PACKS_MODE_NONE;
+	else if (!arg || !*arg)
+		*mode = STDIN_PACKS_MODE_STANDARD;
+	else if (!strcmp(arg, "follow"))
+		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else
+		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
+
+	return 0;
+}
+
 int cmd_pack_objects(int argc,
 		     const char **argv,
 		     const char *prefix,
@@ -4805,7 +4844,7 @@ int cmd_pack_objects(int argc,
 	struct strvec rp = STRVEC_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
-	int stdin_packs = 0;
+	enum stdin_packs_mode stdin_packs = STDIN_PACKS_MODE_NONE;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
@@ -4860,6 +4899,9 @@ int cmd_pack_objects(int argc,
 		OPT_SET_INT_F(0, "indexed-objects", &rev_list_index,
 			      N_("include objects referred to by the index"),
 			      1, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
+			     N_("read packs from stdin"),
+			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
 		OPT_BOOL(0, "stdin-packs", &stdin_packs,
 			 N_("read packs from stdin")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
@@ -5150,7 +5192,7 @@ int cmd_pack_objects(int argc,
 		progress_state = start_progress(the_repository,
 						_("Enumerating objects"), 0);
 	if (stdin_packs) {
-		read_stdin_packs(rev_list_unpacked);
+		read_stdin_packs(stdin_packs, rev_list_unpacked);
 	} else if (cruft) {
 		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 8fd07deb8d..4a8df5a389 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -236,4 +236,124 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate
 	test_cmp expected-objects actual-objects
 '
 
+objdir=.git/objects
+packdir=$objdir/pack
+
+objects_in_packs () {
+	for p in "$@"
+	do
+		git show-index <"$packdir/pack-$p.idx" || return 1
+	done >objects.raw &&
+
+	cut -d' ' -f2 objects.raw | sort &&
+	rm -f objects.raw
+}
+
+test_expect_success '--stdin-packs=follow walks into unknown packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+
+		for c in A B C D
+		do
+			test_commit "$c" || return 1
+		done &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+		test_commit E &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$B.pack
+		^pack-$C.pack
+		pack-$D.pack
+		EOF
+
+		# With just --stdin-packs, pack "A" is unknown to us, so
+		# only objects from packs "B" and "D" are included in
+		# the output pack.
+		P=$(git pack-objects --stdin-packs $packdir/pack <in) &&
+		objects_in_packs $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# But with --stdin-packs=follow, objects from both
+		# included packs reach objects from the unknown pack, so
+		# objects from pack "A" are included in the output pack
+		# in addition to the above.
+		P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+		objects_in_packs $A $B $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual &&
+
+		# And with --unpacked, we will pick up objects from unknown
+		# packs that are reachable from loose objects. Loose object E
+		# reaches objects in pack A, but there are three excluded packs
+		# in between.
+		#
+		# The resulting pack should include objects reachable from E
+		# that are not present in packs B, C, or D, along with those
+		# present in pack A.
+		cat >in <<-EOF &&
+		^pack-$B.pack
+		^pack-$C.pack
+		^pack-$D.pack
+		EOF
+
+		P=$(git pack-objects --stdin-packs=follow --unpacked \
+			$packdir/pack <in) &&
+
+		{
+			objects_in_packs $A &&
+			git rev-list --objects --no-object-names D..E
+		}>expect.raw &&
+		sort expect.raw >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
+stdin_packs__follow_with_only () {
+	rm -fr stdin_packs__follow_with_only &&
+	git init stdin_packs__follow_with_only &&
+	(
+		cd stdin_packs__follow_with_only &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-parse "$@" >B.objects &&
+
+		echo A | git pack-objects --revs $packdir/pack &&
+		B="$(git pack-objects $packdir/pack <B.objects)" &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects >objs &&
+		for obj in $(cat objs)
+		do
+			rm -f $objdir/$(test_oid_to_path $obj) || return 1
+		done &&
+
+		( cd $packdir && ls pack-*.pack ) >in &&
+		git pack-objects --stdin-packs=follow --stdout >/dev/null <in
+	)
+}
+
+test_expect_success '--stdin-packs=follow tolerates missing blobs' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing trees' '
+	stdin_packs__follow_with_only HEAD HEAD:B.t
+'
+
+test_expect_success '--stdin-packs=follow tolerates missing commits' '
+	stdin_packs__follow_with_only HEAD HEAD^{tree}
+'
+
 test_done
-- 
2.50.0.61.g1981e40f2d


* [PATCH v6 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
                     ` (7 preceding siblings ...)
  2025-06-23 22:32   ` [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-06-23 22:32   ` Taylor Blau
  8 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-23 22:32 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.
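As a rough illustration only (this is not the code in this patch, and
the pack and file names below are made up), the "unknown pack" check can
be sketched with plain sorted lists. Any pack named in the old MIDX that
is neither kept verbatim nor rolled up counts as unknown, and its
presence means the cruft pack(s) should stay in the MIDX:

```shell
# Sketch only: the real check lives in builtin/repack.c. Both
# arguments are hypothetical files listing one pack name per line.
midx_has_unknown_packs () {
	unknown=
	tmp=$(mktemp -d) || return 2

	# comm(1) needs both inputs sorted the same way.
	if ! sort "$1" >"$tmp/old" || ! sort "$2" >"$tmp/new"
	then
		rm -rf "$tmp"
		return 2
	fi

	# "comm -23" keeps only the names unique to the old MIDX.
	unknown=$(comm -23 "$tmp/old" "$tmp/new")
	rm -rf "$tmp"

	# Succeed (0) when at least one old pack is unaccounted for.
	test -n "$unknown"
}

# Hypothetical inputs: pack-c is in the old MIDX but is neither
# included in nor rolled up into the new one.
work=$(mktemp -d)
printf "%s\n" pack-a.idx pack-b.idx pack-c.idx >"$work/old-midx"
printf "%s\n" pack-a.idx pack-b.idx >"$work/represented"

if midx_has_unknown_packs "$work/old-midx" "$work/represented"
then
	echo "unknown pack in old MIDX: keep cruft pack(s)"
else
	echo "every pack accounted for: cruft pack(s) may be omitted"
fi
```

The sketch compares names only; the patch itself additionally
distinguishes packs below the geometric split line from packs in the
existing non-kept list.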

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustContainCruft' set
to 'false'.
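
For reference, opting in might look like the following sketch. It uses a
throwaway repository created with mktemp (an assumption made purely for
illustration); the repack invocation in the trailing comment mirrors the
one used in the t7704 tests in this patch.

```shell
# Sketch: flip the new knob in a scratch repository and read it back.
repo=$(mktemp -d) &&
git init -q "$repo" &&
git -C "$repo" config repack.midxMustContainCruft false &&
git -C "$repo" config --get repack.midxMustContainCruft

# A later geometric repack would then be run as, e.g.:
#   git -C "$repo" repack -d --geometric=2 --write-midx --write-bitmap-index
```

Leaving the setting unset (or set to true) keeps the existing behavior
of always adding cruft pack(s) to the MIDX.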

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/repack.adoc |   7 ++
 builtin/repack.c                 | 187 +++++++++++++++++++++++++++----
 t/t7704-repack-cruft.sh          | 145 ++++++++++++++++++++++++
 3 files changed, 319 insertions(+), 20 deletions(-)

diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index c79af6d7b8..e9e78dcb19 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -39,3 +39,10 @@ repack.cruftThreads::
 	a cruft pack and the respective parameters are not given over
 	the command line. See similarly named `pack.*` configuration
 	variables for defaults and meaning.
+
+repack.midxMustContainCruft::
+	When set to true, linkgit:git-repack[1] will unconditionally include
+	cruft pack(s), if any, in the multi-pack index when invoked with
+	`--write-midx`. When false, cruft packs are only included in the MIDX
+	when necessary (e.g., because they might be required to form a
+	reachability closure with MIDX bitmaps). Defaults to true.
diff --git a/builtin/repack.c b/builtin/repack.c
index 5ddc6e7f95..8d1540a0fd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,6 +39,7 @@ static int write_bitmaps = -1;
 static int use_delta_islands;
 static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
+static int midx_must_contain_cruft = 1;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
@@ -108,6 +109,10 @@ static int repack_config(const char *var, const char *value,
 		free(cruft_po_args->threads);
 		return git_config_string(&cruft_po_args->threads, var, value);
 	}
+	if (!strcmp(var, "repack.midxmustcontaincruft")) {
+		midx_must_contain_cruft = git_config_bool(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, ctx, cb);
 }
 
@@ -690,6 +695,77 @@ static void free_pack_geometry(struct pack_geometry *geometry)
 	free(geometry->pack);
 }
 
+static int midx_has_unknown_packs(char **midx_pack_names,
+				  size_t midx_pack_names_nr,
+				  struct string_list *include,
+				  struct pack_geometry *geometry,
+				  struct existing_packs *existing)
+{
+	size_t i;
+
+	string_list_sort(include);
+
+	for (i = 0; i < midx_pack_names_nr; i++) {
+		const char *pack_name = midx_pack_names[i];
+
+		/*
+		 * Determine whether or not each MIDX'd pack from the existing
+		 * MIDX (if any) is represented in the new MIDX. For each pack
+		 * in the MIDX, it must either be:
+		 *
+		 *  - In the "include" list of packs to be included in the new
+		 *    MIDX. Note this function is called before the include
+		 *    list is populated with any cruft pack(s).
+		 *
+		 *  - Below the geometric split line (if using pack geometry),
+		 *    indicating that the pack won't be included in the new
+		 *    MIDX, but its contents were rolled up as part of the
+		 *    geometric repack.
+		 *
+		 *  - In the existing non-kept packs list (if not using pack
+		 *    geometry), and marked as non-deleted.
+		 */
+		if (string_list_has_string(include, pack_name)) {
+			continue;
+		} else if (geometry) {
+			struct strbuf buf = STRBUF_INIT;
+			uint32_t j;
+
+			for (j = 0; j < geometry->split; j++) {
+				strbuf_reset(&buf);
+				strbuf_addstr(&buf, pack_basename(geometry->pack[j]));
+				strbuf_strip_suffix(&buf, ".pack");
+				strbuf_addstr(&buf, ".idx");
+
+				if (!strcmp(pack_name, buf.buf)) {
+					strbuf_release(&buf);
+					break;
+				}
+			}
+
+			strbuf_release(&buf);
+
+			if (j < geometry->split)
+				continue;
+		} else {
+			struct string_list_item *item;
+
+			item = string_list_lookup(&existing->non_kept_packs,
+						  pack_name);
+			if (item && !pack_is_marked_for_deletion(item))
+				continue;
+		}
+
+		/*
+		 * If we got to this point, the MIDX includes some pack that we
+		 * don't know about.
+		 */
+		return 1;
+	}
+
+	return 0;
+}
+
 struct midx_snapshot_ref_data {
 	struct tempfile *f;
 	struct oidset seen;
@@ -758,6 +834,8 @@ static void midx_snapshot_refs(struct tempfile *f)
 
 static void midx_included_packs(struct string_list *include,
 				struct existing_packs *existing,
+				char **midx_pack_names,
+				size_t midx_pack_names_nr,
 				struct string_list *names,
 				struct pack_geometry *geometry)
 {
@@ -811,26 +889,56 @@ static void midx_included_packs(struct string_list *include,
 		}
 	}
 
-	for_each_string_list_item(item, &existing->cruft_packs) {
+	if (midx_must_contain_cruft ||
+	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
+				   include, geometry, existing)) {
 		/*
-		 * When doing a --geometric repack, there is no need to check
-		 * for deleted packs, since we're by definition not doing an
-		 * ALL_INTO_ONE repack (hence no packs will be deleted).
-		 * Otherwise we must check for and exclude any packs which are
-		 * enqueued for deletion.
+		 * If there are one or more unknown pack(s) present (see
+		 * midx_has_unknown_packs() for what makes a pack
+		 * "unknown") in the MIDX before the repack, keep them
+		 * as they may be required to form a reachability
+		 * closure if the MIDX is bitmapped.
 		 *
-		 * So we could omit the conditional below in the --geometric
-		 * case, but doing so is unnecessary since no packs are marked
-		 * as pending deletion (since we only call
-		 * `mark_packs_for_deletion()` when doing an all-into-one
-		 * repack).
+		 * For example, a cruft pack can be required to form a
+		 * reachability closure if the MIDX is bitmapped and one
+		 * or more of the bitmap's selected commits reaches a
+		 * once-cruft object that was later made reachable.
 		 */
-		if (pack_is_marked_for_deletion(item))
-			continue;
+		for_each_string_list_item(item, &existing->cruft_packs) {
+			/*
+			 * When doing a --geometric repack, there is no
+			 * need to check for deleted packs, since we're
+			 * by definition not doing an ALL_INTO_ONE
+			 * repack (hence no packs will be deleted).
+			 * Otherwise we must check for and exclude any
+			 * packs which are enqueued for deletion.
+			 *
+			 * So we could omit the conditional below in the
+			 * --geometric case, but doing so is unnecessary
+			 *  since no packs are marked as pending
+			 *  deletion (since we only call
+			 *  `mark_packs_for_deletion()` when doing an
+			 *  all-into-one repack).
+			 */
+			if (pack_is_marked_for_deletion(item))
+				continue;
 
-		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s.idx", item->string);
-		string_list_insert(include, buf.buf);
+			strbuf_reset(&buf);
+			strbuf_addf(&buf, "%s.idx", item->string);
+			string_list_insert(include, buf.buf);
+		}
+	} else {
+		/*
+		 * Modern versions of Git (with the appropriate
+		 * configuration setting) will write new copies of
+		 * once-cruft objects when doing a --geometric repack.
+		 *
+		 * If the MIDX has no cruft pack, new packs written
+		 * during a --geometric repack will not rely on the
+		 * cruft pack to form a reachability closure, so we can
+		 * avoid including them in the MIDX in that case.
+		 */
+		;
 	}
 
 	strbuf_release(&buf);
@@ -1145,6 +1253,8 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	char **midx_pack_names = NULL;
+	size_t midx_pack_names_nr = 0;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1361,7 +1471,10 @@ int cmd_repack(int argc,
 		    !(pack_everything & PACK_CRUFT))
 			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
-		strvec_push(&cmd.args, "--stdin-packs");
+		if (midx_must_contain_cruft)
+			strvec_push(&cmd.args, "--stdin-packs");
+		else
+			strvec_push(&cmd.args, "--stdin-packs=follow");
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -1401,8 +1514,25 @@ int cmd_repack(int argc,
 	if (ret)
 		goto cleanup;
 
-	if (!names.nr && !po_args.quiet)
-		printf_ln(_("Nothing new to pack."));
+	if (!names.nr) {
+		if (!po_args.quiet)
+			printf_ln(_("Nothing new to pack."));
+		/*
+		 * If we didn't write any new packs, the non-cruft packs
+		 * may refer to once-unreachable objects in the cruft
+		 * pack(s).
+		 *
+		 * If there isn't already a MIDX, the one we write
+		 * must include the cruft pack(s), in case the
+		 * non-cruft pack(s) refer to once-cruft objects.
+		 *
+		 * If there is already a MIDX, we can punt here, since
+		 * midx_has_unknown_packs() will make the decision for
+		 * us.
+		 */
+		if (!get_local_multi_pack_index(the_repository))
+			midx_must_contain_cruft = 1;
+	}
 
 	if (pack_everything & PACK_CRUFT) {
 		const char *pack_prefix = find_pack_prefix(packdir, packtmp);
@@ -1483,6 +1613,19 @@ int cmd_repack(int argc,
 
 	string_list_sort(&names);
 
+	if (get_local_multi_pack_index(the_repository)) {
+		struct multi_pack_index *m =
+			get_local_multi_pack_index(the_repository);
+
+		ALLOC_ARRAY(midx_pack_names,
+			    m->num_packs + m->num_packs_in_base);
+
+		for (; m; m = m->base_midx)
+			for (uint32_t i = 0; i < m->num_packs; i++)
+				midx_pack_names[midx_pack_names_nr++] =
+					xstrdup(m->pack_names[i]);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
@@ -1524,7 +1667,8 @@ int cmd_repack(int argc,
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_DUP;
-		midx_included_packs(&include, &existing, &names, &geometry);
+		midx_included_packs(&include, &existing, midx_pack_names,
+				    midx_pack_names_nr, &names, &geometry);
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
@@ -1575,6 +1719,9 @@ int cmd_repack(int argc,
 	string_list_clear(&names, 1);
 	existing_packs_release(&existing);
 	free_pack_geometry(&geometry);
+	for (size_t i = 0; i < midx_pack_names_nr; i++)
+		free(midx_pack_names[i]);
+	free(midx_pack_names);
 	pack_objects_args_release(&po_args);
 	pack_objects_args_release(&cruft_po_args);
 
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 8aebfb45f5..aa2e2e6ad8 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -724,4 +724,149 @@ test_expect_success 'cruft repack respects --quiet' '
 	)
 '
 
+setup_cruft_exclude_tests() {
+	git init "$1" &&
+	(
+		cd "$1" &&
+
+		git config repack.midxMustContainCruft false &&
+
+		test_commit one &&
+
+		test_commit --no-tag two &&
+		two="$(git rev-parse HEAD)" &&
+		test_commit --no-tag three &&
+		three="$(git rev-parse HEAD)" &&
+		git reset --hard one &&
+		git reflog expire --all --expire=all &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack --cruft -d &&
+
+		git merge $two &&
+		test_commit four
+	)
+}
+
+test_expect_success 'repack --write-midx excludes cruft where possible' '
+	setup_cruft_exclude_tests exclude-cruft-when-possible &&
+	(
+		cd exclude-cruft-when-possible &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep ! "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git rev-list --all --objects --no-object-names >reachable.raw &&
+		sort reachable.raw >reachable.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp reachable.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when instructed' '
+	setup_cruft_exclude_tests exclude-cruft-when-instructed &&
+	(
+		cd exclude-cruft-when-instructed &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git -c repack.midxMustContainCruft=true repack \
+			-d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		cruft="$(ls $packdir/*.mtimes)" &&
+		test_grep "$(basename "$cruft" .mtimes).idx" midx &&
+
+		git cat-file --batch-check="%(objectname)" --batch-all-objects \
+			>all.objects &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+
+		test_cmp all.objects midx.objects
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when necessary' '
+	setup_cruft_exclude_tests exclude-cruft-when-necessary &&
+	(
+		cd exclude-cruft-when-necessary &&
+
+		test_path_is_file $(ls $packdir/pack-*.mtimes) &&
+		( cd $packdir && ls pack-*.idx ) | sort >packs.all &&
+		git multi-pack-index write --stdin-packs --bitmap <packs.all &&
+
+		test_commit five &&
+		GIT_TEST_MULTI_PACK_INDEX=0 \
+		git repack -d --geometric=2 --write-midx --write-bitmap-index &&
+
+		test-tool read-midx --show-objects $objdir >midx &&
+		awk "/\.pack$/ { print \$1 }" <midx | sort >midx.objects &&
+		git cat-file --batch-all-objects --batch-check="%(objectname)" \
+			>expect.objects &&
+		test_cmp expect.objects midx.objects &&
+
+		grep "^pack-" midx >midx.packs &&
+		test_line_count = "$(($(wc -l <packs.all) + 1))" midx.packs
+	)
+'
+
+test_expect_success 'repack --write-midx includes cruft when already geometric' '
+	git init repack--write-midx-geometric-noop &&
+	(
+		cd repack--write-midx-geometric-noop &&
+
+		git branch -M main &&
+		test_commit A &&
+		test_commit B &&
+
+		git checkout -B side &&
+		test_commit --no-tag C &&
+		C="$(git rev-parse HEAD)" &&
+
+		git checkout main &&
+		git branch -D side &&
+		git reflog expire --all --expire=all &&
+
+		# At this point we have two packs: one containing the
+		# objects belonging to commits A and B, and another
+		# (cruft) pack containing the objects belonging to
+		# commit C.
+		git repack --cruft -d &&
+
+		# Create a third pack which contains a merge commit
+		# making commit C reachable again.
+		#
+		# --no-ff is important here, as it ensures that we
+		# actually write a new object and subsequently a new
+		# pack to contain it.
+		git merge --no-ff $C &&
+		git repack -d &&
+
+		ls $packdir/pack-*.idx | sort >packs.all &&
+		cruft="$(ls $packdir/pack-*.mtimes)" &&
+		cruft="${cruft%.mtimes}.idx" &&
+
+		for idx in $(grep -v $cruft <packs.all)
+		do
+			git show-index <$idx >out &&
+			wc -l <out || return 1
+		done >sizes.raw &&
+
+		# Make sure that there are two non-cruft packs, and
+		# that one of them contains at least twice as many
+		# objects as the other, ensuring that they are already
+		# in a geometric progression.
+		sort -n sizes.raw >sizes &&
+		test_line_count = 2 sizes &&
+		s1=$(head -n 1 sizes) &&
+		s2=$(tail -n 1 sizes) &&
+		test "$s2" -gt "$((2 * $s1))" &&
+
+		git -c repack.midxMustContainCruft=false repack --geometric=2 \
+			--write-midx --write-bitmap-index
+	)
+'
+
 test_done
-- 
2.50.0.61.g1981e40f2d

* Re: [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()'
  2025-06-23 22:32   ` [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
@ 2025-06-23 22:49     ` Junio C Hamano
  0 siblings, 0 replies; 105+ messages in thread
From: Junio C Hamano @ 2025-06-23 22:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In add_object_entry_from_pack() we declare 'revs' (given to us through
> the miscellaneous context argument) earlier in the "if (p)" conditional
> than is necessary.  Move it down as far as it can go to reduce its
> scope.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e7274e0e00..d04a36a6bf 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3725,7 +3725,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  		return 0;
>  
>  	if (p) {
> -		struct rev_info *revs = _data;
>  		struct object_info oi = OBJECT_INFO_INIT;
>  
>  		oi.typep = &type;
> @@ -3733,6 +3732,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  			die(_("could not get type of object %s in pack %s"),
>  			    oid_to_hex(oid), p->pack_name);
>  		} else if (type == OBJ_COMMIT) {
> +			struct rev_info *revs = _data;

Nice.  This block is the only one that needs this variable.  Makes sense.

>  			/*
>  			 * commits in included packs are used as starting points for the
>  			 * subsequent revision walk

* Re: [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  2025-06-23 22:32   ` [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
@ 2025-06-23 22:59     ` Junio C Hamano
  0 siblings, 0 replies; 105+ messages in thread
From: Junio C Hamano @ 2025-06-23 22:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
> on each of the input packs, we do a reachability traversal to discover
> names for any objects we picked up so we can generate name hash values
> and hopefully get higher quality deltas as a result.
>
> A future commit will change the purpose of this reachability traversal
> to find and pack objects which are reachable from commits in the input
> packs, but are packed in an unknown (not included nor excluded) pack.
>
> Extract the code which initializes and performs the reachability
> traversal to take place in the caller, not the callee, which prepares us
> to share this code for the '--unpacked' case (see the function
> add_unreachable_loose_objects() for more details).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 71 +++++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 35 deletions(-)

Makes sense.  

Another forward declaration of add_unreachable_loose_objects(),
after one was already added in the previous step, confused me a bit,
but this step is merely moving that a bit higher, so there is
nothing funny here.  Looking good.


* Re: [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-06-23 22:32   ` [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
@ 2025-06-23 23:08     ` Junio C Hamano
  2025-06-24 16:08       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-06-23 23:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Now that the 'rev_info' struct is declared outside of
> `read_packs_list_from_stdin()`, we can pass it to
> `add_objects_in_unpacked_packs()` and add any loose objects as tips to
> the above-mentioned traversal, in theory producing slightly tighter
> packs as a result.

So the idea is to treat any and all loose commits as if they are
at the tip of branches?  By doing so, we ensure each of the tree and
blob objects contained in them has a reasonable path-from-the-root?

> @@ -4325,6 +4326,10 @@ static int add_loose_object(const struct object_id *oid, const char *path,
>  	} else {
>  		add_object_entry(oid, type, "", 0);
>  	}
> +
> +	if (revs && type == OBJ_COMMIT)
> +		add_pending_oid(revs, NULL, oid, 0);
> +
>  	return 0;
>  }

OK.

* Re: [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-06-23 22:32   ` [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
@ 2025-06-23 23:35     ` Junio C Hamano
  2025-06-24 16:10       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-06-23 23:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

>  static void show_object_pack_hint(struct object *object, const char *name,
> -				  void *data UNUSED)
> +				  void *data)
>  {
> -	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> -	if (!oe)
> -		return;
> +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +		if (object->type == OBJ_BLOB &&
> +		    !has_object(the_repository, &object->oid, 0))
> +			return;

So, --stdin-packs opened a pack and is feeding the objects contained
in it to this machinery.  show_commit_pack_hint() calls this
function in the `follow` mode.  How would such an object be missing?
Ah, lazy clones.  OK.

> +		add_object_entry(&object->oid, object->type, name, 0);
> +	} else {

And only up to this point is the new code.  The "else" clause is
just the original indented one-level deeper.

> +static void show_commit_pack_hint(struct commit *commit, void *data)
>  {
> +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> +
> +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> +		show_object_pack_hint((struct object *)commit, "", data);
> +		return;
> +	}
> +
>  	/* nothing to do; commits don't have a namehash */
> +
>  }

What is this new blank line doing here?



* Re: [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-23 18:47       ` Taylor Blau
@ 2025-06-24 10:54         ` Jeff King
  2025-06-24 16:05           ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Jeff King @ 2025-06-24 10:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, Jun 23, 2025 at 02:47:29PM -0400, Taylor Blau wrote:

> > This test (but none of the others) fails when run with:
> >
> >   GIT_TEST_MULTI_PACK_INDEX=1 \
> >   GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1 \
> >   ./t7704-repack-cruft.sh
> >
> > The culprit is the incremental flag, but you need the first one for the
> > second to do anything. The issue is that the cruft pack unexpectedly
> > appears in the midx:
> >
> > [...]
> >
> > I'm not sure if it's just a funky interaction with the hacky GIT_TEST_*
> > variables, or if it's a real bug.
> 
> Thanks for spotting. This is definitely a real bug. The root cause here
> is that our loop to gather the set of packs we know are in the MIDX does
> not account for multi-layered / incremental MIDXs.
> 
> In our example, if there's a cruft pack in any other layer of a MIDX
> besides the tip, the proposed implementation here won't realize it, and
> thus (incorrectly) conclude that the cruft pack is not in the MIDX
> already, so can thusly be omitted.

Ah, right, that makes perfect sense.

> If we do this on top:
> 
> --- 8< ---
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 346d44fbcd..8d1540a0fd 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1614,13 +1614,16 @@ int cmd_repack(int argc,
>  	string_list_sort(&names);
> 
>  	if (get_local_multi_pack_index(the_repository)) {
> -		uint32_t i;
>  		struct multi_pack_index *m =
>  			get_local_multi_pack_index(the_repository);
> 
> -		ALLOC_ARRAY(midx_pack_names, m->num_packs);
> -		for (i = 0; i < m->num_packs; i++)
> -			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
> +		ALLOC_ARRAY(midx_pack_names,
> +			    m->num_packs + m->num_packs_in_base);
> +
> +		for (; m; m = m->base_midx)
> +			for (uint32_t i = 0; i < m->num_packs; i++)
> +				midx_pack_names[midx_pack_names_nr++] =
> +					xstrdup(m->pack_names[i]);
>  	}
> 
>  	close_object_store(the_repository->objects);
> --- >8 ---

And this fix looks reasonable to me. It is a bit unfortunate that the
incremental midx concept bleeds all the way out to callers like this,
because it means we might have the same problem in other spots. But that
is nothing new, and I'm not sure of a good solution. If the
public-facing API pretended as if "struct multi_pack_index" contained the
packs for all of the sub-midx entries of the chain, that would solve it.
But then all of the internal parts of the code that look at the
incremental entries would need a separate representation. And I suspect
there's a lot more code in that latter group than the former (most
callers won't be this intimate with the midx, and just want to convert
an oid to a pack/offset pair).

Would we want a test to cover this case? We do catch it in the
linux-TEST-vars build, but it might be nice to have coverage in normal
test runs. I'm not sure how much of a pain that would be.

-Peff

* Re: [PATCH v6 1/9] pack-objects: use standard option incompatibility functions
  2025-06-23 22:32   ` [PATCH v6 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
@ 2025-06-24 15:52     ` Junio C Hamano
  2025-06-24 16:06       ` Taylor Blau
  0 siblings, 1 reply; 105+ messages in thread
From: Junio C Hamano @ 2025-06-24 15:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> @@ -5050,13 +5051,14 @@ int cmd_pack_objects(int argc,
>  	if (!pack_to_stdout && thin)
>  		die(_("--thin cannot be used to build an indexable pack"));
>  
> -	if (keep_unreachable && unpack_unreachable)
> -		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
> +	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
> +				  unpack_unreachable, "--unpack-unreachable");
>  	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
>  		unpack_unreachable_expiration = 0;
>  
> -	if (stdin_packs && filter_options.choice)
> -		die(_("cannot use --filter with --stdin-packs"));
> +	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
> +				  filter_options.choice, "--filter");
> +
>  

We do not need two blank lines here, do we?

> @@ -5064,8 +5066,8 @@ int cmd_pack_objects(int argc,
>  	if (cruft) {
>  		if (use_internal_rev_list)
>  			die(_("cannot use internal rev list with --cruft"));
> -		if (stdin_packs)
> -			die(_("cannot use --stdin-packs with --cruft"));
> +		die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
> +					  cruft, "--cruft");
>  	}
>  
>  	/*
> diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
> index b48c0cbe8f..8fd07deb8d 100755
> --- a/t/t5331-pack-objects-stdin.sh
> +++ b/t/t5331-pack-objects-stdin.sh
> @@ -64,7 +64,7 @@ test_expect_success '--stdin-packs is incompatible with --filter' '
>  		cd stdin-packs &&
>  		test_must_fail git pack-objects --stdin-packs --stdout \
>  			--filter=blob:none </dev/null 2>err &&
> -		test_grep "cannot use --filter with --stdin-packs" err
> +		test_grep "options .--stdin-packs. and .--filter. cannot be used together" err

OK.
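
For readers unfamiliar with the helper this patch switches to, the
following is a rough sketch of what a two-option incompatibility check
does. It is an assumption-laden toy, not Git's implementation:
`check_incompatible_opt2()` is a hypothetical name, and it reports the
conflict through a caller-supplied buffer instead of calling die(). The
message format is the one visible in the diff and in the updated
test_grep pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a die_for_incompatible_opt2()-style check.
 * Returns 1 and fills 'err' when both options are set; returns 0 and
 * leaves 'err' untouched otherwise.
 */
static int check_incompatible_opt2(int opt1, const char *opt1_name,
				   int opt2, const char *opt2_name,
				   char *err, size_t errlen)
{
	assert(err && errlen > 0);

	if (!(opt1 && opt2))
		return 0;

	/* Same wording as the message shown in the quoted patch. */
	snprintf(err, errlen,
		 "options '%s' and '%s' cannot be used together",
		 opt1_name, opt2_name);
	return 1;
}
```

The benefit of funneling every such check through one helper is that
all incompatible-option errors share a single wording, which is why the
test had to be updated to match the new message.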


* Re: [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible
  2025-06-24 10:54         ` Jeff King
@ 2025-06-24 16:05           ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-24 16:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Tue, Jun 24, 2025 at 06:54:47AM -0400, Jeff King wrote:
> > If we do this on top:
> >
> > --- 8< ---
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 346d44fbcd..8d1540a0fd 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -1614,13 +1614,16 @@ int cmd_repack(int argc,
> >  	string_list_sort(&names);
> >
> >  	if (get_local_multi_pack_index(the_repository)) {
> > -		uint32_t i;
> >  		struct multi_pack_index *m =
> >  			get_local_multi_pack_index(the_repository);
> >
> > -		ALLOC_ARRAY(midx_pack_names, m->num_packs);
> > -		for (i = 0; i < m->num_packs; i++)
> > -			midx_pack_names[midx_pack_names_nr++] = xstrdup(m->pack_names[i]);
> > +		ALLOC_ARRAY(midx_pack_names,
> > +			    m->num_packs + m->num_packs_in_base);
> > +
> > +		for (; m; m = m->base_midx)
> > +			for (uint32_t i = 0; i < m->num_packs; i++)
> > +				midx_pack_names[midx_pack_names_nr++] =
> > +					xstrdup(m->pack_names[i]);
> >  	}
> >
> >  	close_object_store(the_repository->objects);
> > --- >8 ---
>
> And this fix looks reasonable to me. It is a bit unfortunate that the
> incremental midx concept bleeds all the way out to callers like this,
> because it means we might have the same problem in other spots. But that
> is nothing new, and I'm not sure of a good solution. If the
> > public-facing API pretended as if "struct multi_pack_index" contained the
> packs for all of the sub-midx entries of the chain, that would solve it.
> But then all of the internal parts of the code that look at the
> incremental entries would need a separate representation. And I suspect
> there's a lot more code in that latter group than the former (most
> callers won't be this intimate with the midx, and just want to convert
> an oid to a pack/offset pair).
>
> Would we want a test to cover this case? We do catch it in the
> linux-TEST-vars build, but it might be nice to have coverage in normal
> test runs. I'm not sure how much of a pain that would be.

I thought quite a bit about this and decided against it. The extra test
would really just be this on top:

--- 8< ---
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index aa2e2e6ad8..9b71387325 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -842,7 +842,9 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
 		# actually write a new object and subsequently a new
 		# pack to contain it.
 		git merge --no-ff $C &&
-		git repack -d &&
+		GIT_TEST_MULTI_PACK_INDEX=1 \
+		GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1 \
+			git repack -d &&

 		ls $packdir/pack-*.idx | sort >packs.all &&
 		cruft="$(ls $packdir/pack-*.mtimes)" &&
--- >8 ---

That forces the cruft pack into an earlier MIDX layer. But that felt
like making this test too specific to incremental MIDXs, when the
original test has very little to do with incremental vs. non-incremental
MIDXs.

I tried to write a smaller test case that demonstrates the problem but
couldn't find a straightforward way to minimize the reproduction. As an
alternative, we could duplicate and/or parameterize the test entirely,
but that felt like overkill.

Thanks,
Taylor


* Re: [PATCH v6 1/9] pack-objects: use standard option incompatibility functions
  2025-06-24 15:52     ` Junio C Hamano
@ 2025-06-24 16:06       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-24 16:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Tue, Jun 24, 2025 at 08:52:59AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -5050,13 +5051,14 @@ int cmd_pack_objects(int argc,
> >  	if (!pack_to_stdout && thin)
> >  		die(_("--thin cannot be used to build an indexable pack"));
> >
> > -	if (keep_unreachable && unpack_unreachable)
> > -		die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "--unpack-unreachable");
> > +	die_for_incompatible_opt2(keep_unreachable, "--keep-unreachable",
> > +				  unpack_unreachable, "--unpack-unreachable");
> >  	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
> >  		unpack_unreachable_expiration = 0;
> >
> > -	if (stdin_packs && filter_options.choice)
> > -		die(_("cannot use --filter with --stdin-packs"));
> > +	die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
> > +				  filter_options.choice, "--filter");
> > +
> >
>
> We do not need two blank lines here, do we?

Yeah, it looks like an extra one snuck in and I missed it when
proof-reading.

Thanks,
Taylor


* Re: [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects
  2025-06-23 23:08     ` Junio C Hamano
@ 2025-06-24 16:08       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-24 16:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Mon, Jun 23, 2025 at 04:08:29PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Now that the 'rev_info' struct is declared outside of
> > `read_packs_list_from_stdin()`, we can pass it to
> > `add_objects_in_unpacked_packs()` and add any loose objects as tips to
> > the above-mentioned traversal, in theory producing slightly tighter
> > packs as a result.
>
> So the idea is to treat any and all loose commits as if they were
> at the tips of branches?  By doing so, we ensure each of the tree and
> blob objects contained in them has a reasonable path-from-the-root?

That's right. We had previously only considered commit objects in the
pack(s) being combined as possible traversal tips, but this change
causes us to do the same for loose commit objects as well.

I do kind of wonder how much of a difference this makes on delta quality
overall, and suspect that it is highly workflow-specific and likely very
difficult to measure in general.

Thanks,
Taylor


* Re: [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow'
  2025-06-23 23:35     ` Junio C Hamano
@ 2025-06-24 16:10       ` Taylor Blau
  0 siblings, 0 replies; 105+ messages in thread
From: Taylor Blau @ 2025-06-24 16:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Mon, Jun 23, 2025 at 04:35:44PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >  static void show_object_pack_hint(struct object *object, const char *name,
> > -				  void *data UNUSED)
> > +				  void *data)
> >  {
> > -	struct object_entry *oe = packlist_find(&to_pack, &object->oid);
> > -	if (!oe)
> > -		return;
> > +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> > +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> > +		if (object->type == OBJ_BLOB &&
> > +		    !has_object(the_repository, &object->oid, 0))
> > +			return;
>
> So, --stdin-packs opened a pack and is feeding the objects contained
> in it to this machinery.  show_commit_pack_hint() calls this
> function in the `follow` mode.  How would such an object be missing?
> Ah, lazy clones.  OK.

That is one such place, but another would be that the object is part of
some unreachable portion of the repository and points at another
unreachable object that is missing. We take care to accommodate those
holes in the unreachable object graph when generating cruft packs, and
AFAIK in general tolerate broken links and/or missing objects provided
they are unreachable.

> > +		add_object_entry(&object->oid, object->type, name, 0);
> > +	} else {
>
> And only up to this point is the new code.  The "else" clause is
> just the original indented one-level deeper.

Right.

> > +static void show_commit_pack_hint(struct commit *commit, void *data)
> >  {
> > +	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
> > +
> > +	if (mode == STDIN_PACKS_MODE_FOLLOW) {
> > +		show_object_pack_hint((struct object *)commit, "", data);
> > +		return;
> > +	}
> > +
> >  	/* nothing to do; commits don't have a namehash */
> > +
> >  }
>
> What is this new blank line doing here?

Weird, this one evaded my proof-reading as well. Sorry about that -- I
can send a new round with this and the other spot fixed up if you want
one.

Thanks,
Taylor
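
The handling of missing blobs in follow mode discussed above can be
sketched with a small toy model. Everything here is hypothetical
(`toy_object`, `toy_mode`, `toy_show_object()`): the `present` flag
stands in for the has_object() check, setting `packed` stands in for
add_object_entry(), and the non-follow branch, which in the real code
keeps the original behavior, is deliberately not modeled.

```c
#include <stdbool.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };
enum toy_mode { TOY_MODE_NONE, TOY_MODE_FOLLOW };

struct toy_object {
	enum toy_type type;
	bool present; /* stand-in for has_object() succeeding */
	bool packed;  /* set once the object is added to the output */
};

/*
 * Toy version of the follow-mode branch of the traversal callback.
 * Returns true if the object was added to the pack being written.
 */
static bool toy_show_object(struct toy_object *obj, enum toy_mode mode)
{
	if (mode != TOY_MODE_FOLLOW)
		return false; /* original (non-follow) behavior not modeled */

	/*
	 * A blob may be absent (a lazy clone, or a hole in the
	 * unreachable part of the object graph); skip it quietly.
	 */
	if (obj->type == TOY_BLOB && !obj->present)
		return false;

	obj->packed = true;
	return true;
}
```

The point of the early return is that a missing blob is tolerated
rather than treated as an error, matching the explanation in the reply
above.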



Thread overview: 105+ messages
2025-04-11 23:26 [RFC PATCH 0/8] repack: avoid MIDX'ing cruft pack(s) where possible Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 2/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 3/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 4/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 5/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 6/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 7/8] repack: keep track of existing MIDX'd packs Taylor Blau
2025-04-11 23:26 ` [RFC PATCH 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
2025-04-14 20:06 ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
2025-04-14 20:06   ` [PATCH v2 1/8] pack-objects: use standard option incompatibility functions Taylor Blau
2025-04-14 20:41     ` Junio C Hamano
2025-04-15 19:32       ` Taylor Blau
2025-04-15 19:48         ` Junio C Hamano
2025-04-15 22:27           ` Taylor Blau
2025-04-14 20:06   ` [PATCH v2 2/8] object-store-ll.h: add note about designated initializers Taylor Blau
2025-04-14 21:07     ` Junio C Hamano
2025-04-15 19:51       ` Taylor Blau
2025-04-15  2:57     ` Elijah Newren
2025-04-15 19:47       ` Taylor Blau
2025-04-14 20:06   ` [PATCH v2 3/8] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-04-15  3:10     ` Elijah Newren
2025-04-14 20:06   ` [PATCH v2 4/8] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-04-14 20:06   ` [PATCH v2 5/8] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-04-14 20:06   ` [PATCH v2 6/8] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-04-15  3:10     ` Elijah Newren
2025-04-15 19:57       ` Taylor Blau
2025-04-14 20:06   ` [PATCH v2 7/8] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-04-15  3:11     ` Elijah Newren
2025-04-15 20:45       ` Taylor Blau
2025-04-16  5:26         ` Elijah Newren
2025-04-14 20:06   ` [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
2025-04-15  3:11     ` Elijah Newren
2025-04-15 20:51       ` Taylor Blau
2025-04-15  2:57   ` [PATCH v2 0/8] repack: avoid MIDX'ing cruft pack(s) " Elijah Newren
2025-04-15 22:05     ` Taylor Blau
2025-04-15 22:46 ` [PATCH v3 0/9] " Taylor Blau
2025-04-15 22:46   ` [PATCH v3 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
2025-04-15 22:46   ` [PATCH v3 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-04-16  0:58     ` Junio C Hamano
2025-04-16 22:07       ` Taylor Blau
2025-04-16  5:31     ` Elijah Newren
2025-04-16 22:07       ` Taylor Blau
2025-04-15 22:46   ` [PATCH v3 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-04-16  0:59     ` Junio C Hamano
2025-04-15 22:46   ` [PATCH v3 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-04-15 22:47   ` [PATCH v3 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-04-16  9:21     ` Junio C Hamano
2025-04-15 22:47   ` [PATCH v3 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
2025-04-16  5:36     ` Elijah Newren
2025-04-15 22:47   ` [PATCH v3 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
2025-04-15 22:47   ` [PATCH v3 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-04-15 22:47   ` [PATCH v3 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
2025-04-16  5:56     ` Elijah Newren
2025-04-16 22:16       ` Taylor Blau
2025-05-13  3:34         ` Elijah Newren
2025-05-28 23:20 ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
2025-05-28 23:20   ` [PATCH v4 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
2025-05-28 23:20   ` [PATCH v4 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-05-28 23:20   ` [PATCH v4 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-05-28 23:20   ` [PATCH v4 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-05-28 23:20   ` [PATCH v4 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-05-28 23:20   ` [PATCH v4 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
2025-05-28 23:20   ` [PATCH v4 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
2025-05-28 23:20   ` [PATCH v4 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-05-28 23:20   ` [PATCH v4 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
2025-06-19 11:33     ` Carlo Marcelo Arenas Belón
2025-06-19 13:08     ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
2025-06-19 17:07       ` Junio C Hamano
2025-06-19 23:26         ` Taylor Blau
2025-05-29  0:07   ` [PATCH v4 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
2025-05-29  0:15     ` Elijah Newren
2025-06-19 23:30 ` [PATCH v5 " Taylor Blau
2025-06-19 23:30   ` [PATCH v5 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
2025-06-19 23:30   ` [PATCH v5 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-06-19 23:30   ` [PATCH v5 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-06-19 23:30   ` [PATCH v5 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-06-19 23:30   ` [PATCH v5 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-06-19 23:30   ` [PATCH v5 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
2025-06-19 23:30   ` [PATCH v5 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
2025-06-19 23:30   ` [PATCH v5 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-06-20 15:27     ` Junio C Hamano
2025-06-19 23:30   ` [PATCH v5 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
2025-06-21  4:35     ` Jeff King
2025-06-23 18:47       ` Taylor Blau
2025-06-24 10:54         ` Jeff King
2025-06-24 16:05           ` Taylor Blau
2025-06-23 22:32 ` [PATCH v6 0/9] repack: avoid MIDX'ing cruft pack(s) " Taylor Blau
2025-06-23 22:32   ` [PATCH v6 1/9] pack-objects: use standard option incompatibility functions Taylor Blau
2025-06-24 15:52     ` Junio C Hamano
2025-06-24 16:06       ` Taylor Blau
2025-06-23 22:32   ` [PATCH v6 2/9] pack-objects: limit scope in 'add_object_entry_from_pack()' Taylor Blau
2025-06-23 22:49     ` Junio C Hamano
2025-06-23 22:32   ` [PATCH v6 3/9] pack-objects: factor out handling '--stdin-packs' Taylor Blau
2025-06-23 22:32   ` [PATCH v6 4/9] pack-objects: declare 'rev_info' for '--stdin-packs' earlier Taylor Blau
2025-06-23 22:59     ` Junio C Hamano
2025-06-23 22:32   ` [PATCH v6 5/9] pack-objects: perform name-hash traversal for unpacked objects Taylor Blau
2025-06-23 23:08     ` Junio C Hamano
2025-06-24 16:08       ` Taylor Blau
2025-06-23 22:32   ` [PATCH v6 6/9] pack-objects: fix typo in 'show_object_pack_hint()' Taylor Blau
2025-06-23 22:32   ` [PATCH v6 7/9] pack-objects: swap 'show_{object,commit}_pack_hint' Taylor Blau
2025-06-23 22:32   ` [PATCH v6 8/9] pack-objects: introduce '--stdin-packs=follow' Taylor Blau
2025-06-23 23:35     ` Junio C Hamano
2025-06-24 16:10       ` Taylor Blau
2025-06-23 22:32   ` [PATCH v6 9/9] repack: exclude cruft pack(s) from the MIDX where possible Taylor Blau
