Git development
 help / color / mirror / Atom feed
* Re: [PATCH RFC v2 2/2] Move libgit.a sources into separate "lib/" directory
From: Junio C Hamano @ 2026-06-26 18:50 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Patrick Steinhardt, git, brian m. carlson, Elijah Newren,
	Derrick Stolee, Phillip Wood
In-Reply-To: <32bb1cf6-1e37-dc0c-dfb2-e78a30763342@gmx.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> -        path: 'compat/vcbuild/vcpkg'
>> +        path: 'lib/compat/vcbuild/vcpkg'
>>      - name: download vcpkg artifacts
>>        uses: git-for-windows/get-azure-pipelines-artifact@v0
>>        with:
>
> Please also adopt:
>
> -- snip --
> From 1d09a51d426bd3592e4f4b0331f7715ab3b5d502 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date: Fri, 26 Jun 2026 14:39:19 +0200
> Subject: [PATCH] fixup??? Move libgit.a sources into separate "lib/" directory
>
> Turns out that there was one path that was forgotten to be adjusted.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  .github/workflows/main.yml | 1 +
>  1 file changed, 1 insertion(+)

Thanks.  Queued at the tip of the topic for now, but I trust Patrick
will include/squash it in the next iteration.


^ permalink raw reply

* [RFC PATCH 00/10] repack: combine '--geometric' and '--cruft'
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt

First, a short note. This series is an RFC because I have not had the
chance to review and test it as thoroughly as I normally would, and
because we are deep in the -rc phase.

I wanted to get this series off my backlog since I have decided to leave
GitHub at the end of the month for a new role. I will still be
contributing to Git in my new role (which I will start in the early part
of July), but wanted to get this off my backlog nonetheless.

This series teaches `git repack` how to combine `--geometric` and
`--cruft`.

Today these two modes are mutually exclusive, since `--cruft` implies
`-a`, and `-a` is fundamentally incompatible with `--geometric`. As a
result, repositories have to choose between keeping reachable objects in
a geometric progression of packs and collecting unreachable objects into
cruft packs.

The goal of this series is to to be able to do both simultaneously. When
both options are given, 'git repack' rolls up the selected non-cruft
packs as usual while collecting unreachable objects separately into a
cruft pack. That means a command like

    $ git repack -d --geometric=2 --cruft --combine-cruft-below-size=1G

will keep reachable non-cruft packs in a geometric progression, while
combining sufficiently-small cruft packs (along with newly-discovered
unreachable objects) into a fresh cruft pack.

The series is structured roughly as follows:

 * The first two patches prepare the cruft pack machinery for the later
   changes by making non-kept pack exclusion unconditional and
   extracting a helper for looking up packs in an `existing_packs` list.

 * The next four patches route geometric pack deletion through the
   common `existing_packs` machinery. They mark packs above the
   geometric split as retained, teach incremental-MIDX retention not to
   keep packs that are being rolled up, switch geometric repacks over to
   the common deletion path, and then remove the old geometry-specific
   deletion helper.

 * The next three patches teach `pack-objects` the new pieces needed by
   this mode. The main addition is `--stdin-packs=follow-reachable`,
   which walks from reference tips and includes only reachable objects
   from the selected packs, while still allowing traversal through
   excluded-open packs and stopping at excluded-closed ones. The
   following patch teaches that mode to use `--refs-snapshot`, so that
   `pack-objects` and the MIDX bitmap writer can agree on the same set
   of tips.

 * The final patch wires everything together in `git repack`, including
   teaching the cruft writer how to interpret the geometric split when
   choosing which packs to include or exclude.

Thanks in advance for your review!

Taylor Blau (10):
  repack: unconditionally exclude non-kept packs
  repack: extract `locate_existing_pack()` helper
  repack: mark geometric progression of packs as retained
  repack: teach MIDX retention about geometric rollups
  repack: delete geometric packs via existing_packs
  repack-geometry: drop unused redundant-pack removal
  pack-objects: extract `stdin_packs_add_all_pack_entries()`
  pack-objects: introduce '--stdin-packs=follow-reachable'
  pack-objects: support '--refs-snapshot' with 'follow-reachable'
  repack: support combining '--geometric' with '--cruft'

 Documentation/git-pack-objects.adoc |  25 +++
 Documentation/git-repack.adoc       |  11 ++
 builtin/pack-objects.c              | 276 ++++++++++++++++++++++++----
 builtin/repack.c                    |  38 ++--
 repack-cruft.c                      |  29 ++-
 repack-geometry.c                   |  44 -----
 repack.c                            | 101 +++++++++-
 repack.h                            |  15 +-
 t/t5331-pack-objects-stdin.sh       | 201 ++++++++++++++++++++
 t/t7704-repack-cruft.sh             | 251 +++++++++++++++++++++++++
 10 files changed, 878 insertions(+), 113 deletions(-)


base-commit: ab776a62a78576513ee121424adb19597fbb7613
-- 
2.55.0.rc2.10.g29e31820dce

^ permalink raw reply

* [RFC PATCH 01/10] repack: unconditionally exclude non-kept packs
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

In `write_cruft_pack()`, we handle excluding objects found in non-kept
packs from being included in the cruft pack via two code paths:

 * When using '--combine-cruft-below-size' (provided that we are not
   expiring cruft objects), we use the aptly-named
   `combine_small_cruft_packs()` function.

 * In all other cases, we handle it directly in the 'else' branch of the
   same conditional.

Simplify this by moving the non-kept pack exclusion out of the
conditional entirely, so that non-kept packs are always excluded
regardless of whether we are combining small cruft packs or not.

This is a preparatory refactor for a subsequent change that will use the
pack_geometry struct when available to determine which non-kept packs to
exclude.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 repack-cruft.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/repack-cruft.c b/repack-cruft.c
index 0653e887923..6a040e98017 100644
--- a/repack-cruft.c
+++ b/repack-cruft.c
@@ -9,7 +9,6 @@ static void combine_small_cruft_packs(FILE *in, off_t combine_cruft_below_size,
 {
 	struct packed_git *p;
 	struct strbuf buf = STRBUF_INIT;
-	size_t i;
 
 	repo_for_each_pack(existing->repo, p) {
 		if (!(p->is_cruft && p->pack_local))
@@ -30,10 +29,6 @@ static void combine_small_cruft_packs(FILE *in, off_t combine_cruft_below_size,
 		}
 	}
 
-	for (i = 0; i < existing->non_kept_packs.nr; i++)
-		fprintf(in, "-%s.pack\n",
-			existing->non_kept_packs.items[i].string);
-
 	strbuf_release(&buf);
 }
 
@@ -80,15 +75,14 @@ int write_cruft_pack(const struct write_pack_opts *opts,
 	in = xfdopen(cmd.in, "w");
 	for_each_string_list_item(item, names)
 		fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
-	if (combine_cruft_below_size && !cruft_expiration) {
+	if (combine_cruft_below_size && !cruft_expiration)
 		combine_small_cruft_packs(in, combine_cruft_below_size,
 					  existing);
-	} else {
-		for_each_string_list_item(item, &existing->non_kept_packs)
-			fprintf(in, "-%s.pack\n", item->string);
+	else
 		for_each_string_list_item(item, &existing->cruft_packs)
 			fprintf(in, "-%s.pack\n", item->string);
-	}
+	for_each_string_list_item(item, &existing->non_kept_packs)
+		fprintf(in, "-%s.pack\n", item->string);
 	for_each_string_list_item(item, &existing->kept_packs)
 		fprintf(in, "%s.pack\n", item->string);
 	fclose(in);
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 03/10] repack: mark geometric progression of packs as retained
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

In non-geometric repacks, any packs which repack wishes to delete are
handled via the `existing_packs` struct, which has a mechanism to retain
would-be-deleted packs (e.g., if we happened to write a new pack
identical to one otherwise marked for deletion).

In geometric repacks, repack removes any rewritten packs (alternatively,
any packs which were combined in order to restore a geometric
progression) by enumerating them via `pack_geometry_remove_redundant()`.

Prepare to use the `existing_packs` deletion machinery for geometric
repacks by marking any non-kept packs above the geometric split line as
retained. Do the same for promisor packs, which have their own split
point.

This commit only records which packs the later deletion pass must keep;
it does not change which packs are written or removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c |  2 ++
 repack.c         | 27 +++++++++++++++++++++++++++
 repack.h         |  3 +++
 3 files changed, 32 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 1524a9c13ad..ce979d86d96 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -325,6 +325,8 @@ int cmd_repack(int argc,
 		}
 		pack_geometry_init(&geometry, &existing, &po_args);
 		pack_geometry_split(&geometry);
+
+		existing_packs_retain_from_geometry(&existing, &geometry);
 	}
 
 	prepare_pack_objects(&cmd, &po_args, packtmp);
diff --git a/repack.c b/repack.c
index 986c74ac7e8..9b3cb425431 100644
--- a/repack.c
+++ b/repack.c
@@ -254,6 +254,33 @@ void existing_packs_retain_cruft(struct existing_packs *existing,
 	existing_packs_mark_retained(item);
 }
 
+static void existing_packs_retain_non_kept(struct existing_packs *existing,
+					   struct packed_git *p)
+{
+	struct string_list_item *item;
+
+	if (!p->pack_local)
+		return;
+
+	item = locate_existing_pack(&existing->non_kept_packs, p);
+	if (!item)
+		BUG("could not find non-kept pack '%s'", pack_basename(p));
+
+	existing_packs_mark_retained(item);
+}
+
+void existing_packs_retain_from_geometry(struct existing_packs *existing,
+					 const struct pack_geometry *geometry)
+{
+	uint32_t i;
+
+	for (i = geometry->split; i < geometry->pack_nr; i++)
+		existing_packs_retain_non_kept(existing, geometry->pack[i]);
+	for (i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
+		existing_packs_retain_non_kept(existing,
+					       geometry->promisor_pack[i]);
+}
+
 void existing_packs_mark_for_deletion(struct existing_packs *existing,
 				      struct string_list *names)
 
diff --git a/repack.h b/repack.h
index f9fbc895f02..bb4c944d0cb 100644
--- a/repack.h
+++ b/repack.h
@@ -54,6 +54,7 @@ int finish_pack_objects_cmd(const struct git_hash_algo *algop,
 
 struct repository;
 struct packed_git;
+struct pack_geometry;
 
 struct existing_packs {
 	struct repository *repo;
@@ -82,6 +83,8 @@ int existing_packs_has_non_kept(const struct existing_packs *existing);
 int existing_pack_is_marked_for_deletion(struct string_list_item *item);
 void existing_packs_retain_cruft(struct existing_packs *existing,
 				 struct packed_git *cruft);
+void existing_packs_retain_from_geometry(struct existing_packs *existing,
+					 const struct pack_geometry *geometry);
 void existing_packs_mark_for_deletion(struct existing_packs *existing,
 				      struct string_list *names);
 void existing_packs_retain_midx_packs(struct existing_packs *existing);
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 04/10] repack: teach MIDX retention about geometric rollups
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

When writing an incremental MIDX, existing_packs_retain_midx_packs()
marks packs in the existing MIDX chain as retained. This keeps them from
being deleted by the later existing_packs deletion pass, since retained
MIDX layers may still refer to those packs.

Geometric repacks need a narrower rule. Packs below the split are rolled
up into the newly-written pack, and should remain eligible for deletion
even if the old MIDX chain mentions them. Packs above the split were
marked as retained by the previous commit.

Teach existing_packs_retain_midx_packs() to skip packs which are part of
the geometric rollup. This does not change the current caller's behavior,
since geometric repacks do not yet use the existing_packs deletion path.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c |  2 +-
 repack.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
 repack.h         |  3 ++-
 3 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index ce979d86d96..66b46b86896 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -576,7 +576,7 @@ int cmd_repack(int argc,
 
 	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
 		if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL)
-			existing_packs_retain_midx_packs(&existing);
+			existing_packs_retain_midx_packs(&existing, &geometry);
 		existing_packs_mark_for_deletion(&existing, &names);
 	}
 
diff --git a/repack.c b/repack.c
index 9b3cb425431..c7b79a3c113 100644
--- a/repack.c
+++ b/repack.c
@@ -292,6 +292,39 @@ void existing_packs_mark_for_deletion(struct existing_packs *existing,
 					   &existing->cruft_packs);
 }
 
+static int pack_geometry_contains_pack(struct packed_git **packs,
+				       uint32_t packs_nr,
+				       const char *base)
+{
+	struct strbuf buf = STRBUF_INIT;
+	uint32_t i;
+
+	for (i = 0; i < packs_nr; i++) {
+		strbuf_reset(&buf);
+		strbuf_addstr(&buf, pack_basename(packs[i]));
+		strbuf_strip_suffix(&buf, ".pack");
+
+		if (!strcmp(buf.buf, base)) {
+			strbuf_release(&buf);
+			return 1;
+		}
+	}
+
+	strbuf_release(&buf);
+	return 0;
+}
+
+static int pack_geometry_contains_rollup(const struct pack_geometry *geometry,
+					 const char *base)
+{
+	if (!geometry || !geometry->split_factor)
+		return 0;
+
+	return pack_geometry_contains_pack(geometry->pack, geometry->split, base) ||
+	       pack_geometry_contains_pack(geometry->promisor_pack,
+					   geometry->promisor_split, base);
+}
+
 /*
  * Mark every pack that is referenced by the existing MIDX chain as
  * retained, so that a subsequent call to
@@ -300,9 +333,12 @@ void existing_packs_mark_for_deletion(struct existing_packs *existing,
  * This is used when writing an incremental MIDX layer on top of an
  * existing chain: retained layers continue to reference the same
  * packs on disk, so those packs must not be unlinked even if the
- * freshly-written pack supersedes them.
+ * freshly-written pack supersedes them. When doing a geometric repack,
+ * packs below the split are rewritten into the new MIDX tip and should
+ * remain eligible for deletion.
  */
-void existing_packs_retain_midx_packs(struct existing_packs *existing)
+void existing_packs_retain_midx_packs(struct existing_packs *existing,
+				      const struct pack_geometry *geometry)
 {
 	struct string_list_item *item;
 	struct strbuf buf = STRBUF_INIT;
@@ -315,6 +351,9 @@ void existing_packs_retain_midx_packs(struct existing_packs *existing)
 		strbuf_strip_suffix(&buf, ".pack");
 		strbuf_strip_suffix(&buf, ".idx");
 
+		if (pack_geometry_contains_rollup(geometry, buf.buf))
+			continue;
+
 		found = string_list_lookup(&existing->non_kept_packs, buf.buf);
 		if (found)
 			existing_packs_mark_retained(found);
diff --git a/repack.h b/repack.h
index bb4c944d0cb..f0d082df9e8 100644
--- a/repack.h
+++ b/repack.h
@@ -87,7 +87,8 @@ void existing_packs_retain_from_geometry(struct existing_packs *existing,
 					 const struct pack_geometry *geometry);
 void existing_packs_mark_for_deletion(struct existing_packs *existing,
 				      struct string_list *names);
-void existing_packs_retain_midx_packs(struct existing_packs *existing);
+void existing_packs_retain_midx_packs(struct existing_packs *existing,
+				      const struct pack_geometry *geometry);
 void existing_packs_remove_redundant(struct existing_packs *existing,
 				     const char *packdir,
 				     bool wrote_incremental_midx);
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 05/10] repack: delete geometric packs via existing_packs
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

Now that packs above the geometric split are marked as retained, teach
geometric repacks to use the existing_packs deletion machinery instead of
calling pack_geometry_remove_redundant().

This lets geometric repacks share the same mark-then-remove path as
all-into-one repacks: packs below the split are marked for deletion, and
packs above the split are ignored because they were retained earlier.

When doing a geometric repack without --combine-cruft-below-size, retain
all cruft packs before marking anything for deletion. Geometric repacks do
not rewrite cruft packs in that mode, so the common deletion path must not
remove them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 11 +++++------
 repack.c         |  8 ++++++++
 repack.h         |  1 +
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 66b46b86896..dfb6fed231d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -574,10 +574,13 @@ int cmd_repack(int argc,
 				       packtmp);
 	/* End of pack replacement. */
 
-	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
+	if (delete_redundant) {
 		if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL)
 			existing_packs_retain_midx_packs(&existing, &geometry);
-		existing_packs_mark_for_deletion(&existing, &names);
+		if (geometry.split_factor && !combine_cruft_below_size)
+			existing_packs_retain_all_cruft(&existing);
+		if (pack_everything & ALL_INTO_ONE || geometry.split_factor)
+			existing_packs_mark_for_deletion(&existing, &names);
 	}
 
 	if (write_midx != REPACK_WRITE_MIDX_NONE) {
@@ -609,10 +612,6 @@ int cmd_repack(int argc,
 		existing_packs_remove_redundant(&existing, packdir,
 						wrote_incremental_midx);
 
-		if (geometry.split_factor)
-			pack_geometry_remove_redundant(&geometry, &names,
-						       &existing, packdir,
-						       wrote_incremental_midx);
 		if (show_progress)
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
diff --git a/repack.c b/repack.c
index c7b79a3c113..90797561954 100644
--- a/repack.c
+++ b/repack.c
@@ -242,6 +242,14 @@ static struct string_list_item *locate_existing_pack(struct string_list *list,
 	return item;
 }
 
+void existing_packs_retain_all_cruft(struct existing_packs *existing)
+{
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &existing->cruft_packs)
+		existing_packs_mark_retained(item);
+}
+
 void existing_packs_retain_cruft(struct existing_packs *existing,
 				 struct packed_git *cruft)
 {
diff --git a/repack.h b/repack.h
index f0d082df9e8..90c89630ef8 100644
--- a/repack.h
+++ b/repack.h
@@ -81,6 +81,7 @@ void existing_packs_collect(struct existing_packs *existing,
 			    const struct string_list *extra_keep);
 int existing_packs_has_non_kept(const struct existing_packs *existing);
 int existing_pack_is_marked_for_deletion(struct string_list_item *item);
+void existing_packs_retain_all_cruft(struct existing_packs *existing);
 void existing_packs_retain_cruft(struct existing_packs *existing,
 				 struct packed_git *cruft);
 void existing_packs_retain_from_geometry(struct existing_packs *existing,
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 06/10] repack-geometry: drop unused redundant-pack removal
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

The previous commit stopped using pack_geometry_remove_redundant() when
deleting packs after a geometric repack. The existing_packs machinery now
handles the same removal after geometric packs are marked for deletion.

Remove the unused geometry-specific helper and its declaration.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 repack-geometry.c | 44 --------------------------------------------
 repack.h          |  5 -----
 2 files changed, 49 deletions(-)

diff --git a/repack-geometry.c b/repack-geometry.c
index 2064683dcfe..c75fa508612 100644
--- a/repack-geometry.c
+++ b/repack-geometry.c
@@ -245,50 +245,6 @@ struct packed_git *pack_geometry_preferred_pack(struct pack_geometry *geometry)
 	return NULL;
 }
 
-static void remove_redundant_packs(struct packed_git **pack,
-				   uint32_t pack_nr,
-				   struct string_list *names,
-				   struct existing_packs *existing,
-				   const char *packdir,
-				   bool wrote_incremental_midx)
-{
-	const struct git_hash_algo *algop = existing->repo->hash_algo;
-	struct strbuf buf = STRBUF_INIT;
-	uint32_t i;
-
-	for (i = 0; i < pack_nr; i++) {
-		struct packed_git *p = pack[i];
-		if (string_list_has_string(names, hash_to_hex_algop(p->hash,
-								    algop)))
-			continue;
-
-		strbuf_reset(&buf);
-		strbuf_addstr(&buf, pack_basename(p));
-		strbuf_strip_suffix(&buf, ".pack");
-
-		if ((p->pack_keep) ||
-		    (string_list_has_string(&existing->kept_packs, buf.buf)))
-			continue;
-
-		repack_remove_redundant_pack(existing->repo, packdir, buf.buf,
-					     wrote_incremental_midx);
-	}
-
-	strbuf_release(&buf);
-}
-
-void pack_geometry_remove_redundant(struct pack_geometry *geometry,
-				    struct string_list *names,
-				    struct existing_packs *existing,
-				    const char *packdir,
-				    bool wrote_incremental_midx)
-{
-	remove_redundant_packs(geometry->pack, geometry->split,
-			       names, existing, packdir, wrote_incremental_midx);
-	remove_redundant_packs(geometry->promisor_pack, geometry->promisor_split,
-			       names, existing, packdir, wrote_incremental_midx);
-}
-
 void pack_geometry_release(struct pack_geometry *geometry)
 {
 	if (!geometry)
diff --git a/repack.h b/repack.h
index 90c89630ef8..4295829cea0 100644
--- a/repack.h
+++ b/repack.h
@@ -134,11 +134,6 @@ void pack_geometry_init(struct pack_geometry *geometry,
 			const struct pack_objects_args *args);
 void pack_geometry_split(struct pack_geometry *geometry);
 struct packed_git *pack_geometry_preferred_pack(struct pack_geometry *geometry);
-void pack_geometry_remove_redundant(struct pack_geometry *geometry,
-				    struct string_list *names,
-				    struct existing_packs *existing,
-				    const char *packdir,
-				    bool wrote_incremental_midx);
 void pack_geometry_release(struct pack_geometry *geometry);
 
 struct tempfile;
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 02/10] repack: extract `locate_existing_pack()` helper
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

Factor out the lookup from `existing_packs_retain_cruft()` that converts
a pack basename to a `string_list_item` into a reusable static helper
function, `locate_existing_pack()`.

A subsequent commit will introduce a new function which will need to
perform this same lookup against a different `string_list`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 repack.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/repack.c b/repack.c
index 571dabb665e..986c74ac7e8 100644
--- a/repack.c
+++ b/repack.c
@@ -226,21 +226,32 @@ static void existing_packs_mark_for_deletion_1(const struct git_hash_algo *algop
 	}
 }
 
+static struct string_list_item *locate_existing_pack(struct string_list *list,
+						     struct packed_git *p)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	strbuf_addstr(&buf, pack_basename(p));
+	strbuf_strip_suffix(&buf, ".pack");
+
+	item = string_list_lookup(list, buf.buf);
+
+	strbuf_release(&buf);
+
+	return item;
+}
+
 void existing_packs_retain_cruft(struct existing_packs *existing,
 				 struct packed_git *cruft)
 {
-	struct strbuf buf = STRBUF_INIT;
 	struct string_list_item *item;
 
-	strbuf_addstr(&buf, pack_basename(cruft));
-	strbuf_strip_suffix(&buf, ".pack");
-
-	item = string_list_lookup(&existing->cruft_packs, buf.buf);
+	item = locate_existing_pack(&existing->cruft_packs, cruft);
 	if (!item)
 		BUG("could not find cruft pack '%s'", pack_basename(cruft));
 
 	existing_packs_mark_retained(item);
-	strbuf_release(&buf);
 }
 
 void existing_packs_mark_for_deletion(struct existing_packs *existing,
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 07/10] pack-objects: extract `stdin_packs_add_all_pack_entries()`
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

Extract the pack enumeration loop from stdin_packs_add_pack_entries()
into a separate stdin_packs_add_all_pack_entries() helper, and have the
caller dispatch to it based on the stdin_packs_mode.

This prepares for a subsequent commit which will introduce an alternate
code path for '--stdin-packs=follow-reachable' that determines the set
of objects to include via a reachability walk rather than eagerly adding
all objects from included packs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 49 ++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 27048bbb4dd..29e43abb51e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3933,30 +3933,12 @@ static int stdin_packs_include_check(struct commit *commit, void *data)
 	return stdin_packs_include_check_obj((struct object *)commit, data);
 }
 
-static void stdin_packs_add_pack_entries(struct strmap *packs,
-					 struct rev_info *revs)
+static void stdin_packs_add_all_pack_entries(struct string_list *keys,
+					     struct rev_info *revs)
 {
-	struct string_list keys = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 
-	strmap_for_each_entry(packs, &iter, entry) {
-		struct stdin_pack_info *info = entry->value;
-		if (!info->p)
-			die(_("could not find pack '%s'"), entry->key);
-
-		string_list_append(&keys, entry->key)->util = info;
-	}
-
-	/*
-	 * Order packs by ascending mtime; use QSORT directly to access the
-	 * string_list_item's ->util pointer, which string_list_sort() does not
-	 * provide.
-	 */
-	QSORT(keys.items, keys.nr, pack_mtime_cmp);
-
-	for_each_string_list_item(item, &keys) {
+	for_each_string_list_item(item, keys) {
 		struct stdin_pack_info *info = item->util;
 
 		if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
@@ -3977,6 +3959,31 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 						revs,
 						ODB_FOR_EACH_OBJECT_PACK_ORDER);
 	}
+}
+
+static void stdin_packs_add_pack_entries(struct strmap *packs,
+					 struct rev_info *revs)
+{
+	struct string_list keys = STRING_LIST_INIT_NODUP;
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(packs, &iter, entry) {
+		struct stdin_pack_info *info = entry->value;
+		if (!info->p)
+			die(_("could not find pack '%s'"), entry->key);
+
+		string_list_append(&keys, entry->key)->util = info;
+	}
+
+	/*
+	 * Order packs by ascending mtime; use QSORT directly to access the
+	 * string_list_item's ->util pointer, which string_list_sort() does not
+	 * provide.
+	 */
+	QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+	stdin_packs_add_all_pack_entries(&keys, revs);
 
 	string_list_clear(&keys, 0);
 }
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 08/10] pack-objects: introduce '--stdin-packs=follow-reachable'
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

Introduce a new '--stdin-packs=follow-reachable' mode. Like
'--stdin-packs=follow', this mode recognizes the '!' (excluded-open)
pack prefix and halts at '^' (excluded-closed) packs.

Unlike 'follow', which eagerly includes all objects from listed packs
and then walks reachability to rescue additional objects, the new
'follow-reachable' mode uses reference tips as its traversal starting
points and only includes objects that are both reachable AND belong to
an included pack (or are reachable from a commit or tag in one):

 - Objects in included packs: added to the output if reachable.

 - Objects reachable from included-pack commits but in unknown packs:
   added to the output (rescued).

 - Objects in excluded-open ('!') packs: not included, but the traversal
   continues through them.

 - Objects in excluded-closed ('^') packs: not included, and the
   traversal halts.

The implementation uses a two-phase approach:

 1. In the first phase, commits and tags in included packs (and loose,
    when --unpacked is given) are marked with a flag bit
    (IN_INCLUDED_PACK). A commit-only walk from ref tips then identifies
    which marked objects are reachable, halting at excluded-closed
    packs.

 2. In the second phase, every reachable marked object (from the
    previous step) becomes a tip for a full object traversal whose
    `show_object_pack_hint()` and `show_commit_pack_hint()` callbacks
    add discovered objects (obeying the usual constraints imposed by
    `want_object_in_pack()`).

When '--unpacked' is given, reachable loose objects are included in the
output while unreachable loose objects are left alone. This is achieved
by marking loose commits and tags with IN_INCLUDED_PACK during the first
phase, so the pre-walk discovers them naturally.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  17 +++
 builtin/pack-objects.c              | 185 +++++++++++++++++++++++--
 t/t5331-pack-objects-stdin.sh       | 201 ++++++++++++++++++++++++++++
 3 files changed, 393 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 8a27aa19fd3..d7b2e39e76c 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -113,6 +113,23 @@ This mode is useful, for example, to resurrect once-unreachable
 objects found in cruft packs to generate packs which are closed under
 reachability up to the boundary set by the excluded packs.
 +
+When `mode` is "follow-reachable", the same pack prefixes are recognized
+as in "follow" (`!` for excluded-open, `^` for excluded-closed). However,
+instead of including all objects from included packs, only objects that
+are reachable from reference tips AND belong to an included pack (or are
+reachable from a commit in one) are included. Objects in excluded-open
+packs are traversed but not included; objects in excluded-closed packs
+halt the traversal.
++
+This mode is designed for geometric repacking with cruft packs, where
+the output pack should contain only reachable objects so that unreachable
+ones can be collected separately.
++
+When `--unpacked` is given alongside `--stdin-packs=follow-reachable`,
+reachable loose objects are also included in the output pack, while
+unreachable loose objects are left alone. This includes both loose
+commits and annotated tag objects.
++
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 29e43abb51e..5d96757b645 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -290,6 +290,7 @@ enum stdin_packs_mode {
 	STDIN_PACKS_MODE_NONE,
 	STDIN_PACKS_MODE_STANDARD,
 	STDIN_PACKS_MODE_FOLLOW,
+	STDIN_PACKS_MODE_FOLLOW_REACHABLE,
 };
 
 /**
@@ -3835,7 +3836,8 @@ static void show_object_pack_hint(struct object *object, const char *name,
 				  void *data)
 {
 	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
-	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+	if (mode == STDIN_PACKS_MODE_FOLLOW ||
+	    mode == STDIN_PACKS_MODE_FOLLOW_REACHABLE) {
 		if (object->type == OBJ_BLOB &&
 		    !odb_has_object(the_repository->objects, &object->oid, 0))
 			return;
@@ -3866,7 +3868,8 @@ static void show_commit_pack_hint(struct commit *commit, void *data)
 {
 	enum stdin_packs_mode mode = *(enum stdin_packs_mode *)data;
 
-	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+	if (mode == STDIN_PACKS_MODE_FOLLOW ||
+	    mode == STDIN_PACKS_MODE_FOLLOW_REACHABLE) {
 		show_object_pack_hint((struct object *)commit, "", data);
 		return;
 	}
@@ -3933,6 +3936,156 @@ static int stdin_packs_include_check(struct commit *commit, void *data)
 	return stdin_packs_include_check_obj((struct object *)commit, data);
 }
 
+/*
+ * Flag bit set on commits that belong to an included pack during
+ * '--stdin-packs=follow-reachable'. Used by the pre-walk to
+ * identify which reachable commits should be tips for the main
+ * object traversal.
+ */
+#define IN_INCLUDED_PACK (1u<<11)
+
+static int mark_included_pack_tip(const struct object_id *oid,
+				  struct packed_git *p,
+				  uint32_t pos,
+				  void *data)
+{
+	struct rev_info *main_revs = data;
+	off_t ofs = nth_packed_object_offset(p, pos);
+	enum object_type type;
+	struct object_info oi = OBJECT_INFO_INIT;
+	struct object *obj;
+
+	oi.typep = &type;
+	if (packed_object_info(p, ofs, &oi) < 0)
+		return 0;
+	if (type != OBJ_COMMIT && type != OBJ_TAG)
+		return 0;
+
+	obj = parse_object(the_repository, oid);
+	if (!obj)
+		return 0;
+
+	obj->flags |= IN_INCLUDED_PACK;
+
+	if (type == OBJ_TAG && main_revs)
+		add_pending_object(main_revs, obj, "");
+	return 0;
+}
+
+static int mark_loose_object_tip(const struct object_id *oid,
+				 struct object_info *oi UNUSED,
+				 void *data)
+{
+	struct rev_info *main_revs = data;
+	struct object *obj;
+	enum object_type type;
+
+	type = odb_read_object_info(the_repository->objects, oid, NULL);
+	if (type != OBJ_COMMIT && type != OBJ_TAG)
+		return 0;
+
+	obj = parse_object(the_repository, oid);
+	if (!obj)
+		return 0;
+
+	obj->flags |= IN_INCLUDED_PACK;
+
+	if (type == OBJ_TAG && main_revs)
+		add_pending_object(main_revs, obj, "");
+
+	return 0;
+}
+
+static int add_ref_to_pending(const struct reference *ref, void *cb_data)
+{
+	struct rev_info *revs = cb_data;
+	struct object *object;
+
+	object = parse_object(the_repository, ref->oid);
+	if (!object)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	return 0;
+}
+
+static void stdin_packs_add_reachable_pack_entries(struct string_list *keys,
+						   struct rev_info *revs,
+						   int rev_list_unpacked)
+{
+	struct rev_info pre_walk;
+	struct commit *commit;
+	struct string_list_item *item;
+
+	/*
+	 * Phase 1: mark commits in included packs, then walk from
+	 * ref tips to discover which of them are reachable. The walk
+	 * halts at excluded-closed packs (via no_kept_objects) and
+	 * continues through excluded-open ones.
+	 *
+	 * Also set include_check on the outer revs so that phase 2
+	 * (the main object traversal) halts at closed packs.
+	 */
+	revs->include_check = stdin_packs_include_check;
+	revs->include_check_obj = stdin_packs_include_check_obj;
+
+	for_each_string_list_item(item, keys) {
+		struct stdin_pack_info *info = item->util;
+		if (info->kind & STDIN_PACK_INCLUDE)
+			for_each_object_in_pack(info->p,
+						mark_included_pack_tip,
+						revs,
+						ODB_FOR_EACH_OBJECT_PACK_ORDER);
+	}
+
+	if (rev_list_unpacked) {
+		/*
+		 * With '--stdin-packs=follow-reachable', specifying
+		 * '--unpacked' instructs pack-objects to pack any loose
+		 * objects which are reachable.
+		 *
+		 * Pretend as if all loose objects are in an included
+		 * pack in order to make them eligible for packing.
+		 */
+		struct odb_source *source = revs->repo->objects->sources;
+		for (; source; source = source->next) {
+			struct odb_source_files *files = odb_source_files_downcast(source);
+			struct odb_for_each_object_options opts = { 0 };
+			if (local)
+				opts.flags |= ODB_FOR_EACH_OBJECT_LOCAL_ONLY;
+
+			odb_source_for_each_object(&files->loose->base, NULL,
+						   mark_loose_object_tip,
+						   revs, &opts);
+		}
+	}
+
+	repo_init_revisions(the_repository, &pre_walk, NULL);
+	pre_walk.no_kept_objects = 1;
+	pre_walk.keep_pack_cache_flags |= KEPT_PACK_IN_CORE;
+	pre_walk.ignore_missing_links = 1;
+
+	refs_for_each_ref(get_main_ref_store(the_repository),
+			  add_ref_to_pending, &pre_walk);
+
+	if (prepare_revision_walk(&pre_walk))
+		die(_("revision walk setup failed"));
+
+	/*
+	 * Phase 2 tips: every reachable commit that is in an
+	 * included pack becomes a starting point for the main
+	 * object traversal.
+	 */
+	while ((commit = get_revision(&pre_walk)) != NULL) {
+		if (commit->object.flags & IN_INCLUDED_PACK)
+			add_pending_oid(revs, NULL,
+					&commit->object.oid, 0);
+	}
+
+	reset_revision_walk();
+	release_revisions(&pre_walk);
+}
+
 static void stdin_packs_add_all_pack_entries(struct string_list *keys,
 					     struct rev_info *revs)
 {
@@ -3962,7 +4115,9 @@ static void stdin_packs_add_all_pack_entries(struct string_list *keys,
 }
 
 static void stdin_packs_add_pack_entries(struct strmap *packs,
-					 struct rev_info *revs)
+					 struct rev_info *revs,
+					 enum stdin_packs_mode mode,
+					 int rev_list_unpacked)
 {
 	struct string_list keys = STRING_LIST_INIT_NODUP;
 	struct hashmap_iter iter;
@@ -3983,13 +4138,18 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 	 */
 	QSORT(keys.items, keys.nr, pack_mtime_cmp);
 
-	stdin_packs_add_all_pack_entries(&keys, revs);
+	if (mode == STDIN_PACKS_MODE_FOLLOW_REACHABLE)
+		stdin_packs_add_reachable_pack_entries(&keys, revs,
+						       rev_list_unpacked);
+	else
+		stdin_packs_add_all_pack_entries(&keys, revs);
 
 	string_list_clear(&keys, 0);
 }
 
 static void stdin_packs_read_input(struct rev_info *revs,
-				   enum stdin_packs_mode mode)
+				   enum stdin_packs_mode mode,
+				   int rev_list_unpacked)
 {
 	struct strbuf buf = STRBUF_INIT;
 	struct strmap packs = STRMAP_INIT;
@@ -4004,7 +4164,9 @@ static void stdin_packs_read_input(struct rev_info *revs,
 			continue;
 		else if (*key == '^')
 			kind = STDIN_PACK_EXCLUDE_CLOSED;
-		else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
+		else if (*key == '!' &&
+			 (mode == STDIN_PACKS_MODE_FOLLOW ||
+			  mode == STDIN_PACKS_MODE_FOLLOW_REACHABLE))
 			kind = STDIN_PACK_EXCLUDE_OPEN;
 
 		if (kind != STDIN_PACK_INCLUDE)
@@ -4069,7 +4231,7 @@ static void stdin_packs_read_input(struct rev_info *revs,
 		info->p = p;
 	}
 
-	stdin_packs_add_pack_entries(&packs, revs);
+	stdin_packs_add_pack_entries(&packs, revs, mode, rev_list_unpacked);
 
 	strbuf_release(&buf);
 	strmap_clear(&packs, 1);
@@ -4109,7 +4271,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 
 	/* avoids adding objects in excluded packs */
 	ignore_packed_keep_in_core = 1;
-	if (mode == STDIN_PACKS_MODE_FOLLOW) {
+	if (mode == STDIN_PACKS_MODE_FOLLOW ||
+	    mode == STDIN_PACKS_MODE_FOLLOW_REACHABLE) {
 		/*
 		 * In '--stdin-packs=follow' mode, additionally ignore
 		 * objects in excluded-open packs to prevent them from
@@ -4117,8 +4280,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 		 */
 		ignore_packed_keep_in_core_open = 1;
 	}
-	stdin_packs_read_input(&revs, mode);
-	if (rev_list_unpacked)
+	stdin_packs_read_input(&revs, mode, rev_list_unpacked);
+	if (rev_list_unpacked && mode != STDIN_PACKS_MODE_FOLLOW_REACHABLE)
 		add_unreachable_loose_objects(&revs);
 
 	if (prepare_revision_walk(&revs))
@@ -5027,6 +5190,8 @@ static int parse_stdin_packs_mode(const struct option *opt, const char *arg,
 		*mode = STDIN_PACKS_MODE_STANDARD;
 	else if (!strcmp(arg, "follow"))
 		*mode = STDIN_PACKS_MODE_FOLLOW;
+	else if (!strcmp(arg, "follow-reachable"))
+		*mode = STDIN_PACKS_MODE_FOLLOW_REACHABLE;
 	else
 		die(_("invalid value for '%s': '%s'"), opt->long_name, arg);
 
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index c74b5861af3..443d855291a 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -520,4 +520,205 @@ test_expect_success '--stdin-packs with !-delimited pack without follow' '
 	)
 '
 
+test_expect_success '--stdin-packs=follow-reachable excludes unreachable objects' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		git branch -M main &&
+
+		# Create the following commit structure:
+		#
+		#   A <-- B <-- C     (main)
+		#         ^
+		#          \
+		#           U         (unreachable, no ref)
+		test_commit A &&
+		test_commit B &&
+		test_commit U &&
+		U_TIP="$(git rev-parse HEAD)" &&
+		git reset --hard HEAD^ &&
+		git tag -d U &&
+		git reflog expire --all --expire=all &&
+
+		test_commit C &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		U="$(echo "$U_TIP" | git pack-objects $packdir/pack)" &&
+
+		git prune-packed &&
+
+		# Include packs A and C, exclude B as open (since B
+		# may not have closure), leave U as unknown.
+		#
+		# With follow-reachable:
+		#  - objects from A and C are included (reachable from
+		#    main, through excluded-open B, and in included
+		#    packs)
+		#  - objects from B are excluded (excluded-open)
+		#  - objects from U are NOT included (not reachable
+		#    from any ref, even though the pack exists)
+		P=$(git pack-objects --stdin-packs=follow-reachable \
+			$packdir/pack <<-EOF
+		pack-$A.pack
+		!pack-$B.pack
+		pack-$C.pack
+		EOF
+		) &&
+
+		objects_in_packs $A $C >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs=follow-reachable with open-excluded packs' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		git branch -M main &&
+
+		# Create the following commit structure:
+		#
+		#   A <-- B <-- C <-- D    (main)
+		#
+		# Pack each commit separately, then use follow-reachable
+		# with B excluded-open and A excluded-closed. Since B is
+		# open, the traversal continues through it, but since A
+		# is closed, it halts there.
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+		test_commit D &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+		C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+		D="$(echo C..D | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		# Include C and D, B excluded-open, A excluded-closed.
+		#
+		# The traversal starts at main (D), walks:
+		#  D (included) -> C (included) -> B (open, continue
+		#  but do not include) -> A (closed, halt).
+		#
+		# Objects from C and D are in the output (reachable,
+		# included). B.t is also rescued (reachable via
+		# C^{tree} or similar). A and its objects are NOT
+		# (behind the closed boundary).
+		P=$(git pack-objects --stdin-packs=follow-reachable \
+			$packdir/pack <<-EOF
+		pack-$C.pack
+		pack-$D.pack
+		!pack-$B.pack
+		^pack-$A.pack
+		EOF
+		) &&
+
+		objects_in_packs $C $D >expect &&
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs=follow-reachable with --unpacked and loose objects' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		git branch -M main &&
+
+		test_commit A &&
+		test_commit B &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		# Create a reachable loose commit on top of B.
+		test_commit C &&
+
+		# Create an unreachable loose object.
+		unreachable="$(echo "unreachable" | git hash-object -w --stdin)" &&
+
+		# Include A and B, no excluded packs. With --unpacked,
+		# the reachable loose objects from C should be included
+		# in the output but the unreachable blob should not.
+		P=$(git pack-objects --stdin-packs=follow-reachable \
+			--unpacked $packdir/pack <<-EOF
+		pack-$A.pack
+		pack-$B.pack
+		EOF
+		) &&
+
+		# The output should contain objects from A, B, and C.
+		{
+			objects_in_packs $A $B &&
+			git rev-list --objects --no-object-names B..C
+		} >expect.raw &&
+		sort expect.raw >expect &&
+
+		objects_in_packs $P >actual &&
+
+		# The unreachable blob should NOT be in the output.
+		! grep $unreachable actual &&
+
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs=follow-reachable with --unpacked and loose annotated tag' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		git branch -M main &&
+
+		test_commit A &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		# Create a loose annotated tag pointing at A.
+		git tag -a -m "annotated" annotated-tag A &&
+		tag_oid="$(git rev-parse annotated-tag)" &&
+
+		P=$(git pack-objects --stdin-packs=follow-reachable \
+			--unpacked $packdir/pack <<-EOF
+		pack-$A.pack
+		EOF
+		) &&
+
+		# The output should contain objects from A plus the
+		# loose annotated tag object.
+		{
+			objects_in_packs $A &&
+			echo $tag_oid
+		} >expect.raw &&
+		sort expect.raw >expect &&
+
+		objects_in_packs $P >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 09/10] pack-objects: support '--refs-snapshot' with 'follow-reachable'
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

The '--stdin-packs=follow-reachable' mode walks from reference tips to
determine which objects in included packs are reachable. Without a
snapshot, pack-objects discovers refs by iterating live references,
which may change between the time the repack writes the geometric pack
and the time it writes the MIDX bitmap.

If a reference is updated during that window, the set of reachable
objects seen by pack-objects may differ from the set seen by the MIDX
bitmap writer. This can cause reachable objects to end up in the cruft
pack (because pack-objects did not see the reference that makes them
reachable) rather than the geometric pack. While this does not cause
data loss, it has two undesirable consequences:

 - Reachable objects in the cruft pack cannot receive bitmap coverage
   (since the cruft pack may be excluded from the MIDX when
   'repack.midxMustContainCruft' is false).

 - Serving fetches that need those objects requires loading the cruft
   pack, which may contain many unrelated unreachable objects.

To avoid this, teach pack-objects to accept '--refs-snapshot=<path>'
when used with '--stdin-packs=follow-reachable'. The snapshot file uses
the same format as the MIDX bitmap writer: one hex OID per line, with
an optional '+' prefix for preferred bitmap commits.

'pack-objects' happily ignores the '+' prefix for indicating preferred
bitmap commits as a convenience, so that the ref-snapshot can be shared
between the MIDX generation machinery and 'pack-objects'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  8 +++++
 builtin/pack-objects.c              | 46 +++++++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index d7b2e39e76c..4ebe407cfaf 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -133,6 +133,14 @@ commits and annotated tag objects.
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
+--refs-snapshot=<path>::
+	When used with `--stdin-packs=follow-reachable`, read reference
+	tips from `<path>` instead of iterating live references. The file
+	format is one hex object ID per line, with an optional `+` prefix
+	(for preferred bitmap commits). This ensures a consistent view of
+	references when the same snapshot is shared with other tools (e.g.,
+	the MIDX bitmap writer).
+
 --cruft::
 	Packs unreachable objects into a separate "cruft" pack, denoted
 	by the existence of a `.mtimes` file. Typically used by `git
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5d96757b645..082ff760abc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -219,6 +219,7 @@ static int incremental;
 static int ignore_packed_keep_on_disk;
 static int ignore_packed_keep_in_core;
 static int ignore_packed_keep_in_core_open;
+static const char *stdin_packs_refs_snapshot;
 static int ignore_packed_keep_in_core_has_cruft;
 static int allow_ofs_delta;
 static struct pack_idx_option pack_idx_opts;
@@ -4009,6 +4010,38 @@ static int add_ref_to_pending(const struct reference *ref, void *cb_data)
 	return 0;
 }
 
+static void read_refs_snapshot(const char *refs_snapshot,
+			      struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id oid;
+	FILE *f = xfopen(refs_snapshot, "r");
+
+	while (strbuf_getline(&buf, f) != EOF) {
+		struct object *object;
+		const char *hex = buf.buf;
+		const char *end = NULL;
+
+		if (*hex == '+')
+			hex++;
+
+		if (parse_oid_hex_algop(hex, &oid, &end,
+					the_repository->hash_algo) < 0)
+			die(_("could not parse line: %s"), buf.buf);
+		if (*end)
+			die(_("malformed line: %s"), buf.buf);
+
+		object = parse_object(the_repository, &oid);
+		if (!object)
+			continue;
+
+		add_pending_object(revs, object, "");
+	}
+
+	fclose(f);
+	strbuf_release(&buf);
+}
+
 static void stdin_packs_add_reachable_pack_entries(struct string_list *keys,
 						   struct rev_info *revs,
 						   int rev_list_unpacked)
@@ -4065,8 +4098,11 @@ static void stdin_packs_add_reachable_pack_entries(struct string_list *keys,
 	pre_walk.keep_pack_cache_flags |= KEPT_PACK_IN_CORE;
 	pre_walk.ignore_missing_links = 1;
 
-	refs_for_each_ref(get_main_ref_store(the_repository),
-			  add_ref_to_pending, &pre_walk);
+	if (stdin_packs_refs_snapshot)
+		read_refs_snapshot(stdin_packs_refs_snapshot, &pre_walk);
+	else
+		refs_for_each_ref(get_main_ref_store(the_repository),
+				  add_ref_to_pending, &pre_walk);
 
 	if (prepare_revision_walk(&pre_walk))
 		die(_("revision walk setup failed"));
@@ -5267,6 +5303,8 @@ int cmd_pack_objects(int argc,
 		OPT_CALLBACK_F(0, "stdin-packs", &stdin_packs, N_("mode"),
 			     N_("read packs from stdin"),
 			     PARSE_OPT_OPTARG, parse_stdin_packs_mode),
+		OPT_FILENAME(0, "refs-snapshot", &stdin_packs_refs_snapshot,
+			     N_("refs snapshot for follow-reachable traversal")),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
 			 N_("output pack to stdout")),
 		OPT_BOOL(0, "include-tag", &include_tag,
@@ -5484,6 +5522,10 @@ int cmd_pack_objects(int argc,
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
 
+	if (stdin_packs_refs_snapshot &&
+	    stdin_packs != STDIN_PACKS_MODE_FOLLOW_REACHABLE)
+		die(_("--refs-snapshot can only be used with --stdin-packs=follow-reachable"));
+
 	if (cruft) {
 		if (use_internal_rev_list)
 			die(_("cannot use internal rev list with --cruft"));
-- 
2.55.0.rc2.10.g29e31820dce


^ permalink raw reply related

* [RFC PATCH 10/10] repack: support combining '--geometric' with '--cruft'
From: Taylor Blau @ 2026-06-26 19:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1782500507.git.me@ttaylorr.com>

Teach 'git repack' to accept '--geometric' and '--cruft' together. When
both are given, the geometric repack rolls up non-cruft packs as usual,
and a separate cruft pack is written to collect unreachable objects.

Previously, '--cruft' implied `ALL_INTO_ONE`, which is fundamentally
incompatible with geometric repacking. Relax this so that '--cruft' only
implies `ALL_INTO_ONE` when '--geometric' is not also given.

When combining the two modes:

 - Use the new '--stdin-packs=follow-reachable' mode so that only
   reachable objects from the rolled-up packs (and any reachable loose
   objects) appear in the geometric pack. Unreachable objects are left
   for the cruft writer to collect.

 - Plumb our `pack_geometry` into `write_cruft_pack()`, so that the
   latter can tell 'pack-objects' which non-kept packs are below the
   split (excluded, so their unreachable objects are candidates for the
   cruft pack) versus above the split (included, so they are treated as
   reachable).

 - Handle promisor packs in the cruft writer's geometry path, since
   promisor packs have their own split point.

 - Use the refs snapshot (when available) so that pack-objects and the
   MIDX bitmap writer see the same set of reference tips.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-repack.adoc |  11 ++
 builtin/repack.c              |  23 +++-
 repack-cruft.c                |  23 +++-
 repack.h                      |   3 +-
 t/t7704-repack-cruft.sh       | 251 ++++++++++++++++++++++++++++++++++
 5 files changed, 300 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 72c42015e23..e9df7713278 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -70,6 +70,11 @@ to the new separate pack will be written.
 	are packed into a separate cruft pack. Unreachable objects can
 	be pruned using the normal expiry rules with the next `git gc`
 	invocation (see linkgit:git-gc[1]). Incompatible with `-k`.
++
+When combined with `--geometric`, `--cruft` does not imply `-a`. Instead,
+the geometric repack rolls up packs as usual, and a separate cruft pack is
+written to collect unreachable objects. Only reachable objects from the
+rolled-up packs are included in the resulting geometric pack.
 
 --cruft-expiration=<approxidate>::
 	Expire unreachable objects older than `<approxidate>`
@@ -245,6 +250,12 @@ progression.
 Loose objects are implicitly included in this "roll-up", without respect to
 their reachability. This is subject to change in the future.
 +
+When combined with `--cruft`, only reachable objects from rolled-up packs
+are included in the geometric pack, along with any reachable loose objects.
+Unreachable objects (both from rolled-up packs and loose) are collected
+into a separate cruft pack. Existing cruft packs are retained. See
+`--cruft` above for details.
++
 When writing a multi-pack bitmap, `git repack` selects the largest resulting
 pack as the preferred pack for object selection by the MIDX (see
 linkgit:git-multi-pack-index[1]).
diff --git a/builtin/repack.c b/builtin/repack.c
index dfb6fed231d..165cfff75cd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -260,7 +260,7 @@ int cmd_repack(int argc,
 				  keep_unreachable, "-k/--keep-unreachable",
 				  pack_everything & PACK_CRUFT, "--cruft");
 
-	if (pack_everything & PACK_CRUFT)
+	if (pack_everything & PACK_CRUFT && !geometry.split_factor)
 		pack_everything |= ALL_INTO_ONE;
 
 	if (write_bitmaps < 0) {
@@ -296,7 +296,8 @@ int cmd_repack(int argc,
 		die(_("invalid value for %s: %d"), "--midx-new-layer-threshold",
 		    config_ctx.midx_new_layer_threshold);
 
-	if (write_midx != REPACK_WRITE_MIDX_NONE && write_bitmaps) {
+	if ((write_midx != REPACK_WRITE_MIDX_NONE && write_bitmaps) ||
+	    (geometry.split_factor && (pack_everything & PACK_CRUFT))) {
 		struct strbuf path = STRBUF_INIT;
 
 		strbuf_addf(&path, "%s/%s_XXXXXX",
@@ -317,7 +318,7 @@ int cmd_repack(int argc,
 	existing_packs_collect(&existing, &keep_pack_list);
 
 	if (geometry.split_factor) {
-		if (pack_everything)
+		if (pack_everything & ~PACK_CRUFT)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
 		if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL) {
 			geometry.midx_layer_threshold = config_ctx.midx_new_layer_threshold;
@@ -393,10 +394,16 @@ int cmd_repack(int argc,
 		pack_geometry_repack_promisors(repo, &po_args, &geometry,
 					       &names, packtmp);
 
-		if (midx_must_contain_cruft)
+		if (pack_everything & PACK_CRUFT) {
+			strvec_push(&cmd.args, "--stdin-packs=follow-reachable");
+			if (refs_snapshot)
+				strvec_pushf(&cmd.args, "--refs-snapshot=%s",
+					     get_tempfile_path(refs_snapshot));
+		} else if (midx_must_contain_cruft)
 			strvec_push(&cmd.args, "--stdin-packs");
 		else
 			strvec_push(&cmd.args, "--stdin-packs=follow");
+
 		strvec_push(&cmd.args, "--unpacked");
 	} else {
 		strvec_push(&cmd.args, "--unpacked");
@@ -431,7 +438,8 @@ int cmd_repack(int argc,
 			const char *basename = pack_basename(geometry.pack[i]);
 			char marker = '^';
 
-			if (!midx_must_contain_cruft &&
+			if ((pack_everything & PACK_CRUFT ||
+			     !midx_must_contain_cruft) &&
 			    !string_list_has_string(&existing.midx_packs,
 						    basename)) {
 				/*
@@ -505,7 +513,8 @@ int cmd_repack(int argc,
 
 		ret = write_cruft_pack(&opts, cruft_expiration,
 				       combine_cruft_below_size, &names,
-				       &existing);
+				       &existing,
+				       geometry.split_factor ? &geometry : NULL);
 		if (ret)
 			goto cleanup;
 
@@ -540,7 +549,7 @@ int cmd_repack(int argc,
 			 */
 			opts.destination = expire_to;
 			ret = write_cruft_pack(&opts, NULL, 0ul, &names,
-					       &existing);
+					       &existing, NULL);
 			if (ret)
 				goto cleanup;
 		}
diff --git a/repack-cruft.c b/repack-cruft.c
index 6a040e98017..6c553bbb0b5 100644
--- a/repack-cruft.c
+++ b/repack-cruft.c
@@ -36,7 +36,8 @@ int write_cruft_pack(const struct write_pack_opts *opts,
 		     const char *cruft_expiration,
 		     unsigned long combine_cruft_below_size,
 		     struct string_list *names,
-		     struct existing_packs *existing)
+		     struct existing_packs *existing,
+		     struct pack_geometry *geometry)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
@@ -81,8 +82,24 @@ int write_cruft_pack(const struct write_pack_opts *opts,
 	else
 		for_each_string_list_item(item, &existing->cruft_packs)
 			fprintf(in, "-%s.pack\n", item->string);
-	for_each_string_list_item(item, &existing->non_kept_packs)
-		fprintf(in, "-%s.pack\n", item->string);
+	if (geometry) {
+		uint32_t j;
+		for (j = 0; j < geometry->split; j++)
+			fprintf(in, "-%s\n",
+				pack_basename(geometry->pack[j]));
+		for (; j < geometry->pack_nr; j++)
+			fprintf(in, "%s\n",
+				pack_basename(geometry->pack[j]));
+		for (j = 0; j < geometry->promisor_split; j++)
+			fprintf(in, "-%s\n",
+				pack_basename(geometry->promisor_pack[j]));
+		for (; j < geometry->promisor_pack_nr; j++)
+			fprintf(in, "%s\n",
+				pack_basename(geometry->promisor_pack[j]));
+	} else {
+		for_each_string_list_item(item, &existing->non_kept_packs)
+			fprintf(in, "-%s.pack\n", item->string);
+	}
 	for_each_string_list_item(item, &existing->kept_packs)
 		fprintf(in, "%s.pack\n", item->string);
 	fclose(in);
diff --git a/repack.h b/repack.h
index 4295829cea0..872a503fbd1 100644
--- a/repack.h
+++ b/repack.h
@@ -169,6 +169,7 @@ int write_cruft_pack(const struct write_pack_opts *opts,
 		     const char *cruft_expiration,
 		     unsigned long combine_cruft_below_size,
 		     struct string_list *names,
-		     struct existing_packs *existing);
+		     struct existing_packs *existing,
+		     struct pack_geometry *geometry);
 
 #endif /* REPACK_H */
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 9e03b04315d..5e2b776e7ba 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -891,4 +891,255 @@ test_expect_success 'repack rescues once-cruft objects above geometric split' '
 	git repack --geometric=2 -d --write-midx --write-bitmap-index
 '
 
+test_expect_success 'repack --geometric --cruft combines packs and writes cruft' '
+	git init geometric-cruft-basic &&
+	(
+		cd geometric-cruft-basic &&
+
+		test_commit A &&
+		test_commit B &&
+
+		B="$(git rev-parse B)" &&
+
+		git reset --hard $B^ &&
+		git tag -d B &&
+		git reflog expire --all --expire=all &&
+
+		# Initial state: one non-cruft pack, one cruft pack.
+		git repack -d --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.before &&
+		test_line_count = 1 cruft.before &&
+
+		test_commit C &&
+		git repack &&
+
+		# At this point we have three packs:
+		#   - the non-cruft pack from A
+		#   - the cruft pack from B
+		#   - a new non-cruft pack from C
+		#
+		# The two non-cruft packs are not in a geometric
+		# progression, so they should be rolled up.
+		git repack -d --geometric=2 --cruft &&
+
+		# The old cruft pack for B is retained, since the
+		# geometric repack does not touch cruft packs.
+		ls $packdir/pack-*.mtimes >cruft.after &&
+		test_line_count = 1 cruft.after &&
+
+		# Ensure that all reachable objects are present.
+		git fsck
+	)
+'
+
+test_expect_success 'repack --geometric --cruft writes new cruft for loose unreachable' '
+	git init geometric-cruft-new-cruft &&
+	(
+		cd geometric-cruft-new-cruft &&
+
+		git config set maintenance.auto false &&
+
+		test_commit A &&
+		git repack &&
+
+		test_commit B &&
+		git repack &&
+
+		# Create an unreachable commit whose objects are
+		# still loose (never packed).
+		test_commit C &&
+		C="$(git rev-parse C)" &&
+		git reset --hard $C^ &&
+		git tag -d C &&
+		git reflog expire --all --expire=all &&
+
+		# At this point we have two non-cruft packs of
+		# similar size that are not in geometric progression,
+		# and loose unreachable objects from commit C.
+		ls $packdir/pack-*.idx >packs.before &&
+		test_line_count = 2 packs.before &&
+
+		# Geometric+cruft repack should roll up the two
+		# non-cruft packs and write a new cruft pack for C
+		# (whose objects are loose and unreachable).
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.after &&
+		test_line_count = 1 cruft.after &&
+
+		git fsck
+	)
+'
+
+test_expect_success 'repack --geometric --cruft -d deletes rolled-up packs' '
+	git init geometric-cruft-delete &&
+	(
+		cd geometric-cruft-delete &&
+
+		test_commit A &&
+		git repack -d &&
+
+		test_commit B &&
+		git repack -d &&
+
+		ls $packdir/pack-*.idx >before &&
+
+		git repack -d --geometric=2 --cruft &&
+
+		# Two packs should have been rolled into one. No cruft
+		# pack is written because there are no unreachable objects.
+		ls $packdir/pack-*.idx >after &&
+		test_line_count = 1 after &&
+
+		# The rolled-up packs should be gone.
+		! test_cmp before after
+	)
+'
+
+test_expect_success 'repack --geometric --cruft collects loose unreachable objects' '
+	git init geometric-cruft-loose &&
+	(
+		cd geometric-cruft-loose &&
+
+		test_commit A &&
+		git repack -d &&
+
+		test_commit B &&
+		git repack &&
+
+		# Create a loose unreachable object by making it
+		# orphaned (not in any pack).
+		loose="$(echo "cruft object" | git hash-object -w --stdin)" &&
+
+		# We have two non-cruft packs and a loose unreachable
+		# object. The geometric+cruft repack should roll up
+		# the packs AND write a cruft pack for the loose
+		# unreachable object.
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.packs &&
+		test_line_count = 1 cruft.packs &&
+
+		git fsck
+	)
+'
+
+test_expect_success 'repack --geometric --cruft accumulates cruft packs' '
+	git init geometric-cruft-accumulate &&
+	(
+		cd geometric-cruft-accumulate &&
+
+		git config set maintenance.auto false &&
+
+		test_commit A &&
+		git repack &&
+
+		# First round: create unreachable objects and do a
+		# geometric+cruft repack.
+		unreachable_1="$(echo "cruft 1" | git hash-object -w --stdin)" &&
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.1 &&
+		test_line_count = 1 cruft.1 &&
+
+		test_commit B &&
+		git repack &&
+
+		# Second round: create more unreachable objects and
+		# repack again. The old cruft pack should be retained
+		# and a new one written.
+		unreachable_2="$(echo "cruft 2" | git hash-object -w --stdin)" &&
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.2 &&
+		test_line_count = 2 cruft.2 &&
+
+		git fsck
+	)
+'
+
+test_expect_success 'repack --geometric --cruft --combine-cruft-below-size' '
+	git init geometric-cruft-combine &&
+	(
+		cd geometric-cruft-combine &&
+
+		git config set maintenance.auto false &&
+
+		test_commit A &&
+		git repack &&
+
+		# Create a small cruft pack.
+		unreachable_1="$(echo "cruft 1" | git hash-object -w --stdin)" &&
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.before &&
+		test_line_count = 1 cruft.before &&
+
+		test_commit B &&
+		git repack &&
+
+		# Create another small cruft pack.
+		unreachable_2="$(echo "cruft 2" | git hash-object -w --stdin)" &&
+		git repack -d --geometric=2 --cruft &&
+
+		ls $packdir/pack-*.mtimes >cruft.mid &&
+		test_line_count = 2 cruft.mid &&
+
+		test_commit C &&
+		git repack &&
+
+		# With --combine-cruft-below-size, the two small cruft
+		# packs should be combined into one.
+		unreachable_3="$(echo "cruft 3" | git hash-object -w --stdin)" &&
+		git repack -d --geometric=2 --cruft \
+			--combine-cruft-below-size=10M &&
+
+		ls $packdir/pack-*.mtimes >cruft.after &&
+		test_line_count = 1 cruft.after &&
+
+		git fsck
+	)
+'
+
+test_expect_success 'repack --geometric --cruft --expire-to' '
+	git init geometric-cruft-expire-to &&
+	(
+		cd geometric-cruft-expire-to &&
+
+		git config set maintenance.auto false &&
+
+		test_commit A &&
+		git repack &&
+
+		test_commit B &&
+		git repack &&
+
+		# Create unreachable objects and record them.
+		test_commit C &&
+		C="$(git rev-parse C)" &&
+		git rev-list --objects --no-object-names B..C >unreachable.raw &&
+		sort unreachable.raw >unreachable.want &&
+
+		git reset --hard $C^ &&
+		git tag -d C &&
+		git reflog expire --all --expire=all &&
+
+		git init --bare expired.git &&
+		git repack -d --geometric=2 --cruft \
+			--cruft-expiration=now \
+			--expire-to="expired.git/objects/pack/pack" &&
+
+		# The expired objects should appear in the
+		# expire-to location.
+		expired="$(ls expired.git/objects/pack/pack-*.idx)" &&
+		test_path_is_file "${expired%.idx}.mtimes" &&
+		git show-index <"$expired" >expired.raw &&
+		cut -d" " -f2 expired.raw | sort >expired.objects &&
+		test_cmp unreachable.want expired.objects &&
+
+		git fsck
+	)
+'
+
 test_done
-- 
2.55.0.rc2.10.g29e31820dce

^ permalink raw reply related

* Re: [PATCH v2 2/2] environment: move excludes_file into repo_config_values
From: Junio C Hamano @ 2026-06-26 19:12 UTC (permalink / raw)
  To: Tian Yuchen
  Cc: git, cirnovskyv, Christian Couder, Ayush Chandekar,
	Olamide Caleb Bello
In-Reply-To: <20260626075037.532164-3-cat@malon.dev>

Tian Yuchen <cat@malon.dev> writes:

> Continue the libification effort by moving the 'excludes_file' global
> variable into 'struct repo_config_values'.
>
> Since 'excludes_file' is a dynamically allocated string (char *), it
> requires proper memory management. Introduce repo_config_values_clear()
> to safely free the heap memory when repository instance is destroyed.
>
> Note:
>
>  - 'if (repo != the_repository)' fallback logic is temporarily added
> in both the getter and the clear function. This prevents calling
> repo_config_values() on uninitialized submodules, which triggers BUG().

Would it be possible for the function to be called on the_repository
before it gets initialized?

> +void repo_config_values_clear(struct repository *repo)
> +{
> +	struct repo_config_values *cfg;

What I am wondering is if this check

> +	if (repo != the_repository)
> +		return;

wants to be more like

	if (!repo->initialized)
		return;

or even

	if (!repo->initialized) {
		BUG("clearing uninitialised repo config");
		return;
	}

Or perhaps not doing anything special there.

For that matter,

> +
> +	cfg = repo_config_values(repo);
> +	if (!cfg)
> +		return;

Wouldn't it be a bug to see NULL returned from the above function?
Why is it healthy to pretend as if nothing bad happened?

> +	FREE_AND_NULL(cfg->attributes_file);
> +	FREE_AND_NULL(cfg->excludes_file);
> +}


What do we want to happen when somebody does want to access (not
_clear(), but repo_config_values() itself) repo_config_values() in
today's code?  Don't we want to catch such a code as buggy?  Isn't
it the reason why repository.c:repo_config_values() check these
conditions and calls BUG() in the first place?  And if that is the
case, I find that a caller that "works around" by pretending nothing
bad happened and not calling repo_config_values(), like the above
code does, highly questionable, as it smells like sweeping problems
that you designed other parts of the code to detect with BUG() under
the rug.

In the longer run, we would want to have separate settings, which
used to be global variables but now are stored in per repository
config, available and usable in the context of each repository that
they are configured within.  If a caller wants to clear per repo
config for a repository instance by calling this function, this
function is in no place to tweak the intention of the caller by
short-circuiting the request and pretending it did what it was asked
to do.  In other words, the rest of the code not quite prepared to
deal with these global variables that turned into per repository
configuration values is *not* a problem this function should
address.  Let it be noticed by repo_config_values() function to
catch offending callers for now, and once the codebase becomes ready
to use one repo_config_values per repository, this function does not
have to change.

^ permalink raw reply

* Re: [PATCH v3 0/2] Silence po catalog output under "make -s"
From: Harald Nordgren @ 2026-06-26 19:27 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Harald Nordgren via GitGitGadget
In-Reply-To: <d38d233c-a7c9-4457-96c1-bfb75af71ffe@kdbg.org>

Hi Johannes!

Thanks for the help. What should I expect here, will it be merged to master now?


Harald

On Tue, Jun 23, 2026 at 6:23 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 21.06.26 um 16:56 schrieb Harald Nordgren via GitGitGadget:
> >  * gitk: gate the quiet helpers on -s in MAKEFLAGS and give the catalog rule
> >    a QUIET_MSGFMT prefix, so a silent build emits no MSGFMT/GEN lines
>
> I've picked up this one.
>
> >  * git-gui: replace the QUIET_MSGFMT0/QUIET_MSGFMT1 pair with a single
> >    QUIET_MSGFMT, since with --statistics gone there is no output left to
> >    reformat
>
> But this one, I skipped, because I already have all of it in
> https://github.com/j6t/git-gui/commits/hn/silence-make-s/
>
> Thanks!
> -- Hannes
>

^ permalink raw reply

* Re: [PATCH v6 00/11] refs: fix "onbranch" conditions
From: Patrick Steinhardt @ 2026-06-26 19:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler, git, Karthik Nayak, Jeff King
In-Reply-To: <xmqqmrwh9vl4.fsf@gitster.g>

On Fri, Jun 26, 2026 at 08:20:55AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Justin Tobler <jltobler@gmail.com> writes:
> >
> >> On 26/06/25 11:19AM, Patrick Steinhardt wrote:
> >>> Changes in v6:
> >>>   - Drop redundant condition when setting the default for
> >>>     "core.logallrefupdates".
> >>>   - Leave breakcrumb for why we lazy-load write options for the "files"
> >>>     backend.
> >>>   - Fix commit message typo.
> >>
> >> Thanks. This version of the series looks good to me.
> >>
> >> -Justin
> >
> > Thanks, both.  Let's call it ready for 'next' then.
> 
> Ah, before I forget, as the focus of the topic shifted dramatically
> between v4 and v5, I think we should rename it to something like
> 'ps/refs-onbranch-fixes' to reflect the fact that is no longer is
> about chdir-notify-parent but to fix "onbranch" chicken-and-egg
> situation.

Agreed. I also changed the subject of the cover letter starting with v5
to reflect this, so updating the branch name to match seems sensible to
me.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters
From: René Scharfe @ 2026-06-26 21:13 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git
  Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson
In-Reply-To: <e82e0c72b6fc72b214f40efa9586c77790881f93.1782479286.git.gitgitgadget@gmail.com>

On 6/26/26 3:08 PM, Kristofer Karlsson via GitGitGadget wrote:
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index f6a438550b..0f29b143bd 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -97,6 +97,75 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
>  	return commit;
>  }
>  
> +/*
> + * Priority queue with per-side commit counters for paint_down_to_common().
> + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
> + * PARENT2-only, or both (a pending merge-base candidate).
> + */
> +struct paint_state {
> +	struct prio_queue queue;
> +	int p1_count;
> +	int p2_count;
> +	int pending_merge_bases;
> +};
Can they become negative?  Wouldn't size_t be a more natural fit,
matching nr from struct prio_queue?

And some bikeshedding:

Why abbreviate?  parent1_count and parent2_count would be slightly
easier to read and associate with PARENT1 and PARENT2.

And pending_merge_bases is a counter as well.  Why not call it
like that, pending_merge_base_count?   Well, that's pretty long.
both_count?  That's quite generic and nondescript.  Call the other
counters parents1 and parents2?  Nah.  Or parent1s and parent2s?
Not sure why this inconsistency bothers me to begin with.

René


^ permalink raw reply

* Re: [PATCH v2 1/2] dir: encapsulate excludes_file lazy-load
From: SZEDER Gábor @ 2026-06-26 21:14 UTC (permalink / raw)
  To: Tian Yuchen
  Cc: git, cirnovskyv, Christian Couder, Ayush Chandekar,
	Olamide Caleb Bello
In-Reply-To: <20260626075037.532164-2-cat@malon.dev>

On Fri, Jun 26, 2026 at 03:50:36PM +0800, Tian Yuchen wrote:
> diff --git a/environment.c b/environment.c
> index ba2c60103f..8efcaeafa6 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -134,6 +134,13 @@ int is_bare_repository(void)
>  	return is_bare_repository_cfg && !repo_get_work_tree(the_repository);
>  }
>  
> +const char *repo_excludes_file(struct repository *repo)
> +{
> +	if (!excludes_file)
> +		excludes_file = xdg_config_home("ignore");
> +	return excludes_file;
> +}

This function has a 'repo' parameter, which is not used in the
function at all.  This causes build failure when trying to build this
commit using DEVELOPER=1:

  environment.c: In function ‘repo_excludes_file’:
  environment.c:137:51: error: unused parameter ‘repo’ [-Werror=unused-parameter]
    137 | const char *repo_excludes_file(struct repository *repo)
        |                                ~~~~~~~~~~~~~~~~~~~^~~~
  cc1: all warnings being treated as errors
  make: *** [Makefile:2922: environment.o] Error 1

Please make sure that all commits can be built with 'make
DEVELOPER=1'.


^ permalink raw reply

* Re: [RFC PATCH 04/10] repack: teach MIDX retention about geometric rollups
From: Junio C Hamano @ 2026-06-26 21:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <ad76f06fc7ed304af97c73a5931e1ebc5f2d3895.1782500507.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> +static int pack_geometry_contains_pack(struct packed_git **packs,
> +				       uint32_t packs_nr,
> +				       const char *base)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	uint32_t i;
> +
> +	for (i = 0; i < packs_nr; i++) {
> +		strbuf_reset(&buf);
> +		strbuf_addstr(&buf, pack_basename(packs[i]));
> +		strbuf_strip_suffix(&buf, ".pack");
> +
> +		if (!strcmp(buf.buf, base)) {
> +			strbuf_release(&buf);
> +			return 1;
> +		}
> +	}
> +
> +	strbuf_release(&buf);
> +	return 0;
> +}

It feels slightly inefficient to repeatedly strbuf_reset(),
strbuf_addstr(), and strbuf_strip_suffix() in the loop.  I do not
know if my understanding of what existing_packs_retain_midx_packs()
passes down in buf.buf as base is correct or not, but if so,
wouldn't it equivalent to

	for (uint32_t i = 0; i < packs_nr; i++) {
                const char *pack_name = pack_basename(packs[i]);
                const char *suffix;

                if (skip_prefix(pack_name, base, &suffix) &&
                    !strcmp(suffix, ".pack"))
                        return 1;
	}

perhaps?

Starting from "/path/to/objects/pack/pack-deadbeef.pack", you take
the basename of it to have "pack-deadbeef.pack" in buf, strip out
the ".pack" suffix to get "pack-deadbeef" in buf and then compare it
with the base.

Instead, pack_name in the rewitten one becomes the basename of the
packfile path, i.e., "pack-deadbeef.pack", then we see if it begins
with base and take the remainder in suffix, and finally we check if
that remaining suffix is ".pack".

Which should be equivalent.

> + * freshly-written pack supersedes them. When doing a geometric repack,
> + * packs below the split are rewritten into the new MIDX tip and should
> + * remain eligible for deletion.
>   */
> -void existing_packs_retain_midx_packs(struct existing_packs *existing)
> +void existing_packs_retain_midx_packs(struct existing_packs *existing,
> +				      const struct pack_geometry *geometry)
>  {
>  	struct string_list_item *item;
>  	struct strbuf buf = STRBUF_INIT;
> @@ -315,6 +351,9 @@ void existing_packs_retain_midx_packs(struct existing_packs *existing)
>  		strbuf_strip_suffix(&buf, ".pack");
>  		strbuf_strip_suffix(&buf, ".idx");

Not a fault of this patch, but it makes the hairs on the back of my
head tingle to see that a bogus input like "pack-foobar.idx.pack"
happily is taken, while "pack-foobar.pack.idx", an equally bogus
input, is not.

> +		if (pack_geometry_contains_rollup(geometry, buf.buf))
> +			continue;

^ permalink raw reply

* Re: [RFC PATCH 08/10] pack-objects: introduce '--stdin-packs=follow-reachable'
From: Junio C Hamano @ 2026-06-26 21:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <e3d2e46443d0b32ce29215563dde04ebcf850679.1782500507.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> +/*
> + * Flag bit set on commits that belong to an included pack during
> + * '--stdin-packs=follow-reachable'. Used by the pre-walk to
> + * identify which reachable commits should be tips for the main
> + * object traversal.
> + */
> +#define IN_INCLUDED_PACK (1u<<11)
> +
> +static int mark_included_pack_tip(const struct object_id *oid,
> +				  struct packed_git *p,
> +				  uint32_t pos,
> +				  void *data)
> +{
> +	struct rev_info *main_revs = data;
> +	off_t ofs = nth_packed_object_offset(p, pos);
> +	enum object_type type;
> +	struct object_info oi = OBJECT_INFO_INIT;
> +	struct object *obj;
> +
> +	oi.typep = &type;
> +	if (packed_object_info(p, ofs, &oi) < 0)
> +		return 0;
> +	if (type != OBJ_COMMIT && type != OBJ_TAG)
> +		return 0;

We do not care about non commits, non tags.

> +	obj = parse_object(the_repository, oid);
> +	if (!obj)
> +		return 0;
> +
> +	obj->flags |= IN_INCLUDED_PACK;
> +
> +	if (type == OBJ_TAG && main_revs)
> +		add_pending_object(main_revs, obj, "");

Any tag object is added to the pending list of the second phase
traversal here.  Doesn't this retain unreachable tags and
(unreachable) objects that are only reachable from these unreachable
tags found in the packfile?  Don't we want to limit this code to add
only tags that actually are reachable from refs, or something?

> +	return 0;
> +}

The other function ...

> +static int mark_loose_object_tip(const struct object_id *oid,
> +				 struct object_info *oi UNUSED,
> +				 void *data)

... is structured in a very similar way, and gives the same
puzzlement to me.

^ permalink raw reply

* Re: [PATCH v2 1/2] dir: encapsulate excludes_file lazy-load
From: Junio C Hamano @ 2026-06-26 21:45 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Tian Yuchen, git, cirnovskyv, Christian Couder, Ayush Chandekar,
	Olamide Caleb Bello
In-Reply-To: <aj7rtj9NsejqN357@szeder.dev>

SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Fri, Jun 26, 2026 at 03:50:36PM +0800, Tian Yuchen wrote:
>> diff --git a/environment.c b/environment.c
>> index ba2c60103f..8efcaeafa6 100644
>> --- a/environment.c
>> +++ b/environment.c
>> @@ -134,6 +134,13 @@ int is_bare_repository(void)
>>  	return is_bare_repository_cfg && !repo_get_work_tree(the_repository);
>>  }
>>  
>> +const char *repo_excludes_file(struct repository *repo)
>> +{
>> +	if (!excludes_file)
>> +		excludes_file = xdg_config_home("ignore");
>> +	return excludes_file;
>> +}
>
> This function has a 'repo' parameter, which is not used in the
> function at all.  This causes build failure when trying to build this
> commit using DEVELOPER=1:
>
>   environment.c: In function ‘repo_excludes_file’:
>   environment.c:137:51: error: unused parameter ‘repo’ [-Werror=unused-parameter]
>     137 | const char *repo_excludes_file(struct repository *repo)
>         |                                ~~~~~~~~~~~~~~~~~~~^~~~
>   cc1: all warnings being treated as errors
>   make: *** [Makefile:2922: environment.o] Error 1
>
> Please make sure that all commits can be built with 'make
> DEVELOPER=1'.

Good point.  In this case, we can start with UNUSED in step 1/2 and
then drop the UNUSED in the second step.  I wonder how harder to read
it would become if these two patches are squashed together...

Thanks.

^ permalink raw reply

* Re: [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters
From: Kristofer Karlsson @ 2026-06-26 21:57 UTC (permalink / raw)
  To: René Scharfe
  Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee,
	Elijah Newren
In-Reply-To: <bd37b80d-9eff-496d-8f1f-436594968678@web.de>

On Fri, 26 Jun 2026 at 23:13, René Scharfe <l.s.r@web.de> wrote:
>
> > +struct paint_state {
> > +     struct prio_queue queue;
> > +     int p1_count;
> > +     int p2_count;
> > +     int pending_merge_bases;
> > +};
> Can they become negative?  Wouldn't size_t be a more natural fit,
> matching nr from struct prio_queue?

Negative would be a clear indication of a bug though that's
not checked right now anyway. And since it's not checked
we might as well use size_t instead - and it would technically
be more correct though I struggle to imagine a case where
the number of active elements in the frontier exceeds 2^31
or whatever a signed int would give.

I am happy to change to size_t.

> And some bikeshedding:
>
> Why abbreviate?  parent1_count and parent2_count would be slightly
> easier to read and associate with PARENT1 and PARENT2.
>
> And pending_merge_bases is a counter as well.  Why not call it
> like that, pending_merge_base_count?   Well, that's pretty long.
> both_count?  That's quite generic and nondescript.  Call the other
> counters parents1 and parents2?  Nah.  Or parent1s and parent2s?
> Not sure why this inconsistency bothers me to begin with.

Fair point, I was thinking that the surrounding context is so small
that the naming almost doesn't matter - the terms don't
escape paint_down_to_common.

I am happy to change to something like:
parent1_count, parent2_count, mb_candidate_count
to make it more consistent.

It seems the mb_ prefix is already used for
merge bases in some files - best example is perhaps builtin/diff.c

I see in the codebase that we are using multiple styles,
perhaps depending on specific context.

- nr_ prefix: nr_objects, nr_paths_watching
- num_ prefix: num_commits, num_hashes, num_workers
- _count suffix: entry_count, max_count, skip_count

so I think _count suffix is a good choice at least - it matches
other usages where we typically just increment or decrement.

Thanks,
Kristofer

^ permalink raw reply

* Re: [L10N] Kickoff for Git 2.55.0 translations
From: Junio C Hamano @ 2026-06-26 22:05 UTC (permalink / raw)
  To: Jiang Xin
  Cc: Git List, Alexander Shopov, Mikel Forcada, Ralf Thielow,
	Jean-Noël Avila, Bagas Sanjaya, Dimitriy Ryazantcev,
	Peter Krefting, Emir SARI, Arkadii Yakovets,
	Vũ Tiến Hưng, Teng Long, Yi-Jyun Pan
In-Reply-To: <20260613061658.1767987-1-worldhello.net@gmail.com>

Jiang Xin <worldhello.net@gmail.com> writes:

> Git v2.55.0-rc0 has been released, and we are starting a new round of
> localization for Git 2.55.0. Since the last release, 125 catalog entries need
> to be translated. Please open a pull request against the l10n coordinator
> repository (URL below) before the update window closes on Sat, 27 Jun 2026.
> ...
> **Reminder: the update window closes on Sat, 27 Jun 2026.**

Thanks.  

The cut-off date will be tomorrow.  Hopefully I'll hear from you on
28th in time for the final on 29th.

^ permalink raw reply

* Re: [PATCH GSoC v14 09/13] serve: advertise object-info feature
From: Karthik Nayak @ 2026-06-26 22:23 UTC (permalink / raw)
  To: Pablo Sabater, git
  Cc: chandrapratap3519, chriscool, eric.peijian, gitster, jltobler,
	peff, toon, Calvin Wan, Jonathan Tan
In-Reply-To: <20260625-ps-eric-work-rebase-v14-9-09f7ffe21a53@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1661 bytes --]

Pablo Sabater <pabloosabaterr@gmail.com> writes:

> From: Calvin Wan <calvinwan@google.com>
>
> In order for a client to know what object-info components a server can
> provide, advertise supported object-info features. This will allow a
> client to decide whether to query the server for object-info or fetch
> as a fallback.
>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
> Helped-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> Signed-off-by: Eric Ju <eric.peijian@gmail.com>
> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
> ---
>  serve.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/serve.c b/serve.c
> index 49a6e39b1d..2b07d922b3 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -89,7 +89,7 @@ static void session_id_receive(struct repository *r UNUSED,
>  	trace2_data_string("transfer", NULL, "client-sid", client_sid);
>  }
>
> -static int object_info_advertise(struct repository *r, struct strbuf *value UNUSED)
> +static int object_info_advertise(struct repository *r, struct strbuf *value)
>  {
>  	if (advertise_object_info == -1 &&
>  	    repo_config_get_bool(r, "transfer.advertiseobjectinfo",
> @@ -97,6 +97,9 @@ static int object_info_advertise(struct repository *r, struct strbuf *value UNUS
>  		/* disabled by default */
>  		advertise_object_info = 0;
>  	}
> +	/* Currently only size is supported */
> +	if (value && advertise_object_info)
> +		strbuf_addstr(value, "size");

So is the plan that further options will be added here to value? If so,
whats the format we will follow?

>  	return advertise_object_info;
>  }
>
>
> --
> 2.54.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [RFH] Why do osx CI jobs so unreliable?
From: Michael Montalbo @ 2026-06-26 23:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, git, Junio C Hamano
In-Reply-To: <aj5ZaZK7xylfs4Xw@pks.im>

On Fri, Jun 26, 2026 at 3:50 AM Patrick Steinhardt <ps@pks.im> wrote:
> The bug manifests both with HTTP/1.1 and HTTP/2 though, so this wouldn't
> fully fix the flakes we see, right?

Yes you are right. The linked fix would just prevent the hanging after timeout
for HTTP/2 tests, but still leaves HTTP/1.1 fakes.

> I was also wondering whether we can maybe work around the issue by
> increasing the Apache timeout value. That sounds like an easy potential
> solution to try, and from all we've discovered so far it doesn't feel
> like this is something we can address on the Git side.

I think Peff and Patrick's suggestion to just increase the Apache timeout
makes sense. I ran some experiments using a really long timeout with an
artificially slowed down CI runner and all the jobs made progress
(if slowly) without stalling, and eventually completed successfully:

https://github.com/mmontalbo/git/actions/runs/28267019651

I haven't spent a lot of time trying to figure out what the right timeout
value should be. An hour definitely seems like overkill, with something
on the order of 5-10 minutes seeming more reasonable, but I don't
have a principled number.

^ permalink raw reply

* Re: [RFH] Why do osx CI jobs so unreliable?
From: Jeff King @ 2026-06-26 23:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Michael Montalbo, git, Junio C Hamano
In-Reply-To: <aj5ZaZK7xylfs4Xw@pks.im>

On Fri, Jun 26, 2026 at 12:50:17PM +0200, Patrick Steinhardt wrote:

> > Thanks both of you for digging into this. I'm not familiar enough with
> > Apache's code to pass confident judgement, but your findings certainly
> > convinced me that this is just an apache bug.
> 
> The bug manifests both with HTTP/1.1 and HTTP/2 though, so this wouldn't
> fully fix the flakes we see, right?

If I understand the situation correctly, there are really two problems.

First, the EXPENSIVE tests in t5551 sometimes trigger a timeout with
Apache's stock settings. This presumably became a problem recently due
to 7a094d68a2 (ci: run expensive tests on push builds to integration
branches, 2026-05-08). The same problem exists in t5559, which just
wraps t5551 but tells us to use http2.

This timeout will cause test failures in t5551, because we aren't able
to complete a request we expected to. Obviously bad and annoying.

The second problem is that when Apache hits the timeout in HTTP/2 mode,
it hangs forever. And then the CI job hangs for 6 hours until it's
killed, which is an even more annoying failure.

So the root cause is the same (a timeout), but the effect depends on
HTTP/1.1 vs HTTP/2. I was able to reproduce both cases on my local
Debian unstable system by dropping the timeout as you suggested. Running
t5551 with GIT_TEST_LONG yields a failure, whereas running t5559 yields
a hang.

We can mitigate both cases by bumping the timeout value, since it's
addressing the root cause.

There's an open question of whether this is just papering over a problem
that real users might experience, and whether Git should be doing more
to keep the connection alive. I think it's probably OK to ignore this in
practice. This is an intentionally large request being served by a very
underpowered platform. The default apache timeout is 60s. If a
real-world server is seeing ls-refs requests take that long then they
probably need to reconsider some other decisions, from ref packing to
better hardware to dropping some users. ;) I don't think trying to
insert keepalives at the Git layer here is worth the trouble.

To give a sense of the time options, here are a few timings from my
local machine, timing "git upload-pack . </dev/null >/dev/null" in
t5551's big repo.git (that's a v0 advertisement, but it should be
roughly the same work as the v2 ls-refs).

  cold-cache, refs not packed:
  real	0m9.973s
  user	0m0.354s
  sys	0m1.364s

  warm cache, refs not packed:
  real	0m0.410s
  user	0m0.153s
  sys	0m0.257s

  cold-cache, refs packed:
  real	0m0.149s
  user	0m0.086s
  sys	0m0.035s

  warm cache, refs packed:
  real	0m0.069s
  user	0m0.054s
  sys	0m0.016s

So 10s is pretty abysmal (and on an SSD, no less). I would expect the
cache to be warm (we just wrote these refs!) but I could also believe
that CI systems are under heavy I/O and memory pressure, so we sometimes
end up crossing the 60s mark.

So bumping Apache's timeout to 600s or something would probably be a
fine mitigation. That's still not _solving_ the problem, but presumably
an order of magnitude is enough for it to never come up in practice.

Michael suggested packing the refs as a mitigation. I was lukewarm on
that in my previous email, because it wasn't clear to me how close we
were on the timeout budget, and if it would just make the race less
frequent (rather than never happen). But seeing those cold-cache numbers
makes me think it might be worth doing just on principle to make the
tests more efficient, and any timeout mitigation is a bonus.

Of course the pack-refs process (and the initial ref writes) will still
have to touch all of those loose files, so those will still be slow. But
they're not on a timeout, and I suspect we read the result many more
times than we write/pack (the test failures we are seeing are not in the
expensive tests, but just "normal" tests that are stuck with the
gigantic ref state).

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox