git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX
@ 2025-12-08 18:27 Patrick Steinhardt
  2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

Hi,

this small patch series introduces logic to avoid rewriting the
multi-pack index in case it's up-to-date already. This is especially
relevant in the context of geometric repacking, where we may decide to
not write any new packfiles, but we'd still rewrite the multi-pack
index.

This is a follow-up for the discussion that happened at [1].

Thanks!

Patrick

[1]: <20251025191550.GA279793@coredump.intra.peff.net>

---
Patrick Steinhardt (2):
      midx: fix `BUG()` when getting preferred pack without a reverse index
      builtin/repack: don't regenerate MIDX unless needed

 midx.c                      |  2 +-
 pack-revindex.h             |  3 +-
 repack-midx.c               | 90 +++++++++++++++++++++++++++++++++++++++++++++
 t/t5319-multi-pack-index.sh | 55 +++++++++++++++++++++++++++
 t/t7703-repack-geometric.sh | 80 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 228 insertions(+), 2 deletions(-)


---
base-commit: bdc5341ff65278a3cc80b2e8a02a2f02aa1fac06
change-id: 20251208-pks-skip-noop-rewrite-38d7f01c79c5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
  2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
@ 2025-12-08 18:27 ` Patrick Steinhardt
  2025-12-10  0:22   ` Taylor Blau
  2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

The function `midx_preferred_pack()` returns the preferred pack for a
given multi-pack index. To compute the preferred pack we:

  1. Look up the position of the first object indexed by the multi-pack
     index.

  2. Convert this position from pseudo-pack order into MIDX order.

  3. We then look up pack that corresponds to this MIDX index.

This reliably returns the preferred pack given that all of its contained
objects will be up front in pseudo-pack order.

The second step that turns the pseudo-pack order into MIDX order
requires the reverse index though, which may not exist for example when
the MIDX does not have a bitmap. And in that case one may easily hit a
bug:

    BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded

In theory, `midx_preferred_pack()` already knows to handle the case
where no reverse index exists, as it calls `load_midx_revindex()` before
calling into `midx_preferred_pack()`. But we only check for negative
return values there, even though the function returns a positive error
code in case the reverse index does not exist.

Fix the issue by testing for a non-zero return value instead, same as
all the other callers of this function already do. While at it, document
the return value of `load_midx_revindex()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                      |  2 +-
 pack-revindex.h             |  3 ++-
 t/t5319-multi-pack-index.sh | 13 +++++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 24e1e72175..b681b18fc1 100644
--- a/midx.c
+++ b/midx.c
@@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
 		uint32_t midx_pos;
-		if (load_midx_revindex(m) < 0) {
+		if (load_midx_revindex(m)) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
diff --git a/pack-revindex.h b/pack-revindex.h
index 422c2487ae..0042892091 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
  * multi-pack index by mmap-ing it and assigning pointers in the
  * multi_pack_index to point at it.
  *
- * A negative number is returned on error.
+ * A negative number is returned on error. A positive number is returned in
+ * case the multi-pack-index does not have a reverse index.
  */
 int load_midx_revindex(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 93f319a4b2..9492a9737b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
 		# the new MIDX
 		git multi-pack-index write --preferred-pack=pack-$pack.pack
 	)
+'
 
+test_expect_success 'preferred pack cannot be determined without bitmap' '
+	test_when_finished "rm -fr preferred-can-be-queried" &&
+	git init preferred-can-be-queried &&
+	(
+		cd preferred-can-be-queried &&
+		test_commit initial &&
+		git repack -Adl --write-midx --no-write-bitmap-index &&
+		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
+		test_grep "could not determine MIDX preferred pack" err &&
+		git repack -Adl --write-midx --write-bitmap-index &&
+		test-tool read-midx --preferred-pack .git/objects
+	)
 '
 
 test_expect_success 'verify multi-pack-index success' '

-- 
2.52.0.270.g3f4935d65f.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
  2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
@ 2025-12-08 18:27 ` Patrick Steinhardt
  2025-12-10  2:48   ` Taylor Blau
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

When calling `git repack --write-midx` we will unconditionally rewrite
the multi-pack index. This is of course expected in the case where the
layout of packfiles has changed. But when the layout of packfiles did
not change it's of course a potentially-large waste of time.

With our default maintenance strategy this isn't really that much of a
problem, as git-repack(1) would only be called by git-gc(1) in case we
know that the layout _will_ change. But that is changing with geometric
repacking, where it is the responsibility of git-repack(1) itself to
determine whether or not any packs need to be merged together. So while
git-repack happily declares that there is "Nothing new to pack.", we
end up rewriting the MIDX.

This issue can be demonstrated trivially with a benchmark in the Git
repository: executing `git repack --geometric=2 --write-midx -d` in the
Git repository takes more than 3 seconds only to end up with the same
multi-pack index as we already had before.

Address this issue by introducing a new function that determines whether
a rewrite of the MIDX would cause any user-visible changes. This covers
the following cases:

  - No multi-pack index exists at all.

  - The user asked us to write a bitmap, and we don't have any.

  - The request preferred pack is different than the one that we have.

  - The packfiles covered by the MIDX are changing.

Only if any of these conditions trigger we decide to write a new MIDX.
This allows us to significantly reduce the time for repacks that end up
doing nothing:

    Benchmark 1: git repack --geometric=2 --write-midx -d
      Time (mean ± σ):      3.183 s ±  0.078 s    [User: 2.924 s, System: 0.219 s]
      Range (min … max):    2.985 s …  3.260 s    10 runs

    Benchmark 2: ./git repack --geometric=2 --write-midx -d
      Time (mean ± σ):     102.5 ms ±   1.0 ms    [User: 89.3 ms, System: 12.7 ms]
      Range (min … max):   101.3 ms … 105.3 ms    28 runs

    Summary
      ./git repack --geometric=2 --write-midx -d ran
       31.06 ± 0.82 times faster than git repack --geometric=2 --write-midx -d

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repack-midx.c               | 90 +++++++++++++++++++++++++++++++++++++++++++++
 t/t5319-multi-pack-index.sh | 42 +++++++++++++++++++++
 t/t7703-repack-geometric.sh | 80 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+)

diff --git a/repack-midx.c b/repack-midx.c
index 74bdfa3a6e..4e47f7c713 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -2,6 +2,7 @@
 #include "repack.h"
 #include "hash.h"
 #include "hex.h"
+#include "midx.h"
 #include "odb.h"
 #include "oidset.h"
 #include "pack-bitmap.h"
@@ -283,6 +284,92 @@ static void remove_redundant_bitmaps(struct string_list *include,
 	strbuf_release(&path);
 }
 
+static bool midx_needs_update(struct repack_write_midx_opts *opts,
+			      struct string_list *include,
+			      struct packed_git *new_preferred)
+{
+	struct multi_pack_index *midx;
+	uint32_t preferred_pack_id;
+	bool needed = true;
+
+	/*
+	 * If there is no multi-pack index we obviously need to generate a new
+	 * one.  Note that we cannot use `get_multi_pack_index()` here, as we
+	 * might be operating with a closed object database.
+	 */
+	midx = load_multi_pack_index(opts->existing->repo->objects->sources);
+	if (!midx)
+		goto out;
+
+	/*
+	 * If we're instructed to write a bitmap we need to verify whether we
+	 * already got one.
+	 */
+	if (opts->write_bitmaps) {
+		struct bitmap_index *bitmap = prepare_midx_bitmap_git(midx);
+		bool bitmap_exists = bitmap && bitmap_is_midx(bitmap);
+		free_bitmap_index(bitmap);
+		if (!bitmap_exists)
+			goto out;
+	}
+
+	/*
+	 * If we're asked to generate the MIDX with a preferred pack we need to
+	 * verify that the current prepared pack matches the desired one. We
+	 * can only determine this in the case where we write bitmaps though,
+	 * so we ignore this setting otherwise.
+	 */
+	if (new_preferred && opts->write_bitmaps) {
+		struct packed_git *old_preferred = NULL;
+
+		if (!midx_preferred_pack(midx, &preferred_pack_id))
+			old_preferred = nth_midxed_pack(midx, preferred_pack_id);
+
+		if (!old_preferred)
+			goto out;
+
+		if (strcmp(pack_basename(new_preferred),
+			   pack_basename(old_preferred)))
+			goto out;
+	 }
+
+	/*
+	 * Otherwise, we need to verify that the packs that are about to be
+	 * included in the MIDX matches the currently covered packs.
+	 */
+	if (include->nr != opts->existing->midx_packs.nr)
+		goto out;
+
+	string_list_sort(include);
+	string_list_sort(&opts->existing->midx_packs);
+	for (size_t i = 0; i < include->nr; i++) {
+		const char *include_name = include->items[i].string;
+		const char *existing_name = opts->existing->midx_packs.items[i].string;
+		size_t include_len = strlen(include_name);
+		size_t existing_len = strlen(existing_name);
+
+		/*
+		 * We track pack indices for the include list, but the
+		 * packfiles themselves in the existing list. We thus need to
+		 * strip these suffixes.
+		 */
+		if (ends_with(include_name, ".idx"))
+			include_len -= strlen(".idx");
+		if (ends_with(existing_name, ".pack"))
+			existing_len -= strlen(".pack");
+		if (include_len != existing_len)
+			goto out;
+		if (memcmp(include_name, existing_name, include_len))
+			goto out;
+	}
+
+	needed = false;
+
+out:
+	close_midx(midx);
+	return needed;
+}
+
 int write_midx_included_packs(struct repack_write_midx_opts *opts)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
@@ -296,6 +383,9 @@ int write_midx_included_packs(struct repack_write_midx_opts *opts)
 	if (!include.nr)
 		goto done;
 
+	if (!midx_needs_update(opts, &include, preferred))
+		goto done;
+
 	cmd.in = -1;
 	cmd.git_cmd = 1;
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 9492a9737b..4adf67385a 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -366,6 +366,48 @@ test_expect_success 'preferred pack cannot be determined without bitmap' '
 	)
 '
 
+test_midx_is_retained () {
+	test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+	ls -l .git/objects/pack/multi-pack-index >expect &&
+	git repack "$@" &&
+	ls -l .git/objects/pack/multi-pack-index >actual &&
+	test_cmp expect actual
+}
+
+test_midx_is_rewritten () {
+	test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+	ls -l .git/objects/pack/multi-pack-index >expect &&
+	git repack --write-midx "$@" &&
+	ls -l .git/objects/pack/multi-pack-index >actual &&
+	! test_cmp expect actual
+}
+
+test_expect_success 'up-to-date multi-pack-index is retained' '
+	test_when_finished "rm -fr midx-up-to-date" &&
+	git init midx-up-to-date &&
+	(
+		cd midx-up-to-date &&
+
+		# Write the initial pack that contains the most objects. This
+		# will be the preferred pack.
+		test_commit first &&
+		test_commit second &&
+		git repack -Ad --write-midx &&
+		test_midx_is_retained -Ad &&
+
+		# Writing a new bitmap index should cause us to regenerate the MIDX.
+		test_midx_is_rewritten -Ad --write-bitmap-index &&
+		test_midx_is_retained -Ad --write-bitmap-index &&
+
+		# Ensure that writing a new packfile causes us to rewrite the index.
+		test_commit incremental &&
+		test_midx_is_rewritten -d &&
+		test_midx_is_retained -d
+	)
+'
+
+test_done
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9fc1626fbf..980599961c 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -287,6 +287,86 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+test_expect_success '--geometric --write-midx retains up-to-date MIDX without bitmap index' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_path_is_missing .git/objects/pack/multi-pack-index &&
+		git repack --geometric=2 --write-midx --no-write-bitmap-index &&
+		test_path_is_file .git/objects/pack/multi-pack-index &&
+		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+
+		ls -l .git/objects/pack/ >expect &&
+		git repack --geometric=2 --write-midx --no-write-bitmap-index &&
+		ls -l .git/objects/pack/ >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--geometric --write-midx regenerates MIDX when preferred pack changes' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit first &&
+		test_commit second &&
+		test_commit third &&
+		git repack --geometric=2 --write-midx --write-bitmap-index &&
+		test_commit fourth &&
+		git repack --geometric=2 --write-midx --write-bitmap-index &&
+
+		ls .git/objects/pack/*.pack >packs &&
+		test_line_count = 2 packs &&
+		preferred_pack=$(test-tool read-midx --preferred-pack .git/objects) &&
+		other_pack=$(ls .git/objects/pack/*.idx | grep -v "$preferred_pack") &&
+		echo "$preferred_pack" &&
+		echo "$other_pack" &&
+
+		# Rewrite the multi-pack index with the current preferred pack.
+		# git-repack(1) should decide to _not_ repack the MIDX in that
+		# case. This is mostly a sanity check to verify that the reason
+		# for the repack really only is the changed preferred pack.
+		rm -f .git/objects/pack/multi-pack-index* &&
+		git multi-pack-index write --bitmap --preferred-pack="$preferred_pack" &&
+		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+		ls -l .git/objects/pack/ >expect &&
+		git repack --geometric=2 --write-midx --write-bitmap-index &&
+		ls -l .git/objects/pack/ >actual &&
+		test_cmp expect actual &&
+
+		# Rewrite the multi-pack index with a different preferred pack.
+		# This time around, git-repack(1) should decide to repack the
+		# MIDX to rectify the preferred pack.
+		rm -f .git/objects/pack/multi-pack-index* &&
+		git multi-pack-index write --bitmap --preferred-pack="$(basename "$other_pack")" &&
+		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+		ls -l .git/objects/pack/ >expect &&
+		git repack --geometric=2 --write-midx --write-bitmap-index &&
+		ls -l .git/objects/pack/ >actual &&
+		! test_cmp expect actual
+	)
+'
+
+test_expect_success '--geometric --write-midx retains up-to-date MIDX with bitmap index' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	test_path_is_missing repo/.git/objects/pack/multi-pack-index &&
+	git -C repo repack --geometric=2 --write-midx --write-bitmap-index &&
+	test_path_is_file repo/.git/objects/pack/multi-pack-index &&
+	test-tool chmtime =0 repo/.git/objects/pack/multi-pack-index &&
+
+	ls -l repo/.git/objects/pack/ >expect &&
+	git -C repo repack --geometric=2 --write-midx --write-bitmap-index &&
+	ls -l repo/.git/objects/pack/ >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
 	test_when_finished "rm -fr shared member" &&
 

-- 
2.52.0.270.g3f4935d65f.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
  2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
@ 2025-12-10  0:22   ` Taylor Blau
  2025-12-10  9:40     ` Patrick Steinhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Taylor Blau @ 2025-12-10  0:22 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King

On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> The function `midx_preferred_pack()` returns the preferred pack for a
> given multi-pack index. To compute the preferred pack we:
>
>   1. Look up the position of the first object indexed by the multi-pack
>      index.
>
>   2. Convert this position from pseudo-pack order into MIDX order.
>
>   3. We then look up pack that corresponds to this MIDX index.

I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.

midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.

Effectively we're doing something like:

    uint32_t pseudo_pack_pos = m->num_objects_in_base;
    uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
    uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);

> This reliably returns the preferred pack given that all of its contained
> objects will be up front in pseudo-pack order.
>
> The second step that turns the pseudo-pack order into MIDX order
> requires the reverse index though, which may not exist for example when
> the MIDX does not have a bitmap. And in that case one may easily hit a
> bug:
>
>     BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded

That makes sense, we can't convert a pseudo-pack position into a
MIDX-relative position without a reverse index to tell us where that bit
goes.

> In theory, `midx_preferred_pack()` already knows to handle the case
> where no reverse index exists, as it calls `load_midx_revindex()` before
> calling into `midx_preferred_pack()`. [...]

Right, it handles this case by returning -1 to indicate that it didn't
have enough information to determine the preferred pack.

> [...] But we only check for negative
> return values there, even though the function returns a positive error
> code in case the reverse index does not exist.

Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.

While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:

    $ git grep -p load_revindex_from_disk
    pack-revindex.c=struct revindex_header {
    pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
    pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
    pack-revindex.c:        ret = load_revindex_from_disk(p->repo->hash_algo,
    pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
    pack-revindex.c:        ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,

, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.

> diff --git a/midx.c b/midx.c
> index 24e1e72175..b681b18fc1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  {
>  	if (m->preferred_pack_idx == -1) {
>  		uint32_t midx_pos;
> -		if (load_midx_revindex(m) < 0) {
> +		if (load_midx_revindex(m)) {
>  			m->preferred_pack_idx = -2;
>  			return -1;
>  		}
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 422c2487ae..0042892091 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
>   * multi-pack index by mmap-ing it and assigning pointers in the
>   * multi_pack_index to point at it.
>   *
> - * A negative number is returned on error.
> + * A negative number is returned on error. A positive number is returned in
> + * case the multi-pack-index does not have a reverse index.

Makes sense.

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 93f319a4b2..9492a9737b 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
>  		# the new MIDX
>  		git multi-pack-index write --preferred-pack=pack-$pack.pack
>  	)
> +'
>
> +test_expect_success 'preferred pack cannot be determined without bitmap' '
> +	test_when_finished "rm -fr preferred-can-be-queried" &&
> +	git init preferred-can-be-queried &&
> +	(
> +		cd preferred-can-be-queried &&
> +		test_commit initial &&
> +		git repack -Adl --write-midx --no-write-bitmap-index &&
> +		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
> +		test_grep "could not determine MIDX preferred pack" err &&

Looks good. I think that it's fine to end the test here, since we have
extensive coverage that we can determine the preferred pack when there
is a MIDX and matching bitmap, but I definitely don't feel strongly
about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
  2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
@ 2025-12-10  2:48   ` Taylor Blau
  2025-12-10  9:40     ` Patrick Steinhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Taylor Blau @ 2025-12-10  2:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King

On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> Address this issue by introducing a new function that determines whether
> a rewrite of the MIDX would cause any user-visible changes. This covers
> the following cases:
>
>   - No multi-pack index exists at all.
>
>   - The user asked us to write a bitmap, and we don't have any.
>
>   - The request preferred pack is different than the one that we have.
>
>   - The packfiles covered by the MIDX are changing.

I can't think of any cases beyond the ones you listed here that would
require us to regenerate the MIDX. One kind-of-exception here would be:

    $ git repack [...] --write-midx --write-bitmap-index
    $ git repack [...] --write-midx

where the second repack would generate an identical MIDX, but does not
want to retain a bitmap. That case is already handled in the MIDX
writing code if you search for "want_bitmap".

That makes me wonder whether the repack layer is the most appropriate
one to handle this logic. It seems like write_midx_internal() would
reasonably be able to detect whether or not the MIDX we have already is
up-to-date with respect to the given input.

I think that makes some things about your patch easier and other things
a little harder ;-).

 - On the "easier" front: while both the MIDX code and the portion of
   the repack code that drives it receive the same set of packs to
   include, the MIDX code already has the packs it would compare
   in a standard format. That would avoid you having to handle ends_with
   ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
   as special cases, which would be nice.

 - On the "harder" front: when driving MIDX generation with the
   '--stdin-packs' option, we *don't* load an existing MIDX ever since
   0c5a62f14bc (midx-write.c: do not read existing MIDX with
   `packs_to_include`, 2024-06-11).

I don't think that "harder" one is a show-stopper, though. Commit
t0c5a62f14bc has enough gory details around how we generate pack IDs and
various subtle assumptions about how and when we load packs that I am
very hesitant to recommend changing it given its fragility (though we
should examine and harden any fragilities within midx-write.c, maybe
just separately ;-)).

So I don't think that we should make that change ahead of this patch.
While you can't rely on being able to read 'ctx.m', I think you could
load the MIDX belonging to "source" ad-hoc after we have computed the
packs to fill from the MIDX's perspective, which is right around where
that want_bitmap code lives.

I suspect that that may simplify some of your patch, though please let
me know if that turns out not to be the case.

> Only if any of these conditions trigger we decide to write a new MIDX.
> This allows us to significantly reduce the time for repacks that end up
> doing nothing:
>
>     Benchmark 1: git repack --geometric=2 --write-midx -d
>       Time (mean ± σ):      3.183 s ±  0.078 s    [User: 2.924 s, System: 0.219 s]
>       Range (min … max):    2.985 s …  3.260 s    10 runs
>
>     Benchmark 2: ./git repack --geometric=2 --write-midx -d
>       Time (mean ± σ):     102.5 ms ±   1.0 ms    [User: 89.3 ms, System: 12.7 ms]
>       Range (min … max):   101.3 ms … 105.3 ms    28 runs
>
>     Summary
>       ./git repack --geometric=2 --write-midx -d ran
>        31.06 ± 0.82 times faster than git repack --geometric=2 --write-midx -d

Makes sense since we're effectively timing just MIDX generation here.

(I'll avoid reading midx_needs_update() too closely here in case you
make any changes based on the above. Glancing over it briefly, it all
seemed reasonable to me.)

> +test_expect_success 'up-to-date multi-pack-index is retained' '
> +	test_when_finished "rm -fr midx-up-to-date" &&
> +	git init midx-up-to-date &&
> +	(
> +		cd midx-up-to-date &&
> +
> +		# Write the initial pack that contains the most objects. This
> +		# will be the preferred pack.
> +		test_commit first &&
> +		test_commit second &&
> +		git repack -Ad --write-midx &&
> +		test_midx_is_retained -Ad &&

Should the MIDX be retained in this case? I think there is a reasonable
argument to be made in either direction here.

On the one hand: we performed the expected repack, and doing so did not
cause the existing MIDX to be invalid, so we left it alone. On the other
hand: the caller asked us to repack without writing a MIDX, so could be
surprised that one exists.

> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 9fc1626fbf..980599961c 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -287,6 +287,86 @@ test_expect_success '--geometric with pack.packSizeLimit' '
>  	)
>  '
>
> +test_expect_success '--geometric --write-midx retains up-to-date MIDX without bitmap index' '
> +	test_when_finished "rm -fr repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +
> +		test_path_is_missing .git/objects/pack/multi-pack-index &&
> +		git repack --geometric=2 --write-midx --no-write-bitmap-index &&
> +		test_path_is_file .git/objects/pack/multi-pack-index &&
> +		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&

This portion is very similar to the new function added in t5319. I
wonder if it's worth sticking that in t/lib-midx.sh or similar?

I was wondering what this test was exercising that wasn't covered in the
earlier script, but since the geometric code path is quite different
from the non-geometric one, I think having coverage there is definitely
worthwhile.

> +test_expect_success '--geometric --write-midx regenerates MIDX when preferred pack changes' '
> +	test_when_finished "rm -fr repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit first &&
> +		test_commit second &&
> +		test_commit third &&
> +		git repack --geometric=2 --write-midx --write-bitmap-index &&
> +		test_commit fourth &&
> +		git repack --geometric=2 --write-midx --write-bitmap-index &&

This part is a little subtle since we're relying on the geometric repack
to pick up any loose objects that don't appear in existing pack(s). I
might suggest something like:

    for c in first second third
    do
            test_commit $c || return 1
    done &&
    git repack -d && # ...

, to make that step explicit, but I don't think it's a huge deal.

> +
> +		ls .git/objects/pack/*.pack >packs &&

Here and below you should be able to use $packdir instead of writing out
".git/objects/pack" explicitly.

> +		test_line_count = 2 packs &&
> +		preferred_pack=$(test-tool read-midx --preferred-pack .git/objects) &&
> +		other_pack=$(ls .git/objects/pack/*.idx | grep -v "$preferred_pack") &&
> +		echo "$preferred_pack" &&
> +		echo "$other_pack" &&

Stray debug output?
> +
> +		# Rewrite the multi-pack index with the current preferred pack.
> +		# git-repack(1) should decide to _not_ repack the MIDX in that

s/repack the MIDX/rewrite the MIDX/

> +		# case. This is mostly a sanity check to verify that the reason
> +		# for the repack really only is the changed preferred pack.
> +		rm -f .git/objects/pack/multi-pack-index* &&

Same note as above except here you can use $midx. I don't think you ever
have a .git/objects/pack/multi-pack-index.d here, so a standard removal
should do the trick. If you did, you would need to pass '-r' as well.

> +		git multi-pack-index write --bitmap --preferred-pack="$preferred_pack" &&
> +		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
> +		ls -l .git/objects/pack/ >expect &&
> +		git repack --geometric=2 --write-midx --write-bitmap-index &&

I'm having a little bit of a hard time following this one. Are we
guaranteed to pick $preferred_pack as our preferred pack here in the
second repack?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
  2025-12-10  0:22   ` Taylor Blau
@ 2025-12-10  9:40     ` Patrick Steinhardt
  2025-12-18 21:13       ` Taylor Blau
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  9:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King

On Tue, Dec 09, 2025 at 07:22:11PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> > The function `midx_preferred_pack()` returns the preferred pack for a
> > given multi-pack index. To compute the preferred pack we:
> >
> >   1. Look up the position of the first object indexed by the multi-pack
> >      index.
> >
> >   2. Convert this position from pseudo-pack order into MIDX order.
> >
> >   3. We then look up pack that corresponds to this MIDX index.
> 
> I think the implementation of midx_preferred_pack() works a little bit
> differently than is described here. I often get confused when working in
> this area juggling between the various object/pack orderings in my head.

Hm, I feel like I am missing something.

> midx_preferred_pack() cares about converting from the first position in
> pseudo-pack order back into MIDX object order. To do that, we convert
> the pseudo-pack position into a MIDX one, and then lookup the pack that
> represents that object.

Isn't that what I say in (2) and (3)? Or is this about (1) being
inaccurate? Would this sequence be more accurate:

  1. Take the first position indexed by the MIDX in pseudo-pack order.

  2. Convert this pseudo-pack position into the MIDX position.

  3. We then look up the pack that corresponds to this MIDX position.

In any case, I agree with you that juggling these different positions is
quite something :)

> > [...] But we only check for negative
> > return values there, even though the function returns a positive error
> > code in case the reverse index does not exist.
> 
> Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
> file header, 2023-04-17), but it looks like the caller here did not
> learn about that change. It may be worth mentioning that commit in your
> patch message.

The caller was introduced at a later point though, via b1e3333068 (midx:
implement `midx_preferred_pack()`, 2023-12-14). So there wasn't really
any overlap here where both topics were cooking at the same point in
time, at least not upstream. And the commit that changed the return
value of `load_midx_revindex()` did update all callsites.

> While reviewing, I wanted to make sure that there weren't any other
> callers of load_midx_revindex() that were also missing this check. The
> return value of that function is propagated through the two expected
> functions:
> 
>     $ git grep -p load_revindex_from_disk
>     pack-revindex.c=struct revindex_header {
>     pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
>     pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
>     pack-revindex.c:        ret = load_revindex_from_disk(p->repo->hash_algo,
>     pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
>     pack-revindex.c:        ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
> 
> , and checking through the callers of those two functions, all are
> prepared to handle a >0 return value.

Yup, thanks for double checking.

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
  2025-12-10  2:48   ` Taylor Blau
@ 2025-12-10  9:40     ` Patrick Steinhardt
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10  9:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King

On Tue, Dec 09, 2025 at 09:48:47PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> > Address this issue by introducing a new function that determines whether
> > a rewrite of the MIDX would cause any user-visible changes. This covers
> > the following cases:
> >
> >   - No multi-pack index exists at all.
> >
> >   - The user asked us to write a bitmap, and we don't have any.
> >
> >   - The request preferred pack is different than the one that we have.
> >
> >   - The packfiles covered by the MIDX are changing.
> 
> I can't think of any cases beyond the ones you listed here that would
> require us to regenerate the MIDX. One kind-of-exception here would be:
> 
>     $ git repack [...] --write-midx --write-bitmap-index
>     $ git repack [...] --write-midx
> 
> where the second repack would generate an identical MIDX, but does not
> want to retain a bitmap. That case is already handled in the MIDX
> writing code if you search for "want_bitmap".
> 
> That makes me wonder whether the repack layer is the most appropriate
> one to handle this logic. It seems like write_midx_internal() would
> reasonably be able to detect whether or not the MIDX we have already is
> up-to-date with respect to the given input.

One upside of having it in git-repack(1) is that we need to care about
less situations in general as we are operating on a higher level. And
because of that we can make more assumptions.

That being said, putting it into `write_midx_internal()` has the benefit
that we're of course covering more potential cases where we can avoid a
needless rewrite of the MIDX, and that we have better information to
decide whether it would be needed or not.

> I think that makes some things about your patch easier and other things
> a little harder ;-).
> 
>  - On the "easier" front: while both the MIDX code and the portion of
>    the repack code that drives it receive the same set of packs to
>    include, the MIDX code already has the packs it would compare
>    in a standard format. That would avoid you having to handle ends_with
>    ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
>    as special cases, which would be nice.
> 
>  - On the "harder" front: when driving MIDX generation with the
>    '--stdin-packs' option, we *don't* load an existing MIDX ever since
>    0c5a62f14bc (midx-write.c: do not read existing MIDX with
>    `packs_to_include`, 2024-06-11).
> 
> I don't think that "harder" one is a show-stopper, though. Commit
> t0c5a62f14bc has enough gory details around how we generate pack IDs and
> various subtle assumptions about how and when we load packs that I am
> very hesitant to recommend changing it given its fragility (though we
> should examine and harden any fragilities within midx-write.c, maybe
> just separately ;-)).
> 
> So I don't think that we should make that change ahead of this patch.
> While you can't rely on being able to read 'ctx.m', I think you could
> load the MIDX belonging to "source" ad-hoc after we have computed the
> packs to fill from the MIDX's perspective, which is right around where
> that want_bitmap code lives.

Yeah, we're already loading the MIDX on-demand because in git-repack(1)
as we have closed the object database at the point in time where we're
about to write the MIDX. So overall the change is rather easy to make.

Also, now that I see that we already have some short-circuiting
conditions in git-multi-pack-index(1) I guess it makes sense to extend
those checks. They explicitly don't cover `--stdin-packs` rewrites right
now, so adding that check is an obvious improvement.

One thing I'm a bit torn on is whether or not to handle preferred packs
in the adjusted logic. We don't do so right now either, so I think I'll
drop this for now. I can see an argument that us not handling a changed
preferred pack is a bug though, in which case I'm happy to iterate.

Thanks for your feedback!

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX
  2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
  2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
@ 2025-12-10 12:52 ` Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
                     ` (3 more replies)
  2 siblings, 4 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10 12:52 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

Hi,

this small patch series introduces logic to avoid rewriting the
multi-pack index in case it's up-to-date already. This is especially
relevant in the context of geometric repacking, where we may decide to
not write any new packfiles, but we'd still rewrite the multi-pack
index.

This is a follow-up for the discussion that happened at [1].

Changes in v2:
  - Move the logic to skip writing updates into `write_midx_internal()`.
    We already had some logic there to skip no-op rewrites, so we only
    extend that logic now to handle the "--stdin-packs" option. This
    also has the added benefit that we know to strip bitmaps in case the
    write is a no-op.
  - I don't handle the case anymore where the preferred pack is
    changing. We didn't do so in the preexisting checks either, so I
    decided to drop this for now. This _can_ be considered as a bug, and
    if anyone thinks it is then I'll extend these checks.
  - Adapt the tests to use git-multi-pack-index(1) directly.
  - Link to v1: https://lore.kernel.org/r/20251208-pks-skip-noop-rewrite-v1-0-430d52dba9f0@pks.im

Thanks!

Patrick

[1]: <20251025191550.GA279793@coredump.intra.peff.net>

---
Patrick Steinhardt (3):
      midx: fix `BUG()` when getting preferred pack without a reverse index
      midx-write: extract function to test whether MIDX needs updating
      midx-write: skip rewriting MIDX with `--stdin-packs` unless needed

 midx-write.c                | 113 ++++++++++++++++++++++++++++++++++++--------
 midx.c                      |   2 +-
 pack-revindex.h             |   3 +-
 t/t5319-multi-pack-index.sh |  64 +++++++++++++++++++++++++
 t/t7703-repack-geometric.sh |  35 ++++++++++++++
 5 files changed, 195 insertions(+), 22 deletions(-)

Range-diff versus v1:

1:  11258a799b ! 1:  88ba93d3a5 midx: fix `BUG()` when getting preferred pack without a reverse index
    @@ Commit message
         The function `midx_preferred_pack()` returns the preferred pack for a
         given multi-pack index. To compute the preferred pack we:
     
    -      1. Look up the position of the first object indexed by the multi-pack
    -         index.
    +      1. Take the first position indexed by the MIDX in pseudo-pack order.
     
    -      2. Convert this position from pseudo-pack order into MIDX order.
    +      2. Convert this pseudo-pack position into the MIDX position.
     
    -      3. We then look up pack that corresponds to this MIDX index.
    +      3. We then look up the pack that corresponds to this MIDX position.
     
         This reliably returns the preferred pack given that all of its contained
         objects will be up front in pseudo-pack order.
2:  baf307b521 < -:  ---------- builtin/repack: don't regenerate MIDX unless needed
-:  ---------- > 2:  9bd320b2cf midx-write: extract function to test whether MIDX needs updating
-:  ---------- > 3:  f594865c12 midx-write: skip rewriting MIDX with `--stdin-packs` unless needed

---
base-commit: bdc5341ff65278a3cc80b2e8a02a2f02aa1fac06
change-id: 20251208-pks-skip-noop-rewrite-38d7f01c79c5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
@ 2025-12-10 12:52   ` Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10 12:52 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

The function `midx_preferred_pack()` returns the preferred pack for a
given multi-pack index. To compute the preferred pack we:

  1. Take the first position indexed by the MIDX in pseudo-pack order.

  2. Convert this pseudo-pack position into the MIDX position.

  3. We then look up the pack that corresponds to this MIDX position.

This reliably returns the preferred pack given that all of its contained
objects will be up front in pseudo-pack order.

The second step that turns the pseudo-pack order into MIDX order
requires the reverse index though, which may not exist for example when
the MIDX does not have a bitmap. And in that case one may easily hit a
bug:

    BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded

In theory, `midx_preferred_pack()` already knows to handle the case
where no reverse index exists, as it calls `load_midx_revindex()` before
calling into `midx_preferred_pack()`. But we only check for negative
return values there, even though the function returns a positive error
code in case the reverse index does not exist.

Fix the issue by testing for a non-zero return value instead, same as
all the other callers of this function already do. While at it, document
the return value of `load_midx_revindex()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                      |  2 +-
 pack-revindex.h             |  3 ++-
 t/t5319-multi-pack-index.sh | 13 +++++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 24e1e72175..b681b18fc1 100644
--- a/midx.c
+++ b/midx.c
@@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
 		uint32_t midx_pos;
-		if (load_midx_revindex(m) < 0) {
+		if (load_midx_revindex(m)) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
diff --git a/pack-revindex.h b/pack-revindex.h
index 422c2487ae..0042892091 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
  * multi-pack index by mmap-ing it and assigning pointers in the
  * multi_pack_index to point at it.
  *
- * A negative number is returned on error.
+ * A negative number is returned on error. A positive number is returned in
+ * case the multi-pack-index does not have a reverse index.
  */
 int load_midx_revindex(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 93f319a4b2..9492a9737b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
 		# the new MIDX
 		git multi-pack-index write --preferred-pack=pack-$pack.pack
 	)
+'
 
+test_expect_success 'preferred pack cannot be determined without bitmap' '
+	test_when_finished "rm -fr preferred-can-be-queried" &&
+	git init preferred-can-be-queried &&
+	(
+		cd preferred-can-be-queried &&
+		test_commit initial &&
+		git repack -Adl --write-midx --no-write-bitmap-index &&
+		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
+		test_grep "could not determine MIDX preferred pack" err &&
+		git repack -Adl --write-midx --write-bitmap-index &&
+		test-tool read-midx --preferred-pack .git/objects
+	)
 '
 
 test_expect_success 'verify multi-pack-index success' '

-- 
2.52.0.270.g3f4935d65f.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
@ 2025-12-10 12:52   ` Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
  2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
  3 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10 12:52 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

In `write_midx_internal()` we know to skip writing the new multi-pack
index in case it would be the same as the existing one. This logic does
not handle the `--stdin-packs` option yet though, so we end up always
rewriting the MIDX if that option is passed to us.

Extract the logic to decide whether or not to rewrite the MIDX into a
separate function. This will allow us to extend that feature in the next
commit to address the above issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx-write.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index e3e9be6d03..78bc8a65b8 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1014,6 +1014,41 @@ static void clear_midx_files(struct odb_source *source,
 	strbuf_release(&buf);
 }
 
+static bool midx_needs_update(struct write_midx_context *ctx)
+{
+	struct multi_pack_index *midx = ctx->m;
+	bool needed = true;
+
+	/*
+	 * Ignore incremental updates for now. The assumption is that any
+	 * incremental update would be either empty (in which case we will bail
+	 * out later) or it would actually cover at least one new pack.
+	 */
+	if (ctx->incremental)
+		goto out;
+
+	/*
+	 * If there is no MIDX then either it doesn't exist, or we're doing a
+	 * geometric repack. We cannot (yet) determine whether we need to
+	 * update the multi-pack index in the second case.
+	 */
+	if (!midx)
+		goto out;
+
+	/*
+	 * Otherwise, we need to verify that the packs covered by the existing
+	 * MIDX match the packs that we already have. This test is somewhat
+	 * lenient and will be fixed.
+	 */
+	if (ctx->nr != midx->num_packs + midx->num_packs_in_base)
+		goto out;
+
+	needed = false;
+
+out:
+	return needed;
+}
+
 static int write_midx_internal(struct odb_source *source,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
@@ -1111,9 +1146,7 @@ static int write_midx_internal(struct odb_source *source,
 	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
-	    !ctx.incremental &&
-	    !(packs_to_include || packs_to_drop)) {
+	if (!packs_to_include && !packs_to_drop && !midx_needs_update(&ctx)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
 		int want_bitmap = flags & MIDX_WRITE_BITMAP;

-- 
2.52.0.270.g3f4935d65f.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
  2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
@ 2025-12-10 12:52   ` Patrick Steinhardt
  2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
  3 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-10 12:52 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King

In `write_midx_internal()` we know to skip rewriting the multi-pack
index in case the existing one already covers all packs. This logic does
not know to handle `git multi-pack-index write --stdin-packs` though, so
we end up always rewriting the MIDX in this case even if the MIDX would
not change.

With our default maintenance strategy this isn't really much of a
problem, as git-gc(1) does not use the "--stdin-packs" option. But that
is changing with geometric repacking, where "--stdin-packs" is used to
explicitly select the packfiles part of the geometric sequence.

This issue can be demonstrated trivially with a benchmark in the Git
repository: executing `git repack --geometric=2 --write-midx -d` in the
Git repository takes more than 3 seconds only to end up with the same
multi-pack index as we already had before.

The logic that decides if we need to rewrite the MIDX only checks
whether the number of packfiles covered will change. That check is of
course too lenient for "--stdin-packs", as it could happen that we want
to cover a different-but-same-size set of packfiles. But there is no
inherent reason why we cannot handle "--stdin-packs".

Improve the logic to not only check for the number of packs, but to also
verify that we are asked to generate a MIDX for the _same_ packs. This
allows us to also skip no-op rewrites for "--stdin-packs".

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx-write.c                | 100 +++++++++++++++++++++++++++++++-------------
 t/t5319-multi-pack-index.sh |  51 ++++++++++++++++++++++
 t/t7703-repack-geometric.sh |  35 ++++++++++++++++
 3 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 78bc8a65b8..ce459b02c3 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1014,9 +1014,10 @@ static void clear_midx_files(struct odb_source *source,
 	strbuf_release(&buf);
 }
 
-static bool midx_needs_update(struct write_midx_context *ctx)
+static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_context *ctx)
 {
-	struct multi_pack_index *midx = ctx->m;
+	struct strset packs = STRSET_INIT;
+	struct strbuf buf = STRBUF_INIT;
 	bool needed = true;
 
 	/*
@@ -1027,25 +1028,48 @@ static bool midx_needs_update(struct write_midx_context *ctx)
 	if (ctx->incremental)
 		goto out;
 
-	/*
-	 * If there is no MIDX then either it doesn't exist, or we're doing a
-	 * geometric repack. We cannot (yet) determine whether we need to
-	 * update the multi-pack index in the second case.
-	 */
-	if (!midx)
-		goto out;
-
 	/*
 	 * Otherwise, we need to verify that the packs covered by the existing
-	 * MIDX match the packs that we already have. This test is somewhat
-	 * lenient and will be fixed.
+	 * MIDX match the packs that we already have. The logic to do so is way
+	 * more complicated than it has any right to be. This is because:
+	 *
+	 *   - We cannot assume any ordering.
+	 *
+	 *   - The MIDX packs may not be loaded at all, and loading them would
+	 *     be wasteful. So we need to use the pack names tracked by the
+	 *     MIDX itself.
+	 *
+	 *   - The MIDX pack names are tracking the ".idx" files, whereas the
+	 *     packs themselves are tracking the ".pack" files. So we need to
+	 *     strip suffixes.
 	 */
 	if (ctx->nr != midx->num_packs + midx->num_packs_in_base)
 		goto out;
 
+	for (uint32_t i = 0; i < ctx->nr; i++) {
+		strbuf_reset(&buf);
+		strbuf_addstr(&buf, pack_basename(ctx->info[i].p));
+		strbuf_strip_suffix(&buf, ".pack");
+
+		if (!strset_add(&packs, buf.buf))
+			BUG("same pack added twice?");
+	}
+
+	for (uint32_t i = 0; i < ctx->nr; i++) {
+		strbuf_reset(&buf);
+		strbuf_addstr(&buf, midx->pack_names[i]);
+		strbuf_strip_suffix(&buf, ".idx");
+
+		if (!strset_contains(&packs, buf.buf))
+			goto out;
+		strset_remove(&packs, buf.buf);
+	}
+
 	needed = false;
 
 out:
+	strbuf_release(&buf);
+	strset_clear(&packs);
 	return needed;
 }
 
@@ -1066,6 +1090,7 @@ static int write_midx_internal(struct odb_source *source,
 	struct write_midx_context ctx = {
 		.preferred_pack_idx = NO_PREFERRED_PACK,
 	 };
+	struct multi_pack_index *midx_to_free = NULL;
 	int bitmapped_packs_concat_len = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
@@ -1146,25 +1171,39 @@ static int write_midx_internal(struct odb_source *source,
 	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (!packs_to_include && !packs_to_drop && !midx_needs_update(&ctx)) {
-		struct bitmap_index *bitmap_git;
-		int bitmap_exists;
-		int want_bitmap = flags & MIDX_WRITE_BITMAP;
-
-		bitmap_git = prepare_midx_bitmap_git(ctx.m);
-		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
-		free_bitmap_index(bitmap_git);
-
-		if (bitmap_exists || !want_bitmap) {
-			/*
-			 * The correct MIDX already exists, and so does a
-			 * corresponding bitmap (or one wasn't requested).
-			 */
-			if (!want_bitmap)
-				clear_midx_files_ext(source, "bitmap", NULL);
-			result = 0;
-			goto cleanup;
+	if (!packs_to_drop) {
+		/*
+		 * If there is no MIDX then either it doesn't exist, or we're
+		 * doing a geometric repack. Try to load it from the source to
+		 * tell these two cases apart.
+		 */
+		struct multi_pack_index *midx = ctx.m;
+		if (!midx)
+			midx = midx_to_free = load_multi_pack_index(ctx.source);
+
+		if (midx && !midx_needs_update(midx, &ctx)) {
+			struct bitmap_index *bitmap_git;
+			int bitmap_exists;
+			int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+			bitmap_git = prepare_midx_bitmap_git(midx);
+			bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+			free_bitmap_index(bitmap_git);
+
+			if (bitmap_exists || !want_bitmap) {
+				/*
+				 * The correct MIDX already exists, and so does a
+				 * corresponding bitmap (or one wasn't requested).
+				 */
+				if (!want_bitmap)
+					clear_midx_files_ext(source, "bitmap", NULL);
+				result = 0;
+				goto cleanup;
+			}
 		}
+
+		close_midx(midx_to_free);
+		midx_to_free = NULL;
 	}
 
 	if (ctx.incremental && !ctx.nr) {
@@ -1520,6 +1559,7 @@ static int write_midx_internal(struct odb_source *source,
 		free(keep_hashes);
 	}
 	strbuf_release(&midx_name);
+	close_midx(midx_to_free);
 
 	trace2_region_leave("midx", "write_midx_internal", r);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 9492a9737b..794f8b5ab4 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -366,6 +366,57 @@ test_expect_success 'preferred pack cannot be determined without bitmap' '
 	)
 '
 
+test_midx_is_retained () {
+	test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+	ls -l .git/objects/pack/multi-pack-index >expect &&
+	git multi-pack-index write "$@" &&
+	ls -l .git/objects/pack/multi-pack-index >actual &&
+	test_cmp expect actual
+}
+
+test_midx_is_rewritten () {
+	test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+	ls -l .git/objects/pack/multi-pack-index >expect &&
+	git multi-pack-index write "$@" &&
+	ls -l .git/objects/pack/multi-pack-index >actual &&
+	! test_cmp expect actual
+}
+
+test_expect_success 'up-to-date multi-pack-index is retained' '
+	test_when_finished "rm -fr midx-up-to-date" &&
+	git init midx-up-to-date &&
+	(
+		cd midx-up-to-date &&
+
+		# Write the initial pack that contains the most objects.
+		test_commit first &&
+		test_commit second &&
+		git repack -Ad --write-midx &&
+		test_midx_is_retained &&
+
+		# Writing a new bitmap index should cause us to regenerate the MIDX.
+		test_midx_is_rewritten --bitmap &&
+		test_midx_is_retained --bitmap &&
+
+		# Ensure that writing a new packfile causes us to rewrite the index.
+		test_commit incremental &&
+		git repack -d &&
+		test_midx_is_rewritten &&
+		test_midx_is_retained &&
+
+		for pack in .git/objects/pack/*.idx
+		do
+			basename "$pack" || exit 1
+		done >stdin &&
+		test_line_count = 2 stdin &&
+		test_midx_is_retained --stdin-packs <stdin &&
+		head -n1 stdin >stdin.trimmed &&
+		test_midx_is_rewritten --stdin-packs <stdin.trimmed
+	)
+'
+
+test_done
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9fc1626fbf..98806cdb6f 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -287,6 +287,41 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+test_expect_success '--geometric --write-midx retains up-to-date MIDX without bitmap index' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_path_is_missing .git/objects/pack/multi-pack-index &&
+		git repack --geometric=2 --write-midx --no-write-bitmap-index &&
+		test_path_is_file .git/objects/pack/multi-pack-index &&
+		test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
+
+		ls -l .git/objects/pack/ >expect &&
+		git repack --geometric=2 --write-midx --no-write-bitmap-index &&
+		ls -l .git/objects/pack/ >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--geometric --write-midx retains up-to-date MIDX with bitmap index' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	test_path_is_missing repo/.git/objects/pack/multi-pack-index &&
+	git -C repo repack --geometric=2 --write-midx --write-bitmap-index &&
+	test_path_is_file repo/.git/objects/pack/multi-pack-index &&
+	test-tool chmtime =0 repo/.git/objects/pack/multi-pack-index &&
+
+	ls -l repo/.git/objects/pack/ >expect &&
+	git -C repo repack --geometric=2 --write-midx --write-bitmap-index &&
+	ls -l repo/.git/objects/pack/ >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
 	test_when_finished "rm -fr shared member" &&
 

-- 
2.52.0.270.g3f4935d65f.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX
  2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
@ 2025-12-11  8:46   ` Junio C Hamano
  2025-12-12  7:33     ` Patrick Steinhardt
  3 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2025-12-11  8:46 UTC (permalink / raw)
  To: Patrick Steinhardt, Taylor Blau; +Cc: git, Jeff King

This and Taylor's incremental part 3.2 have a slight conflict in
that this topic factors away the logic to compute if we need
recomputing MIDX while the other one tweaks with yet another flag.

My tentative resolution in 'seen' looks like the attached.  Sanity
checking is very much appreciated.

Thanks.

diff --cc midx-write.c
index ce459b02c3,f2dbacef4c..66c125ccb0
--- a/midx-write.c
+++ b/midx-write.c
@@@ -1014,73 -1131,30 +1131,89 @@@ static void clear_midx_files(struct odb
  	strbuf_release(&buf);
  }
  
 +static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_context *ctx)
 +{
 +	struct strset packs = STRSET_INIT;
 +	struct strbuf buf = STRBUF_INIT;
 +	bool needed = true;
 +
 +	/*
 +	 * Ignore incremental updates for now. The assumption is that any
 +	 * incremental update would be either empty (in which case we will bail
 +	 * out later) or it would actually cover at least one new pack.
 +	 */
- 	if (ctx->incremental)
++	if (ctx->incremental || ctx->compact)
 +		goto out;
 +
 +	/*
 +	 * Otherwise, we need to verify that the packs covered by the existing
 +	 * MIDX match the packs that we already have. The logic to do so is way
 +	 * more complicated than it has any right to be. This is because:
 +	 *
 +	 *   - We cannot assume any ordering.
 +	 *
 +	 *   - The MIDX packs may not be loaded at all, and loading them would
 +	 *     be wasteful. So we need to use the pack names tracked by the
 +	 *     MIDX itself.
 +	 *
 +	 *   - The MIDX pack names are tracking the ".idx" files, whereas the
 +	 *     packs themselves are tracking the ".pack" files. So we need to
 +	 *     strip suffixes.
 +	 */
 +	if (ctx->nr != midx->num_packs + midx->num_packs_in_base)
 +		goto out;
 +
 +	for (uint32_t i = 0; i < ctx->nr; i++) {
 +		strbuf_reset(&buf);
 +		strbuf_addstr(&buf, pack_basename(ctx->info[i].p));
 +		strbuf_strip_suffix(&buf, ".pack");
 +
 +		if (!strset_add(&packs, buf.buf))
 +			BUG("same pack added twice?");
 +	}
 +
 +	for (uint32_t i = 0; i < ctx->nr; i++) {
 +		strbuf_reset(&buf);
 +		strbuf_addstr(&buf, midx->pack_names[i]);
 +		strbuf_strip_suffix(&buf, ".idx");
 +
 +		if (!strset_contains(&packs, buf.buf))
 +			goto out;
 +		strset_remove(&packs, buf.buf);
 +	}
 +
 +	needed = false;
 +
 +out:
 +	strbuf_release(&buf);
 +	strset_clear(&packs);
 +	return needed;
 +}
 +
- static int write_midx_internal(struct odb_source *source,
- 			       struct string_list *packs_to_include,
- 			       struct string_list *packs_to_drop,
- 			       const char *preferred_pack_name,
- 			       const char *refs_snapshot,
- 			       unsigned flags)
+ static int midx_hashcmp(const struct multi_pack_index *a,
+ 			const struct multi_pack_index *b,
+ 			const struct git_hash_algo *algop)
  {
- 	struct repository *r = source->odb->repo;
+ 	return hashcmp(get_midx_hash(a), get_midx_hash(b), algop);
+ }
+ 
+ struct write_midx_opts {
+ 	struct odb_source *source;
+ 
+ 	struct string_list *packs_to_include;
+ 	struct string_list *packs_to_drop;
+ 
+ 	struct multi_pack_index *compact_from;
+ 	struct multi_pack_index *compact_to;
+ 
+ 	const char *preferred_pack_name;
+ 	const char *refs_snapshot;
+ 	unsigned flags;
+ };
+ 
+ static int write_midx_internal(struct write_midx_opts *opts)
+ {
+ 	struct repository *r = opts->source->odb->repo;
  	struct strbuf midx_name = STRBUF_INIT;
  	unsigned char midx_hash[GIT_MAX_RAWSZ];
  	uint32_t start_pack;
@@@ -1166,44 -1257,43 +1317,54 @@@
  	else
  		ctx.progress = NULL;
  
- 	ctx.to_include = packs_to_include;
+ 	if (ctx.compact) {
+ 		int bitmap_order = 0;
+ 		if (opts->preferred_pack_name)
+ 			bitmap_order |= 1;
+ 		else if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
+ 			bitmap_order |= 1;
  
- 	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
+ 		fill_packs_from_midx_range(&ctx, bitmap_order);
+ 	} else {
+ 		ctx.to_include = opts->packs_to_include;
+ 		for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+ 	}
  	stop_progress(&ctx.progress);
  
- 	if (!packs_to_drop) {
 -	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
 -	    !ctx.incremental &&
 -	    !ctx.compact &&
 -	    !(opts->packs_to_include || opts->packs_to_drop)) {
 -		struct bitmap_index *bitmap_git;
 -		int bitmap_exists;
 -		int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
 -
 -		bitmap_git = prepare_midx_bitmap_git(ctx.m);
 -		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
 -		free_bitmap_index(bitmap_git);
 -
 -		if (bitmap_exists || !want_bitmap) {
 -			/*
 -			 * The correct MIDX already exists, and so does a
 -			 * corresponding bitmap (or one wasn't requested).
 -			 */
 -			if (!want_bitmap)
 -				clear_midx_files_ext(opts->source, "bitmap",
 -						     NULL);
 -			result = 0;
 -			goto cleanup;
++	if (!opts->packs_to_drop) {
 +		/*
 +		 * If there is no MIDX then either it doesn't exist, or we're
 +		 * doing a geometric repack. Try to load it from the source to
 +		 * tell these two cases apart.
 +		 */
 +		struct multi_pack_index *midx = ctx.m;
 +		if (!midx)
 +			midx = midx_to_free = load_multi_pack_index(ctx.source);
 +
 +		if (midx && !midx_needs_update(midx, &ctx)) {
 +			struct bitmap_index *bitmap_git;
 +			int bitmap_exists;
- 			int want_bitmap = flags & MIDX_WRITE_BITMAP;
++			int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
 +
 +			bitmap_git = prepare_midx_bitmap_git(midx);
 +			bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
 +			free_bitmap_index(bitmap_git);
 +
 +			if (bitmap_exists || !want_bitmap) {
 +				/*
 +				 * The correct MIDX already exists, and so does a
 +				 * corresponding bitmap (or one wasn't requested).
 +				 */
 +				if (!want_bitmap)
- 					clear_midx_files_ext(source, "bitmap", NULL);
++					clear_midx_files_ext(opts->source,
++							     "bitmap", NULL);
 +				result = 0;
 +				goto cleanup;
 +			}
  		}
 +
 +		close_midx(midx_to_free);
 +		midx_to_free = NULL;
  	}
  
  	if (ctx.incremental && !ctx.nr) {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX
  2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
@ 2025-12-12  7:33     ` Patrick Steinhardt
  2025-12-18 21:18       ` Taylor Blau
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-12  7:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Jeff King

On Thu, Dec 11, 2025 at 05:46:13PM +0900, Junio C Hamano wrote:
> This and Taylor's incremental part 3.2 have a slight conflict in
> that this topic factors away the logic to compute if we need
> recomputing MIDX while the other one tweaks with yet another flag.
> 
> My tentative resolution in 'seen' looks like the attached.  Sanity
> checking is very much appreciated.
> 
> Thanks.
> 
> diff --cc midx-write.c
> index ce459b02c3,f2dbacef4c..66c125ccb0
> --- a/midx-write.c
> +++ b/midx-write.c
> @@@ -1014,73 -1131,30 +1131,89 @@@ static void clear_midx_files(struct odb
>   	strbuf_release(&buf);
>   }
>   
>  +static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_context *ctx)
>  +{
>  +	struct strset packs = STRSET_INIT;
>  +	struct strbuf buf = STRBUF_INIT;
>  +	bool needed = true;
>  +
>  +	/*
>  +	 * Ignore incremental updates for now. The assumption is that any
>  +	 * incremental update would be either empty (in which case we will bail
>  +	 * out later) or it would actually cover at least one new pack.
>  +	 */
> - 	if (ctx->incremental)
> ++	if (ctx->incremental || ctx->compact)
>  +		goto out;

So this here is essentially the change you had to port over, which looks
about right to me. The comment is becoming somewhat stale due to the
change, but I don't think that's much of an issue for now.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
  2025-12-10  9:40     ` Patrick Steinhardt
@ 2025-12-18 21:13       ` Taylor Blau
  0 siblings, 0 replies; 16+ messages in thread
From: Taylor Blau @ 2025-12-18 21:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King

On Wed, Dec 10, 2025 at 10:40:10AM +0100, Patrick Steinhardt wrote:
> On Tue, Dec 09, 2025 at 07:22:11PM -0500, Taylor Blau wrote:
> > On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> > > The function `midx_preferred_pack()` returns the preferred pack for a
> > > given multi-pack index. To compute the preferred pack we:
> > >
> > >   1. Look up the position of the first object indexed by the multi-pack
> > >      index.
> > >
> > >   2. Convert this position from pseudo-pack order into MIDX order.
> > >
> > >   3. We then look up pack that corresponds to this MIDX index.
> >
> > I think the implementation of midx_preferred_pack() works a little bit
> > differently than is described here. I often get confused when working in
> > this area juggling between the various object/pack orderings in my head.
>
> Hm, I feel like I am missing something.
>
> > midx_preferred_pack() cares about converting from the first position in
> > pseudo-pack order back into MIDX object order. To do that, we convert
> > the pseudo-pack position into a MIDX one, and then lookup the pack that
> > represents that object.
>
> Isn't that what I say in (2) and (3)? Or is this about (1) being
> inaccurate? Would this sequence be more accurate:
>
>   1. Take the first position indexed by the MIDX in pseudo-pack order.
>
>   2. Convert this pseudo-pack position into the MIDX position.
>
>   3. We then look up the pack that corresponds to this MIDX position.
>
> In any case, I agree with you that juggling these different positions is
> quite something :)

I was thinking about (1) being inaccurate, but the rephrasing you
provided here and in the new version of the patch look good to me.

Both (2) and (3) from your original patch are correct. I was commenting
here on the phrasing in (1) suggesting we "look up" a position from the
MIDX, which is inaccurate. The position is given as the first position
in pseudo-pack order, and "take the first position" perfectly captures
that.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX
  2025-12-12  7:33     ` Patrick Steinhardt
@ 2025-12-18 21:18       ` Taylor Blau
  2025-12-19  6:25         ` Patrick Steinhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Taylor Blau @ 2025-12-18 21:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Jeff King

On Fri, Dec 12, 2025 at 08:33:14AM +0100, Patrick Steinhardt wrote:
> On Thu, Dec 11, 2025 at 05:46:13PM +0900, Junio C Hamano wrote:
> > This and Taylor's incremental part 3.2 have a slight conflict in
> > that this topic factors away the logic to compute if we need
> > recomputing MIDX while the other one tweaks with yet another flag.
> >
> > My tentative resolution in 'seen' looks like the attached.  Sanity
> > checking is very much appreciated.
> >
> > Thanks.
> >
> > diff --cc midx-write.c
> > index ce459b02c3,f2dbacef4c..66c125ccb0
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@@ -1014,73 -1131,30 +1131,89 @@@ static void clear_midx_files(struct odb
> >   	strbuf_release(&buf);
> >   }
> >
> >  +static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_context *ctx)
> >  +{
> >  +	struct strset packs = STRSET_INIT;
> >  +	struct strbuf buf = STRBUF_INIT;
> >  +	bool needed = true;
> >  +
> >  +	/*
> >  +	 * Ignore incremental updates for now. The assumption is that any
> >  +	 * incremental update would be either empty (in which case we will bail
> >  +	 * out later) or it would actually cover at least one new pack.
> >  +	 */
> > - 	if (ctx->incremental)
> > ++	if (ctx->incremental || ctx->compact)
> >  +		goto out;
>
> So this here is essentially the change you had to port over, which looks
> about right to me. The comment is becoming somewhat stale due to the
> change, but I don't think that's much of an issue for now.
>
> Thanks!

Thanks, both. The new version of these patches looks good to me. FYI I
am going out of office beginning tomorrow through the end of the year.
In case it's easier to queue, it's fine to drop my 3.2 patches from
'seen' and take Patrick's v2 as-is.

I plan on sending a new round of 3.2 in the first week of the new year
and don't mind it being dropped in the meantime, especially if it makes
things easier for the maintainer.

Enjoy the holidays everyone!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX
  2025-12-18 21:18       ` Taylor Blau
@ 2025-12-19  6:25         ` Patrick Steinhardt
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-12-19  6:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, Jeff King

On Thu, Dec 18, 2025 at 04:18:48PM -0500, Taylor Blau wrote:
> Thanks, both. The new version of these patches looks good to me. FYI I
> am going out of office beginning tomorrow through the end of the year.
> In case it's easier to queue, it's fine to drop my 3.2 patches from
> 'seen' and take Patrick's v2 as-is.
> 
> I plan on sending a new round of 3.2 in the first week of the new year
> and don't mind it being dropped in the meantime, especially if it makes
> things easier for the maintainer.

Thanks for your review!

> Enjoy the holidays everyone!

Likewise, enjoy your holidays and see you next year!

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-12-19  6:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10  0:22   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-18 21:13       ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10  2:48   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12  7:33     ` Patrick Steinhardt
2025-12-18 21:18       ` Taylor Blau
2025-12-19  6:25         ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).