* [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()`
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
@ 2026-03-19 22:24 ` Taylor Blau
2026-03-24 7:39 ` Patrick Steinhardt
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
` (5 subsequent siblings)
6 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-19 22:24 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
The `read_stdin_packs()` function added originally via 339bce27f4f
(builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
declares a `rev_info` struct but neglects to call `release_revisions()`
on it before returning, creating a leak.
The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
'--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
did not address it.
Ensure that we call `release_revisions()` appropriately to prevent a
leak from this function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cd013c0b68a..9a89bc5c4c9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3968,6 +3968,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
show_object_pack_hint,
&mode);
+ release_revisions(&revs);
+
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
stdin_packs_found_nr);
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
--
2.53.0.614.gc4fd52e751a
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()`
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
@ 2026-03-24 7:39 ` Patrick Steinhardt
2026-03-25 23:03 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2026-03-24 7:39 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
On Thu, Mar 19, 2026 at 06:24:15PM -0400, Taylor Blau wrote:
> The `read_stdin_packs()` function added originally via 339bce27f4f
> (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
> declares a `rev_info` struct but neglects to call `release_revisions()`
> on it before returning, creating a leak.
>
> The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
> '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
> did not address it.
>
> Ensure that we call `release_revisions()` appropriately to prevent a
> leak from this function.
Would be curious to learn why none of our tests fail with this. The fix
looks obviously correct though, and there are no other early exits that
might need fixing here.
Patrick
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()`
2026-03-24 7:39 ` Patrick Steinhardt
@ 2026-03-25 23:03 ` Taylor Blau
0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:03 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
On Tue, Mar 24, 2026 at 08:39:17AM +0100, Patrick Steinhardt wrote:
> On Thu, Mar 19, 2026 at 06:24:15PM -0400, Taylor Blau wrote:
> > The `read_stdin_packs()` function added originally via 339bce27f4f
> > (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
> > declares a `rev_info` struct but neglects to call `release_revisions()`
> > on it before returning, creating a leak.
> >
> > The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
> > '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
> > did not address it.
> >
> > Ensure that we call `release_revisions()` appropriately to prevent a
> > leak from this function.
>
> Would be curious to learn why none of our tests fail with this. The fix
> looks obviously correct though, and there are no other early exits that
> might need fixing here.
I believe it's because the only memory we allocate here is in
revs->pending, but we copy it to old_pending in prepare_revision_walk()
and all object_array_clear() on it.
Since our traversal doesn't use any other fields of rev_info that would
cause it to allocate memory, we don't see any leaks in our tests.
I'll rewords the commit message to clarify that this is preventing the
*potential* of a leak, not an actual leak.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
@ 2026-03-19 22:24 ` Taylor Blau
2026-03-24 7:39 ` Patrick Steinhardt
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
` (4 subsequent siblings)
6 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-19 22:24 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
The '--stdin-packs' mode of pack-objects maintains two separate
string_lists: one for included packs, and one for excluded packs. Each
list stores the pack basename as a string and the corresponding
`packed_git` pointer in its `->util` field.
This works, but makes it awkward to extend the set of pack "kinds" that
pack-objects can accept via stdin, since each new kind would need its
own string_list and duplicated handling. A future commit will want to do
just this, so prepare for that change by handling the various "kinds" of
packs specified over stdin in a more generic fashion.
Namely, replace the two `string_list`s with a single `strmap` keyed on
the pack basename, with values pointing to a new `struct
stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
`kind` bitfield indicating whether the pack was specified as included or
excluded.
Extract the logic for sorting packs by mtime and adding their objects
into a separate `stdin_packs_add_entries()` helper.
While we could have used a `string_list`, we must handle the case where
the same pack is specified more than once. With a `string_list` only, we
would have to pay a quadratic cost to either (a) insert elements into
their sorted positions, or (b) a repeated linear search, which is
accidentally quadratic. For that reason, use a strmap instead.
This patch does not include any functional changes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 153 +++++++++++++++++++++++++----------------
1 file changed, 92 insertions(+), 61 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9a89bc5c4c9..72c9ddbed6b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -28,6 +28,7 @@
#include "reachable.h"
#include "oid-array.h"
#include "strvec.h"
+#include "strmap.h"
#include "list.h"
#include "packfile.h"
#include "object-file.h"
@@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
-static void read_packs_list_from_stdin(struct rev_info *revs)
+struct stdin_pack_info {
+ struct packed_git *p;
+ enum {
+ STDIN_PACK_INCLUDE = (1<<0),
+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ } kind;
+};
+
+static void stdin_packs_add_pack_entries(struct strmap *packs,
+ struct rev_info *revs)
+{
+ struct string_list keys = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
+
+ strmap_for_each_entry(packs, &iter, entry) {
+ struct stdin_pack_info *info = entry->value;
+ if (!info->p)
+ die(_("could not find pack '%s'"), entry->key);
+
+ string_list_append(&keys, entry->key)->util = info->p;
+ }
+
+ /*
+ * Order packs by ascending mtime; use QSORT directly to access the
+ * string_list_item's ->util pointer, which string_list_sort() does not
+ * provide.
+ */
+ QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+ for_each_string_list_item(item, &keys) {
+ struct stdin_pack_info *info = strmap_get(packs, item->string);
+ if (!info->p)
+ die(_("could not find pack '%s'"), item->string);
+
+ if (info->kind & STDIN_PACK_INCLUDE)
+ for_each_object_in_pack(info->p,
+ add_object_entry_from_pack,
+ revs,
+ ODB_FOR_EACH_OBJECT_PACK_ORDER);
+ }
+
+ string_list_clear(&keys, 0);
+}
+
+static void stdin_packs_read_input(struct rev_info *revs)
{
struct strbuf buf = STRBUF_INIT;
- struct string_list include_packs = STRING_LIST_INIT_DUP;
- struct string_list exclude_packs = STRING_LIST_INIT_DUP;
- struct string_list_item *item = NULL;
+ struct strmap packs = STRMAP_INIT;
struct packed_git *p;
while (strbuf_getline(&buf, stdin) != EOF) {
- if (!buf.len)
+ struct stdin_pack_info *info;
+ const char *key = buf.buf;
+
+ if (!key || !*key)
continue;
+ if (*key == '^')
+ key++;
+
+ info = strmap_get(&packs, key);
+ if (!info) {
+ CALLOC_ARRAY(info, 1);
+ strmap_put(&packs, key, info);
+ }
+
if (*buf.buf == '^')
- string_list_append(&exclude_packs, buf.buf + 1);
+ info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
else
- string_list_append(&include_packs, buf.buf);
+ info->kind |= STDIN_PACK_INCLUDE;
strbuf_reset(&buf);
}
- string_list_sort_u(&include_packs, 0);
- string_list_sort_u(&exclude_packs, 0);
-
repo_for_each_pack(the_repository, p) {
- const char *pack_name = pack_basename(p);
+ struct stdin_pack_info *info;
- if ((item = string_list_lookup(&include_packs, pack_name))) {
+ info = strmap_get(&packs, pack_basename(p));
+ if (!info)
+ continue;
+
+ if (info->kind & STDIN_PACK_INCLUDE) {
if (exclude_promisor_objects && p->pack_promisor)
die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
- item->util = p;
+
+ /*
+ * Arguments we got on stdin may not even be
+ * packs. First check that to avoid segfaulting
+ * later on in e.g. pack_mtime_cmp(), excluded
+ * packs are handled below.
+ */
+ if (!is_pack_valid(p))
+ die(_("packfile %s cannot be accessed"), p->pack_name);
}
- if ((item = string_list_lookup(&exclude_packs, pack_name)))
- item->util = p;
- }
- /*
- * Arguments we got on stdin may not even be packs. First
- * check that to avoid segfaulting later on in
- * e.g. pack_mtime_cmp(), excluded packs are handled below.
- *
- * Since we first parsed our STDIN and then sorted the input
- * lines the pack we error on will be whatever line happens to
- * sort first. This is lazy, it's enough that we report one
- * bad case here, we don't need to report the first/last one,
- * or all of them.
- */
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- if (!is_pack_valid(p))
- die(_("packfile %s cannot be accessed"), p->pack_name);
- }
+ if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
+ /*
+ * Marking excluded packs as kept in-core so
+ * that later calls to add_object_entry()
+ * discards any objects that are also found in
+ * excluded packs.
+ */
+ p->pack_keep_in_core = 1;
+ }
- /*
- * Then, handle all of the excluded packs, marking them as
- * kept in-core so that later calls to add_object_entry()
- * discards any objects that are also found in excluded packs.
- */
- for_each_string_list_item(item, &exclude_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- p->pack_keep_in_core = 1;
+ info->p = p;
}
- /*
- * Order packs by ascending mtime; use QSORT directly to access the
- * string_list_item's ->util pointer, which string_list_sort() does not
- * provide.
- */
- QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
-
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- for_each_object_in_pack(p,
- add_object_entry_from_pack,
- revs,
- ODB_FOR_EACH_OBJECT_PACK_ORDER);
- }
+ stdin_packs_add_pack_entries(&packs, revs);
strbuf_release(&buf);
- string_list_clear(&include_packs, 0);
- string_list_clear(&exclude_packs, 0);
+ strmap_clear(&packs, 1);
}
static void add_unreachable_loose_objects(struct rev_info *revs);
@@ -3957,7 +3988,7 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- read_packs_list_from_stdin(&revs);
+ stdin_packs_read_input(&revs);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
--
2.53.0.614.gc4fd52e751a
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
@ 2026-03-24 7:39 ` Patrick Steinhardt
2026-03-25 23:13 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2026-03-24 7:39 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
On Thu, Mar 19, 2026 at 06:24:20PM -0400, Taylor Blau wrote:
> The '--stdin-packs' mode of pack-objects maintains two separate
> string_lists: one for included packs, and one for excluded packs. Each
> list stores the pack basename as a string and the corresponding
> `packed_git` pointer in its `->util` field.
>
> This works, but makes it awkward to extend the set of pack "kinds" that
> pack-objects can accept via stdin, since each new kind would need its
> own string_list and duplicated handling. A future commit will want to do
> just this, so prepare for that change by handling the various "kinds" of
> packs specified over stdin in a more generic fashion.
>
> Namely, replace the two `string_list`s with a single `strmap` keyed on
> the pack basename, with values pointing to a new `struct
> stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
> `kind` bitfield indicating whether the pack was specified as included or
> excluded.
Okay.
> Extract the logic for sorting packs by mtime and adding their objects
> into a separate `stdin_packs_add_entries()` helper.
Right, the ordering was my first question. Interestingly though, that
function doesn't seem to be added in this commit... ah, it's called
`stdin_packs_add_pack_entries()`.
> While we could have used a `string_list`, we must handle the case where
> the same pack is specified more than once. With a `string_list` only, we
> would have to pay a quadratic cost to either (a) insert elements into
> their sorted positions, or (b) a repeated linear search, which is
> accidentally quadratic. For that reason, use a strmap instead.
We could of course just add them and deduplicate in a later step via
`string_list_sort_u()`. But I assume that you want to handle the case
where we have duplicates specially, either by merging the duplicate into
the existing pack info or by aborting.
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 9a89bc5c4c9..72c9ddbed6b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> return 0;
> }
>
> -static void read_packs_list_from_stdin(struct rev_info *revs)
> +struct stdin_pack_info {
> + struct packed_git *p;
> + enum {
> + STDIN_PACK_INCLUDE = (1<<0),
> + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
It might make sense to provide a sentence for each of the enums to
explain what they do.
> + } kind;
> +};
Okay, this is the new structure that allows us to track the packfile
kind in a way that is more extensible. Makes sense.
> +static void stdin_packs_add_pack_entries(struct strmap *packs,
> + struct rev_info *revs)
> +{
> + struct string_list keys = STRING_LIST_INIT_NODUP;
> + struct string_list_item *item;
> + struct hashmap_iter iter;
> + struct strmap_entry *entry;
> +
> + strmap_for_each_entry(packs, &iter, entry) {
> + struct stdin_pack_info *info = entry->value;
> + if (!info->p)
> + die(_("could not find pack '%s'"), entry->key);
> +
> + string_list_append(&keys, entry->key)->util = info->p;
> + }
> +
> + /*
> + * Order packs by ascending mtime; use QSORT directly to access the
> + * string_list_item's ->util pointer, which string_list_sort() does not
> + * provide.
> + */
> + QSORT(keys.items, keys.nr, pack_mtime_cmp);
Okay. I was briefly wondering whether it would make more sense to use
`string_list_sort()`, but I guess it doesn't buy us much.
> + for_each_string_list_item(item, &keys) {
> + struct stdin_pack_info *info = strmap_get(packs, item->string);
We could avoid this extra lookup if you instead were to store the pack
info in the `item->util` field.
> + if (!info->p)
> + die(_("could not find pack '%s'"), item->string);
This case basically cannot happen as we already `die()` further up,
right? Should we rather `BUG()` or drop the check completely?
> + if (info->kind & STDIN_PACK_INCLUDE)
> + for_each_object_in_pack(info->p,
> + add_object_entry_from_pack,
> + revs,
> + ODB_FOR_EACH_OBJECT_PACK_ORDER);
> + }
> +
> + string_list_clear(&keys, 0);
> +}
> +
> +static void stdin_packs_read_input(struct rev_info *revs)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct string_list include_packs = STRING_LIST_INIT_DUP;
> - struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> - struct string_list_item *item = NULL;
> + struct strmap packs = STRMAP_INIT;
> struct packed_git *p;
>
> while (strbuf_getline(&buf, stdin) != EOF) {
> - if (!buf.len)
> + struct stdin_pack_info *info;
> + const char *key = buf.buf;
> +
> + if (!key || !*key)
The first case of `!key` cannot ever happen as strbufs always have `buf`
set.
> continue;
>
> + if (*key == '^')
> + key++;
> +
> + info = strmap_get(&packs, key);
> + if (!info) {
> + CALLOC_ARRAY(info, 1);
> + strmap_put(&packs, key, info);
> + }
> +
> if (*buf.buf == '^')
> - string_list_append(&exclude_packs, buf.buf + 1);
> + info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> else
> - string_list_append(&include_packs, buf.buf);
> + info->kind |= STDIN_PACK_INCLUDE;
I was briefly wondering whether we need error handling for the case
where a pack is marked both as excluded and included. But we didn't have
it beforehand, either.
> strbuf_reset(&buf);
> }
>
> - string_list_sort_u(&include_packs, 0);
> - string_list_sort_u(&exclude_packs, 0);
> -
> repo_for_each_pack(the_repository, p) {
> - const char *pack_name = pack_basename(p);
> + struct stdin_pack_info *info;
>
> - if ((item = string_list_lookup(&include_packs, pack_name))) {
> + info = strmap_get(&packs, pack_basename(p));
> + if (!info)
> + continue;
> +
> + if (info->kind & STDIN_PACK_INCLUDE) {
> if (exclude_promisor_objects && p->pack_promisor)
> die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
> - item->util = p;
> +
> + /*
> + * Arguments we got on stdin may not even be
> + * packs. First check that to avoid segfaulting
> + * later on in e.g. pack_mtime_cmp(), excluded
> + * packs are handled below.
> + */
> + if (!is_pack_valid(p))
> + die(_("packfile %s cannot be accessed"), p->pack_name);
Hm. Doesn't this change behaviour though? Beforehand, we would have
checked the packfile for every included pack. Now we only check the
packfile for every included pack that was yielded by
`repo_for_each_pack()`. So if an included pack wasn't yielded at all we
wouldn't notice that it doesn't exist?
I guess an easy fix would be to mark every pack that we have processed
as seen in the pack info, and then loop over all pack infos a second
time to verify that we've seen all that we expected to see.
Which you in fact already do :) That post-processing happens in
`stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
is set as expected. And if it's not we die with a message that the pack
wasn't found. Good.
This was a bit more demanding to review, but I very much like the
outcome of this.
Patrick
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-24 7:39 ` Patrick Steinhardt
@ 2026-03-25 23:13 ` Taylor Blau
0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:13 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
On Tue, Mar 24, 2026 at 08:39:28AM +0100, Patrick Steinhardt wrote:
> > Extract the logic for sorting packs by mtime and adding their objects
> > into a separate `stdin_packs_add_entries()` helper.
>
> Right, the ordering was my first question. Interestingly though, that
> function doesn't seem to be added in this commit... ah, it's called
> `stdin_packs_add_pack_entries()`.
Ah, good catch. I had originally called it `stdin_packs_add_entries()`
but renamed it before sending, apparently without adjusting the commit
message appropriately.
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 9a89bc5c4c9..72c9ddbed6b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> > return 0;
> > }
> >
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > + struct packed_git *p;
> > + enum {
> > + STDIN_PACK_INCLUDE = (1<<0),
> > + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>
> It might make sense to provide a sentence for each of the enums to
> explain what they do.
I'm not opposed, but I am not sure what information would be helpful to
add here, since these correspond one-to-one with the three possible
prefixes for packfile names we receive with --stdin-packs.
> > +static void stdin_packs_add_pack_entries(struct strmap *packs,
> > + struct rev_info *revs)
> > +{
> > + struct string_list keys = STRING_LIST_INIT_NODUP;
> > + struct string_list_item *item;
> > + struct hashmap_iter iter;
> > + struct strmap_entry *entry;
> > +
> > + strmap_for_each_entry(packs, &iter, entry) {
> > + struct stdin_pack_info *info = entry->value;
> > + if (!info->p)
> > + die(_("could not find pack '%s'"), entry->key);
> > +
> > + string_list_append(&keys, entry->key)->util = info->p;
> > + }
> > +
> > + /*
> > + * Order packs by ascending mtime; use QSORT directly to access the
> > + * string_list_item's ->util pointer, which string_list_sort() does not
> > + * provide.
> > + */
> > + QSORT(keys.items, keys.nr, pack_mtime_cmp);
>
> Okay. I was briefly wondering whether it would make more sense to use
> `string_list_sort()`, but I guess it doesn't buy us much.
Yeah. This is actually carried forward from the existing implementation,
and uses the separate QSORT() because `string_list_sort()` doesn't
provide access to the `util` field of the items, which we need to sort
by mtime.
> > + for_each_string_list_item(item, &keys) {
> > + struct stdin_pack_info *info = strmap_get(packs, item->string);
>
> We could avoid this extra lookup if you instead were to store the pack
> info in the `item->util` field.
Good idea. Funnily enough, we already assign ->util = info->p in the
loop above, but never use it. Something like this on top should clean
things up nicely:
--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..c9b33d1673d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3859,7 +3859,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
if (!info->p)
die(_("could not find pack '%s'"), entry->key);
- string_list_append(&keys, entry->key)->util = info->p;
+ string_list_append(&keys, entry->key)->util = info;
}
/*
@@ -3870,9 +3870,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
QSORT(keys.items, keys.nr, pack_mtime_cmp);
for_each_string_list_item(item, &keys) {
- struct stdin_pack_info *info = strmap_get(packs, item->string);
- if (!info->p)
- die(_("could not find pack '%s'"), item->string);
+ struct stdin_pack_info *info = item->util;
if (info->kind & STDIN_PACK_INCLUDE)
for_each_object_in_pack(info->p,
--- >8 ---
> > + if (!info->p)
> > + die(_("could not find pack '%s'"), item->string);
>
> This case basically cannot happen as we already `die()` further up,
> right? Should we rather `BUG()` or drop the check completely?
I think we should drop the check completely here, there's no way that we
would have a NULL 'info->p' by this point with the check that exists a
few lines up.
> > + if (info->kind & STDIN_PACK_INCLUDE)
> > + for_each_object_in_pack(info->p,
> > + add_object_entry_from_pack,
> > + revs,
> > + ODB_FOR_EACH_OBJECT_PACK_ORDER);
> > + }
> > +
> > + string_list_clear(&keys, 0);
> > +}
> > +
> > +static void stdin_packs_read_input(struct rev_info *revs)
> > {
> > struct strbuf buf = STRBUF_INIT;
> > - struct string_list include_packs = STRING_LIST_INIT_DUP;
> > - struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> > - struct string_list_item *item = NULL;
> > + struct strmap packs = STRMAP_INIT;
> > struct packed_git *p;
> >
> > while (strbuf_getline(&buf, stdin) != EOF) {
> > - if (!buf.len)
> > + struct stdin_pack_info *info;
> > + const char *key = buf.buf;
> > +
> > + if (!key || !*key)
>
> The first case of `!key` cannot ever happen as strbufs always have `buf`
> set.
You're right, this is just muscle memory, but the left-hand side of the
condition is unnecessary. I'll remove it.
> > continue;
> >
> > + if (*key == '^')
> > + key++;
> > +
> > + info = strmap_get(&packs, key);
> > + if (!info) {
> > + CALLOC_ARRAY(info, 1);
> > + strmap_put(&packs, key, info);
> > + }
> > +
> > if (*buf.buf == '^')
> > - string_list_append(&exclude_packs, buf.buf + 1);
> > + info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> > else
> > - string_list_append(&include_packs, buf.buf);
> > + info->kind |= STDIN_PACK_INCLUDE;
>
> I was briefly wondering whether we need error handling for the case
> where a pack is marked both as excluded and included. But we didn't have
> it beforehand, either.
Yeah, I think this is a consequence of 752b465c3c0 (pack-objects: fix
error when same packfile is included and excluded, 2023-04-14).
> > [snip]
> > +
> > + /*
> > + * Arguments we got on stdin may not even be
> > + * packs. First check that to avoid segfaulting
> > + * later on in e.g. pack_mtime_cmp(), excluded
> > + * packs are handled below.
> > + */
> > + if (!is_pack_valid(p))
> > + die(_("packfile %s cannot be accessed"), p->pack_name);
>
> Hm. Doesn't this change behaviour though? Beforehand, we would have
> checked the packfile for every included pack. Now we only check the
> packfile for every included pack that was yielded by
> `repo_for_each_pack()`. So if an included pack wasn't yielded at all we
> wouldn't notice that it doesn't exist?
>
> I guess an easy fix would be to mark every pack that we have processed
> as seen in the pack info, and then loop over all pack infos a second
> time to verify that we've seen all that we expected to see.
>
> Which you in fact already do :) That post-processing happens in
> `stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
> is set as expected. And if it's not we die with a message that the pack
> wasn't found. Good.
Thanks for double checking.
> This was a bit more demanding to review, but I very much like the
> outcome of this.
Yeah, I really struggled to try and find a productive way to break this
up into smaller changes. But in the end I couldn't find any good splits
that I liked, hence the larger-than-usual patch.
Thanks for reviewing it, I think that it makes the rest of the series a
little more palatable, and the resulting code is easier to reason about
IMHO.
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
@ 2026-03-19 22:24 ` Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
` (3 subsequent siblings)
6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-19 22:24 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
Add a test demonstrating a case where geometric repacking fails to
produce a pack with full object closure, thus making it impossible to
write a reachability bitmap.
Mark the test with 'test_expect_failure' for now. The subsequent commit
will explain the precise failure mode, and implement a fix.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index aa2e2e6ad88..77133395b5d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,4 +869,26 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
+test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+ git config repack.midxMustContainCruft false &&
+
+ test_commit reachable &&
+ test_commit unreachable &&
+
+ unreachable="$(git rev-parse HEAD)" &&
+
+ git reset --hard HEAD^ &&
+ git tag -d unreachable &&
+ git reflog expire --all --expire=all &&
+
+ git repack --cruft -d &&
+
+ echo $unreachable | git pack-objects .git/objects/pack/pack &&
+
+ test_commit new &&
+
+ git update-ref refs/heads/other $unreachable &&
+ git repack --geometric=2 -d --write-midx --write-bitmap-index
+'
+
test_done
--
2.53.0.614.gc4fd52e751a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (2 preceding siblings ...)
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
@ 2026-03-19 22:24 ` Taylor Blau
2026-03-21 16:57 ` Jeff King
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
` (2 subsequent siblings)
6 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-19 22:24 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In cd846bacc7d (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.
When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)
But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.
This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure. When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.
Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.
In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.
In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. The `include_check` and
`include_check_obj` callbacks on `rev_info` are used to halt the walk at
closed-excluded packs, since objects behind a '^' boundary are
guaranteed to have closure and need not be rescued.
The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-pack-objects.adoc | 25 +++++--
builtin/pack-objects.c | 87 ++++++++++++++++++++---
packfile.c | 3 +-
packfile.h | 2 +
t/t5331-pack-objects-stdin.sh | 105 ++++++++++++++++++++++++++++
5 files changed, 203 insertions(+), 19 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 71b9682485c..b78175fbe1b 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -94,13 +94,24 @@ base-name::
included packs (those not beginning with `^`), excluding any
objects listed in the excluded packs (beginning with `^`).
+
-When `mode` is "follow", objects from packs not listed on stdin receive
-special treatment. Objects within unlisted packs will be included if
-those objects are (1) reachable from the included packs, and (2) not
-found in any excluded packs. This mode is useful, for example, to
-resurrect once-unreachable objects found in cruft packs to generate
-packs which are closed under reachability up to the boundary set by the
-excluded packs.
+When `mode` is "follow" packs may additionally be prefixed with `!`,
+indicating that they are excluded but not necessarily closed under
+reachability. In addition to objects in included packs, the resulting
+pack may include additional objects based on the following:
++
+--
+* If any packs are marked with `!`, then objects reachable from such
+ packs or included ones via objects outside of excluded-closed packs
+ will be included. In this case, all `^` packs are treated as closed
+ under reachability.
+* Otherwise (if there are no `!` packs), objects within unlisted packs
+ will be included if those objects are (1) reachable from the
+ included packs, and (2) not found in any excluded packs.
+--
++
+This mode is useful, for example, to resurrect once-unreachable
+objects found in cruft packs to generate packs which are closed under
+reachability up to the boundary set by the excluded packs.
+
Incompatible with `--revs`, or options that imply `--revs` (such as
`--all`), with the exception of `--unpacked`, which is compatible.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..0d25a856717 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -217,6 +217,7 @@ static int have_non_local_packs;
static int incremental;
static int ignore_packed_keep_on_disk;
static int ignore_packed_keep_in_core;
+static int ignore_packed_keep_in_core_open;
static int ignore_packed_keep_in_core_has_cruft;
static int allow_ofs_delta;
static struct pack_idx_option pack_idx_opts;
@@ -1618,7 +1619,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
/*
* Then handle .keep first, as we have a fast(er) path there.
*/
- if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
+ if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core ||
+ ignore_packed_keep_in_core_open) {
/*
* Set the flags for the kept-pack cache to be the ones we want
* to ignore.
@@ -1632,6 +1634,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
flags |= KEPT_PACK_ON_DISK;
if (ignore_packed_keep_in_core)
flags |= KEPT_PACK_IN_CORE;
+ if (ignore_packed_keep_in_core_open)
+ flags |= KEPT_PACK_IN_CORE_OPEN;
/*
* If the object is in a pack that we want to ignore, *and* we
@@ -1643,6 +1647,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
return 0;
if (ignore_packed_keep_in_core && p->pack_keep_in_core)
return 0;
+ if (ignore_packed_keep_in_core_open && p->pack_keep_in_core_open)
+ return 0;
if (has_object_kept_pack(p->repo, oid, flags))
return 0;
} else {
@@ -3750,8 +3756,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
return 0;
ofs = nth_packed_object_offset(p, pos);
- if (!want_object_in_pack(oid, 0, &p, &ofs))
- return 0;
if (p) {
struct object_info oi = OBJECT_INFO_INIT;
@@ -3763,15 +3767,26 @@ static int add_object_entry_from_pack(const struct object_id *oid,
} else if (type == OBJ_COMMIT) {
struct rev_info *revs = _data;
/*
- * commits in included packs are used as starting points for the
- * subsequent revision walk
+ * commits in included packs are used as starting points
+ * for the subsequent revision walk
+ *
+ * Note that we do want to walk through commits that are
+ * present in excluded-open ('!') packs to pick up any
+ * objects reachable from them not present in the
+ * excluded-closed ('^') packs.
+ *
+ * However, we'll only add those objects to the packing
+ * list after checking `want_object_in_pack()` below.
*/
add_pending_oid(revs, NULL, oid, 0);
}
-
- stdin_packs_found_nr++;
}
+ if (!want_object_in_pack(oid, 0, &p, &ofs))
+ return 0;
+
+ stdin_packs_found_nr++;
+
create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
@@ -3838,11 +3853,23 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
+static int stdin_packs_include_check_obj(struct object *obj, void *data UNUSED)
+{
+ return !has_object_kept_pack(to_pack.repo, &obj->oid,
+ KEPT_PACK_IN_CORE);
+}
+
+static int stdin_packs_include_check(struct commit *commit, void *data)
+{
+ return stdin_packs_include_check_obj((struct object *)commit, data);
+}
+
struct stdin_pack_info {
struct packed_git *p;
enum {
STDIN_PACK_INCLUDE = (1<<0),
STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
} kind;
};
@@ -3874,7 +3901,19 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
if (!info->p)
die(_("could not find pack '%s'"), item->string);
- if (info->kind & STDIN_PACK_INCLUDE)
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * When open-excluded packs ("!") are present, stop
+ * the parent walk at closed-excluded ("^") packs.
+ * Objects behind a "^" boundary are guaranteed to
+ * have closure and should not be rescued.
+ */
+ revs->include_check = stdin_packs_include_check;
+ revs->include_check_obj = stdin_packs_include_check_obj;
+ }
+
+ if ((info->kind & STDIN_PACK_INCLUDE) ||
+ (info->kind & STDIN_PACK_EXCLUDE_OPEN))
for_each_object_in_pack(info->p,
add_object_entry_from_pack,
revs,
@@ -3884,7 +3923,8 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
string_list_clear(&keys, 0);
}
-static void stdin_packs_read_input(struct rev_info *revs)
+static void stdin_packs_read_input(struct rev_info *revs,
+ enum stdin_packs_mode mode)
{
struct strbuf buf = STRBUF_INIT;
struct strmap packs = STRMAP_INIT;
@@ -3897,7 +3937,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
if (!key || !*key)
continue;
- if (*key == '^')
+ if (*key == '^' ||
+ (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
key++;
info = strmap_get(&packs, key);
@@ -3908,6 +3949,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
if (*buf.buf == '^')
info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
+ else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
+ info->kind |= STDIN_PACK_EXCLUDE_OPEN;
else
info->kind |= STDIN_PACK_INCLUDE;
@@ -3945,6 +3988,20 @@ static void stdin_packs_read_input(struct rev_info *revs)
p->pack_keep_in_core = 1;
}
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * Marking excluded open packs as kept in-core
+ * (open) for the same reason as we marked
+ * exclude closed packs as kept in-core.
+ *
+ * Use a separate flag here to ensure we don't
+ * halt our traversal at these packs, since they
+ * are not guaranteed to have closure.
+ *
+ */
+ p->pack_keep_in_core_open = 1;
+ }
+
info->p = p;
}
@@ -3988,7 +4045,15 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- stdin_packs_read_input(&revs);
+ if (mode == STDIN_PACKS_MODE_FOLLOW) {
+ /*
+ * In '--stdin-packs=follow' mode, additionally ignore
+ * objects in excluded-open packs to prevent them from
+ * appearing in the resulting pack.
+ */
+ ignore_packed_keep_in_core_open = 1;
+ }
+ stdin_packs_read_input(&revs, mode);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
diff --git a/packfile.c b/packfile.c
index 215a23e42be..076e444e32a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2246,7 +2246,8 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st
struct packed_git *p = e->pack;
if ((p->pack_keep && (flags & KEPT_PACK_ON_DISK)) ||
- (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE))) {
+ (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE)) ||
+ (p->pack_keep_in_core_open && (flags & KEPT_PACK_IN_CORE_OPEN))) {
ALLOC_GROW(packs, nr + 1, alloc);
packs[nr++] = p;
}
diff --git a/packfile.h b/packfile.h
index 8b04a258a7b..b7735c1977d 100644
--- a/packfile.h
+++ b/packfile.h
@@ -28,6 +28,7 @@ struct packed_git {
unsigned pack_local:1,
pack_keep:1,
pack_keep_in_core:1,
+ pack_keep_in_core_open:1,
freshened:1,
do_not_close:1,
pack_promisor:1,
@@ -266,6 +267,7 @@ int packfile_store_freshen_object(struct packfile_store *store,
enum kept_pack_type {
KEPT_PACK_ON_DISK = (1 << 0),
KEPT_PACK_IN_CORE = (1 << 1),
+ KEPT_PACK_IN_CORE_OPEN = (1 << 2),
};
/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 7eb79bc2cdb..c74b5861af3 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -415,4 +415,109 @@ test_expect_success '--stdin-packs=follow tolerates missing commits' '
stdin_packs__follow_with_only HEAD HEAD^{tree}
'
+test_expect_success '--stdin-packs=follow with open-excluded packs' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ cd repo &&
+ git config set maintenance.auto false &&
+
+ git branch -M main &&
+
+ # Create the following commit structure:
+ #
+ # A <-- B <-- D (main)
+ # ^
+ # \
+ # C (other)
+ test_commit A &&
+ test_commit B &&
+ git checkout -B other &&
+ test_commit C &&
+ git checkout main &&
+ test_commit D &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+ D="$(echo B..D | git pack-objects --revs $packdir/pack)" &&
+
+ C_ONLY="$(git rev-parse other | git pack-objects $packdir/pack)" &&
+
+ git prune-packed &&
+
+ # Create a pack using --stdin-packs=follow where:
+ #
+ # - pack D is included,
+ # - pack C_ONLY is excluded, but open,
+ # - pack B is excluded, but closed, and
+ # - packs A and C are unknown
+ #
+ # The resulting pack should therefore contain:
+ #
+ # - objects from the included pack D,
+ # - A.t (rescued via D^{tree}), and
+ # - C^{tree} and C.t (rescued via pack C_ONLY)
+ #
+ # , but should omit:
+ #
+ # - C (excluded via C_ONLY),
+ # - objects from pack B (trivially excluded-closed)
+ # - A and A^{tree} (ancestors of B)
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <<-EOF
+ pack-$D.pack
+ !pack-$C_ONLY.pack
+ ^pack-$B.pack
+ EOF
+ ) &&
+
+ {
+ objects_in_packs $D &&
+ git rev-parse A:A.t "C^{tree}" C:C.t
+ } >expect.raw &&
+ sort expect.raw >expect &&
+
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--stdin-packs with !-delimited pack without follow' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ test_commit A &&
+ test_commit B &&
+ test_commit C &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+
+ cat >in <<-EOF &&
+ !pack-$A.pack
+ pack-$B.pack
+ pack-$C.pack
+ EOF
+
+ # Without --stdin-packs=follow, we treat the first
+ # line of input as a literal packfile name, and thus
+ # expect pack-objects to complain of a missing pack
+ test_must_fail git pack-objects --stdin-packs --stdout \
+ >/dev/null <in 2>err &&
+ test_grep "could not find pack .!pack-$A.pack." err &&
+
+ # With --stdin-packs=follow, we treat the second line
+ # of input as indicating pack-$A.pack is an excluded
+ # open pack, and thus expect pack-objects to succeed
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+
+ objects_in_packs $B $C >expect &&
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
test_done
--
2.53.0.614.gc4fd52e751a
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
@ 2026-03-21 16:57 ` Jeff King
2026-03-22 18:09 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2026-03-21 16:57 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt
On Thu, Mar 19, 2026 at 06:24:25PM -0400, Taylor Blau wrote:
> @@ -3750,8 +3756,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> return 0;
>
> ofs = nth_packed_object_offset(p, pos);
> - if (!want_object_in_pack(oid, 0, &p, &ofs))
> - return 0;
>
> if (p) {
> struct object_info oi = OBJECT_INFO_INIT;
I haven't read all of the patches carefully yet, but Coverity observed
that without this call to want_object_in_pack(), we are left with "p"
that came from the caller. If that "p" is ever NULL, then the
nth_packed_object_offset() call above will segfault. But if it is not,
then the "if (p)" below is pointless.
AFAICT it will never be NULL, and the conditional is now pointless. But
it wasn't before your patch, since want_object_in_pack() may set the
found_pack to NULL if the object is not wanted.
-Peff
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-21 16:57 ` Jeff King
@ 2026-03-22 18:09 ` Taylor Blau
2026-03-25 23:19 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-22 18:09 UTC (permalink / raw)
To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt
On Sat, Mar 21, 2026 at 12:57:11PM -0400, Jeff King wrote:
> On Thu, Mar 19, 2026 at 06:24:25PM -0400, Taylor Blau wrote:
>
> > @@ -3750,8 +3756,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> > return 0;
> >
> > ofs = nth_packed_object_offset(p, pos);
> > - if (!want_object_in_pack(oid, 0, &p, &ofs))
> > - return 0;
> >
> > if (p) {
> > struct object_info oi = OBJECT_INFO_INIT;
>
> I haven't read all of the patches carefully yet, but Coverity observed
> that without this call to want_object_in_pack(), we are left with "p"
> that came from the caller. If that "p" is ever NULL, then the
> nth_packed_object_offset() call above will segfault. But if it is not,
> then the "if (p)" below is pointless.
>
> AFAICT it will never be NULL, and the conditional is now pointless. But
> it wasn't before your patch, since want_object_in_pack() may set the
> found_pack to NULL if the object is not wanted.
I think that's right. Looking through the code again, the "if (p)"
conditional is indeed no longer useful, since the sole caller of this
function (the callback to `for_each_object_in_pack()`) will always pass
a non-NULL `p` pointer.
It's tempting to add something like:
if (!p)
BUG("add_object_entry_from_pack: expected non-NULL pack");
But I wonder if we should instead store the result of calling
`want_object_in_pack()` into a separate variable, only creating an
object entry if "want == 1". That would have the effect of *not* marking
objects as traversal tips if want_object_in_pack() makes `p` NULL.
I think that's OK, since what we really care about is having objects in
both included and excluded-open packs being processed as traversal tips.
Because we're enumerating only objects that are in included and
excluded-open packs:
* Objects in excluded-open packs will cause `want_found_object()` to
return 0, propagating that through `want_object_in_pack()` without
setting "p" to NULL. We'll process these objects as traversal tips,
but not create an object entry for them, which is what we want.
* Objects in included packs will return -1 from `want_found_object()`,
causing us to NULL out "p" and search through other packs. If we find
another pack containing that object, then we'll set "p" to that pack
and return 1, otherwise we'll return 0 if no such pack is found.
I think that's the right behavior, so doing something like this instead:
--- 8< ---
@@ -3742,6 +3749,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
{
off_t ofs;
enum object_type type = OBJ_NONE;
+ int want;
display_progress(progress_state, ++nr_seen);
@@ -3749,8 +3757,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
return 0;
ofs = nth_packed_object_offset(p, pos);
- if (!want_object_in_pack(oid, 0, &p, &ofs))
- return 0;
+ want = want_object_in_pack(oid, 0, &p, &ofs);
if (p) {
struct object_info oi = OBJECT_INFO_INIT;
@@ -3762,8 +3769,16 @@ static int add_object_entry_from_pack(const struct object_id *oid,
} else if (type == OBJ_COMMIT) {
struct rev_info *revs = _data;
/*
- * commits in included packs are used as starting points for the
- * subsequent revision walk
+ * commits in included packs are used as starting points
+ * for the subsequent revision walk
+ *
+ * Note that we do want to walk through commits that are
+ * present in excluded-open ('!') packs to pick up any
+ * objects reachable from them not present in the
+ * excluded-closed ('^') packs.
+ *
+ * However, we'll only add those objects to the packing
+ * list after checking `want_object_in_pack()` below.
*/
add_pending_oid(revs, NULL, oid, 0);
}
@@ -3771,7 +3786,8 @@ static int add_object_entry_from_pack(const struct object_id *oid,
stdin_packs_found_nr++;
}
- create_object_entry(oid, type, 0, 0, 0, p, ofs);
+ if (want)
+ create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
}
--- >8 ---
may be more readable. I can't shake the feeling that there is some
important case that this is missing, though, what do you think?
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-22 18:09 ` Taylor Blau
@ 2026-03-25 23:19 ` Taylor Blau
0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:19 UTC (permalink / raw)
To: Jeff King; +Cc: git, Junio C Hamano, Elijah Newren, Patrick Steinhardt
On Sun, Mar 22, 2026 at 02:09:43PM -0400, Taylor Blau wrote:
> It's tempting to add something like:
>
> if (!p)
> BUG("add_object_entry_from_pack: expected non-NULL pack");
>
> But I wonder if we should instead store the result of calling
> `want_object_in_pack()` into a separate variable, only creating an
> object entry if "want == 1". That would have the effect of *not* marking
> objects as traversal tips if want_object_in_pack() makes `p` NULL.
I ended up talking myself out of this.
There's no reason to call want_object_in_pack() early, as it may change
the very pack pointer we wish to use to determine the object type of the
given object.
Once we have determined the object type, then we are free to call
want_object_in_pack() to determine if we want to add the object to the
resulting pack.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (3 preceding siblings ...)
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
@ 2026-03-19 22:24 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-19 22:24 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In 5ee86c273bf (repack: exclude cruft pack(s) from the MIDX where
possible, 2025-06-23), geometric repacking learned to exclude cruft
packs from the MIDX when 'repack.midxMustContainCruft' is set to
'false'.
This works because packs generated with '--stdin-packs=follow' rescue
any once-unreachable objects that later become reachable, making the
resulting packs closed under reachability without needing the cruft pack
in the MIDX.
However, packs above the geometric split that were not part of the
previous MIDX may not have full object closure. When such packs are
marked as excluded-closed ('^'), pack-objects treats them as a
reachability boundary and does not traverse through them during the
follow pass, potentially leaving the resulting pack without full
closure.
Fix this by marking packs above the geometric split that were not in the
previous MIDX as excluded-open ('!') instead of excluded-closed ('^').
This causes pack-objects to walk through their commits during the follow
pass, rescuing any reachable objects not present in the closed-excluded
packs.
Note that MIDXs which were generated prior to this change and are
unlucky enough to not be closed under reachability may still exhibit
this bug, as we treat all MIDX'd packs as closed. That is true in an
overwhelming number of cases, since in order to have a non-closed MIDX
you would have to:
- Generate a pack via an earlier geometric repack that is not closed
under reachability.
- Store that pack in the MIDX.
- Avoid picking any commits to receive reachability bitmaps which
happen to reach objects from which the missing objects are reachable.
In the extremely rare chance that all of the above should happen, an
all-into-one repack will resolve the issue.
Unfortunately, there is no perfect way to determine whether a MIDX'd
pack is closed outside of ensuring that there is a '1' bit in at least
one bitmap for every bit position corresponding to objects in that pack.
While this is possible to do, this approach would treat MIDX'd packs as
open in cases where there is at least one object that is not reachable
from the subset of commits selected for bitmapping.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 19 +++++++++++++++++--
t/t7704-repack-cruft.sh | 2 +-
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index f6bb04bef72..4c5a82c2c8d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -369,8 +369,23 @@ int cmd_repack(int argc,
*/
for (i = 0; i < geometry.split; i++)
fprintf(in, "%s\n", pack_basename(geometry.pack[i]));
- for (i = geometry.split; i < geometry.pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry.pack[i]));
+ for (i = geometry.split; i < geometry.pack_nr; i++) {
+ const char *basename = pack_basename(geometry.pack[i]);
+ char marker = '^';
+
+ if (!midx_must_contain_cruft &&
+ !string_list_has_string(&existing.midx_packs,
+ basename)) {
+ /*
+ * Assume non-MIDX'd packs are not
+ * necessarily closed under
+ * reachability.
+ */
+ marker = '!';
+ }
+
+ fprintf(in, "%c%s\n", marker, basename);
+ }
fclose(in);
}
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 77133395b5d..9e03b04315d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,7 +869,7 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
-test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+test_expect_success 'repack rescues once-cruft objects above geometric split' '
git config repack.midxMustContainCruft false &&
test_commit reachable &&
--
2.53.0.614.gc4fd52e751a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (4 preceding siblings ...)
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
` (5 more replies)
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
6 siblings, 6 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
This is a small reroll of my series to fix an issue where MIDX bitmaps
fail to generate after a geometric repack in certain scenarios where the
set of MIDX'd objects is not closed under reachability.
The main changes since last time are:
* Clarification in the first patch that the added `release_revisions()`
call prevents a *potential* leak, not an actual one.
* Cleanup in the second patch (where we convert the --stdin-packs
handling to use a strmap) based on Patrick's review.
* Dropped an unnecessary "if (p)" conditional in the fourth patch's
`add_object_entry_from_pack()` callback that is unnecessary.
Otherwise, the series is unchanged from the original round. As usual, a
range-diff is included below for convenience.
Thanks again for your review!
Taylor Blau (5):
pack-objects: plug leak in `read_stdin_packs()`
pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
t7704: demonstrate failure with once-cruft objects above the geometric
split
pack-objects: support excluded-open packs with --stdin-packs
repack: mark non-MIDX packs above the split as excluded-open
Documentation/git-pack-objects.adoc | 25 ++-
builtin/pack-objects.c | 254 +++++++++++++++++++---------
builtin/repack.c | 19 ++-
packfile.c | 3 +-
packfile.h | 2 +
t/t5331-pack-objects-stdin.sh | 105 ++++++++++++
t/t7704-repack-cruft.sh | 22 +++
7 files changed, 339 insertions(+), 91 deletions(-)
Range-diff against v1:
1: 1dac74f1e4a ! 1: 1fabd88f5e3 pack-objects: plug leak in `read_stdin_packs()`
@@ Commit message
The `read_stdin_packs()` function added originally via 339bce27f4f
(builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
declares a `rev_info` struct but neglects to call `release_revisions()`
- on it before returning, creating a leak.
+ on it before returning, creating the potential for a leak.
The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
'--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
did not address it.
Ensure that we call `release_revisions()` appropriately to prevent a
- leak from this function.
+ potential leak from this function. Note that in practice our `rev_info`
+ here does not have a present leak, hence t5331 passes cleanly before
+ this commit, even when built with SANITIZE=leak.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2: ea6fdbcc46f ! 2: d5cb793f0eb pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
@@ Commit message
excluded.
Extract the logic for sorting packs by mtime and adding their objects
- into a separate `stdin_packs_add_entries()` helper.
+ into a separate `stdin_packs_add_pack_entries()` helper.
While we could have used a `string_list`, we must handle the case where
the same pack is specified more than once. With a `string_list` only, we
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ if (!info->p)
+ die(_("could not find pack '%s'"), entry->key);
+
-+ string_list_append(&keys, entry->key)->util = info->p;
++ string_list_append(&keys, entry->key)->util = info;
+ }
+
+ /*
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+ for_each_string_list_item(item, &keys) {
-+ struct stdin_pack_info *info = strmap_get(packs, item->string);
-+ if (!info->p)
-+ die(_("could not find pack '%s'"), item->string);
++ struct stdin_pack_info *info = item->util;
+
+ if (info->kind & STDIN_PACK_INCLUDE)
+ for_each_object_in_pack(info->p,
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ struct stdin_pack_info *info;
+ const char *key = buf.buf;
+
-+ if (!key || !*key)
++ if (!*key)
continue;
-
+ if (*key == '^')
+ key++;
+
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ CALLOC_ARRAY(info, 1);
+ strmap_put(&packs, key, info);
+ }
-+
+
if (*buf.buf == '^')
- string_list_append(&exclude_packs, buf.buf + 1);
+ info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
3: 0ec9bba92ad = 3: d8f0577077c t7704: demonstrate failure with once-cruft objects above the geometric split
4: bd78919e19c ! 4: e028dfbc9fb pack-objects: support excluded-open packs with --stdin-packs
@@ Commit message
In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
- objects won't appear in the output. The `include_check` and
- `include_check_obj` callbacks on `rev_info` are used to halt the walk at
- closed-excluded packs, since objects behind a '^' boundary are
- guaranteed to have closure and need not be rescued.
+ objects won't appear in the output. As a consequence, the caller
+ `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
+ are able to drop the "if (p)" conditional.
+
+ The `include_check` and `include_check_obj` callbacks on `rev_info` are
+ used to halt the walk at closed-excluded packs, since objects behind a
+ '^' boundary are guaranteed to have closure and need not be rescued.
The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
@@ builtin/pack-objects.c: static int want_found_object(const struct object_id *oid
return 0;
} else {
@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
+ void *_data)
+ {
+ off_t ofs;
++ struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type = OBJ_NONE;
+
+ display_progress(progress_state, ++nr_seen);
+@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
+ if (have_duplicate_entry(oid, 0))
return 0;
++ stdin_packs_found_nr++;
++
ofs = nth_packed_object_offset(p, pos);
-- if (!want_object_in_pack(oid, 0, &p, &ofs))
-- return 0;
++
++ oi.typep = &type;
++ if (packed_object_info(p, ofs, &oi) < 0) {
++ die(_("could not get type of object %s in pack %s"),
++ oid_to_hex(oid), p->pack_name);
++ } else if (type == OBJ_COMMIT) {
++ struct rev_info *revs = _data;
++ /*
++ * commits in included packs are used as starting points
++ * for the subsequent revision walk
++ *
++ * Note that we do want to walk through commits that are
++ * present in excluded-open ('!') packs to pick up any
++ * objects reachable from them not present in the
++ * excluded-closed ('^') packs.
++ *
++ * However, we'll only add those objects to the packing
++ * list after checking `want_object_in_pack()` below.
++ */
++ add_pending_oid(revs, NULL, oid, 0);
++ }
++
+ if (!want_object_in_pack(oid, 0, &p, &ofs))
+ return 0;
- if (p) {
- struct object_info oi = OBJECT_INFO_INIT;
-@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
- } else if (type == OBJ_COMMIT) {
- struct rev_info *revs = _data;
- /*
+- if (p) {
+- struct object_info oi = OBJECT_INFO_INIT;
+-
+- oi.typep = &type;
+- if (packed_object_info(p, ofs, &oi) < 0) {
+- die(_("could not get type of object %s in pack %s"),
+- oid_to_hex(oid), p->pack_name);
+- } else if (type == OBJ_COMMIT) {
+- struct rev_info *revs = _data;
+- /*
- * commits in included packs are used as starting points for the
- * subsequent revision walk
-+ * commits in included packs are used as starting points
-+ * for the subsequent revision walk
-+ *
-+ * Note that we do want to walk through commits that are
-+ * present in excluded-open ('!') packs to pick up any
-+ * objects reachable from them not present in the
-+ * excluded-closed ('^') packs.
-+ *
-+ * However, we'll only add those objects to the packing
-+ * list after checking `want_object_in_pack()` below.
- */
- add_pending_oid(revs, NULL, oid, 0);
- }
+- */
+- add_pending_oid(revs, NULL, oid, 0);
+- }
-
- stdin_packs_found_nr++;
- }
-
-+ if (!want_object_in_pack(oid, 0, &p, &ofs))
-+ return 0;
-+
-+ stdin_packs_found_nr++;
-+
+- }
+-
create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
};
@@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *packs,
- if (!info->p)
- die(_("could not find pack '%s'"), item->string);
+ for_each_string_list_item(item, &keys) {
+ struct stdin_pack_info *info = item->util;
- if (info->kind & STDIN_PACK_INCLUDE)
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
@@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *
struct strbuf buf = STRBUF_INIT;
struct strmap packs = STRMAP_INIT;
@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
- if (!key || !*key)
+
+ if (!*key)
continue;
-
- if (*key == '^')
+ if (*key == '^' ||
+ (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
5: c4fd52e751a = 5: 23cb9f33dba repack: mark non-MIDX packs above the split as excluded-open
base-commit: 7ff1e8dc1e1680510c96e69965b3fa81372c5037
--
2.53.0.614.g164f3b634ec
^ permalink raw reply [flat|nested] 41+ messages in thread* [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()`
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
` (4 subsequent siblings)
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
The `read_stdin_packs()` function added originally via 339bce27f4f
(builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
declares a `rev_info` struct but neglects to call `release_revisions()`
on it before returning, creating the potential for a leak.
The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
'--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
did not address it.
Ensure that we call `release_revisions()` appropriately to prevent a
potential leak from this function. Note that in practice our `rev_info`
here does not have a present leak, hence t5331 passes cleanly before
this commit, even when built with SANITIZE=leak.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cd013c0b68a..9a89bc5c4c9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3968,6 +3968,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
show_object_pack_hint,
&mode);
+ release_revisions(&revs);
+
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
stdin_packs_found_nr);
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
--
2.53.0.614.g164f3b634ec
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-26 20:40 ` Derrick Stolee
2026-03-26 22:37 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
` (3 subsequent siblings)
5 siblings, 2 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
The '--stdin-packs' mode of pack-objects maintains two separate
string_lists: one for included packs, and one for excluded packs. Each
list stores the pack basename as a string and the corresponding
`packed_git` pointer in its `->util` field.
This works, but makes it awkward to extend the set of pack "kinds" that
pack-objects can accept via stdin, since each new kind would need its
own string_list and duplicated handling. A future commit will want to do
just this, so prepare for that change by handling the various "kinds" of
packs specified over stdin in a more generic fashion.
Namely, replace the two `string_list`s with a single `strmap` keyed on
the pack basename, with values pointing to a new `struct
stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
`kind` bitfield indicating whether the pack was specified as included or
excluded.
Extract the logic for sorting packs by mtime and adding their objects
into a separate `stdin_packs_add_pack_entries()` helper.
While we could have used a `string_list`, we must handle the case where
the same pack is specified more than once. With a `string_list` only, we
would have to pay a quadratic cost to either (a) insert elements into
their sorted positions, or (b) a repeated linear search, which is
accidentally quadratic. For that reason, use a strmap instead.
This patch does not include any functional changes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 150 ++++++++++++++++++++++++-----------------
1 file changed, 89 insertions(+), 61 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9a89bc5c4c9..068b87d2af4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -28,6 +28,7 @@
#include "reachable.h"
#include "oid-array.h"
#include "strvec.h"
+#include "strmap.h"
#include "list.h"
#include "packfile.h"
#include "object-file.h"
@@ -3837,90 +3838,117 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
-static void read_packs_list_from_stdin(struct rev_info *revs)
+struct stdin_pack_info {
+ struct packed_git *p;
+ enum {
+ STDIN_PACK_INCLUDE = (1<<0),
+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ } kind;
+};
+
+static void stdin_packs_add_pack_entries(struct strmap *packs,
+ struct rev_info *revs)
+{
+ struct string_list keys = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
+
+ strmap_for_each_entry(packs, &iter, entry) {
+ struct stdin_pack_info *info = entry->value;
+ if (!info->p)
+ die(_("could not find pack '%s'"), entry->key);
+
+ string_list_append(&keys, entry->key)->util = info;
+ }
+
+ /*
+ * Order packs by ascending mtime; use QSORT directly to access the
+ * string_list_item's ->util pointer, which string_list_sort() does not
+ * provide.
+ */
+ QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+ for_each_string_list_item(item, &keys) {
+ struct stdin_pack_info *info = item->util;
+
+ if (info->kind & STDIN_PACK_INCLUDE)
+ for_each_object_in_pack(info->p,
+ add_object_entry_from_pack,
+ revs,
+ ODB_FOR_EACH_OBJECT_PACK_ORDER);
+ }
+
+ string_list_clear(&keys, 0);
+}
+
+static void stdin_packs_read_input(struct rev_info *revs)
{
struct strbuf buf = STRBUF_INIT;
- struct string_list include_packs = STRING_LIST_INIT_DUP;
- struct string_list exclude_packs = STRING_LIST_INIT_DUP;
- struct string_list_item *item = NULL;
+ struct strmap packs = STRMAP_INIT;
struct packed_git *p;
while (strbuf_getline(&buf, stdin) != EOF) {
- if (!buf.len)
+ struct stdin_pack_info *info;
+ const char *key = buf.buf;
+
+ if (!*key)
continue;
+ if (*key == '^')
+ key++;
+
+ info = strmap_get(&packs, key);
+ if (!info) {
+ CALLOC_ARRAY(info, 1);
+ strmap_put(&packs, key, info);
+ }
if (*buf.buf == '^')
- string_list_append(&exclude_packs, buf.buf + 1);
+ info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
else
- string_list_append(&include_packs, buf.buf);
+ info->kind |= STDIN_PACK_INCLUDE;
strbuf_reset(&buf);
}
- string_list_sort_u(&include_packs, 0);
- string_list_sort_u(&exclude_packs, 0);
-
repo_for_each_pack(the_repository, p) {
- const char *pack_name = pack_basename(p);
+ struct stdin_pack_info *info;
- if ((item = string_list_lookup(&include_packs, pack_name))) {
+ info = strmap_get(&packs, pack_basename(p));
+ if (!info)
+ continue;
+
+ if (info->kind & STDIN_PACK_INCLUDE) {
if (exclude_promisor_objects && p->pack_promisor)
die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
- item->util = p;
+
+ /*
+ * Arguments we got on stdin may not even be
+ * packs. First check that to avoid segfaulting
+ * later on in e.g. pack_mtime_cmp(), excluded
+ * packs are handled below.
+ */
+ if (!is_pack_valid(p))
+ die(_("packfile %s cannot be accessed"), p->pack_name);
}
- if ((item = string_list_lookup(&exclude_packs, pack_name)))
- item->util = p;
- }
- /*
- * Arguments we got on stdin may not even be packs. First
- * check that to avoid segfaulting later on in
- * e.g. pack_mtime_cmp(), excluded packs are handled below.
- *
- * Since we first parsed our STDIN and then sorted the input
- * lines the pack we error on will be whatever line happens to
- * sort first. This is lazy, it's enough that we report one
- * bad case here, we don't need to report the first/last one,
- * or all of them.
- */
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- if (!is_pack_valid(p))
- die(_("packfile %s cannot be accessed"), p->pack_name);
- }
+ if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
+ /*
+ * Marking excluded packs as kept in-core so
+ * that later calls to add_object_entry()
+ * discards any objects that are also found in
+ * excluded packs.
+ */
+ p->pack_keep_in_core = 1;
+ }
- /*
- * Then, handle all of the excluded packs, marking them as
- * kept in-core so that later calls to add_object_entry()
- * discards any objects that are also found in excluded packs.
- */
- for_each_string_list_item(item, &exclude_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- p->pack_keep_in_core = 1;
+ info->p = p;
}
- /*
- * Order packs by ascending mtime; use QSORT directly to access the
- * string_list_item's ->util pointer, which string_list_sort() does not
- * provide.
- */
- QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
-
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- for_each_object_in_pack(p,
- add_object_entry_from_pack,
- revs,
- ODB_FOR_EACH_OBJECT_PACK_ORDER);
- }
+ stdin_packs_add_pack_entries(&packs, revs);
strbuf_release(&buf);
- string_list_clear(&include_packs, 0);
- string_list_clear(&exclude_packs, 0);
+ strmap_clear(&packs, 1);
}
static void add_unreachable_loose_objects(struct rev_info *revs);
@@ -3957,7 +3985,7 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- read_packs_list_from_stdin(&revs);
+ stdin_packs_read_input(&revs);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
--
2.53.0.614.g164f3b634ec
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-25 23:51 ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
@ 2026-03-26 20:40 ` Derrick Stolee
2026-03-26 21:44 ` Taylor Blau
2026-03-26 22:37 ` Taylor Blau
1 sibling, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2026-03-26 20:40 UTC (permalink / raw)
To: Taylor Blau, git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/25/2026 7:51 PM, Taylor Blau wrote:
> -static void read_packs_list_from_stdin(struct rev_info *revs)
> +struct stdin_pack_info {
> + struct packed_git *p;
> + enum {
> + STDIN_PACK_INCLUDE = (1<<0),
> + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
> + } kind;
> +};
I kind of wish this enum wasn't anonymous. And it matters later.
Let's call this 'enum pack_input_kind' for now.
> +static void stdin_packs_read_input(struct rev_info *revs)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct string_list include_packs = STRING_LIST_INIT_DUP;
> - struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> - struct string_list_item *item = NULL;
> + struct strmap packs = STRMAP_INIT;
> struct packed_git *p;
>
> while (strbuf_getline(&buf, stdin) != EOF) {
> - if (!buf.len)
> + struct stdin_pack_info *info;
> + const char *key = buf.buf;
> +
> + if (!*key)
> continue;
> + if (*key == '^')
> + key++;...
> if (*buf.buf == '^')
> - string_list_append(&exclude_packs, buf.buf + 1);
> + info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> else
> - string_list_append(&include_packs, buf.buf);
> + info->kind |= STDIN_PACK_INCLUDE;
>
It took me a while to figure out what was going on with checking
*key == '^' and later checking *buf.buf == '^'. We should probably
combine them to the same condition:
const char *key = buf.buf;
enum pack_input_kind kind = STDIN_PACK_INCLUDE;
if (*key == '^') {
key++;
kind |= STDIN_PACK_EXCLUDE_CLOSED;
}
info = strmap_get(&packs, key);
if (!info) {
CALLOC_ARRAY(info, 1);
strmap_put(&packs, key, info);
info->kind = kind;
}
strbuf_reset(&buf);
This feels easier to read, for me.
> strbuf_reset(&buf);
> }
>
> - string_list_sort_u(&include_packs, 0);
> - string_list_sort_u(&exclude_packs, 0);
> -
> repo_for_each_pack(the_repository, p) {
> - const char *pack_name = pack_basename(p);
> + struct stdin_pack_info *info;
>
> - if ((item = string_list_lookup(&include_packs, pack_name))) {
> + info = strmap_get(&packs, pack_basename(p));
> + if (!info)
> + continue;
> +
> + if (info->kind & STDIN_PACK_INCLUDE) {
...
> }
> + if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
This does help confirm that a pack could be in both categories, so
using flag bits helps.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-26 20:40 ` Derrick Stolee
@ 2026-03-26 21:44 ` Taylor Blau
2026-03-26 22:11 ` Junio C Hamano
0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-26 21:44 UTC (permalink / raw)
To: Derrick Stolee
Cc: git, Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On Thu, Mar 26, 2026 at 04:40:00PM -0400, Derrick Stolee wrote:
> On 3/25/2026 7:51 PM, Taylor Blau wrote:
>
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > + struct packed_git *p;
> > + enum {
> > + STDIN_PACK_INCLUDE = (1<<0),
> > + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
> > + } kind;
> > +};
>
> I kind of wish this enum wasn't anonymous. And it matters later.
> Let's call this 'enum pack_input_kind' for now.
Hmm. I don't feel strongly about this, but I'm not sure I follow the
reasoning here. The enum is truly only meant to be used within the
context of a stdin_pack_info struct, so it felt natural to keep it
anonymous above.
I'm happy to change this if you feel strongly about it, but TBH I am not
sure I see the benefit of doing so.
> It took me a while to figure out what was going on with checking
> *key == '^' and later checking *buf.buf == '^'. We should probably
> combine them to the same condition:
>
> const char *key = buf.buf;
> enum pack_input_kind kind = STDIN_PACK_INCLUDE;
>
> if (*key == '^') {
> key++;
> kind |= STDIN_PACK_EXCLUDE_CLOSED;
> }
>
> info = strmap_get(&packs, key);
> if (!info) {
> CALLOC_ARRAY(info, 1);
> strmap_put(&packs, key, info);
> info->kind = kind;
> }
>
> strbuf_reset(&buf);
>
> This feels easier to read, for me.
I agree that the above is a little easier to read, but I'm not sure it
handles the case of specifying the same pack multiple times. I had
originally written it in a similar way as what you suggested above, but
it breaks if I write something like:
cat <<EOF | git pack-objects --stdin
pack-XYZ.pack
^pack-XYZ.pack
EOF
It should produce a pack with no objects, but I think the code above
would effectively ignore the second line because we already have a
strmap entry for pack-XYZ.pack so we never set the additional flag bits.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-26 21:44 ` Taylor Blau
@ 2026-03-26 22:11 ` Junio C Hamano
2026-03-26 22:32 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2026-03-26 22:11 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee, git, Jeff King, Elijah Newren, Patrick Steinhardt
Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Mar 26, 2026 at 04:40:00PM -0400, Derrick Stolee wrote:
>> On 3/25/2026 7:51 PM, Taylor Blau wrote:
>>
>> > -static void read_packs_list_from_stdin(struct rev_info *revs)
>> > +struct stdin_pack_info {
>> > + struct packed_git *p;
>> > + enum {
>> > + STDIN_PACK_INCLUDE = (1<<0),
>> > + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>> > + } kind;
>> > +};
>>
>> I kind of wish this enum wasn't anonymous. And it matters later.
>> Let's call this 'enum pack_input_kind' for now.
>
> Hmm. I don't feel strongly about this, but I'm not sure I follow the
> reasoning here. The enum is truly only meant to be used within the
> context of a stdin_pack_info struct, so it felt natural to keep it
> anonymous above.
The only thing that makes it beneficial to have a name is if a
code like this one ...
>> const char *key = buf.buf;
>> enum pack_input_kind kind = STDIN_PACK_INCLUDE;
>>
>> if (*key == '^') {
>> key++;
>> kind |= STDIN_PACK_EXCLUDE_CLOSED;
>> }
... that uses the type to define its own variable outside the
context of the struct needs to be written, right?
If these STDIN_PACK_* constants would ever appear _only_ within the
context of talking about the .kind member of the stdin_pack_info
struct and cannot possibly appear anywhere else, then there is no
point naming the enum.
A free-standing variable could use "int" or "unsigned" as the base
language C does not differentiate different enums as separate types,
but let's not go there, as -Wenum-compare and other warnings do give
us opportunity to do better than that.
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-26 22:11 ` Junio C Hamano
@ 2026-03-26 22:32 ` Taylor Blau
2026-03-27 0:29 ` Derrick Stolee
2026-03-27 15:52 ` Junio C Hamano
0 siblings, 2 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-26 22:32 UTC (permalink / raw)
To: Junio C Hamano
Cc: Derrick Stolee, git, Jeff King, Elijah Newren, Patrick Steinhardt
On Thu, Mar 26, 2026 at 03:11:06PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Thu, Mar 26, 2026 at 04:40:00PM -0400, Derrick Stolee wrote:
> >> On 3/25/2026 7:51 PM, Taylor Blau wrote:
> >>
> >> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> >> > +struct stdin_pack_info {
> >> > + struct packed_git *p;
> >> > + enum {
> >> > + STDIN_PACK_INCLUDE = (1<<0),
> >> > + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
> >> > + } kind;
> >> > +};
> >>
> >> I kind of wish this enum wasn't anonymous. And it matters later.
> >> Let's call this 'enum pack_input_kind' for now.
> >
> > Hmm. I don't feel strongly about this, but I'm not sure I follow the
> > reasoning here. The enum is truly only meant to be used within the
> > context of a stdin_pack_info struct, so it felt natural to keep it
> > anonymous above.
>
> The only thing that makes it beneficial to have a name is if a
> code like this one ...
>
> >> const char *key = buf.buf;
> >> enum pack_input_kind kind = STDIN_PACK_INCLUDE;
> >>
> >> if (*key == '^') {
> >> key++;
> >> kind |= STDIN_PACK_EXCLUDE_CLOSED;
> >> }
>
> ... that uses the type to define its own variable outside the
> context of the struct needs to be written, right?
>
> If these STDIN_PACK_* constants would ever appear _only_ within the
> context of talking about the .kind member of the stdin_pack_info
> struct and cannot possibly appear anywhere else, then there is no
> point naming the enum.
Yup, I agree. I'm inclined to leave the enum anonymous for now, since
the only place we would need a name for it is the suggestion Stolee made
above, which I think does not correctly handle an edge case where packs
are specified multiple times.
As it is currently, the enum is only used in the context of the .kind
member of the stdin_pack_info struct, and we don't ever declare a
int/unsigned variable to hold the kind outside of that context (which
would be gross ;-)).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-26 22:32 ` Taylor Blau
@ 2026-03-27 0:29 ` Derrick Stolee
2026-03-27 17:51 ` Taylor Blau
2026-03-27 15:52 ` Junio C Hamano
1 sibling, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2026-03-27 0:29 UTC (permalink / raw)
To: Taylor Blau, Junio C Hamano
Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/26/26 6:32 PM, Taylor Blau wrote:
> On Thu, Mar 26, 2026 at 03:11:06PM -0700, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>> If these STDIN_PACK_* constants would ever appear _only_ within the
>> context of talking about the .kind member of the stdin_pack_info
>> struct and cannot possibly appear anywhere else, then there is no
>> point naming the enum.
>
> Yup, I agree. I'm inclined to leave the enum anonymous for now, since
> the only place we would need a name for it is the suggestion Stolee made
> above, which I think does not correctly handle an edge case where packs
> are specified multiple times.
I see that I messed up where a '|=' should be and where a '=' should be.
const char *key = buf.buf;
enum pack_input_kind kind = STDIN_PACK_INCLUDE;
if (*key == '^') {
key++;
/* THIS ONE SHOULD BE EQUAL */
kind = STDIN_PACK_EXCLUDE_CLOSED;
}
info = strmap_get(&packs, key);
if (!info) {
CALLOC_ARRAY(info, 1);
strmap_put(&packs, key, info);
/* THIS ONE SHOULD BE ADDING THE FLAG */
info->kind |= kind;
}
strbuf_reset(&buf);
Sorry that I was less careful with the code and hadn't tested it
locally. (I still haven't, but I still think it's worth a little
more attempt to benefit from this structure, especially with your
addition of handling '!' later.
> As it is currently, the enum is only used in the context of the .kind
> member of the stdin_pack_info struct, and we don't ever declare a
> int/unsigned variable to hold the kind outside of that context (which
> would be gross ;-)).
Once you start creating a type, that tends to create the desire to use
that type in a new way. My _preference_ is to name the type because it
unlocks new ways of working with the data without needing to rewrite
the definition.
If the small tweak to my version works, I do think that the readability
of the new organization would be worth it.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-27 0:29 ` Derrick Stolee
@ 2026-03-27 17:51 ` Taylor Blau
2026-03-27 18:34 ` Derrick Stolee
0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 17:51 UTC (permalink / raw)
To: Derrick Stolee
Cc: Junio C Hamano, git, Jeff King, Elijah Newren, Patrick Steinhardt
On Thu, Mar 26, 2026 at 08:29:57PM -0400, Derrick Stolee wrote:
> On 3/26/26 6:32 PM, Taylor Blau wrote:
> > On Thu, Mar 26, 2026 at 03:11:06PM -0700, Junio C Hamano wrote:
> > > Taylor Blau <me@ttaylorr.com> writes:
>
> > > If these STDIN_PACK_* constants would ever appear _only_ within the
> > > context of talking about the .kind member of the stdin_pack_info
> > > struct and cannot possibly appear anywhere else, then there is no
> > > point naming the enum.
> >
> > Yup, I agree. I'm inclined to leave the enum anonymous for now, since
> > the only place we would need a name for it is the suggestion Stolee made
> > above, which I think does not correctly handle an edge case where packs
> > are specified multiple times.
>
> I see that I messed up where a '|=' should be and where a '=' should be.
>
> const char *key = buf.buf;
> enum pack_input_kind kind = STDIN_PACK_INCLUDE;
>
> if (*key == '^') {
> key++;
>
> /* THIS ONE SHOULD BE EQUAL */
> kind = STDIN_PACK_EXCLUDE_CLOSED;
> }
>
> info = strmap_get(&packs, key);
> if (!info) {
> CALLOC_ARRAY(info, 1);
> strmap_put(&packs, key, info);
>
> /* THIS ONE SHOULD BE ADDING THE FLAG */
> info->kind |= kind;
Right, though the problem is not that we're setting the wrong flag bits
(though I agree in the previous version of this suggestion that we
should have been OR-ing them in), but that we're not setting any flag
bits if the same pack is specified multiple times.
Applying the following:
--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 52bad8cea90..37c69f307d2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3833,13 +3833,15 @@ static void show_commit_pack_hint(struct commit *commit, void *data)
}
+enum stdin_pack_info_kind {
+ STDIN_PACK_INCLUDE = (1<<0),
+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
+};
+
struct stdin_pack_info {
struct packed_git *p;
- enum {
- STDIN_PACK_INCLUDE = (1<<0),
- STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
- STDIN_PACK_EXCLUDE_OPEN = (1<<2),
- } kind;
+ enum stdin_pack_info_kind kind;
};
static int pack_mtime_cmp(const void *_a, const void *_b)
@@ -3927,26 +3929,26 @@ static void stdin_packs_read_input(struct rev_info *revs,
while (strbuf_getline(&buf, stdin) != EOF) {
struct stdin_pack_info *info;
+ enum stdin_pack_info_kind kind = STDIN_PACK_INCLUDE;
const char *key = buf.buf;
if (!*key)
continue;
- if (*key == '^' ||
- (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
+ else if (*key == '^')
+ kind = STDIN_PACK_EXCLUDE_CLOSED;
+ else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
+ kind = STDIN_PACK_EXCLUDE_OPEN;
+
+ if (kind != STDIN_PACK_INCLUDE)
key++;
info = strmap_get(&packs, key);
if (!info) {
CALLOC_ARRAY(info, 1);
strmap_put(&packs, key, info);
- }
- if (*buf.buf == '^')
- info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
- else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
- info->kind |= STDIN_PACK_EXCLUDE_OPEN;
- else
- info->kind |= STDIN_PACK_INCLUDE;
+ info->kind |= kind;
+ }
strbuf_reset(&buf);
}
--- >8 ---
fails t5331.8, which verifies that pack-objects correctly handles the
same pack being specified as both included and excluded.
But if you do the following on top of the above:
--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 37c69f307d2..b6e4f950a67 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3946,10 +3946,10 @@ static void stdin_packs_read_input(struct rev_info *revs,
if (!info) {
CALLOC_ARRAY(info, 1);
strmap_put(&packs, key, info);
-
- info->kind |= kind;
}
+ info->kind |= kind;
+
strbuf_reset(&buf);
}
--- >8 ---
Then that works as expected. I agree that the end-result is a little
easier to read, so I'll squash this into the subsequent round.
> If the small tweak to my version works, I do think that the readability
> of the new organization would be worth it.
I agree! Thanks again for the suggestion.
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-27 17:51 ` Taylor Blau
@ 2026-03-27 18:34 ` Derrick Stolee
0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2026-03-27 18:34 UTC (permalink / raw)
To: Taylor Blau
Cc: Junio C Hamano, git, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/27/2026 1:51 PM, Taylor Blau wrote:
> On Thu, Mar 26, 2026 at 08:29:57PM -0400, Derrick Stolee wrote:
>> On 3/26/26 6:32 PM, Taylor Blau wrote:
>
> fails t5331.8, which verifies that pack-objects correctly handles the
> same pack being specified as both included and excluded.
>
> But if you do the following on top of the above:
>
> --- 8< ---
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 37c69f307d2..b6e4f950a67 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3946,10 +3946,10 @@ static void stdin_packs_read_input(struct rev_info *revs,
> if (!info) {
> CALLOC_ARRAY(info, 1);
> strmap_put(&packs, key, info);
> -
> - info->kind |= kind;
> }
>
> + info->kind |= kind;
> +
> strbuf_reset(&buf);
> }
> --- >8 ---
>
> Then that works as expected. I agree that the end-result is a little
> easier to read, so I'll squash this into the subsequent round.
Ah, yes. We should augment the flags even when finding a duplicate.
That's the fatal flaw. Thanks for working through it and fixing it.
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-26 22:32 ` Taylor Blau
2026-03-27 0:29 ` Derrick Stolee
@ 2026-03-27 15:52 ` Junio C Hamano
1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2026-03-27 15:52 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee, git, Jeff King, Elijah Newren, Patrick Steinhardt
Taylor Blau <me@ttaylorr.com> writes:
>> If these STDIN_PACK_* constants would ever appear _only_ within the
>> context of talking about the .kind member of the stdin_pack_info
>> struct and cannot possibly appear anywhere else, then there is no
>> point naming the enum.
>
> Yup, I agree. I'm inclined to leave the enum anonymous for now, since
> the only place we would need a name for it is the suggestion Stolee made
> above, which I think does not correctly handle an edge case where packs
> are specified multiple times.
>
> As it is currently, the enum is only used in the context of the .kind
> member of the stdin_pack_info struct, and we don't ever declare a
> int/unsigned variable to hold the kind outside of that context (which
> would be gross ;-)).
"As it is currently" is vastly different from "cannot possibly
appear", though ;-)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-25 23:51 ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40 ` Derrick Stolee
@ 2026-03-26 22:37 ` Taylor Blau
1 sibling, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-26 22:37 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On Wed, Mar 25, 2026 at 07:51:50PM -0400, Taylor Blau wrote:
> +static void stdin_packs_add_pack_entries(struct strmap *packs,
> + struct rev_info *revs)
> +{
> + struct string_list keys = STRING_LIST_INIT_NODUP;
> + struct string_list_item *item;
> + struct hashmap_iter iter;
> + struct strmap_entry *entry;
> +
> + strmap_for_each_entry(packs, &iter, entry) {
> + struct stdin_pack_info *info = entry->value;
> + if (!info->p)
> + die(_("could not find pack '%s'"), entry->key);
> +
> + string_list_append(&keys, entry->key)->util = info;
> + }
> +
> + /*
> + * Order packs by ascending mtime; use QSORT directly to access the
> + * string_list_item's ->util pointer, which string_list_sort() does not
> + * provide.
> + */
> + QSORT(keys.items, keys.nr, pack_mtime_cmp);
Yikes, this is definitely not right. pack_mtime_cmp expects the ->util
field to be a pointer to a packed_git structure, not a stdin_pack_info
one.
Indeed, this fails the ASan CI builds, which I didn't notice as I sent
this series off towards the very end of my workday yesterday.
I'll need to resubmit this series to fix this, but I'll hold off on
doing so until the discussion in response to Stolee's review of this
round settles first.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
` (2 subsequent siblings)
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
Add a test demonstrating a case where geometric repacking fails to
produce a pack with full object closure, thus making it impossible to
write a reachability bitmap.
Mark the test with 'test_expect_failure' for now. The subsequent commit
will explain the precise failure mode, and implement a fix.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index aa2e2e6ad88..77133395b5d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,4 +869,26 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
+test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+ git config repack.midxMustContainCruft false &&
+
+ test_commit reachable &&
+ test_commit unreachable &&
+
+ unreachable="$(git rev-parse HEAD)" &&
+
+ git reset --hard HEAD^ &&
+ git tag -d unreachable &&
+ git reflog expire --all --expire=all &&
+
+ git repack --cruft -d &&
+
+ echo $unreachable | git pack-objects .git/objects/pack/pack &&
+
+ test_commit new &&
+
+ git update-ref refs/heads/other $unreachable &&
+ git repack --geometric=2 -d --write-midx --write-bitmap-index
+'
+
test_done
--
2.53.0.614.g164f3b634ec
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (2 preceding siblings ...)
2026-03-25 23:51 ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-26 20:48 ` Derrick Stolee
2026-03-25 23:51 ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
5 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In cd846bacc7d (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.
When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)
But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.
This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure. When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.
Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.
In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.
In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. As a consequence, the caller
`for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
are able to drop the "if (p)" conditional.
The `include_check` and `include_check_obj` callbacks on `rev_info` are
used to halt the walk at closed-excluded packs, since objects behind a
'^' boundary are guaranteed to have closure and need not be rescued.
The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-pack-objects.adoc | 25 +++++--
builtin/pack-objects.c | 110 ++++++++++++++++++++++------
packfile.c | 3 +-
packfile.h | 2 +
t/t5331-pack-objects-stdin.sh | 105 ++++++++++++++++++++++++++
5 files changed, 213 insertions(+), 32 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 71b9682485c..b78175fbe1b 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -94,13 +94,24 @@ base-name::
included packs (those not beginning with `^`), excluding any
objects listed in the excluded packs (beginning with `^`).
+
-When `mode` is "follow", objects from packs not listed on stdin receive
-special treatment. Objects within unlisted packs will be included if
-those objects are (1) reachable from the included packs, and (2) not
-found in any excluded packs. This mode is useful, for example, to
-resurrect once-unreachable objects found in cruft packs to generate
-packs which are closed under reachability up to the boundary set by the
-excluded packs.
+When `mode` is "follow" packs may additionally be prefixed with `!`,
+indicating that they are excluded but not necessarily closed under
+reachability. In addition to objects in included packs, the resulting
+pack may include additional objects based on the following:
++
+--
+* If any packs are marked with `!`, then objects reachable from such
+ packs or included ones via objects outside of excluded-closed packs
+ will be included. In this case, all `^` packs are treated as closed
+ under reachability.
+* Otherwise (if there are no `!` packs), objects within unlisted packs
+ will be included if those objects are (1) reachable from the
+ included packs, and (2) not found in any excluded packs.
+--
++
+This mode is useful, for example, to resurrect once-unreachable
+objects found in cruft packs to generate packs which are closed under
+reachability up to the boundary set by the excluded packs.
+
Incompatible with `--revs`, or options that imply `--revs` (such as
`--all`), with the exception of `--unpacked`, which is compatible.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 068b87d2af4..b8f5b9bf718 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -217,6 +217,7 @@ static int have_non_local_packs;
static int incremental;
static int ignore_packed_keep_on_disk;
static int ignore_packed_keep_in_core;
+static int ignore_packed_keep_in_core_open;
static int ignore_packed_keep_in_core_has_cruft;
static int allow_ofs_delta;
static struct pack_idx_option pack_idx_opts;
@@ -1618,7 +1619,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
/*
* Then handle .keep first, as we have a fast(er) path there.
*/
- if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
+ if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core ||
+ ignore_packed_keep_in_core_open) {
/*
* Set the flags for the kept-pack cache to be the ones we want
* to ignore.
@@ -1632,6 +1634,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
flags |= KEPT_PACK_ON_DISK;
if (ignore_packed_keep_in_core)
flags |= KEPT_PACK_IN_CORE;
+ if (ignore_packed_keep_in_core_open)
+ flags |= KEPT_PACK_IN_CORE_OPEN;
/*
* If the object is in a pack that we want to ignore, *and* we
@@ -1643,6 +1647,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
return 0;
if (ignore_packed_keep_in_core && p->pack_keep_in_core)
return 0;
+ if (ignore_packed_keep_in_core_open && p->pack_keep_in_core_open)
+ return 0;
if (has_object_kept_pack(p->repo, oid, flags))
return 0;
} else {
@@ -3742,6 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
void *_data)
{
off_t ofs;
+ struct object_info oi = OBJECT_INFO_INIT;
enum object_type type = OBJ_NONE;
display_progress(progress_state, ++nr_seen);
@@ -3749,29 +3756,34 @@ static int add_object_entry_from_pack(const struct object_id *oid,
if (have_duplicate_entry(oid, 0))
return 0;
+ stdin_packs_found_nr++;
+
ofs = nth_packed_object_offset(p, pos);
+
+ oi.typep = &type;
+ if (packed_object_info(p, ofs, &oi) < 0) {
+ die(_("could not get type of object %s in pack %s"),
+ oid_to_hex(oid), p->pack_name);
+ } else if (type == OBJ_COMMIT) {
+ struct rev_info *revs = _data;
+ /*
+ * commits in included packs are used as starting points
+ * for the subsequent revision walk
+ *
+ * Note that we do want to walk through commits that are
+ * present in excluded-open ('!') packs to pick up any
+ * objects reachable from them not present in the
+ * excluded-closed ('^') packs.
+ *
+ * However, we'll only add those objects to the packing
+ * list after checking `want_object_in_pack()` below.
+ */
+ add_pending_oid(revs, NULL, oid, 0);
+ }
+
if (!want_object_in_pack(oid, 0, &p, &ofs))
return 0;
- if (p) {
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.typep = &type;
- if (packed_object_info(p, ofs, &oi) < 0) {
- die(_("could not get type of object %s in pack %s"),
- oid_to_hex(oid), p->pack_name);
- } else if (type == OBJ_COMMIT) {
- struct rev_info *revs = _data;
- /*
- * commits in included packs are used as starting points for the
- * subsequent revision walk
- */
- add_pending_oid(revs, NULL, oid, 0);
- }
-
- stdin_packs_found_nr++;
- }
-
create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
@@ -3838,11 +3850,23 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
+static int stdin_packs_include_check_obj(struct object *obj, void *data UNUSED)
+{
+ return !has_object_kept_pack(to_pack.repo, &obj->oid,
+ KEPT_PACK_IN_CORE);
+}
+
+static int stdin_packs_include_check(struct commit *commit, void *data)
+{
+ return stdin_packs_include_check_obj((struct object *)commit, data);
+}
+
struct stdin_pack_info {
struct packed_git *p;
enum {
STDIN_PACK_INCLUDE = (1<<0),
STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
} kind;
};
@@ -3872,7 +3896,19 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
for_each_string_list_item(item, &keys) {
struct stdin_pack_info *info = item->util;
- if (info->kind & STDIN_PACK_INCLUDE)
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * When open-excluded packs ("!") are present, stop
+ * the parent walk at closed-excluded ("^") packs.
+ * Objects behind a "^" boundary are guaranteed to
+ * have closure and should not be rescued.
+ */
+ revs->include_check = stdin_packs_include_check;
+ revs->include_check_obj = stdin_packs_include_check_obj;
+ }
+
+ if ((info->kind & STDIN_PACK_INCLUDE) ||
+ (info->kind & STDIN_PACK_EXCLUDE_OPEN))
for_each_object_in_pack(info->p,
add_object_entry_from_pack,
revs,
@@ -3882,7 +3918,8 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
string_list_clear(&keys, 0);
}
-static void stdin_packs_read_input(struct rev_info *revs)
+static void stdin_packs_read_input(struct rev_info *revs,
+ enum stdin_packs_mode mode)
{
struct strbuf buf = STRBUF_INIT;
struct strmap packs = STRMAP_INIT;
@@ -3894,7 +3931,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
if (!*key)
continue;
- if (*key == '^')
+ if (*key == '^' ||
+ (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
key++;
info = strmap_get(&packs, key);
@@ -3905,6 +3943,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
if (*buf.buf == '^')
info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
+ else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
+ info->kind |= STDIN_PACK_EXCLUDE_OPEN;
else
info->kind |= STDIN_PACK_INCLUDE;
@@ -3942,6 +3982,20 @@ static void stdin_packs_read_input(struct rev_info *revs)
p->pack_keep_in_core = 1;
}
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * Marking excluded open packs as kept in-core
+ * (open) for the same reason as we marked
+ * exclude closed packs as kept in-core.
+ *
+ * Use a separate flag here to ensure we don't
+ * halt our traversal at these packs, since they
+ * are not guaranteed to have closure.
+ *
+ */
+ p->pack_keep_in_core_open = 1;
+ }
+
info->p = p;
}
@@ -3985,7 +4039,15 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- stdin_packs_read_input(&revs);
+ if (mode == STDIN_PACKS_MODE_FOLLOW) {
+ /*
+ * In '--stdin-packs=follow' mode, additionally ignore
+ * objects in excluded-open packs to prevent them from
+ * appearing in the resulting pack.
+ */
+ ignore_packed_keep_in_core_open = 1;
+ }
+ stdin_packs_read_input(&revs, mode);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
diff --git a/packfile.c b/packfile.c
index 215a23e42be..076e444e32a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2246,7 +2246,8 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st
struct packed_git *p = e->pack;
if ((p->pack_keep && (flags & KEPT_PACK_ON_DISK)) ||
- (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE))) {
+ (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE)) ||
+ (p->pack_keep_in_core_open && (flags & KEPT_PACK_IN_CORE_OPEN))) {
ALLOC_GROW(packs, nr + 1, alloc);
packs[nr++] = p;
}
diff --git a/packfile.h b/packfile.h
index 8b04a258a7b..b7735c1977d 100644
--- a/packfile.h
+++ b/packfile.h
@@ -28,6 +28,7 @@ struct packed_git {
unsigned pack_local:1,
pack_keep:1,
pack_keep_in_core:1,
+ pack_keep_in_core_open:1,
freshened:1,
do_not_close:1,
pack_promisor:1,
@@ -266,6 +267,7 @@ int packfile_store_freshen_object(struct packfile_store *store,
enum kept_pack_type {
KEPT_PACK_ON_DISK = (1 << 0),
KEPT_PACK_IN_CORE = (1 << 1),
+ KEPT_PACK_IN_CORE_OPEN = (1 << 2),
};
/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 7eb79bc2cdb..c74b5861af3 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -415,4 +415,109 @@ test_expect_success '--stdin-packs=follow tolerates missing commits' '
stdin_packs__follow_with_only HEAD HEAD^{tree}
'
+test_expect_success '--stdin-packs=follow with open-excluded packs' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ cd repo &&
+ git config set maintenance.auto false &&
+
+ git branch -M main &&
+
+ # Create the following commit structure:
+ #
+ # A <-- B <-- D (main)
+ # ^
+ # \
+ # C (other)
+ test_commit A &&
+ test_commit B &&
+ git checkout -B other &&
+ test_commit C &&
+ git checkout main &&
+ test_commit D &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+ D="$(echo B..D | git pack-objects --revs $packdir/pack)" &&
+
+ C_ONLY="$(git rev-parse other | git pack-objects $packdir/pack)" &&
+
+ git prune-packed &&
+
+ # Create a pack using --stdin-packs=follow where:
+ #
+ # - pack D is included,
+ # - pack C_ONLY is excluded, but open,
+ # - pack B is excluded, but closed, and
+ # - packs A and C are unknown
+ #
+ # The resulting pack should therefore contain:
+ #
+ # - objects from the included pack D,
+ # - A.t (rescued via D^{tree}), and
+ # - C^{tree} and C.t (rescued via pack C_ONLY)
+ #
+ # , but should omit:
+ #
+ # - C (excluded via C_ONLY),
+ # - objects from pack B (trivially excluded-closed)
+ # - A and A^{tree} (ancestors of B)
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <<-EOF
+ pack-$D.pack
+ !pack-$C_ONLY.pack
+ ^pack-$B.pack
+ EOF
+ ) &&
+
+ {
+ objects_in_packs $D &&
+ git rev-parse A:A.t "C^{tree}" C:C.t
+ } >expect.raw &&
+ sort expect.raw >expect &&
+
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--stdin-packs with !-delimited pack without follow' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ test_commit A &&
+ test_commit B &&
+ test_commit C &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+
+ cat >in <<-EOF &&
+ !pack-$A.pack
+ pack-$B.pack
+ pack-$C.pack
+ EOF
+
+ # Without --stdin-packs=follow, we treat the first
+ # line of input as a literal packfile name, and thus
+ # expect pack-objects to complain of a missing pack
+ test_must_fail git pack-objects --stdin-packs --stdout \
+ >/dev/null <in 2>err &&
+ test_grep "could not find pack .!pack-$A.pack." err &&
+
+ # With --stdin-packs=follow, we treat the second line
+ # of input as indicating pack-$A.pack is an excluded
+ # open pack, and thus expect pack-objects to succeed
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+
+ objects_in_packs $B $C >expect &&
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
test_done
--
2.53.0.614.g164f3b634ec
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-25 23:51 ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
@ 2026-03-26 20:48 ` Derrick Stolee
0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2026-03-26 20:48 UTC (permalink / raw)
To: Taylor Blau, git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/25/2026 7:51 PM, Taylor Blau wrote:
> struct stdin_pack_info {
> struct packed_git *p;
> enum {
> STDIN_PACK_INCLUDE = (1<<0),
> STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
> + STDIN_PACK_EXCLUDE_OPEN = (1<<2),
> } kind;
Especially because we're extending this, I feel more confident
that this would improve by being a named enum. Perhaps these
modes could get comments explaining their differences and how
they might operate in combination.
> if (!*key)
> continue;
> - if (*key == '^')
> + if (*key == '^' ||
> + (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
> key++;
This part is getting more complicated, giving potentially more
reason to start with a single branching point in patch 2.
> diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
> index 7eb79bc2cdb..c74b5861af3 100755
> --- a/t/t5331-pack-objects-stdin.sh
> +++ b/t/t5331-pack-objects-stdin.sh
> @@ -415,4 +415,109 @@ test_expect_success '--stdin-packs=follow tolerates missing commits' '
> stdin_packs__follow_with_only HEAD HEAD^{tree}
> '
I see that this isn't yet the fix that turns failure into success.
I look forward to seeing that changeover.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (3 preceding siblings ...)
2026-03-25 23:51 ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
@ 2026-03-25 23:51 ` Taylor Blau
2026-03-26 20:49 ` Derrick Stolee
2026-03-26 20:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
5 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2026-03-25 23:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In 5ee86c273bf (repack: exclude cruft pack(s) from the MIDX where
possible, 2025-06-23), geometric repacking learned to exclude cruft
packs from the MIDX when 'repack.midxMustContainCruft' is set to
'false'.
This works because packs generated with '--stdin-packs=follow' rescue
any once-unreachable objects that later become reachable, making the
resulting packs closed under reachability without needing the cruft pack
in the MIDX.
However, packs above the geometric split that were not part of the
previous MIDX may not have full object closure. When such packs are
marked as excluded-closed ('^'), pack-objects treats them as a
reachability boundary and does not traverse through them during the
follow pass, potentially leaving the resulting pack without full
closure.
Fix this by marking packs above the geometric split that were not in the
previous MIDX as excluded-open ('!') instead of excluded-closed ('^').
This causes pack-objects to walk through their commits during the follow
pass, rescuing any reachable objects not present in the closed-excluded
packs.
Note that MIDXs which were generated prior to this change and are
unlucky enough to not be closed under reachability may still exhibit
this bug, as we treat all MIDX'd packs as closed. That is true in an
overwhelming number of cases, since in order to have a non-closed MIDX
you would have to:
- Generate a pack via an earlier geometric repack that is not closed
under reachability.
- Store that pack in the MIDX.
- Avoid picking any commits to receive reachability bitmaps which
happen to reach objects from which the missing objects are reachable.
In the extremely rare chance that all of the above should happen, an
all-into-one repack will resolve the issue.
Unfortunately, there is no perfect way to determine whether a MIDX'd
pack is closed outside of ensuring that there is a '1' bit in at least
one bitmap for every bit position corresponding to objects in that pack.
While this is possible to do, this approach would treat MIDX'd packs as
open in cases where there is at least one object that is not reachable
from the subset of commits selected for bitmapping.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 19 +++++++++++++++++--
t/t7704-repack-cruft.sh | 2 +-
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index f6bb04bef72..4c5a82c2c8d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -369,8 +369,23 @@ int cmd_repack(int argc,
*/
for (i = 0; i < geometry.split; i++)
fprintf(in, "%s\n", pack_basename(geometry.pack[i]));
- for (i = geometry.split; i < geometry.pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry.pack[i]));
+ for (i = geometry.split; i < geometry.pack_nr; i++) {
+ const char *basename = pack_basename(geometry.pack[i]);
+ char marker = '^';
+
+ if (!midx_must_contain_cruft &&
+ !string_list_has_string(&existing.midx_packs,
+ basename)) {
+ /*
+ * Assume non-MIDX'd packs are not
+ * necessarily closed under
+ * reachability.
+ */
+ marker = '!';
+ }
+
+ fprintf(in, "%c%s\n", marker, basename);
+ }
fclose(in);
}
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 77133395b5d..9e03b04315d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,7 +869,7 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
-test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+test_expect_success 'repack rescues once-cruft objects above geometric split' '
git config repack.midxMustContainCruft false &&
test_commit reachable &&
--
2.53.0.614.g164f3b634ec
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open
2026-03-25 23:51 ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
@ 2026-03-26 20:49 ` Derrick Stolee
2026-03-26 21:44 ` Taylor Blau
0 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2026-03-26 20:49 UTC (permalink / raw)
To: Taylor Blau, git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/25/2026 7:51 PM, Taylor Blau wrote:
> diff --git a/builtin/repack.c b/builtin/repack.c
> index f6bb04bef72..4c5a82c2c8d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -369,8 +369,23 @@ int cmd_repack(int argc,
> */
> for (i = 0; i < geometry.split; i++)
> fprintf(in, "%s\n", pack_basename(geometry.pack[i]));
> - for (i = geometry.split; i < geometry.pack_nr; i++)
> - fprintf(in, "^%s\n", pack_basename(geometry.pack[i]));
> + for (i = geometry.split; i < geometry.pack_nr; i++) {
> + const char *basename = pack_basename(geometry.pack[i]);
> + char marker = '^';
> +
> + if (!midx_must_contain_cruft &&
> + !string_list_has_string(&existing.midx_packs,
> + basename)) {
> + /*
> + * Assume non-MIDX'd packs are not
> + * necessarily closed under
> + * reachability.
> + */
> + marker = '!';
> + }
> +
> + fprintf(in, "%c%s\n", marker, basename);
> + }
> fclose(in);
> -test_expect_failure 'repack rescues once-cruft objects above geometric split' '
> +test_expect_success 'repack rescues once-cruft objects above geometric split' '
I appreciate the brevity of this behavior change after you
established the new building blocks that make such a
concise change possible.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open
2026-03-26 20:49 ` Derrick Stolee
@ 2026-03-26 21:44 ` Taylor Blau
0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-26 21:44 UTC (permalink / raw)
To: Derrick Stolee
Cc: git, Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On Thu, Mar 26, 2026 at 04:49:40PM -0400, Derrick Stolee wrote:
> I appreciate the brevity of this behavior change after you
> established the new building blocks that make such a
> concise change possible.
;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (4 preceding siblings ...)
2026-03-25 23:51 ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
@ 2026-03-26 20:51 ` Derrick Stolee
2026-03-26 21:46 ` Taylor Blau
5 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2026-03-26 20:51 UTC (permalink / raw)
To: Taylor Blau, git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/25/2026 7:51 PM, Taylor Blau wrote:
> This is a small reroll of my series to fix an issue where MIDX bitmaps
> fail to generate after a geometric repack in certain scenarios where the
> set of MIDX'd objects is not closed under reachability.
>
> The main changes since last time are:
>
> * Clarification in the first patch that the added `release_revisions()`
> call prevents a *potential* leak, not an actual one.
>
> * Cleanup in the second patch (where we convert the --stdin-packs
> handling to use a strmap) based on Patrick's review.
>
> * Dropped an unnecessary "if (p)" conditional in the fourth patch's
> `add_object_entry_from_pack()` callback that is unnecessary.
>
> Otherwise, the series is unchanged from the original round. As usual, a
> range-diff is included below for convenience.
>
Sorry I didn't get to v1 in time. This was an interesting series
and fixes a bug well. My only quibbles are about some minor code
style things, but I do hope you'll consider them. I struggled to
read a few things and the changes I recommend seemed to make the
logic more obvious.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-26 20:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
@ 2026-03-26 21:46 ` Taylor Blau
0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-26 21:46 UTC (permalink / raw)
To: Derrick Stolee
Cc: git, Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On Thu, Mar 26, 2026 at 04:51:09PM -0400, Derrick Stolee wrote:
> Sorry I didn't get to v1 in time. This was an interesting series
> and fixes a bug well. My only quibbles are about some minor code
> style things, but I do hope you'll consider them. I struggled to
> read a few things and the changes I recommend seemed to make the
> logic more obvious.
Thanks for reviewing! The two main comments I picked up from you were:
* whether or not the new enum should be non-anonymous
* changing how we lookup entries in the strmap when processing input
On the latter of those two, I think that the suggestion you made would
break one of the test cases around handling the same pack being
specified multiple times with different flags[^1], so I opted to keep
the current logic there.
On the former, I don't feel strongly either way. If you do, I'm happy to
make that change, but if not I think the series should be good to go
as-is.
Thanks,
Taylor
[^1]: I wish this weren't the case, but this is my fault for not
forbidding it when I initially wrote this feature.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
` (5 preceding siblings ...)
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
` (5 more replies)
6 siblings, 6 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
This is another small reroll of my series to fix an issue where MIDX
bitmaps fail to generate after a geometric repack in certain scenarios
where the set of MIDX'd objects is not closed under reachability.
The main changes since last time are:
* Named enum stdin_pack_info_kind.
* Refactored how we handle reading incoming packs via stdin.
* Fixed a nasty case where sorting the packs in order of mtime happened
to work on some systems, but ASan detected a very legitimate bug.
As usual, a range-diff is included below for convenience.
Thanks again for your review!
Taylor Blau (5):
pack-objects: plug leak in `read_stdin_packs()`
pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
t7704: demonstrate failure with once-cruft objects above the geometric
split
pack-objects: support excluded-open packs with --stdin-packs
repack: mark non-MIDX packs above the split as excluded-open
Documentation/git-pack-objects.adoc | 25 ++-
builtin/pack-objects.c | 301 +++++++++++++++++++---------
builtin/repack.c | 19 +-
packfile.c | 3 +-
packfile.h | 2 +
t/t5331-pack-objects-stdin.sh | 105 ++++++++++
t/t7704-repack-cruft.sh | 22 ++
7 files changed, 373 insertions(+), 104 deletions(-)
Range-diff against v2:
1: 1fabd88f5e3 = 1: d6ff4e801ab pack-objects: plug leak in `read_stdin_packs()`
2: d5cb793f0eb ! 2: dd9ff1ede4a pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
@@ builtin/pack-objects.c
#include "list.h"
#include "packfile.h"
#include "object-file.h"
-@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
+@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
+
+ }
+
++/*
++ * stdin_pack_info_kind specifies how a pack specified over stdin
++ * should be treated when pack-objects is invoked with --stdin-packs.
++ *
++ * - STDIN_PACK_INCLUDE: objects in any packs with this flag bit set
++ * should be included in the output pack, unless they appear in an
++ * excluded pack.
++ *
++ * - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
++ * bit set should be excluded from the output pack.
++ *
++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
++ * used as traversal tips when invoked with --stdin-packs=follow.
++ */
++enum stdin_pack_info_kind {
++ STDIN_PACK_INCLUDE = (1<<0),
++ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
++};
++
++struct stdin_pack_info {
++ struct packed_git *p;
++ enum stdin_pack_info_kind kind;
++};
++
+ static int pack_mtime_cmp(const void *_a, const void *_b)
+ {
+- struct packed_git *a = ((const struct string_list_item*)_a)->util;
+- struct packed_git *b = ((const struct string_list_item*)_b)->util;
++ struct stdin_pack_info *a = ((const struct string_list_item*)_a)->util;
++ struct stdin_pack_info *b = ((const struct string_list_item*)_b)->util;
+
+ /*
+ * order packs by descending mtime so that objects are laid out
+ * roughly as newest-to-oldest
+ */
+- if (a->mtime < b->mtime)
++ if (a->p->mtime < b->p->mtime)
+ return 1;
+- else if (b->mtime < a->mtime)
++ else if (b->p->mtime < a->p->mtime)
+ return -1;
+ else
return 0;
}
-static void read_packs_list_from_stdin(struct rev_info *revs)
-+struct stdin_pack_info {
-+ struct packed_git *p;
-+ enum {
-+ STDIN_PACK_INCLUDE = (1<<0),
-+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
-+ } kind;
-+};
-+
+static void stdin_packs_add_pack_entries(struct strmap *packs,
+ struct rev_info *revs)
-+{
+ {
+- struct strbuf buf = STRBUF_INIT;
+- struct string_list include_packs = STRING_LIST_INIT_DUP;
+- struct string_list exclude_packs = STRING_LIST_INIT_DUP;
+- struct string_list_item *item = NULL;
+- struct packed_git *p;
+ struct string_list keys = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
-+
+
+- while (strbuf_getline(&buf, stdin) != EOF) {
+- if (!buf.len)
+- continue;
+ strmap_for_each_entry(packs, &iter, entry) {
+ struct stdin_pack_info *info = entry->value;
+ if (!info->p)
+ die(_("could not find pack '%s'"), entry->key);
-+
+
+- if (*buf.buf == '^')
+- string_list_append(&exclude_packs, buf.buf + 1);
+- else
+- string_list_append(&include_packs, buf.buf);
+-
+- strbuf_reset(&buf);
+- }
+-
+- string_list_sort_u(&include_packs, 0);
+- string_list_sort_u(&exclude_packs, 0);
+-
+- repo_for_each_pack(the_repository, p) {
+- const char *pack_name = pack_basename(p);
+-
+- if ((item = string_list_lookup(&include_packs, pack_name))) {
+- if (exclude_promisor_objects && p->pack_promisor)
+- die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
+- item->util = p;
+- }
+- if ((item = string_list_lookup(&exclude_packs, pack_name)))
+- item->util = p;
+- }
+-
+- /*
+- * Arguments we got on stdin may not even be packs. First
+- * check that to avoid segfaulting later on in
+- * e.g. pack_mtime_cmp(), excluded packs are handled below.
+- *
+- * Since we first parsed our STDIN and then sorted the input
+- * lines the pack we error on will be whatever line happens to
+- * sort first. This is lazy, it's enough that we report one
+- * bad case here, we don't need to report the first/last one,
+- * or all of them.
+- */
+- for_each_string_list_item(item, &include_packs) {
+- struct packed_git *p = item->util;
+- if (!p)
+- die(_("could not find pack '%s'"), item->string);
+- if (!is_pack_valid(p))
+- die(_("packfile %s cannot be accessed"), p->pack_name);
+- }
+-
+- /*
+- * Then, handle all of the excluded packs, marking them as
+- * kept in-core so that later calls to add_object_entry()
+- * discards any objects that are also found in excluded packs.
+- */
+- for_each_string_list_item(item, &exclude_packs) {
+- struct packed_git *p = item->util;
+- if (!p)
+- die(_("could not find pack '%s'"), item->string);
+- p->pack_keep_in_core = 1;
+ string_list_append(&keys, entry->key)->util = info;
-+ }
-+
-+ /*
-+ * Order packs by ascending mtime; use QSORT directly to access the
-+ * string_list_item's ->util pointer, which string_list_sort() does not
-+ * provide.
-+ */
+ }
+
+ /*
+@@ builtin/pack-objects.c: static void read_packs_list_from_stdin(struct rev_info *revs)
+ * string_list_item's ->util pointer, which string_list_sort() does not
+ * provide.
+ */
+- QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
+-
+- for_each_string_list_item(item, &include_packs) {
+- struct packed_git *p = item->util;
+- for_each_object_in_pack(p,
+- add_object_entry_from_pack,
+- revs,
+- ODB_FOR_EACH_OBJECT_PACK_ORDER);
+ QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+ for_each_string_list_item(item, &keys) {
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ add_object_entry_from_pack,
+ revs,
+ ODB_FOR_EACH_OBJECT_PACK_ORDER);
-+ }
-+
+ }
+
+ string_list_clear(&keys, 0);
+}
+
+static void stdin_packs_read_input(struct rev_info *revs)
- {
- struct strbuf buf = STRBUF_INIT;
-- struct string_list include_packs = STRING_LIST_INIT_DUP;
-- struct string_list exclude_packs = STRING_LIST_INIT_DUP;
-- struct string_list_item *item = NULL;
++{
++ struct strbuf buf = STRBUF_INIT;
+ struct strmap packs = STRMAP_INIT;
- struct packed_git *p;
-
- while (strbuf_getline(&buf, stdin) != EOF) {
-- if (!buf.len)
++ struct packed_git *p;
++
++ while (strbuf_getline(&buf, stdin) != EOF) {
+ struct stdin_pack_info *info;
++ enum stdin_pack_info_kind kind = STDIN_PACK_INCLUDE;
+ const char *key = buf.buf;
+
+ if (!*key)
- continue;
-+ if (*key == '^')
++ continue;
++ else if (*key == '^')
++ kind = STDIN_PACK_EXCLUDE_CLOSED;
++
++ if (kind != STDIN_PACK_INCLUDE)
+ key++;
+
+ info = strmap_get(&packs, key);
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ CALLOC_ARRAY(info, 1);
+ strmap_put(&packs, key, info);
+ }
-
- if (*buf.buf == '^')
-- string_list_append(&exclude_packs, buf.buf + 1);
-+ info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
- else
-- string_list_append(&include_packs, buf.buf);
-+ info->kind |= STDIN_PACK_INCLUDE;
-
- strbuf_reset(&buf);
- }
-
-- string_list_sort_u(&include_packs, 0);
-- string_list_sort_u(&exclude_packs, 0);
--
- repo_for_each_pack(the_repository, p) {
-- const char *pack_name = pack_basename(p);
++
++ info->kind |= kind;
++
++ strbuf_reset(&buf);
++ }
++
++ repo_for_each_pack(the_repository, p) {
+ struct stdin_pack_info *info;
-
-- if ((item = string_list_lookup(&include_packs, pack_name))) {
++
+ info = strmap_get(&packs, pack_basename(p));
+ if (!info)
+ continue;
+
+ if (info->kind & STDIN_PACK_INCLUDE) {
- if (exclude_promisor_objects && p->pack_promisor)
- die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
-- item->util = p;
++ if (exclude_promisor_objects && p->pack_promisor)
++ die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
+
+ /*
+ * Arguments we got on stdin may not even be
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ */
+ if (!is_pack_valid(p))
+ die(_("packfile %s cannot be accessed"), p->pack_name);
- }
-- if ((item = string_list_lookup(&exclude_packs, pack_name)))
-- item->util = p;
-- }
-
-- /*
-- * Arguments we got on stdin may not even be packs. First
-- * check that to avoid segfaulting later on in
-- * e.g. pack_mtime_cmp(), excluded packs are handled below.
-- *
-- * Since we first parsed our STDIN and then sorted the input
-- * lines the pack we error on will be whatever line happens to
-- * sort first. This is lazy, it's enough that we report one
-- * bad case here, we don't need to report the first/last one,
-- * or all of them.
-- */
-- for_each_string_list_item(item, &include_packs) {
-- struct packed_git *p = item->util;
-- if (!p)
-- die(_("could not find pack '%s'"), item->string);
-- if (!is_pack_valid(p))
-- die(_("packfile %s cannot be accessed"), p->pack_name);
-- }
++ }
++
+ if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
+ /*
+ * Marking excluded packs as kept in-core so
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ */
+ p->pack_keep_in_core = 1;
+ }
-
-- /*
-- * Then, handle all of the excluded packs, marking them as
-- * kept in-core so that later calls to add_object_entry()
-- * discards any objects that are also found in excluded packs.
-- */
-- for_each_string_list_item(item, &exclude_packs) {
-- struct packed_git *p = item->util;
-- if (!p)
-- die(_("could not find pack '%s'"), item->string);
-- p->pack_keep_in_core = 1;
++
+ info->p = p;
- }
-
-- /*
-- * Order packs by ascending mtime; use QSORT directly to access the
-- * string_list_item's ->util pointer, which string_list_sort() does not
-- * provide.
-- */
-- QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
--
-- for_each_string_list_item(item, &include_packs) {
-- struct packed_git *p = item->util;
-- for_each_object_in_pack(p,
-- add_object_entry_from_pack,
-- revs,
-- ODB_FOR_EACH_OBJECT_PACK_ORDER);
-- }
++ }
++
+ stdin_packs_add_pack_entries(&packs, revs);
-
++
strbuf_release(&buf);
- string_list_clear(&include_packs, 0);
- string_list_clear(&exclude_packs, 0);
3: d8f0577077c = 3: 5a5090f8da2 t7704: demonstrate failure with once-cruft objects above the geometric split
4: e028dfbc9fb ! 4: efba1ab93d8 pack-objects: support excluded-open packs with --stdin-packs
@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
+@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
+ * - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
+ * bit set should be excluded from the output pack.
+ *
+- * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
+- * used as traversal tips when invoked with --stdin-packs=follow.
++ * - STDIN_PACK_EXCLUDE_OPEN: objects in any packs with this flag
++ * bit set should be excluded from the output pack, but are not
++ * guaranteed to be closed under reachability.
++ *
++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE or
++ * STDIN_PACK_EXCLUDE_OPEN are used as traversal tips when invoked
++ * with --stdin-packs=follow.
+ */
+ enum stdin_pack_info_kind {
+ STDIN_PACK_INCLUDE = (1<<0),
+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
++ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
+ };
+
+ struct stdin_pack_info {
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
+ return stdin_packs_include_check_obj((struct object *)commit, data);
+}
+
- struct stdin_pack_info {
- struct packed_git *p;
- enum {
- STDIN_PACK_INCLUDE = (1<<0),
- STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
-+ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
- } kind;
- };
-
+ static void stdin_packs_add_pack_entries(struct strmap *packs,
+ struct rev_info *revs)
+ {
@@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *packs,
for_each_string_list_item(item, &keys) {
struct stdin_pack_info *info = item->util;
@@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *
struct strbuf buf = STRBUF_INIT;
struct strmap packs = STRMAP_INIT;
@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
-
- if (!*key)
continue;
-- if (*key == '^')
-+ if (*key == '^' ||
-+ (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
+ else if (*key == '^')
+ kind = STDIN_PACK_EXCLUDE_CLOSED;
++ else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
++ kind = STDIN_PACK_EXCLUDE_OPEN;
+
+ if (kind != STDIN_PACK_INCLUDE)
key++;
-
- info = strmap_get(&packs, key);
-@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
-
- if (*buf.buf == '^')
- info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
-+ else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
-+ info->kind |= STDIN_PACK_EXCLUDE_OPEN;
- else
- info->kind |= STDIN_PACK_INCLUDE;
-
@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
p->pack_keep_in_core = 1;
}
5: 23cb9f33dba = 5: c9ad9a0c4ae repack: mark non-MIDX packs above the split as excluded-open
base-commit: 41688c1a2312f62f44435e1a6d03b4b904b5b0ec
--
2.53.0.724.gb20b077944a
^ permalink raw reply [flat|nested] 41+ messages in thread* [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()`
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
` (4 subsequent siblings)
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
The `read_stdin_packs()` function added originally via 339bce27f4f
(builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
declares a `rev_info` struct but neglects to call `release_revisions()`
on it before returning, creating the potential for a leak.
The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
'--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
did not address it.
Ensure that we call `release_revisions()` appropriately to prevent a
potential leak from this function. Note that in practice our `rev_info`
here does not have a present leak, hence t5331 passes cleanly before
this commit, even when built with SANITIZE=leak.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index da1087930cb..f640e556823 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3983,6 +3983,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
show_object_pack_hint,
&mode);
+ release_revisions(&revs);
+
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found",
stdin_packs_found_nr);
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
--
2.53.0.724.gb20b077944a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06 ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
` (3 subsequent siblings)
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
The '--stdin-packs' mode of pack-objects maintains two separate
string_lists: one for included packs, and one for excluded packs. Each
list stores the pack basename as a string and the corresponding
`packed_git` pointer in its `->util` field.
This works, but makes it awkward to extend the set of pack "kinds" that
pack-objects can accept via stdin, since each new kind would need its
own string_list and duplicated handling. A future commit will want to do
just this, so prepare for that change by handling the various "kinds" of
packs specified over stdin in a more generic fashion.
Namely, replace the two `string_list`s with a single `strmap` keyed on
the pack basename, with values pointing to a new `struct
stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
`kind` bitfield indicating whether the pack was specified as included or
excluded.
Extract the logic for sorting packs by mtime and adding their objects
into a separate `stdin_packs_add_pack_entries()` helper.
While we could have used a `string_list`, we must handle the case where
the same pack is specified more than once. With a `string_list` only, we
would have to pay a quadratic cost to either (a) insert elements into
their sorted positions, or (b) a repeated linear search, which is
accidentally quadratic. For that reason, use a strmap instead.
This patch does not include any functional changes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 197 +++++++++++++++++++++++++----------------
1 file changed, 121 insertions(+), 76 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f640e556823..8ab7ca98a55 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -28,6 +28,7 @@
#include "reachable.h"
#include "oid-array.h"
#include "strvec.h"
+#include "strmap.h"
#include "list.h"
#include "packfile.h"
#include "object-file.h"
@@ -3835,87 +3836,61 @@ static void show_commit_pack_hint(struct commit *commit, void *data)
}
+/*
+ * stdin_pack_info_kind specifies how a pack specified over stdin
+ * should be treated when pack-objects is invoked with --stdin-packs.
+ *
+ * - STDIN_PACK_INCLUDE: objects in any packs with this flag bit set
+ * should be included in the output pack, unless they appear in an
+ * excluded pack.
+ *
+ * - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
+ * bit set should be excluded from the output pack.
+ *
+ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
+ * used as traversal tips when invoked with --stdin-packs=follow.
+ */
+enum stdin_pack_info_kind {
+ STDIN_PACK_INCLUDE = (1<<0),
+ STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+};
+
+struct stdin_pack_info {
+ struct packed_git *p;
+ enum stdin_pack_info_kind kind;
+};
+
static int pack_mtime_cmp(const void *_a, const void *_b)
{
- struct packed_git *a = ((const struct string_list_item*)_a)->util;
- struct packed_git *b = ((const struct string_list_item*)_b)->util;
+ struct stdin_pack_info *a = ((const struct string_list_item*)_a)->util;
+ struct stdin_pack_info *b = ((const struct string_list_item*)_b)->util;
/*
* order packs by descending mtime so that objects are laid out
* roughly as newest-to-oldest
*/
- if (a->mtime < b->mtime)
+ if (a->p->mtime < b->p->mtime)
return 1;
- else if (b->mtime < a->mtime)
+ else if (b->p->mtime < a->p->mtime)
return -1;
else
return 0;
}
-static void read_packs_list_from_stdin(struct rev_info *revs)
+static void stdin_packs_add_pack_entries(struct strmap *packs,
+ struct rev_info *revs)
{
- struct strbuf buf = STRBUF_INIT;
- struct string_list include_packs = STRING_LIST_INIT_DUP;
- struct string_list exclude_packs = STRING_LIST_INIT_DUP;
- struct string_list_item *item = NULL;
- struct packed_git *p;
+ struct string_list keys = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
+ struct hashmap_iter iter;
+ struct strmap_entry *entry;
- while (strbuf_getline(&buf, stdin) != EOF) {
- if (!buf.len)
- continue;
+ strmap_for_each_entry(packs, &iter, entry) {
+ struct stdin_pack_info *info = entry->value;
+ if (!info->p)
+ die(_("could not find pack '%s'"), entry->key);
- if (*buf.buf == '^')
- string_list_append(&exclude_packs, buf.buf + 1);
- else
- string_list_append(&include_packs, buf.buf);
-
- strbuf_reset(&buf);
- }
-
- string_list_sort_u(&include_packs, 0);
- string_list_sort_u(&exclude_packs, 0);
-
- repo_for_each_pack(the_repository, p) {
- const char *pack_name = pack_basename(p);
-
- if ((item = string_list_lookup(&include_packs, pack_name))) {
- if (exclude_promisor_objects && p->pack_promisor)
- die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
- item->util = p;
- }
- if ((item = string_list_lookup(&exclude_packs, pack_name)))
- item->util = p;
- }
-
- /*
- * Arguments we got on stdin may not even be packs. First
- * check that to avoid segfaulting later on in
- * e.g. pack_mtime_cmp(), excluded packs are handled below.
- *
- * Since we first parsed our STDIN and then sorted the input
- * lines the pack we error on will be whatever line happens to
- * sort first. This is lazy, it's enough that we report one
- * bad case here, we don't need to report the first/last one,
- * or all of them.
- */
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- if (!is_pack_valid(p))
- die(_("packfile %s cannot be accessed"), p->pack_name);
- }
-
- /*
- * Then, handle all of the excluded packs, marking them as
- * kept in-core so that later calls to add_object_entry()
- * discards any objects that are also found in excluded packs.
- */
- for_each_string_list_item(item, &exclude_packs) {
- struct packed_git *p = item->util;
- if (!p)
- die(_("could not find pack '%s'"), item->string);
- p->pack_keep_in_core = 1;
+ string_list_append(&keys, entry->key)->util = info;
}
/*
@@ -3923,19 +3898,89 @@ static void read_packs_list_from_stdin(struct rev_info *revs)
* string_list_item's ->util pointer, which string_list_sort() does not
* provide.
*/
- QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
-
- for_each_string_list_item(item, &include_packs) {
- struct packed_git *p = item->util;
- for_each_object_in_pack(p,
- add_object_entry_from_pack,
- revs,
- ODB_FOR_EACH_OBJECT_PACK_ORDER);
+ QSORT(keys.items, keys.nr, pack_mtime_cmp);
+
+ for_each_string_list_item(item, &keys) {
+ struct stdin_pack_info *info = item->util;
+
+ if (info->kind & STDIN_PACK_INCLUDE)
+ for_each_object_in_pack(info->p,
+ add_object_entry_from_pack,
+ revs,
+ ODB_FOR_EACH_OBJECT_PACK_ORDER);
}
+ string_list_clear(&keys, 0);
+}
+
+static void stdin_packs_read_input(struct rev_info *revs)
+{
+ struct strbuf buf = STRBUF_INIT;
+ struct strmap packs = STRMAP_INIT;
+ struct packed_git *p;
+
+ while (strbuf_getline(&buf, stdin) != EOF) {
+ struct stdin_pack_info *info;
+ enum stdin_pack_info_kind kind = STDIN_PACK_INCLUDE;
+ const char *key = buf.buf;
+
+ if (!*key)
+ continue;
+ else if (*key == '^')
+ kind = STDIN_PACK_EXCLUDE_CLOSED;
+
+ if (kind != STDIN_PACK_INCLUDE)
+ key++;
+
+ info = strmap_get(&packs, key);
+ if (!info) {
+ CALLOC_ARRAY(info, 1);
+ strmap_put(&packs, key, info);
+ }
+
+ info->kind |= kind;
+
+ strbuf_reset(&buf);
+ }
+
+ repo_for_each_pack(the_repository, p) {
+ struct stdin_pack_info *info;
+
+ info = strmap_get(&packs, pack_basename(p));
+ if (!info)
+ continue;
+
+ if (info->kind & STDIN_PACK_INCLUDE) {
+ if (exclude_promisor_objects && p->pack_promisor)
+ die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
+
+ /*
+ * Arguments we got on stdin may not even be
+ * packs. First check that to avoid segfaulting
+ * later on in e.g. pack_mtime_cmp(), excluded
+ * packs are handled below.
+ */
+ if (!is_pack_valid(p))
+ die(_("packfile %s cannot be accessed"), p->pack_name);
+ }
+
+ if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
+ /*
+ * Marking excluded packs as kept in-core so
+ * that later calls to add_object_entry()
+ * discards any objects that are also found in
+ * excluded packs.
+ */
+ p->pack_keep_in_core = 1;
+ }
+
+ info->p = p;
+ }
+
+ stdin_packs_add_pack_entries(&packs, revs);
+
strbuf_release(&buf);
- string_list_clear(&include_packs, 0);
- string_list_clear(&exclude_packs, 0);
+ strmap_clear(&packs, 1);
}
static void add_unreachable_loose_objects(struct rev_info *revs);
@@ -3972,7 +4017,7 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- read_packs_list_from_stdin(&revs);
+ stdin_packs_read_input(&revs);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
--
2.53.0.724.gb20b077944a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06 ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
` (2 subsequent siblings)
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
Add a test demonstrating a case where geometric repacking fails to
produce a pack with full object closure, thus making it impossible to
write a reachability bitmap.
Mark the test with 'test_expect_failure' for now. The subsequent commit
will explain the precise failure mode, and implement a fix.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index aa2e2e6ad88..77133395b5d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,4 +869,26 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
+test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+ git config repack.midxMustContainCruft false &&
+
+ test_commit reachable &&
+ test_commit unreachable &&
+
+ unreachable="$(git rev-parse HEAD)" &&
+
+ git reset --hard HEAD^ &&
+ git tag -d unreachable &&
+ git reflog expire --all --expire=all &&
+
+ git repack --cruft -d &&
+
+ echo $unreachable | git pack-objects .git/objects/pack/pack &&
+
+ test_commit new &&
+
+ git update-ref refs/heads/other $unreachable &&
+ git repack --geometric=2 -d --write-midx --write-bitmap-index
+'
+
test_done
--
2.53.0.724.gb20b077944a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
` (2 preceding siblings ...)
2026-03-27 20:06 ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16 ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
In cd846bacc7d (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.
When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)
But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.
This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure. When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.
Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.
In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.
In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. As a consequence, the caller
`for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
are able to drop the "if (p)" conditional.
The `include_check` and `include_check_obj` callbacks on `rev_info` are
used to halt the walk at closed-excluded packs, since objects behind a
'^' boundary are guaranteed to have closure and need not be rescued.
The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-pack-objects.adoc | 25 ++++--
builtin/pack-objects.c | 116 ++++++++++++++++++++++------
packfile.c | 3 +-
packfile.h | 2 +
t/t5331-pack-objects-stdin.sh | 105 +++++++++++++++++++++++++
5 files changed, 218 insertions(+), 33 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 71b9682485c..b78175fbe1b 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -94,13 +94,24 @@ base-name::
included packs (those not beginning with `^`), excluding any
objects listed in the excluded packs (beginning with `^`).
+
-When `mode` is "follow", objects from packs not listed on stdin receive
-special treatment. Objects within unlisted packs will be included if
-those objects are (1) reachable from the included packs, and (2) not
-found in any excluded packs. This mode is useful, for example, to
-resurrect once-unreachable objects found in cruft packs to generate
-packs which are closed under reachability up to the boundary set by the
-excluded packs.
+When `mode` is "follow" packs may additionally be prefixed with `!`,
+indicating that they are excluded but not necessarily closed under
+reachability. In addition to objects in included packs, the resulting
+pack may include additional objects based on the following:
++
+--
+* If any packs are marked with `!`, then objects reachable from such
+ packs or included ones via objects outside of excluded-closed packs
+ will be included. In this case, all `^` packs are treated as closed
+ under reachability.
+* Otherwise (if there are no `!` packs), objects within unlisted packs
+ will be included if those objects are (1) reachable from the
+ included packs, and (2) not found in any excluded packs.
+--
++
+This mode is useful, for example, to resurrect once-unreachable
+objects found in cruft packs to generate packs which are closed under
+reachability up to the boundary set by the excluded packs.
+
Incompatible with `--revs`, or options that imply `--revs` (such as
`--all`), with the exception of `--unpacked`, which is compatible.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8ab7ca98a55..d0203e72d68 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -218,6 +218,7 @@ static int have_non_local_packs;
static int incremental;
static int ignore_packed_keep_on_disk;
static int ignore_packed_keep_in_core;
+static int ignore_packed_keep_in_core_open;
static int ignore_packed_keep_in_core_has_cruft;
static int allow_ofs_delta;
static struct pack_idx_option pack_idx_opts;
@@ -1633,7 +1634,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
/*
* Then handle .keep first, as we have a fast(er) path there.
*/
- if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
+ if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core ||
+ ignore_packed_keep_in_core_open) {
/*
* Set the flags for the kept-pack cache to be the ones we want
* to ignore.
@@ -1647,6 +1649,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
flags |= KEPT_PACK_ON_DISK;
if (ignore_packed_keep_in_core)
flags |= KEPT_PACK_IN_CORE;
+ if (ignore_packed_keep_in_core_open)
+ flags |= KEPT_PACK_IN_CORE_OPEN;
/*
* If the object is in a pack that we want to ignore, *and* we
@@ -1658,6 +1662,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
return 0;
if (ignore_packed_keep_in_core && p->pack_keep_in_core)
return 0;
+ if (ignore_packed_keep_in_core_open && p->pack_keep_in_core_open)
+ return 0;
if (has_object_kept_pack(p->repo, oid, flags))
return 0;
} else {
@@ -3757,6 +3763,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
void *_data)
{
off_t ofs;
+ struct object_info oi = OBJECT_INFO_INIT;
enum object_type type = OBJ_NONE;
display_progress(progress_state, ++nr_seen);
@@ -3764,29 +3771,34 @@ static int add_object_entry_from_pack(const struct object_id *oid,
if (have_duplicate_entry(oid, 0))
return 0;
+ stdin_packs_found_nr++;
+
ofs = nth_packed_object_offset(p, pos);
+
+ oi.typep = &type;
+ if (packed_object_info(p, ofs, &oi) < 0) {
+ die(_("could not get type of object %s in pack %s"),
+ oid_to_hex(oid), p->pack_name);
+ } else if (type == OBJ_COMMIT) {
+ struct rev_info *revs = _data;
+ /*
+ * commits in included packs are used as starting points
+ * for the subsequent revision walk
+ *
+ * Note that we do want to walk through commits that are
+ * present in excluded-open ('!') packs to pick up any
+ * objects reachable from them not present in the
+ * excluded-closed ('^') packs.
+ *
+ * However, we'll only add those objects to the packing
+ * list after checking `want_object_in_pack()` below.
+ */
+ add_pending_oid(revs, NULL, oid, 0);
+ }
+
if (!want_object_in_pack(oid, 0, &p, &ofs))
return 0;
- if (p) {
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.typep = &type;
- if (packed_object_info(p, ofs, &oi) < 0) {
- die(_("could not get type of object %s in pack %s"),
- oid_to_hex(oid), p->pack_name);
- } else if (type == OBJ_COMMIT) {
- struct rev_info *revs = _data;
- /*
- * commits in included packs are used as starting points for the
- * subsequent revision walk
- */
- add_pending_oid(revs, NULL, oid, 0);
- }
-
- stdin_packs_found_nr++;
- }
-
create_object_entry(oid, type, 0, 0, 0, p, ofs);
return 0;
@@ -3847,12 +3859,18 @@ static void show_commit_pack_hint(struct commit *commit, void *data)
* - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
* bit set should be excluded from the output pack.
*
- * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
- * used as traversal tips when invoked with --stdin-packs=follow.
+ * - STDIN_PACK_EXCLUDE_OPEN: objects in any packs with this flag
+ * bit set should be excluded from the output pack, but are not
+ * guaranteed to be closed under reachability.
+ *
+ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE or
+ * STDIN_PACK_EXCLUDE_OPEN are used as traversal tips when invoked
+ * with --stdin-packs=follow.
*/
enum stdin_pack_info_kind {
STDIN_PACK_INCLUDE = (1<<0),
STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
+ STDIN_PACK_EXCLUDE_OPEN = (1<<2),
};
struct stdin_pack_info {
@@ -3877,6 +3895,17 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
return 0;
}
+static int stdin_packs_include_check_obj(struct object *obj, void *data UNUSED)
+{
+ return !has_object_kept_pack(to_pack.repo, &obj->oid,
+ KEPT_PACK_IN_CORE);
+}
+
+static int stdin_packs_include_check(struct commit *commit, void *data)
+{
+ return stdin_packs_include_check_obj((struct object *)commit, data);
+}
+
static void stdin_packs_add_pack_entries(struct strmap *packs,
struct rev_info *revs)
{
@@ -3903,7 +3932,19 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
for_each_string_list_item(item, &keys) {
struct stdin_pack_info *info = item->util;
- if (info->kind & STDIN_PACK_INCLUDE)
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * When open-excluded packs ("!") are present, stop
+ * the parent walk at closed-excluded ("^") packs.
+ * Objects behind a "^" boundary are guaranteed to
+ * have closure and should not be rescued.
+ */
+ revs->include_check = stdin_packs_include_check;
+ revs->include_check_obj = stdin_packs_include_check_obj;
+ }
+
+ if ((info->kind & STDIN_PACK_INCLUDE) ||
+ (info->kind & STDIN_PACK_EXCLUDE_OPEN))
for_each_object_in_pack(info->p,
add_object_entry_from_pack,
revs,
@@ -3913,7 +3954,8 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
string_list_clear(&keys, 0);
}
-static void stdin_packs_read_input(struct rev_info *revs)
+static void stdin_packs_read_input(struct rev_info *revs,
+ enum stdin_packs_mode mode)
{
struct strbuf buf = STRBUF_INIT;
struct strmap packs = STRMAP_INIT;
@@ -3928,6 +3970,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
continue;
else if (*key == '^')
kind = STDIN_PACK_EXCLUDE_CLOSED;
+ else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
+ kind = STDIN_PACK_EXCLUDE_OPEN;
if (kind != STDIN_PACK_INCLUDE)
key++;
@@ -3974,6 +4018,20 @@ static void stdin_packs_read_input(struct rev_info *revs)
p->pack_keep_in_core = 1;
}
+ if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
+ /*
+ * Marking excluded open packs as kept in-core
+ * (open) for the same reason as we marked
+ * exclude closed packs as kept in-core.
+ *
+ * Use a separate flag here to ensure we don't
+ * halt our traversal at these packs, since they
+ * are not guaranteed to have closure.
+ *
+ */
+ p->pack_keep_in_core_open = 1;
+ }
+
info->p = p;
}
@@ -4017,7 +4075,15 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
/* avoids adding objects in excluded packs */
ignore_packed_keep_in_core = 1;
- stdin_packs_read_input(&revs);
+ if (mode == STDIN_PACKS_MODE_FOLLOW) {
+ /*
+ * In '--stdin-packs=follow' mode, additionally ignore
+ * objects in excluded-open packs to prevent them from
+ * appearing in the resulting pack.
+ */
+ ignore_packed_keep_in_core_open = 1;
+ }
+ stdin_packs_read_input(&revs, mode);
if (rev_list_unpacked)
add_unreachable_loose_objects(&revs);
diff --git a/packfile.c b/packfile.c
index d4de9f3ffe8..269e420c3b9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2244,7 +2244,8 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st
struct packed_git *p = e->pack;
if ((p->pack_keep && (flags & KEPT_PACK_ON_DISK)) ||
- (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE))) {
+ (p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE)) ||
+ (p->pack_keep_in_core_open && (flags & KEPT_PACK_IN_CORE_OPEN))) {
ALLOC_GROW(packs, nr + 1, alloc);
packs[nr++] = p;
}
diff --git a/packfile.h b/packfile.h
index a16ec3950d2..dd37fc1512e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -28,6 +28,7 @@ struct packed_git {
unsigned pack_local:1,
pack_keep:1,
pack_keep_in_core:1,
+ pack_keep_in_core_open:1,
freshened:1,
do_not_close:1,
pack_promisor:1,
@@ -266,6 +267,7 @@ int packfile_store_freshen_object(struct packfile_store *store,
enum kept_pack_type {
KEPT_PACK_ON_DISK = (1 << 0),
KEPT_PACK_IN_CORE = (1 << 1),
+ KEPT_PACK_IN_CORE_OPEN = (1 << 2),
};
/*
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 7eb79bc2cdb..c74b5861af3 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -415,4 +415,109 @@ test_expect_success '--stdin-packs=follow tolerates missing commits' '
stdin_packs__follow_with_only HEAD HEAD^{tree}
'
+test_expect_success '--stdin-packs=follow with open-excluded packs' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ cd repo &&
+ git config set maintenance.auto false &&
+
+ git branch -M main &&
+
+ # Create the following commit structure:
+ #
+ # A <-- B <-- D (main)
+ # ^
+ # \
+ # C (other)
+ test_commit A &&
+ test_commit B &&
+ git checkout -B other &&
+ test_commit C &&
+ git checkout main &&
+ test_commit D &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+ D="$(echo B..D | git pack-objects --revs $packdir/pack)" &&
+
+ C_ONLY="$(git rev-parse other | git pack-objects $packdir/pack)" &&
+
+ git prune-packed &&
+
+ # Create a pack using --stdin-packs=follow where:
+ #
+ # - pack D is included,
+ # - pack C_ONLY is excluded, but open,
+ # - pack B is excluded, but closed, and
+ # - packs A and C are unknown
+ #
+ # The resulting pack should therefore contain:
+ #
+ # - objects from the included pack D,
+ # - A.t (rescued via D^{tree}), and
+ # - C^{tree} and C.t (rescued via pack C_ONLY)
+ #
+ # , but should omit:
+ #
+ # - C (excluded via C_ONLY),
+ # - objects from pack B (trivially excluded-closed)
+ # - A and A^{tree} (ancestors of B)
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <<-EOF
+ pack-$D.pack
+ !pack-$C_ONLY.pack
+ ^pack-$B.pack
+ EOF
+ ) &&
+
+ {
+ objects_in_packs $D &&
+ git rev-parse A:A.t "C^{tree}" C:C.t
+ } >expect.raw &&
+ sort expect.raw >expect &&
+
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success '--stdin-packs with !-delimited pack without follow' '
+ test_when_finished "rm -fr repo" &&
+
+ git init repo &&
+ (
+ test_commit A &&
+ test_commit B &&
+ test_commit C &&
+
+ A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+ B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+ C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
+
+ cat >in <<-EOF &&
+ !pack-$A.pack
+ pack-$B.pack
+ pack-$C.pack
+ EOF
+
+ # Without --stdin-packs=follow, we treat the first
+ # line of input as a literal packfile name, and thus
+ # expect pack-objects to complain of a missing pack
+ test_must_fail git pack-objects --stdin-packs --stdout \
+ >/dev/null <in 2>err &&
+ test_grep "could not find pack .!pack-$A.pack." err &&
+
+ # With --stdin-packs=follow, we treat the second line
+ # of input as indicating pack-$A.pack is an excluded
+ # open pack, and thus expect pack-objects to succeed
+ P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
+
+ objects_in_packs $B $C >expect &&
+ objects_in_packs $P >actual &&
+ test_cmp expect actual
+ )
+'
+
test_done
--
2.53.0.724.gb20b077944a
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
` (3 preceding siblings ...)
2026-03-27 20:06 ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
@ 2026-03-27 20:06 ` Taylor Blau
2026-03-27 20:16 ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
5 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2026-03-27 20:06 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt,
Derrick Stolee
In 5ee86c273bf (repack: exclude cruft pack(s) from the MIDX where
possible, 2025-06-23), geometric repacking learned to exclude cruft
packs from the MIDX when 'repack.midxMustContainCruft' is set to
'false'.
This works because packs generated with '--stdin-packs=follow' rescue
any once-unreachable objects that later become reachable, making the
resulting packs closed under reachability without needing the cruft pack
in the MIDX.
However, packs above the geometric split that were not part of the
previous MIDX may not have full object closure. When such packs are
marked as excluded-closed ('^'), pack-objects treats them as a
reachability boundary and does not traverse through them during the
follow pass, potentially leaving the resulting pack without full
closure.
Fix this by marking packs above the geometric split that were not in the
previous MIDX as excluded-open ('!') instead of excluded-closed ('^').
This causes pack-objects to walk through their commits during the follow
pass, rescuing any reachable objects not present in the closed-excluded
packs.
Note that MIDXs which were generated prior to this change and are
unlucky enough to not be closed under reachability may still exhibit
this bug, as we treat all MIDX'd packs as closed. That is true in an
overwhelming number of cases, since in order to have a non-closed MIDX
you would have to:
- Generate a pack via an earlier geometric repack that is not closed
under reachability.
- Store that pack in the MIDX.
- Avoid picking any commits to receive reachability bitmaps which
happen to reach objects from which the missing objects are reachable.
In the extremely rare chance that all of the above should happen, an
all-into-one repack will resolve the issue.
Unfortunately, there is no perfect way to determine whether a MIDX'd
pack is closed outside of ensuring that there is a '1' bit in at least
one bitmap for every bit position corresponding to objects in that pack.
While this is possible to do, this approach would treat MIDX'd packs as
open in cases where there is at least one object that is not reachable
from the subset of commits selected for bitmapping.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 19 +++++++++++++++++--
t/t7704-repack-cruft.sh | 2 +-
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index f6bb04bef72..4c5a82c2c8d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -369,8 +369,23 @@ int cmd_repack(int argc,
*/
for (i = 0; i < geometry.split; i++)
fprintf(in, "%s\n", pack_basename(geometry.pack[i]));
- for (i = geometry.split; i < geometry.pack_nr; i++)
- fprintf(in, "^%s\n", pack_basename(geometry.pack[i]));
+ for (i = geometry.split; i < geometry.pack_nr; i++) {
+ const char *basename = pack_basename(geometry.pack[i]);
+ char marker = '^';
+
+ if (!midx_must_contain_cruft &&
+ !string_list_has_string(&existing.midx_packs,
+ basename)) {
+ /*
+ * Assume non-MIDX'd packs are not
+ * necessarily closed under
+ * reachability.
+ */
+ marker = '!';
+ }
+
+ fprintf(in, "%c%s\n", marker, basename);
+ }
fclose(in);
}
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 77133395b5d..9e03b04315d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -869,7 +869,7 @@ test_expect_success 'repack --write-midx includes cruft when already geometric'
)
'
-test_expect_failure 'repack rescues once-cruft objects above geometric split' '
+test_expect_success 'repack rescues once-cruft objects above geometric split' '
git config repack.midxMustContainCruft false &&
test_commit reachable &&
--
2.53.0.724.gb20b077944a
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
` (4 preceding siblings ...)
2026-03-27 20:06 ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
@ 2026-03-27 20:16 ` Derrick Stolee
2026-03-27 20:43 ` Junio C Hamano
5 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2026-03-27 20:16 UTC (permalink / raw)
To: Taylor Blau, git
Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
On 3/27/2026 4:06 PM, Taylor Blau wrote:
> This is another small reroll of my series to fix an issue where MIDX
> bitmaps fail to generate after a geometric repack in certain scenarios
> where the set of MIDX'd objects is not closed under reachability.
>
> The main changes since last time are:
>
> * Named enum stdin_pack_info_kind.
>
> * Refactored how we handle reading incoming packs via stdin.
>
> * Fixed a nasty case where sorting the packs in order of mtime happened
> to work on some systems, but ASan detected a very legitimate bug.
>
> As usual, a range-diff is included below for convenience.
Thanks. This version LGTM.
-Stolee
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
2026-03-27 20:16 ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
@ 2026-03-27 20:43 ` Junio C Hamano
0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2026-03-27 20:43 UTC (permalink / raw)
To: Derrick Stolee
Cc: Taylor Blau, git, Jeff King, Elijah Newren, Patrick Steinhardt
Derrick Stolee <stolee@gmail.com> writes:
> On 3/27/2026 4:06 PM, Taylor Blau wrote:
>> This is another small reroll of my series to fix an issue where MIDX
>> bitmaps fail to generate after a geometric repack in certain scenarios
>> where the set of MIDX'd objects is not closed under reachability.
>>
>> The main changes since last time are:
>>
>> * Named enum stdin_pack_info_kind.
>>
>> * Refactored how we handle reading incoming packs via stdin.
>>
>> * Fixed a nasty case where sorting the packs in order of mtime happened
>> to work on some systems, but ASan detected a very legitimate bug.
>>
>> As usual, a range-diff is included below for convenience.
>
> Thanks. This version LGTM.
>
> -Stolee
Yeah, these look good to me too. Thanks, both.
^ permalink raw reply [flat|nested] 41+ messages in thread