From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Wed, 25 Mar 2026 19:13:56 -0400
Message-ID: <acRsNHna6IJHQNZq@nand.local>
In-Reply-To: <acI_sP6ZEdw-xGpR@pks.im>
On Tue, Mar 24, 2026 at 08:39:28AM +0100, Patrick Steinhardt wrote:
> > Extract the logic for sorting packs by mtime and adding their objects
> > into a separate `stdin_packs_add_entries()` helper.
>
> Right, the ordering was my first question. Interestingly though, that
> function doesn't seem to be added in this commit... ah, it's called
> `stdin_packs_add_pack_entries()`.
Ah, good catch. I had originally called it `stdin_packs_add_entries()`
but renamed it before sending, apparently without adjusting the commit
message appropriately.
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 9a89bc5c4c9..72c9ddbed6b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> > return 0;
> > }
> >
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > + struct packed_git *p;
> > + enum {
> > + STDIN_PACK_INCLUDE = (1<<0),
> > + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>
> It might make sense to provide a sentence for each of the enums to
> explain what they do.
I'm not opposed, but I am not sure what information would be helpful to
add here, since these correspond one-to-one with the three possible
prefixes for packfile names we receive with --stdin-packs.
> > +static void stdin_packs_add_pack_entries(struct strmap *packs,
> > + struct rev_info *revs)
> > +{
> > + struct string_list keys = STRING_LIST_INIT_NODUP;
> > + struct string_list_item *item;
> > + struct hashmap_iter iter;
> > + struct strmap_entry *entry;
> > +
> > + strmap_for_each_entry(packs, &iter, entry) {
> > + struct stdin_pack_info *info = entry->value;
> > + if (!info->p)
> > + die(_("could not find pack '%s'"), entry->key);
> > +
> > + string_list_append(&keys, entry->key)->util = info->p;
> > + }
> > +
> > + /*
> > + * Order packs by ascending mtime; use QSORT directly to access the
> > + * string_list_item's ->util pointer, which string_list_sort() does not
> > + * provide.
> > + */
> > + QSORT(keys.items, keys.nr, pack_mtime_cmp);
>
> Okay. I was briefly wondering whether it would make more sense to use
> `string_list_sort()`, but I guess it doesn't buy us much.
Yeah. This is actually carried forward from the existing implementation,
and uses the separate QSORT() because `string_list_sort()` doesn't
provide access to the `util` field of the items, which we need to sort
by mtime.
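
To illustrate the difference, here is a minimal sketch using stand-in
types (`demo_pack`, `demo_item`, and the `demo_*` functions are
hypothetical, not Git's own definitions). A comparator that receives
whole items can follow `util` to the pack's mtime, while a comparator
keyed on the string alone can only order by name:

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins for Git's packed_git and string_list_item. */
struct demo_pack {
	time_t mtime;
};

struct demo_item {
	const char *string;	/* pack name */
	void *util;		/* points at a struct demo_pack */
};

/* Receives whole items, so it can reach ->util (ascending mtime). */
static int demo_mtime_cmp(const void *_a, const void *_b)
{
	const struct demo_pack *a = ((const struct demo_item *)_a)->util;
	const struct demo_pack *b = ((const struct demo_item *)_b)->util;

	if (a->mtime < b->mtime)
		return -1;
	if (a->mtime > b->mtime)
		return 1;
	return 0;
}

/* Looks only at the names, which is all a string-keyed sort can see. */
static int demo_name_cmp(const void *_a, const void *_b)
{
	const struct demo_item *a = _a;
	const struct demo_item *b = _b;

	return strcmp(a->string, b->string);
}

void demo_sort_by_mtime(struct demo_item *items, size_t nr)
{
	qsort(items, nr, sizeof(*items), demo_mtime_cmp);
}

void demo_sort_by_name(struct demo_item *items, size_t nr)
{
	qsort(items, nr, sizeof(*items), demo_name_cmp);
}
```

Sorting the same array with each function gives two different orders
whenever the names do not happen to track the mtimes.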
> > + for_each_string_list_item(item, &keys) {
> > + struct stdin_pack_info *info = strmap_get(packs, item->string);
>
> We could avoid this extra lookup if you instead were to store the pack
> info in the `item->util` field.
Good idea. Funnily enough, we already assign ->util = info->p in the
loop above, but never use it in the loop below (only pack_mtime_cmp()
reads it, so the comparator would need a matching tweak). Something
like this on top should clean things up nicely:
--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..c9b33d1673d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3859,7 +3859,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 		if (!info->p)
 			die(_("could not find pack '%s'"), entry->key);

-		string_list_append(&keys, entry->key)->util = info->p;
+		string_list_append(&keys, entry->key)->util = info;
 	}

 	/*
@@ -3870,9 +3870,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 	QSORT(keys.items, keys.nr, pack_mtime_cmp);

 	for_each_string_list_item(item, &keys) {
-		struct stdin_pack_info *info = strmap_get(packs, item->string);
-		if (!info->p)
-			die(_("could not find pack '%s'"), item->string);
+		struct stdin_pack_info *info = item->util;

 		if (info->kind & STDIN_PACK_INCLUDE)
 			for_each_object_in_pack(info->p,
--- >8 ---
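
One detail worth keeping in mind with a change of this shape: the
comment above QSORT() says the comparator reads `->util`, so once
`->util` holds the info struct rather than the pack, the comparator
presumably has to dereference one level further. A rough sketch with
stand-in types (every `demo_*` name is hypothetical, and this is not
the code from the patch):

```c
#include <stdlib.h>
#include <time.h>

enum {
	DEMO_PACK_INCLUDE = (1 << 0),
};

/* Hypothetical stand-ins for packed_git and stdin_pack_info. */
struct demo_pack {
	time_t mtime;
};

struct demo_pack_info {
	struct demo_pack *p;
	unsigned kind;
};

struct demo_item {
	const char *string;
	void *util;		/* now points at a struct demo_pack_info */
};

/* Reaches the mtime through info->p instead of through ->util directly. */
static int demo_info_mtime_cmp(const void *_a, const void *_b)
{
	const struct demo_pack_info *a = ((const struct demo_item *)_a)->util;
	const struct demo_pack_info *b = ((const struct demo_item *)_b)->util;

	if (a->p->mtime < b->p->mtime)
		return -1;
	if (a->p->mtime > b->p->mtime)
		return 1;
	return 0;
}

/*
 * Sorts by ascending mtime, then walks the items and reads the info
 * straight from ->util, with no second lookup by name. Returns the
 * number of items flagged for inclusion.
 */
size_t demo_count_included(struct demo_item *items, size_t nr)
{
	size_t i, included = 0;

	qsort(items, nr, sizeof(*items), demo_info_mtime_cmp);

	for (i = 0; i < nr; i++) {
		const struct demo_pack_info *info = items[i].util;

		if (info->kind & DEMO_PACK_INCLUDE)
			included++;
	}
	return included;
}
```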
> > + if (!info->p)
> > + die(_("could not find pack '%s'"), item->string);
>
> This case basically cannot happen as we already `die()` further up,
> right? Should we rather `BUG()` or drop the check completely?
I think we should drop the check completely here. There's no way that
we would have a NULL 'info->p' by this point, given the check that
exists a few lines up.
> > + if (info->kind & STDIN_PACK_INCLUDE)
> > + for_each_object_in_pack(info->p,
> > + add_object_entry_from_pack,
> > + revs,
> > + ODB_FOR_EACH_OBJECT_PACK_ORDER);
> > + }
> > +
> > + string_list_clear(&keys, 0);
> > +}
> > +
> > +static void stdin_packs_read_input(struct rev_info *revs)
> > {
> > struct strbuf buf = STRBUF_INIT;
> > - struct string_list include_packs = STRING_LIST_INIT_DUP;
> > - struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> > - struct string_list_item *item = NULL;
> > + struct strmap packs = STRMAP_INIT;
> > struct packed_git *p;
> >
> > while (strbuf_getline(&buf, stdin) != EOF) {
> > - if (!buf.len)
> > + struct stdin_pack_info *info;
> > + const char *key = buf.buf;
> > +
> > + if (!key || !*key)
>
> The first case of `!key` cannot ever happen as strbufs always have `buf`
> set.
You're right, the left-hand side of the condition is unnecessary; it's
just muscle memory. I'll remove it.
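
For anyone following along, the reason is that an initialized strbuf
points `buf` at a shared empty string instead of NULL. A minimal
sketch of that idea, using a hypothetical `demo_strbuf` rather than
Git's real type:

```c
#include <stddef.h>

/* Shared one-byte buffer holding just the terminating NUL. */
char demo_slopbuf[1];

/* Hypothetical stand-in for Git's struct strbuf. */
struct demo_strbuf {
	size_t len;
	char *buf;
};

/* An empty buffer still has a valid, NUL-terminated ->buf. */
#define DEMO_STRBUF_INIT { 0, demo_slopbuf }

/* With ->buf never NULL, checking the first byte is sufficient. */
int demo_line_is_empty(const struct demo_strbuf *sb)
{
	return !*sb->buf;
}
```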
> > continue;
> >
> > + if (*key == '^')
> > + key++;
> > +
> > + info = strmap_get(&packs, key);
> > + if (!info) {
> > + CALLOC_ARRAY(info, 1);
> > + strmap_put(&packs, key, info);
> > + }
> > +
> > if (*buf.buf == '^')
> > - string_list_append(&exclude_packs, buf.buf + 1);
> > + info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> > else
> > - string_list_append(&include_packs, buf.buf);
> > + info->kind |= STDIN_PACK_INCLUDE;
>
> I was briefly wondering whether we need error handling for the case
> where a pack is marked both as excluded and included. But we didn't have
> it beforehand, either.
Yeah, I think this is a consequence of 752b465c3c0 (pack-objects: fix
error when same packfile is included and excluded, 2023-04-14).
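
As a sketch of why no extra handling is needed with the flag-based
layout: a name that shows up both with and without the '^' prefix
simply ends up as one entry with both bits set. The example below
uses a small fixed-size array in place of a strmap, and all `demo_*`
and `DEMO_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
	DEMO_PACK_INCLUDE        = (1 << 0), /* named without a prefix */
	DEMO_PACK_EXCLUDE_CLOSED = (1 << 1), /* named with a '^' prefix */
};

struct demo_info {
	const char *name;
	unsigned kind;
};

#define DEMO_MAX_PACKS 16

/* Stand-in for a string-keyed map; lookups are a linear scan. */
struct demo_pack_set {
	struct demo_info entries[DEMO_MAX_PACKS];
	size_t nr;
};

static struct demo_info *demo_get_or_add(struct demo_pack_set *set,
					 const char *name)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (!strcmp(set->entries[i].name, name))
			return &set->entries[i];

	assert(set->nr < DEMO_MAX_PACKS);
	set->entries[set->nr].name = name;
	set->entries[set->nr].kind = 0;
	return &set->entries[set->nr++];
}

/*
 * Records one input line and returns the merged flags for the pack it
 * names. Empty lines are skipped and return 0. The name is stored by
 * pointer, so the caller must keep the line alive.
 */
unsigned demo_record_line(struct demo_pack_set *set, const char *line)
{
	const char *key = line;
	struct demo_info *info;

	if (!*key)
		return 0;
	if (*key == '^')
		key++;

	info = demo_get_or_add(set, key);
	if (*line == '^')
		info->kind |= DEMO_PACK_EXCLUDE_CLOSED;
	else
		info->kind |= DEMO_PACK_INCLUDE;
	return info->kind;
}
```

Feeding "pack-a.pack" and then "^pack-a.pack" leaves a single entry
whose `kind` has both bits set, and later code can decide what to do
with that combination.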
> > [snip]
> > +
> > + /*
> > + * Arguments we got on stdin may not even be
> > + * packs. First check that to avoid segfaulting
> > + * later on in e.g. pack_mtime_cmp(), excluded
> > + * packs are handled below.
> > + */
> > + if (!is_pack_valid(p))
> > + die(_("packfile %s cannot be accessed"), p->pack_name);
>
> Hm. Doesn't this change behaviour though? Beforehand, we would have
> checked the packfile for every included pack. Now we only check the
> packfile for every included pack that was yielded by
> `repo_for_each_pack()`. So if an included pack wasn't yielded at all we
> wouldn't notice that it doesn't exist?
>
> I guess an easy fix would be to mark every pack that we have processed
> as seen in the pack info, and then loop over all pack infos a second
> time to verify that we've seen all that we expected to see.
>
> Which you in fact already do :) That post-processing happens in
> `stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
> is set as expected. And if it's not we die with a message that the pack
> wasn't found. Good.
Thanks for double checking.
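
The two-pass shape described above can be sketched as follows. The
types and function names are hypothetical stand-ins, and the sketch
reports the missing name to the caller instead of calling die():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a pack on disk and a name read from stdin. */
struct demo_pack {
	const char *name;
};

struct demo_request {
	const char *name;
	const struct demo_pack *p;	/* NULL until a matching pack is seen */
};

/* Pass 1: walk the packs that exist and attach each to its request. */
void demo_resolve(struct demo_request *reqs, size_t nr_reqs,
		  const struct demo_pack *packs, size_t nr_packs)
{
	size_t i, j;

	for (i = 0; i < nr_packs; i++)
		for (j = 0; j < nr_reqs; j++)
			if (!strcmp(packs[i].name, reqs[j].name))
				reqs[j].p = &packs[i];
}

/*
 * Pass 2: any request still lacking a pack was never yielded by the
 * walk. Returns the first such name, or NULL if all were found.
 */
const char *demo_first_missing(const struct demo_request *reqs, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!reqs[i].p)
			return reqs[i].name;
	return NULL;
}
```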
> This was a bit more demanding to review, but I very much like the
> outcome of this.
Yeah, I really struggled to try and find a productive way to break this
up into smaller changes. But in the end I couldn't find any good splits
that I liked, hence the larger-than-usual patch.
Thanks for reviewing it, I think that it makes the rest of the series a
little more palatable, and the resulting code is easier to reason about
IMHO.
Thanks,
Taylor