From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Tue, 24 Mar 2026 08:39:28 +0100 [thread overview]
Message-ID: <acI_sP6ZEdw-xGpR@pks.im> (raw)
In-Reply-To: <ea6fdbcc46f608c3fbe65298e9ca91faf43a1b16.1773959041.git.me@ttaylorr.com>
On Thu, Mar 19, 2026 at 06:24:20PM -0400, Taylor Blau wrote:
> The '--stdin-packs' mode of pack-objects maintains two separate
> string_lists: one for included packs, and one for excluded packs. Each
> list stores the pack basename as a string and the corresponding
> `packed_git` pointer in its `->util` field.
>
> This works, but makes it awkward to extend the set of pack "kinds" that
> pack-objects can accept via stdin, since each new kind would need its
> own string_list and duplicated handling. A future commit will want to do
> just this, so prepare for that change by handling the various "kinds" of
> packs specified over stdin in a more generic fashion.
>
> Namely, replace the two `string_list`s with a single `strmap` keyed on
> the pack basename, with values pointing to a new `struct
> stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
> `kind` bitfield indicating whether the pack was specified as included or
> excluded.
Okay.
> Extract the logic for sorting packs by mtime and adding their objects
> into a separate `stdin_packs_add_entries()` helper.
Right, the ordering was my first question. Interestingly though, that
function doesn't seem to be added in this commit... ah, it's called
`stdin_packs_add_pack_entries()`.
> While we could have used a `string_list`, we must handle the case where
> the same pack is specified more than once. With a `string_list` only, we
> would have to pay a quadratic cost to either (a) insert elements into
> their sorted positions, or (b) a repeated linear search, which is
> accidentally quadratic. For that reason, use a strmap instead.
We could of course just add them and deduplicate in a later step via
`string_list_sort_u()`. But I assume that you want to handle the case
where we have duplicates specially, either by merging the duplicate into
the existing pack info or by aborting.
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 9a89bc5c4c9..72c9ddbed6b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> return 0;
> }
>
> -static void read_packs_list_from_stdin(struct rev_info *revs)
> +struct stdin_pack_info {
> + struct packed_git *p;
> + enum {
> + STDIN_PACK_INCLUDE = (1<<0),
> + STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
It might make sense to provide a sentence for each of the enums to
explain what they do.
> + } kind;
> +};
Okay, this is the new structure that allows us to track the packfile
kind in a way that is more extensible. Makes sense.
> +static void stdin_packs_add_pack_entries(struct strmap *packs,
> + struct rev_info *revs)
> +{
> + struct string_list keys = STRING_LIST_INIT_NODUP;
> + struct string_list_item *item;
> + struct hashmap_iter iter;
> + struct strmap_entry *entry;
> +
> + strmap_for_each_entry(packs, &iter, entry) {
> + struct stdin_pack_info *info = entry->value;
> + if (!info->p)
> + die(_("could not find pack '%s'"), entry->key);
> +
> + string_list_append(&keys, entry->key)->util = info->p;
> + }
> +
> + /*
> + * Order packs by ascending mtime; use QSORT directly to access the
> + * string_list_item's ->util pointer, which string_list_sort() does not
> + * provide.
> + */
> + QSORT(keys.items, keys.nr, pack_mtime_cmp);
Okay. I was briefly wondering whether it would make more sense to use
`string_list_sort()`, but I guess it doesn't buy us much.
> + for_each_string_list_item(item, &keys) {
> + struct stdin_pack_info *info = strmap_get(packs, item->string);
We could avoid this extra lookup if you instead were to store the pack
info in the `item->util` field.
> + if (!info->p)
> + die(_("could not find pack '%s'"), item->string);
This case basically cannot happen as we already `die()` further up,
right? Should we rather `BUG()` or drop the check completely?
> + if (info->kind & STDIN_PACK_INCLUDE)
> + for_each_object_in_pack(info->p,
> + add_object_entry_from_pack,
> + revs,
> + ODB_FOR_EACH_OBJECT_PACK_ORDER);
> + }
> +
> + string_list_clear(&keys, 0);
> +}
> +
> +static void stdin_packs_read_input(struct rev_info *revs)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct string_list include_packs = STRING_LIST_INIT_DUP;
> - struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> - struct string_list_item *item = NULL;
> + struct strmap packs = STRMAP_INIT;
> struct packed_git *p;
>
> while (strbuf_getline(&buf, stdin) != EOF) {
> - if (!buf.len)
> + struct stdin_pack_info *info;
> + const char *key = buf.buf;
> +
> + if (!key || !*key)
The first case of `!key` cannot ever happen as strbufs always have `buf`
set.
> continue;
>
> + if (*key == '^')
> + key++;
> +
> + info = strmap_get(&packs, key);
> + if (!info) {
> + CALLOC_ARRAY(info, 1);
> + strmap_put(&packs, key, info);
> + }
> +
> if (*buf.buf == '^')
> - string_list_append(&exclude_packs, buf.buf + 1);
> + info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> else
> - string_list_append(&include_packs, buf.buf);
> + info->kind |= STDIN_PACK_INCLUDE;
I was briefly wondering whether we need error handling for the case
where a pack is marked both as excluded and included. But we didn't have
it beforehand, either.
> strbuf_reset(&buf);
> }
>
> - string_list_sort_u(&include_packs, 0);
> - string_list_sort_u(&exclude_packs, 0);
> -
> repo_for_each_pack(the_repository, p) {
> - const char *pack_name = pack_basename(p);
> + struct stdin_pack_info *info;
>
> - if ((item = string_list_lookup(&include_packs, pack_name))) {
> + info = strmap_get(&packs, pack_basename(p));
> + if (!info)
> + continue;
> +
> + if (info->kind & STDIN_PACK_INCLUDE) {
> if (exclude_promisor_objects && p->pack_promisor)
> die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
> - item->util = p;
> +
> + /*
> + * Arguments we got on stdin may not even be
> + * packs. First check that to avoid segfaulting
> + * later on in e.g. pack_mtime_cmp(), excluded
> + * packs are handled below.
> + */
> + if (!is_pack_valid(p))
> + die(_("packfile %s cannot be accessed"), p->pack_name);
Hm. Doesn't this change behaviour though? Beforehand, we would have
checked the packfile for every included pack. Now we only check the
packfile for every included pack that was yielded by
`repo_for_each_pack()`. So if an included pack wasn't yielded at all we
wouldn't notice that it doesn't exist?
I guess an easy fix would be to mark every pack that we have processed
as seen in the pack info, and then loop over all pack infos a second
time to verify that we've seen all that we expected to see.
Which you in fact already do :) That post-processing happens in
`stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
is set as expected. And if it's not we die with a message that the pack
wasn't found. Good.
This was a bit more demanding to review, but I very much like the
outcome of this.
Patrick
next prev parent reply other threads:[~2026-03-24 7:39 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24 7:39 ` Patrick Steinhardt
2026-03-25 23:03 ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24 7:39 ` Patrick Steinhardt [this message]
2026-03-25 23:13 ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57 ` Jeff King
2026-03-22 18:09 ` Taylor Blau
2026-03-25 23:19 ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40 ` Derrick Stolee
2026-03-26 21:44 ` Taylor Blau
2026-03-26 22:11 ` Junio C Hamano
2026-03-26 22:32 ` Taylor Blau
2026-03-27 0:29 ` Derrick Stolee
2026-03-27 17:51 ` Taylor Blau
2026-03-27 18:34 ` Derrick Stolee
2026-03-27 15:52 ` Junio C Hamano
2026-03-26 22:37 ` Taylor Blau
2026-03-25 23:51 ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51 ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48 ` Derrick Stolee
2026-03-25 23:51 ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49 ` Derrick Stolee
2026-03-26 21:44 ` Taylor Blau
2026-03-26 20:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46 ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06 ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06 ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06 ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16 ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=acI_sP6ZEdw-xGpR@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox