public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Tue, 24 Mar 2026 08:39:28 +0100	[thread overview]
Message-ID: <acI_sP6ZEdw-xGpR@pks.im> (raw)
In-Reply-To: <ea6fdbcc46f608c3fbe65298e9ca91faf43a1b16.1773959041.git.me@ttaylorr.com>

On Thu, Mar 19, 2026 at 06:24:20PM -0400, Taylor Blau wrote:
> The '--stdin-packs' mode of pack-objects maintains two separate
> string_lists: one for included packs, and one for excluded packs. Each
> list stores the pack basename as a string and the corresponding
> `packed_git` pointer in its `->util` field.
> 
> This works, but makes it awkward to extend the set of pack "kinds" that
> pack-objects can accept via stdin, since each new kind would need its
> own string_list and duplicated handling. A future commit will want to do
> just this, so prepare for that change by handling the various "kinds" of
> packs specified over stdin in a more generic fashion.
> 
> Namely, replace the two `string_list`s with a single `strmap` keyed on
> the pack basename, with values pointing to a new `struct
> stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
> `kind` bitfield indicating whether the pack was specified as included or
> excluded.

Okay.

> Extract the logic for sorting packs by mtime and adding their objects
> into a separate `stdin_packs_add_entries()` helper.

Right, the ordering was my first question. Interestingly though, that
function doesn't seem to be added in this commit... ah, it's called
`stdin_packs_add_pack_entries()`.

> While we could have used a `string_list`, we must handle the case where
> the same pack is specified more than once. With a `string_list` only, we
> would have to pay a quadratic cost to either (a) insert elements into
> their sorted positions, or (b) a repeated linear search, which is
> accidentally quadratic. For that reason, use a strmap instead.

We could of course just add them and deduplicate in a later step via
`string_list_sort_u()`. But I assume that you want to handle the case
where we have duplicates specially, either by merging the duplicate into
the existing pack info or by aborting.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 9a89bc5c4c9..72c9ddbed6b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
>  		return 0;
>  }
>  
> -static void read_packs_list_from_stdin(struct rev_info *revs)
> +struct stdin_pack_info {
> +	struct packed_git *p;
> +	enum {
> +		STDIN_PACK_INCLUDE = (1<<0),
> +		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),

It might make sense to provide a sentence for each of the enums to
explain what they do.

> +	} kind;
> +};

Okay, this is the new structure that allows us to track the packfile
kind in a way that is more extensible. Makes sense.

> +static void stdin_packs_add_pack_entries(struct strmap *packs,
> +					 struct rev_info *revs)
> +{
> +	struct string_list keys = STRING_LIST_INIT_NODUP;
> +	struct string_list_item *item;
> +	struct hashmap_iter iter;
> +	struct strmap_entry *entry;
> +
> +	strmap_for_each_entry(packs, &iter, entry) {
> +		struct stdin_pack_info *info = entry->value;
> +		if (!info->p)
> +			die(_("could not find pack '%s'"), entry->key);
> +
> +		string_list_append(&keys, entry->key)->util = info->p;
> +	}
> +
> +	/*
> +	 * Order packs by ascending mtime; use QSORT directly to access the
> +	 * string_list_item's ->util pointer, which string_list_sort() does not
> +	 * provide.
> +	 */
> +	QSORT(keys.items, keys.nr, pack_mtime_cmp);

Okay. I was briefly wondering whether it would make more sense to use
`string_list_sort()`, but I guess it doesn't buy us much.

> +	for_each_string_list_item(item, &keys) {
> +		struct stdin_pack_info *info = strmap_get(packs, item->string);

We could avoid this extra lookup if you instead were to store the pack
info in the `item->util` field.

> +		if (!info->p)
> +			die(_("could not find pack '%s'"), item->string);

This case basically cannot happen as we already `die()` further up,
right? Should we rather `BUG()` or drop the check completely?

> +		if (info->kind & STDIN_PACK_INCLUDE)
> +			for_each_object_in_pack(info->p,
> +						add_object_entry_from_pack,
> +						revs,
> +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
> +	}
> +
> +	string_list_clear(&keys, 0);
> +}
> +
> +static void stdin_packs_read_input(struct rev_info *revs)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	struct string_list include_packs = STRING_LIST_INIT_DUP;
> -	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> -	struct string_list_item *item = NULL;
> +	struct strmap packs = STRMAP_INIT;
>  	struct packed_git *p;
>  
>  	while (strbuf_getline(&buf, stdin) != EOF) {
> -		if (!buf.len)
> +		struct stdin_pack_info *info;
> +		const char *key = buf.buf;
> +
> +		if (!key || !*key)

The first case of `!key` cannot ever happen as strbufs always have `buf`
set.

>  			continue;
>  
> +		if (*key == '^')
> +			key++;
> +
> +		info = strmap_get(&packs, key);
> +		if (!info) {
> +			CALLOC_ARRAY(info, 1);
> +			strmap_put(&packs, key, info);
> +		}
> +
>  		if (*buf.buf == '^')
> -			string_list_append(&exclude_packs, buf.buf + 1);
> +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
>  		else
> -			string_list_append(&include_packs, buf.buf);
> +			info->kind |= STDIN_PACK_INCLUDE;

I was briefly wondering whether we need error handling for the case
where a pack is marked both as excluded and included. But we didn't have
it beforehand, either.

>  		strbuf_reset(&buf);
>  	}
>  
> -	string_list_sort_u(&include_packs, 0);
> -	string_list_sort_u(&exclude_packs, 0);
> -
>  	repo_for_each_pack(the_repository, p) {
> -		const char *pack_name = pack_basename(p);
> +		struct stdin_pack_info *info;
>  
> -		if ((item = string_list_lookup(&include_packs, pack_name))) {
> +		info = strmap_get(&packs, pack_basename(p));
> +		if (!info)
> +			continue;
> +
> +		if (info->kind & STDIN_PACK_INCLUDE) {
>  			if (exclude_promisor_objects && p->pack_promisor)
>  				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
> -			item->util = p;
> +
> +			/*
> +			 * Arguments we got on stdin may not even be
> +			 * packs. First check that to avoid segfaulting
> +			 * later on in e.g.  pack_mtime_cmp(), excluded
> +			 * packs are handled below.
> +			 */
> +			if (!is_pack_valid(p))
> +				die(_("packfile %s cannot be accessed"), p->pack_name);

Hm. Doesn't this change behaviour though? Beforehand, we would have
checked the packfile for every included pack. Now we only check the
packfile for every included pack that was yielded by
`repo_for_each_pack()`. So if an included pack wasn't yielded at all we
wouldn't notice that it doesn't exist?

I guess an easy fix would be to mark every pack that we have processed
as seen in the pack info, and then loop over all pack infos a second
time to verify that we've seen all that we expected to see.

Which you in fact already do :) That post-processing happens in
`stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
is set as expected. And if it's not we die with a message that the pack
wasn't found. Good.

This was a bit more demanding to review, but I very much like the
outcome of this.

Patrick

  reply	other threads:[~2026-03-24  7:39 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt [this message]
2026-03-25 23:13     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acI_sP6ZEdw-xGpR@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox