All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Tue, 24 Mar 2026 08:39:28 +0100	[thread overview]
Message-ID: <acI_sP6ZEdw-xGpR@pks.im> (raw)
In-Reply-To: <ea6fdbcc46f608c3fbe65298e9ca91faf43a1b16.1773959041.git.me@ttaylorr.com>

On Thu, Mar 19, 2026 at 06:24:20PM -0400, Taylor Blau wrote:
> The '--stdin-packs' mode of pack-objects maintains two separate
> string_lists: one for included packs, and one for excluded packs. Each
> list stores the pack basename as a string and the corresponding
> `packed_git` pointer in its `->util` field.
> 
> This works, but makes it awkward to extend the set of pack "kinds" that
> pack-objects can accept via stdin, since each new kind would need its
> own string_list and duplicated handling. A future commit will want to do
> just this, so prepare for that change by handling the various "kinds" of
> packs specified over stdin in a more generic fashion.
> 
> Namely, replace the two `string_list`s with a single `strmap` keyed on
> the pack basename, with values pointing to a new `struct
> stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
> `kind` bitfield indicating whether the pack was specified as included or
> excluded.

Okay.

> Extract the logic for sorting packs by mtime and adding their objects
> into a separate `stdin_packs_add_entries()` helper.

Right, the ordering was my first question. Interestingly though, that
function doesn't seem to be added in this commit... ah, it's called
`stdin_packs_add_pack_entries()`.

> While we could have used a `string_list`, we must handle the case where
> the same pack is specified more than once. With a `string_list` only, we
> would have to pay a quadratic cost to either (a) insert elements into
> their sorted positions, or (b) a repeated linear search, which is
> accidentally quadratic. For that reason, use a strmap instead.

We could of course just add them and deduplicate in a later step via
`string_list_sort_u()`. But I assume that you want to handle the case
where we have duplicates specially, either by merging the duplicate into
the existing pack info or by aborting.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 9a89bc5c4c9..72c9ddbed6b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
>  		return 0;
>  }
>  
> -static void read_packs_list_from_stdin(struct rev_info *revs)
> +struct stdin_pack_info {
> +	struct packed_git *p;
> +	enum {
> +		STDIN_PACK_INCLUDE = (1<<0),
> +		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),

It might make sense to provide a sentence for each of the enums to
explain what they do.

> +	} kind;
> +};

Okay, this is the new structure that allows us to track the packfile
kind in a way that is more extensible. Makes sense.

> +static void stdin_packs_add_pack_entries(struct strmap *packs,
> +					 struct rev_info *revs)
> +{
> +	struct string_list keys = STRING_LIST_INIT_NODUP;
> +	struct string_list_item *item;
> +	struct hashmap_iter iter;
> +	struct strmap_entry *entry;
> +
> +	strmap_for_each_entry(packs, &iter, entry) {
> +		struct stdin_pack_info *info = entry->value;
> +		if (!info->p)
> +			die(_("could not find pack '%s'"), entry->key);
> +
> +		string_list_append(&keys, entry->key)->util = info->p;
> +	}
> +
> +	/*
> +	 * Order packs by ascending mtime; use QSORT directly to access the
> +	 * string_list_item's ->util pointer, which string_list_sort() does not
> +	 * provide.
> +	 */
> +	QSORT(keys.items, keys.nr, pack_mtime_cmp);

Okay. I was briefly wondering whether it would make more sense to use
`string_list_sort()`, but I guess it doesn't buy us much.

> +	for_each_string_list_item(item, &keys) {
> +		struct stdin_pack_info *info = strmap_get(packs, item->string);

We could avoid this extra lookup if you instead were to store the pack
info in the `item->util` field.

> +		if (!info->p)
> +			die(_("could not find pack '%s'"), item->string);

This case basically cannot happen as we already `die()` further up,
right? Should we rather `BUG()` or drop the check completely?

> +		if (info->kind & STDIN_PACK_INCLUDE)
> +			for_each_object_in_pack(info->p,
> +						add_object_entry_from_pack,
> +						revs,
> +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
> +	}
> +
> +	string_list_clear(&keys, 0);
> +}
> +
> +static void stdin_packs_read_input(struct rev_info *revs)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	struct string_list include_packs = STRING_LIST_INIT_DUP;
> -	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> -	struct string_list_item *item = NULL;
> +	struct strmap packs = STRMAP_INIT;
>  	struct packed_git *p;
>  
>  	while (strbuf_getline(&buf, stdin) != EOF) {
> -		if (!buf.len)
> +		struct stdin_pack_info *info;
> +		const char *key = buf.buf;
> +
> +		if (!key || !*key)

The first case of `!key` cannot ever happen as strbufs always have `buf`
set.

>  			continue;
>  
> +		if (*key == '^')
> +			key++;
> +
> +		info = strmap_get(&packs, key);
> +		if (!info) {
> +			CALLOC_ARRAY(info, 1);
> +			strmap_put(&packs, key, info);
> +		}
> +
>  		if (*buf.buf == '^')
> -			string_list_append(&exclude_packs, buf.buf + 1);
> +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
>  		else
> -			string_list_append(&include_packs, buf.buf);
> +			info->kind |= STDIN_PACK_INCLUDE;

I was briefly wondering whether we need error handling for the case
where a pack is marked both as excluded and included. But we didn't have
it beforehand, either.

>  		strbuf_reset(&buf);
>  	}
>  
> -	string_list_sort_u(&include_packs, 0);
> -	string_list_sort_u(&exclude_packs, 0);
> -
>  	repo_for_each_pack(the_repository, p) {
> -		const char *pack_name = pack_basename(p);
> +		struct stdin_pack_info *info;
>  
> -		if ((item = string_list_lookup(&include_packs, pack_name))) {
> +		info = strmap_get(&packs, pack_basename(p));
> +		if (!info)
> +			continue;
> +
> +		if (info->kind & STDIN_PACK_INCLUDE) {
>  			if (exclude_promisor_objects && p->pack_promisor)
>  				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
> -			item->util = p;
> +
> +			/*
> +			 * Arguments we got on stdin may not even be
> +			 * packs. First check that to avoid segfaulting
> +			 * later on in e.g.  pack_mtime_cmp(), excluded
> +			 * packs are handled below.
> +			 */
> +			if (!is_pack_valid(p))
> +				die(_("packfile %s cannot be accessed"), p->pack_name);

Hm. Doesn't this change behaviour though? Beforehand, we would have
checked the packfile for every included pack. Now we only check the
packfile for every included pack that was yielded by
`repo_for_each_pack()`. So if an included pack wasn't yielded at all we
wouldn't notice that it doesn't exist?

I guess an easy fix would be to mark every pack that we have processed
as seen in the pack info, and then loop over all pack infos a second
time to verify that we've seen all that we expected to see.

Which you in fact already do :) That post-processing happens in
`stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
is set as expected. And if it's not we die with a message that the pack
wasn't found. Good.

This was a bit more demanding to review, but I very much like the
outcome of this.

Patrick

  reply	other threads:[~2026-03-24  7:39 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt [this message]
2026-03-25 23:13     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acI_sP6ZEdw-xGpR@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.