public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Wed, 25 Mar 2026 19:13:56 -0400	[thread overview]
Message-ID: <acRsNHna6IJHQNZq@nand.local> (raw)
In-Reply-To: <acI_sP6ZEdw-xGpR@pks.im>

On Tue, Mar 24, 2026 at 08:39:28AM +0100, Patrick Steinhardt wrote:
> > Extract the logic for sorting packs by mtime and adding their objects
> > into a separate `stdin_packs_add_entries()` helper.
>
> Right, the ordering was my first question. Interestingly though, that
> function doesn't seem to be added in this commit... ah, it's called
> `stdin_packs_add_pack_entries()`.

Ah, good catch. I had originally called it `stdin_packs_add_entries()`
but renamed it before sending, apparently without adjusting the commit
message appropriately.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 9a89bc5c4c9..72c9ddbed6b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> >  		return 0;
> >  }
> >
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > +	struct packed_git *p;
> > +	enum {
> > +		STDIN_PACK_INCLUDE = (1<<0),
> > +		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>
> It might make sense to provide a sentence for each of the enums to
> explain what they do.

I'm not opposed, but I am not sure what information would be helpful to
add here, since these correspond one-to-one with the three possible
prefixes for packfile names we receive with --stdin-packs.

> > +static void stdin_packs_add_pack_entries(struct strmap *packs,
> > +					 struct rev_info *revs)
> > +{
> > +	struct string_list keys = STRING_LIST_INIT_NODUP;
> > +	struct string_list_item *item;
> > +	struct hashmap_iter iter;
> > +	struct strmap_entry *entry;
> > +
> > +	strmap_for_each_entry(packs, &iter, entry) {
> > +		struct stdin_pack_info *info = entry->value;
> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), entry->key);
> > +
> > +		string_list_append(&keys, entry->key)->util = info->p;
> > +	}
> > +
> > +	/*
> > +	 * Order packs by ascending mtime; use QSORT directly to access the
> > +	 * string_list_item's ->util pointer, which string_list_sort() does not
> > +	 * provide.
> > +	 */
> > +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
>
> Okay. I was briefly wondering whether it would make more sense to use
> `string_list_sort()`, but I guess it doesn't buy us much.

Yeah. This is actually carried forward from the existing implementation,
and uses the separate QSORT() because `string_list_sort()` doesn't
provide access to the `util` field of the items, which we need to sort
by mtime.

> > +	for_each_string_list_item(item, &keys) {
> > +		struct stdin_pack_info *info = strmap_get(packs, item->string);
>
> We could avoid this extra lookup if you instead were to store the pack
> info in the `item->util` field.

Good idea. Funnily enough, we already assign ->util = info->p in the
loop above, but never use it. Something like this on top should clean
things up nicely:

--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..c9b33d1673d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3859,7 +3859,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 		if (!info->p)
 			die(_("could not find pack '%s'"), entry->key);

-		string_list_append(&keys, entry->key)->util = info->p;
+		string_list_append(&keys, entry->key)->util = info;
 	}

 	/*
@@ -3870,9 +3870,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 	QSORT(keys.items, keys.nr, pack_mtime_cmp);

 	for_each_string_list_item(item, &keys) {
-		struct stdin_pack_info *info = strmap_get(packs, item->string);
-		if (!info->p)
-			die(_("could not find pack '%s'"), item->string);
+		struct stdin_pack_info *info = item->util;

 		if (info->kind & STDIN_PACK_INCLUDE)
 			for_each_object_in_pack(info->p,
--- >8 ---

> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), item->string);
>
> This case basically cannot happen as we already `die()` further up,
> right? Should we rather `BUG()` or drop the check completely?

I think we should drop the check completely here, there's no way that we
would have a NULL 'info->p' by this point with the check that exists a
few lines up.

> > +		if (info->kind & STDIN_PACK_INCLUDE)
> > +			for_each_object_in_pack(info->p,
> > +						add_object_entry_from_pack,
> > +						revs,
> > +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
> > +	}
> > +
> > +	string_list_clear(&keys, 0);
> > +}
> > +
> > +static void stdin_packs_read_input(struct rev_info *revs)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	struct string_list include_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list_item *item = NULL;
> > +	struct strmap packs = STRMAP_INIT;
> >  	struct packed_git *p;
> >
> >  	while (strbuf_getline(&buf, stdin) != EOF) {
> > -		if (!buf.len)
> > +		struct stdin_pack_info *info;
> > +		const char *key = buf.buf;
> > +
> > +		if (!key || !*key)
>
> The first case of `!key` cannot ever happen as strbufs always have `buf`
> set.

You're right, this is just muscle memory, but the left-hand side of the
condition is unnecessary. I'll remove it.

> >  			continue;
> >
> > +		if (*key == '^')
> > +			key++;
> > +
> > +		info = strmap_get(&packs, key);
> > +		if (!info) {
> > +			CALLOC_ARRAY(info, 1);
> > +			strmap_put(&packs, key, info);
> > +		}
> > +
> >  		if (*buf.buf == '^')
> > -			string_list_append(&exclude_packs, buf.buf + 1);
> > +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> >  		else
> > -			string_list_append(&include_packs, buf.buf);
> > +			info->kind |= STDIN_PACK_INCLUDE;
>
> I was briefly wondering whether we need error handling for the case
> where a pack is marked both as excluded and included. But we didn't have
> it beforehand, either.

Yeah, I think this is a consequence of 752b465c3c0 (pack-objects: fix
error when same packfile is included and excluded, 2023-04-14).

> > [snip]
> > +
> > +			/*
> > +			 * Arguments we got on stdin may not even be
> > +			 * packs. First check that to avoid segfaulting
> > +			 * later on in e.g.  pack_mtime_cmp(), excluded
> > +			 * packs are handled below.
> > +			 */
> > +			if (!is_pack_valid(p))
> > +				die(_("packfile %s cannot be accessed"), p->pack_name);
>
> Hm. Doesn't this change behaviour though? Beforehand, we would have
> checked the packfile for every included pack. Now we only check the
> packfile for every included pack that was yielded by
> `repo_for_each_pack()`. So if an included pack wasn't yielded at all we
> wouldn't notice that it doesn't exist?
>
> I guess an easy fix would be to mark every pack that we have processed
> as seen in the pack info, and then loop over all pack infos a second
> time to verify that we've seen all that we expected to see.
>
> Which you in fact already do :) That post-processing happens in
> `stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
> is set as expected. And if it's not we die with a message that the pack
> wasn't found. Good.

Thanks for double checking.

> This was a bit more demanding to review, but I very much like the
> outcome of this.

Yeah, I really struggled to try and find a productive way to break this
up into smaller changes. But in the end I couldn't find any good splits
that I liked, hence the larger-than-usual patch.

Thanks for reviewing it, I think that it makes the rest of the series a
little more palatable, and the resulting code is easier to reason about
IMHO.

Thanks,
Taylor

  reply	other threads:[~2026-03-25 23:13 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:13     ` Taylor Blau [this message]
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acRsNHna6IJHQNZq@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox