All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
Date: Wed, 25 Mar 2026 19:13:56 -0400	[thread overview]
Message-ID: <acRsNHna6IJHQNZq@nand.local> (raw)
In-Reply-To: <acI_sP6ZEdw-xGpR@pks.im>

On Tue, Mar 24, 2026 at 08:39:28AM +0100, Patrick Steinhardt wrote:
> > Extract the logic for sorting packs by mtime and adding their objects
> > into a separate `stdin_packs_add_entries()` helper.
>
> Right, the ordering was my first question. Interestingly though, that
> function doesn't seem to be added in this commit... ah, it's called
> `stdin_packs_add_pack_entries()`.

Ah, good catch. I had originally called it `stdin_packs_add_entries()`
but renamed it before sending, apparently without adjusting the commit
message appropriately.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 9a89bc5c4c9..72c9ddbed6b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> >  		return 0;
> >  }
> >
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > +	struct packed_git *p;
> > +	enum {
> > +		STDIN_PACK_INCLUDE = (1<<0),
> > +		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>
> It might make sense to provide a sentence for each of the enums to
> explain what they do.

I'm not opposed, but I am not sure what information would be helpful to
add here, since these correspond one-to-one with the three possible
prefixes for packfile names we receive with --stdin-packs.

> > +static void stdin_packs_add_pack_entries(struct strmap *packs,
> > +					 struct rev_info *revs)
> > +{
> > +	struct string_list keys = STRING_LIST_INIT_NODUP;
> > +	struct string_list_item *item;
> > +	struct hashmap_iter iter;
> > +	struct strmap_entry *entry;
> > +
> > +	strmap_for_each_entry(packs, &iter, entry) {
> > +		struct stdin_pack_info *info = entry->value;
> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), entry->key);
> > +
> > +		string_list_append(&keys, entry->key)->util = info->p;
> > +	}
> > +
> > +	/*
> > +	 * Order packs by ascending mtime; use QSORT directly to access the
> > +	 * string_list_item's ->util pointer, which string_list_sort() does not
> > +	 * provide.
> > +	 */
> > +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
>
> Okay. I was briefly wondering whether it would make more sense to use
> `string_list_sort()`, but I guess it doesn't buy us much.

Yeah. This is actually carried forward from the existing implementation,
and uses the separate QSORT() because `string_list_sort()` doesn't
provide access to the `util` field of the items, which we need to sort
by mtime.

> > +	for_each_string_list_item(item, &keys) {
> > +		struct stdin_pack_info *info = strmap_get(packs, item->string);
>
> We could avoid this extra lookup if you instead were to store the pack
> info in the `item->util` field.

Good idea. Funnily enough, we already assign ->util = info->p in the
loop above, but never use it. Something like this on top should clean
things up nicely:

--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..c9b33d1673d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3859,7 +3859,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 		if (!info->p)
 			die(_("could not find pack '%s'"), entry->key);

-		string_list_append(&keys, entry->key)->util = info->p;
+		string_list_append(&keys, entry->key)->util = info;
 	}

 	/*
@@ -3870,9 +3870,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 	QSORT(keys.items, keys.nr, pack_mtime_cmp);

 	for_each_string_list_item(item, &keys) {
-		struct stdin_pack_info *info = strmap_get(packs, item->string);
-		if (!info->p)
-			die(_("could not find pack '%s'"), item->string);
+		struct stdin_pack_info *info = item->util;

 		if (info->kind & STDIN_PACK_INCLUDE)
 			for_each_object_in_pack(info->p,
--- >8 ---

> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), item->string);
>
> This case basically cannot happen as we already `die()` further up,
> right? Should we rather `BUG()` or drop the check completely?

I think we should drop the check completely here, there's no way that we
would have a NULL 'info->p' by this point with the check that exists a
few lines up.

> > +		if (info->kind & STDIN_PACK_INCLUDE)
> > +			for_each_object_in_pack(info->p,
> > +						add_object_entry_from_pack,
> > +						revs,
> > +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
> > +	}
> > +
> > +	string_list_clear(&keys, 0);
> > +}
> > +
> > +static void stdin_packs_read_input(struct rev_info *revs)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	struct string_list include_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list_item *item = NULL;
> > +	struct strmap packs = STRMAP_INIT;
> >  	struct packed_git *p;
> >
> >  	while (strbuf_getline(&buf, stdin) != EOF) {
> > -		if (!buf.len)
> > +		struct stdin_pack_info *info;
> > +		const char *key = buf.buf;
> > +
> > +		if (!key || !*key)
>
> The first case of `!key` cannot ever happen as strbufs always have `buf`
> set.

You're right, this is just muscle memory, but the left-hand side of the
condition is unnecessary. I'll remove it.

> >  			continue;
> >
> > +		if (*key == '^')
> > +			key++;
> > +
> > +		info = strmap_get(&packs, key);
> > +		if (!info) {
> > +			CALLOC_ARRAY(info, 1);
> > +			strmap_put(&packs, key, info);
> > +		}
> > +
> >  		if (*buf.buf == '^')
> > -			string_list_append(&exclude_packs, buf.buf + 1);
> > +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> >  		else
> > -			string_list_append(&include_packs, buf.buf);
> > +			info->kind |= STDIN_PACK_INCLUDE;
>
> I was briefly wondering whether we need error handling for the case
> where a pack is marked both as excluded and included. But we didn't have
> it beforehand, either.

Yeah, I think this is a consequence of 752b465c3c0 (pack-objects: fix
error when same packfile is included and excluded, 2023-04-14).

> > [snip]
> > +
> > +			/*
> > +			 * Arguments we got on stdin may not even be
> > +			 * packs. First check that to avoid segfaulting
> > +			 * later on in e.g.  pack_mtime_cmp(), excluded
> > +			 * packs are handled below.
> > +			 */
> > +			if (!is_pack_valid(p))
> > +				die(_("packfile %s cannot be accessed"), p->pack_name);
>
> Hm. Doesn't this change behaviour though? Beforehand, we would have
> checked the packfile for every included pack. Now we only check the
> packfile for every included pack that was yielded by
> `repo_for_each_pack()`. So if an included pack wasn't yielded at all we
> wouldn't notice that it doesn't exist?
>
> I guess an easy fix would be to mark every pack that we have processed
> as seen in the pack info, and then loop over all pack infos a second
> time to verify that we've seen all that we expected to see.
>
> Which you in fact already do :) That post-processing happens in
> `stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
> is set as expected. And if it's not we die with a message that the pack
> wasn't found. Good.

Thanks for double checking.

> This was a bit more demanding to review, but I very much like the
> outcome of this.

Yeah, I really struggled to try and find a productive way to break this
up into smaller changes. But in the end I couldn't find any good splits
that I liked, hence the larger-than-usual patch.

Thanks for reviewing it, I think that it makes the rest of the series a
little more palatable, and the resulting code is easier to reason about
IMHO.

Thanks,
Taylor

  reply	other threads:[~2026-03-25 23:13 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:13     ` Taylor Blau [this message]
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acRsNHna6IJHQNZq@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.