git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
Date: Tue, 15 Jul 2025 10:27:06 +0200	[thread overview]
Message-ID: <aHYQ2m2_uv8L8ZS9@pks.im> (raw)
In-Reply-To: <aHBTLFvL+eQcw6J0@nand.local>

On Thu, Jul 10, 2025 at 07:56:28PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 9bbf032b6dd..5956df5d927 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> >  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> > +	struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
> 
> Is the first source always guaranteed to be the local one? I assume that the
> answer here is "yes", and there are certainly other places in the code
> where we make a similar assumption. But just wanted to make sure as
> this popped into my head while reading.
> 
> >  	strbuf_addf(&buf, "%s.pack", base_name);
> > -	if (m && midx_contains_pack(m, buf.buf))
> > +	if (m && m->local && midx_contains_pack(m, buf.buf))
> 
> ...hmm, maybe not?

The answer is "almost always". The only exception is in case there is a
temporary object source added via `odb_set_temporary_primary_source()`,
for example for quarantine directories.

Generally speaking this patch here doesn't really change anything, and
we could just as well drop the `m->local` part above. But I think that
overall we're not doing a good job to track the local object source, so
it felt safer to me to add the above guard.

> > diff --git a/midx-write.c b/midx-write.c
> > index f2cfb85476e..c1ae62d3549 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> >  static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> >  							const char *object_dir)
> >  {
> > -	struct multi_pack_index *result = NULL;
> > -	struct multi_pack_index *cur;
> > -	char *obj_dir_real = real_pathdup(object_dir, 1);
> > -	struct strbuf cur_path_real = STRBUF_INIT;
> > -
> > -	/* Ensure the given object_dir is local, or a known alternate. */
> > -	odb_find_source(r->objects, obj_dir_real);
> > -
> > -	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> > -		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> > -		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> > -			result = cur;
> > -			goto cleanup;
> > -		}
> > -	}
> > -
> > -cleanup:
> > -	free(obj_dir_real);
> > -	strbuf_release(&cur_path_real);
> > -	return result;
> > +	struct odb_source *source = odb_find_source(r->objects, object_dir);
> > +	return get_multi_pack_index(source);
> 
> When I first read this I wondered what would happen if we passed in an
> unknown object_dir such that odb_find_source() returned NULL. But that
> function will never return NULL, and instead will die() if given a bogus
> object_dir.
> 
> So this is fine, though I would have imagined that we'd return NULL
> within odb_find_source() and let the caller die() (or not).

Fully agreed. I've got a follow-up patch series that does this
refactoring.

> > diff --git a/object-name.c b/object-name.c
> > index ddafe7f9b13..1e7fdcb90a8 100644
> > --- a/object-name.c
> > +++ b/object-name.c
> > @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
> >
> >  static void find_short_packed_object(struct disambiguate_state *ds)
> >  {
> > -	struct multi_pack_index *m;
> >  	struct packed_git *p;
> >
> >  	/* Skip, unless oids from the storage hash algorithm are wanted */
> >  	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> >  		return;
> >
> > -	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> > -	     m = m->next)
> > -		unique_in_midx(m, ds);
> > +	odb_prepare_alternates(ds->repo->objects);
> > +	for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> > +		struct multi_pack_index *m = get_multi_pack_index(source);
> > +		if (m)
> > +			unique_in_midx(m, ds);
> > +	}
> > +
> 
> Makes sense, though now having seen this pattern a few times, I am
> wondering if it would be worth it to add a utility function that takes a
> callback and iterates over the various MIDXs. But perhaps that is taking
> DRY-ing things up a little too far ;-).
> 
> For what it's worth, I do think that what you wrote here makes more
> logical sense: MIDXs are tied to individual alternate object DBs, which
> here means that there is one MIDX per "struct odb_source". It just is a
> little more verbose to type out.

I was wondering whether it would make sense to add a looping macro, but
I ultimately decided against it as it doesn't make too much of a
difference to really matter. I wouldn't mind adding it though if you or
others feel strongly about it.

Patrick

  reply	other threads:[~2025-07-15  8:27 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-09  7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-10 20:56   ` Justin Tobler
2025-07-10 23:10   ` Taylor Blau
2025-07-10 23:19     ` Taylor Blau
2025-07-15  8:26       ` Patrick Steinhardt
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-10 21:07   ` Justin Tobler
2025-07-10 23:14   ` Taylor Blau
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-10 21:31   ` Justin Tobler
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-10 23:22   ` Taylor Blau
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
2025-07-10 22:16   ` Justin Tobler
2025-07-10 23:26     ` Taylor Blau
2025-07-15  8:27       ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-10 22:35   ` Justin Tobler
2025-07-10 23:56   ` Taylor Blau
2025-07-15  8:27     ` Patrick Steinhardt [this message]
2025-07-09  7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-10 22:48   ` Justin Tobler
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
2025-07-10 23:58   ` Taylor Blau
2025-07-15  8:27     ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-15 21:59   ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
2025-07-23 21:22   ` Junio C Hamano
2025-07-24  8:00     ` Patrick Steinhardt
2025-08-12 21:58       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aHYQ2m2_uv8L8ZS9@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).