From: Taylor Blau <me@ttaylorr.com>
To: Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible
Date: Tue, 15 Apr 2025 16:51:59 -0400
Message-ID: <Z/7G73FmI/bK1Jti@nand.local>
In-Reply-To: <CABPp-BHwewWMp1CY4jAr=sWwxE9of2Q73nyXQ5wpOELJZQ1OPg@mail.gmail.com>

On Mon, Apr 14, 2025 at 08:11:22PM -0700, Elijah Newren wrote:
> On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> > MIDX with '--write-midx' to ensure that the resulting MIDX was always
> > closed under reachability in order to generate reachability bitmaps.
> >
> > Suppose you have a once-unreachable object packed in a cruft pack, which
> > later on becomes reachable from one or more objects in a geometrically
> > repacked pack. That once-unreachable object *won't* appear in the new
> > pack, since the cruft pack was specified as neither included nor
> > excluded to 'pack-objects --stdin-packs'.
>
> I believe you are talking about the state before your series (i.e.,
> this is carrying on from the previous paragraph), but it reads as
> though you are talking about the state after the first seven patches
> of this series. Some kind of connection wording to clarify would
> really help here.

Sure.

> > If the bitmap selection
> > process picks one or more commits which reach the once-unreachable
> > objects, commit ddee3703b3 ensures that the MIDX will be closed under
> > reachability. Without it, we would fail to generate a MIDX bitmap.
>
> After reading this part, I had to go back and re-read and figure out
> what point in time everything was referring to.

Yeah, this is confusing to me too after reading it back. I made some
tweaks that I think clarify things.

> > ddee3703b3 alludes to the fact that this is sub-optimal by saying
> >
> >     [...] it's desirable to avoid including cruft packs in the MIDX
> >     because it causes the MIDX to store a bunch of objects which are
> >     likely to get thrown away.
> >
> > , which is true, but hides an even larger problem. If repositories
> > rarely prune their unreachable objects and/or have many of them, the
> > MIDX must keep track of a large number of objects which bloats the MIDX
> > and slows down object lookup.
> >
> > This is doubly unfortunate because the vast majority of objects in cruft
> > pack(s) are unlikely to be read, but object reads that go through the
> > MIDX have to search through them anyway.
>
> "have to search through them"? That could be read to suggest those
> individual objects are read, rather than just traversed over. Maybe
> "...unlikely to be read, so the enlarged MIDX is for mostly tracking
> known-to-likely-be-irrelevant objects", or something like that?

Thanks for pointing that out. I clarified this one as well.

> > This patch causes geometrically-repacked packs to contain a copy of any
> > once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
> > allowing us to avoid including any cruft packs in the MIDX. This is
> > because a sequence of geometrically-repacked packs that were all
> > generated with '--stdin-packs=follow' are guaranteed to have their union
> > be closed under reachability.
> >
> > Note that you cannot guarantee that a collection of packs is closed
> > under reachability if not all of them were generated with following as
>
> maybe: ...with "follow" as above. "follow" or "following" feels like
> it needs quotes so the reader understands it's meant as the name of a
> mode, rather than a verb in the sentence.
>
> > above. One tell-tale sign that not all geometrically-repacked packs in
> > the MIDX were generated with following is to see if there is a pack in
>
> same here with "following"...and below.

Great calls on both, thanks.

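To make the "closed under reachability" property discussed above
concrete, here is a toy sketch in C. It is only an illustration under
simplifying assumptions: objects are array indices, and
`struct toy_object`, `MAX_REFS`, and `is_closed()` are hypothetical
names that do not exist in Git. It checks that every object referenced
from a member of a set is itself a member of that set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: an object is an index, refs[] lists the
 * indices of the objects it points at. Not Git's data structures. */
#define MAX_REFS 4

struct toy_object {
	size_t nr_refs;
	int refs[MAX_REFS];
};

/*
 * Return true if the set described by in_set[] is closed under
 * reachability, i.e. every object referenced by a member of the set
 * is also a member of the set.
 */
static bool is_closed(const struct toy_object *objects,
		      const bool *in_set, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!in_set[i])
			continue;
		assert(objects[i].nr_refs <= MAX_REFS);
		for (size_t j = 0; j < objects[i].nr_refs; j++) {
			int ref = objects[i].refs[j];
			/* A reference outside the set breaks closure. */
			if (ref < 0 || (size_t)ref >= nr || !in_set[ref])
				return false;
		}
	}
	return true;
}
```

With a commit (0) pointing at a tree (1) that points at a once-cruft
blob (2), the set {0, 1} is not closed, while {0, 1, 2} is. The latter
is the situation that writing a copy of the once-unreachable object
into the geometrically-repacked pack is meant to produce.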
> > @@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include,
> >  		}
> >  	}
> >
> > -	for_each_string_list_item(item, &existing->cruft_packs) {
> > +	if (midx_must_contain_cruft ||
> > +	    midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
> > +				   include, geometry, existing)) {
> >  		/*
> > -		 * When doing a --geometric repack, there is no need to check
> > -		 * for deleted packs, since we're by definition not doing an
> > -		 * ALL_INTO_ONE repack (hence no packs will be deleted).
> > -		 * Otherwise we must check for and exclude any packs which are
> > -		 * enqueued for deletion.
> > +		 * If there are one or more unknown pack(s) present (see
> > +		 * midx_has_unknown_packs() for what makes a pack
> > +		 * "unknown") in the MIDX before the repack, keep them
> > +		 * as they may be required to form a reachability
> > +		 * closure if the MIDX is bitmapped.
> >  		 *
> > -		 * So we could omit the conditional below in the --geometric
> > -		 * case, but doing so is unnecessary since no packs are marked
> > -		 * as pending deletion (since we only call
> > -		 * `mark_packs_for_deletion()` when doing an all-into-one
> > -		 * repack).
> > +		 * For example, a cruft pack can be required to form a
> > +		 * reachability closure if the MIDX is bitmapped and one
> > +		 * or more of its selected commits reaches a once-cruft
> > +		 * object that was later made reachable.
>
> The antecedent of "its" is unclear here; just spell it out to reduce
> how much thinking the reader needs to do?

Eek, good suggestion again. Thanks, I fixed it up and made the
antecedent explicit.

> >  		 */
> > -		if (pack_is_marked_for_deletion(item))
> > -			continue;
> > +		for_each_string_list_item(item, &existing->cruft_packs) {
> > +			/*
> > +			 * When doing a --geometric repack, there is no
> > +			 * need to check for deleted packs, since we're
> > +			 * by definition not doing an ALL_INTO_ONE
> > +			 * repack (hence no packs will be deleted).
> > +			 * Otherwise we must check for and exclude any
> > +			 * packs which are enqueued for deletion.
> > +			 *
> > +			 * So we could omit the conditional below in the
> > +			 * --geometric case, but doing so is unnecessary
> > +			 * since no packs are marked as pending
> > +			 * deletion (since we only call
> > +			 * `mark_packs_for_deletion()` when doing an
> > +			 * all-into-one repack).
> > +			 */
> > +			if (pack_is_marked_for_deletion(item))
> > +				continue;
> >
> > -		strbuf_reset(&buf);
> > -		strbuf_addf(&buf, "%s.idx", item->string);
> > -		string_list_insert(include, buf.buf);
> > +			strbuf_reset(&buf);
> > +			strbuf_addf(&buf, "%s.idx", item->string);
> > +			string_list_insert(include, buf.buf);
> > +		}
> > +	} else {
> > +		/*
> > +		 * Modern versions of Git will write new copies of
> > +		 * once-cruft objects when doing a --geometric repack.
>
> "Modern versions of Git" -> "Modern versions of Git with the
> appropriate config setting" ?

Heh. Great catch. Can you tell this part was written before I added the
configuration option? ;-)

Thanks,
Taylor