From: Taylor Blau <me@ttaylorr.com>
To: Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 8/8] repack: exclude cruft pack(s) from the MIDX where possible
Date: Tue, 15 Apr 2025 16:51:59 -0400
Message-ID: <Z/7G73FmI/bK1Jti@nand.local>
In-Reply-To: <CABPp-BHwewWMp1CY4jAr=sWwxE9of2Q73nyXQ5wpOELJZQ1OPg@mail.gmail.com>

On Mon, Apr 14, 2025 at 08:11:22PM -0700, Elijah Newren wrote:
> On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
> > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
> > MIDX with '--write-midx' to ensure that the resulting MIDX was always
> > closed under reachability in order to generate reachability bitmaps.
> >
> > Suppose you have a once-unreachable object packed in a cruft pack, which
> > later on becomes reachable from one or more objects in a geometrically
> > repacked pack. That once-unreachable object *won't* appear in the new
> > pack, since the cruft pack was specified as neither included nor
> > excluded to 'pack-objects --stdin-packs'.
>
> I believe you are talking about the state before your series (i.e.,
> this is carrying on from the previous paragraph), but it reads as
> though you are talking about the state after the first seven patches
> of this series.  Some kind of connection wording to clarify would
> really help here.

Sure.

> > If the bitmap selection
> > process picks one or more commits which reach the once-unreachable
> > objects, commit ddee3703b3 ensures that the MIDX will be closed under
> > reachability. Without it, we would fail to generate a MIDX bitmap.
>
> After reading this part, I had to go back and re-read and figure out
> what point in time everything was referring to.

Yeah, this is confusing to me too after reading it back. I made some
tweaks that I think clarify things.

> > ddee3703b3 alludes to the fact that this is sub-optimal by saying
> >
> >     [...] it's desirable to avoid including cruft packs in the MIDX
> >     because it causes the MIDX to store a bunch of objects which are
> >     likely to get thrown away.
> >
> > , which is true, but hides an even larger problem. If repositories
> > rarely prune their unreachable objects and/or have many of them, the
> > MIDX must keep track of a large number of objects which bloats the MIDX
> > and slows down object lookup.
> >
> > This is doubly unfortunate because the vast majority of objects in cruft
> > pack(s) are unlikely to be read, but object reads that go through the
> > MIDX have to search through them anyway.
>
> "have to search through them"?  That could be read to suggest those
> individual objects are read, rather than just traversed over.  Maybe
> "...unlikely to be read, so the enlarged MIDX is for mostly tracking
> known-to-likely-be-irrelevant objects", or something like that?

Thanks for pointing that out. I clarified this one as well.

> > This patch causes geometrically-repacked packs to contain a copy of any
> > once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
> > allowing us to avoid including any cruft packs in the MIDX. This is
> > because a sequence of geometrically-repacked packs that were all
> > generated with '--stdin-packs=follow' are guaranteed to have their union
> > be closed under reachability.
> >
> > Note that you cannot guarantee that a collection of packs is closed
> > under reachability if not all of them were generated with following as
>
> maybe: ...with "follow" as above.  "follow" or "following" feels like
> it needs quotes so the reader understands it's meant as the name of a
> mode, rather than a verb in the sentence.
>
> > above. One tell-tale sign that not all geometrically-repacked packs in
> > the MIDX were generated with following is to see if there is a pack in
>
> same here with "following"...and below.

Great calls on both, thanks.
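
For anyone following along, the "closed under reachability" property
discussed above can be illustrated with a toy model. This is only a
sketch: 'struct toy_object' and 'toy_is_closed()' are hypothetical names
made up for this example and are not part of Git. Objects are modeled as
array indices, and each object lists the objects it refers to directly.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not Git's data structures): objects are
 * array indices, and each object lists what it refers to directly.
 */
#define TOY_MAX_REFS 4

struct toy_object {
	size_t refs[TOY_MAX_REFS]; /* indices of directly referenced objects */
	size_t nr_refs;            /* number of valid entries in refs[] */
};

/*
 * Return true if the set described by in_set[] is closed under
 * reachability, i.e. every object referenced by a member of the set
 * is itself a member. Checking direct references is enough, since the
 * property then extends to everything reachable by induction.
 */
static bool toy_is_closed(const struct toy_object *objs, size_t nr,
			  const bool *in_set)
{
	for (size_t i = 0; i < nr; i++) {
		if (!in_set[i])
			continue;
		for (size_t j = 0; j < objs[i].nr_refs; j++) {
			if (!in_set[objs[i].refs[j]])
				return false;
		}
	}
	return true;
}
```

In this model, a set holding a commit and its tree but not a blob that
the tree refers to (because the blob only lives in a cruft pack left out
of the set) is not closed. Adding a copy of that blob to the set makes
it closed again.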

> > @@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include,
> >                 }
> >         }
> >
> > -       for_each_string_list_item(item, &existing->cruft_packs) {
> > +       if (midx_must_contain_cruft ||
> > +           midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr,
> > +                                  include, geometry, existing)) {
> >                 /*
> > -                * When doing a --geometric repack, there is no need to check
> > -                * for deleted packs, since we're by definition not doing an
> > -                * ALL_INTO_ONE repack (hence no packs will be deleted).
> > -                * Otherwise we must check for and exclude any packs which are
> > -                * enqueued for deletion.
> > +                * If there are one or more unknown pack(s) present (see
> > +                * midx_has_unknown_packs() for what makes a pack
> > +                * "unknown") in the MIDX before the repack, keep them
> > +                * as they may be required to form a reachability
> > +                * closure if the MIDX is bitmapped.
> >                  *
> > -                * So we could omit the conditional below in the --geometric
> > -                * case, but doing so is unnecessary since no packs are marked
> > -                * as pending deletion (since we only call
> > -                * `mark_packs_for_deletion()` when doing an all-into-one
> > -                * repack).
> > +                * For example, a cruft pack can be required to form a
> > +                * reachability closure if the MIDX is bitmapped and one
> > +                * or more of its selected commits reaches a once-cruft
> > +                * object that was later made reachable.
>
> The antecedent of "its" is unclear here; just spell it out to reduce
> how much thinking the reader needs to do?

Eek, good suggestion again. Thanks, I fixed it up and made the
antecedent explicit.

> >                  */
> > -               if (pack_is_marked_for_deletion(item))
> > -                       continue;
> > +               for_each_string_list_item(item, &existing->cruft_packs) {
> > +                       /*
> > +                        * When doing a --geometric repack, there is no
> > +                        * need to check for deleted packs, since we're
> > +                        * by definition not doing an ALL_INTO_ONE
> > +                        * repack (hence no packs will be deleted).
> > +                        * Otherwise we must check for and exclude any
> > +                        * packs which are enqueued for deletion.
> > +                        *
> > +                        * So we could omit the conditional below in the
> > +                        * --geometric case, but doing so is unnecessary
> > +                        *  since no packs are marked as pending
> > +                        *  deletion (since we only call
> > +                        *  `mark_packs_for_deletion()` when doing an
> > +                        *  all-into-one repack).
> > +                        */
> > +                       if (pack_is_marked_for_deletion(item))
> > +                               continue;
> >
> > -               strbuf_reset(&buf);
> > -               strbuf_addf(&buf, "%s.idx", item->string);
> > -               string_list_insert(include, buf.buf);
> > +                       strbuf_reset(&buf);
> > +                       strbuf_addf(&buf, "%s.idx", item->string);
> > +                       string_list_insert(include, buf.buf);
> > +               }
> > +       } else {
> > +               /*
> > +                * Modern versions of Git will write new copies of
> > +                * once-cruft objects when doing a --geometric repack.
>
> "Modern versions of Git" -> "Modern versions of Git with the
> appropriate config setting" ?

Heh. Great catch. Can you tell this part was written before I added the
configuration option? ;-)
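
To summarize the control flow of the hunk quoted above, here is a
minimal sketch of the decision it makes about cruft packs. It is not the
code from the patch: the 'toy_' structs and function are hypothetical,
the two booleans stand in for 'midx_must_contain_cruft' and the result
of 'midx_has_unknown_packs()', and the sketch assumes that the 'else'
arm simply leaves cruft packs out of the MIDX.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the quoted hunk consults. */
struct toy_repack_state {
	bool midx_must_contain_cruft; /* cruft packs are required in the MIDX */
	bool midx_has_unknown_packs;  /* outcome of the unknown-pack check */
};

/* One cruft pack and whether it is queued for deletion. */
struct toy_cruft_pack {
	const char *name;
	bool marked_for_deletion;
};

/*
 * Copy the names of the cruft packs that would be added to the MIDX
 * into out[] (which must have room for nr entries), and return how
 * many names were copied.
 */
static size_t toy_cruft_packs_for_midx(const struct toy_repack_state *state,
				       const struct toy_cruft_pack *cruft,
				       size_t nr, const char **out)
{
	size_t nr_out = 0;

	/* Assumed "else" arm: cruft packs stay out of the MIDX. */
	if (!state->midx_must_contain_cruft && !state->midx_has_unknown_packs)
		return 0;

	for (size_t i = 0; i < nr; i++) {
		/* Mirrors the pack_is_marked_for_deletion() check. */
		if (cruft[i].marked_for_deletion)
			continue;
		out[nr_out++] = cruft[i].name;
	}
	return nr_out;
}
```

With both flags false, no cruft pack is listed. With either flag true,
every cruft pack that is not pending deletion is listed.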

Thanks,
Taylor
