git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	 Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold
Date: Wed, 12 Mar 2025 12:13:10 -0700	[thread overview]
Message-ID: <CABPp-BH0rbieCV4Z11pHOX-mwrtEO-FPNdywV0P5HxXnusdRKQ@mail.gmail.com> (raw)
In-Reply-To: <Z9HaYEyYgBYTiia3@nand.local>

On Wed, Mar 12, 2025 at 12:02 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Mar 12, 2025 at 11:26:53AM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > >> So would it be feasible to remember how 199MB cruft pack is lying in
> > >> the object store (i.e. earlier we packed as much as possible), and
> > >> add a logic that says "if there is nothing to expire out of this
> > >> one, do not attempt to repack---this is fine as-is"?
> > > .... So
> > > the majority of packs in this case should all be removed, and the small
> > > amount of cruft data remaining can be repacked into a small number of
> > > packs relatively quickly.
> >
> > Given the above ...
> >
> > > Suppose you have a 100MB cruft limit, and there are two cruft packs in
> > > the repository: one that is 99MB and another that is 1MB in size. Let's
> > > suppose further that if you combine these two packs, the resulting pack
> > > would be exactly 100MB in size.
> > >
> > > Today, repack will say, "I have two packs that sum together to be the
> > > value of --max-cruft-size", and mark them both to be removed (and
> > > replaced with the combined pack generated by pack-objects).
> >
> > ... yes, this logic to reach the above decision is exactly what I
> > said is broken.  Is there no way to fix that?
> >
> > > But if the
> > > combined pack is exactly 100MB, then pack-objects will break the pack
> > > into two just before the 100MB limit, and we'll end up with the same two
> > > packs we started with.
> >
> > If "the majority of packs should all be removed and remainder combined"
> > you stated earlier is true, then this case falls in a tiny minority
> > that we do not have to worry about, doesn't it?
>
> Yeah, it is a niche case. But the more I think about it the more I see
> it your way. I apologize for all of the back-and-forth here, this is
> quite a tricky problem to think through (at least for me), so I
> appreciate your patience.
>
> The original implementation in repack was designed to aggregate smaller
> cruft packs together first until the combined size exceeds the
> threshold. So the above would all be true if no new unreachable objects
> were ever added to the repository, but if another 1MB cruft pack
> appears, then we would:
>
>   - See the first 1MB pack, and decide we can repack it as it's under
>     the 100MB threshold.
>
>   - See the second 1MB pack, and repack it for the similar reasons (this
>     time because 1+1<100, not 1<100).
>
>   - See the 100MB pack, and refuse to repack it because the combined
>     size of 102MB would be over the threshold.
>
> So I think it's reasonable that if we keep the current behavior of
> repacking the smaller ones first that this case is niche enough for me
> to feel OK not worrying about it too much.
>
> And, yes, breaking --max-pack-size when given with --cruft is ugly.
>
> > > But in current Git we will keep repacking
> > > the two together, only to generate the same two packs we started with
> > > forever.
> >
> > Yes.  That is because the logic that decides these packs need to be
> > broken and recombined is flawed.  Maybe it does not have sufficient
> > information to decide that it is no use to attempt combining them,
> > in which case leaving some more info to help the later invocation of
> > repack to tell that it would be useless to attempt combining these
> > packs when you do the initial repack would help, which was what I
> > suggested.  You've thought about the issue much longer than I did,
> > and would be able to come up with better ideas.
>
> I think in the short term I came up with a worse idea than you would
> have ;-).
>
> Probably there is a way to improve this niche case as described above,
> but I think the solution space is probably complicated enough that given
> how narrow of a case it is that it's not worth introducing that much
> complexity.

Would it make sense to break the assumption that --max-cruft-size ==
--max-pack-size and perhaps rename the former?  I think the problem is
that the two imply different things (one is a minimum, the other a
maximum), and thus really should be different values.  E.g.
--combine-cruft-below-size that is set to e.g. half of
--max-pack-size, and then you can continue combining cruft packs
together until they do go above the cruft threshold, while avoiding
actually exceeding the pack size threshold?

  reply	other threads:[~2025-03-12 19:13 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27 18:29 [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Taylor Blau
2025-02-27 18:29 ` [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-02-27 19:23   ` Elijah Newren
2025-02-27 22:53     ` Taylor Blau
2025-02-28  7:52   ` Patrick Steinhardt
2025-03-04 21:52     ` Elijah Newren
2025-03-05  2:04       ` Junio C Hamano
2025-03-05  0:09     ` Taylor Blau
2025-02-27 18:29 ` [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-02-27 19:26   ` Elijah Newren
2025-02-27 23:03     ` Taylor Blau
2025-02-27 19:28 ` [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-02-27 23:05   ` Taylor Blau
2025-03-04 21:35 ` [PATCH v2 " Taylor Blau
2025-03-04 21:35   ` [PATCH v2 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-04 21:35   ` [PATCH v2 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-04 22:55   ` [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-03-05  0:06     ` Taylor Blau
2025-03-05  0:13       ` Taylor Blau
2025-03-05  0:15 ` [PATCH v3 0/1] " Taylor Blau
2025-03-05  0:15   ` [PATCH v3 1/1] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-06 10:31     ` Patrick Steinhardt
2025-03-13 17:32       ` Taylor Blau
2025-03-06 10:31   ` [PATCH v3 0/1] pack-objects: freshen objects with multi-cruft packs Patrick Steinhardt
2025-03-11  0:21 ` [PATCH v4 0/6] " Taylor Blau
2025-03-11  0:21   ` [PATCH v4 1/6] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-11  0:21   ` [PATCH v4 2/6] t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-11  0:21   ` [PATCH v4 3/6] t/lib-cruft.sh: extract some cruft-related helpers Taylor Blau
2025-03-11  0:21   ` [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold Taylor Blau
2025-03-11 21:59     ` Junio C Hamano
2025-03-12 15:22       ` Taylor Blau
2025-03-12 18:26         ` Junio C Hamano
2025-03-12 19:02           ` Taylor Blau
2025-03-12 19:13             ` Elijah Newren [this message]
2025-03-12 19:33               ` Taylor Blau
2025-03-12 20:43               ` Junio C Hamano
2025-03-12 20:49                 ` Elijah Newren
2025-03-13 12:16                   ` Junio C Hamano
2025-03-13 16:23                     ` Elijah Newren
2025-03-13 17:06                       ` Junio C Hamano
2025-03-11  0:21   ` [PATCH v4 5/6] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-11  0:21   ` [PATCH v4 6/6] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-11 20:13   ` [PATCH v4 0/6] pack-objects: freshen objects with multi-cruft packs Junio C Hamano
2025-03-12 15:33     ` Taylor Blau
2025-03-12 18:28       ` Junio C Hamano
2025-03-12 19:04         ` Taylor Blau
2025-03-12 19:46           ` Junio C Hamano
2025-03-12 19:52             ` Taylor Blau
2025-03-13 17:17               ` Junio C Hamano
2025-03-13 17:35                 ` Taylor Blau
2025-03-13  6:29           ` Jeff King
2025-03-13 15:12             ` Junio C Hamano
2025-03-13 18:09 ` [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-13 18:41   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BH0rbieCV4Z11pHOX-mwrtEO-FPNdywV0P5HxXnusdRKQ@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).