git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org,  Jeff King <peff@peff.net>,
	 Elijah Newren <newren@gmail.com>,
	 Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold
Date: Wed, 12 Mar 2025 11:26:53 -0700	[thread overview]
Message-ID: <xmqqjz8uxfyq.fsf@gitster.g> (raw)
In-Reply-To: <Z9Gmo2P3Fnt3JeOs@nand.local> (Taylor Blau's message of "Wed, 12 Mar 2025 11:22:11 -0400")

Taylor Blau <me@ttaylorr.com> writes:

>> So would it be feasible to remember how 199MB cruft pack is lying in
>> the object store (i.e. earlier we packed as much as possible), and
>> add a logic that says "if there is nothing to expire out of this
>> one, do not attempt to repack---this is fine as-is"?
> .... So
> the majority of packs in this case should all be removed, and the small
> amount of cruft data remaining can be repacked into a small number of
> packs relatively quickly.

Given the above ...

> Suppose you have a 100MB cruft limit, and there are two cruft packs in
> the repository: one that is 99MB and another that is 1MB in size. Let's
> suppose further that if you combine these two packs, the resulting pack
> would be exactly 100MB in size.
>
> Today, repack will say, "I have two packs that sum together to be the
> value of --max-cruft-size", and mark them both to be removed (and
> replaced with the combined pack generated by pack-objects).

... yes, this logic to reach the above decision is exactly what I
said is broken.  Is there no way to fix that?

> But if the
> combined pack is exactly 100MB, then pack-objects will break the pack
> into two just before the 100MB limit, and we'll end up with the same two
> packs we started with.

If "the majority of packs should all be removed and remainder combined"
you stated earlier is true, then this case falls in a tiny minority
that we do not have to worry about, doesn't it?

> Ideally we would combine those packs into one that is at most one
> object's size larger than the threshold, and the steady state would be
> to avoid repacking it further.

I do not see why you can call that "Ideally".  Ideally, we would
combine those packs to create a pack (or two) without busting the
threshold, *and* avoid needress repacking.  Busting the given limit
should not be part of the definition of "ideal" solution.

> But in current Git we will keep repacking
> the two together, only to generate the same two packs we started with
> forever.

Yes.  That is because the logic that decides these packs need to be
broken and recombined is flawed.  Maybe it does not have sufficient
information to decide that it is no use to attempt combining them,
in which case leaving some more info to help the later invocation of
repack to tell that it would be useless to attempt combining these
packs when you do the initial repack would help, which was what I
suggested.  You've thought about the issue much longer than I did,
and would be able to come up with better ideas.

  reply	other threads:[~2025-03-12 18:26 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27 18:29 [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Taylor Blau
2025-02-27 18:29 ` [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-02-27 19:23   ` Elijah Newren
2025-02-27 22:53     ` Taylor Blau
2025-02-28  7:52   ` Patrick Steinhardt
2025-03-04 21:52     ` Elijah Newren
2025-03-05  2:04       ` Junio C Hamano
2025-03-05  0:09     ` Taylor Blau
2025-02-27 18:29 ` [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-02-27 19:26   ` Elijah Newren
2025-02-27 23:03     ` Taylor Blau
2025-02-27 19:28 ` [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-02-27 23:05   ` Taylor Blau
2025-03-04 21:35 ` [PATCH v2 " Taylor Blau
2025-03-04 21:35   ` [PATCH v2 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-04 21:35   ` [PATCH v2 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-04 22:55   ` [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-03-05  0:06     ` Taylor Blau
2025-03-05  0:13       ` Taylor Blau
2025-03-05  0:15 ` [PATCH v3 0/1] " Taylor Blau
2025-03-05  0:15   ` [PATCH v3 1/1] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-06 10:31     ` Patrick Steinhardt
2025-03-13 17:32       ` Taylor Blau
2025-03-06 10:31   ` [PATCH v3 0/1] pack-objects: freshen objects with multi-cruft packs Patrick Steinhardt
2025-03-11  0:21 ` [PATCH v4 0/6] " Taylor Blau
2025-03-11  0:21   ` [PATCH v4 1/6] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-11  0:21   ` [PATCH v4 2/6] t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-11  0:21   ` [PATCH v4 3/6] t/lib-cruft.sh: extract some cruft-related helpers Taylor Blau
2025-03-11  0:21   ` [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold Taylor Blau
2025-03-11 21:59     ` Junio C Hamano
2025-03-12 15:22       ` Taylor Blau
2025-03-12 18:26         ` Junio C Hamano [this message]
2025-03-12 19:02           ` Taylor Blau
2025-03-12 19:13             ` Elijah Newren
2025-03-12 19:33               ` Taylor Blau
2025-03-12 20:43               ` Junio C Hamano
2025-03-12 20:49                 ` Elijah Newren
2025-03-13 12:16                   ` Junio C Hamano
2025-03-13 16:23                     ` Elijah Newren
2025-03-13 17:06                       ` Junio C Hamano
2025-03-11  0:21   ` [PATCH v4 5/6] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-11  0:21   ` [PATCH v4 6/6] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-11 20:13   ` [PATCH v4 0/6] pack-objects: freshen objects with multi-cruft packs Junio C Hamano
2025-03-12 15:33     ` Taylor Blau
2025-03-12 18:28       ` Junio C Hamano
2025-03-12 19:04         ` Taylor Blau
2025-03-12 19:46           ` Junio C Hamano
2025-03-12 19:52             ` Taylor Blau
2025-03-13 17:17               ` Junio C Hamano
2025-03-13 17:35                 ` Taylor Blau
2025-03-13  6:29           ` Jeff King
2025-03-13 15:12             ` Junio C Hamano
2025-03-13 18:09 ` [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-13 18:41   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqjz8uxfyq.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).