From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs
Date: Thu, 13 Mar 2025 11:41:09 -0700 [thread overview]
Message-ID: <xmqq8qp8ojsq.fsf@gitster.g> (raw)
In-Reply-To: <1563552bbda0bc910c9f41b0fabc3225c4d778fc.1741889018.git.me@ttaylorr.com> (Taylor Blau's message of "Thu, 13 Mar 2025 14:09:47 -0400")
Taylor Blau <me@ttaylorr.com> writes:
> Once an object is written into a cruft pack, we can only freshen it by
> writing a new loose or packed copy of that object with a more recent
> mtime.
>
> Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
> with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
> a repository at any given time. So freshening unreachable objects was
> straightforward when already rewriting the cruft pack (and its *.mtimes
> file).
>
> But 61568efa95 changes things: 'pack-objects' now supports writing
> multiple cruft packs when invoked with `--cruft` and the
> `--max-pack-size` flag. Cruft packs are rewritten until they reach some
> size threshold, at which point they are considered "frozen", and will
> only be modified in a pruning GC, or if the threshold itself is
> adjusted.
>
> Prior to this patch, however, this process breaks down when we attempt
> to freshen an object packed in an earlier cruft pack, and that cruft
> pack is larger than the threshold and thus will survive the repack.
>
> When this is the case, it is impossible to freshen objects in cruft
> pack(s) when those cruft packs are larger than the threshold. This is
> because we would avoid writing them in the new cruft pack entirely, for
> a couple of reasons.
>
> 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
> we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
> over the packs we're going to retain (which are marked as kept
> in-core by 'read_cruft_objects()').
>
> This means that we will avoid enumerating additional packed copies
> of objects found in any cruft packs which are larger than the given
> size threshold. Thus there is no opportunity to call
> 'create_object_entry()' whatsoever.
>
> 2. We likewise will discard the loose copy (if one exists) of any
> unreachable object packed in a cruft pack that is larger than the
> threshold. Here our call path is 'add_unreachable_loose_objects()',
> which uses the 'add_loose_object()' callback.
>
> That function will eventually land us in 'want_object_in_pack()'
> (via 'add_cruft_object_entry()'), and we'll discard the object as it
> appears in one of the packs which we marked as kept in-core.
>
> This means in effect that it is impossible to freshen an unreachable
> object once it appears in a cruft pack larger than the given threshold.
>
> Instead, we should pack an additional copy of an unreachable object we
> want to freshen even if it appears in a cruft pack, provided that the
> cruft copy has an mtime which is before the mtime of the copy we are
> trying to pack/freshen. This is sub-optimal in the sense that it
> requires keeping an additional copy of unreachable objects upon
> freshening, but we don't have a better alternative without the ability
> to make in-place modifications to existing *.mtimes files.
>
> In order to implement this, we have to adjust the behavior of
> 'want_found_object()'. When 'pack-objects' is told that we're *not*
> going to retain any cruft packs (i.e. the set of packs marked as kept
> in-core does not contain a cruft pack), the behavior is unchanged.
>
> But when there *is* at least one cruft pack that we're holding onto, it
> is no longer sufficient to reject a copy of an object found in that
> cruft pack for that reason alone. In this case, we only want to reject a
> candidate object when copies of that object either:
>
> - exists in a non-cruft pack that we are retaining, regardless of that
> pack's mtime, or
>
> - exists in a cruft pack with an mtime at least as recent as the copy
> we are debating whether or not to pack, in which case freshening
> would be redundant.
>
> To do this, keep track of whether or not we have any cruft packs in our
> in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
> flag. When we end up in this new special case, we replace a call to
> 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
> objects when we have a copy in an existing cruft pack with at least as
> recent an mtime as our candidate (in which case "freshening" would be
> redundant).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> builtin/pack-objects.c | 118 ++++++++++++++++++++++++++++++++++------
> packfile.c | 3 +-
> packfile.h | 2 +
> t/t7704-repack-cruft.sh | 66 ++++++++++++++++++++++
> 4 files changed, 171 insertions(+), 18 deletions(-)
Thanks. Will queue.
prev parent reply other threads:[~2025-03-13 18:41 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-27 18:29 [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Taylor Blau
2025-02-27 18:29 ` [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-02-27 19:23 ` Elijah Newren
2025-02-27 22:53 ` Taylor Blau
2025-02-28 7:52 ` Patrick Steinhardt
2025-03-04 21:52 ` Elijah Newren
2025-03-05 2:04 ` Junio C Hamano
2025-03-05 0:09 ` Taylor Blau
2025-02-27 18:29 ` [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-02-27 19:26 ` Elijah Newren
2025-02-27 23:03 ` Taylor Blau
2025-02-27 19:28 ` [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-02-27 23:05 ` Taylor Blau
2025-03-04 21:35 ` [PATCH v2 " Taylor Blau
2025-03-04 21:35 ` [PATCH v2 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-04 21:35 ` [PATCH v2 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-04 22:55 ` [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-03-05 0:06 ` Taylor Blau
2025-03-05 0:13 ` Taylor Blau
2025-03-05 0:15 ` [PATCH v3 0/1] " Taylor Blau
2025-03-05 0:15 ` [PATCH v3 1/1] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-06 10:31 ` Patrick Steinhardt
2025-03-13 17:32 ` Taylor Blau
2025-03-06 10:31 ` [PATCH v3 0/1] pack-objects: freshen objects with multi-cruft packs Patrick Steinhardt
2025-03-11 0:21 ` [PATCH v4 0/6] " Taylor Blau
2025-03-11 0:21 ` [PATCH v4 1/6] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-11 0:21 ` [PATCH v4 2/6] t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-11 0:21 ` [PATCH v4 3/6] t/lib-cruft.sh: extract some cruft-related helpers Taylor Blau
2025-03-11 0:21 ` [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold Taylor Blau
2025-03-11 21:59 ` Junio C Hamano
2025-03-12 15:22 ` Taylor Blau
2025-03-12 18:26 ` Junio C Hamano
2025-03-12 19:02 ` Taylor Blau
2025-03-12 19:13 ` Elijah Newren
2025-03-12 19:33 ` Taylor Blau
2025-03-12 20:43 ` Junio C Hamano
2025-03-12 20:49 ` Elijah Newren
2025-03-13 12:16 ` Junio C Hamano
2025-03-13 16:23 ` Elijah Newren
2025-03-13 17:06 ` Junio C Hamano
2025-03-11 0:21 ` [PATCH v4 5/6] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-11 0:21 ` [PATCH v4 6/6] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-11 20:13 ` [PATCH v4 0/6] pack-objects: freshen objects with multi-cruft packs Junio C Hamano
2025-03-12 15:33 ` Taylor Blau
2025-03-12 18:28 ` Junio C Hamano
2025-03-12 19:04 ` Taylor Blau
2025-03-12 19:46 ` Junio C Hamano
2025-03-12 19:52 ` Taylor Blau
2025-03-13 17:17 ` Junio C Hamano
2025-03-13 17:35 ` Taylor Blau
2025-03-13 6:29 ` Jeff King
2025-03-13 15:12 ` Junio C Hamano
2025-03-13 18:09 ` [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-13 18:41 ` Junio C Hamano [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq8qp8ojsq.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.