From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs
Date: Tue, 4 Mar 2025 16:35:17 -0500 [thread overview]
Message-ID: <cover.1741124116.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1740680964.git.me@ttaylorr.com>
This is a minor reroll of my series to fix a bug in freshening objects
with multiple cruft packs.
Since last time the only updates to the series are textual changes to
the commit message, so everything functionally should be working as it
was before.
As usual, there is a range-diff showing as much below, and the original
cover letter is as follows:
---
This short series contains a fix for a bug I noticed while rolling out
multi-cruft packs (via 'git repack --max-cruft-size') within GitHub's
infrastructure.
The series is structured as follows:
- The first patch simplifies how 'repack' aggregates cruft packs
together when their size is below the '--max-cruft-size' or
'--max-pack-size' threshold. This simplification changes behavior
slightly, but not in a meaningful way. It occurred to me while
writing the second patch.
- The second patch describes and fixes the main bug. The gist here is
that objects which are (a) unreachable, (b) exist in a cruft pack
being retained, and (c) were freshened to have a more recent mtime
than any existing cruft copy are unable to be freshened.
The fix pursued in the second patch changes the rules around when we
want to retain an object via builtin/pack-objects.c::want_found_object()
when at least one cruft pack will survive the repack.
Previously the rule was to discard any object which appears in any
surviving pack, regardless of mtime. The rule now is to only discard an
object if it appears in either (a) a non-cruft pack which will survive
the repack, or (b) a cruft pack whose mtime for that object is older
than the one we are trying to pack.
I think that this is the right behavior, but admittedly putting this
series together hurt my brain trying to think through all of the cases.
I'm fairly confident in the testing here as I remember it being fairly
exhaustive of all interesting cases. But I'd appreciate a sanity check
from others that they too are convinced this is the right approach.
Thanks in advance for your review!
Taylor Blau (2):
builtin/repack.c: simplify cruft pack aggregation
builtin/pack-objects.c: freshen objects from existing cruft packs
builtin/pack-objects.c | 118 ++++++++++++++++++++++++++++++++++------
builtin/repack.c | 38 +------------
packfile.c | 3 +-
packfile.h | 2 +
t/t7704-repack-cruft.sh | 105 +++++++++++++++++++++--------------
5 files changed, 170 insertions(+), 96 deletions(-)
Range-diff against v1:
1: 8564f982597 ! 1: 63ea9d4d00e builtin/repack.c: simplify cruft pack aggregation
@@ Commit message
would get combined together until the sum of their sizes was no larger
than the given max pack size.
- There is a much simpler way to achieve this, however, which is to simply
- combine *all* cruft packs which are smaller than the threshold,
+ There is a much simpler way to combine cruft packs, however, which is to
+ simply combine *all* cruft packs which are smaller than the threshold,
regardless of what their sum is. With '--max-pack-size', 'pack-objects'
will split out the resulting pack into individual pack(s) if necessary
to ensure that the written pack(s) are each no larger than the provided
2: c0c926adde2 ! 2: 7ba9054701b builtin/pack-objects.c: freshen objects from existing cruft packs
@@ Commit message
only be modified in a pruning GC, or if the threshold itself is
adjusted.
- However, this process breaks down when we attempt to freshen an object
- packed in an earlier cruft pack that is larger than the threshold and
- thus will survive the repack.
+ Prior to this patch, however, this process breaks down when we attempt
+ to freshen an object packed in an earlier cruft pack, and that cruft
+ pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
- pack(s) which are larger than the threshold. This is because we avoid
- writing them in the new cruft pack entirely, for a couple of reasons.
+ pack(s) when those cruft packs are larger than the threshold. This is
+ because we would avoid writing them in the new cruft pack entirely, for
+ a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
@@ Commit message
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- - exists in a cruft pack with an mtime more recent than the copy we are
- debating whether or not to pack, in which case freshening would be
- redundant.
+ - exists in a cruft pack with an mtime at least as recent as the copy
+ we are debating whether or not to pack, in which case freshening
+ would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
- 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only
- reject objects when we have a copy in an existing cruft pack with a more
- recent mtime (in which case "freshening" would be redundant).
+ 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
+ objects when we have a copy in an existing cruft pack with at least as
+ recent an mtime as our candidate (in which case "freshening" would be
+ redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ t/t7704-repack-cruft.sh: test_expect_success '--max-cruft-size with freshened ob
+
+ git repack --cruft -d &&
+
-+ # Make a packed copy of object $foo with a more recent
-+ # mtime.
++ # Make an identical copy of foo stored in a pack with a more
++ # recent mtime.
+ foo="$(generate_random_blob foo $((2*1024*1024)))" &&
+ foo_pack="$(echo "$foo" | git pack-objects $packdir/pack)" &&
+ test-tool chmtime --get -100 \
+ "$packdir/pack-$foo_pack.pack" >foo.new &&
+ git prune-packed &&
+
-+ # Make a loose copy of object $bar with a more recent
-+ # mtime.
++ # Make a loose copy of bar, also with a more recent mtime.
+ bar="$(generate_random_blob bar $((2*1024*1024)))" &&
+ test-tool chmtime --get -100 \
+ "$objdir/$(test_oid_to_path "$bar")" >bar.new &&
base-commit: 08bdfd453584e489d5a551aecbdcb77584e1b958
--
2.49.0.rc0.57.gdb91954e186.dirty
next prev parent reply other threads:[~2025-03-04 21:35 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-27 18:29 [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Taylor Blau
2025-02-27 18:29 ` [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-02-27 19:23 ` Elijah Newren
2025-02-27 22:53 ` Taylor Blau
2025-02-28 7:52 ` Patrick Steinhardt
2025-03-04 21:52 ` Elijah Newren
2025-03-05 2:04 ` Junio C Hamano
2025-03-05 0:09 ` Taylor Blau
2025-02-27 18:29 ` [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-02-27 19:26 ` Elijah Newren
2025-02-27 23:03 ` Taylor Blau
2025-02-27 19:28 ` [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-02-27 23:05 ` Taylor Blau
2025-03-04 21:35 ` Taylor Blau [this message]
2025-03-04 21:35 ` [PATCH v2 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-04 21:35 ` [PATCH v2 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-04 22:55 ` [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-03-05 0:06 ` Taylor Blau
2025-03-05 0:13 ` Taylor Blau
2025-03-05 0:15 ` [PATCH v3 0/1] " Taylor Blau
2025-03-05 0:15 ` [PATCH v3 1/1] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-06 10:31 ` Patrick Steinhardt
2025-03-13 17:32 ` Taylor Blau
2025-03-06 10:31 ` [PATCH v3 0/1] pack-objects: freshen objects with multi-cruft packs Patrick Steinhardt
2025-03-11 0:21 ` [PATCH v4 0/6] " Taylor Blau
2025-03-11 0:21 ` [PATCH v4 1/6] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-11 0:21 ` [PATCH v4 2/6] t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-11 0:21 ` [PATCH v4 3/6] t/lib-cruft.sh: extract some cruft-related helpers Taylor Blau
2025-03-11 0:21 ` [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold Taylor Blau
2025-03-11 21:59 ` Junio C Hamano
2025-03-12 15:22 ` Taylor Blau
2025-03-12 18:26 ` Junio C Hamano
2025-03-12 19:02 ` Taylor Blau
2025-03-12 19:13 ` Elijah Newren
2025-03-12 19:33 ` Taylor Blau
2025-03-12 20:43 ` Junio C Hamano
2025-03-12 20:49 ` Elijah Newren
2025-03-13 12:16 ` Junio C Hamano
2025-03-13 16:23 ` Elijah Newren
2025-03-13 17:06 ` Junio C Hamano
2025-03-11 0:21 ` [PATCH v4 5/6] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-11 0:21 ` [PATCH v4 6/6] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-11 20:13 ` [PATCH v4 0/6] pack-objects: freshen objects with multi-cruft packs Junio C Hamano
2025-03-12 15:33 ` Taylor Blau
2025-03-12 18:28 ` Junio C Hamano
2025-03-12 19:04 ` Taylor Blau
2025-03-12 19:46 ` Junio C Hamano
2025-03-12 19:52 ` Taylor Blau
2025-03-13 17:17 ` Junio C Hamano
2025-03-13 17:35 ` Taylor Blau
2025-03-13 6:29 ` Jeff King
2025-03-13 15:12 ` Junio C Hamano
2025-03-13 18:09 ` [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-13 18:41 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1741124116.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).