* [PATCH 0/5] repack: introduce '--combine-cruft-below-size'
@ 2025-03-17 23:00 Taylor Blau
2025-03-17 23:00 ` [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
(This is based on tb/multi-cruft-pack-refresh-fix from Junio's tree,
which is at 08f612ba70 (builtin/pack-objects.c: freshen objects from
existing cruft packs, 2025-03-13) at the time of writing).
This series replaces something close to the existing behavior of
repack's '--max-cruft-size' flag with '--combine-cruft-below-size'.
The new flag is much clearer in its intent and function, and avoids the
lack of clarity between the two that was discussed in
<cover.1741648467.git.me@ttaylorr.com>
The new behavior is as follows:
- '--max-cruft-size' is a cruft pack-specific override for repack's
'--max-pack-size' command-line flag.
- '--combine-cruft-below-size' instructs repack to only combine cruft
packs which are smaller than the given threshold. This will likely
result in packs which are larger than the threshold. But that is OK:
the point is to limit the size of the individual packs on input, not
the size of the outgoing pack.
This series does break the existing behavior of '--max-cruft-size'. But
I think breaking backwards compatibility here is OK, since the existing
behavior was so broken to begin with. I'm happy to consider other
alternatives if others feel that this isn't OK.
The series has an interesting structure that I feel may be worth calling
out. The first three patches are trivial test cleanups. The fourth patch
modifies the existing behavior of '--max-cruft-size', but does so while
keeping some of the old infrastructure around.
That may seem like an unnecessarily complicated approach, but it greatly
simplifies introducing the new behavior in the following (and final)
commit. I think that this results in a series that is a little easier to
review (since we don't see a ton of code being removed in one commit and
then re-added in another immediately following it). But if others feel
like this was a mistake, please let me know ;-).
Thanks in advance for your review!
Taylor Blau (5):
t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
t/t7704-repack-cruft.sh: consolidate `write_blob()`
repack: avoid combining cruft packs with `--max-cruft-size`
repack: begin combining cruft packs with `--combine-cruft-below-size`
Documentation/git-repack.adoc | 20 ++-
builtin/repack.c | 62 +++----
t/t5329-pack-objects-cruft.sh | 302 ++++++----------------------------
t/t7704-repack-cruft.sh | 293 ++++++++++++++++++++++++++++++---
4 files changed, 354 insertions(+), 323 deletions(-)
base-commit: 08f612ba7000bf181ef6d8baed9ece322e567efd
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
@ 2025-03-17 23:00 ` Taylor Blau
2025-03-19 14:20 ` Elijah Newren
2025-03-17 23:00 ` [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
The cruft pack feature has two primary test scripts which exercise
various parts of it, which are:
- t5329-pack-objects-cruft.sh
- t7704-repack-cruft.sh
The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).
At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t5329-pack-objects-cruft.sh | 250 ----------------------------------
t/t7704-repack-cruft.sh | 250 ++++++++++++++++++++++++++++++++++
2 files changed, 250 insertions(+), 250 deletions(-)
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index b71a0aef40..60dac8312d 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -360,43 +360,6 @@ test_expect_success 'expired objects are pruned' '
)
'
-test_expect_success 'repack --cruft generates a cruft pack' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git branch -M main &&
- git checkout --orphan other &&
- test_commit unreachable &&
-
- git checkout main &&
- git branch -D other &&
- git tag -d unreachable &&
- # objects are not cruft if they are contained in the reflogs
- git reflog expire --all --expire=all &&
-
- git rev-list --objects --all --no-object-names >reachable.raw &&
- git cat-file --batch-all-objects --batch-check="%(objectname)" >objects &&
- sort <reachable.raw >reachable &&
- comm -13 reachable objects >unreachable &&
-
- git repack --cruft -d &&
-
- cruft=$(basename $(ls $packdir/pack-*.mtimes) .mtimes) &&
- pack=$(basename $(ls $packdir/pack-*.pack | grep -v $cruft) .pack) &&
-
- git show-index <$packdir/$pack.idx >actual.raw &&
- cut -f2 -d" " actual.raw | sort >actual &&
- test_cmp reachable actual &&
-
- git show-index <$packdir/$cruft.idx >actual.raw &&
- cut -f2 -d" " actual.raw | sort >actual &&
- test_cmp unreachable actual
- )
-'
-
test_expect_success 'loose objects mtimes upsert others' '
git init repo &&
test_when_finished "rm -fr repo" &&
@@ -470,219 +433,6 @@ test_expect_success 'expiring cruft objects with git gc' '
)
'
-test_expect_success 'cruft packs are not included in geometric repack' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git repack -Ad &&
- git branch -M main &&
-
- git checkout --orphan other &&
- test_commit cruft &&
- git repack -d &&
-
- git checkout main &&
- git branch -D other &&
- git tag -d cruft &&
- git reflog expire --all --expire=all &&
-
- git repack --cruft &&
-
- find $packdir -type f | sort >before &&
- git repack --geometric=2 -d &&
- find $packdir -type f | sort >after &&
-
- test_cmp before after
- )
-'
-
-test_expect_success 'repack --geometric collects once-cruft objects' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git repack -Ad &&
- git branch -M main &&
-
- git checkout --orphan other &&
- git rm -rf . &&
- test_commit --no-tag cruft &&
- cruft="$(git rev-parse HEAD)" &&
-
- git checkout main &&
- git branch -D other &&
- git reflog expire --all --expire=all &&
-
- # Pack the objects created in the previous step into a cruft
- # pack. Intentionally leave loose copies of those objects
- # around so we can pick them up in a subsequent --geometric
- # reapack.
- git repack --cruft &&
-
- # Now make those objects reachable, and ensure that they are
- # packed into the new pack created via a --geometric repack.
- git update-ref refs/heads/other $cruft &&
-
- # Without this object, the set of unpacked objects is exactly
- # the set of objects already in the cruft pack. Tweak that set
- # to ensure we do not overwrite the cruft pack entirely.
- test_commit reachable2 &&
-
- find $packdir -name "pack-*.idx" | sort >before &&
- git repack --geometric=2 -d &&
- find $packdir -name "pack-*.idx" | sort >after &&
-
- {
- git rev-list --objects --no-object-names $cruft &&
- git rev-list --objects --no-object-names reachable..reachable2
- } >want.raw &&
- sort want.raw >want &&
-
- pack=$(comm -13 before after) &&
- git show-index <$pack >objects.raw &&
-
- cut -d" " -f2 objects.raw | sort >got &&
-
- test_cmp want got
- )
-'
-
-test_expect_success 'cruft repack with no reachable objects' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
- git repack -ad &&
-
- base="$(git rev-parse base)" &&
-
- git for-each-ref --format="delete %(refname)" >in &&
- git update-ref --stdin <in &&
- git reflog expire --all --expire=all &&
- rm -fr .git/index &&
-
- git repack --cruft -d &&
-
- git cat-file -t $base
- )
-'
-
-write_blob () {
- test-tool genrandom "$@" >in &&
- git hash-object -w -t blob in
-}
-
-find_pack () {
- for idx in $(ls $packdir/pack-*.idx)
- do
- git show-index <$idx >out &&
- if grep -q "$1" out
- then
- echo $idx
- fi || return 1
- done
-}
-
-test_expect_success 'cruft repack with --max-pack-size' '
- git init max-pack-size &&
- (
- cd max-pack-size &&
- test_commit base &&
-
- # two cruft objects which exceed the maximum pack size
- foo=$(write_blob foo 1048576) &&
- bar=$(write_blob bar 1048576) &&
- test-tool chmtime --get -1000 \
- "$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
- test-tool chmtime --get -2000 \
- "$objdir/$(test_oid_to_path $bar)" >bar.mtime &&
- git repack --cruft --max-pack-size=1M &&
- find $packdir -name "*.mtimes" >cruft &&
- test_line_count = 2 cruft &&
-
- foo_mtimes="$(basename $(find_pack $foo) .idx).mtimes" &&
- bar_mtimes="$(basename $(find_pack $bar) .idx).mtimes" &&
- test-tool pack-mtimes $foo_mtimes >foo.actual &&
- test-tool pack-mtimes $bar_mtimes >bar.actual &&
-
- echo "$foo $(cat foo.mtime)" >foo.expect &&
- echo "$bar $(cat bar.mtime)" >bar.expect &&
-
- test_cmp foo.expect foo.actual &&
- test_cmp bar.expect bar.actual &&
- test "$foo_mtimes" != "$bar_mtimes"
- )
-'
-
-test_expect_success 'cruft repack with pack.packSizeLimit' '
- (
- cd max-pack-size &&
- # repack everything back together to remove the existing cruft
- # pack (but to keep its objects)
- git repack -adk &&
- git -c pack.packSizeLimit=1M repack --cruft &&
- # ensure the same post condition is met when --max-pack-size
- # would otherwise be inferred from the configuration
- find $packdir -name "*.mtimes" >cruft &&
- test_line_count = 2 cruft &&
- for pack in $(cat cruft)
- do
- test-tool pack-mtimes "$(basename $pack)" >objects &&
- test_line_count = 1 objects || return 1
- done
- )
-'
-
-test_expect_success 'cruft repack respects repack.cruftWindow' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
-
- GIT_TRACE2_EVENT=$(pwd)/event.trace \
- git -c pack.window=1 -c repack.cruftWindow=2 repack \
- --cruft --window=3 &&
-
- grep "pack-objects.*--window=2.*--cruft" event.trace
- )
-'
-
-test_expect_success 'cruft repack respects --window by default' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
-
- GIT_TRACE2_EVENT=$(pwd)/event.trace \
- git -c pack.window=2 repack --cruft --window=3 &&
-
- grep "pack-objects.*--window=3.*--cruft" event.trace
- )
-'
-
-test_expect_success 'cruft repack respects --quiet' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
- GIT_PROGRESS_DELAY=0 git repack --cruft --quiet 2>err &&
- test_must_be_empty err
- )
-'
-
test_expect_success 'cruft --local drops unreachable objects' '
git init alternate &&
git init repo &&
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 43d2947d28..cd452040ea 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -477,4 +477,254 @@ test_expect_success 'reachable packs are preferred over cruft ones' '
)
'
+test_expect_success 'repack --cruft generates a cruft pack' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git branch -M main &&
+ git checkout --orphan other &&
+ test_commit unreachable &&
+
+ git checkout main &&
+ git branch -D other &&
+ git tag -d unreachable &&
+ # objects are not cruft if they are contained in the reflogs
+ git reflog expire --all --expire=all &&
+
+ git rev-list --objects --all --no-object-names >reachable.raw &&
+ git cat-file --batch-all-objects --batch-check="%(objectname)" >objects &&
+ sort <reachable.raw >reachable &&
+ comm -13 reachable objects >unreachable &&
+
+ git repack --cruft -d &&
+
+ cruft=$(basename $(ls $packdir/pack-*.mtimes) .mtimes) &&
+ pack=$(basename $(ls $packdir/pack-*.pack | grep -v $cruft) .pack) &&
+
+ git show-index <$packdir/$pack.idx >actual.raw &&
+ cut -f2 -d" " actual.raw | sort >actual &&
+ test_cmp reachable actual &&
+
+ git show-index <$packdir/$cruft.idx >actual.raw &&
+ cut -f2 -d" " actual.raw | sort >actual &&
+ test_cmp unreachable actual
+ )
+'
+
+test_expect_success 'cruft packs are not included in geometric repack' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git repack -Ad &&
+ git branch -M main &&
+
+ git checkout --orphan other &&
+ test_commit cruft &&
+ git repack -d &&
+
+ git checkout main &&
+ git branch -D other &&
+ git tag -d cruft &&
+ git reflog expire --all --expire=all &&
+
+ git repack --cruft &&
+
+ find $packdir -type f | sort >before &&
+ git repack --geometric=2 -d &&
+ find $packdir -type f | sort >after &&
+
+ test_cmp before after
+ )
+'
+
+test_expect_success 'repack --geometric collects once-cruft objects' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git repack -Ad &&
+ git branch -M main &&
+
+ git checkout --orphan other &&
+ git rm -rf . &&
+ test_commit --no-tag cruft &&
+ cruft="$(git rev-parse HEAD)" &&
+
+ git checkout main &&
+ git branch -D other &&
+ git reflog expire --all --expire=all &&
+
+ # Pack the objects created in the previous step into a cruft
+ # pack. Intentionally leave loose copies of those objects
+ # around so we can pick them up in a subsequent --geometric
+ # reapack.
+ git repack --cruft &&
+
+ # Now make those objects reachable, and ensure that they are
+ # packed into the new pack created via a --geometric repack.
+ git update-ref refs/heads/other $cruft &&
+
+ # Without this object, the set of unpacked objects is exactly
+ # the set of objects already in the cruft pack. Tweak that set
+ # to ensure we do not overwrite the cruft pack entirely.
+ test_commit reachable2 &&
+
+ find $packdir -name "pack-*.idx" | sort >before &&
+ git repack --geometric=2 -d &&
+ find $packdir -name "pack-*.idx" | sort >after &&
+
+ {
+ git rev-list --objects --no-object-names $cruft &&
+ git rev-list --objects --no-object-names reachable..reachable2
+ } >want.raw &&
+ sort want.raw >want &&
+
+ pack=$(comm -13 before after) &&
+ git show-index <$pack >objects.raw &&
+
+ cut -d" " -f2 objects.raw | sort >got &&
+
+ test_cmp want got
+ )
+'
+
+test_expect_success 'cruft repack with no reachable objects' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+ git repack -ad &&
+
+ base="$(git rev-parse base)" &&
+
+ git for-each-ref --format="delete %(refname)" >in &&
+ git update-ref --stdin <in &&
+ git reflog expire --all --expire=all &&
+ rm -fr .git/index &&
+
+ git repack --cruft -d &&
+
+ git cat-file -t $base
+ )
+'
+
+write_blob () {
+ test-tool genrandom "$@" >in &&
+ git hash-object -w -t blob in
+}
+
+find_pack () {
+ for idx in $(ls $packdir/pack-*.idx)
+ do
+ git show-index <$idx >out &&
+ if grep -q "$1" out
+ then
+ echo $idx
+ fi || return 1
+ done
+}
+
+test_expect_success 'cruft repack with --max-pack-size' '
+ git init max-pack-size &&
+ (
+ cd max-pack-size &&
+ test_commit base &&
+
+ # two cruft objects which exceed the maximum pack size
+ foo=$(write_blob foo 1048576) &&
+ bar=$(write_blob bar 1048576) &&
+ test-tool chmtime --get -1000 \
+ "$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
+ test-tool chmtime --get -2000 \
+ "$objdir/$(test_oid_to_path $bar)" >bar.mtime &&
+ git repack --cruft --max-pack-size=1M &&
+ find $packdir -name "*.mtimes" >cruft &&
+ test_line_count = 2 cruft &&
+
+ foo_mtimes="$(basename $(find_pack $foo) .idx).mtimes" &&
+ bar_mtimes="$(basename $(find_pack $bar) .idx).mtimes" &&
+ test-tool pack-mtimes $foo_mtimes >foo.actual &&
+ test-tool pack-mtimes $bar_mtimes >bar.actual &&
+
+ echo "$foo $(cat foo.mtime)" >foo.expect &&
+ echo "$bar $(cat bar.mtime)" >bar.expect &&
+
+ test_cmp foo.expect foo.actual &&
+ test_cmp bar.expect bar.actual &&
+ test "$foo_mtimes" != "$bar_mtimes"
+ )
+'
+
+test_expect_success 'cruft repack with pack.packSizeLimit' '
+ (
+ cd max-pack-size &&
+ # repack everything back together to remove the existing cruft
+ # pack (but to keep its objects)
+ git repack -adk &&
+ git -c pack.packSizeLimit=1M repack --cruft &&
+ # ensure the same post condition is met when --max-pack-size
+ # would otherwise be inferred from the configuration
+ find $packdir -name "*.mtimes" >cruft &&
+ test_line_count = 2 cruft &&
+ for pack in $(cat cruft)
+ do
+ test-tool pack-mtimes "$(basename $pack)" >objects &&
+ test_line_count = 1 objects || return 1
+ done
+ )
+'
+
+test_expect_success 'cruft repack respects repack.cruftWindow' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+
+ GIT_TRACE2_EVENT=$(pwd)/event.trace \
+ git -c pack.window=1 -c repack.cruftWindow=2 repack \
+ --cruft --window=3 &&
+
+ grep "pack-objects.*--window=2.*--cruft" event.trace
+ )
+'
+
+test_expect_success 'cruft repack respects --window by default' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+
+ GIT_TRACE2_EVENT=$(pwd)/event.trace \
+ git -c pack.window=2 repack --cruft --window=3 &&
+
+ grep "pack-objects.*--window=3.*--cruft" event.trace
+ )
+'
+
+test_expect_success 'cruft repack respects --quiet' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+ GIT_PROGRESS_DELAY=0 git repack --cruft --quiet 2>err &&
+ test_must_be_empty err
+ )
+'
+
test_done
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-17 23:00 ` [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
@ 2025-03-17 23:00 ` Taylor Blau
2025-03-19 14:20 ` Elijah Newren
2025-03-17 23:00 ` [PATCH 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.
Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:
- Use the terms "too large" or "too small" in tests that exercise the
'--max-cruft-size' behavior. This has historically been treated as a
threshold beneath which to combine cruft packs, but that will change
in a subsequent commit. Prepare for that by using a more generic
term.
- Remove references to "--max-cruft-size" in the freshening tests.
These tests provide coverage of our ability to record updated mtimes
for objects already in cruft packs whose mtimes are upserted from
various sources (loose objects, finding that object in a new pack,
another cruft pack, etc.).
These have nothing to do with the '--max-cruft-size' feature, and in
fact none of the tests even *use* '--max-cruft-size'. Name them
appropriately to make it clear that these tests exercise freshening
behavior, not '--max-cruft-size' behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index cd452040ea..e6e4c2fad8 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -149,7 +149,7 @@ generate_cruft_pack () {
echo "$packdir/pack-$pack.mtimes"
}
-test_expect_success '--max-cruft-size creates new packs when above threshold' '
+test_expect_success '--max-cruft-size creates new packs when too large' '
git init max-cruft-size-large &&
(
cd max-cruft-size-large &&
@@ -173,7 +173,7 @@ test_expect_success '--max-cruft-size creates new packs when above threshold' '
)
'
-test_expect_success '--max-cruft-size combines existing packs when below threshold' '
+test_expect_success '--max-cruft-size combines existing packs when not too large' '
git init max-cruft-size-small &&
(
cd max-cruft-size-small &&
@@ -236,10 +236,10 @@ test_expect_success '--max-cruft-size combines smaller packs first' '
)
'
-test_expect_success 'setup --max-cruft-size with freshened objects' '
- git init max-cruft-size-freshen &&
+test_expect_success 'setup cruft with freshened objects' '
+ git init cruft-freshen &&
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
test_commit base &&
git repack -ad &&
@@ -257,9 +257,9 @@ test_expect_success 'setup --max-cruft-size with freshened objects' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (loose)' '
+test_expect_success 'cruft with freshened objects (loose)' '
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
# regenerate the object, setting its mtime to be more recent
foo="$(generate_random_blob foo 64)" &&
@@ -275,9 +275,9 @@ test_expect_success '--max-cruft-size with freshened objects (loose)' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (packed)' '
+test_expect_success 'cruft with freshened objects (packed)' '
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
# regenerate the object and store it in a packfile,
# setting its mtime to be more recent
@@ -304,7 +304,7 @@ test_expect_success '--max-cruft-size with freshened objects (packed)' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (previously cruft)' '
+test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
repo="max-cruft-size-threshold" &&
test_when_finished "rm -fr $repo" &&
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()`
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-17 23:00 ` [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-17 23:00 ` [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
@ 2025-03-17 23:00 ` Taylor Blau
2025-03-17 23:00 ` [PATCH 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
` (2 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.
Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).
Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index e6e4c2fad8..3fd5aa6089 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -618,11 +618,6 @@ test_expect_success 'cruft repack with no reachable objects' '
)
'
-write_blob () {
- test-tool genrandom "$@" >in &&
- git hash-object -w -t blob in
-}
-
find_pack () {
for idx in $(ls $packdir/pack-*.idx)
do
@@ -641,8 +636,8 @@ test_expect_success 'cruft repack with --max-pack-size' '
test_commit base &&
# two cruft objects which exceed the maximum pack size
- foo=$(write_blob foo 1048576) &&
- bar=$(write_blob bar 1048576) &&
+ foo=$(generate_random_blob foo 1048576) &&
+ bar=$(generate_random_blob bar 1048576) &&
test-tool chmtime --get -1000 \
"$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
test-tool chmtime --get -2000 \
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/5] repack: avoid combining cruft packs with `--max-cruft-size`
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (2 preceding siblings ...)
2025-03-17 23:00 ` [PATCH 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
@ 2025-03-17 23:00 ` Taylor Blau
2025-03-17 23:00 ` [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-repack.adoc | 12 ++------
builtin/repack.c | 50 ++++++---------------------------
t/t5329-pack-objects-cruft.sh | 52 +++++++++++++++++++++++++++++++++++
t/t7704-repack-cruft.sh | 8 ++----
4 files changed, 67 insertions(+), 55 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 5852a5c973..11db43b1c5 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -77,15 +77,9 @@ to the new separate pack will be written.
Only useful with `--cruft -d`.
--max-cruft-size=<n>::
- Repack cruft objects into packs as large as `<n>` bytes before
- creating new packs. As long as there are enough cruft packs
- smaller than `<n>`, repacking will cause a new cruft pack to
- be created containing objects from any combined cruft packs,
- along with any new unreachable objects. Cruft packs larger than
- `<n>` will not be modified. When the new cruft pack is larger
- than `<n>` bytes, it will be split into multiple packs, all of
- which are guaranteed to be at most `<n>` bytes in size. Only
- useful with `--cruft -d`.
+ Overrides `--max-pack-size` for cruft packs. Inherits the value of
+ `--max-pack-size` (if any) by default. See the documentation for
+ `--max-pack-size` for more details.
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
diff --git a/builtin/repack.c b/builtin/repack.c
index 75e3752353..9658f6b354 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1022,28 +1022,19 @@ static int write_filtered_pack(const struct pack_objects_args *args,
return finish_pack_objects_cmd(&cmd, names, local);
}
-static int existing_cruft_pack_cmp(const void *va, const void *vb)
-{
- struct packed_git *a = *(struct packed_git **)va;
- struct packed_git *b = *(struct packed_git **)vb;
-
- if (a->pack_size < b->pack_size)
- return -1;
- if (a->pack_size > b->pack_size)
- return 1;
- return 0;
-}
-
-static void collapse_small_cruft_packs(FILE *in, size_t max_size,
+static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
struct existing_packs *existing)
{
- struct packed_git **existing_cruft, *p;
+ struct packed_git *p;
struct strbuf buf = STRBUF_INIT;
- size_t total_size = 0;
- size_t existing_cruft_nr = 0;
size_t i;
- ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
+ /*
+ * Squelch a -Wunused-function warning while we rationalize
+ * the behavior of --max-cruft-size. This function will become
+ * used again in a future commit.
+ */
+ (void)retain_cruft_pack;
for (p = get_all_packs(the_repository); p; p = p->next) {
if (!(p->is_cruft && p->pack_local))
@@ -1056,29 +1047,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
if (!string_list_has_string(&existing->cruft_packs, buf.buf))
continue;
- if (existing_cruft_nr >= existing->cruft_packs.nr)
- BUG("too many cruft packs (found %"PRIuMAX", but knew "
- "of %"PRIuMAX")",
- (uintmax_t)existing_cruft_nr + 1,
- (uintmax_t)existing->cruft_packs.nr);
- existing_cruft[existing_cruft_nr++] = p;
- }
-
- QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
-
- for (i = 0; i < existing_cruft_nr; i++) {
- size_t proposed;
-
- p = existing_cruft[i];
- proposed = st_add(total_size, p->pack_size);
-
- if (proposed <= max_size) {
- total_size = proposed;
- fprintf(in, "-%s\n", pack_basename(p));
- } else {
- retain_cruft_pack(existing, p);
- fprintf(in, "%s\n", pack_basename(p));
- }
+ fprintf(in, "-%s.pack\n", buf.buf);
}
for (i = 0; i < existing->non_kept_packs.nr; i++)
@@ -1086,7 +1055,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
existing->non_kept_packs.items[i].string);
strbuf_release(&buf);
- free(existing_cruft);
}
static int write_cruft_pack(const struct pack_objects_args *args,
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index 60dac8312d..25ddda5cf3 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -695,4 +695,56 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' '
)
'
+test_expect_success 'split cruft packs with --max-cruft-size' '
+ repo=cruft-with--max-cruft-size &&
+ test_when_finished "rm -fr $repo" &&
+
+ git init "$repo" &&
+
+ (
+ cd "$repo" &&
+
+ git config core.compression 0 &&
+
+ sz=$((1024 * 1024)) && # 1MiB
+ test-tool genrandom foo $sz >foo &&
+ test-tool genrandom bar $sz >bar &&
+ foo="$(git hash-object -w -t blob foo)" &&
+ bar="$(git hash-object -w -t blob bar)" &&
+
+ to=$packdir/pack &&
+ # Pack together foo and bar into a single 2MiB pack.
+ pack="$(git pack-objects $to <<-EOF
+ $foo
+ $bar
+ EOF
+ )" &&
+
+ # Then generate a cruft pack containing foo and bar.
+ #
+ # Generate the pack with --max-pack-size equal to the
+ # size of one object, forcing us to write two cruft
+ # packs.
+ git pack-objects --cruft --max-pack-size=$sz $to <<-EOF &&
+ -pack-$pack.pack
+ EOF
+
+ ls $packdir/pack-*.mtimes >crufts &&
+ test_line_count = 2 crufts &&
+
+ for cruft in $(cat crufts)
+ do
+ test-tool pack-mtimes "$(basename "$cruft")" || return 1
+ done >actual.raw &&
+
+ cut -d" " -f1 <actual.raw | sort >actual &&
+ sort >expect <<-EOF &&
+ $foo
+ $bar
+ EOF
+
+ test_cmp expect actual
+ )
+'
+
test_done
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 3fd5aa6089..6debad368d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -194,7 +194,7 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
)
'
-test_expect_success '--max-cruft-size combines smaller packs first' '
+test_expect_failure '--max-cruft-size combines smaller packs first' '
git init max-cruft-size-consume-small &&
(
cd max-cruft-size-consume-small &&
@@ -354,13 +354,11 @@ test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
done >actual.raw &&
sort actual.raw >actual &&
- # Among the set of all cruft packs, we should see both
- # mtimes for object $foo and $bar, as well as the
+ # Among the set of all cruft packs, we should see the
+ # new mtimes for object $foo and $bar, as well as the
# single new copy of $baz.
sort >expect <<-EOF &&
- $foo $(cat foo.old)
$foo $(cat foo.new)
- $bar $(cat bar.old)
$bar $(cat bar.new)
$baz $(cat baz.old)
$quux $(cat quux.new)
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (3 preceding siblings ...)
2025-03-17 23:00 ` [PATCH 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
@ 2025-03-17 23:00 ` Taylor Blau
2025-03-18 16:30 ` Junio C Hamano
2025-03-19 14:21 ` Elijah Newren
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
5 siblings, 2 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-17 23:00 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and prohibits repacking ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size' instead.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-repack.adoc | 8 ++++++++
builtin/repack.c | 38 +++++++++++++++++++++++------------
t/t7704-repack-cruft.sh | 22 +++++++++++---------
3 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 11db43b1c5..8e6d61aa2f 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -81,6 +81,14 @@ to the new separate pack will be written.
`--max-pack-size` (if any) by default. See the documentation for
`--max-pack-size` for more details.
+--combine-cruft-below-size=<n>::
+ When generating cruft packs without pruning, only repack
+ existing cruft packs whose size is strictly less than `<n>`.
+ Cruft packs whose size is greater than or equal to `<n>` are
+ left as-is and not repacked. Useful when you want to avoid
+ repacking large cruft pack(s) in repositories that have many
+ and/or large unreachable objects.
+
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
directory `<dir>`. This option is useful for keeping a copy of
diff --git a/builtin/repack.c b/builtin/repack.c
index 9658f6b354..f3330ade7b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1022,20 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args,
return finish_pack_objects_cmd(&cmd, names, local);
}
-static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
- struct existing_packs *existing)
+static void combine_small_cruft_packs(FILE *in, size_t combine_cruft_below_size,
+ struct existing_packs *existing)
{
struct packed_git *p;
struct strbuf buf = STRBUF_INIT;
size_t i;
- /*
- * Squelch a -Wunused-function warning while we rationalize
- * the behavior of --max-cruft-size. This function will become
- * used again in a future commit.
- */
- (void)retain_cruft_pack;
-
for (p = get_all_packs(the_repository); p; p = p->next) {
if (!(p->is_cruft && p->pack_local))
continue;
@@ -1047,7 +1040,12 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
if (!string_list_has_string(&existing->cruft_packs, buf.buf))
continue;
- fprintf(in, "-%s.pack\n", buf.buf);
+ if (p->pack_size < combine_cruft_below_size) {
+ fprintf(in, "-%s\n", pack_basename(p));
+ } else {
+ retain_cruft_pack(existing, p);
+ fprintf(in, "%s\n", pack_basename(p));
+ }
}
for (i = 0; i < existing->non_kept_packs.nr; i++)
@@ -1061,6 +1059,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
const char *cruft_expiration,
+ unsigned long combine_cruft_below_size,
struct string_list *names,
struct existing_packs *existing)
{
@@ -1103,8 +1102,9 @@ static int write_cruft_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
- if (args->max_pack_size && !cruft_expiration) {
- collapse_small_cruft_packs(in, args->max_pack_size, existing);
+ if (combine_cruft_below_size && !cruft_expiration) {
+ combine_small_cruft_packs(in, combine_cruft_below_size,
+ existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
@@ -1158,6 +1158,7 @@ int cmd_repack(int argc,
const char *opt_window_memory = NULL;
const char *opt_depth = NULL;
const char *opt_threads = NULL;
+ unsigned long combine_cruft_below_size = 0ul;
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@@ -1170,6 +1171,9 @@ int cmd_repack(int argc,
PACK_CRUFT),
OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"),
N_("with --cruft, expire objects older than this")),
+ OPT_MAGNITUDE(0, "combine-cruft-below-size",
+ &combine_cruft_below_size,
+ N_("with --cruft, only repack cruft packs smaller than this")),
OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size,
N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL('d', NULL, &delete_redundant,
@@ -1413,7 +1417,8 @@ int cmd_repack(int argc,
cruft_po_args.quiet = po_args.quiet;
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
- cruft_expiration, &names,
+ cruft_expiration,
+ combine_cruft_below_size, &names,
&existing);
if (ret)
goto cleanup;
@@ -1440,10 +1445,17 @@ int cmd_repack(int argc,
* generate an empty pack (since every object not in the
* cruft pack generated above will have an mtime older
* than the expiration).
+ *
+ * Pretend we don't have a `--combine-cruft-below-size`
+ * argument, since we're not selectively combining
+ * anything based on size to generate the limbo cruft
+ * pack, but rather removing all cruft packs from the
+ * main repository regardless of size.
*/
ret = write_cruft_pack(&cruft_po_args, expire_to,
pack_prefix,
NULL,
+ 0ul,
&names,
&existing);
if (ret)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 6debad368d..8aebfb45f5 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -194,10 +194,13 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
)
'
-test_expect_failure '--max-cruft-size combines smaller packs first' '
- git init max-cruft-size-consume-small &&
+test_expect_success '--combine-cruft-below-size combines packs' '
+ repo=combine-cruft-below-size &&
+ test_when_finished "rm -fr $repo" &&
+
+ git init "$repo" &&
(
- cd max-cruft-size-consume-small &&
+ cd "$repo" &&
test_commit base &&
git repack -ad &&
@@ -211,11 +214,11 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
sort expect.raw >expect.objects &&
- # repacking with `--max-cruft-size=2M` should combine
- # both 0.5 MiB packs together, instead of, say, one of
- # the 0.5 MiB packs with the 1.0 MiB pack
+ # Repacking with `--combine-cruft-below-size=1M`
+ # should combine both 0.5 MiB packs together, but
+ # ignore the two packs which are >= 1.0 MiB.
ls $packdir/pack-*.mtimes | sort >cruft.before &&
- git repack -d --cruft --max-cruft-size=2M &&
+ git repack -d --cruft --combine-cruft-below-size=1M &&
ls $packdir/pack-*.mtimes | sort >cruft.after &&
comm -13 cruft.before cruft.after >cruft.new &&
@@ -224,11 +227,12 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
test_line_count = 1 cruft.new &&
test_line_count = 2 cruft.removed &&
- # the two smaller packs should be rolled up first
+ # The two packs smaller than 1.0MiB should be repacked
+ # together.
printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed &&
test_cmp expect.removed cruft.removed &&
- # ...and contain the set of objects rolled up
+ # ...and contain the set of objects rolled up.
test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw &&
sort actual.raw >actual.objects &&
--
2.49.0.rc0.6.g7f120c35e9
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-17 23:00 ` [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
@ 2025-03-18 16:30 ` Junio C Hamano
2025-03-19 22:40 ` Taylor Blau
2025-03-19 14:21 ` Elijah Newren
1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2025-03-18 16:30 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren
Taylor Blau <me@ttaylorr.com> writes:
> @@ -81,6 +81,14 @@ to the new separate pack will be written.
> `--max-pack-size` (if any) by default. See the documentation for
> `--max-pack-size` for more details.
>
> +--combine-cruft-below-size=<n>::
> + When generating cruft packs without pruning, only repack
> + existing cruft packs whose size is strictly less than `<n>`.
> + Cruft packs whose size is greater than or equal to `<n>` are
> + left as-is and not repacked. Useful when you want to avoid
> + repacking large cruft pack(s) in repositories that have many
> + and/or large unreachable objects.
> +
Shared with existing entries in this file, but let's strive to make
sure we explicitly mention units. --max-cruft-size=<n> is explained
to cramp below '<n>' bytes, which is great, --max-pack-size=<n> says
it accepts k/m/g suffixes and its minimum size is 1 MiB, which is
explicit enough hint that this is counted in bytes. This new entry
should hint that this is also counted in bytes.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2025-03-17 23:00 ` [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
@ 2025-03-19 14:20 ` Elijah Newren
0 siblings, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2025-03-19 14:20 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
On Mon, Mar 17, 2025 at 4:00 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> The cruft pack feature has two primary test scripts which exercise
> various parts of it, which are:
>
> - t5329-pack-objects-cruft.sh
> - t7704-repack-cruft.sh
>
> The former is designed to test low-level pack generation mechanics at
> the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
> the other hand, is designed to test the user-facing behavior through
> 'git repack --cruft', which is porcelain (under the "ancillary
> manipulators" sub-section).
>
> At some point a handful of tests which should have been added to the
> latter script were instead written to the former. This isn't a huge
> deal, but rectifying it is straightforward. Move a handful of
> 'repack'-related tests out of t5329 and into their rightful home in
> t7704.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> t/t5329-pack-objects-cruft.sh | 250 ----------------------------------
> t/t7704-repack-cruft.sh | 250 ++++++++++++++++++++++++++++++++++
> 2 files changed, 250 insertions(+), 250 deletions(-)
A quick view with --color-moved makes it easy to verify that the tests
simply moved.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
2025-03-17 23:00 ` [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
@ 2025-03-19 14:20 ` Elijah Newren
0 siblings, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2025-03-19 14:20 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
On Mon, Mar 17, 2025 at 4:00 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Now that a number of new tests have landed in t7704, make sure that they
> all make sense and are testing the things they say they are.
>
> Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
> are as follows:
>
> - Use the terms "too large" or "too small" in tests that exercise the
> '--max-cruft-size' behavior. This has historically been treated as a
> threshold beneath which to combine cruft packs, but that will change
> in a subsequent commit. Prepare for that by using a more generic
> term.
>
> - Remove references to "--max-cruft-size" in the freshening tests.
> These tests provide coverage of our ability to record updated mtimes
> for objects already in cruft packs whose mtimes are upserted from
> various sources (loose objects, finding that object in a new pack,
> another cruft pack, etc.).
>
> These have nothing to do with the '--max-cruft-size' feature, and in
> fact none of the tests even *use* '--max-cruft-size'. Name them
> appropriately to make it clear that these tests exercise freshening
> behavior, not '--max-cruft-size' behavior.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> t/t7704-repack-cruft.sh | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
> index cd452040ea..e6e4c2fad8 100755
> --- a/t/t7704-repack-cruft.sh
> +++ b/t/t7704-repack-cruft.sh
> @@ -149,7 +149,7 @@ generate_cruft_pack () {
> echo "$packdir/pack-$pack.mtimes"
> }
>
> -test_expect_success '--max-cruft-size creates new packs when above threshold' '
> +test_expect_success '--max-cruft-size creates new packs when too large' '
> git init max-cruft-size-large &&
> (
> cd max-cruft-size-large &&
> @@ -173,7 +173,7 @@ test_expect_success '--max-cruft-size creates new packs when above threshold' '
> )
> '
>
> -test_expect_success '--max-cruft-size combines existing packs when below threshold' '
> +test_expect_success '--max-cruft-size combines existing packs when not too large' '
> git init max-cruft-size-small &&
> (
> cd max-cruft-size-small &&
> @@ -236,10 +236,10 @@ test_expect_success '--max-cruft-size combines smaller packs first' '
> )
> '
>
> -test_expect_success 'setup --max-cruft-size with freshened objects' '
> - git init max-cruft-size-freshen &&
> +test_expect_success 'setup cruft with freshened objects' '
> + git init cruft-freshen &&
> (
> - cd max-cruft-size-freshen &&
> + cd cruft-freshen &&
>
> test_commit base &&
> git repack -ad &&
> @@ -257,9 +257,9 @@ test_expect_success 'setup --max-cruft-size with freshened objects' '
> )
> '
>
> -test_expect_success '--max-cruft-size with freshened objects (loose)' '
> +test_expect_success 'cruft with freshened objects (loose)' '
> (
> - cd max-cruft-size-freshen &&
> + cd cruft-freshen &&
>
> # regenerate the object, setting its mtime to be more recent
> foo="$(generate_random_blob foo 64)" &&
> @@ -275,9 +275,9 @@ test_expect_success '--max-cruft-size with freshened objects (loose)' '
> )
> '
>
> -test_expect_success '--max-cruft-size with freshened objects (packed)' '
> +test_expect_success 'cruft with freshened objects (packed)' '
> (
> - cd max-cruft-size-freshen &&
> + cd cruft-freshen &&
>
> # regenerate the object and store it in a packfile,
> # setting its mtime to be more recent
> @@ -304,7 +304,7 @@ test_expect_success '--max-cruft-size with freshened objects (packed)' '
> )
> '
>
> -test_expect_success '--max-cruft-size with freshened objects (previously cruft)' '
> +test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
> repo="max-cruft-size-threshold" &&
>
> test_when_finished "rm -fr $repo" &&
> --
> 2.49.0.rc0.6.g7f120c35e9
Using --color-words for this diff makes it easy to see the small
wording clarifications; looks good.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-17 23:00 ` [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
2025-03-18 16:30 ` Junio C Hamano
@ 2025-03-19 14:21 ` Elijah Newren
2025-03-19 22:50 ` Taylor Blau
1 sibling, 1 reply; 19+ messages in thread
From: Elijah Newren @ 2025-03-19 14:21 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
On Mon, Mar 17, 2025 at 4:00 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> The previous commit changed the behavior of repack's '--max-cruft-size'
> to specify a cruft pack-specific override for '--max-pack-size'.
>
> Introduce a new flag, '--combine-cruft-below-size' which is a
> replacement for the old behavior of '--max-cruft-size'. This new flag
> does explicitly what it says: it combines together cruft packs which are
> smaller than a given threshold, and prohibits repacking ones which are
> larger.
To me "prohibits" suggests some kind of stronger action that
potentially persists beyond the end of this operation. Perhaps this
could be reworded to something like
s/prohibits repacking ones/leaves alone packs/ ?
> This accomplishes the original intent of '--max-cruft-size', which was
> to avoid repacking cruft packs larger than the given threshold.
>
> The new behavior is slightly different. Instead of building up small
> packs together until the threshold is met, '--combine-cruft-below-size'
> packs up *all* cruft packs smaller than the threshold. This means that
> we may make a pack much larger than the given threshold (e.g., if you
> aggregate 5 packs which are each 99 MiB in size with a threshold of 100
> MiB).
>
> But that's OK: the point isn't to restrict the size of the cruft packs
> we generate, it's to avoid working with ones that have already grown too
> large. If repositories still want to limit the size of the generated
> cruft pack(s), they may use '--max-cruft-size' instead.
...but then they wouldn't get any cruft packs being combined. Did you
mean s/instead/together with --combine-cruft-below-size/ ?
> There's some minor test fallout as a result of the slight differences in
> behavior between the old meaning of '--max-cruft-size' and the behavior
> of '--combine-cruft-below-size'. In the test which is now called
> "--combine-cruft-below-size combines packs", we need to use the new flag
> over the old one to exercise that test's intended behavior. The
> remainder of the changes there are to improve the clarity of the
> comments.
>
> Suggested-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> Documentation/git-repack.adoc | 8 ++++++++
> builtin/repack.c | 38 +++++++++++++++++++++++------------
> t/t7704-repack-cruft.sh | 22 +++++++++++---------
> 3 files changed, 46 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
> index 11db43b1c5..8e6d61aa2f 100644
> --- a/Documentation/git-repack.adoc
> +++ b/Documentation/git-repack.adoc
> @@ -81,6 +81,14 @@ to the new separate pack will be written.
> `--max-pack-size` (if any) by default. See the documentation for
> `--max-pack-size` for more details.
>
> +--combine-cruft-below-size=<n>::
> + When generating cruft packs without pruning, only repack
> + existing cruft packs whose size is strictly less than `<n>`.
> + Cruft packs whose size is greater than or equal to `<n>` are
> + left as-is and not repacked. Useful when you want to avoid
> + repacking large cruft pack(s) in repositories that have many
> + and/or large unreachable objects.
> +
Does it make sense to modify the documentation for either the
--max-cruft-szie or --combine-cruft-below-size options to suggest that
if both are used, it is recommended to make --max-cruft-size twice (or
more) the value of --combine-cruft-below-size ?
> --expire-to=<dir>::
> Write a cruft pack containing pruned objects (if any) to the
> directory `<dir>`. This option is useful for keeping a copy of
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9658f6b354..f3330ade7b 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1022,20 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args,
> return finish_pack_objects_cmd(&cmd, names, local);
> }
>
> -static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
> - struct existing_packs *existing)
> +static void combine_small_cruft_packs(FILE *in, size_t combine_cruft_below_size,
> + struct existing_packs *existing)
> {
> struct packed_git *p;
> struct strbuf buf = STRBUF_INIT;
> size_t i;
>
> - /*
> - * Squelch a -Wunused-function warning while we rationalize
> - * the behavior of --max-cruft-size. This function will become
> - * used again in a future commit.
> - */
> - (void)retain_cruft_pack;
> -
> for (p = get_all_packs(the_repository); p; p = p->next) {
> if (!(p->is_cruft && p->pack_local))
> continue;
> @@ -1047,7 +1040,12 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
> if (!string_list_has_string(&existing->cruft_packs, buf.buf))
> continue;
>
> - fprintf(in, "-%s.pack\n", buf.buf);
> + if (p->pack_size < combine_cruft_below_size) {
> + fprintf(in, "-%s\n", pack_basename(p));
> + } else {
> + retain_cruft_pack(existing, p);
> + fprintf(in, "%s\n", pack_basename(p));
> + }
> }
>
> for (i = 0; i < existing->non_kept_packs.nr; i++)
> @@ -1061,6 +1059,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
> const char *destination,
> const char *pack_prefix,
> const char *cruft_expiration,
> + unsigned long combine_cruft_below_size,
> struct string_list *names,
> struct existing_packs *existing)
> {
> @@ -1103,8 +1102,9 @@ static int write_cruft_pack(const struct pack_objects_args *args,
> in = xfdopen(cmd.in, "w");
> for_each_string_list_item(item, names)
> fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
> - if (args->max_pack_size && !cruft_expiration) {
> - collapse_small_cruft_packs(in, args->max_pack_size, existing);
> + if (combine_cruft_below_size && !cruft_expiration) {
> + combine_small_cruft_packs(in, combine_cruft_below_size,
> + existing);
> } else {
> for_each_string_list_item(item, &existing->non_kept_packs)
> fprintf(in, "-%s.pack\n", item->string);
> @@ -1158,6 +1158,7 @@ int cmd_repack(int argc,
> const char *opt_window_memory = NULL;
> const char *opt_depth = NULL;
> const char *opt_threads = NULL;
> + unsigned long combine_cruft_below_size = 0ul;
>
> struct option builtin_repack_options[] = {
> OPT_BIT('a', NULL, &pack_everything,
> @@ -1170,6 +1171,9 @@ int cmd_repack(int argc,
> PACK_CRUFT),
> OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"),
> N_("with --cruft, expire objects older than this")),
> + OPT_MAGNITUDE(0, "combine-cruft-below-size",
> + &combine_cruft_below_size,
> + N_("with --cruft, only repack cruft packs smaller than this")),
> OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size,
> N_("with --cruft, limit the size of new cruft packs")),
> OPT_BOOL('d', NULL, &delete_redundant,
> @@ -1413,7 +1417,8 @@ int cmd_repack(int argc,
> cruft_po_args.quiet = po_args.quiet;
>
> ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
> - cruft_expiration, &names,
> + cruft_expiration,
> + combine_cruft_below_size, &names,
> &existing);
> if (ret)
> goto cleanup;
> @@ -1440,10 +1445,17 @@ int cmd_repack(int argc,
> * generate an empty pack (since every object not in the
> * cruft pack generated above will have an mtime older
> * than the expiration).
> + *
> + * Pretend we don't have a `--combine-cruft-below-size`
> + * argument, since we're not selectively combining
> + * anything based on size to generate the limbo cruft
> + * pack, but rather removing all cruft packs from the
> + * main repository regardless of size.
> */
> ret = write_cruft_pack(&cruft_po_args, expire_to,
> pack_prefix,
> NULL,
> + 0ul,
> &names,
> &existing);
> if (ret)
> diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
> index 6debad368d..8aebfb45f5 100755
> --- a/t/t7704-repack-cruft.sh
> +++ b/t/t7704-repack-cruft.sh
> @@ -194,10 +194,13 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
> )
> '
>
> -test_expect_failure '--max-cruft-size combines smaller packs first' '
> - git init max-cruft-size-consume-small &&
> +test_expect_success '--combine-cruft-below-size combines packs' '
> + repo=combine-cruft-below-size &&
> + test_when_finished "rm -fr $repo" &&
> +
> + git init "$repo" &&
> (
> - cd max-cruft-size-consume-small &&
> + cd "$repo" &&
>
> test_commit base &&
> git repack -ad &&
> @@ -211,11 +214,11 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
> test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
> sort expect.raw >expect.objects &&
>
> - # repacking with `--max-cruft-size=2M` should combine
> - # both 0.5 MiB packs together, instead of, say, one of
> - # the 0.5 MiB packs with the 1.0 MiB pack
> + # Repacking with `--combine-cruft-below-size=1M`
> + # should combine both 0.5 MiB packs together, but
> + # ignore the two packs which are >= 1.0 MiB.
> ls $packdir/pack-*.mtimes | sort >cruft.before &&
> - git repack -d --cruft --max-cruft-size=2M &&
> + git repack -d --cruft --combine-cruft-below-size=1M &&
> ls $packdir/pack-*.mtimes | sort >cruft.after &&
>
> comm -13 cruft.before cruft.after >cruft.new &&
> @@ -224,11 +227,12 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
> test_line_count = 1 cruft.new &&
> test_line_count = 2 cruft.removed &&
>
> - # the two smaller packs should be rolled up first
> + # The two packs smaller than 1.0MiB should be repacked
> + # together.
> printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed &&
> test_cmp expect.removed cruft.removed &&
>
> - # ...and contain the set of objects rolled up
> + # ...and contain the set of objects rolled up.
> test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw &&
> sort actual.raw >actual.objects &&
>
> --
> 2.49.0.rc0.6.g7f120c35e9
The rest looks good to me (although I reviewed a preview and you
addressed that feedback, so perhaps not so surprising...)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-18 16:30 ` Junio C Hamano
@ 2025-03-19 22:40 ` Taylor Blau
0 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren
On Tue, Mar 18, 2025 at 09:30:14AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -81,6 +81,14 @@ to the new separate pack will be written.
> > `--max-pack-size` (if any) by default. See the documentation for
> > `--max-pack-size` for more details.
> >
> > +--combine-cruft-below-size=<n>::
> > + When generating cruft packs without pruning, only repack
> > + existing cruft packs whose size is strictly less than `<n>`.
> > + Cruft packs whose size is greater than or equal to `<n>` are
> > + left as-is and not repacked. Useful when you want to avoid
> > + repacking large cruft pack(s) in repositories that have many
> > + and/or large unreachable objects.
> > +
>
> Shared with existing entries in this file, but let's strive to make
> sure we explicitly mention units. --max-cruft-size=<n> is explained
> to cramp below '<n>' bytes, which is great, --max-pack-size=<n> says
> it accepts k/m/g suffixes and its minimum size is 1 MiB, which is
> explicit enough hint that this is counted in bytes. This new entry
> should hint that this is also counted in bytes.
Definitely an oversight on my part, I certainly agree with this
sentiment.
FWIW, '--max-cruft-size' no longer explicitly says "in bytes", but
mostly because it (a) doesn't describe `<n>` at all, because (b) it
refers readers to the documentation on `--max-pack-size`. No need to
cover it twice there.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-19 14:21 ` Elijah Newren
@ 2025-03-19 22:50 ` Taylor Blau
0 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:50 UTC (permalink / raw)
To: Elijah Newren; +Cc: git, Jeff King, Junio C Hamano
On Wed, Mar 19, 2025 at 07:21:01AM -0700, Elijah Newren wrote:
> On Mon, Mar 17, 2025 at 4:00 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > The previous commit changed the behavior of repack's '--max-cruft-size'
> > to specify a cruft pack-specific override for '--max-pack-size'.
> >
> > Introduce a new flag, '--combine-cruft-below-size' which is a
> > replacement for the old behavior of '--max-cruft-size'. This new flag
> > does explicitly what it says: it combines together cruft packs which are
> > smaller than a given threshold, and prohibits repacking ones which are
> > larger.
>
> To me "prohibits" suggests some kind of stronger action that
> potentially persists beyond the end of this operation. Perhaps this
> could be reworded to something like
> s/prohibits repacking ones/leaves alone packs/ ?
Fair enough, I went with "leaves alone ones which are larger".
> > This accomplishes the original intent of '--max-cruft-size', which was
> > to avoid repacking cruft packs larger than the given threshold.
> >
> > The new behavior is slightly different. Instead of building up small
> > packs together until the threshold is met, '--combine-cruft-below-size'
> > packs up *all* cruft packs smaller than the threshold. This means that
> > we may make a pack much larger than the given threshold (e.g., if you
> > aggregate 5 packs which are each 99 MiB in size with a threshold of 100
> > MiB).
> >
> > But that's OK: the point isn't to restrict the size of the cruft packs
> > we generate, it's to avoid working with ones that have already grown too
> > large. If repositories still want to limit the size of the generated
> > cruft pack(s), they may use '--max-cruft-size' instead.
>
> ...but then they wouldn't get any cruft packs being combined. Did you
> mean s/instead/together with --combine-cruft-below-size/ ?
Oops, yeah; I meant to imply that '--max-cruft-size' was/is still an
option, not that one excludes the other. I went for s/instead//.
> > There's some minor test fallout as a result of the slight differences in
> > behavior between the old meaning of '--max-cruft-size' and the behavior
> > of '--combine-cruft-below-size'. In the test which is now called
> > "--combine-cruft-below-size combines packs", we need to use the new flag
> > over the old one to exercise that test's intended behavior. The
> > remainder of the changes there are to improve the clarity of the
> > comments.
> >
> > Suggested-by: Elijah Newren <newren@gmail.com>
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> > Documentation/git-repack.adoc | 8 ++++++++
> > builtin/repack.c | 38 +++++++++++++++++++++++------------
> > t/t7704-repack-cruft.sh | 22 +++++++++++---------
> > 3 files changed, 46 insertions(+), 22 deletions(-)
> >
> > diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
> > index 11db43b1c5..8e6d61aa2f 100644
> > --- a/Documentation/git-repack.adoc
> > +++ b/Documentation/git-repack.adoc
> > @@ -81,6 +81,14 @@ to the new separate pack will be written.
> > `--max-pack-size` (if any) by default. See the documentation for
> > `--max-pack-size` for more details.
> >
> > +--combine-cruft-below-size=<n>::
> > + When generating cruft packs without pruning, only repack
> > + existing cruft packs whose size is strictly less than `<n>`.
> > + Cruft packs whose size is greater than or equal to `<n>` are
> > + left as-is and not repacked. Useful when you want to avoid
> > + repacking large cruft pack(s) in repositories that have many
> > + and/or large unreachable objects.
> > +
>
> Does it make sense to modify the documentation for either the
> --max-cruft-szie or --combine-cruft-below-size options to suggest that
> if both are used, it is recommended to make --max-cruft-size twice (or
> more) the value of --combine-cruft-below-size ?
Ehhh... I kind of want to avoid mentioning it TBH. Part of the reason
there is that I think the explanation of "why" is a little too detailed
to spell out meaningfully in the documentation while still keeping it
concise. But more importantly I want to avoid encouraging people to use
the '--max-{pack,cruft}-size' flags to begin with, since there really is
no reason to use them outside of filesystem constraints.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size'
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (4 preceding siblings ...)
2025-03-17 23:00 ` [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
` (5 more replies)
5 siblings, 6 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
Here is a small reroll of my series to introduce a new 'repack' flag
called '--combine-cruft-below-size'. There aren't any functional changes
in this round, and the changes that do exist are limited to
documentation and commit message tweaks in the final patch.
A range-diff is included below, as well as the original cover letter:
(This is based on tb/multi-cruft-pack-refresh-fix from Junio's tree,
which is at 08f612ba70 (builtin/pack-objects.c: freshen objects from
existing cruft packs, 2025-03-13) at the time of writing).
This series replaces something close to the existing behavior of
repack's '--max-cruft-size' flag with '--combine-cruft-below-size'.
The new flag is much clearer in its intent and function, and avoids the
lack of clarity between the two that was discussed in
<cover.1741648467.git.me@ttaylorr.com>
The new behavior is as follows:
- '--max-cruft-size' is a cruft pack-specific override for repack's
'--max-pack-size' command-line flag.
- '--combine-cruft-below-size' instructs repack to only combine cruft
packs which are smaller than the given threshold. This will likely
result in packs which are larger than the threshold. But that is OK:
the point is to limit the size of the individual packs on input, not
the size of the outgoing pack.
This series does break the existing behavior of '--max-cruft-size'. But
I think breaking backwards compatibility here is OK, since the existing
behavior was so broken to begin with. I'm happy to consider other
alternatives if others feel that this isn't OK.
The series has an interesting structure that I feel may be worth calling
out. The first three patches are trivial test cleanups. The fourth patch
modifies the existing behavior of '--max-cruft-size', but does so while
keeping some of the old infrastructure around.
That may seem like an unnecessarily complicated approach, but it greatly
simplifies introducing the new behavior in the following (and final)
commit. I think that this results in a series that is a little easier to
review (since we don't see a ton of code being removed in one commit and
then re-added in another immediately following it). But if others feel
like this was a mistake, please let me know ;-).
Thanks in advance for your review!
Taylor Blau (5):
t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
t/t7704-repack-cruft.sh: consolidate `write_blob()`
repack: avoid combining cruft packs with `--max-cruft-size`
repack: begin combining cruft packs with `--combine-cruft-below-size`
Documentation/git-repack.adoc | 21 ++-
builtin/repack.c | 62 +++----
t/t5329-pack-objects-cruft.sh | 302 ++++++----------------------------
t/t7704-repack-cruft.sh | 293 ++++++++++++++++++++++++++++++---
4 files changed, 355 insertions(+), 323 deletions(-)
Range-diff against v1:
1: 0aa8aa65c1 = 1: 0aa8aa65c1 t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2: 5e8bd3e66e = 2: 5e8bd3e66e t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
3: b075ad8601 = 3: b075ad8601 t/t7704-repack-cruft.sh: consolidate `write_blob()`
4: 7941997e33 = 4: 7941997e33 repack: avoid combining cruft packs with `--max-cruft-size`
5: 7f120c35e9 ! 5: dee780a2aa repack: begin combining cruft packs with `--combine-cruft-below-size`
@@ Commit message
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
- smaller than a given threshold, and prohibits repacking ones which are
+ smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
@@ Commit message
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
- cruft pack(s), they may use '--max-cruft-size' instead.
+ cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
@@ Documentation/git-repack.adoc: to the new separate pack will be written.
+--combine-cruft-below-size=<n>::
+ When generating cruft packs without pruning, only repack
-+ existing cruft packs whose size is strictly less than `<n>`.
-+ Cruft packs whose size is greater than or equal to `<n>` are
-+ left as-is and not repacked. Useful when you want to avoid
-+ repacking large cruft pack(s) in repositories that have many
-+ and/or large unreachable objects.
++ existing cruft packs whose size is strictly less than `<n>`,
++ where `<n>` represents a number of bytes, which can optionally
++ be suffixed with "k", "m", or "g". Cruft packs whose size is
++ greater than or equal to `<n>` are left as-is and not repacked.
++ Useful when you want to avoid repacking large cruft pack(s) in
++ repositories that have many and/or large unreachable objects.
+
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
base-commit: 08f612ba7000bf181ef6d8baed9ece322e567efd
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
` (4 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
The cruft pack feature has two primary test scripts which exercise
various parts of it, which are:
- t5329-pack-objects-cruft.sh
- t7704-repack-cruft.sh
The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).
At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t5329-pack-objects-cruft.sh | 250 ----------------------------------
t/t7704-repack-cruft.sh | 250 ++++++++++++++++++++++++++++++++++
2 files changed, 250 insertions(+), 250 deletions(-)
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index b71a0aef40..60dac8312d 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -360,43 +360,6 @@ test_expect_success 'expired objects are pruned' '
)
'
-test_expect_success 'repack --cruft generates a cruft pack' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git branch -M main &&
- git checkout --orphan other &&
- test_commit unreachable &&
-
- git checkout main &&
- git branch -D other &&
- git tag -d unreachable &&
- # objects are not cruft if they are contained in the reflogs
- git reflog expire --all --expire=all &&
-
- git rev-list --objects --all --no-object-names >reachable.raw &&
- git cat-file --batch-all-objects --batch-check="%(objectname)" >objects &&
- sort <reachable.raw >reachable &&
- comm -13 reachable objects >unreachable &&
-
- git repack --cruft -d &&
-
- cruft=$(basename $(ls $packdir/pack-*.mtimes) .mtimes) &&
- pack=$(basename $(ls $packdir/pack-*.pack | grep -v $cruft) .pack) &&
-
- git show-index <$packdir/$pack.idx >actual.raw &&
- cut -f2 -d" " actual.raw | sort >actual &&
- test_cmp reachable actual &&
-
- git show-index <$packdir/$cruft.idx >actual.raw &&
- cut -f2 -d" " actual.raw | sort >actual &&
- test_cmp unreachable actual
- )
-'
-
test_expect_success 'loose objects mtimes upsert others' '
git init repo &&
test_when_finished "rm -fr repo" &&
@@ -470,219 +433,6 @@ test_expect_success 'expiring cruft objects with git gc' '
)
'
-test_expect_success 'cruft packs are not included in geometric repack' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git repack -Ad &&
- git branch -M main &&
-
- git checkout --orphan other &&
- test_commit cruft &&
- git repack -d &&
-
- git checkout main &&
- git branch -D other &&
- git tag -d cruft &&
- git reflog expire --all --expire=all &&
-
- git repack --cruft &&
-
- find $packdir -type f | sort >before &&
- git repack --geometric=2 -d &&
- find $packdir -type f | sort >after &&
-
- test_cmp before after
- )
-'
-
-test_expect_success 'repack --geometric collects once-cruft objects' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit reachable &&
- git repack -Ad &&
- git branch -M main &&
-
- git checkout --orphan other &&
- git rm -rf . &&
- test_commit --no-tag cruft &&
- cruft="$(git rev-parse HEAD)" &&
-
- git checkout main &&
- git branch -D other &&
- git reflog expire --all --expire=all &&
-
- # Pack the objects created in the previous step into a cruft
- # pack. Intentionally leave loose copies of those objects
- # around so we can pick them up in a subsequent --geometric
- # reapack.
- git repack --cruft &&
-
- # Now make those objects reachable, and ensure that they are
- # packed into the new pack created via a --geometric repack.
- git update-ref refs/heads/other $cruft &&
-
- # Without this object, the set of unpacked objects is exactly
- # the set of objects already in the cruft pack. Tweak that set
- # to ensure we do not overwrite the cruft pack entirely.
- test_commit reachable2 &&
-
- find $packdir -name "pack-*.idx" | sort >before &&
- git repack --geometric=2 -d &&
- find $packdir -name "pack-*.idx" | sort >after &&
-
- {
- git rev-list --objects --no-object-names $cruft &&
- git rev-list --objects --no-object-names reachable..reachable2
- } >want.raw &&
- sort want.raw >want &&
-
- pack=$(comm -13 before after) &&
- git show-index <$pack >objects.raw &&
-
- cut -d" " -f2 objects.raw | sort >got &&
-
- test_cmp want got
- )
-'
-
-test_expect_success 'cruft repack with no reachable objects' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
- git repack -ad &&
-
- base="$(git rev-parse base)" &&
-
- git for-each-ref --format="delete %(refname)" >in &&
- git update-ref --stdin <in &&
- git reflog expire --all --expire=all &&
- rm -fr .git/index &&
-
- git repack --cruft -d &&
-
- git cat-file -t $base
- )
-'
-
-write_blob () {
- test-tool genrandom "$@" >in &&
- git hash-object -w -t blob in
-}
-
-find_pack () {
- for idx in $(ls $packdir/pack-*.idx)
- do
- git show-index <$idx >out &&
- if grep -q "$1" out
- then
- echo $idx
- fi || return 1
- done
-}
-
-test_expect_success 'cruft repack with --max-pack-size' '
- git init max-pack-size &&
- (
- cd max-pack-size &&
- test_commit base &&
-
- # two cruft objects which exceed the maximum pack size
- foo=$(write_blob foo 1048576) &&
- bar=$(write_blob bar 1048576) &&
- test-tool chmtime --get -1000 \
- "$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
- test-tool chmtime --get -2000 \
- "$objdir/$(test_oid_to_path $bar)" >bar.mtime &&
- git repack --cruft --max-pack-size=1M &&
- find $packdir -name "*.mtimes" >cruft &&
- test_line_count = 2 cruft &&
-
- foo_mtimes="$(basename $(find_pack $foo) .idx).mtimes" &&
- bar_mtimes="$(basename $(find_pack $bar) .idx).mtimes" &&
- test-tool pack-mtimes $foo_mtimes >foo.actual &&
- test-tool pack-mtimes $bar_mtimes >bar.actual &&
-
- echo "$foo $(cat foo.mtime)" >foo.expect &&
- echo "$bar $(cat bar.mtime)" >bar.expect &&
-
- test_cmp foo.expect foo.actual &&
- test_cmp bar.expect bar.actual &&
- test "$foo_mtimes" != "$bar_mtimes"
- )
-'
-
-test_expect_success 'cruft repack with pack.packSizeLimit' '
- (
- cd max-pack-size &&
- # repack everything back together to remove the existing cruft
- # pack (but to keep its objects)
- git repack -adk &&
- git -c pack.packSizeLimit=1M repack --cruft &&
- # ensure the same post condition is met when --max-pack-size
- # would otherwise be inferred from the configuration
- find $packdir -name "*.mtimes" >cruft &&
- test_line_count = 2 cruft &&
- for pack in $(cat cruft)
- do
- test-tool pack-mtimes "$(basename $pack)" >objects &&
- test_line_count = 1 objects || return 1
- done
- )
-'
-
-test_expect_success 'cruft repack respects repack.cruftWindow' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
-
- GIT_TRACE2_EVENT=$(pwd)/event.trace \
- git -c pack.window=1 -c repack.cruftWindow=2 repack \
- --cruft --window=3 &&
-
- grep "pack-objects.*--window=2.*--cruft" event.trace
- )
-'
-
-test_expect_success 'cruft repack respects --window by default' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
-
- GIT_TRACE2_EVENT=$(pwd)/event.trace \
- git -c pack.window=2 repack --cruft --window=3 &&
-
- grep "pack-objects.*--window=3.*--cruft" event.trace
- )
-'
-
-test_expect_success 'cruft repack respects --quiet' '
- git init repo &&
- test_when_finished "rm -fr repo" &&
- (
- cd repo &&
-
- test_commit base &&
- GIT_PROGRESS_DELAY=0 git repack --cruft --quiet 2>err &&
- test_must_be_empty err
- )
-'
-
test_expect_success 'cruft --local drops unreachable objects' '
git init alternate &&
git init repo &&
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 43d2947d28..cd452040ea 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -477,4 +477,254 @@ test_expect_success 'reachable packs are preferred over cruft ones' '
)
'
+test_expect_success 'repack --cruft generates a cruft pack' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git branch -M main &&
+ git checkout --orphan other &&
+ test_commit unreachable &&
+
+ git checkout main &&
+ git branch -D other &&
+ git tag -d unreachable &&
+ # objects are not cruft if they are contained in the reflogs
+ git reflog expire --all --expire=all &&
+
+ git rev-list --objects --all --no-object-names >reachable.raw &&
+ git cat-file --batch-all-objects --batch-check="%(objectname)" >objects &&
+ sort <reachable.raw >reachable &&
+ comm -13 reachable objects >unreachable &&
+
+ git repack --cruft -d &&
+
+ cruft=$(basename $(ls $packdir/pack-*.mtimes) .mtimes) &&
+ pack=$(basename $(ls $packdir/pack-*.pack | grep -v $cruft) .pack) &&
+
+ git show-index <$packdir/$pack.idx >actual.raw &&
+ cut -f2 -d" " actual.raw | sort >actual &&
+ test_cmp reachable actual &&
+
+ git show-index <$packdir/$cruft.idx >actual.raw &&
+ cut -f2 -d" " actual.raw | sort >actual &&
+ test_cmp unreachable actual
+ )
+'
+
+test_expect_success 'cruft packs are not included in geometric repack' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git repack -Ad &&
+ git branch -M main &&
+
+ git checkout --orphan other &&
+ test_commit cruft &&
+ git repack -d &&
+
+ git checkout main &&
+ git branch -D other &&
+ git tag -d cruft &&
+ git reflog expire --all --expire=all &&
+
+ git repack --cruft &&
+
+ find $packdir -type f | sort >before &&
+ git repack --geometric=2 -d &&
+ find $packdir -type f | sort >after &&
+
+ test_cmp before after
+ )
+'
+
+test_expect_success 'repack --geometric collects once-cruft objects' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit reachable &&
+ git repack -Ad &&
+ git branch -M main &&
+
+ git checkout --orphan other &&
+ git rm -rf . &&
+ test_commit --no-tag cruft &&
+ cruft="$(git rev-parse HEAD)" &&
+
+ git checkout main &&
+ git branch -D other &&
+ git reflog expire --all --expire=all &&
+
+ # Pack the objects created in the previous step into a cruft
+ # pack. Intentionally leave loose copies of those objects
+ # around so we can pick them up in a subsequent --geometric
+ # reapack.
+ git repack --cruft &&
+
+ # Now make those objects reachable, and ensure that they are
+ # packed into the new pack created via a --geometric repack.
+ git update-ref refs/heads/other $cruft &&
+
+ # Without this object, the set of unpacked objects is exactly
+ # the set of objects already in the cruft pack. Tweak that set
+ # to ensure we do not overwrite the cruft pack entirely.
+ test_commit reachable2 &&
+
+ find $packdir -name "pack-*.idx" | sort >before &&
+ git repack --geometric=2 -d &&
+ find $packdir -name "pack-*.idx" | sort >after &&
+
+ {
+ git rev-list --objects --no-object-names $cruft &&
+ git rev-list --objects --no-object-names reachable..reachable2
+ } >want.raw &&
+ sort want.raw >want &&
+
+ pack=$(comm -13 before after) &&
+ git show-index <$pack >objects.raw &&
+
+ cut -d" " -f2 objects.raw | sort >got &&
+
+ test_cmp want got
+ )
+'
+
+test_expect_success 'cruft repack with no reachable objects' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+ git repack -ad &&
+
+ base="$(git rev-parse base)" &&
+
+ git for-each-ref --format="delete %(refname)" >in &&
+ git update-ref --stdin <in &&
+ git reflog expire --all --expire=all &&
+ rm -fr .git/index &&
+
+ git repack --cruft -d &&
+
+ git cat-file -t $base
+ )
+'
+
+write_blob () {
+ test-tool genrandom "$@" >in &&
+ git hash-object -w -t blob in
+}
+
+find_pack () {
+ for idx in $(ls $packdir/pack-*.idx)
+ do
+ git show-index <$idx >out &&
+ if grep -q "$1" out
+ then
+ echo $idx
+ fi || return 1
+ done
+}
+
+test_expect_success 'cruft repack with --max-pack-size' '
+ git init max-pack-size &&
+ (
+ cd max-pack-size &&
+ test_commit base &&
+
+ # two cruft objects which exceed the maximum pack size
+ foo=$(write_blob foo 1048576) &&
+ bar=$(write_blob bar 1048576) &&
+ test-tool chmtime --get -1000 \
+ "$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
+ test-tool chmtime --get -2000 \
+ "$objdir/$(test_oid_to_path $bar)" >bar.mtime &&
+ git repack --cruft --max-pack-size=1M &&
+ find $packdir -name "*.mtimes" >cruft &&
+ test_line_count = 2 cruft &&
+
+ foo_mtimes="$(basename $(find_pack $foo) .idx).mtimes" &&
+ bar_mtimes="$(basename $(find_pack $bar) .idx).mtimes" &&
+ test-tool pack-mtimes $foo_mtimes >foo.actual &&
+ test-tool pack-mtimes $bar_mtimes >bar.actual &&
+
+ echo "$foo $(cat foo.mtime)" >foo.expect &&
+ echo "$bar $(cat bar.mtime)" >bar.expect &&
+
+ test_cmp foo.expect foo.actual &&
+ test_cmp bar.expect bar.actual &&
+ test "$foo_mtimes" != "$bar_mtimes"
+ )
+'
+
+test_expect_success 'cruft repack with pack.packSizeLimit' '
+ (
+ cd max-pack-size &&
+ # repack everything back together to remove the existing cruft
+ # pack (but to keep its objects)
+ git repack -adk &&
+ git -c pack.packSizeLimit=1M repack --cruft &&
+ # ensure the same post condition is met when --max-pack-size
+ # would otherwise be inferred from the configuration
+ find $packdir -name "*.mtimes" >cruft &&
+ test_line_count = 2 cruft &&
+ for pack in $(cat cruft)
+ do
+ test-tool pack-mtimes "$(basename $pack)" >objects &&
+ test_line_count = 1 objects || return 1
+ done
+ )
+'
+
+test_expect_success 'cruft repack respects repack.cruftWindow' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+
+ GIT_TRACE2_EVENT=$(pwd)/event.trace \
+ git -c pack.window=1 -c repack.cruftWindow=2 repack \
+ --cruft --window=3 &&
+
+ grep "pack-objects.*--window=2.*--cruft" event.trace
+ )
+'
+
+test_expect_success 'cruft repack respects --window by default' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+
+ GIT_TRACE2_EVENT=$(pwd)/event.trace \
+ git -c pack.window=2 repack --cruft --window=3 &&
+
+ grep "pack-objects.*--window=3.*--cruft" event.trace
+ )
+'
+
+test_expect_success 'cruft repack respects --quiet' '
+ git init repo &&
+ test_when_finished "rm -fr repo" &&
+ (
+ cd repo &&
+
+ test_commit base &&
+ GIT_PROGRESS_DELAY=0 git repack --cruft --quiet 2>err &&
+ test_must_be_empty err
+ )
+'
+
test_done
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-19 22:52 ` [PATCH v2 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
` (3 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.
Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:
- Use the terms "too large" or "too small" in tests that exercise the
'--max-cruft-size' behavior. This has historically been treated as a
threshold beneath which to combine cruft packs, but that will change
in a subsequent commit. Prepare for that by using a more generic
term.
- Remove references to "--max-cruft-size" in the freshening tests.
These tests provide coverage of our ability to record updated mtimes
for objects already in cruft packs whose mtimes are upserted from
various sources (loose objects, finding that object in a new pack,
another cruft pack, etc.).
These have nothing to do with the '--max-cruft-size' feature, and in
fact none of the tests even *use* '--max-cruft-size'. Name them
appropriately to make it clear that these tests exercise freshening
behavior, not '--max-cruft-size' behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index cd452040ea..e6e4c2fad8 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -149,7 +149,7 @@ generate_cruft_pack () {
echo "$packdir/pack-$pack.mtimes"
}
-test_expect_success '--max-cruft-size creates new packs when above threshold' '
+test_expect_success '--max-cruft-size creates new packs when too large' '
git init max-cruft-size-large &&
(
cd max-cruft-size-large &&
@@ -173,7 +173,7 @@ test_expect_success '--max-cruft-size creates new packs when above threshold' '
)
'
-test_expect_success '--max-cruft-size combines existing packs when below threshold' '
+test_expect_success '--max-cruft-size combines existing packs when not too large' '
git init max-cruft-size-small &&
(
cd max-cruft-size-small &&
@@ -236,10 +236,10 @@ test_expect_success '--max-cruft-size combines smaller packs first' '
)
'
-test_expect_success 'setup --max-cruft-size with freshened objects' '
- git init max-cruft-size-freshen &&
+test_expect_success 'setup cruft with freshened objects' '
+ git init cruft-freshen &&
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
test_commit base &&
git repack -ad &&
@@ -257,9 +257,9 @@ test_expect_success 'setup --max-cruft-size with freshened objects' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (loose)' '
+test_expect_success 'cruft with freshened objects (loose)' '
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
# regenerate the object, setting its mtime to be more recent
foo="$(generate_random_blob foo 64)" &&
@@ -275,9 +275,9 @@ test_expect_success '--max-cruft-size with freshened objects (loose)' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (packed)' '
+test_expect_success 'cruft with freshened objects (packed)' '
(
- cd max-cruft-size-freshen &&
+ cd cruft-freshen &&
# regenerate the object and store it in a packfile,
# setting its mtime to be more recent
@@ -304,7 +304,7 @@ test_expect_success '--max-cruft-size with freshened objects (packed)' '
)
'
-test_expect_success '--max-cruft-size with freshened objects (previously cruft)' '
+test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
repo="max-cruft-size-threshold" &&
test_when_finished "rm -fr $repo" &&
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()`
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-19 22:52 ` [PATCH v2 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-19 22:52 ` [PATCH v2 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
` (2 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.
Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).
Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t7704-repack-cruft.sh | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index e6e4c2fad8..3fd5aa6089 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -618,11 +618,6 @@ test_expect_success 'cruft repack with no reachable objects' '
)
'
-write_blob () {
- test-tool genrandom "$@" >in &&
- git hash-object -w -t blob in
-}
-
find_pack () {
for idx in $(ls $packdir/pack-*.idx)
do
@@ -641,8 +636,8 @@ test_expect_success 'cruft repack with --max-pack-size' '
test_commit base &&
# two cruft objects which exceed the maximum pack size
- foo=$(write_blob foo 1048576) &&
- bar=$(write_blob bar 1048576) &&
+ foo=$(generate_random_blob foo 1048576) &&
+ bar=$(generate_random_blob bar 1048576) &&
test-tool chmtime --get -1000 \
"$objdir/$(test_oid_to_path $foo)" >foo.mtime &&
test-tool chmtime --get -2000 \
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 4/5] repack: avoid combining cruft packs with `--max-cruft-size`
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (2 preceding siblings ...)
2025-03-19 22:52 ` [PATCH v2 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
2025-03-20 4:30 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Elijah Newren
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-repack.adoc | 12 ++------
builtin/repack.c | 50 ++++++---------------------------
t/t5329-pack-objects-cruft.sh | 52 +++++++++++++++++++++++++++++++++++
t/t7704-repack-cruft.sh | 8 ++----
4 files changed, 67 insertions(+), 55 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 5852a5c973..11db43b1c5 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -77,15 +77,9 @@ to the new separate pack will be written.
Only useful with `--cruft -d`.
--max-cruft-size=<n>::
- Repack cruft objects into packs as large as `<n>` bytes before
- creating new packs. As long as there are enough cruft packs
- smaller than `<n>`, repacking will cause a new cruft pack to
- be created containing objects from any combined cruft packs,
- along with any new unreachable objects. Cruft packs larger than
- `<n>` will not be modified. When the new cruft pack is larger
- than `<n>` bytes, it will be split into multiple packs, all of
- which are guaranteed to be at most `<n>` bytes in size. Only
- useful with `--cruft -d`.
+ Overrides `--max-pack-size` for cruft packs. Inherits the value of
+ `--max-pack-size` (if any) by default. See the documentation for
+ `--max-pack-size` for more details.
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
diff --git a/builtin/repack.c b/builtin/repack.c
index 75e3752353..9658f6b354 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1022,28 +1022,19 @@ static int write_filtered_pack(const struct pack_objects_args *args,
return finish_pack_objects_cmd(&cmd, names, local);
}
-static int existing_cruft_pack_cmp(const void *va, const void *vb)
-{
- struct packed_git *a = *(struct packed_git **)va;
- struct packed_git *b = *(struct packed_git **)vb;
-
- if (a->pack_size < b->pack_size)
- return -1;
- if (a->pack_size > b->pack_size)
- return 1;
- return 0;
-}
-
-static void collapse_small_cruft_packs(FILE *in, size_t max_size,
+static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
struct existing_packs *existing)
{
- struct packed_git **existing_cruft, *p;
+ struct packed_git *p;
struct strbuf buf = STRBUF_INIT;
- size_t total_size = 0;
- size_t existing_cruft_nr = 0;
size_t i;
- ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
+ /*
+ * Squelch a -Wunused-function warning while we rationalize
+ * the behavior of --max-cruft-size. This function will become
+ * used again in a future commit.
+ */
+ (void)retain_cruft_pack;
for (p = get_all_packs(the_repository); p; p = p->next) {
if (!(p->is_cruft && p->pack_local))
@@ -1056,29 +1047,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
if (!string_list_has_string(&existing->cruft_packs, buf.buf))
continue;
- if (existing_cruft_nr >= existing->cruft_packs.nr)
- BUG("too many cruft packs (found %"PRIuMAX", but knew "
- "of %"PRIuMAX")",
- (uintmax_t)existing_cruft_nr + 1,
- (uintmax_t)existing->cruft_packs.nr);
- existing_cruft[existing_cruft_nr++] = p;
- }
-
- QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
-
- for (i = 0; i < existing_cruft_nr; i++) {
- size_t proposed;
-
- p = existing_cruft[i];
- proposed = st_add(total_size, p->pack_size);
-
- if (proposed <= max_size) {
- total_size = proposed;
- fprintf(in, "-%s\n", pack_basename(p));
- } else {
- retain_cruft_pack(existing, p);
- fprintf(in, "%s\n", pack_basename(p));
- }
+ fprintf(in, "-%s.pack\n", buf.buf);
}
for (i = 0; i < existing->non_kept_packs.nr; i++)
@@ -1086,7 +1055,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
existing->non_kept_packs.items[i].string);
strbuf_release(&buf);
- free(existing_cruft);
}
static int write_cruft_pack(const struct pack_objects_args *args,
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index 60dac8312d..25ddda5cf3 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -695,4 +695,56 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' '
)
'
+test_expect_success 'split cruft packs with --max-cruft-size' '
+ repo=cruft-with--max-cruft-size &&
+ test_when_finished "rm -fr $repo" &&
+
+ git init "$repo" &&
+
+ (
+ cd "$repo" &&
+
+ git config core.compression 0 &&
+
+ sz=$((1024 * 1024)) && # 1MiB
+ test-tool genrandom foo $sz >foo &&
+ test-tool genrandom bar $sz >bar &&
+ foo="$(git hash-object -w -t blob foo)" &&
+ bar="$(git hash-object -w -t blob bar)" &&
+
+ to=$packdir/pack &&
+ # Pack together foo and bar into a single 2MiB pack.
+ pack="$(git pack-objects $to <<-EOF
+ $foo
+ $bar
+ EOF
+ )" &&
+
+ # Then generate a cruft pack containing foo and bar.
+ #
+ # Generate the pack with --max-pack-size equal to the
+ # size of one object, forcing us to write two cruft
+ # packs.
+ git pack-objects --cruft --max-pack-size=$sz $to <<-EOF &&
+ -pack-$pack.pack
+ EOF
+
+ ls $packdir/pack-*.mtimes >crufts &&
+ test_line_count = 2 crufts &&
+
+ for cruft in $(cat crufts)
+ do
+ test-tool pack-mtimes "$(basename "$cruft")" || return 1
+ done >actual.raw &&
+
+ cut -d" " -f1 <actual.raw | sort >actual &&
+ sort >expect <<-EOF &&
+ $foo
+ $bar
+ EOF
+
+ test_cmp expect actual
+ )
+'
+
test_done
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 3fd5aa6089..6debad368d 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -194,7 +194,7 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
)
'
-test_expect_success '--max-cruft-size combines smaller packs first' '
+test_expect_failure '--max-cruft-size combines smaller packs first' '
git init max-cruft-size-consume-small &&
(
cd max-cruft-size-consume-small &&
@@ -354,13 +354,11 @@ test_expect_success 'multi-cruft with freshened objects (previously cruft)' '
done >actual.raw &&
sort actual.raw >actual &&
- # Among the set of all cruft packs, we should see both
- # mtimes for object $foo and $bar, as well as the
+ # Among the set of all cruft packs, we should see the
+ # new mtimes for object $foo and $bar, as well as the
# single new copy of $baz.
sort >expect <<-EOF &&
- $foo $(cat foo.old)
$foo $(cat foo.new)
- $bar $(cat bar.old)
$bar $(cat bar.new)
$baz $(cat baz.old)
$quux $(cat quux.new)
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size`
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (3 preceding siblings ...)
2025-03-19 22:52 ` [PATCH v2 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
@ 2025-03-19 22:52 ` Taylor Blau
2025-03-20 4:30 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Elijah Newren
5 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2025-03-19 22:52 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Elijah Newren
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-repack.adoc | 9 +++++++++
builtin/repack.c | 38 +++++++++++++++++++++++------------
t/t7704-repack-cruft.sh | 22 +++++++++++---------
3 files changed, 47 insertions(+), 22 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 11db43b1c5..e1cd75eebe 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -81,6 +81,15 @@ to the new separate pack will be written.
`--max-pack-size` (if any) by default. See the documentation for
`--max-pack-size` for more details.
+--combine-cruft-below-size=<n>::
+ When generating cruft packs without pruning, only repack
+ existing cruft packs whose size is strictly less than `<n>`,
+ where `<n>` represents a number of bytes, which can optionally
+ be suffixed with "k", "m", or "g". Cruft packs whose size is
+ greater than or equal to `<n>` are left as-is and not repacked.
+ Useful when you want to avoid repacking large cruft pack(s) in
+ repositories that have many and/or large unreachable objects.
+
--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
directory `<dir>`. This option is useful for keeping a copy of
diff --git a/builtin/repack.c b/builtin/repack.c
index 9658f6b354..f3330ade7b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1022,20 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args,
return finish_pack_objects_cmd(&cmd, names, local);
}
-static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
- struct existing_packs *existing)
+static void combine_small_cruft_packs(FILE *in, size_t combine_cruft_below_size,
+ struct existing_packs *existing)
{
struct packed_git *p;
struct strbuf buf = STRBUF_INIT;
size_t i;
- /*
- * Squelch a -Wunused-function warning while we rationalize
- * the behavior of --max-cruft-size. This function will become
- * used again in a future commit.
- */
- (void)retain_cruft_pack;
-
for (p = get_all_packs(the_repository); p; p = p->next) {
if (!(p->is_cruft && p->pack_local))
continue;
@@ -1047,7 +1040,12 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED,
if (!string_list_has_string(&existing->cruft_packs, buf.buf))
continue;
- fprintf(in, "-%s.pack\n", buf.buf);
+ if (p->pack_size < combine_cruft_below_size) {
+ fprintf(in, "-%s\n", pack_basename(p));
+ } else {
+ retain_cruft_pack(existing, p);
+ fprintf(in, "%s\n", pack_basename(p));
+ }
}
for (i = 0; i < existing->non_kept_packs.nr; i++)
@@ -1061,6 +1059,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
const char *cruft_expiration,
+ unsigned long combine_cruft_below_size,
struct string_list *names,
struct existing_packs *existing)
{
@@ -1103,8 +1102,9 @@ static int write_cruft_pack(const struct pack_objects_args *args,
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "%s-%s.pack\n", pack_prefix, item->string);
- if (args->max_pack_size && !cruft_expiration) {
- collapse_small_cruft_packs(in, args->max_pack_size, existing);
+ if (combine_cruft_below_size && !cruft_expiration) {
+ combine_small_cruft_packs(in, combine_cruft_below_size,
+ existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
@@ -1158,6 +1158,7 @@ int cmd_repack(int argc,
const char *opt_window_memory = NULL;
const char *opt_depth = NULL;
const char *opt_threads = NULL;
+ unsigned long combine_cruft_below_size = 0ul;
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@@ -1170,6 +1171,9 @@ int cmd_repack(int argc,
PACK_CRUFT),
OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"),
N_("with --cruft, expire objects older than this")),
+ OPT_MAGNITUDE(0, "combine-cruft-below-size",
+ &combine_cruft_below_size,
+ N_("with --cruft, only repack cruft packs smaller than this")),
OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size,
N_("with --cruft, limit the size of new cruft packs")),
OPT_BOOL('d', NULL, &delete_redundant,
@@ -1413,7 +1417,8 @@ int cmd_repack(int argc,
cruft_po_args.quiet = po_args.quiet;
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
- cruft_expiration, &names,
+ cruft_expiration,
+ combine_cruft_below_size, &names,
&existing);
if (ret)
goto cleanup;
@@ -1440,10 +1445,17 @@ int cmd_repack(int argc,
* generate an empty pack (since every object not in the
* cruft pack generated above will have an mtime older
* than the expiration).
+ *
+ * Pretend we don't have a `--combine-cruft-below-size`
+ * argument, since we're not selectively combining
+ * anything based on size to generate the limbo cruft
+ * pack, but rather removing all cruft packs from the
+ * main repository regardless of size.
*/
ret = write_cruft_pack(&cruft_po_args, expire_to,
pack_prefix,
NULL,
+ 0ul,
&names,
&existing);
if (ret)
diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index 6debad368d..8aebfb45f5 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -194,10 +194,13 @@ test_expect_success '--max-cruft-size combines existing packs when not too large
)
'
-test_expect_failure '--max-cruft-size combines smaller packs first' '
- git init max-cruft-size-consume-small &&
+test_expect_success '--combine-cruft-below-size combines packs' '
+ repo=combine-cruft-below-size &&
+ test_when_finished "rm -fr $repo" &&
+
+ git init "$repo" &&
(
- cd max-cruft-size-consume-small &&
+ cd "$repo" &&
test_commit base &&
git repack -ad &&
@@ -211,11 +214,11 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
sort expect.raw >expect.objects &&
- # repacking with `--max-cruft-size=2M` should combine
- # both 0.5 MiB packs together, instead of, say, one of
- # the 0.5 MiB packs with the 1.0 MiB pack
+ # Repacking with `--combine-cruft-below-size=1M`
+ # should combine both 0.5 MiB packs together, but
+ # ignore the two packs which are >= 1.0 MiB.
ls $packdir/pack-*.mtimes | sort >cruft.before &&
- git repack -d --cruft --max-cruft-size=2M &&
+ git repack -d --cruft --combine-cruft-below-size=1M &&
ls $packdir/pack-*.mtimes | sort >cruft.after &&
comm -13 cruft.before cruft.after >cruft.new &&
@@ -224,11 +227,12 @@ test_expect_failure '--max-cruft-size combines smaller packs first' '
test_line_count = 1 cruft.new &&
test_line_count = 2 cruft.removed &&
- # the two smaller packs should be rolled up first
+ # The two packs smaller than 1.0MiB should be repacked
+ # together.
printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed &&
test_cmp expect.removed cruft.removed &&
- # ...and contain the set of objects rolled up
+ # ...and contain the set of objects rolled up.
test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw &&
sort actual.raw >actual.objects &&
--
2.49.0.4.ge59cf92f8d
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size'
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
` (4 preceding siblings ...)
2025-03-19 22:52 ` [PATCH v2 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
@ 2025-03-20 4:30 ` Elijah Newren
5 siblings, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2025-03-20 4:30 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano
On Wed, Mar 19, 2025 at 3:52 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Here is a small reroll of my series to introduce a new 'repack' flag
> called '--combine-cruft-below-size'. There aren't any functional changes
> in this round, and the changes that do exist are limited to
> documentation and commit message tweaks in the final patch.
>
> A range-diff is included below, as well as the original cover letter:
>
> (This is based on tb/multi-cruft-pack-refresh-fix from Junio's tree,
> which is at 08f612ba70 (builtin/pack-objects.c: freshen objects from
> existing cruft packs, 2025-03-13) at the time of writing).
>
> This series replaces something close to the existing behavior of
> repack's '--max-cruft-size' flag with '--combine-cruft-below-size'.
>
> The new flag is much clearer in its intent and function, and avoids the
> lack of clarity between the two that was discussed in
>
> <cover.1741648467.git.me@ttaylorr.com>
>
> The new behavior is as follows:
>
> - '--max-cruft-size' is a cruft pack-specific override for repack's
> '--max-pack-size' command-line flag.
>
> - '--combine-cruft-below-size' instructs repack to only combine cruft
> packs which are smaller than the given threshold. This will likely
> result in packs which are larger than the threshold. But that is OK:
> the point is to limit the size of the individual packs on input, not
> the size of the outgoing pack.
>
> This series does break the existing behavior of '--max-cruft-size'. But
> I think breaking backwards compatibility here is OK, since the existing
> behavior was so broken to begin with. I'm happy to consider other
> alternatives if others feel that this isn't OK.
>
> The series has an interesting structure that I feel may be worth calling
> out. The first three patches are trivial test cleanups. The fourth patch
> modifies the existing behavior of '--max-cruft-size', but does so while
> keeping some of the old infrastructure around.
>
> That may seem like an unnecessarily complicated approach, but it greatly
> simplifies introducing the new behavior in the following (and final)
> commit. I think that this results in a series that is a little easier to
> review (since we don't see a ton of code being removed in one commit and
> then re-added in another immediately following it). But if others feel
> like this was a mistake, please let me know ;-).
>
> Thanks in advance for your review!
>
> Taylor Blau (5):
> t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
> t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
> t/t7704-repack-cruft.sh: consolidate `write_blob()`
> repack: avoid combining cruft packs with `--max-cruft-size`
> repack: begin combining cruft packs with `--combine-cruft-below-size`
>
> Documentation/git-repack.adoc | 21 ++-
> builtin/repack.c | 62 +++----
> t/t5329-pack-objects-cruft.sh | 302 ++++++----------------------------
> t/t7704-repack-cruft.sh | 293 ++++++++++++++++++++++++++++++---
> 4 files changed, 355 insertions(+), 323 deletions(-)
>
> Range-diff against v1:
> 1: 0aa8aa65c1 = 1: 0aa8aa65c1 t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests
> 2: 5e8bd3e66e = 2: 5e8bd3e66e t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
> 3: b075ad8601 = 3: b075ad8601 t/t7704-repack-cruft.sh: consolidate `write_blob()`
> 4: 7941997e33 = 4: 7941997e33 repack: avoid combining cruft packs with `--max-cruft-size`
> 5: 7f120c35e9 ! 5: dee780a2aa repack: begin combining cruft packs with `--combine-cruft-below-size`
> @@ Commit message
> Introduce a new flag, '--combine-cruft-below-size' which is a
> replacement for the old behavior of '--max-cruft-size'. This new flag
> does explicitly what it says: it combines together cruft packs which are
> - smaller than a given threshold, and prohibits repacking ones which are
> + smaller than a given threshold, and leaves alone ones which are
> larger.
>
> This accomplishes the original intent of '--max-cruft-size', which was
> @@ Commit message
> But that's OK: the point isn't to restrict the size of the cruft packs
> we generate, it's to avoid working with ones that have already grown too
> large. If repositories still want to limit the size of the generated
> - cruft pack(s), they may use '--max-cruft-size' instead.
> + cruft pack(s), they may use '--max-cruft-size'.
>
> There's some minor test fallout as a result of the slight differences in
> behavior between the old meaning of '--max-cruft-size' and the behavior
> @@ Documentation/git-repack.adoc: to the new separate pack will be written.
>
> +--combine-cruft-below-size=<n>::
> + When generating cruft packs without pruning, only repack
> -+ existing cruft packs whose size is strictly less than `<n>`.
> -+ Cruft packs whose size is greater than or equal to `<n>` are
> -+ left as-is and not repacked. Useful when you want to avoid
> -+ repacking large cruft pack(s) in repositories that have many
> -+ and/or large unreachable objects.
> ++ existing cruft packs whose size is strictly less than `<n>`,
> ++ where `<n>` represents a number of bytes, which can optionally
> ++ be suffixed with "k", "m", or "g". Cruft packs whose size is
> ++ greater than or equal to `<n>` are left as-is and not repacked.
> ++ Useful when you want to avoid repacking large cruft pack(s) in
> ++ repositories that have many and/or large unreachable objects.
> +
> --expire-to=<dir>::
> Write a cruft pack containing pruned objects (if any) to the
>
> base-commit: 08f612ba7000bf181ef6d8baed9ece322e567efd
> --
> 2.49.0.4.ge59cf92f8d
v2 + your comments address all my feedback from v1, so this round
looks good to me.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-03-20 4:30 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-17 23:00 [PATCH 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-17 23:00 ` [PATCH 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-19 14:20 ` Elijah Newren
2025-03-17 23:00 ` [PATCH 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
2025-03-19 14:20 ` Elijah Newren
2025-03-17 23:00 ` [PATCH 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-17 23:00 ` [PATCH 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
2025-03-17 23:00 ` [PATCH 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
2025-03-18 16:30 ` Junio C Hamano
2025-03-19 22:40 ` Taylor Blau
2025-03-19 14:21 ` Elijah Newren
2025-03-19 22:50 ` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Taylor Blau
2025-03-19 22:52 ` [PATCH v2 1/5] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-19 22:52 ` [PATCH v2 2/5] t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests Taylor Blau
2025-03-19 22:52 ` [PATCH v2 3/5] t/t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 4/5] repack: avoid combining cruft packs with `--max-cruft-size` Taylor Blau
2025-03-19 22:52 ` [PATCH v2 5/5] repack: begin combining cruft packs with `--combine-cruft-below-size` Taylor Blau
2025-03-20 4:30 ` [PATCH v2 0/5] repack: introduce '--combine-cruft-below-size' Elijah Newren
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).