From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttayllorr.com,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH 1/2] t5319: add failing test case for repack/expire
Date: Thu, 18 Jul 2024 19:55:45 +0000
Message-ID: <9b8e2012c9107f99e19c541113ae6a405e38a92f.1721332546.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1764.git.1721332546.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
Git 2.45.0 included the change b7d6f23a171 (midx-write.c: use
`--stdin-packs` when repacking, 2024-04-01) which caused the 'git
multi-pack-index repack' command to use 'git pack-objects --stdin-packs'
instead of listing the objects to repack. While this change was
motivated by efficient cross-process communication and the ability to
improve delta compression, it breaks a fundamental function of the
'incremental-repack' task that is enabled by default in Scalar clones or
Git repositories that run 'git maintenance start'.
The 'incremental-repack' task performs a two-step process of the
'expire' and 'repack' subcommands of the 'git multi-pack-index' builtin.
The 'expire' subcommand removes any pack-files that are listed in the
multi-pack-index but contain no objects that it references. The 'repack'
subcommand then finds a batch of pack-files to repack and sends their objects to
'git pack-objects'. Both the pack-files chosen for the batch and the
objects chosen to repack are based on the ones that the multi-pack-index
references. Objects that appear in a pack-file but have a duplicate copy
in a newer pack-file are not considered in this case. Since the
multi-pack-index references only the newest copy of an object, this
allows the next run of the 'incremental-repack' task to remove the
repacked pack-files in its 'expire' step. This delay is intentional
because Windows may block deletion of files with open read handles.
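As a rough illustration of the two steps described above, the sketch
below runs 'expire' and 'repack' by hand in a throwaway repository. The
temporary repository, the demo identity, the '--batch-size=0' choice and
the backdated mtimes are assumptions made for the example; the real
maintenance task picks its own batch size and runs only one
expire/repack pair per invocation.

```shell
#!/bin/sh
# Sketch only: approximate the 'incremental-repack' steps by hand.
set -e

# Throwaway repository and identity (assumptions for the demo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Create two small pack-files, one per commit.
git commit -q --allow-empty -m one
git repack -q -d
git commit -q --allow-empty -m two
git repack -q -d

# Backdate the existing packs so that any pack written later is newer.
touch -t 202001010000 .git/objects/pack/pack-*

git multi-pack-index write

# Step 1: drop packs that the multi-pack-index no longer references.
git multi-pack-index expire
# Step 2: repack a batch (0 = everything) into one new pack-file.
git multi-pack-index repack --batch-size=0
# A later run's 'expire' step is what deletes the repacked pack-files.
git multi-pack-index expire

ls .git/objects/pack
```

Whether the old pack-files disappear in the last step depends on the
multi-pack-index referencing the copies in the new pack-file, which is
the behavior this patch series is concerned with.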
However, the mentioned commit changed this behavior: it divorces the
repack from the set of objects referenced by the multi-pack-index and
instead passes a set of "included" and "excluded" pack-files to the
'git pack-objects' builtin. When a pack-file is selected as "included",
only the objects that it contains and that are not in any "excluded"
pack-file are considered for repacking.
This has led to client repositories failing to remove old pack-files, as
they still have some referenced objects. The number of such pack-files
grows over time until the point that Git is trying to repack the same
pack-files over and over.
For now, create a test case that demonstrates the expected behavior, but
also fails in its final line. The setup here is attempting to recreate a
typical situation for a repository that uses a blobless partial clone.
There would be a large initial pack-file from the clone that is never
selected in the 'repack' batch. There are other pack-files that have a
combination of new objects from incremental fetches and possibly blobs
that are not connected to those incremental fetches; these blobs could
be filled in from commands like 'git checkout' or 'git blame'. The
pack-files also have some overlap on purpose so test-1 has some
duplicates in test-2 and test-2 has some duplicates in test-3.
At the end of the test, the test-2 pack-file still exists though it
should have been expired. This test will pass when reverting the
offending commit.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5319-multi-pack-index.sh | 55 +++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index dd09134db03..327376233c5 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1004,6 +1004,61 @@ test_expect_success 'repack --batch-size=<large> repacks everything' '
)
'
+test_expect_failure 'repack/expire loop' '
+ git init repack-expire &&
+ test_when_finished "rm -fr repack-expire" &&
+ (
+ cd repack-expire &&
+
+ test_commit_bulk 5 &&
+
+ # Create three overlapping pack-files
+ git rev-list --objects HEAD~3 >in-1 &&
+ git rev-list --objects HEAD~4..HEAD~2 >in-2 &&
+ git rev-list --objects HEAD~3..HEAD >in-3 &&
+
+ # Create disconnected blobs
+ obj1=$(git hash-object -w in-1) &&
+ obj2=$(git hash-object -w in-2) &&
+ obj3=$(git hash-object -w in-3) &&
+
+ echo $obj2 >>in-2 &&
+ echo $obj3 >>in-3 &&
+
+ for i in $(test_seq 3)
+ do
+ git pack-objects .git/objects/pack/test-$i <in-$i \
+ || return 1
+ done &&
+
+ rm -fr .git/objects/pack/pack-* &&
+ git multi-pack-index write &&
+
+ for i in $(test_seq 3)
+ do
+ for file in $(ls .git/objects/pack/test-$i*)
+ do
+ test-tool chmtime =+$((3600*$i-25000)) $file || return 1
+ done || return 1
+ done &&
+
+ pack1=$(ls .git/objects/pack/test-1-*.pack) &&
+ pack2=$(ls .git/objects/pack/test-2-*.pack) &&
+ pack3=$(ls .git/objects/pack/test-3-*.pack) &&
+
+ # Prevent test-1 from being rewritten.
+ touch "${pack1%.pack}.keep" &&
+
+ # This repack-expire loop should repack all non-kept packs
+ # into a new pack and then delete the old packs.
+ git multi-pack-index repack &&
+ git multi-pack-index expire &&
+
+ test_path_is_missing $pack3 &&
+ test_path_is_missing $pack2
+ )
+'
+
test_expect_success 'load reverse index when missing .idx, .pack' '
git init repo &&
test_when_finished "rm -fr repo" &&
--
gitgitgadget