* [PATCH 1/2] t5319: add failing test case for repack/expire
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
@ 2024-07-18 19:55 ` Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 2/2] midx-write: revert use of --stdin-packs Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-07-18 19:55 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Git 2.45.0 included the change b7d6f23a171 (midx-write.c: use
`--stdin-packs` when repacking, 2024-04-01) which caused the 'git
multi-pack-index repack' command to use 'git pack-objects --stdin-packs'
instead of listing the objects to repack. While this change was
motivated by efficient cross-process communication and the ability to
improve delta compression, it breaks a fundamental function of the
'incremental-repack' task that is enabled by default in Scalar clones or
Git repositories that run 'git maintenance start'.
The 'incremental-repack' task performs a two-step process of the
'expire' and 'repack' subcommands of the 'git multi-pack-index' builtin.
The 'expire' command removes any pack-files listed in the
multi-pack-index but without any referenced objects. The 'repack' task
then finds a batch of pack-files to repack and sends their objects to
'git pack-objects'. Both the pack-files chosen for the batch and the
objects chosen to repack are based on the ones that the multi-pack-index
references. Objects that appear in a pack-file but have a duplicate copy
in a newer pack-file are not considered in this case. Since the
multi-pack-index references only the newest copy of an object, this
allows the next 'incremental-repack' task to remove the pack-files in
the next 'expire' task. This delay is intentional due to how Windows
handles may block deletion of files with open read handles.
However, the mentioned commit changed this behavior to divorce the set
of objects referenced by the multi-pack-index and instead use a set of
"included" and "excluded" pack-files in the 'git pack-objects' builtin.
When a pack-file is selected as "included", only the objects it contains
but are not in any "excluded" pack-files are considered for repacking.
This has led to client repositories failing to remove old pack-files as
they still have some referenced objects. This grows over time until the
point that Git is trying to repack the same pack-files over and over.
For now, create a test case that demonstrates the expected behavior, but
also fails in its final line. The setup here it attempting to recreate a
typical situation for a repository that uses a blobless partial clone.
There would be a large initial pack-file from the clone that is never
selected in the 'repack' batch. There are other pack-files that have a
combination of new objects from incremental fetches and possibly blobs
that are not connected to those incremental fetches; these blobs could
be filled in from commands like 'git checkout' or 'git blame'. The
pack-files also have some overlap on purpose so test-1 has some
duplicates in test-2 and test-2 has some duplicates in test-3.
At the end of the test, the test-2 pack-file still exists though it
should have been expired. This test will pass when reverting the
offending commit.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5319-multi-pack-index.sh | 55 +++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index dd09134db03..327376233c5 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1004,6 +1004,61 @@ test_expect_success 'repack --batch-size=<large> repacks everything' '
)
'
+test_expect_failure 'repack/expire loop' '
+ git init repack-expire &&
+ test_when_finished "rm -fr repack-expire" &&
+ (
+ cd repack-expire &&
+
+ test_commit_bulk 5 &&
+
+ # Create three overlapping pack-files
+ git rev-list --objects HEAD~3 >in-1 &&
+ git rev-list --objects HEAD~4..HEAD~2 >in-2 &&
+ git rev-list --objects HEAD~3..HEAD >in-3 &&
+
+ # Create disconnected blobs
+ obj1=$(git hash-object -w in-1) &&
+ obj2=$(git hash-object -w in-2) &&
+ obj3=$(git hash-object -w in-3) &&
+
+ echo $obj2 >>in-2 &&
+ echo $obj3 >>in-3 &&
+
+ for i in $(test_seq 3)
+ do
+ git pack-objects .git/objects/pack/test-$i <in-$i \
+ || return 1
+ done &&
+
+ rm -fr .git/objects/pack/pack-* &&
+ git multi-pack-index write &&
+
+ for i in $(test_seq 3)
+ do
+ for file in $(ls .git/objects/pack/test-$i*)
+ do
+ test-tool chmtime =+$((3600*$i-25000)) $file || return 1
+ done || return 1
+ done &&
+
+ pack1=$(ls .git/objects/pack/test-1-*.pack) &&
+ pack2=$(ls .git/objects/pack/test-2-*.pack) &&
+ pack3=$(ls .git/objects/pack/test-3-*.pack) &&
+
+ # Prevent test-1 from being rewritten.
+ touch "${pack1%.pack}.keep" &&
+
+ # This repack-expire loop should repack all non-kept packs
+ # into a new pack and then delete the old packs.
+ git multi-pack-index repack &&
+ git multi-pack-index expire &&
+
+ test_path_is_missing $pack3 &&
+ test_path_is_missing $pack2
+ )
+'
+
test_expect_success 'load reverse index when missing .idx, .pack' '
git init repo &&
test_when_finished "rm -fr repo" &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 2/2] midx-write: revert use of --stdin-packs
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 1/2] t5319: add failing test case for repack/expire Derrick Stolee via GitGitGadget
@ 2024-07-18 19:55 ` Derrick Stolee via GitGitGadget
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
2024-07-18 22:50 ` Taylor Blau
3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2024-07-18 19:55 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This reverts b7d6f23a171 (midx-write.c: use `--stdin-packs` when
repacking, 2024-04-01) and then marks the test created in the previous
change as passing.
The fundamental issue with the reverted change is that the focus on
pack-files separates the object selection from how the multi-pack-index
selects a single pack-file for an object ID with multiple copies among
the tracked pack-files.
The change was made with the intention of improving delta compression in
the resulting pack-file, but that can be resolved with the existing
object list mechanism. There are other potential pitfalls of doing an
object walk at this time if the repository is a blobless partial clone,
and that will require additional testing on top of the one that changes
here.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 18 +++++++++---------
t/t5319-multi-pack-index.sh | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 65e69d2de78..960cc46250e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1474,8 +1474,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
- strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty",
- NULL);
+ strvec_push(&cmd.args, "pack-objects");
strvec_pushf(&cmd.args, "%s/pack/pack", object_dir);
@@ -1499,15 +1498,16 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
}
cmd_in = xfdopen(cmd.in, "w");
- for (i = 0; i < m->num_packs; i++) {
- struct packed_git *p = m->packs[i];
- if (!p)
+
+ for (i = 0; i < m->num_objects; i++) {
+ struct object_id oid;
+ uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+
+ if (!include_pack[pack_int_id])
continue;
- if (include_pack[i])
- fprintf(cmd_in, "%s\n", pack_basename(p));
- else
- fprintf(cmd_in, "^%s\n", pack_basename(p));
+ nth_midxed_object_oid(&oid, m, i);
+ fprintf(cmd_in, "%s\n", oid_to_hex(&oid));
}
fclose(cmd_in);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 327376233c5..5cce0be19e6 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1004,7 +1004,7 @@ test_expect_success 'repack --batch-size=<large> repacks everything' '
)
'
-test_expect_failure 'repack/expire loop' '
+test_expect_success 'repack/expire loop' '
git init repack-expire &&
test_when_finished "rm -fr repack-expire" &&
(
--
gitgitgadget
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 1/2] t5319: add failing test case for repack/expire Derrick Stolee via GitGitGadget
2024-07-18 19:55 ` [PATCH 2/2] midx-write: revert use of --stdin-packs Derrick Stolee via GitGitGadget
@ 2024-07-18 21:57 ` Junio C Hamano
2024-07-18 22:38 ` Taylor Blau
2024-07-19 13:24 ` Derrick Stolee
2024-07-18 22:50 ` Taylor Blau
3 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2024-07-18 21:57 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Here is an issue I noticed while exploring issues with my local copy of a
> large monorepo. I was intending to show some engineers how nice the objects
> were maintained by background maintenance, but saw hundreds of small
> pack-files that were up to two months old. This time matched when I upgraded
> to the microsoft/git fork that included the 2.45.0 release of Git.
I almost said "wow, perfect timing on the -rc1 day", but then
realized that this is not a regression during _this_ cycle, but a
cycle ago.
You already Cc'ed Taylor, whom I would have asked for help if I were
the one who found this issue, which is good.
Will queue.
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
@ 2024-07-18 22:38 ` Taylor Blau
2024-07-19 13:23 ` Derrick Stolee
2024-07-19 13:24 ` Derrick Stolee
1 sibling, 1 reply; 12+ messages in thread
From: Taylor Blau @ 2024-07-18 22:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee
On Thu, Jul 18, 2024 at 02:57:55PM -0700, Junio C Hamano wrote:
> You already Cc'ed Taylor, whom I would have asked for help if I were
> the one who found this issue, which is good.
It looks like there is a small typo in my email address as
"me@ttayllorr.com" instead of "me@ttaylorr.com", but I saw the message
anyway ;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 22:38 ` Taylor Blau
@ 2024-07-19 13:23 ` Derrick Stolee
0 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2024-07-19 13:23 UTC (permalink / raw)
To: Taylor Blau, Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git
On 7/18/24 6:38 PM, Taylor Blau wrote:
> On Thu, Jul 18, 2024 at 02:57:55PM -0700, Junio C Hamano wrote:
>> You already Cc'ed Taylor, whom I would have asked for help if I were
>> the one who found this issue, which is good.
>
> It looks like there is a small typo in my email address as
> "me@ttayllorr.com" instead of "me@ttaylorr.com", but I saw the message
> anyway ;-).
Whoops! Sorry about that. Repeated too many letters. I've fixed the
GitGitGadget cover letter in case a v2 is required.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
2024-07-18 22:38 ` Taylor Blau
@ 2024-07-19 13:24 ` Derrick Stolee
2024-07-19 15:13 ` Junio C Hamano
1 sibling, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2024-07-19 13:24 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, Taylor Blau
On 7/18/24 5:57 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Here is an issue I noticed while exploring issues with my local copy of a
>> large monorepo. I was intending to show some engineers how nice the objects
>> were maintained by background maintenance, but saw hundreds of small
>> pack-files that were up to two months old. This time matched when I upgraded
>> to the microsoft/git fork that included the 2.45.0 release of Git.
>
> I almost said "wow, perfect timing on the -rc1 day", but then
> realized that this is not a regression during _this_ cycle, but a
> cycle ago.
I almost waited until after the release, but I wanted to put the
information out there just in case you were interested in taking it
into 2.46.0 or were planning on a 2.45.3.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-19 13:24 ` Derrick Stolee
@ 2024-07-19 15:13 ` Junio C Hamano
2024-07-19 16:20 ` Derrick Stolee
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2024-07-19 15:13 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, Taylor Blau
Derrick Stolee <stolee@gmail.com> writes:
> On 7/18/24 5:57 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> Here is an issue I noticed while exploring issues with my local copy of a
>>> large monorepo. I was intending to show some engineers how nice the objects
>>> were maintained by background maintenance, but saw hundreds of small
>>> pack-files that were up to two months old. This time matched when I upgraded
>>> to the microsoft/git fork that included the 2.45.0 release of Git.
>> I almost said "wow, perfect timing on the -rc1 day", but then
>> realized that this is not a regression during _this_ cycle, but a
>> cycle ago.
>
> I almost waited until after the release, but I wanted to put the
> information out there just in case you were interested in taking it
> into 2.46.0 or were planning on a 2.45.3.
Yup, thanks but this is not exactly a repository breaking data
corruption bug, and did not look ultra urgent. Especially if we
want to pursue a solution that helps both expiring stale packs
better (which is what you are restoring) and making better delta
chain selection (which may be what you are losing) at the same time,
such a change could become a source of data corruption bug, so I'd
prefer to see it started early in a cycle, rather as a last-minute
"let's fix this too".
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-19 15:13 ` Junio C Hamano
@ 2024-07-19 16:20 ` Derrick Stolee
0 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2024-07-19 16:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Taylor Blau
On 7/19/24 11:13 AM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> On 7/18/24 5:57 PM, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> Here is an issue I noticed while exploring issues with my local copy of a
>>>> large monorepo. I was intending to show some engineers how nice the objects
>>>> were maintained by background maintenance, but saw hundreds of small
>>>> pack-files that were up to two months old. This time matched when I upgraded
>>>> to the microsoft/git fork that included the 2.45.0 release of Git.
>>> I almost said "wow, perfect timing on the -rc1 day", but then
>>> realized that this is not a regression during _this_ cycle, but a
>>> cycle ago.
>>
>> I almost waited until after the release, but I wanted to put the
>> information out there just in case you were interested in taking it
>> into 2.46.0 or were planning on a 2.45.3.
>
> Yup, thanks but this is not exactly a repository breaking data
> corruption bug, and did not look ultra urgent. Especially if we
> want to pursue a solution that helps both expiring stale packs
> better (which is what you are restoring) and making better delta
> chain selection (which may be what you are losing) at the same time,
> such a change could become a source of data corruption bug, so I'd
> prefer to see it started early in a cycle, rather as a last-minute
> "let's fix this too".
I agree with this assessment. The microsoft/git fork has taken the
revert as users of that fork have a higher chance of being affected.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 19:55 [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2024-07-18 21:57 ` [PATCH 0/2] Fix background maintenance regression in Git 2.45.0 Junio C Hamano
@ 2024-07-18 22:50 ` Taylor Blau
2024-07-19 13:21 ` Derrick Stolee
3 siblings, 1 reply; 12+ messages in thread
From: Taylor Blau @ 2024-07-18 22:50 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Taylor Blau, Derrick Stolee
On Thu, Jul 18, 2024 at 07:55:44PM +0000, Derrick Stolee via GitGitGadget wrote:
> The issue is that 'git multi-pack-index repack' was taught to call 'git
> pack-objects' with the new '--stdin-packs' option. However, this changes the
> object selection algorithm. Instead of using the objects referenced by the
> multi-pack-index, it compares pack-files using a list of "included" and
> "excluded" pack-files. This loses some granularity of how the
> multi-pack-index chooses among duplicate objects.
Thank you for looking at this so carefully! Let me double check my own
understanding.
Suppose we have a MIDX with some pack 'P' and say, some commit object
'C' which appears in that pack. Let's also suppose we have another pack
'Q' in the same MIDX which also contains 'C', but the MIDX selected its
copy from pack 'P'.
If we want to combine 'P' with some other packs (excluding 'Q'), then
the input to --stdin-packs will look something like:
P.pack
^Q.pack
...
And the resulting pack would not contain 'C', since we would reject it
via: add_object_entry_from_pack() -> want_object_in_pack() ->
want_found_object() -> has_object_kept_pack(). The final function there
would find a copy of 'C' in 'Q', and since 'Q' is excluded, we would
reject 'C' as unwanted.
So the resulting pack would not contain 'C', and the MIDX would hold
onto its copy from pack 'P', resulting in 'P' being both (a) in the set
of packs to repack together, but also (b) non-expireable, since it has
at least one object selected from it in the MIDX.
> The end result is that some objects that would normally have been included
> in the new pack-file are no longer included. The copy that the
> multi-pack-index references is in the pack-file that was intended to be
> repacked, so that pack-file cannot be expired in the next 'git
> multi-pack-index expire' step and is included again in the batch of objects
> to repack.
I think this matches my own understanding, but let me know if I'm
missing something. Assuming I'm thinking about this the same way you
are, the fix (stop using --stdin-packss) makes sense to me.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-18 22:50 ` Taylor Blau
@ 2024-07-19 13:21 ` Derrick Stolee
2024-07-19 13:38 ` Taylor Blau
0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2024-07-19 13:21 UTC (permalink / raw)
To: Taylor Blau, Derrick Stolee via GitGitGadget; +Cc: git, gitster
On 7/18/24 6:50 PM, Taylor Blau wrote:
> On Thu, Jul 18, 2024 at 07:55:44PM +0000, Derrick Stolee via GitGitGadget wrote:
>> The issue is that 'git multi-pack-index repack' was taught to call 'git
>> pack-objects' with the new '--stdin-packs' option. However, this changes the
>> object selection algorithm. Instead of using the objects referenced by the
>> multi-pack-index, it compares pack-files using a list of "included" and
>> "excluded" pack-files. This loses some granularity of how the
>> multi-pack-index chooses among duplicate objects.
>
> Thank you for looking at this so carefully! Let me double check my own
> understanding.
>
> Suppose we have a MIDX with some pack 'P' and say, some commit object
> 'C' which appears in that pack. Let's also suppose we have another pack
> 'Q' in the same MIDX which also contains 'C', but the MIDX selected its
> copy from pack 'P'.
>
> If we want to combine 'P' with some other packs (excluding 'Q'), then
> the input to --stdin-packs will look something like:
>
> P.pack
> ^Q.pack
> ...
>
> And the resulting pack would not contain 'C', since we would reject it
> via: add_object_entry_from_pack() -> want_object_in_pack() ->
> want_found_object() -> has_object_kept_pack(). The final function there
> would find a copy of 'C' in 'Q', and since 'Q' is excluded, we would
> reject 'C' as unwanted.
>
> So the resulting pack would not contain 'C', and the MIDX would hold
> onto its copy from pack 'P', resulting in 'P' being both (a) in the set
> of packs to repack together, but also (b) non-expireable, since it has
> at least one object selected from it in the MIDX.
>
>> The end result is that some objects that would normally have been included
>> in the new pack-file are no longer included. The copy that the
>> multi-pack-index references is in the pack-file that was intended to be
>> repacked, so that pack-file cannot be expired in the next 'git
>> multi-pack-index expire' step and is included again in the batch of objects
>> to repack.
>
> I think this matches my own understanding, but let me know if I'm
> missing something. Assuming I'm thinking about this the same way you
> are, the fix (stop using --stdin-packss) makes sense to me.
Your interpretation matches mine. Thanks for the careful read.
I think we can accomplish similar goals that match the reasoning for
--stdin-packs (better deltas while also limiting the object walk to the
repacked objects) with some changes to read_object_list_from_stdin(),
but that's a more subtle kind of change.
Taking this change as-is will cause a regression in the quality of
delta choices, but recovers our ability to expire "repacked" pack-files
in all cases.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0
2024-07-19 13:21 ` Derrick Stolee
@ 2024-07-19 13:38 ` Taylor Blau
0 siblings, 0 replies; 12+ messages in thread
From: Taylor Blau @ 2024-07-19 13:38 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, gitster
On Fri, Jul 19, 2024 at 09:21:51AM -0400, Derrick Stolee wrote:
> On 7/18/24 6:50 PM, Taylor Blau wrote:
>
> > I think this matches my own understanding, but let me know if I'm
> > missing something. Assuming I'm thinking about this the same way you
> > are, the fix (stop using --stdin-packss) makes sense to me.
>
> Your interpretation matches mine. Thanks for the careful read.
>
> I think we can accomplish similar goals that match the reasoning for
> --stdin-packs (better deltas while also limiting the object walk to the
> repacked objects) with some changes to read_object_list_from_stdin(),
> but that's a more subtle kind of change.
FWIW, the main motivation for that change was to limit the amount of
cross-process I/O that was necessary to generate the new pack. I figured
that for relatively small amounts of packs which contain relatively
large amounts of objects that it would be more efficient to write out
the pack names than the object names.
I was thinking a little bit about how we would alter the behavior of
'--stdin-packs' to match what the 'multi-pack-index repack' caller
needs. I agree that it is possible, and I doubly agree that it is subtle
;-).
TBH, I think that the amount of I/O we're potentially saving is dwarfed
by the amount of I/O and CPU time it takes to actually generate the new
pack, so I doubt the effort to make such a subtle change would be all
that worthwhile, though certainly an interesting exercise ;-).
> Taking this change as-is will cause a regression in the quality of
> delta choices, but recovers our ability to expire "repacked" pack-files
> in all cases.
Yeah, definitely.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 12+ messages in thread