From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im,
johncai86@gmail.com, newren@gmail.com
Subject: Re: [PATCH 5/7] p5313: add size comparison test
Date: Fri, 22 Nov 2024 10:26:19 -0500 [thread overview]
Message-ID: <233d4261-64b3-4149-bb6f-aeba66834c1c@gmail.com> (raw)
In-Reply-To: <Zz+YrvL8h0Cxwqfy@nand.local>
On 11/21/24 3:31 PM, Taylor Blau wrote:
> On Tue, Nov 05, 2024 at 03:05:05AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>> The thin pack that simulates a push is much worse with --full-name-hash
>> in this case. The name hash values are doing a lot to assist with delta
>> bases, it seems. The big pack and shallow clone cases are slightly worse
>> with the --full-name-hash option. Only the full repack gains some
>> benefits in size.
>
> Not a problem with your patch, but just thinking aloud: do you think
> there is an easy/straightforward way to suggest when to use
> --full-name-hash or not?
The kinds of heuristics I would use are:
1. Are there enough commits that enough files have enough versions
across history that it's very important to keep deltas within a path?
2. Is the repository at least 500MB such that there is actually room for
a "meaningful" change in size?
3. Are there a lot of name-hash collisions? (The last patch in the series
helps do this through a test-helper, but isn't something we can expect
end users to check themselves.)
>> + cat >in-shallow <<-EOF
>> + $(git rev-parse HEAD)
>> + --shallow $(git rev-parse HEAD)
>> + EOF
>> +'
>
> I was going to comment that these could probably be moved into the
> individual perf test that cares about reading each of these inputs. But
> having them shared here makes sense since we are naturally comparing
> generating two packs with the same input (with and without
> --full-name-hash). So the shared setup here makes sense to me.
I also wanted to avoid having these commands be part of the time
measurement, even if they are extremely small.
>> +
>> +test_perf 'thin pack' '
>> + git pack-objects --thin --stdout --revs --sparse <in-thin >out
>> +'
>> +
>> +test_size 'thin pack size' '
>> + test_file_size out
>> +'
>
> Nice. I always forget about this and end up writing 'wc -c <out'.
I believe this is a Junio recommendation from an earlier version.
>> +test_size 'repack size' '
>> + pack=$(ls .git/objects/pack/pack-*.pack) &&
>> + test_file_size "$pack"
>
> Here and below, I think it's fine to inline this as in:
>
> test_file_size "$(ls .git/objects/pack/pack-*.pack)"
Generally I prefer to split things into stages so the verbose output
provides a clear definition of the value when calling the Git command.
> ...but I wonder: will using ".git" break this test in bare repositories?
> Should we write instead:
>
> pack="$(ls $(git rev-parse --git-dir)/objects/pack/pack-*.pack)" &&
> test_file_size
>
> ?
While this would break a bare repo, the perf lib makes a bare repo be
copied into a non-bare repo as follows:
test_perf_copy_repo_contents () {
for stuff in "$1"/*
do
case "$stuff" in
*/objects|*/hooks|*/config|*/commondir|*/gitdir|*/worktrees|*/fsmonitor--daemon*)
;;
*)
cp -R "$stuff" "$repo/.git/" || exit 1
;;
esac
done
}
I'll still add the `git rev-parse` suggestion because it's safest.
Thanks,
-Stolee
next prev parent reply other threads:[~2024-11-22 15:26 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-05 3:05 [PATCH 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Derrick Stolee via GitGitGadget
2024-11-05 3:05 ` [PATCH 1/7] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-11-21 20:08 ` Taylor Blau
2024-11-21 21:35 ` Taylor Blau
2024-11-21 23:32 ` Junio C Hamano
2024-11-22 11:46 ` Derrick Stolee
2024-11-22 11:59 ` Derrick Stolee
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 2/7] repack: " Derrick Stolee via GitGitGadget
2024-11-21 20:12 ` Taylor Blau
2024-11-22 12:07 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 3/7] pack-objects: add GIT_TEST_FULL_NAME_HASH Derrick Stolee via GitGitGadget
2024-11-21 20:15 ` Taylor Blau
2024-11-22 12:09 ` Derrick Stolee
2024-11-22 1:13 ` Jonathan Tan
2024-11-22 3:23 ` Junio C Hamano
2024-11-22 18:01 ` Jonathan Tan
2024-11-25 0:39 ` Junio C Hamano
2024-11-25 19:45 ` Jonathan Tan
2024-11-26 1:29 ` Junio C Hamano
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 4/7] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-11-21 20:17 ` Taylor Blau
2024-11-22 15:26 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 5/7] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-11-21 20:31 ` Taylor Blau
2024-11-22 15:26 ` Derrick Stolee [this message]
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 6/7] pack-objects: disable --full-name-hash when shallow Derrick Stolee via GitGitGadget
2024-11-21 20:33 ` Taylor Blau
2024-11-22 15:27 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 7/7] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-11-21 20:42 ` Taylor Blau
2024-11-22 1:23 ` Jonathan Tan
2024-11-21 23:50 ` [PATCH 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Jonathan Tan
2024-11-22 3:01 ` Junio C Hamano
2024-11-22 4:22 ` Junio C Hamano
2024-11-22 15:27 ` Derrick Stolee
2024-11-24 23:57 ` Junio C Hamano
2024-11-22 18:05 ` Jonathan Tan
2024-12-02 23:21 ` [PATCH v2 0/8] " Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 1/8] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2024-12-04 20:06 ` karthik nayak
2024-12-04 21:05 ` Junio C Hamano
2024-12-05 9:46 ` karthik nayak
2024-12-09 23:15 ` Jonathan Tan
2024-12-10 0:01 ` Junio C Hamano
2024-12-02 23:21 ` [PATCH v2 2/8] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2024-12-04 20:53 ` karthik nayak
2024-12-02 23:21 ` [PATCH v2 3/8] repack: " Derrick Stolee via GitGitGadget
2024-12-04 21:15 ` karthik nayak
2024-12-02 23:21 ` [PATCH v2 4/8] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2024-12-04 21:21 ` karthik nayak
2024-12-09 23:12 ` Jonathan Tan
2024-12-20 17:03 ` Derrick Stolee
2024-12-02 23:21 ` [PATCH v2 5/8] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 6/8] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 7/8] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 8/8] pack-objects: add third name hash version Derrick Stolee via GitGitGadget
2024-12-03 3:23 ` [PATCH v2 0/8] pack-objects: Create an alternative name hash algorithm (recreated) Junio C Hamano
2024-12-04 4:56 ` Derrick Stolee
2024-12-04 5:02 ` Junio C Hamano
2024-12-20 17:19 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 1/8] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2025-01-22 22:08 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 2/8] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2025-01-22 22:17 ` Taylor Blau
2025-01-24 17:29 ` Derrick Stolee
2024-12-20 17:19 ` [PATCH v3 3/8] repack: " Derrick Stolee via GitGitGadget
2025-01-22 22:18 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 4/8] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2025-01-22 22:20 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 5/8] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 6/8] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 7/8] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2025-01-22 22:22 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 8/8] pack-objects: add third name hash version Derrick Stolee via GitGitGadget
2025-01-22 22:37 ` Taylor Blau
2025-01-24 17:34 ` Derrick Stolee
2025-01-21 20:21 ` [PATCH v3 0/8] pack-objects: Create an alternative name hash algorithm (recreated) Derrick Stolee
2025-01-22 23:28 ` Taylor Blau
2025-01-24 17:45 ` Derrick Stolee
2025-01-27 19:02 ` [PATCH v4 0/7] " Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 1/7] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 2/7] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2025-01-27 21:18 ` Junio C Hamano
2025-01-29 13:38 ` Derrick Stolee
2025-01-27 19:02 ` [PATCH v4 3/7] repack: " Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 4/7] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 5/7] p5313: add size comparison test Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 6/7] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 7/7] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2025-01-31 21:39 ` [PATCH v4 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=233d4261-64b3-4149-bb6f-aeba66834c1c@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).