git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
	johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im,
	johncai86@gmail.com, newren@gmail.com
Subject: Re: [PATCH 5/7] p5313: add size comparison test
Date: Fri, 22 Nov 2024 10:26:19 -0500
Message-ID: <233d4261-64b3-4149-bb6f-aeba66834c1c@gmail.com>
In-Reply-To: <Zz+YrvL8h0Cxwqfy@nand.local>

On 11/21/24 3:31 PM, Taylor Blau wrote:
> On Tue, Nov 05, 2024 at 03:05:05AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>

>> The thin pack that simulates a push is much worse with --full-name-hash
>> in this case. The name hash values are doing a lot to assist with delta
>> bases, it seems. The big pack and shallow clone cases are slightly worse
>> with the --full-name-hash option. Only the full repack gains some
>> benefits in size.
> 
> Not a problem with your patch, but just thinking aloud: do you think
> there is an easy/straightforward way to suggest when to use
> --full-name-hash or not?

The kinds of heuristics I would use are:

1. Does the history have enough commits that many files have many
    versions, making it very important to keep deltas within a path?

2. Is the repository at least 500MB, so that there is actually room for
    a "meaningful" change in size?

3. Are there a lot of name-hash collisions? (The last patch in the series
    helps check this through a test-helper, but it isn't something we can
    expect end users to check themselves.)
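
Heuristic 2 can be checked roughly from a script. This is only a minimal sketch and not part of the series: it assumes the `size-pack` field of `git count-objects -v` (reported in KiB) is a good-enough proxy for repository size, and the helper names `pack_size_kib` and `is_large_enough` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; neither exists in Git or its test library.

# Read `git count-objects -v` output on stdin and print the size-pack
# value, which Git reports in KiB.
pack_size_kib () {
	sed -n 's/^size-pack: //p'
}

# Succeed if the given size in KiB is at least 500 MiB (512000 KiB).
# The 500MB figure is from heuristic 2; reading it as MiB is an assumption.
is_large_enough () {
	test "$1" -ge 512000
}

# Possible use from inside a repository:
#	size=$(git count-objects -v | pack_size_kib) &&
#	is_large_enough "$size" && echo "big enough to measure a difference"
```

This only covers the size question; heuristics 1 and 3 still need a look at the history and at the name-hash values themselves.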


>> +	cat >in-shallow <<-EOF
>> +	$(git rev-parse HEAD)
>> +	--shallow $(git rev-parse HEAD)
>> +	EOF
>> +'
> 
> I was going to comment that these could probably be moved into the
> individual perf test that cares about reading each of these inputs. But
> having them shared here makes sense since we are naturally comparing
> generating two packs with the same input (with and without
> --full-name-hash). So the shared setup here makes sense to me.

I also wanted to avoid having these commands be part of the time
measurement, even if they are extremely small.

>> +
>> +test_perf 'thin pack' '
>> +	git pack-objects --thin --stdout --revs --sparse <in-thin >out
>> +'
>> +
>> +test_size 'thin pack size' '
>> +	test_file_size out
>> +'
> 
> Nice. I always forget about this and end up writing 'wc -c <out'.

I believe this is a Junio recommendation from an earlier version.

>> +test_size 'repack size' '
>> +	pack=$(ls .git/objects/pack/pack-*.pack) &&
>> +	test_file_size "$pack"
> 
> Here and below, I think it's fine to inline this as in:
> 
>      test_file_size "$(ls .git/objects/pack/pack-*.pack)"

Generally I prefer to split things into stages so the verbose output
provides a clear definition of the value when calling the Git command.

> ...but I wonder: will using ".git" break this test in bare repositories?
> Should we write instead:
> 
>      pack="$(ls $(git rev-parse --git-dir)/objects/pack/pack-*.pack)" &&
>      test_file_size "$pack"
> 
> ?

While this would break in a bare repo, the perf lib copies a bare repo
into a non-bare repo as follows:

test_perf_copy_repo_contents () {
	for stuff in "$1"/*
	do
		case "$stuff" in
		*/objects|*/hooks|*/config|*/commondir|*/gitdir|*/worktrees|*/fsmonitor--daemon*)
			;;
		*)
			cp -R "$stuff" "$repo/.git/" || exit 1
			;;
		esac
	done
}

I'll still add the `git rev-parse` suggestion because it's safest.
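
A minimal sketch of that safer form, assuming exactly one pack exists after the repack. The helper name `only_pack_in` is hypothetical; `git rev-parse --git-dir` is the real command that reports the repository directory for both bare and non-bare layouts.

```shell
#!/bin/sh
# Hypothetical helper: print the single packfile under the given Git
# directory, failing if there is not exactly one.
only_pack_in () {
	set -- "$1"/objects/pack/pack-*.pack
	# With no match the glob stays literal, so also check the file exists.
	test "$#" -eq 1 && test -f "$1" && printf '%s\n' "$1"
}

# Intended use inside the perf test (test_file_size comes from Git's
# test library and is not defined here):
#	pack=$(only_pack_in "$(git rev-parse --git-dir)") &&
#	test_file_size "$pack"
```

Failing when more than one pack matches avoids silently measuring the wrong file if the repack leaves several packs behind.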

Thanks,
-Stolee

