From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im,
johncai86@gmail.com, newren@gmail.com,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 7/7] test-tool: add helper for name-hash values
Date: Thu, 21 Nov 2024 15:42:20 -0500 [thread overview]
Message-ID: <Zz+bLOi6DZoD5CfI@nand.local> (raw)
In-Reply-To: <ab341dd0e58f77b3c7c6f5765d9e34cb02bef56f.1730775908.git.gitgitgadget@gmail.com>
On Tue, Nov 05, 2024 at 03:05:07AM +0000, Derrick Stolee via GitGitGadget wrote:
> Test this tree
> -----------------------------------------------------------------
> 5314.1: paths at head 4.5K
> 5314.2: number of distinct name-hashes 4.1K
> 5314.3: number of distinct full-name-hashes 4.5K
> 5314.4: maximum multiplicity of name-hashes 13
> 5314.5: maximum multiplicity of fullname-hashes 1
>
> Here, the maximum collision multiplicity is 13, but around 10% of paths
> have a collision with another path.
Neat.
> diff --git a/t/helper/test-name-hash.c b/t/helper/test-name-hash.c
> new file mode 100644
> index 00000000000..e4ecd159b76
> --- /dev/null
> +++ b/t/helper/test-name-hash.c
> @@ -0,0 +1,24 @@
> +/*
> + * test-name-hash.c: Read a list of paths over stdin and report on their
> + * name-hash and full name-hash.
> + */
> +
> +#include "test-tool.h"
> +#include "git-compat-util.h"
> +#include "pack-objects.h"
> +#include "strbuf.h"
> +
> +int cmd__name_hash(int argc UNUSED, const char **argv UNUSED)
> +{
> + struct strbuf line = STRBUF_INIT;
> +
> + while (!strbuf_getline(&line, stdin)) {
> + uint32_t name_hash = pack_name_hash(line.buf);
> + uint32_t full_hash = pack_full_name_hash(line.buf);
> +
> + printf("%10"PRIu32"\t%10"PRIu32"\t%s\n", name_hash, full_hash, line.buf);
I'm definitely nitpicking, but having a tab to separate these two 32-bit
values feels odd when we know already that they will be at most
10-characters wide.
I probably would have written:
printf("%10"PRIu32" %10"PRIu32"\t%s\n", name_hash, full_hash, line.buf);
instead, but this is obviously not a big deal either way ;-).
> diff --git a/t/perf/p5314-name-hash.sh b/t/perf/p5314-name-hash.sh
> new file mode 100755
> index 00000000000..9fe26612fac
> --- /dev/null
> +++ b/t/perf/p5314-name-hash.sh
> @@ -0,0 +1,41 @@
> +#!/bin/sh
> +
> +test_description='Tests pack performance using bitmaps'
> +. ./perf-lib.sh
> +
> +GIT_TEST_PASSING_SANITIZE_LEAK=0
> +export GIT_TEST_PASSING_SANITIZE_LEAK
Does this conflict with Patrick's series to remove these leak checking
annotations? I think it might, which is not unexpected given this series
was written before that one (and it's my fault for not reviewing it
earlier).
> +test_perf_large_repo
> +
> +test_size 'paths at head' '
> + git ls-tree -r --name-only HEAD >path-list &&
> + wc -l <path-list
> +'
> +
> +test_size 'number of distinct name-hashes' '
> + cat path-list | test-tool name-hash >name-hashes &&
> + cat name-hashes | awk "{ print \$1; }" | sort -n | uniq -c >name-hash-count &&
In these two (and a handful of others lower down in this same script)
the "cat ... |" is unnecessary. I think this one should be written as:
test-tool name-hash <path-list >name-hashes &&
awk "{ print \$1; }" <name-hashes | sort | uniq -c >name-hash-count &&
(sort -n is unnecessary, since we just care about getting the list in
sorted order so that "uniq -c" can count the number of unique values).
> + wc -l <name-hash-count
> +'
> +
> +test_size 'number of distinct full-name-hashes' '
> + cat name-hashes | awk "{ print \$2; }" | sort -n | uniq -c >full-name-hash-count &&
> + wc -l <full-name-hash-count
> +'
> +
> +test_size 'maximum multiplicity of name-hashes' '
> + cat name-hash-count | \
> + sort -nr | \
> + head -n 1 | \
> + awk "{ print \$1; }"
> +'
> +
> +test_size 'maximum multiplicity of fullname-hashes' '
> + cat full-name-hash-count | \
> + sort -nr | \
> + head -n 1 | \
> + awk "{ print \$1; }"
Nitpicking again, but you could extract the "sort | head | awk" pipeline
into a function.
Thanks,
Taylor
next prev parent reply other threads:[~2024-11-21 20:42 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-05 3:05 [PATCH 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Derrick Stolee via GitGitGadget
2024-11-05 3:05 ` [PATCH 1/7] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-11-21 20:08 ` Taylor Blau
2024-11-21 21:35 ` Taylor Blau
2024-11-21 23:32 ` Junio C Hamano
2024-11-22 11:46 ` Derrick Stolee
2024-11-22 11:59 ` Derrick Stolee
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 2/7] repack: " Derrick Stolee via GitGitGadget
2024-11-21 20:12 ` Taylor Blau
2024-11-22 12:07 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 3/7] pack-objects: add GIT_TEST_FULL_NAME_HASH Derrick Stolee via GitGitGadget
2024-11-21 20:15 ` Taylor Blau
2024-11-22 12:09 ` Derrick Stolee
2024-11-22 1:13 ` Jonathan Tan
2024-11-22 3:23 ` Junio C Hamano
2024-11-22 18:01 ` Jonathan Tan
2024-11-25 0:39 ` Junio C Hamano
2024-11-25 19:45 ` Jonathan Tan
2024-11-26 1:29 ` Junio C Hamano
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 4/7] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-11-21 20:17 ` Taylor Blau
2024-11-22 15:26 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 5/7] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-11-21 20:31 ` Taylor Blau
2024-11-22 15:26 ` Derrick Stolee
2024-11-26 8:26 ` Patrick Steinhardt
2024-11-05 3:05 ` [PATCH 6/7] pack-objects: disable --full-name-hash when shallow Derrick Stolee via GitGitGadget
2024-11-21 20:33 ` Taylor Blau
2024-11-22 15:27 ` Derrick Stolee
2024-11-05 3:05 ` [PATCH 7/7] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-11-21 20:42 ` Taylor Blau [this message]
2024-11-22 1:23 ` Jonathan Tan
2024-11-21 23:50 ` [PATCH 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Jonathan Tan
2024-11-22 3:01 ` Junio C Hamano
2024-11-22 4:22 ` Junio C Hamano
2024-11-22 15:27 ` Derrick Stolee
2024-11-24 23:57 ` Junio C Hamano
2024-11-22 18:05 ` Jonathan Tan
2024-12-02 23:21 ` [PATCH v2 0/8] " Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 1/8] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2024-12-04 20:06 ` karthik nayak
2024-12-04 21:05 ` Junio C Hamano
2024-12-05 9:46 ` karthik nayak
2024-12-09 23:15 ` Jonathan Tan
2024-12-10 0:01 ` Junio C Hamano
2024-12-02 23:21 ` [PATCH v2 2/8] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2024-12-04 20:53 ` karthik nayak
2024-12-02 23:21 ` [PATCH v2 3/8] repack: " Derrick Stolee via GitGitGadget
2024-12-04 21:15 ` karthik nayak
2024-12-02 23:21 ` [PATCH v2 4/8] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2024-12-04 21:21 ` karthik nayak
2024-12-09 23:12 ` Jonathan Tan
2024-12-20 17:03 ` Derrick Stolee
2024-12-02 23:21 ` [PATCH v2 5/8] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 6/8] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 7/8] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2024-12-02 23:21 ` [PATCH v2 8/8] pack-objects: add third name hash version Derrick Stolee via GitGitGadget
2024-12-03 3:23 ` [PATCH v2 0/8] pack-objects: Create an alternative name hash algorithm (recreated) Junio C Hamano
2024-12-04 4:56 ` Derrick Stolee
2024-12-04 5:02 ` Junio C Hamano
2024-12-20 17:19 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 1/8] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2025-01-22 22:08 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 2/8] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2025-01-22 22:17 ` Taylor Blau
2025-01-24 17:29 ` Derrick Stolee
2024-12-20 17:19 ` [PATCH v3 3/8] repack: " Derrick Stolee via GitGitGadget
2025-01-22 22:18 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 4/8] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2025-01-22 22:20 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 5/8] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 6/8] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-12-20 17:19 ` [PATCH v3 7/8] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2025-01-22 22:22 ` Taylor Blau
2024-12-20 17:19 ` [PATCH v3 8/8] pack-objects: add third name hash version Derrick Stolee via GitGitGadget
2025-01-22 22:37 ` Taylor Blau
2025-01-24 17:34 ` Derrick Stolee
2025-01-21 20:21 ` [PATCH v3 0/8] pack-objects: Create an alternative name hash algorithm (recreated) Derrick Stolee
2025-01-22 23:28 ` Taylor Blau
2025-01-24 17:45 ` Derrick Stolee
2025-01-27 19:02 ` [PATCH v4 0/7] " Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 1/7] pack-objects: create new name-hash function version Jonathan Tan via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 2/7] pack-objects: add --name-hash-version option Derrick Stolee via GitGitGadget
2025-01-27 21:18 ` Junio C Hamano
2025-01-29 13:38 ` Derrick Stolee
2025-01-27 19:02 ` [PATCH v4 3/7] repack: " Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 4/7] pack-objects: add GIT_TEST_NAME_HASH_VERSION Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 5/7] p5313: add size comparison test Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 6/7] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2025-01-27 19:02 ` [PATCH v4 7/7] pack-objects: prevent name hash version change Derrick Stolee via GitGitGadget
2025-01-31 21:39 ` [PATCH v4 0/7] pack-objects: Create an alternative name hash algorithm (recreated) Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zz+bLOi6DZoD5CfI@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).