From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, johannes.schindelin@gmx.de, peff@peff.net,
ps@pks.im, me@ttaylorr.com, johncai86@gmail.com,
newren@gmail.com, jonathantanmy@google.com,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/8] pack-objects: Create an alternative name hash algorithm (recreated)
Date: Tue, 03 Dec 2024 12:23:38 +0900
Message-ID: <xmqq1pypfo05.fsf@gitster.g>
In-Reply-To: <pull.1823.v2.git.1733181682.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Mon, 02 Dec 2024 23:21:14 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series creates a mechanism to select alternative name hashes using a
> new --name-hash-version=<n> option. The versions are:
>
>  1. Version 1 is the default name hash that already exists. This option
>     focuses on the final bytes of the path to maximize locality for
>     cross-path deltas.
>
>  2. Version 2 is the new path-component hash function suggested by Jonathan
>     Tan in the previous version (with some modifications). This hash
>     function essentially computes the v1 name hash of each path component
>     and then overlays those hashes with a shift to make the parent
>     directories contribute less to the final hash, but enough to break many
>     collisions that exist in v1.
>
>  3. Version 3 is the hash function that I submitted under the
>     --full-name-hash feature in the previous versions. This uses a
>     pseudorandom hash procedure to minimize collisions but at the expense of
>     losing on locality. This version is implemented in the final patch of
>     the series mostly for comparison purposes, as it is unlikely to be
>     selected as a valuable hash function over v2. The final patch could be
>     omitted from the merged version.
>
> See the patches themselves for detailed results in the p5313-pack-objects.sh
> performance test and the p5314-name-hash.sh test that demonstrates how many
> collisions occur with each hash function.

These do not sound like versions but more like variants to me,
especially if one is expected to perform better than another in some
cases and worse in some other cases.  Is JTan's hash expected to
perform better than the original and current hash in almost all
cases (I would not be surprised at all if that were the case)?

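For readers who want to see the difference between the first two
variants concretely, here is a minimal Python sketch. It is not the
code from the patches. name_hash_v1 is an illustration of a hash in
which later bytes weigh more, so that roughly the last sixteen
non-whitespace bytes decide the value. name_hash_components, its
shift of 6, and the WHITESPACE set are assumptions made for this
example, and the "some modifications" mentioned in the cover letter
are not reproduced.

```python
MASK32 = 0xFFFFFFFF
WHITESPACE = b" \t\n\r"  # assumption: bytes the hash skips


def name_hash_v1(path: str) -> int:
    """Illustrative v1-style hash: each new byte enters at the top
    of a 32-bit value and older bytes are shifted down by two bits,
    so bytes far from the end of the path fall off the bottom."""
    h = 0
    for byte in path.encode("utf-8"):
        if byte in WHITESPACE:
            continue
        h = ((h >> 2) + (byte << 24)) & MASK32
    return h


def name_hash_components(path: str, shift: int = 6) -> int:
    """Hypothetical v2-style overlay: hash each path component with
    the v1-style hash, then XOR it over the accumulated value of the
    parent directories, which has been shifted down so that parents
    contribute less than the final component."""
    base = 0
    for component in path.split("/"):
        base = (base >> shift) ^ name_hash_v1(component)
    return base


a = "frontend/package-lock.json"
b = "backend/package-lock.json"

# The file name alone is 17 bytes long, so the directory names barely
# reach the v1-style value and the two paths collide.
print(name_hash_v1(a) == name_hash_v1(b))  # True

# The component overlay lets the directory name perturb the result.
print(name_hash_components(a) == name_hash_components(b))  # False
```

The two paths were picked so that they share a long final component;
that is the situation in which a hash dominated by the trailing bytes
cannot tell directories apart, while an overlay of per-component
hashes still can.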
> In general, the v2 name hash function gets very close to the compression
> results of v3 in the full repack case, even in the repositories that feature
> many name hash collisions. These benefits also come without downsides to
> other kinds of packfiles, including small pushed packs, larger incremental
> fetch packs, and shallow clones.

Nice.

> As we can see, v2 nearly reaches the effectiveness of v3 (and outperforms it
> once!) but there is still a significant change between the
> --name-hash-version feature and the --path-walk feature.
>
> The main reason we are considering this --name-hash-version feature is that
> it has the least amount of stretch required in order for it to be integrated
> with reachability bitmaps, required for server environments. In fact, the
> change in this version to use a numerical version makes it more obvious how
> to connect the version number to a value in the .bitmap file format. Tests
> are added to guarantee that the hash functions preserve their behavior over
> time, since data files depend on that.

Yeah, that aspect certainly is an attractive one.  We should be able
to teach the bitmap file to say which name-hash function is in use
and all the pack data reuse logic we have should be reusable.