From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jon Forrest <nobozo@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH] abbrev: allow extending beyond 20 chars to disambiguate
Date: Mon, 11 Aug 2025 21:17:12 +0000
Message-ID: <aJpd2MYMWgEoxQWi@fruit.crustytoothpaste.net>
In-Reply-To: <xmqqfrdx517b.fsf@gitster.g>
On 2025-08-11 at 15:26:32, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1 that produces 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.
Is this really the case? If the restriction comes from using
GIT_MAX_RAWSZ instead of GIT_MAX_HEXSZ, then the limits are 32 vs. 64
in our modern codebase, so the loop would stop at 32 hex characters
rather than 20.
> To see how many leading letters of a given full object name are
> sufficiently unambiguous, the algorithm starts from an initial
> length, guessed based on the estimated number of objects in the
> repository, sees if another object shares the prefix, and
> keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
> which is counted as the number of bytes, since 5b20ace6 (sha1_name:
> unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
> change, it extended up to GIT_MAX_HEXSZ, which is the correct limit
> because the loop is adding one output letter per iteration.
Nicely explained.
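
To make the off-by-a-factor-of-two concrete, here is a minimal sketch
of the loop described above. It is not the code in Git: the SKETCH_*
constants and both function names are hypothetical, and a linear scan
over an array of hex strings stands in for the real object lookup. The
only point is that the loop adds one hex digit per iteration, so its
upper bound has to be counted in hex digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the values match the 32 vs. 64 mentioned above. */
#define SKETCH_RAWSZ 32                  /* bytes in a raw SHA-256 hash */
#define SKETCH_HEXSZ (2 * SKETCH_RAWSZ)  /* hex digits in its printed form */

/*
 * Return 1 if some name in others[], other than hex itself, shares the
 * first len characters of hex. Toy replacement for an object lookup.
 */
static int prefix_is_ambiguous(const char *hex, size_t len,
                               const char *const *others, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (strcmp(hex, others[i]) && !strncmp(hex, others[i], len))
            return 1;
    return 0;
}

/*
 * Grow the abbreviation one hex digit at a time, starting at start_len,
 * until it is unique or limit digits have been reached.
 */
static size_t sketch_unique_abbrev_len(const char *hex, size_t start_len,
                                       size_t limit,
                                       const char *const *others, size_t nr)
{
    size_t len = start_len;

    assert(start_len <= limit && limit <= strlen(hex));
    while (len < limit && prefix_is_ambiguous(hex, len, others, nr))
        len++;
    return len;
}
```

Given two 64-digit names that agree on their first 40 digits, calling
sketch_unique_abbrev_len() with limit set to SKETCH_HEXSZ should keep
extending to 41 digits, while passing SKETCH_RAWSZ as the limit gives
up at 32 digits with the prefix still ambiguous.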
> * No tests added, since I do not think I want to find two valid
> objects with their object names sharing the same prefix that is
> more than 20 letters long. The current abbreviation code happens
> to ignore validity of the object and takes invalid objects into
> account when disambiguating, but I do not want to see a test rely
> on that.
Yes, even if we could efficiently create such a collision with SHA-1
using the best known attacks on it, that would still take 2^63.5 work,
which was estimated to cost about USD 10,000 in 2025. I don't think
doing that just to produce a test would be a good use of the project's
(or really, anyone else's) funds. Using SHA-256, of course, would
require at least 2^80 work.
--
brian m. carlson (they/them)
Toronto, Ontario, CA