From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Cc: Jon Forrest <nobozo@gmail.com>
Subject: Re: [PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate
Date: Tue, 12 Aug 2025 09:28:43 -0400 [thread overview]
Message-ID: <55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com> (raw)
In-Reply-To: <xmqqzfc51xvk.fsf@gitster.g>
On 8/11/2025 3:06 PM, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1 that produces 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.
>
> To see how many leading letters of a given full object name is
> sufficiently unambiguous, the algorithm starts from a initial
> length, guessed based on the estimated number of objects in the
> repository, and see if another object that shares the prefix, and
> keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
> which is counted as the number of bytes, since 5b20ace6 (sha1_name:
> unroll len loop in find_unique_abbrev_r(), 2017-10-08);
Wow. My first Git contribution.
Some style issues that you're opportunistically cleaning up are
due to my newness, for sure.
> before that
> change, it extended up to GIT_SHA1_HEXSZ, which was the correct
> limit because the loop is adding one output letter per iteration
> and back then SHA256 was not in the picture.
>
> Pass the max length of the hash being in use in the current
> repository down the code path, and use it to compute the code to
> update the abbreviation length required to make it unique.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> object-name.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/object-name.c b/object-name.c
> index 11aa0e6afc..8f9af57c0a 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -680,6 +680,7 @@ static unsigned msb(unsigned long val)
> struct min_abbrev_data {
> unsigned int init_len;
> unsigned int cur_len;
> + unsigned int max_len;
> char *hex;
> struct repository *repo;
> const struct object_id *oid;
> @@ -699,12 +700,12 @@ static inline char get_hex_char_from_oid(const struct object_id *oid,
> static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
> {
> struct min_abbrev_data *mad = cb_data;
> -
> unsigned int i = mad->init_len;
> +
> while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
> i++;
>
> - if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
> + if (mad->cur_len <= i && i < mad->max_len)
> mad->cur_len = i + 1;
This logic is all about not extending the abbreviation
length to beyond the length of the hex array, so your
limits make sense. Moving the comparisons here helps
with readability.
> return 0;
> @@ -864,6 +865,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
> mad.repo = r;
> mad.init_len = len;
> mad.cur_len = len;
> + mad.max_len = hexsz;
This new parameter required for allowing SHA256 is valuable.
I agree that we shouldn't need a test case to guarantee this
in the future. Good to clean up unnecessary uses of the
macro limits.
Thanks,
-Stolee
next prev parent reply other threads:[~2025-08-12 13:29 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-11 15:26 [PATCH] abbrev: allow extending beyond 20 chars to disambiguate Junio C Hamano
2025-08-11 18:53 ` Junio C Hamano
2025-08-11 19:06 ` [PATCH v2] " Junio C Hamano
2025-08-11 22:23 ` brian m. carlson
2025-08-12 13:28 ` Derrick Stolee [this message]
2025-08-12 14:58 ` René Scharfe
2025-08-12 15:17 ` Junio C Hamano
2025-08-12 15:59 ` René Scharfe
2025-08-14 15:09 ` [PATCH v3] abbrev: allow extending beyond 32 " Junio C Hamano
2025-08-11 21:17 ` [PATCH] abbrev: allow extending beyond 20 " brian m. carlson
2025-08-11 21:25 ` Junio C Hamano
2025-08-11 21:28 ` Junio C Hamano
2025-08-12 15:26 ` Jon Forrest
2025-08-12 16:21 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=nobozo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.