From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Jon Forrest <nobozo@gmail.com>, Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate
Date: Mon, 11 Aug 2025 12:06:39 -0700
Message-ID: <xmqqzfc51xvk.fsf@gitster.g>
In-Reply-To: <xmqqfrdx517b.fsf@gitster.g> (Junio C. Hamano's message of "Mon, 11 Aug 2025 08:26:32 -0700")

When two or more objects have names that share a prefix longer than
half the length of the hash in use (e.g. 10 bytes for SHA-1, which
produces a 20-byte/160-bit hash), find_unique_abbrev() fails to
produce an unambiguous abbreviation.

To see how many leading letters of a given full object name are
sufficiently unambiguous, the algorithm starts from an initial
length, guessed based on the estimated number of objects in the
repository, checks whether another object shares the prefix, and
keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
which is counted as the number of bytes, since 5b20ace6 (sha1_name:
unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
change, it extended up to GIT_SHA1_HEXSZ, which was the correct
limit because the loop is adding one output letter per iteration
and back then SHA256 was not in the picture.

Pass the max length of the hash being in use in the current
repository down the code path, and use it to compute the code to
update the abbreviation length required to make it unique.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
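
A note to make the unit mismatch concrete. The following is only a
minimal standalone sketch of the length-extension check, not the code
in object-name.c: the function name sketch_unique_len() and its
parameters are hypothetical, the names are plain hex strings rather
than struct object_id, and the caller is assumed to pass only names
that already share the first init_len hex digits.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustration only (hypothetical helper, not part of Git).
 * Returns the abbreviation length needed to tell `hex` apart from
 * `other`, starting from `init_len` hex digits and never growing
 * once the first differing index reaches `limit`.
 */
static unsigned int sketch_unique_len(const char *hex, const char *other,
				      unsigned int init_len, unsigned int limit)
{
	unsigned int cur_len = init_len;
	unsigned int i = init_len;

	/* Assumption: both names are at least init_len digits long. */
	assert(init_len <= strlen(hex) && init_len <= strlen(other));

	/* Count how many hex digits the two names have in common. */
	while (hex[i] && hex[i] == other[i])
		i++;

	/* Same shape as the check in the patch: extend only below the limit. */
	if (cur_len <= i && i < limit)
		cur_len = i + 1;
	return cur_len;
}
```

With two 40-digit names that first differ at index 35, a limit given
in hex digits (40) lets the length grow to 36, while a limit given in
bytes (20 here, purely as an illustrative value) leaves it at the
initial guess, so the abbreviation stays ambiguous.
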
 object-name.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 11aa0e6afc..8f9af57c0a 100644
--- a/object-name.c
+++ b/object-name.c
@@ -680,6 +680,7 @@ static unsigned msb(unsigned long val)
 struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
+	unsigned int max_len;
 	char *hex;
 	struct repository *repo;
 	const struct object_id *oid;
@@ -699,12 +700,12 @@ static inline char get_hex_char_from_oid(const struct object_id *oid,
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
 	struct min_abbrev_data *mad = cb_data;
-
 	unsigned int i = mad->init_len;
+
 	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
 		i++;
 
-	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+	if (mad->cur_len <= i && i < mad->max_len)
 		mad->cur_len = i + 1;
 
 	return 0;
@@ -864,6 +865,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	mad.repo = r;
 	mad.init_len = len;
 	mad.cur_len = len;
+	mad.max_len = hexsz;
 	mad.hex = hex;
 	mad.oid = oid;
 
Range-diff:
1:  2e1d2b4ef6 ! 1:  5c67e57f14 abbrev: allow extending beyond 20 chars to disambiguate
    @@ Commit message
         keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
         which is counted as the number of bytes, since 5b20ace6 (sha1_name:
         unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
    -    change, it extended up to GIT_MAX_HEXSZ, which is the correct limit
    -    because the loop is adding one output letter per iteration.
    +    change, it extended up to GIT_SHA1_HEXSZ, which was the correct
    +    limit because the loop is adding one output letter per iteration and
    +    back then SHA256 was not in the picture.
    +
    +    Pass the max length of the hash being in use in the current
    +    repository down the code path, and use it to compute the code to
    +    update the abbreviation length required to make it unique.
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
      ## object-name.c ##
    -@@ object-name.c: static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
    +@@ object-name.c: static unsigned msb(unsigned long val)
    + struct min_abbrev_data {
    + 	unsigned int init_len;
    + 	unsigned int cur_len;
    ++	unsigned int max_len;
    + 	char *hex;
    + 	struct repository *repo;
    + 	const struct object_id *oid;
    +@@ object-name.c: static inline char get_hex_char_from_oid(const struct object_id *oid,
    + static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
    + {
    + 	struct min_abbrev_data *mad = cb_data;
    +-
    + 	unsigned int i = mad->init_len;
    ++
      	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
      		i++;
     -	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
    -+	if (i < GIT_MAX_HEXSZ && i >= mad->cur_len)
    ++	if (mad->cur_len <= i && i < mad->max_len)
      		mad->cur_len = i + 1;
      	return 0;
    +@@ object-name.c: int repo_find_unique_abbrev_r(struct repository *r, char *hex,
    + 	mad.repo = r;
    + 	mad.init_len = len;
    + 	mad.cur_len = len;
    ++	mad.max_len = hexsz;
    + 	mad.hex = hex;
    + 	mad.oid = oid;
    +
--
2.51.0-rc1-144-g869f44a1ca
Thread overview: 14+ messages
2025-08-11 15:26 [PATCH] abbrev: allow extending beyond 20 chars to disambiguate Junio C Hamano
2025-08-11 18:53 ` Junio C Hamano
2025-08-11 19:06 ` Junio C Hamano [this message]
2025-08-11 22:23 ` [PATCH v2] " brian m. carlson
2025-08-12 13:28 ` Derrick Stolee
2025-08-12 14:58 ` René Scharfe
2025-08-12 15:17 ` Junio C Hamano
2025-08-12 15:59 ` René Scharfe
2025-08-14 15:09 ` [PATCH v3] abbrev: allow extending beyond 32 " Junio C Hamano
2025-08-11 21:17 ` [PATCH] abbrev: allow extending beyond 20 " brian m. carlson
2025-08-11 21:25 ` Junio C Hamano
2025-08-11 21:28 ` Junio C Hamano
2025-08-12 15:26 ` Jon Forrest
2025-08-12 16:21 ` René Scharfe