From: Namhyung Kim <namhyung@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Re* [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified
Date: Thu, 10 Mar 2011 20:52:46 +0900 [thread overview]
Message-ID: <1299757966.1499.34.camel@leonhard> (raw)
In-Reply-To: <7vlj0n5o3n.fsf_-_@alter.siamese.dyndns.org>
2011-03-10 (목), 01:19 -0800, Junio C Hamano:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I think what may be desirable is to honor the caller-supplied "len" a bit
> > better. If an object is uniquely identifiable with only 4-hexdigit today,
> > and if the caller gives 7 as len and the guard is set to 3, we return 10
> > hexdigits with the current code. We should instead return 7 hexdigits in
> > such a case, as that is in line with the "use 3 extra to give the
> > disambiguation we find today a better chance to survive".
>
> And here is an attempt to do so. I have to admit that I didn't give it
> too much thought, though, so please be careful when reviewing the logic.
>
What if the unique length is greater than or equal to the given length?
For instance the unique length is 7 and the caller gives 7 and the guard
is 3. What do you want to return, 7 or 10? How about the unique length
of 8?
I think the meaning of the guard is somewhat vague. When this feature
was considered in LKML at first, Linus just wanted to change the default
length of commit abbreviation to 12 by making it user-configurable. [1]
And this is the same situation what I tried to tell you in the previous
email.
So I think it would be better to choose the output length using only
caller-given length and the guard length if it guarantees the uniqueness
today. It'll be simple, consistent and expected behavior IMHO.
Thanks.
> -- >8 --
> Subject: find_unique_abbrev(): honor caller-supplied "len" better
>
> The caller of this function wants to ensure that the returned string is a
> unique abbreviation of the object name, and at least len characters long.
> When "len" is sufficiently short and we cannot ensure uniqueness with only
> "len" bytes, we returned minimally unique prefix (i.e. if you dropped the
> last character, there would be two or more objects that share that same
> prefix as their names in the repository).
>
> An earlier change introduced core.abbrevguard configuration with a
> realization that a short prefix that is unique today may not stay unique
> forever, as new objects are added to the repository. When "len" is shorter
> than the length necessary to ensure uniqueness today, instead of returning
> a string that is only one character longer than the longest ambiguous
> prefix, we wanted to add that many extra characters to ensure uniqueness
> for longer time.
>
> However, the code forgot that "len" may be sufficiently long. If an
> object is uniquely identifiable with only 4-hexdigit today, and if the
> caller gives 7 as len and the guard is set to 3, we returned 10 hexdigits,
> which was 3 characters longer than necessary. We should instead return 7
> hexdigits in such a case, as that is in line with the original intention
> of using 3 extra hexdigits to give the disambiguation we find today a
> better chance to survive.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> sha1_name.c | 24 ++++++++++++++++++++++++
> 1 files changed, 24 insertions(+), 0 deletions(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 709ff2e..0f81581 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -193,6 +193,23 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
> return status;
> }
>
> +/*
> + * The caller wants a unique abbreviation of the full object name in
> + * "sha1" that is at least "len" characters long.
> + *
> + * (1) If sha1 identifies an existing object, there must be no other
> + * object that shares the returned string as the prefix of its
> + * name;
> + *
> + * (2) If no object with the given object name exists, there must be
> + * no object that has the returned string as the prefix of its
> + * name.
> + *
> + * Usually we try to return as short a string as possible, but the
> + * core.abbrevguard configuration may tell us to use at least that
> + * many characters more than necessary to make the result unique,
> + * in order to keep it unique a bit longer.
> + */
> const char *find_unique_abbrev(const unsigned char *sha1, int len)
> {
> int status, exists;
> @@ -202,6 +219,13 @@ const char *find_unique_abbrev(const unsigned char *sha1, int len)
> memcpy(hex, sha1_to_hex(sha1), 40);
> if (len == 40 || !len)
> return hex;
> + len -= unique_abbrev_extra_length;
> + if (len <= 0)
> + len = 1;
Anyway, how about
if (len < MINIMUM_ABBREV)
len = MINIMUM_ABBREV;
> + /*
> + * Try to see how short a prefix we can feed to get
> + * the desired unique hit
> + */
> while (len < 40) {
> unsigned char sha1_ret[20];
> status = get_short_sha1(hex, len, sha1_ret, 1);
--
Regards,
Namhyung Kim
next prev parent reply other threads:[~2011-03-10 11:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-08 10:59 [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified Namhyung Kim
2011-03-08 10:59 ` [PATCH 2/2] blame: introduce -u/--unique option Namhyung Kim
2011-03-09 19:45 ` Junio C Hamano
2011-03-10 6:13 ` Namhyung Kim
2011-03-10 9:43 ` Junio C Hamano
2011-03-10 13:58 ` Namhyung Kim
2011-03-09 19:12 ` [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified Junio C Hamano
2011-03-09 19:20 ` Junio C Hamano
2011-03-10 9:19 ` Re* " Junio C Hamano
2011-03-10 11:52 ` Namhyung Kim [this message]
2011-03-10 11:54 ` Namhyung Kim
2011-03-10 19:11 ` Junio C Hamano
2011-03-10 5:27 ` Namhyung Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1299757966.1499.34.camel@leonhard \
--to=namhyung@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).