All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
Date: Mon,  4 Feb 2019 17:12:17 +0100	[thread overview]
Message-ID: <20190204161217.20047-1-avarab@gmail.com> (raw)
In-Reply-To: <20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net>

The algorithm we use to pick the default abbreviation length as a
function of the approximate number of objects is described in the
commit message for e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30), as well as in and downthread of [1], but
it hasn't been documented.

Let's do that, and while we're at it explicitly test for when the
current implementation will "roll over" up to values of 2^32-1 (the
maximum portable "unsigned long" value).

1. https://public-inbox.org/git/20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
This is a patch from the middle of a series I'm currently working on
re-rolling. See
https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/

What I'd like to get here is commentary on the phrasing and accuracy
of the doc patch I'm adding here.

This patch assumes that we have a abbrev_length_for_object_count()
function, which I've added in an eariler unpublished patch. It just
exposes the length picking algorithm found in find_unique_abbrev_r().

 Documentation/config/core.txt       | 17 +++++++
 builtin/rev-parse.c                 |  8 ++++
 t/t1512-rev-parse-disambiguation.sh | 74 +++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 185857a13f..2175761833 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -599,6 +599,23 @@ core.abbrev::
 	abbreviated object names to stay unique for some time.
 	The minimum length is 4.
 +
+The algorithm to pick the the current abbreviation length is
+considered an implementation detail, and might be changed in the
+future. Since Git version 2.11, the length has been configured to
+auto-scale based on the estimated number of objects in the
+repository. We pick a length such that if all objects in the
+repository were abbreviated, we'd have a 50% chance of a *single*
+collision.
++
+For example, with 2^14-1 is the last object count at which we'll pick
+a short length of "7", and will roll over to "8" once we have one more
+object at 2^14. Since each hexdigit we add (4 bits) allows us to have
+four times (2 bits) as many objects in the repository, we'll roll over
+to a length of "9" at 2^16 objects, "10" at 2^18 etc. We'll never
+automatically pick a length less than "7", which effectively hardcodes
+2^12 as the minimum number of objects in a repository we'll consider
+when choosing the abbreviation length.
++
 This can also be set to relative values such as `+2` or `-2`, which
 means to add or subtract N characters from the SHA-1 that Git would
 otherwise print, this allows for producing more future-proof SHA-1s
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index d0d751a009..e7bf4375a2 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 					return 1;
 				continue;
 			}
+			if (opt_with_value(arg, "--abbrev-len", &arg)) {
+				unsigned long v;
+				if (!git_parse_ulong(arg, &v))
+					return 1;
+				int len = abbrev_length_for_object_count(v);
+				printf("%d\n", len);
+				continue;
+			}
 			if (!strcmp(arg, "--bisect")) {
 				for_each_fullref_in("refs/bisect/bad", show_reference, NULL, 0);
 				for_each_fullref_in("refs/bisect/good", anti_reference, NULL, 0);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 265a6972fc..0e97888a44 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -450,4 +450,78 @@ test_expect_success C_LOCALE_OUTPUT 'ambiguous commits are printed by type first
 	done
 '
 
+test_expect_success 'abbreviation length at 2^N-1 and 2^N' '
+	pow_2_min=$(git rev-parse --abbrev-len=3) &&
+	pow_2_eql=$(git rev-parse --abbrev-len=4) &&
+	pow_4_min=$(git rev-parse --abbrev-len=15) &&
+	pow_4_eql=$(git rev-parse --abbrev-len=16) &&
+	pow_6_min=$(git rev-parse --abbrev-len=63) &&
+	pow_6_eql=$(git rev-parse --abbrev-len=64) &&
+	pow_8_min=$(git rev-parse --abbrev-len=255) &&
+	pow_8_eql=$(git rev-parse --abbrev-len=256) &&
+	pow_10_min=$(git rev-parse --abbrev-len=1023) &&
+	pow_10_eql=$(git rev-parse --abbrev-len=1024) &&
+	pow_12_min=$(git rev-parse --abbrev-len=4095) &&
+	pow_12_eql=$(git rev-parse --abbrev-len=4096) &&
+	pow_14_min=$(git rev-parse --abbrev-len=16383) &&
+	pow_14_eql=$(git rev-parse --abbrev-len=16384) &&
+	pow_16_min=$(git rev-parse --abbrev-len=65535) &&
+	pow_16_eql=$(git rev-parse --abbrev-len=65536) &&
+	pow_18_min=$(git rev-parse --abbrev-len=262143) &&
+	pow_18_eql=$(git rev-parse --abbrev-len=262144) &&
+	pow_20_min=$(git rev-parse --abbrev-len=1048575) &&
+	pow_20_eql=$(git rev-parse --abbrev-len=1048576) &&
+	pow_22_min=$(git rev-parse --abbrev-len=4194303) &&
+	pow_22_eql=$(git rev-parse --abbrev-len=4194304) &&
+	pow_24_min=$(git rev-parse --abbrev-len=16777215) &&
+	pow_24_eql=$(git rev-parse --abbrev-len=16777216) &&
+	pow_26_min=$(git rev-parse --abbrev-len=67108863) &&
+	pow_26_eql=$(git rev-parse --abbrev-len=67108864) &&
+	pow_28_min=$(git rev-parse --abbrev-len=268435455) &&
+	pow_28_eql=$(git rev-parse --abbrev-len=268435456) &&
+	pow_30_min=$(git rev-parse --abbrev-len=1073741823) &&
+	pow_30_eql=$(git rev-parse --abbrev-len=1073741824) &&
+	pow_32_min=$(git rev-parse --abbrev-len=4294967295) &&
+
+	cat >actual <<-EOF &&
+	2 = $pow_2_min $pow_2_eql
+	4 = $pow_4_min $pow_4_eql
+	6 = $pow_6_min $pow_6_eql
+	8 = $pow_8_min $pow_8_eql
+	10 = $pow_10_min $pow_10_eql
+	12 = $pow_12_min $pow_12_eql
+	14 = $pow_14_min $pow_14_eql
+	16 = $pow_16_min $pow_16_eql
+	18 = $pow_18_min $pow_18_eql
+	20 = $pow_20_min $pow_20_eql
+	22 = $pow_22_min $pow_22_eql
+	24 = $pow_24_min $pow_24_eql
+	26 = $pow_26_min $pow_26_eql
+	28 = $pow_28_min $pow_28_eql
+	30 = $pow_30_min $pow_30_eql
+	32 = 16
+	EOF
+
+	cat >expected <<-\EOF &&
+	2 = 7 7
+	4 = 7 7
+	6 = 7 7
+	8 = 7 7
+	10 = 7 7
+	12 = 7 7
+	14 = 7 8
+	16 = 8 9
+	18 = 9 10
+	20 = 10 11
+	22 = 11 12
+	24 = 12 13
+	26 = 13 14
+	28 = 14 15
+	30 = 15 16
+	32 = 16
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.20.1.611.gfbb209baf1


  parent reply	other threads:[~2019-02-04 16:12 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
2016-09-26  3:46 ` Junio C Hamano
2016-09-26  4:34   ` Jeff King
2016-09-26  4:45     ` Junio C Hamano
2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
2016-09-26 16:37           ` Junio C Hamano
2016-09-26 17:21             ` Jeff King
2016-09-26 17:50               ` Junio C Hamano
2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
2016-09-26 12:11           ` Jeff King
2016-09-26 16:55           ` Junio C Hamano
2016-09-26 17:23             ` Jeff King
2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
2016-09-26 17:10           ` Junio C Hamano
2016-09-26 17:25             ` Jeff King
2016-09-26 17:36               ` Junio C Hamano
2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
2016-09-26 16:36           ` Linus Torvalds
2016-09-27  5:42             ` Jacob Keller
2016-09-27 12:38             ` Jeff King
2016-09-29 13:01             ` Kyle J. McKay
2016-09-29 13:24               ` Jeff King
2016-09-29 14:36                 ` Kyle J. McKay
2016-09-29 14:55                   ` Jeff King
2016-09-26 17:30           ` Junio C Hamano
2016-09-26 17:34             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-29 11:46           ` Kyle J. McKay
2016-09-29 13:03             ` Jeff King
2016-09-29 17:19               ` Junio C Hamano
2016-09-30  5:51                 ` Jacob Keller
2019-02-04 16:12     ` Ævar Arnfjörð Bjarmason [this message]
2019-02-04 19:13       ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Junio C Hamano
2019-02-04 20:04       ` Junio C Hamano
2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
2019-02-04 23:32         ` Jeff King
2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
2019-02-06 18:29             ` Jeff King
2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
2016-09-26 12:09     ` Jeff King
2016-09-29 13:01   ` Kyle J. McKay
2016-09-26  7:13 ` Christian Couder
2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
2016-09-29  9:53     ` Jakub Narębski
2016-09-29 17:20       ` Junio C Hamano
2016-09-29 17:45         ` Matthieu Moy
2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
2016-09-29  9:01     ` Jeff King
2016-09-29 18:13       ` Junio C Hamano
2016-09-29 18:26         ` Jeff King
2016-09-29 18:57           ` Junio C Hamano
2016-09-29 19:18             ` Jeff King
2016-09-29 19:57               ` Junio C Hamano
2016-09-29 19:06           ` Junio C Hamano
2016-09-29 19:26             ` Jeff King
2016-09-29 21:03               ` Junio C Hamano
2016-09-29 21:08                 ` Jeff King
2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
2016-09-29  2:44     ` SZEDER Gábor
2016-09-29  5:27       ` Lukas Fleischer
2016-09-29  9:22         ` Jeff King
2016-09-29  9:15       ` Jeff King
2016-09-29 10:03         ` Matthieu Moy
2016-09-29 12:52         ` SZEDER Gábor
2016-09-29  5:58     ` Johannes Sixt
2016-09-29 18:05       ` Junio C Hamano
2016-09-29 18:37         ` Linus Torvalds
2016-09-29 18:55           ` Linus Torvalds
2016-09-29 19:06             ` Linus Torvalds
2016-09-29 19:42               ` Junio C Hamano
2016-09-30  0:56               ` Mike Hommey
2016-09-30  1:01                 ` Linus Torvalds
2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
2016-09-29 19:16             ` Jeff King
2016-09-29 19:40               ` Linus Torvalds
2016-09-29 19:45                 ` Junio C Hamano
2016-09-29 21:53                   ` Linus Torvalds
2016-09-29 23:13                     ` Junio C Hamano
2016-09-29 23:20                       ` Junio C Hamano
2016-09-30  0:20                       ` Linus Torvalds
2016-09-30  0:28                         ` Linus Torvalds
2016-09-30  0:57                           ` Linus Torvalds
2016-09-30  1:18                             ` Linus Torvalds
2016-09-30  3:54                               ` Junio C Hamano
2016-09-30  4:10                                 ` Junio C Hamano
2016-09-30  4:18                                   ` Linus Torvalds
2016-09-30  4:29                                     ` Linus Torvalds
2016-09-30  4:27                                   ` Junio C Hamano
2016-09-30  4:35                                     ` Junio C Hamano
2016-09-30 18:40                                     ` Junio C Hamano
2016-09-30 18:51                                       ` Linus Torvalds
2016-09-30 19:00                                         ` Junio C Hamano
2016-09-30  4:11                                 ` Linus Torvalds
2016-09-30  8:06                               ` Jeff King
2016-09-30 17:54                                 ` Linus Torvalds
2016-09-30 18:05                                   ` Jeff King
2016-09-30 18:21                                   ` Linus Torvalds
2016-09-30 20:01                                     ` Junio C Hamano
2016-09-30 17:56                                 ` Junio C Hamano
2016-09-30  7:47                       ` Jeff King
2016-09-29  9:25     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204161217.20047-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.