From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
dstolee@microsoft.com, git@jeffhostetler.com, peff@peff.net,
johannes.schindelin@gmx.de, jrnieder@gmail.com,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC PATCH 2/2] sha1-name: add core.validateAbbrev & relative core.abbrev
Date: Wed, 6 Jun 2018 10:27:19 +0000 [thread overview]
Message-ID: <20180606102719.27145-3-avarab@gmail.com> (raw)
In-Reply-To: <20180606102719.27145-1-avarab@gmail.com>
In-Reply-To: <87lgbsz61p.fsf@evledraar.gmail.com>
Since Linus added auto-sizing for abbreviations in
e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30)
we've been less likely to produce a short SHA-1 today that'll collide
on the same repository tomorrow, since before we'd always pick the
bare minimum we could get away with.
But we still do a full disambiguation check, which can be very
expensive in some cases. There's a work-in-progress MIDX
implementation to address that[1].
This change adds an alternate method of achieving some of the same
ends (but possibly not all, see [2] and replies to the original thread
at [1]).
Now, as described in the docs the user can set a relative abbreviation
length via core.abbrev, e.g. on linux.git core.abbrev=+2 will produce
SHA-1s that are 14 characters (as opposed to the implicit 12).
This in combination with core.validateAbbrev=false (off by default)
allows for picking a trade-off between performance, and the odds that
future or remote (or current and local) short SHA-1 will be ambiguous.
1. https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/
2. https://public-inbox.org/git/87lgbsz61p.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Documentation/config.txt | 46 ++++++++++++++++++++++++++++++++++++++++
cache.h | 2 ++
config.c | 14 ++++++++++++
environment.c | 2 ++
sha1-name.c | 15 +++++++++++++
5 files changed, 79 insertions(+)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..8624110818 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -919,6 +919,52 @@ core.abbrev::
in your repository, which hopefully is enough for
abbreviated object names to stay unique for some time.
The minimum length is 4.
++
+This can also be set to relative values such as `+2` or `-2`, which
+means to add or subtract N characters from the SHA-1 that Git would
+otherwise print. This is useful in combination with the
+`core.validateAbbrev` setting, or to get more future-proof hashes to
+reference in the future in a repository whose number of objects is
+expected to grow.
+
+core.validateAbbrev::
+ If set to false (true by default) don't do any validation when
+ printing abbreviated object names to see if they're really
+ unique. This makes printing objects more performant at the
+ cost of potentially printing object names that aren't unique
+ within the current repository.
++
+When printing abbreviated object names Git needs to look through the
+local object store. This is an `O(log N)` operation assuming all the
+objects are in a single pack file, but `X * O(log N)` given `X` pack
+files, which can get expensive on some larger repositories.
++
+This setting changes that to `O(1)`, but with the trade-off that
+depending on the value of `core.abbrev` way may be printing
+abbreviated hashes that collide. Too see how likely this is, try
+running:
++
+-----------------------------------------------------------------------------------------------------------
+git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
+-----------------------------------------------------------------------------------------------------------
++
+This shows how many commits were found at each abbreviation length. On
+linux.git in June 2018 this shows a bit more than 750,000 commits,
+with just 4 needing 11 characters to be fully abbreviated, and the
+default heuristic picks a length of 12.
++
+Even without `core.validateAbbrev=false` the results abbreviation
+already a bit of a probability game. They're guaranteed at the moment
+of generation, but as more objects are added, ambiguities may be
+introduced. Likewise, what's unambiguous for you may not be for
+somebody else you're communicating with, if they have their own clone.
++
+Therefore the default of `core.validateAbbrev=true` may not save you
+in practice if you're sharing the SHA-1 or noting it now to use after
+a `git fetch`. You may be better off setting `core.abbrev` to
+e.g. `+2` to add 2 extra characters to the SHA-1 in combination with
+`core.validateAbbrev=false` to get a reasonable trade-off between
+safety and performance.
add.ignoreErrors::
add.ignore-errors (deprecated)::
diff --git a/cache.h b/cache.h
index 89a107a7f7..6dc5af9482 100644
--- a/cache.h
+++ b/cache.h
@@ -772,6 +772,8 @@ extern int check_stat;
extern int quote_path_fully;
extern int has_symlinks;
extern int minimum_abbrev, default_abbrev;
+extern int default_abbrev_relative;
+extern int validate_abbrev;
extern int ignore_case;
extern int assume_unchanged;
extern int prefer_symlink_refs;
diff --git a/config.c b/config.c
index 12f762ad92..b6e0d17af1 100644
--- a/config.c
+++ b/config.c
@@ -1146,11 +1146,25 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.validateabbrev")) {
+ if (!value)
+ return config_error_nonbool(var);
+ validate_abbrev = git_config_bool(var, value);
+ return 0;
+ }
+
if (!strcmp(var, "core.abbrev")) {
if (!value)
return config_error_nonbool(var);
if (!strcasecmp(value, "auto")) {
default_abbrev = -1;
+ } else if (*value == '+' || *value == '-') {
+ int relative = git_config_int(var, value);
+ if (relative == 0)
+ die(_("bad core.abbrev value %s. "
+ "relative values must be non-zero"),
+ value);
+ default_abbrev_relative = relative;
} else {
int abbrev = git_config_int(var, value);
if (abbrev < minimum_abbrev || abbrev > 40)
diff --git a/environment.c b/environment.c
index 2a6de2330b..4a24d8126b 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,8 @@ int trust_ctime = 1;
int check_stat = 1;
int has_symlinks = 1;
int minimum_abbrev = 4, default_abbrev = -1;
+int default_abbrev_relative = 0;
+int validate_abbrev = 1;
int ignore_case;
int assume_unchanged;
int prefer_symlink_refs;
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..aa7ccea14d 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -576,6 +576,7 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
struct disambiguate_state ds;
struct min_abbrev_data mad;
struct object_id oid_ret;
+ int dar = default_abbrev_relative;
if (len < 0) {
unsigned long count = approximate_object_count();
/*
@@ -602,6 +603,20 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
if (len == GIT_SHA1_HEXSZ || !len)
return GIT_SHA1_HEXSZ;
+ if (dar) {
+ if (len + dar < MINIMUM_ABBREV) {
+ len = MINIMUM_ABBREV;
+ dar = 0;
+ }
+
+ if (validate_abbrev) {
+ len += dar;
+ } else {
+ hex[len + dar] = 0;
+ return len + dar;
+ }
+ }
+
mad.init_len = len;
mad.cur_len = len;
mad.hex = hex;
--
2.17.0.290.gded63e768a
next prev parent reply other threads:[~2018-06-06 10:27 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-07 18:14 [RFC PATCH 00/18] Multi-pack index (MIDX) Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes Derrick Stolee
2018-01-08 19:32 ` Jonathan Tan
2018-01-08 20:35 ` Derrick Stolee
2018-01-08 22:06 ` Jonathan Tan
2018-01-07 18:14 ` [RFC PATCH 02/18] midx: specify midx file format Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 03/18] midx: create core.midx config setting Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 04/18] midx: write multi-pack indexes for an object list Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 05/18] midx: create midx builtin with --write mode Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 06/18] midx: add t5318-midx.sh test script Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 07/18] midx: teach midx --write to update midx-head Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 08/18] midx: teach git-midx to read midx file details Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 09/18] midx: find details of nth object in midx Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 10/18] midx: use existing midx when writing Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 11/18] midx: teach git-midx to clear midx files Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 12/18] midx: teach git-midx to delete expired files Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 13/18] t5318-midx.h: confirm git actions are stable Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 14/18] midx: load midx files when loading packs Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 15/18] midx: use midx for approximate object count Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 16/18] midx: nth_midxed_object_oid() and bsearch_midx() Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 17/18] sha1_name: use midx for abbreviations Derrick Stolee
2018-01-07 18:14 ` [RFC PATCH 18/18] packfile: use midx for object loads Derrick Stolee
2018-01-07 22:42 ` [RFC PATCH 00/18] Multi-pack index (MIDX) Ævar Arnfjörð Bjarmason
2018-01-08 0:08 ` Derrick Stolee
2018-01-08 10:20 ` Jeff King
2018-01-08 10:27 ` Jeff King
2018-01-08 12:28 ` Ævar Arnfjörð Bjarmason
2018-01-08 13:43 ` Johannes Schindelin
2018-01-09 6:50 ` Jeff King
2018-01-09 13:05 ` Johannes Schindelin
2018-01-09 19:51 ` Stefan Beller
2018-01-09 20:12 ` Junio C Hamano
2018-01-09 20:16 ` Stefan Beller
2018-01-09 21:31 ` Junio C Hamano
2018-01-10 17:05 ` Johannes Schindelin
2018-01-10 10:57 ` Jeff King
2018-01-08 13:43 ` Derrick Stolee
2018-01-09 7:12 ` Jeff King
2018-01-08 11:43 ` Ævar Arnfjörð Bjarmason
2018-06-06 8:13 ` Ævar Arnfjörð Bjarmason
2018-06-06 10:27 ` [RFC PATCH 0/2] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
2018-06-06 10:27 ` [RFC PATCH 1/2] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-06 10:27 ` Ævar Arnfjörð Bjarmason [this message]
2018-06-06 12:04 ` [RFC PATCH 2/2] sha1-name: add core.validateAbbrev & relative core.abbrev Christian Couder
2018-06-06 11:24 ` [RFC PATCH 00/18] Multi-pack index (MIDX) Derrick Stolee
2018-01-10 18:25 ` Martin Fick
2018-01-10 19:39 ` Derrick Stolee
2018-01-10 21:01 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180606102719.27145-3-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=jrnieder@gmail.com \
--cc=peff@peff.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.