SEGV (detected by Address Sanitizer) when using `core.abbrev` option

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* SEGV (detected by Address Sanitizer) when using `core.abbrev` option
@ 2024-06-10 22:27 Kyle Lippincott
  2024-06-10 23:01 ` Kyle Lippincott
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Kyle Lippincott @ 2024-06-10 22:27 UTC (permalink / raw)
  To: Git Mailing List, Patrick Steinhardt

c8aed5e8dadf (repository: stop setting SHA1 as the default object
hash, 2024-05-07) stopped initializing the_hash_algo, but config.c
references it when it observes a user setting core.abbrev, in two
ways:
- if core.abbrev is detected as a 'no' boolean value, then
default_abbrev is set to the_hash_algo->hexsz
- if core.abbrev is set to an integer, it verifies that it's within
range for the hash algorithm (specifically: it errors out if the value
is < minimum_abbrev or > the_hash_algo->hexsz).

Stack:
==2421488==ERROR: AddressSanitizer: SEGV on unknown address
0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0
T0)
==2421488==The signal is caused by a READ memory access.
==2421488==Hint: address points to the zero page.
    #0 0x56344202585f in git_default_core_config git/config.c:1466
    #1 0x56344202585f in git_default_config git/config.c:1815
    #2 0x56344202064e in configset_iter git/config.c:2185
    #3 0x563441d531cb in cmd_clone builtin/clone.c:981
    #4 0x563441cebac2 in run_builtin git/git.c:474
    #5 0x563441cebac2 in handle_builtin git/git.c:729
    #6 0x563441ceed0a in run_argv git/git.c:793
    #7 0x563441cf0aea in cmd_main git/git.c:928
    #8 0x563441ce9323 in main git/common-main.c:62
    #9 0x7fa3228456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360
    #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId:
c0e4b09d5b212a201769f1eb8e7592cddbe3af1d)

AddressSanitizer can not provide additional info.

---

My first thought for a fix was to just cap it at 40, with the
assumption that the code would handle it correctly in the unlikely
event that the hash size ever decreased, but I don't think that does
the right thing if `core.abbrev=no`. That's documented as a way of
obtaining the full hashes (with no abbreviation), and if we're using
sha256, capping that at 40 hex (160bits) is incorrect.

My second thought was that we could store the requested value and
validate it on every usage. This complicates usage locations, and can
lead to poor behavior (crashes in the middle of operation when we
finally get around to checking the value).

My third thought was that we could store the requested value and
validate when we have a repository that initializes the hash for us as
part of that initialization. If we attempt to abbreviate some hashes
without that setup, we act as if core.abbrev isn't set at all (so they
get the default behavior). That seems like the best option overall.
Exploring that further ...

I've looked at a semi-random collection of places where
`default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to
eventually go through `repo_find_unique_abbrev_r`, and they pass in
the len. This also always has a repository available (otherwise it
wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`.
We could thus carry the "unvalidated" request in default_abbrev by
having special magic values (auto=-1 like today, no=-2 (replaced with
hexsz when we know it), other requests are <= -4, for a requested
length of 4 or higher), or we could have another variable
(requested_default_abbrev) that gets copied to default_abbrev when we
have a repo.

The one potential issue I can think of with this is that setting
`core.abbrev = no`, having that resolve to 64 (sha256) when we set up
the repo, and then if we ever read from a repo that uses 40 hexsz
(such as sha1), then we have to ensure that we tolerate a requested
length greater than the current hash algorithm's maximum length. This
likely wasn't a problem when sha1 was the default, because we're
unlikely to go to a hash algorithm with <160 bits in the future. But
if sha256 becomes the default, then this can be problematic.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SEGV (detected by Address Sanitizer) when using `core.abbrev` option
  2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
@ 2024-06-10 23:01 ` Kyle Lippincott
  2024-06-10 23:21   ` Junio C Hamano
  2024-06-11  6:06 ` Patrick Steinhardt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Kyle Lippincott @ 2024-06-10 23:01 UTC (permalink / raw)
  To: Git Mailing List, Patrick Steinhardt

On Mon, Jun 10, 2024 at 3:27 PM Kyle Lippincott <spectral@google.com> wrote:
>

I just realized I didn't give a reproduction command, sorry about
that; here's the reproduction command provided by our user:

git config --global core.abbrev 12
git clone https://github.com/git/git.git

I realized this because Josh Steadmon informed me that config
loading/parsing is lazy when I asked why `git bisect` still worked
with this setting in my git config. So I think this might be a problem
only for things that go through `git_default_config` (if we set up the
repo object, we probably work just fine).

> c8aed5e8dadf (repository: stop setting SHA1 as the default object
> hash, 2024-05-07) stopped initializing the_hash_algo, but config.c
> references it when it observes a user setting core.abbrev, in two
> ways:
> - if core.abbrev is detected as a 'no' boolean value, then
> default_abbrev is set to the_hash_algo->hexsz
> - if core.abbrev is set to an integer, it verifies that it's within
> range for the hash algorithm (specifically: it errors out if the value
> is < minimum_abbrev or > the_hash_algo->hexsz).
>
> Stack:
> ==2421488==ERROR: AddressSanitizer: SEGV on unknown address
> 0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0
> T0)
> ==2421488==The signal is caused by a READ memory access.
> ==2421488==Hint: address points to the zero page.
>     #0 0x56344202585f in git_default_core_config git/config.c:1466
>     #1 0x56344202585f in git_default_config git/config.c:1815
>     #2 0x56344202064e in configset_iter git/config.c:2185
>     #3 0x563441d531cb in cmd_clone builtin/clone.c:981
>     #4 0x563441cebac2 in run_builtin git/git.c:474
>     #5 0x563441cebac2 in handle_builtin git/git.c:729
>     #6 0x563441ceed0a in run_argv git/git.c:793
>     #7 0x563441cf0aea in cmd_main git/git.c:928
>     #8 0x563441ce9323 in main git/common-main.c:62
>     #9 0x7fa3228456c9 in __libc_start_call_main
> ../sysdeps/nptl/libc_start_call_main.h:58
>     #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360
>     #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId:
> c0e4b09d5b212a201769f1eb8e7592cddbe3af1d)
>
> AddressSanitizer can not provide additional info.
>
> ---
>
> My first thought for a fix was to just cap it at 40, with the
> assumption that the code would handle it correctly in the unlikely
> event that the hash size ever decreased, but I don't think that does
> the right thing if `core.abbrev=no`. That's documented as a way of
> obtaining the full hashes (with no abbreviation), and if we're using
> sha256, capping that at 40 hex (160bits) is incorrect.
>
> My second thought was that we could store the requested value and
> validate it on every usage. This complicates usage locations, and can
> lead to poor behavior (crashes in the middle of operation when we
> finally get around to checking the value).
>
> My third thought was that we could store the requested value and
> validate when we have a repository that initializes the hash for us as
> part of that initialization. If we attempt to abbreviate some hashes
> without that setup, we act as if core.abbrev isn't set at all (so they
> get the default behavior). That seems like the best option overall.
> Exploring that further ...
>
> I've looked at a semi-random collection of places where
> `default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to
> eventually go through `repo_find_unique_abbrev_r`, and they pass in
> the len. This also always has a repository available (otherwise it
> wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`.
> We could thus carry the "unvalidated" request in default_abbrev by
> having special magic values (auto=-1 like today, no=-2 (replaced with
> hexsz when we know it), other requests are <= -4, for a requested
> length of 4 or higher), or we could have another variable
> (requested_default_abbrev) that gets copied to default_abbrev when we
> have a repo.
>
> The one potential issue I can think of with this is that setting
> `core.abbrev = no`, having that resolve to 64 (sha256) when we set up
> the repo, and then if we ever read from a repo that uses 40 hexsz
> (such as sha1), then we have to ensure that we tolerate a requested
> length greater than the current hash algorithm's maximum length. This
> likely wasn't a problem when sha1 was the default, because we're
> unlikely to go to a hash algorithm with <160 bits in the future. But
> if sha256 becomes the default, then this can be problematic.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SEGV (detected by Address Sanitizer) when using `core.abbrev` option
  2024-06-10 23:01 ` Kyle Lippincott
@ 2024-06-10 23:21   ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2024-06-10 23:21 UTC (permalink / raw)
  To: Kyle Lippincott; +Cc: Git Mailing List, Patrick Steinhardt

Kyle Lippincott <spectral@google.com> writes:

> I just realized I didn't give a reproduction command, sorry about
> that; here's the reproduction command provided by our user:
>
> git config --global core.abbrev 12
> git clone https://github.com/git/git.git

Or "git -c core.abbrev=12 clone https://any/thing".

Nicely spotted.  And this is a curious one, as you would generally
expect that having a fallback setting in the global space would help
you instead of hurting you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SEGV (detected by Address Sanitizer) when using `core.abbrev` option
  2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
  2024-06-10 23:01 ` Kyle Lippincott
@ 2024-06-11  6:06 ` Patrick Steinhardt
  2024-06-11  8:42 ` [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-11  6:06 UTC (permalink / raw)
  To: Kyle Lippincott; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 5981 bytes --]

On Mon, Jun 10, 2024 at 03:27:11PM -0700, Kyle Lippincott wrote:
> c8aed5e8dadf (repository: stop setting SHA1 as the default object
> hash, 2024-05-07) stopped initializing the_hash_algo, but config.c
> references it when it observes a user setting core.abbrev, in two
> ways:
> - if core.abbrev is detected as a 'no' boolean value, then
> default_abbrev is set to the_hash_algo->hexsz
> - if core.abbrev is set to an integer, it verifies that it's within
> range for the hash algorithm (specifically: it errors out if the value
> is < minimum_abbrev or > the_hash_algo->hexsz).
> 
> Stack:
> ==2421488==ERROR: AddressSanitizer: SEGV on unknown address
> 0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0
> T0)
> ==2421488==The signal is caused by a READ memory access.
> ==2421488==Hint: address points to the zero page.
>     #0 0x56344202585f in git_default_core_config git/config.c:1466
>     #1 0x56344202585f in git_default_config git/config.c:1815
>     #2 0x56344202064e in configset_iter git/config.c:2185
>     #3 0x563441d531cb in cmd_clone builtin/clone.c:981
>     #4 0x563441cebac2 in run_builtin git/git.c:474
>     #5 0x563441cebac2 in handle_builtin git/git.c:729
>     #6 0x563441ceed0a in run_argv git/git.c:793
>     #7 0x563441cf0aea in cmd_main git/git.c:928
>     #8 0x563441ce9323 in main git/common-main.c:62
>     #9 0x7fa3228456c9 in __libc_start_call_main
> ../sysdeps/nptl/libc_start_call_main.h:58
>     #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360
>     #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId:
> c0e4b09d5b212a201769f1eb8e7592cddbe3af1d)
> 
> AddressSanitizer can not provide additional info.

Good find.

> My first thought for a fix was to just cap it at 40, with the
> assumption that the code would handle it correctly in the unlikely
> event that the hash size ever decreased, but I don't think that does
> the right thing if `core.abbrev=no`. That's documented as a way of
> obtaining the full hashes (with no abbreviation), and if we're using
> sha256, capping that at 40 hex (160bits) is incorrect.

Exactly, this doesn't feel like the right solution.

> My second thought was that we could store the requested value and
> validate it on every usage. This complicates usage locations, and can
> lead to poor behavior (crashes in the middle of operation when we
> finally get around to checking the value).

This feels more complex than needed indeed.

> My third thought was that we could store the requested value and
> validate when we have a repository that initializes the hash for us as
> part of that initialization. If we attempt to abbreviate some hashes
> without that setup, we act as if core.abbrev isn't set at all (so they
> get the default behavior). That seems like the best option overall.
> Exploring that further ...
> 
> I've looked at a semi-random collection of places where
> `default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to
> eventually go through `repo_find_unique_abbrev_r`, and they pass in
> the len. This also always has a repository available (otherwise it
> wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`.
> We could thus carry the "unvalidated" request in default_abbrev by
> having special magic values (auto=-1 like today, no=-2 (replaced with
> hexsz when we know it), other requests are <= -4, for a requested
> length of 4 or higher), or we could have another variable
> (requested_default_abbrev) that gets copied to default_abbrev when we
> have a repo.

This also feels a bit too complex in my opinion.

> The one potential issue I can think of with this is that setting
> `core.abbrev = no`, having that resolve to 64 (sha256) when we set up
> the repo, and then if we ever read from a repo that uses 40 hexsz
> (such as sha1), then we have to ensure that we tolerate a requested
> length greater than the current hash algorithm's maximum length. This
> likely wasn't a problem when sha1 was the default, because we're
> unlikely to go to a hash algorithm with <160 bits in the future. But
> if sha256 becomes the default, then this can be problematic.

I think this is the best solution. I don't quite see why we need to
abort in the first place when the caller asks for something longer
than GIT_MAX_HEXSZ. If they do, it's rather clear that the intent is to
show the full object hash, so we can do that.

In other words, the following should be sufficient, shouldn't it?

diff --git a/config.c b/config.c
index abce05b774..0416b0f2b6 100644
--- a/config.c
+++ b/config.c
@@ -1460,10 +1460,10 @@ static int git_default_core_config(const char *var, const char *value,
 		if (!strcasecmp(value, "auto"))
 			default_abbrev = -1;
 		else if (!git_parse_maybe_bool_text(value))
-			default_abbrev = the_hash_algo->hexsz;
+			default_abbrev = GIT_MAX_HEXSZ;
 		else {
 			int abbrev = git_config_int(var, value, ctx->kvi);
-			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
+			if (abbrev < minimum_abbrev)
 				return error(_("abbrev length out of range: %d"), abbrev);
 			default_abbrev = abbrev;
 		}
diff --git a/object-name.c b/object-name.c
index 523af6f64f..1be2ad1a16 100644
--- a/object-name.c
+++ b/object-name.c
@@ -837,7 +837,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	}
 
 	oid_to_hex_r(hex, oid);
-	if (len == hexsz || !len)
+	if (len >= hexsz || !len)
 		return hexsz;
 
 	mad.repo = r;

This is also in line with `parse_opt_abbrev_cb()`, which does the
following:

    if (v && v < MINIMUM_ABBREV)
        v = MINIMUM_ABBREV;
    else if (startup_info->have_repository && v > the_hash_algo->hexsz)
        v = the_hash_algo->hexsz;

In other words, it trims to `the_hash_algo->hexsz` or, if we don't have
a repository, it doesn't bother to trim the value at all.

I'll send a patch that does this.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
  2024-06-10 23:01 ` Kyle Lippincott
  2024-06-11  6:06 ` Patrick Steinhardt
@ 2024-06-11  8:42 ` Patrick Steinhardt
  2024-06-11 17:54   ` Junio C Hamano
  2024-06-11  8:42 ` [PATCH 2/2] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
  4 siblings, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-11  8:42 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4099 bytes --]

The "core.abbrev" config allows the user to specify the minimum length
when abbreviating object hashes. Next to the values "auto" and "no",
this config also accepts a concrete length that needs to be bigger or
equal to the minimum length and smaller or equal to the hash algorithm's
hex length. While the former condition is trivial, the latter depends on
the object format used by the current repository. It is thus a variable
upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).

This has two major downsides. First, the user that specifies this config
must be aware of the object hashes that its repository use. If they want
to configure the value globally, then they cannot pick any value in the
range `[41, 64]` if they have any repository that uses SHA-1. If they
did, Git would error out when parsing the config.

Second, and more importantly, parsing "core.abbrev" crashes when outside
of a Git repository because we dereference `the_hash_algo` to figure out
its hex length. Starting with c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) though, we stopped initializing
`the_hash_algo` outside of Git repositories.

Fix both of these issues by not making it an error anymore when the
given length exceeds the hash length. Instead, if we have a repository,
then we truncate the length to the maximum length of `the_hash_algo`.
Otherwise, we simply leave the abbreviated length intact and store it
as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
should never even end up using `default_abbrev` without a repository
anyway given that abbreviating object IDs to unique prefixes requires us
to have access to an object database.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 config.c         | 7 +++++--
 t/t4202-log.sh   | 6 ++++++
 t/t5601-clone.sh | 7 +++++++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index abce05b774..ab2844d9e1 100644
--- a/config.c
+++ b/config.c
@@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value,
 		if (!strcasecmp(value, "auto"))
 			default_abbrev = -1;
 		else if (!git_parse_maybe_bool_text(value))
-			default_abbrev = the_hash_algo->hexsz;
+			default_abbrev = startup_info->have_repository ?
+				the_hash_algo->hexsz : GIT_MAX_HEXSZ;
 		else {
 			int abbrev = git_config_int(var, value, ctx->kvi);
-			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
+			if (abbrev < minimum_abbrev)
 				return error(_("abbrev length out of range: %d"), abbrev);
+			else if (startup_info->have_repository && abbrev > the_hash_algo->hexsz)
+				abbrev = the_hash_algo->hexsz;
 			default_abbrev = abbrev;
 		}
 		return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 86c695eb0a..99c063e4cd 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1237,6 +1237,12 @@ test_expect_success 'log.abbrevCommit configuration' '
 	test_cmp expect.whatchanged.full actual
 '
 
+test_expect_success 'log.abbrevCommit with --abbrev=9000' '
+	git log --no-abbrev >expect &&
+	git log --abbrev-commit --abbrev=9000 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'show added path under "--follow -M"' '
 	# This tests for a regression introduced in v1.7.2-rc0~103^2~2
 	test_create_repo regression &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index cc0b953f14..5d7ea147f1 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -46,6 +46,13 @@ test_expect_success 'output from clone' '
 	test $(grep Clon output | wc -l) = 1
 '
 
+test_expect_success 'output from clone with core.abbrev does not crash' '
+	rm -fr dst &&
+	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
+	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
+	test_cmp expect actual
+'
+
 test_expect_success 'clone does not keep pack' '
 
 	rm -fr dst &&
-- 
2.45.2.436.gcd77e87115.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] object-name: don't try to abbreviate to lengths greater than hexsz
  2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
                   ` (2 preceding siblings ...)
  2024-06-11  8:42 ` [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
@ 2024-06-11  8:42 ` Patrick Steinhardt
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
  4 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-11  8:42 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1993 bytes --]

When given a length that equals the current hash algorithm's hex size,
then `repo_find_unique_abbrev_r()` exits early without trying to find an
abbreviation. This is only sensible because there is nothing to
abbreviate in the first place, so searching through objects to find a
unique prefix would be a waste of compute.

What we don't handle though is the case where the user passes a length
greater than the hash length. This is fine in practice as we still
compute the correct result. But at the very least, this is a waste of
resources as we try to abbreviate a value that cannot be abbreviated,
which causes us to hit the object database.

Start to explicitly handle values larger than hexsz to avoid this
performance penalty, which leads to a measureable speedup. The following
benchmark has been executed in linux.git:

  Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~)
    Time (mean ± σ):     12.812 s ±  0.040 s    [User: 12.225 s, System: 0.554 s]
    Range (min … max):   12.723 s … 12.857 s    10 runs

  Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD)
    Time (mean ± σ):     11.095 s ±  0.029 s    [User: 10.546 s, System: 0.521 s]
    Range (min … max):   11.037 s … 11.122 s    10 runs

  Summary
    git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran
      1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-name.c b/object-name.c
index 523af6f64f..1be2ad1a16 100644
--- a/object-name.c
+++ b/object-name.c
@@ -837,7 +837,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	}
 
 	oid_to_hex_r(hex, oid);
-	if (len == hexsz || !len)
+	if (len >= hexsz || !len)
 		return hexsz;
 
 	mad.repo = r;
-- 
2.45.2.436.gcd77e87115.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-11  8:42 ` [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
@ 2024-06-11 17:54   ` Junio C Hamano
  2024-06-12  5:55     ` Patrick Steinhardt
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2024-06-11 17:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Kyle Lippincott

Patrick Steinhardt <ps@pks.im> writes:

> Fix both of these issues by not making it an error anymore when the
> given length exceeds the hash length. Instead, if we have a repository,
> then we truncate the length to the maximum length of `the_hash_algo`.
> Otherwise, we simply leave the abbreviated length intact and store it
> as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
> handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
> should never even end up using `default_abbrev` without a repository
> anyway given that abbreviating object IDs to unique prefixes requires us
> to have access to an object database.

Makes sense.

> diff --git a/config.c b/config.c
> index abce05b774..ab2844d9e1 100644
> --- a/config.c
> +++ b/config.c
> @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value,
>  		if (!strcasecmp(value, "auto"))
>  			default_abbrev = -1;
>  		else if (!git_parse_maybe_bool_text(value))
> -			default_abbrev = the_hash_algo->hexsz;
> +			default_abbrev = startup_info->have_repository ?
> +				the_hash_algo->hexsz : GIT_MAX_HEXSZ;

We will need to have some code that further adjusts overly long
default_abbrev when we really have to abbreviate (at which time,
hopefully we are already aware of the real hash algorithm used in
the repository, and that may be SHA-1) anyway.

So do we even need the conditional here?  Can't we just set it to
GIT_MAX_HEXSZ here unconditionally?

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 86c695eb0a..99c063e4cd 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1237,6 +1237,12 @@ test_expect_success 'log.abbrevCommit configuration' '
>  	test_cmp expect.whatchanged.full actual
>  '
>  
> +test_expect_success 'log.abbrevCommit with --abbrev=9000' '
> +	git log --no-abbrev >expect &&
> +	git log --abbrev-commit --abbrev=9000 >actual &&
> +	test_cmp expect actual
> +'

Interesting.  We didn't have coverage for "we want to see full
object names" case.

> +test_expect_success 'output from clone with core.abbrev does not crash' '
> +	rm -fr dst &&
> +	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
> +	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
> +	test_cmp expect actual
> +'

OK.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-11 17:54   ` Junio C Hamano
@ 2024-06-12  5:55     ` Patrick Steinhardt
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-12  5:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kyle Lippincott

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

On Tue, Jun 11, 2024 at 10:54:23AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Fix both of these issues by not making it an error anymore when the
> > given length exceeds the hash length. Instead, if we have a repository,
> > then we truncate the length to the maximum length of `the_hash_algo`.
> > Otherwise, we simply leave the abbreviated length intact and store it
> > as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
> > handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
> > should never even end up using `default_abbrev` without a repository
> > anyway given that abbreviating object IDs to unique prefixes requires us
> > to have access to an object database.
> 
> Makes sense.
> 
> > diff --git a/config.c b/config.c
> > index abce05b774..ab2844d9e1 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1460,11 +1460,14 @@ static int git_default_core_config(const char *var, const char *value,
> >  		if (!strcasecmp(value, "auto"))
> >  			default_abbrev = -1;
> >  		else if (!git_parse_maybe_bool_text(value))
> > -			default_abbrev = the_hash_algo->hexsz;
> > +			default_abbrev = startup_info->have_repository ?
> > +				the_hash_algo->hexsz : GIT_MAX_HEXSZ;
> 
> We will need to have some code that further adjusts overly long
> default_abbrev when we really have to abbreviate (at which time,
> hopefully we are already aware of the real hash algorithm used in
> the repository, and that may be SHA-1) anyway.
> 
> So do we even need the conditional here?  Can't we just set it to
> GIT_MAX_HEXSZ here unconditionally?

Not really. I was erring on the safe side here to retain the status quo
for all relevant cases. As explained in the commit message, the length
is only relevant when we have a repository because we otherwise wouldn't
be able to abbreviate anything anyway. So essentially, this change is a
functional no-op.

There's also the question of the second commit, where we only handle
`abbrev == the_hash_algo->size`, but not `abbrev > the_hash_algo->size`.
It works, but is slower when not truncating the length until the second
patch fixes it.

So yes, we can set this unconditionally to `GIT_MAX_HEXSZ`, but out of
abundance of caution I decided to make this conditional.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
                   ` (3 preceding siblings ...)
  2024-06-11  8:42 ` [PATCH 2/2] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
@ 2024-06-12  8:03 ` Patrick Steinhardt
  2024-06-12  8:03   ` [PATCH v2 1/3] " Patrick Steinhardt
                     ` (3 more replies)
  4 siblings, 4 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-12  8:03 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4035 bytes --]

Hi,

this is the second version of my patch series that fixes a segfault when
parsing "core.abbrev" without a repository.

Changes compared to v1:

  - Stop truncating the abbreviation length to the upper boundary
    completely. It's unnecessary, complicates the code and makes us
    dependent on `the_repository`.

  - Adapt `parse_opt_abbrev_cb()` to also stop truncating such that the
    behaviour of "core.abbrev" and `--abbrev` is the same.

  - Extend test coverage a bit.

Thanks!

Patrick

Patrick Steinhardt (3):
  config: fix segfault when parsing "core.abbrev" without repo
  parse-options-cb: stop clamping "--abbrev=" to hash length
  object-name: don't try to abbreviate to lengths greater than hexsz

 config.c           |  4 ++--
 object-name.c      |  2 +-
 parse-options-cb.c |  2 --
 t/t4202-log.sh     | 24 ++++++++++++++++++++++++
 t/t5601-clone.sh   |  7 +++++++
 5 files changed, 34 insertions(+), 5 deletions(-)

Range-diff against v1:
1:  7ded51bbce ! 1:  b48c50dd92 config: fix segfault when parsing "core.abbrev" without repo
    @@ Commit message
         `the_hash_algo` outside of Git repositories.
     
         Fix both of these issues by not making it an error anymore when the
    -    given length exceeds the hash length. Instead, if we have a repository,
    -    then we truncate the length to the maximum length of `the_hash_algo`.
    -    Otherwise, we simply leave the abbreviated length intact and store it
    -    as-is. This is equivalent to the logic in `parse_opt_abbrev_cb()` and is
    -    handled just fine by `repo_find_unique_abbrev_r()`. In practice, we
    -    should never even end up using `default_abbrev` without a repository
    -    anyway given that abbreviating object IDs to unique prefixes requires us
    -    to have access to an object database.
    +    given length exceeds the hash length. Instead, leave the abbreviated
    +    length intact. `repo_find_unique_abbrev_r()` handles this just fine
    +    except for a performance penalty which we will fix in a subsequent
    +    commit.
     
         Reported-by: Kyle Lippincott <spectral@google.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
    @@ config.c: static int git_default_core_config(const char *var, const char *value,
      			default_abbrev = -1;
      		else if (!git_parse_maybe_bool_text(value))
     -			default_abbrev = the_hash_algo->hexsz;
    -+			default_abbrev = startup_info->have_repository ?
    -+				the_hash_algo->hexsz : GIT_MAX_HEXSZ;
    ++			default_abbrev = GIT_MAX_HEXSZ;
      		else {
      			int abbrev = git_config_int(var, value, ctx->kvi);
     -			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
     +			if (abbrev < minimum_abbrev)
      				return error(_("abbrev length out of range: %d"), abbrev);
    -+			else if (startup_info->have_repository && abbrev > the_hash_algo->hexsz)
    -+				abbrev = the_hash_algo->hexsz;
      			default_abbrev = abbrev;
      		}
    - 		return 0;
     
      ## t/t4202-log.sh ##
     @@ t/t4202-log.sh: test_expect_success 'log.abbrevCommit configuration' '
      	test_cmp expect.whatchanged.full actual
      '
      
    -+test_expect_success 'log.abbrevCommit with --abbrev=9000' '
    ++test_expect_success '--abbrev-commit with core.abbrev=false' '
     +	git log --no-abbrev >expect &&
    -+	git log --abbrev-commit --abbrev=9000 >actual &&
    ++	git -c core.abbrev=false log --abbrev-commit >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success '--abbrev-commit with core.abbrev=9000' '
    ++	git log --no-abbrev >expect &&
    ++	git -c core.abbrev=9000 log --abbrev-commit >actual &&
     +	test_cmp expect actual
     +'
     +
-:  ---------- > 2:  92860256a6 parse-options-cb: stop clamping "--abbrev=" to hash length
2:  31c0405f85 = 3:  0ccb8d8efa object-name: don't try to abbreviate to lengths greater than hexsz
-- 
2.45.2.457.g8d94cfb545.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/3] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
@ 2024-06-12  8:03   ` Patrick Steinhardt
  2024-06-12  9:22     ` Karthik Nayak
  2024-06-12  8:03   ` [PATCH v2 2/3] parse-options-cb: stop clamping "--abbrev=" to hash length Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-12  8:03 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3789 bytes --]

The "core.abbrev" config allows the user to specify the minimum length
when abbreviating object hashes. Next to the values "auto" and "no",
this config also accepts a concrete length that needs to be bigger or
equal to the minimum length and smaller or equal to the hash algorithm's
hex length. While the former condition is trivial, the latter depends on
the object format used by the current repository. It is thus a variable
upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).

This has two major downsides. First, the user that specifies this config
must be aware of the object hashes that its repository use. If they want
to configure the value globally, then they cannot pick any value in the
range `[41, 64]` if they have any repository that uses SHA-1. If they
did, Git would error out when parsing the config.

Second, and more importantly, parsing "core.abbrev" crashes when outside
of a Git repository because we dereference `the_hash_algo` to figure out
its hex length. Starting with c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) though, we stopped initializing
`the_hash_algo` outside of Git repositories.

Fix both of these issues by not making it an error anymore when the
given length exceeds the hash length. Instead, leave the abbreviated
length intact. `repo_find_unique_abbrev_r()` handles this just fine
except for a performance penalty which we will fix in a subsequent
commit.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 config.c         |  4 ++--
 t/t4202-log.sh   | 12 ++++++++++++
 t/t5601-clone.sh |  7 +++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index abce05b774..0416b0f2b6 100644
--- a/config.c
+++ b/config.c
@@ -1460,10 +1460,10 @@ static int git_default_core_config(const char *var, const char *value,
 		if (!strcasecmp(value, "auto"))
 			default_abbrev = -1;
 		else if (!git_parse_maybe_bool_text(value))
-			default_abbrev = the_hash_algo->hexsz;
+			default_abbrev = GIT_MAX_HEXSZ;
 		else {
 			int abbrev = git_config_int(var, value, ctx->kvi);
-			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
+			if (abbrev < minimum_abbrev)
 				return error(_("abbrev length out of range: %d"), abbrev);
 			default_abbrev = abbrev;
 		}
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 86c695eb0a..e97826458c 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1237,6 +1237,18 @@ test_expect_success 'log.abbrevCommit configuration' '
 	test_cmp expect.whatchanged.full actual
 '

+test_expect_success '--abbrev-commit with core.abbrev=false' '
+	git log --no-abbrev >expect &&
+	git -c core.abbrev=false log --abbrev-commit >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--abbrev-commit with core.abbrev=9000' '
+	git log --no-abbrev >expect &&
+	git -c core.abbrev=9000 log --abbrev-commit >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'show added path under "--follow -M"' '
 	# This tests for a regression introduced in v1.7.2-rc0~103^2~2
 	test_create_repo regression &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index cc0b953f14..5d7ea147f1 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -46,6 +46,13 @@ test_expect_success 'output from clone' '
 	test $(grep Clon output | wc -l) = 1
 '

+test_expect_success 'output from clone with core.abbrev does not crash' '
+	rm -fr dst &&
+	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
+	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
+	test_cmp expect actual
+'
+
 test_expect_success 'clone does not keep pack' '

 	rm -fr dst &&
-- 
2.45.2.457.g8d94cfb545.dirty

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] parse-options-cb: stop clamping "--abbrev=" to hash length
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
  2024-06-12  8:03   ` [PATCH v2 1/3] " Patrick Steinhardt
@ 2024-06-12  8:03   ` Patrick Steinhardt
  2024-06-12  8:03   ` [PATCH v2 3/3] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
  2024-06-12  9:29   ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Karthik Nayak
  3 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-12  8:03 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2410 bytes --]

The `OPT__ABBREV()` option allows the user to specify the length that
object hashes shall be abbreviated to. This length needs to be in the
range of `(MIN_ABBREV, the_hash_algo->hexsz)`, which is why we clamp the
value as required. While this makes sense in the case of `MIN_ABBREV`,
it is unnecessary for the upper boundary as the value is eventually
passed down to `repo_find_unnique_abbrev_r()`, which handles values
larger than the current hash length just fine.

In the preceding commit, we have changed parsing of the "core.abbrev"
config to stop clamping to the upper boundary. Let's do the same here so
that the code becomes simpler, we are consistent with how we treat the
"core.abbrev" config and so that we stop depending on `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 parse-options-cb.c |  2 --
 t/t4202-log.sh     | 12 ++++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index d99d688d3c..b2aa62a9dc 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -30,8 +30,6 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 				     opt->long_name);
 		if (v && v < MINIMUM_ABBREV)
 			v = MINIMUM_ABBREV;
-		else if (startup_info->have_repository && v > the_hash_algo->hexsz)
-			v = the_hash_algo->hexsz;
 	}
 	*(int *)(opt->value) = v;
 	return 0;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index e97826458c..51f7beb59f 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1243,12 +1243,24 @@ test_expect_success '--abbrev-commit with core.abbrev=false' '
 	test_cmp expect actual
 '
 
+test_expect_success '--abbrev-commit with --no-abbrev' '
+	git log --no-abbrev >expect &&
+	git log --abbrev-commit --no-abbrev >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '--abbrev-commit with core.abbrev=9000' '
 	git log --no-abbrev >expect &&
 	git -c core.abbrev=9000 log --abbrev-commit >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success '--abbrev-commit with --abbrev=9000' '
+	git log --no-abbrev >expect &&
+	git log --abbrev-commit --abbrev=9000 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'show added path under "--follow -M"' '
 	# This tests for a regression introduced in v1.7.2-rc0~103^2~2
 	test_create_repo regression &&
-- 
2.45.2.457.g8d94cfb545.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] object-name: don't try to abbreviate to lengths greater than hexsz
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
  2024-06-12  8:03   ` [PATCH v2 1/3] " Patrick Steinhardt
  2024-06-12  8:03   ` [PATCH v2 2/3] parse-options-cb: stop clamping "--abbrev=" to hash length Patrick Steinhardt
@ 2024-06-12  8:03   ` Patrick Steinhardt
  2024-06-12  9:29   ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Karthik Nayak
  3 siblings, 0 replies; 14+ messages in thread
From: Patrick Steinhardt @ 2024-06-12  8:03 UTC (permalink / raw)
  To: git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1993 bytes --]

When given a length that equals the current hash algorithm's hex size,
then `repo_find_unique_abbrev_r()` exits early without trying to find an
abbreviation. This is only sensible because there is nothing to
abbreviate in the first place, so searching through objects to find a
unique prefix would be a waste of compute.

What we don't handle though is the case where the user passes a length
greater than the hash length. This is fine in practice as we still
compute the correct result. But at the very least, this is a waste of
resources as we try to abbreviate a value that cannot be abbreviated,
which causes us to hit the object database.

Start to explicitly handle values larger than hexsz to avoid this
performance penalty, which leads to a measureable speedup. The following
benchmark has been executed in linux.git:

  Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~)
    Time (mean ± σ):     12.812 s ±  0.040 s    [User: 12.225 s, System: 0.554 s]
    Range (min … max):   12.723 s … 12.857 s    10 runs

  Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD)
    Time (mean ± σ):     11.095 s ±  0.029 s    [User: 10.546 s, System: 0.521 s]
    Range (min … max):   11.037 s … 11.122 s    10 runs

  Summary
    git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran
      1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-name.c b/object-name.c
index 523af6f64f..1be2ad1a16 100644
--- a/object-name.c
+++ b/object-name.c
@@ -837,7 +837,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	}
 
 	oid_to_hex_r(hex, oid);
-	if (len == hexsz || !len)
+	if (len >= hexsz || !len)
 		return hexsz;
 
 	mad.repo = r;
-- 
2.45.2.457.g8d94cfb545.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-12  8:03   ` [PATCH v2 1/3] " Patrick Steinhardt
@ 2024-06-12  9:22     ` Karthik Nayak
  0 siblings, 0 replies; 14+ messages in thread
From: Karthik Nayak @ 2024-06-12  9:22 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4166 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> The "core.abbrev" config allows the user to specify the minimum length
> when abbreviating object hashes. Next to the values "auto" and "no",
> this config also accepts a concrete length that needs to be bigger or
> equal to the minimum length and smaller or equal to the hash algorithm's
> hex length. While the former condition is trivial, the latter depends on
> the object format used by the current repository. It is thus a variable
> upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).
>
> This has two major downsides. First, the user that specifies this config
> must be aware of the object hashes that its repository use. If they want
> to configure the value globally, then they cannot pick any value in the
> range `[41, 64]` if they have any repository that uses SHA-1. If they
> did, Git would error out when parsing the config.
>
> Second, and more importantly, parsing "core.abbrev" crashes when outside
> of a Git repository because we dereference `the_hash_algo` to figure out
> its hex length. Starting with c8aed5e8da (repository: stop setting SHA1
> as the default object hash, 2024-05-07) though, we stopped initializing
> `the_hash_algo` outside of Git repositories.
>
> Fix both of these issues by not making it an error anymore when the
> given length exceeds the hash length. Instead, leave the abbreviated
> length intact. `repo_find_unique_abbrev_r()` handles this just fine
> except for a performance penalty which we will fix in a subsequent
> commit.
>
> Reported-by: Kyle Lippincott <spectral@google.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  config.c         |  4 ++--
>  t/t4202-log.sh   | 12 ++++++++++++
>  t/t5601-clone.sh |  7 +++++++
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/config.c b/config.c
> index abce05b774..0416b0f2b6 100644
> --- a/config.c
> +++ b/config.c
> @@ -1460,10 +1460,10 @@ static int git_default_core_config(const char *var, const char *value,
>  		if (!strcasecmp(value, "auto"))
>  			default_abbrev = -1;
>  		else if (!git_parse_maybe_bool_text(value))
> -			default_abbrev = the_hash_algo->hexsz;
> +			default_abbrev = GIT_MAX_HEXSZ;

So if the value is set to 'no'ish, we set it to the max value possible.

>  		else {
>  			int abbrev = git_config_int(var, value, ctx->kvi);
> -			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
> +			if (abbrev < minimum_abbrev)
>  				return error(_("abbrev length out of range: %d"), abbrev);
>  			default_abbrev = abbrev;
>  		}
>

I was wondering if the documentation for 'core.abbrev' needs to be
modified, but seems like we don't mention how the max value can error
out in the first place so all good there.

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 86c695eb0a..e97826458c 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -1237,6 +1237,18 @@ test_expect_success 'log.abbrevCommit configuration' '
>  	test_cmp expect.whatchanged.full actual
>  '
>
> +test_expect_success '--abbrev-commit with core.abbrev=false' '
> +	git log --no-abbrev >expect &&
> +	git -c core.abbrev=false log --abbrev-commit >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success '--abbrev-commit with core.abbrev=9000' '
> +	git log --no-abbrev >expect &&
> +	git -c core.abbrev=9000 log --abbrev-commit >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'show added path under "--follow -M"' '
>  	# This tests for a regression introduced in v1.7.2-rc0~103^2~2
>  	test_create_repo regression &&
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index cc0b953f14..5d7ea147f1 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -46,6 +46,13 @@ test_expect_success 'output from clone' '
>  	test $(grep Clon output | wc -l) = 1
>  '
>
> +test_expect_success 'output from clone with core.abbrev does not crash' '
> +	rm -fr dst &&
> +	echo "Cloning into ${SQ}dst${SQ}..." >expect &&
> +	git -c core.abbrev=12 clone -n "file://$(pwd)/src" dst >actual 2>&1 &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'clone does not keep pack' '
>
>  	rm -fr dst &&
> --
> 2.45.2.457.g8d94cfb545.dirty

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo
  2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-06-12  8:03   ` [PATCH v2 3/3] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
@ 2024-06-12  9:29   ` Karthik Nayak
  3 siblings, 0 replies; 14+ messages in thread
From: Karthik Nayak @ 2024-06-12  9:29 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Kyle Lippincott, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this is the second version of my patch series that fixes a segfault when
> parsing "core.abbrev" without a repository.
>
> Changes compared to v1:
>
>   - Stop truncating the abbreviation length to the upper boundary
>     completely. It's unnecessary, complicates the code and makes us
>     dependent on `the_repository`.
>
>   - Adapt `parse_opt_abbrev_cb()` to also stop truncating such that the
>     behaviour of "core.abbrev" and `--abbrev` is the same.
>
>   - Extend test coverage a bit.
>
> Thanks!
>
> Patrick

The changes look good, I wasn't expecting a benchmark in the last commit
for such a small change, but that was nice to see. Thanks.

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-06-12  9:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-10 22:27 SEGV (detected by Address Sanitizer) when using `core.abbrev` option Kyle Lippincott
2024-06-10 23:01 ` Kyle Lippincott
2024-06-10 23:21   ` Junio C Hamano
2024-06-11  6:06 ` Patrick Steinhardt
2024-06-11  8:42 ` [PATCH 1/2] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
2024-06-11 17:54   ` Junio C Hamano
2024-06-12  5:55     ` Patrick Steinhardt
2024-06-11  8:42 ` [PATCH 2/2] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
2024-06-12  8:03 ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Patrick Steinhardt
2024-06-12  8:03   ` [PATCH v2 1/3] " Patrick Steinhardt
2024-06-12  9:22     ` Karthik Nayak
2024-06-12  8:03   ` [PATCH v2 2/3] parse-options-cb: stop clamping "--abbrev=" to hash length Patrick Steinhardt
2024-06-12  8:03   ` [PATCH v2 3/3] object-name: don't try to abbreviate to lengths greater than hexsz Patrick Steinhardt
2024-06-12  9:29   ` [PATCH v2 0/3] config: fix segfault when parsing "core.abbrev" without repo Karthik Nayak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).