git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Carlos Andrés Ramírez Cataño" <antaigroupltda@gmail.com>
Subject: Re: [PATCH] object-name: reject too-deep recursive ancestor queries
Date: Thu, 23 Nov 2023 14:53:47 +0100	[thread overview]
Message-ID: <ZV9Za7iCL6WiE-Py@tanuki> (raw)
In-Reply-To: <57c0b30ddfe7c0ae78069682ff8454791e54469f.1700496801.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 9200 bytes --]

On Mon, Nov 20, 2023 at 11:13:45AM -0500, Taylor Blau wrote:
> When trying to resolve a revision query like "HEAD~~~~~", our call
> pattern looks something like:
> 
>   - object-name.c::get_oid_with_context()
>   - object-name.c::get_oid_1()
>   - object-name.c::get_nth_ancestor()
>   - object-name.c::get_oid_1()
>   - ...
> 
> With `get_nth_ancestor()` and `get_oid_1()` mutually recurring, popping
> one '~' off of the revision query for each round of the recursion.

One thing I noticed just now is that we have exactly the same problem
with `^`, just with a different callstack. This problem isn't yet
addressed by your patch.

I have to wonder whether we should tighten restrictions even further:
instead of manually keeping track of how deep in the stack we are, we
limit the length of revisions to at most 1MB. I would claim that this
limit is sufficiently large to never be a problem in practice. Revisions
are limited to 4kB on most platforms anyway due to the maximum path
length. And if somebody really wants to request the (1024 * 1024) + 1th
parent, they shouldn't do that by appending this many "~" or "^" chars,
but instead they should ask for "~1048577" or "^1048577".

I realize that this is much more restrictive than the current patch. But
it would be a good defensive mechanism against all kinds of weird revs,
and I am very certain that there are other ways to blow the stack or
cause out-of-bounds reads or writes here.

Patrick

> Since this recursive behavior is unbounded, having too many "~"'s
> contained in a revision query will cause us to blow the stack.
> Generating a message like this when compiled under SANITIZE=address:
> 
>     $ valgrind git rev-parse "HEAD$(perl -e "print \"~\" x 1000000000000")"
>     ==597453== Memcheck, a memory error detector
>     ==597453== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
>     ==597453== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
>     ==597453== Command: /home/ttaylorr/local/bin/git.compile diff HEAD~~~~~~~~~~~~[...]
>     ==597453==
>     AddressSanitizer:DEADLYSIGNAL
>     =================================================================
>     ==597453==ERROR: AddressSanitizer: stack-overflow on address 0x7fffdd838ff8 (pc 0x7f2726082748 bp 0x7fffdd839110 sp 0x7fffdd839000 T0)
>         #0 0x7f2726082748 in __asan::GetTLSFakeStack() ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:176
>         #1 0x7f2726082748 in GetFakeStackFast ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:193
>         #2 0x7f27260833de in OnMalloc ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:207
>         #3 0x7f27260833de in __asan_stack_malloc_1 ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:256
>         #4 0x563f9077d9d8 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1087
>         #5 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
>         #6 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092
>         #7 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
>         #8 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092
>         [...]
>         #247 0x563f9077e957 in get_oid_1 /home/ttaylorr/src/git/object-name.c:1295
>         #248 0x563f9077da64 in get_nth_ancestor /home/ttaylorr/src/git/object-name.c:1092
> 
>     SUMMARY: AddressSanitizer: stack-overflow ../../../../src/libsanitizer/asan/asan_fake_stack.cpp:176 in __asan::GetTLSFakeStack()
>     ==597453==ABORTING
> 
> (Note that the actual stack is much deeper. GDB reports that the bottom
> of the stack looks something like the following):
> 
>     #54866 0x0000555555c6d3bf in get_oid_with_context_1 (repo=0x5555563849a0 <the_repo>, name=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., flags=128, prefix=0x0, oid=0x7ffff5713d40, oc=0x7ffff5713d90) at object-name.c:1947
>     #54867 0x0000555555c6e2fa in get_oid_with_context (repo=0x5555563849a0 <the_repo>, str=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., flags=128, oid=0x7ffff5713d40, oc=0x7ffff5713d90) at object-name.c:2096
>     #54868 0x0000555555d8eed8 in handle_revision_arg_1 (arg_=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., revs=0x7ffff5b000d0, flags=0, revarg_opt=0) at revision.c:2174
>     #54869 0x0000555555d8f1a9 in handle_revision_arg (arg=0x7fffffff4be5 "HEAD", '~' <repeats 196 times>..., revs=0x7ffff5b000d0, flags=0, revarg_opt=0) at revision.c:2189
>     #54870 0x0000555555d97ca9 in setup_revisions (argc=2, argv=0x7fffffff4970, revs=0x7ffff5b000d0, opt=0x0) at revision.c:2932
>     #54871 0x00005555557d6a63 in cmd_diff (argc=2, argv=0x7fffffff4970, prefix=0x0) at builtin/diff.c:502
>     #54872 0x00005555557367bf in run_builtin (p=0x5555561c4c30 <commands+816>, argc=2, argv=0x7fffffff4970) at git.c:469
>     #54873 0x000055555573716b in handle_builtin (argc=2, argv=0x7fffffff4970) at git.c:723
>     #54874 0x000055555573785a in run_argv (argcp=0x7ffff56028b0, argv=0x7ffff56028e0) at git.c:787
>     #54875 0x0000555555738626 in cmd_main (argc=2, argv=0x7fffffff4970) at git.c:922
>     #54876 0x00005555559d3fdd in main (argc=3, argv=0x7fffffff4968) at common-main.c:62
> 
> Fortunately, we can impose a limit on the maximum recursion depth we're
> willing to accept when resolving queries like the above without
> significantly impeding users. This patch sets the limit at 4096, though
> we could probably increase that limit depending on the size of each
> frame.
> 
> The limit introduced here is large enough that any reasonable query
> should still run to completion, but small enough that if the frame size
> were to significantly increase, our protection would still be effective.
> 
> The change here is straightforward: each call to get_nth_ancestor()
> increases a counter, and then decrements that counter before returning.
> 
> The diff is a little noisy since there are a handful of return paths
> from `get_nth_ancestor()`, all of which need to decrement the depth
> variable.
> 
> Since this is a local-only exploit, a user would have to be tricked into
> running such a query by an adversary. Even if they were successfully
> tricked into running the malicious query, the blast radius is limited to
> a local stack overflow, which does not have meaningful paths to remote
> code execution, arbitrary memory reads, or any more grave security
> concerns.
> 
> Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  object-name.c                  | 26 ++++++++++++++++++++------
>  t/t1506-rev-parse-diagnosis.sh |  5 +++++
>  2 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/object-name.c b/object-name.c
> index 0bfa29dbbf..675e0a759e 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1080,6 +1080,9 @@ static enum get_oid_result get_parent(struct repository *r,
>  	return MISSING_OBJECT;
>  }
>  
> +static int get_nth_ancestor_max_depth = 4096;
> +static int get_nth_ancestor_curr_depth;
> +
>  static enum get_oid_result get_nth_ancestor(struct repository *r,
>  					    const char *name, int len,
>  					    struct object_id *result,
> @@ -1089,20 +1092,31 @@ static enum get_oid_result get_nth_ancestor(struct repository *r,
>  	struct commit *commit;
>  	int ret;
>  
> +	if (++get_nth_ancestor_curr_depth > get_nth_ancestor_max_depth)
> +		 return error(_("exceeded maximum ancestor depth"));
> +
>  	ret = get_oid_1(r, name, len, &oid, GET_OID_COMMITTISH);
>  	if (ret)
> -		return ret;
> +		goto done;
>  	commit = lookup_commit_reference(r, &oid);
> -	if (!commit)
> -		return MISSING_OBJECT;
> +	if (!commit) {
> +		ret = MISSING_OBJECT;
> +		goto done;
> +	}
>  
>  	while (generation--) {
> -		if (repo_parse_commit(r, commit) || !commit->parents)
> -			return MISSING_OBJECT;
> +		if (repo_parse_commit(r, commit) || !commit->parents) {
> +			ret = MISSING_OBJECT;
> +			goto done;
> +		}
>  		commit = commit->parents->item;
>  	}
>  	oidcpy(result, &commit->object.oid);
> -	return FOUND;
> +
> +	ret = FOUND;
> +done:
> +	get_nth_ancestor_curr_depth--;
> +	return ret;
>  }
>  
>  struct object *repo_peel_to_type(struct repository *r, const char *name, int namelen,
> diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
> index ef40511d89..b3b9f6c8c5 100755
> --- a/t/t1506-rev-parse-diagnosis.sh
> +++ b/t/t1506-rev-parse-diagnosis.sh
> @@ -244,6 +244,11 @@ test_expect_success 'reject Nth ancestor if N is too high' '
>  	test_must_fail git rev-parse HEAD~100000000000000000000000000000000
>  '
>  
> +test_expect_success 'reject too-deep recursive ancestor queries' '
> +	test_must_fail git rev-parse "HEAD$(perl -e "print \"~\" x 4097")" 2>err &&
> +	grep "error: exceeded maximum ancestor depth" err
> +'
> +
>  test_expect_success 'pathspecs with wildcards are not ambiguous' '
>  	echo "*.c" >expect &&
>  	git rev-parse "*.c" >actual &&
> 
> base-commit: cfb8a6e9a93adbe81efca66e6110c9b4d2e57169
> -- 
> 2.43.0.rc2.19.geadd45bf00

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2023-11-23 13:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-20 16:13 [PATCH] object-name: reject too-deep recursive ancestor queries Taylor Blau
2023-11-20 17:14 ` Junio C Hamano
2023-11-23 13:53 ` Patrick Steinhardt [this message]
2023-11-24  9:44   ` Junio C Hamano
2023-11-24 10:11     ` Patrick Steinhardt
2023-12-06 19:40       ` Jeff King
2023-12-07  6:52         ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZV9Za7iCL6WiE-Py@tanuki \
    --to=ps@pks.im \
    --cc=antaigroupltda@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).