git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Erik Faye-Lund <kusmabite@gmail.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int"
Date: Thu, 9 Dec 2021 22:39:41 -0500	[thread overview]
Message-ID: <YbLL/YWbjc/sPRyH@coredump.intra.peff.net> (raw)
In-Reply-To: <RFC-patch-02.10-bd7d014c531-20211209T191653Z-avarab@gmail.com>

On Thu, Dec 09, 2021 at 08:19:19PM +0100, Ævar Arnfjörð Bjarmason wrote:

> As documented in 320d0b493a2 (add helpers for detecting size_t
> overflow, 2016-02-19) the arguments to st_mult() and st_add() "must be
> unsigned".

This isn't correct. The comment that says "must be unsigned" is for
unsigned_mult_overflows(). Which indeed would not work correctly for a
signed type. But st_add() is a separate function (and not a macro)
because that means its arguments will always be promoted or converted
into a size_t. And so no matter what you pass it, it will always itself
pass a size_t into unsigned_mult_overflows().

> In subsequent commits further overflows resulting in segfaults will be
> fixed in this code, but let's start by removing this supposed guard
> that does nothing except give us a false sense of
> security. E.g. providing an "n" of INT_MAX here will result in "1" on
> my system, causing us to write into memory.

This guard doesn't do nothing. Your patch here:

> @@ -312,7 +312,7 @@ static void get_correspondences(struct string_list *a, struct string_list *b,
>  	int *cost, c, *a2b, *b2a;
>  	int i, j;
>  
> -	ALLOC_ARRAY(cost, st_mult(n, n));
> +	ALLOC_ARRAY(cost, n * n);
>  	ALLOC_ARRAY(a2b, n);
>  	ALLOC_ARRAY(b2a, n);

makes things strictly worse, because that "n * n" may overflow and cause
us to under-allocate the array. You can see it in isolation with some
extra code like this:

diff --git a/git.c b/git.c
index 5ff21be21f..63349e4b79 100644
--- a/git.c
+++ b/git.c
@@ -850,11 +850,23 @@ static int run_argv(int *argcp, const char ***argv)
 	return done_alias;
 }
 
+static void foo(void)
+{
+	int n = 2 << 16;
+
+	printf("n = %d\n", n);
+	printf("raw mult = %"PRIuMAX"\n", (uintmax_t)(n * n));
+	printf("st_mult = %"PRIuMAX"\n", (uintmax_t)st_mult(n, n));
+	exit(0);
+}
+
 int cmd_main(int argc, const char **argv)
 {
 	const char *cmd;
 	int done_help = 0;
 
+	foo();
+
 	cmd = argv[0];
 	if (!cmd)
 		cmd = "git-help";

With st_mult, we get the correct answer of 16GB. Without it, we end up
with 0!

Back to the original code, if you generate a test setup like this:

diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e30bc48a29..f552d3086e 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -772,4 +772,11 @@ test_expect_success '--left-only/--right-only' '
 	test_cmp expect actual
 '
 
+test_expect_success 'giant case' '
+	test_commit base &&
+	test_commit_bulk --ref=refs/heads/v1 --message="v1 commit %s" 32768 &&
+	test_commit_bulk --ref=refs/heads/v2 --message="v2 commit %s" 32768 &&
+	git range-diff base v1 v2
+'
+
 test_done

Then we'd allocate a 0-length array for "cost" and segfault as soon as
we try to put anything in it. You can confirm this by applying your
patch, running under gdb, and stopping at the xmalloc() call inside
get_correspondences(). With st_mult(), then we come up with the correct
value of 16GB (or complain about overflow on a 32-bit system).

So st_add() is working as advertised here; it's goal is just to make
sure we never under-allocate. You are right, though, that the code after
that in get_correspondences() has trouble because of the signedness. If
the code used an "unsigned int", it would still be _wrong_. But when it
overflowed, it would always come up with a value under 4GB. So it might
produce a wrong answer, but it couldn't possibly point outside the
bounds of the array and cause a security problem.

But because it's a signed int, we overflow to a negative value and try
to look at memory far before the start of the array. So I can see how it
is tempting to argue that this st_mult() isn't really helping since we
segfault anyway. But I'd disagree. By correctly computing the value of
16GB instead of "0", we know that the ALLOC_ARRAY() line is doing the
right thing. And it may choose not to continue if it can't allocate
16GB. That may happen because you don't have the RAM, but also because
of GIT_ALLOC_LIMIT.

So if you set GIT_ALLOC_LIMIT=4g, for example, you are immune to the bug
here. We'd refuse the allocation and die there, which protects
downstream code from trying to fill in the array with bogus arithmetic.

Dropping the st_mult() does nothing to fix the actual problem (which is
that this function should use a more appropriate type), but introduces
new failure modes.

-Peff

  reply	other threads:[~2021-12-10  3:39 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 19:19 [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 01/10] string-list API: change "nr" and "alloc" to "size_t" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Ævar Arnfjörð Bjarmason
2021-12-10  3:39   ` Jeff King [this message]
2021-12-10 10:22     ` Ævar Arnfjörð Bjarmason
2021-12-10 11:41       ` Jeff King
2021-12-10 12:31         ` Ævar Arnfjörð Bjarmason
2021-12-10 19:24           ` Phillip Wood
2021-12-14 14:34           ` Jeff King
2021-12-10 14:27         ` Johannes Schindelin
2021-12-10 14:58           ` Ævar Arnfjörð Bjarmason
2021-12-11 14:01             ` Johannes Schindelin
2021-12-12 17:44               ` Ævar Arnfjörð Bjarmason
2021-12-14 14:42           ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 03/10] range-diff.c: use "size_t" to refer to "struct string_list"'s "nr" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 04/10] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 05/10] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 06/10] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 07/10] linear-assignment.c: convert a macro to a "static inline" function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 08/10] linear-assignment.c: detect signed add/mul on GCC and Clang Ævar Arnfjörð Bjarmason
2021-12-10  3:56   ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 09/10] linear-assignment.c: add and use intprops.h from Gnulib Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 10/10] linear-assignment.c: use "intmax_t" instead of "int" Ævar Arnfjörð Bjarmason
2021-12-10  4:00   ` Jeff King
2021-12-10 12:30 ` [RFC PATCH v2 0/5] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-10 12:30   ` [RFC PATCH v2 1/5] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-14 13:36     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 2/5] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-14 13:39     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 3/5] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-14 13:40     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 4/5] range-diff.c: rename "n" to "column_count" in get_correspondences() Ævar Arnfjörð Bjarmason
2021-12-14 13:42     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j] Ævar Arnfjörð Bjarmason
2021-12-14 14:04     ` Jeff King
2021-12-10 14:31 ` [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Johannes Schindelin
2021-12-10 15:07   ` Ævar Arnfjörð Bjarmason
2021-12-21 23:22   ` Philip Oakley
2021-12-21 23:36     ` Ævar Arnfjörð Bjarmason
2021-12-22 20:50       ` Johannes Schindelin
2021-12-22 21:11         ` Jeff King
2021-12-24 11:15       ` Philip Oakley
2021-12-24 16:46         ` Ævar Arnfjörð Bjarmason
2021-12-24 18:31           ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbLL/YWbjc/sPRyH@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=kusmabite@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).