[PATCH 1/6] reencode_string: use st_add/st_mult helpers

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 1/6] reencode_string: use st_add/st_mult helpers
Date: Tue, 24 Jul 2018 06:50:10 -0400	[thread overview]
Message-ID: <20180724105009.GA17165@sigill.intra.peff.net> (raw)
In-Reply-To: <20180724104852.GA14638@sigill.intra.peff.net>

When converting a string with iconv, if the output buffer
isn't big enough, we grow it. But our growth is done without
any concern for integer overflow. So when we add:

  outalloc = sofar + insz * 2 + 32;

we may end up wrapping outalloc (which is a size_t), and
allocating a too-small buffer. We then manipulate it
further:

  outsz = outalloc - sofar - 1;

and feed outsz back to iconv. If outalloc is wrapped and
smaller than sofar, we'll end up with a small allocation but
feed a very large outsz to iconv, which could result in it
overflowing the buffer.

Can we use this to construct an attack wherein the victim
clones a repository with a very large commit object with an
encoding header, and running "git log" reencodes it into
utf8, causing an overflow?

An attack of this sort is likely impossible in practice.
"sofar" is how many output bytes we've written total, and
"insz" is the number of input bytes remaining. Imagine our
input doubles in size as we output it (which is easy to do
by converting latin1 to utf8, for example), and that we
start with N input bytes. Our initial output buffer also
starts at N bytes, so after the first call we'd have N/2
input bytes remaining (insz), and have written N bytes
(sofar). That means our next allocation will be
(N + N/2 * 2 + 32) bytes, or (2N + 32).

We can therefore overflow a 32-bit size_t with a commit
message that's just under 2^31 bytes, assuming it consists
mostly of "doubling" sequences (e.g., latin1 0xe1 which
becomes utf8 0xc3 0xa1).

But we'll never make it that far with such a message. We'll
be spending 2^31 bytes on the original string. And our
initial output buffer will also be 2^31 bytes. Which is not
going to succeed on a system with a 32-bit size_t, since
there will be other things using the address space, too. The
initial malloc will fail.

If we imagine instead that we can triple the size when
converting, then our second allocation becomes
(N + 2/3N * 2 + 32), or (7/3N + 32). That still requires two
allocations of 3/7 of our address space (6/7 of the total)
to succeed.

If we imagine we can quadruple, it becomes (5/2N + 32); we
need to be able to allocate 4/5 of the address space to
succeed.

This might start to get plausible. But is it possible to get
a 4-to-1 increase in size? Probably if you're converting to
some obscure encoding. But since git defaults to utf8 for
its output, that's the likely destination encoding for an
attack. And while there are 4-character utf8 sequences, it's
unlikely that you'd be able find a single-byte source
sequence in any encoding.

So this is certainly buggy code which should be fixed, but
it is probably not a useful attack vector.

Signed-off-by: Jeff King <peff@peff.net>
---
 utf8.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utf8.c b/utf8.c
index d55e20c641..a2fd24c70a 100644
--- a/utf8.c
+++ b/utf8.c
@@ -477,7 +477,7 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, int *outs
 	iconv_ibp cp;

 	outsz = insz;
-	outalloc = outsz + 1; /* for terminating NUL */
+	outalloc = st_add(outsz, 1); /* for terminating NUL */
 	out = xmalloc(outalloc);
 	outpos = out;
 	cp = (iconv_ibp)in;
@@ -497,7 +497,7 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, int *outs
 			 * converting the rest.
 			 */
 			sofar = outpos - out;
-			outalloc = sofar + insz * 2 + 32;
+			outalloc = st_add3(sofar, st_mult(insz, 2), 32);
 			out = xrealloc(out, outalloc);
 			outpos = out + sofar;
 			outsz = outalloc - sofar - 1;
-- 
2.18.0.542.g2bf2fc4f7e

next prev parent reply	other threads:[~2018-07-24 10:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24 10:48 [PATCH 0/6] use size_t in iconv/strbuf Jeff King
2018-07-24 10:50 ` Jeff King [this message]
2018-07-24 10:50 ` [PATCH 2/6] reencode_string: use size_t for string lengths Jeff King
2018-07-24 10:51 ` [PATCH 3/6] strbuf: use size_t for length in intermediate variables Jeff King
2018-07-24 10:51 ` [PATCH 4/6] strbuf_readlink: use ssize_t Jeff King
2018-07-24 10:51 ` [PATCH 5/6] pass st.st_size as hint for strbuf_readlink() Jeff King
2018-07-25 18:41   ` Torsten Bögershausen
2018-07-26  6:09     ` Jeff King
2018-07-24 10:52 ` [PATCH 6/6] strbuf_humanise: use unsigned variables Jeff King

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d55e20c64 dfblob:a2fd24c70 )
 OR (
bs:"[PATCH 1/6] reencode_string: use st_add/st_mult helpers" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724105009.GA17165@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).