UBSan failing on expensive test(s)

Git development
 help / color / mirror / Atom feed

* UBSan failing on expensive test(s)
@ 2026-05-15 23:43 Junio C Hamano
  2026-05-16  2:13 ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2026-05-15 23:43 UTC (permalink / raw)
  To: git

This started happening on 'next' that runs EXPENSIVE tests thanks to
Dscho's recent updates to enable them in CI.

https://github.com/git/git/actions/runs/25896439353/job/76110441841#step:10:2172

It claims that """

    commit.c:1574:6: runtime error: signed integer overflow:
    -2147483648 - 1 cannot be represented in type 'int'

""".

Another is related in the sense that it used to be hidden behind
EXPENSIVE prerequisite, but is probably unrelated.

https://github.com/git/git/actions/runs/25896439353/job/76110441842#step:10:156


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: UBSan failing on expensive test(s)
  2026-05-15 23:43 UBSan failing on expensive test(s) Junio C Hamano
@ 2026-05-16  2:13 ` Jeff King
  2026-05-16  2:16   ` [PATCH 1/2] apply: plug leak on "patch too large" error Jeff King
  2026-05-16  2:23   ` [PATCH 2/2] commit: handle large commit messages in utf8 verification Jeff King
  0 siblings, 2 replies; 4+ messages in thread
From: Jeff King @ 2026-05-16  2:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, May 16, 2026 at 08:43:07AM +0900, Junio C Hamano wrote:

> This started happening on 'next' that runs EXPENSIVE tests thanks to
> Dscho's recent updates to enable them in CI.
> 
> https://github.com/git/git/actions/runs/25896439353/job/76110441841#step:10:2172
> 
> It claims that """
> 
>     commit.c:1574:6: runtime error: signed integer overflow:
>     -2147483648 - 1 cannot be represented in type 'int'
> 
> """.
> 
> Another is related in the sense that it used to be hidden behind
> EXPENSIVE prerequisite, but is probably unrelated.
> 
> https://github.com/git/git/actions/runs/25896439353/job/76110441842#step:10:156

These patches should fix both.

  [1/2]: apply: plug leak on "patch too large" error
  [2/2]: commit: handle large commit messages in utf8 verification

 apply.c  |  6 ++++--
 commit.c | 31 +++++++++++++++----------------
 2 files changed, 19 insertions(+), 18 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] apply: plug leak on "patch too large" error
  2026-05-16  2:13 ` Jeff King
@ 2026-05-16  2:16   ` Jeff King
  2026-05-16  2:23   ` [PATCH 2/2] commit: handle large commit messages in utf8 verification Jeff King
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff King @ 2026-05-16  2:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

In apply_patch(), we return immediately if read_patch_file() returns an
error. Traditionally this was OK, since an error from strbuf_read()
would restore the strbuf to its unallocated state.

But since f1c0e3946e (apply: reject patches larger than ~1 GiB,
2022-10-25), we may also return an error if we successfully read the
patch but it is too large. In this case we leak the strbuf contents when
apply_patch() returns.

You can see it in action by running t4141 under LSan with the EXPENSIVE
prereq enabled.

We can fix this in one of two places:

  1. In read_patch_file(), we could release the buffer before returning
     the error, behaving more like a raw strbuf_read() call.

  2. In apply_patch(), we can release the strbuf ourselves before
     returning.

I picked the latter, since it future proofs us against read_patch_file()
getting new error modes. We also have a cleanup label in that function
already, so now our error handling at this spot matches the rest of the
function (and all of the variables are initialized such that the rest of
the cleanup is correctly a noop at this point).

Signed-off-by: Jeff King <peff@peff.net>
---
 apply.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/apply.c b/apply.c
index 4aa1694cfa..249248d4f2 100644
--- a/apply.c
+++ b/apply.c
@@ -4881,8 +4881,10 @@ static int apply_patch(struct apply_state *state,

 	state->patch_input_file = filename;
 	state->linenr = 1;
-	if (read_patch_file(&buf, fd) < 0)
-		return -128;
+	if (read_patch_file(&buf, fd) < 0) {
+		res = -128;
+		goto end;
+	}
 	offset = 0;
 	while (offset < buf.len) {
 		struct patch *patch;
-- 
2.54.0.490.gaeb18d0c26

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] commit: handle large commit messages in utf8 verification
  2026-05-16  2:13 ` Jeff King
  2026-05-16  2:16   ` [PATCH 1/2] apply: plug leak on "patch too large" error Jeff King
@ 2026-05-16  2:23   ` Jeff King
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff King @ 2026-05-16  2:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Running t4205 under UBSan with the EXPENSIVE prereq enabled triggers an
error when we try to create a commit message that is over 2GB:

  commit.c:1574:6: runtime error: signed integer overflow:
    -2147483648 - 1 cannot be represented in type 'int'

The problem is that find_invalid_utf8() is not prepared to handle
large buffers, as it uses an "int" to represent buffer sizes and
offsets.

We can fix this with a few changes:

  1. We'll take in "len" as a size_t (which is what the caller has
     anyway, since it's working with a strbuf).

  2. We need to return a size_t to give the offset to the invalid utf8,
     but we also need a sentinel value for "no invalid value"
     (previously "-1"). Let's split these to return a bool for "found
     invalid utf8" and then pass back the offset as an out-parameter.
     We'll switch the function name to match the new semantics.

  3. The caller in verify_utf8() uses a "long" to store buffer
     positions, which is a bit funny. This goes back to 08a94a145c
     (commit/commit-tree: correct latin1 to utf-8, 2012-06-28) and is
     perhaps trying to match our use of "unsigned long" for object sizes
     (though we don't care about it ever becoming negative here). This
     should be a size_t, too, as some platforms (like Windows) still use
     a 32-bit long on machines with 64-bit pointers.

  4. The "bytes" field within find_invalid_utf() does not have range
     problems. It is the number of bytes the utf8 sequence claims to
     have, so is limited by how many bits can be set in a single 8-bit
     byte. However, if we leave it as an "int" then the compiler will
     complain about the sign mismatch when comparing it to "len". So
     let's make it unsigned, too.

All of this is a little silly, of course, because 2GB text commit
messages are clearly nonsense. So we might consider rejecting them
outright, but it is easy enough to make these helper functions more
robust in the meantime.

Signed-off-by: Jeff King <peff@peff.net>
---
I tried to look carefully for any reasons why these variables could ever
be negative, but beyond the sentinel value in the return type, didn't
see one. But reviewers should double check. Note that "bad_offset =
offset-1" in the context looks suspicious, but we are guaranteed that
offset has been advanced at this point.

 commit.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/commit.c b/commit.c
index 4385ae4329..8cf09be39d 100644
--- a/commit.c
+++ b/commit.c
@@ -1558,16 +1558,16 @@ int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
 	return result;
 }
 
-static int find_invalid_utf8(const char *buf, int len)
+static bool has_invalid_utf8(const char *buf, size_t len, size_t *bad_offset)
 {
-	int offset = 0;
+	size_t offset = 0;
 	static const unsigned int max_codepoint[] = {
 		0x7f, 0x7ff, 0xffff, 0x10ffff
 	};
 
 	while (len) {
 		unsigned char c = *buf++;
-		int bytes, bad_offset;
+		unsigned bytes;
 		unsigned int codepoint;
 		unsigned int min_val, max_val;
 
@@ -1578,7 +1578,7 @@ static int find_invalid_utf8(const char *buf, int len)
 		if (c < 0x80)
 			continue;
 
-		bad_offset = offset-1;
+		*bad_offset = offset-1;
 
 		/*
 		 * Count how many more high bits set: that's how
@@ -1595,11 +1595,11 @@ static int find_invalid_utf8(const char *buf, int len)
 		 * codepoints beyond U+10FFFF, which are guaranteed never to exist.
 		 */
 		if (bytes < 1 || 3 < bytes)
-			return bad_offset;
+			return true;
 
 		/* Do we *have* that many bytes? */
 		if (len < bytes)
-			return bad_offset;
+			return true;
 
 		/*
 		 * Place the encoded bits at the bottom of the value and compute the
@@ -1617,23 +1617,23 @@ static int find_invalid_utf8(const char *buf, int len)
 			codepoint <<= 6;
 			codepoint |= *buf & 0x3f;
 			if ((*buf++ & 0xc0) != 0x80)
-				return bad_offset;
+				return true;
 		} while (--bytes);
 
 		/* Reject codepoints that are out of range for the sequence length. */
 		if (codepoint < min_val || codepoint > max_val)
-			return bad_offset;
+			return true;
 		/* Surrogates are only for UTF-16 and cannot be encoded in UTF-8. */
 		if ((codepoint & 0x1ff800) == 0xd800)
-			return bad_offset;
+			return true;
 		/* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
 		if ((codepoint & 0xfffe) == 0xfffe)
-			return bad_offset;
+			return true;
 		/* So are anything in the range U+FDD0..U+FDEF. */
 		if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
-			return bad_offset;
+			return true;
 	}
-	return -1;
+	return false;
 }
 
 /*
@@ -1645,15 +1645,14 @@ static int find_invalid_utf8(const char *buf, int len)
 static int verify_utf8(struct strbuf *buf)
 {
 	int ok = 1;
-	long pos = 0;
+	size_t pos = 0;
 
 	for (;;) {
-		int bad;
+		size_t bad;
 		unsigned char c;
 		unsigned char replace[2];
 
-		bad = find_invalid_utf8(buf->buf + pos, buf->len - pos);
-		if (bad < 0)
+		if (!has_invalid_utf8(buf->buf + pos, buf->len - pos, &bad))
 			return ok;
 		pos += bad;
 		ok = 0;
-- 
2.54.0.490.gaeb18d0c26

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-16  2:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 23:43 UBSan failing on expensive test(s) Junio C Hamano
2026-05-16  2:13 ` Jeff King
2026-05-16  2:16   ` [PATCH 1/2] apply: plug leak on "patch too large" error Jeff King
2026-05-16  2:23   ` [PATCH 2/2] commit: handle large commit messages in utf8 verification Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox