Git development
 help / color / mirror / Atom feed
* [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
@ 2026-04-17 16:26 Elijah Newren via GitGitGadget
  2026-04-17 19:21 ` Junio C Hamano
  2026-04-17 22:45 ` [PATCH v2] " Elijah Newren via GitGitGadget
  0 siblings, 2 replies; 9+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-17 16:26 UTC (permalink / raw)
  To: git; +Cc: LorenzoPegorari, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
UTF-8 chars, 2024-10-27) introduced a loop in show_stats() that calls
utf8_width() repeatedly to skip leading characters until the displayed
width fits.  However, utf8_width() can return problematic values:

  - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
    pointer to NULL and utf8_width() returns 0.  Since name_len does
    not change, the loop iterates once more and pick_one_utf8_char()
    dereferences the NULL pointer, crashing.

  - For control characters, utf8_width() returns -1, so name_len
    grows when it is expected to shrink.  This can cause the loop to
    consume more characters than the string contains, reading past
    the trailing NUL.

By default, fill_print_name() will C-quotes filenames which escapes
control characters and invalid bytes to printable text.  That avoids
this bug from being triggered; however, with core.quotePath=false,
raw bytes can reach this code.

Add tests exercising both failure modes with core.quotePath=false and
a narrow --stat-name-width to force truncation: one with a bare 0xC0
byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
0x01 byte (control character, causes the loop to read past the end
of the string).

Fix the bug by:
  - Adding a *name check to terminate the loop at end-of-string
  - Detecting the NULL pointer from invalid UTF-8 and falling back to
    showing the full untruncated name
  - Breaking on negative width (control characters)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8
    truncation
    
    Maintainer note: This is a new bug from the v2.54 cycle

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2093%2Fnewren%2Ffix%2Fdiffstat-utf8-loop-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2093/newren/fix/diffstat-utf8-loop-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2093

 diff.c                 | 13 +++++++++++--
 t/t4052-stat-output.sh | 25 +++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 397e38b41c..7b27241733 100644
--- a/diff.c
+++ b/diff.c
@@ -3093,8 +3093,17 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			if (len < 0)
 				len = 0;
 
-			while (name_len > len)
-				name_len -= utf8_width((const char**)&name, NULL);
+			while (name_len > len && *name) {
+				int w = utf8_width((const char **)&name, NULL);
+				if (!name) { /* Invalid UTF-8 */
+					name = file->print_name;
+					name_len = utf8_strwidth(name);
+					break;
+				}
+				if (w < 0)  /* control character */
+					break;
+				name_len -= w;
+			}
 
 			slash = strchr(name, '/');
 			if (slash)
diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
index 7c749062e2..84c53c1a51 100755
--- a/t/t4052-stat-output.sh
+++ b/t/t4052-stat-output.sh
@@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co
 	test_grep "<RED>|<RESET>  ${FILENAME_TRIMMED} | 0" out
 '
 
+test_expect_success 'diffstat truncation with invalid UTF-8 does not crash' '
+	empty_blob=$(git hash-object -w --stdin </dev/null) &&
+	printf "100644 blob $empty_blob\taaa-\300-aaa\n" |
+	git mktree >tree_file &&
+	tree=$(cat tree_file) &&
+	empty_tree=$(git mktree </dev/null) &&
+	c1=$(git commit-tree -m before $empty_tree) &&
+	c2=$(git commit-tree -m after -p $c1 $tree) &&
+	git -c core.quotepath=false diff --stat --stat-name-width=5 $c1..$c2 >output &&
+	test_grep "| 0" output
+'
+
+test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' '
+	FNAME=$(printf "aaa-\x01-aaa") &&
+	git commit --allow-empty -m setup &&
+	>$FNAME &&
+	git add -- $FNAME &&
+	git commit -m "add file with control char name" &&
+	git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output &&
+	test_grep "| 0" output &&
+	rm -- $FNAME &&
+	git rm -- $FNAME &&
+	git commit -m "remove test file"
+'
+
 test_done

base-commit: 9f223ef1c026d91c7ac68cc0211bde255dda6199
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-04-20 16:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-17 16:26 [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation Elijah Newren via GitGitGadget
2026-04-17 19:21 ` Junio C Hamano
2026-04-17 22:00   ` Elijah Newren
2026-04-17 22:21     ` Junio C Hamano
2026-04-17 22:45 ` [PATCH v2] " Elijah Newren via GitGitGadget
2026-04-19 23:52   ` Lorenzo Pegorari
2026-04-20 14:51     ` Elijah Newren
2026-04-20 15:42   ` [PATCH v3] " Elijah Newren via GitGitGadget
2026-04-20 16:41     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox