From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>,
Elijah Newren <newren@gmail.com>,
Elijah Newren <newren@gmail.com>
Subject: [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Date: Fri, 17 Apr 2026 16:26:03 +0000 [thread overview]
Message-ID: <pull.2093.git.1776443163041.gitgitgadget@gmail.com> (raw)
From: Elijah Newren <newren@gmail.com>
f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
UTF-8 chars, 2024-10-27) introduced a loop in show_stats() that calls
utf8_width() repeatedly to skip leading characters until the displayed
width fits. However, utf8_width() can return problematic values:
- For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
pointer to NULL and utf8_width() returns 0. Since name_len does
not change, the loop iterates once more and pick_one_utf8_char()
dereferences the NULL pointer, crashing.
- For control characters, utf8_width() returns -1, so name_len
grows when it is expected to shrink. This can cause the loop to
consume more characters than the string contains, reading past
the trailing NUL.
By default, fill_print_name() will C-quotes filenames which escapes
control characters and invalid bytes to printable text. That avoids
this bug from being triggered; however, with core.quotePath=false,
raw bytes can reach this code.
Add tests exercising both failure modes with core.quotePath=false and
a narrow --stat-name-width to force truncation: one with a bare 0xC0
byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
0x01 byte (control character, causes the loop to read past the end
of the string).
Fix the bug by:
- Adding a *name check to terminate the loop at end-of-string
- Detecting the NULL pointer from invalid UTF-8 and falling back to
showing the full untruncated name
- Breaking on negative width (control characters)
Signed-off-by: Elijah Newren <newren@gmail.com>
---
diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8
truncation
Maintainer note: This is a new bug from the v2.54 cycle
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2093%2Fnewren%2Ffix%2Fdiffstat-utf8-loop-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2093/newren/fix/diffstat-utf8-loop-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2093
diff.c | 13 +++++++++++--
t/t4052-stat-output.sh | 25 +++++++++++++++++++++++++
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/diff.c b/diff.c
index 397e38b41c..7b27241733 100644
--- a/diff.c
+++ b/diff.c
@@ -3093,8 +3093,17 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
if (len < 0)
len = 0;
- while (name_len > len)
- name_len -= utf8_width((const char**)&name, NULL);
+ while (name_len > len && *name) {
+ int w = utf8_width((const char **)&name, NULL);
+ if (!name) { /* Invalid UTF-8 */
+ name = file->print_name;
+ name_len = utf8_strwidth(name);
+ break;
+ }
+ if (w < 0) /* control character */
+ break;
+ name_len -= w;
+ }
slash = strchr(name, '/');
if (slash)
diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
index 7c749062e2..84c53c1a51 100755
--- a/t/t4052-stat-output.sh
+++ b/t/t4052-stat-output.sh
@@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co
test_grep "<RED>|<RESET> ${FILENAME_TRIMMED} | 0" out
'
+test_expect_success 'diffstat truncation with invalid UTF-8 does not crash' '
+ empty_blob=$(git hash-object -w --stdin </dev/null) &&
+ printf "100644 blob $empty_blob\taaa-\300-aaa\n" |
+ git mktree >tree_file &&
+ tree=$(cat tree_file) &&
+ empty_tree=$(git mktree </dev/null) &&
+ c1=$(git commit-tree -m before $empty_tree) &&
+ c2=$(git commit-tree -m after -p $c1 $tree) &&
+ git -c core.quotepath=false diff --stat --stat-name-width=5 $c1..$c2 >output &&
+ test_grep "| 0" output
+'
+
+test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' '
+ FNAME=$(printf "aaa-\x01-aaa") &&
+ git commit --allow-empty -m setup &&
+ >$FNAME &&
+ git add -- $FNAME &&
+ git commit -m "add file with control char name" &&
+ git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output &&
+ test_grep "| 0" output &&
+ rm -- $FNAME &&
+ git rm -- $FNAME &&
+ git commit -m "remove test file"
+'
+
test_done
base-commit: 9f223ef1c026d91c7ac68cc0211bde255dda6199
--
gitgitgadget
next reply other threads:[~2026-04-17 16:26 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-17 16:26 Elijah Newren via GitGitGadget [this message]
2026-04-17 19:21 ` [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation Junio C Hamano
2026-04-17 22:00 ` Elijah Newren
2026-04-17 22:21 ` Junio C Hamano
2026-04-17 22:45 ` [PATCH v2] " Elijah Newren via GitGitGadget
2026-04-19 23:52 ` Lorenzo Pegorari
2026-04-20 14:51 ` Elijah Newren
2026-04-20 15:42 ` [PATCH v3] " Elijah Newren via GitGitGadget
2026-04-20 16:41 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2093.git.1776443163041.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=lorenzo.pegorari2002@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox