From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>,
Elijah Newren <newren@gmail.com>,
Elijah Newren <newren@gmail.com>
Subject: [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Date: Fri, 17 Apr 2026 16:26:03 +0000
Message-ID: <pull.2093.git.1776443163041.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
UTF-8 chars, 2024-10-27) introduced a loop in show_stats() that calls
utf8_width() repeatedly to skip leading characters until the displayed
width fits. However, utf8_width() can return problematic values:
- For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
pointer to NULL and utf8_width() returns 0. Since name_len does
not change, the loop iterates once more and pick_one_utf8_char()
dereferences the NULL pointer, crashing.
- For control characters, utf8_width() returns -1, so name_len
grows when it is expected to shrink. This can cause the loop to
consume more characters than the string contains, reading past
the trailing NUL.
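To make the two failure modes concrete, here is a small, self-contained C
sketch. toy_width() is a hypothetical stand-in that models only the
utf8_width() contract described above. It is not git's implementation: it
understands ASCII only, treats any byte >= 0x80 as invalid UTF-8, and
assumes that a NUL byte is consumed with width 0. old_loop() is an
instrumented copy of the pre-fix loop that reports what would go wrong
instead of actually dereferencing NULL or reading past the string.
name_len and len are passed in directly as assumed values.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical toy model of the utf8_width() contract described above;
 * NOT git's implementation. ASCII only:
 *   printable ASCII  -> advance 1 byte, width 1
 *   NUL              -> advance 1 byte, width 0 (assumed)
 *   other controls   -> advance 1 byte, width -1
 *   any byte >= 0x80 -> "invalid UTF-8": set *p to NULL, width 0
 */
int toy_width(const char **p)
{
	unsigned char c = (unsigned char)**p;

	if (c >= 0x80) {
		*p = NULL;
		return 0;
	}
	*p += 1;
	if (c == 0)
		return 0;
	if (c < 0x20 || c == 0x7f)
		return -1;
	return 1;
}

enum outcome { FITS, NULL_DEREF, OVERRUN };

/*
 * The pre-fix loop, "while (name_len > len) name_len -= width(&name);",
 * instrumented to report what would go wrong instead of crashing or
 * reading beyond the string's terminating NUL.
 */
enum outcome old_loop(const char *s, int name_len, int len)
{
	const char *name = s;
	size_t n = strlen(s);

	while (name_len > len) {
		if (!name)
			return NULL_DEREF;	/* next call would dereference NULL */
		if ((size_t)(name - s) > n)
			return OVERRUN;		/* already stepped past the NUL */
		name_len -= toy_width(&name);
	}
	return FITS;
}

void demo_old_loop(void)
{
	/* Plain ASCII: the loop stops once the remaining width fits. */
	assert(old_loop("aaaaaaaaaa", 10, 2) == FITS);
	/* Invalid lead byte 0xC0: name becomes NULL, name_len is unchanged. */
	assert(old_loop("aaa-\300-aaa", 8, 2) == NULL_DEREF);
	/* Control char: the -1 leaves name_len too large to reach 0 by the NUL. */
	assert(old_loop("aaa-\001-aaa", 8, 0) == OVERRUN);
}
```

In the control-character case, the -1 pushes name_len back up, so the
widths of the remaining bytes are not enough to bring it down to len
before the terminating NUL, and the loop keeps consuming past it.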
By default, fill_print_name() C-quotes filenames, which escapes
control characters and invalid bytes as printable text. That
prevents this bug from being triggered; however, with
core.quotePath=false, raw bytes can reach this code.
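Why quoting hides the bug can be sketched in a few lines of C: once every
control byte and every byte >= 0x80 is rewritten as an escape, the name
is plain printable ASCII, so each byte has a positive width and is never
invalid. toy_cquote() and all_printable_ascii() below are hypothetical
helpers, not git's quote_c_style(); git's own quoting may use short
escapes such as \t for some bytes, whereas this toy uses octal for all
of them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical toy C-quoter: wraps the name in double quotes and writes
 * every control byte and every byte >= 0x80 as a three-digit octal escape.
 * out must hold at least 4 * strlen(in) + 3 bytes.
 */
void toy_cquote(const char *in, char *out)
{
	*out++ = '"';
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;

		if (c < 0x20 || c >= 0x7f) {
			sprintf(out, "\\%03o", (unsigned)c);
			out += 4;
		} else {
			if (c == '"' || c == '\\')
				*out++ = '\\';
			*out++ = (char)c;
		}
	}
	*out++ = '"';
	*out = '\0';
}

/* Returns 1 if s contains only printable ASCII bytes. */
int all_printable_ascii(const char *s)
{
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;
		if (c < 0x20 || c >= 0x7f)
			return 0;
	}
	return 1;
}

void demo_cquote(void)
{
	char buf[64];

	toy_cquote("aaa-\001-aaa", buf);
	assert(strcmp(buf, "\"aaa-\\001-aaa\"") == 0);
	assert(all_printable_ascii(buf));

	toy_cquote("aaa-\300-aaa", buf);
	assert(strcmp(buf, "\"aaa-\\300-aaa\"") == 0);
	assert(all_printable_ascii(buf));
}
```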
Add tests exercising both failure modes with core.quotePath=false and
a narrow --stat-name-width to force truncation: one with a bare 0xC0
byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
0x01 byte (control character, causes the loop to read past the end
of the string).
Fix the bug by:
- Adding a *name check to terminate the loop at end-of-string
- Detecting the NULL pointer from invalid UTF-8 and falling back to
showing the full untruncated name
- Breaking on negative width (control characters)
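The three checks can be exercised outside git with the same kind of toy
model. This is a sketch under stated assumptions, not the patch itself:
toy_width() and toy_strwidth() are hypothetical ASCII-only stand-ins for
utf8_width() and utf8_strwidth() (toy_strwidth() sums positive widths and,
as an assumption, falls back to the byte length for invalid input), and
fixed_skip() mirrors the structure of the patched loop.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical ASCII-only toy width model; NOT git's utf8_width(). */
int toy_width(const char **p)
{
	unsigned char c = (unsigned char)**p;

	if (c >= 0x80) {	/* treated as invalid UTF-8 */
		*p = NULL;
		return 0;
	}
	*p += 1;
	if (c == 0)
		return 0;
	if (c < 0x20 || c == 0x7f)
		return -1;	/* control character */
	return 1;
}

/* Toy display width: sum of positive widths; byte length if invalid. */
int toy_strwidth(const char *s)
{
	const char *p = s;
	int width = 0;

	while (p && *p) {
		int w = toy_width(&p);
		if (w > 0)
			width += w;
	}
	return p ? width : (int)strlen(s);
}

/* Mirrors the patched loop; updates *name_len and returns the new start. */
const char *fixed_skip(const char *print_name, int *name_len, int len)
{
	const char *name = print_name;

	while (*name_len > len && *name) {	/* stop at end-of-string */
		int w = toy_width(&name);
		if (!name) {			/* invalid UTF-8: show full name */
			name = print_name;
			*name_len = toy_strwidth(name);
			break;
		}
		if (w < 0)			/* control character: stop */
			break;
		*name_len -= w;
	}
	return name;
}

void demo_fixed_loop(void)
{
	const char *s;
	int name_len;

	s = "aaaaaaaaaa";	/* plain ASCII: keep the last 2 columns */
	name_len = 10;
	assert(fixed_skip(s, &name_len, 2) == s + 8 && name_len == 2);

	s = "aaa-\300-aaa";	/* invalid UTF-8: fall back to the full name */
	name_len = toy_strwidth(s);
	assert(fixed_skip(s, &name_len, 2) == s && name_len == 9);

	s = "aaa-\001-aaa";	/* control char: stop while still in bounds */
	name_len = toy_strwidth(s);
	assert(fixed_skip(s, &name_len, 0) == s + 5 && name_len == 4);
}
```

Note that the loop stops on a control character after utf8_width() has
already stepped over it, so the displayed tail starts just past that byte.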
Signed-off-by: Elijah Newren <newren@gmail.com>
---
diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8
truncation
Maintainer note: This is a new bug from the v2.54 cycle.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2093%2Fnewren%2Ffix%2Fdiffstat-utf8-loop-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2093/newren/fix/diffstat-utf8-loop-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2093
diff.c | 13 +++++++++++--
t/t4052-stat-output.sh | 25 +++++++++++++++++++++++++
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/diff.c b/diff.c
index 397e38b41c..7b27241733 100644
--- a/diff.c
+++ b/diff.c
@@ -3093,8 +3093,17 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
if (len < 0)
len = 0;
- while (name_len > len)
- name_len -= utf8_width((const char**)&name, NULL);
+ while (name_len > len && *name) {
+ int w = utf8_width((const char **)&name, NULL);
+ if (!name) { /* Invalid UTF-8 */
+ name = file->print_name;
+ name_len = utf8_strwidth(name);
+ break;
+ }
+ if (w < 0) /* control character */
+ break;
+ name_len -= w;
+ }
slash = strchr(name, '/');
if (slash)
diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
index 7c749062e2..84c53c1a51 100755
--- a/t/t4052-stat-output.sh
+++ b/t/t4052-stat-output.sh
@@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co
test_grep "<RED>|<RESET> ${FILENAME_TRIMMED} | 0" out
'
+test_expect_success 'diffstat truncation with invalid UTF-8 does not crash' '
+ empty_blob=$(git hash-object -w --stdin </dev/null) &&
+ printf "100644 blob $empty_blob\taaa-\300-aaa\n" |
+ git mktree >tree_file &&
+ tree=$(cat tree_file) &&
+ empty_tree=$(git mktree </dev/null) &&
+ c1=$(git commit-tree -m before $empty_tree) &&
+ c2=$(git commit-tree -m after -p $c1 $tree) &&
+ git -c core.quotepath=false diff --stat --stat-name-width=5 $c1..$c2 >output &&
+ test_grep "| 0" output
+'
+
+test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' '
+ FNAME=$(printf "aaa-\001-aaa") &&
+ git commit --allow-empty -m setup &&
+ >$FNAME &&
+ git add -- $FNAME &&
+ git commit -m "add file with control char name" &&
+ git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output &&
+ test_grep "| 0" output &&
+ rm -- $FNAME &&
+ git rm -- $FNAME &&
+ git commit -m "remove test file"
+'
+
test_done
base-commit: 9f223ef1c026d91c7ac68cc0211bde255dda6199
--
gitgitgadget