From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Date: Mon, 20 Apr 2026 01:52:10 +0200
Message-ID: <aeVqqsdq9B7GE9gS@lorenzo-VM>
In-Reply-To: <pull.2093.v2.git.1776465910538.gitgitgadget@gmail.com>

On Fri, Apr 17, 2026 at 10:45:10PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
> UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls
> utf8_width() repeatedly to skip leading characters until the displayed
> width fits. However, utf8_width() can return problematic values:
>
> - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
>   pointer to NULL and utf8_width() returns 0. Since name_len does
>   not change, the loop iterates once more and pick_one_utf8_char()
>   dereferences the NULL pointer, crashing.
>
> - For control characters, utf8_width() returns -1, so name_len
>   grows when it is expected to shrink. This can cause the loop to
>   consume more characters than the string contains, reading past
>   the trailing NUL.
>
> By default, fill_print_name() C-quotes filenames, which escapes
> control characters and invalid bytes to printable text. That keeps
> this bug from being triggered; however, with core.quotePath=false,
> raw bytes can reach this code.
>
> Add tests exercising both failure modes with core.quotePath=false and
> a narrow --stat-name-width to force truncation: one with a bare 0xC0
> byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
> 0x01 byte (control character, causes the loop to read past the end
> of the string).
>
> Fix both issues by introducing utf8_ish_width(), a thin wrapper
> around utf8_width() that guarantees the pointer always advances and
> the returned width is never negative:
>
> - On invalid UTF-8 it restores the pointer, advances by one byte,
>   and returns width 1 (matching the strlen()-based fallback used
>   by utf8_strwidth()).
> - On a control character it returns 0 (matching utf8_strnwidth()
>   which skips them).
>
> Also add a "&& *name" guard to the while-loop condition so it
> terminates at end-of-string even when utf8_strwidth()'s strlen()
> fallback causes name_len to exceed the sum of per-character widths.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>

Hi, thanks for CCing me and thanks for improving on my previous work.
All of these changes make a lot of sense, and indeed they fix issues
that I didn't consider in f85b49f3d4a (diff: improve scaling of
filenames in diffstat to handle UTF-8 chars, 2026-01-16).
[...]
> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
> index 7c749062e2..84c53c1a51 100755
> --- a/t/t4052-stat-output.sh
> +++ b/t/t4052-stat-output.sh
> @@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co
[...]
>
> +test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' '
> +	FNAME=$(printf "aaa-\x01-aaa") &&
> +	git commit --allow-empty -m setup &&
> +	>$FNAME &&
> +	git add -- $FNAME &&
> +	git commit -m "add file with control char name" &&
> +	git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output &&
> +	test_grep "| 0" output &&
> +	rm -- $FNAME &&
> +	git rm -- $FNAME &&
> +	git commit -m "remove test file"
> +'
> +
> test_done

The only thing that I don't quite understand is this second test.
From my tests, the previous code using:
```
[...]
	while (name_len > len)
		name_len -= utf8_width((const char**)&name, NULL);
[...]
```
passes this second test just fine, while I believe it's supposed to
fail.
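
To make the question concrete, here is a rough standalone model of that
loop. It is not Git's code: toy_width(), toy_strwidth() and
old_loop_skipped() are hypothetical stand-ins that only mimic, for ASCII
input, the behaviour described in the commit message (-1 for a control
character, which the total width then ignores), and the target width of 2
is my assumption for what is left of --stat-name-width=5 after reserving
room for a "..." prefix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for utf8_width(), non-NUL ASCII only: advances *s
 * by one byte and returns -1 for a control character, 1 otherwise.
 */
static int toy_width(const char **s)
{
	unsigned char c = (unsigned char)**s;

	/* Reading the NUL here would be the out-of-bounds case. */
	assert(c != '\0' && c < 0x80);
	(*s)++;
	return (c < 0x20 || c == 0x7f) ? -1 : 1;
}

/* Stand-in for the total width: control characters contribute nothing. */
static int toy_strwidth(const char *s)
{
	int width = 0;

	while (*s) {
		int w = toy_width(&s);
		if (w > 0)
			width += w;
	}
	return width;
}

/* Runs the pre-patch loop shape and returns how many bytes it skipped. */
static size_t old_loop_skipped(const char *name, int len)
{
	const char *p = name;
	int name_len = toy_strwidth(name);

	while (name_len > len)
		name_len -= toy_width(&p);
	return (size_t)(p - name);
}
```

In this model, old_loop_skipped("aaa-\x01-aaa", 2) returns 8, so the loop
stops one byte before the end of the 9-byte name and never reaches the NUL.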
Am I missing something?
Thanks,
Lorenzo