Git development
From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Date: Mon, 20 Apr 2026 01:52:10 +0200
Message-ID: <aeVqqsdq9B7GE9gS@lorenzo-VM>
In-Reply-To: <pull.2093.v2.git.1776465910538.gitgitgadget@gmail.com>

On Fri, Apr 17, 2026 at 10:45:10PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
> UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls
> utf8_width() repeatedly to skip leading characters until the displayed
> width fits.  However, utf8_width() can return problematic values:
> 
>   - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
>     pointer to NULL and utf8_width() returns 0.  Since name_len does
>     not change, the loop iterates once more and pick_one_utf8_char()
>     dereferences the NULL pointer, crashing.
> 
>   - For control characters, utf8_width() returns -1, so name_len
>     grows when it is expected to shrink.  This can cause the loop to
>     consume more characters than the string contains, reading past
>     the trailing NUL.
> 
> By default, fill_print_name() C-quotes filenames, which escapes
> control characters and invalid bytes to printable text.  That keeps
> this bug from being triggered; however, with core.quotePath=false,
> raw bytes can reach this code.
> 
> Add tests exercising both failure modes with core.quotePath=false and
> a narrow --stat-name-width to force truncation: one with a bare 0xC0
> byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
> 0x01 byte (control character, causes the loop to read past the end
> of the string).
> 
> Fix both issues by introducing utf8_ish_width(), a thin wrapper
> around utf8_width() that guarantees the pointer always advances and
> the returned width is never negative:
> 
>   - On invalid UTF-8 it restores the pointer, advances by one byte,
>     and returns width 1 (matching the strlen()-based fallback used
>     by utf8_strwidth()).
>   - On a control character it returns 0 (matching utf8_strnwidth()
>     which skips them).
> 
> Also add a "&& *name" guard to the while-loop condition so it
> terminates at end-of-string even when utf8_strwidth()'s strlen()
> fallback causes name_len to exceed the sum of per-character widths.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>

Hi, thanks for CCing me and thanks for improving on my previous work.

All of these changes make a lot of sense, and indeed they fix issues
that I didn't consider in f85b49f3d4a (diff: improve scaling of
filenames in diffstat to handle UTF-8 chars, 2026-01-16).

[...]

> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
> index 7c749062e2..84c53c1a51 100755
> --- a/t/t4052-stat-output.sh
> +++ b/t/t4052-stat-output.sh
> @@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co

[...]

>
> +test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' '
> +	FNAME=$(printf "aaa-\x01-aaa") &&
> +	git commit --allow-empty -m setup &&
> +	>$FNAME &&
> +	git add -- $FNAME &&
> +	git commit -m "add file with control char name" &&
> +	git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output &&
> +	test_grep "| 0" output &&
> +	rm -- $FNAME &&
> +	git rm -- $FNAME &&
> +	git commit -m "remove test file"
> +'
> +
>  test_done

The only thing that I don't quite understand is this second test.

From my tests, the previous code using:

```
[...]
while (name_len > len)
	name_len -= utf8_width((const char**)&name, NULL);
[...]
```

passes this second test just fine, while I believe it's supposed to
fail.

Am I missing something?


Thanks,
Lorenzo


Thread overview: 9+ messages
2026-04-17 16:26 [PATCH] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation Elijah Newren via GitGitGadget
2026-04-17 19:21 ` Junio C Hamano
2026-04-17 22:00   ` Elijah Newren
2026-04-17 22:21     ` Junio C Hamano
2026-04-17 22:45 ` [PATCH v2] " Elijah Newren via GitGitGadget
2026-04-19 23:52   ` Lorenzo Pegorari [this message]
2026-04-20 14:51     ` Elijah Newren
2026-04-20 15:42   ` [PATCH v3] " Elijah Newren via GitGitGadget
2026-04-20 16:41     ` Junio C Hamano
