From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qk1-f176.google.com (mail-qk1-f176.google.com [209.85.222.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF5F223BD02 for ; Fri, 17 Apr 2026 22:45:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.176 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776465915; cv=none; b=GlegXBWkTiIn8p0ebMWroAs2e04WPSAsYeu8on81l17d0bXaJp2B+8MkyCml7hoqdBgn/3YJLbJl967VmM3tHONa2X2P8im2VNBAIJFfcNVbwBIdMbSHKzRRnyttirKHGmUjubnSN4fqczGPNyLBjAlaX/yl3KrLme++68ii2IA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776465915; c=relaxed/simple; bh=gIy9BuJMDy/IzaOngHtil/LaNmhpUOOgUW/in8N82J0=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=hcMNj0JzyTup7iQAdNXClwijPTaPf7p4wVRRC28vYG5cc3uRnZvRKmfxLJuyXu/SCXHbTFnXgMpy/+9+wB2HT56gaL8V2Rv5K06rZdk8cQTAetTrh/5UwemVe63hjfUFCj7tAf4UVhXu+6fqHxwlrJ4/Jxkhxsh6gSwI++tQzaM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Nj7drW6m; arc=none smtp.client-ip=209.85.222.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Nj7drW6m" Received: by mail-qk1-f176.google.com with SMTP id af79cd13be357-8cfc5941028so159435585a.1 for ; Fri, 17 Apr 2026 15:45:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776465912; x=1777070712; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=mY3ZrY+mXaJHlh3DUDdsPL8ChiuWxNH9Os93hKoFd1w=; b=Nj7drW6mbrq5tJy5hEVgYHL1yGiwyJ+X6nxFaay6Esv+rViJtCk2KejzKeuoVIoTVE vUur3HWJqIHUXwSxjEFJUBbaLbK5+y5TZ6UEGllR1G3DI+GQcXmTEw9DjRELYF88La93 yGLcK3JlzzDfGD33nrdgS3jUgp/55w6JK6gbWcDXwCJrQft8FMgqn2oqvNNcWghsyk9X kFVqxDpzInvw+0265m6qHlHGnp+3tgF4EeBhE3OC+ef8UldtGXTBRrILSNOPDW+QXvO8 Tx1mtYvD6rZQJesMR+tJZCwI7syAp+4+H8L1GIZVPs6xnvrILnLKinQ2IjV2AFXL4lWC ve8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776465912; x=1777070712; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=mY3ZrY+mXaJHlh3DUDdsPL8ChiuWxNH9Os93hKoFd1w=; b=ZqyazxaoRQDi3AbYPs/vvrzvf5yRq1pSm+lzWZSZ4F282DgGYXm/G01kbajgn2fKyE L/VErPb8iZOhC3sgBODsVguY9gqhdo4anb9VzUUWPA+PtNttdSslVi8MJ1UADTT1Gi1F fWZMl4lwk5UwUg6SpucSGILz2ExgWuVeNVD2VoJGwWea9jEDNaHV/YpkolEvkunCft0L b673vb0LwtKD6oOru6UsTrwMlWS8pPKkxm8gyN87p8xQNazJ9cglDgdwfQ/JjDo4GkaG O82bhL5aG5ilSpqRUT7Aptxag2brRcy5RLUz9m9Z60zDZNWf7gp5sSQRdncFf4pzS6kr SAIg== X-Gm-Message-State: AOJu0Ywkrj7eQcfjaXrHUB4Ck+IHRETRoRwJJWAnpuYlvp3apBI7nYOw Zmm+4PaB87wprWe9gky1KJkQKoXtSj9ck4yVvPnd8/Nbl0ay6VhdWxdjOJE94g== X-Gm-Gg: AeBDieumF3QKiaFaDnLG4loERCGklvJeiaD9nXyG99FL3nZ6qXF3FUZPmQqHWG8IokL KMhjRVV08EHMj2ReDfsxqDGJLY55oZ9tvPPwp5knhuMAQ8jOn00UwiltXz8H8Y4Re00OvsaqeML iwKgM31zDqIWaESaBdr/+bBgW+sri9INUNUpkRMaY2yojtQHvj1EetjoFGzes6W7WRbJ0sLTsxh jQNBrchZDSeuad0Xvn36lsKt3bHlI6nXtPda4MG2pLNk8jveppY9JYuffKNziFWaNGjrfeLDCiT daOm5NcC6zZK+6lMXAuxtkZfzJ9U3xogZzQMTeo+UKNxq3di35Tx4dOXAD+1sSUalel1GMUYvqL Q24hAiaTC5wAOXTFUF3gXDrQRCP3f7IZZxTzf60PcwYMSoG8CHHbWYmDXsyfETyKqA0NJ5WYCN0 HRBEma62zs67IYYIP+uyXd9ef+F7VXNzYUOMfh5g== X-Received: by 2002:a05:620a:284a:b0:8cd:9231:8b54 with SMTP id af79cd13be357-8e792852ae7mr680776685a.62.1776465911733; Fri, 17 Apr 2026 15:45:11 -0700 (PDT) Received: from [127.0.0.1] ([135.232.232.91]) by smtp.gmail.com with ESMTPSA id af79cd13be357-8e7d92ce037sm205562985a.32.2026.04.17.15.45.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Apr 2026 15:45:11 -0700 (PDT) Message-Id: In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Fri, 17 Apr 2026 22:45:10 +0000 Subject: [PATCH v2] diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: LorenzoPegorari , Elijah Newren , Elijah Newren , Elijah Newren From: Elijah Newren f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls utf8_width() repeatedly to skip leading characters until the displayed width fits. However, utf8_width() can return problematic values: - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name pointer to NULL and utf8_width() returns 0. Since name_len does not change, the loop iterates once more and pick_one_utf8_char() dereferences the NULL pointer, crashing. - For control characters, utf8_width() returns -1, so name_len grows when it is expected to shrink. This can cause the loop to consume more characters than the string contains, reading past the trailing NUL. By default, fill_print_name() will C-quotes filenames which escapes control characters and invalid bytes to printable text. That avoids this bug from being triggered; however, with core.quotePath=false, raw bytes can reach this code. Add tests exercising both failure modes with core.quotePath=false and a narrow --stat-name-width to force truncation: one with a bare 0xC0 byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a 0x01 byte (control character, causes the loop to read past the end of the string). Fix both issues by introducing utf8_ish_width(), a thin wrapper around utf8_width() that guarantees the pointer always advances and the returned width is never negative: - On invalid UTF-8 it restores the pointer, advances by one byte, and returns width 1 (matching the strlen()-based fallback used by utf8_strwidth()). - On a control character it returns 0 (matching utf8_strnwidth() which skips them). Also add a "&& *name" guard to the while-loop condition so it terminates at end-of-string even when utf8_strwidth()'s strlen() fallback causes name_len to exceed the sum of per-character widths. Signed-off-by: Elijah Newren --- diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation Changes since v1: * Simplified the loop to almost what we had before via a wrapper function that always succeeds in advancing the string and never returns a negative width. (Which, as a consequence, treats invalid UTF-8 and control characters the roughly the same, unlike v1.) Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2093%2Fnewren%2Ffix%2Fdiffstat-utf8-loop-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2093/newren/fix/diffstat-utf8-loop-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/2093 Range-diff vs v1: 1: fcd44d6cf8 ! 1: 4a72647ce2 diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation @@ Commit message diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle - UTF-8 chars, 2024-10-27) introduced a loop in show_stats() that calls + UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls utf8_width() repeatedly to skip leading characters until the displayed width fits. However, utf8_width() can return problematic values: @@ Commit message 0x01 byte (control character, causes the loop to read past the end of the string). - Fix the bug by: - - Adding a *name check to terminate the loop at end-of-string - - Detecting the NULL pointer from invalid UTF-8 and falling back to - showing the full untruncated name - - Breaking on negative width (control characters) + Fix both issues by introducing utf8_ish_width(), a thin wrapper + around utf8_width() that guarantees the pointer always advances and + the returned width is never negative: + + - On invalid UTF-8 it restores the pointer, advances by one byte, + and returns width 1 (matching the strlen()-based fallback used + by utf8_strwidth()). + - On a control character it returns 0 (matching utf8_strnwidth() + which skips them). + + Also add a "&& *name" guard to the while-loop condition so it + terminates at end-of-string even when utf8_strwidth()'s strlen() + fallback causes name_len to exceed the sum of per-character widths. Signed-off-by: Elijah Newren ## diff.c ## +@@ diff.c: void print_stat_summary(FILE *fp, int files, + print_stat_summary_inserts_deletes(&o, files, insertions, deletions); + } + ++/* ++ * Like utf8_width(), but guaranteed safe for use in loops that subtract ++ * per-character widths: ++ * ++ * - utf8_width() sets *start to NULL on invalid UTF-8 and returns 0; ++ * we restore the pointer and advance by one byte, returning width 1 ++ * (matching the strlen()-based fallback in utf8_strwidth()). ++ * ++ * - utf8_width() returns -1 for control characters; we return 0 ++ * (matching utf8_strnwidth() which skips them). ++ */ ++static int utf8_ish_width(const char **start) ++{ ++ const char *old = *start; ++ int w = utf8_width(start, NULL); ++ if (!*start) { ++ *start = old + 1; ++ return 1; ++ } ++ return (w < 0) ? 0 : w; ++} ++ + static void show_stats(struct diffstat_t *data, struct diff_options *options) + { + int i, len, add, del, adds = 0, dels = 0; @@ diff.c: static void show_stats(struct diffstat_t *data, struct diff_options *options) if (len < 0) len = 0; - while (name_len > len) - name_len -= utf8_width((const char**)&name, NULL); -+ while (name_len > len && *name) { -+ int w = utf8_width((const char **)&name, NULL); -+ if (!name) { /* Invalid UTF-8 */ -+ name = file->print_name; -+ name_len = utf8_strwidth(name); -+ break; -+ } -+ if (w < 0) /* control character */ -+ break; -+ name_len -= w; -+ } ++ while (name_len > len && *name) ++ name_len -= utf8_ish_width((const char**)&name); slash = strchr(name, '/'); if (slash) diff.c | 26 ++++++++++++++++++++++++-- t/t4052-stat-output.sh | 25 +++++++++++++++++++++++++ 2 files changed, 49 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index 397e38b41c..1a3b19f71f 100644 --- a/diff.c +++ b/diff.c @@ -2927,6 +2927,28 @@ void print_stat_summary(FILE *fp, int files, print_stat_summary_inserts_deletes(&o, files, insertions, deletions); } +/* + * Like utf8_width(), but guaranteed safe for use in loops that subtract + * per-character widths: + * + * - utf8_width() sets *start to NULL on invalid UTF-8 and returns 0; + * we restore the pointer and advance by one byte, returning width 1 + * (matching the strlen()-based fallback in utf8_strwidth()). + * + * - utf8_width() returns -1 for control characters; we return 0 + * (matching utf8_strnwidth() which skips them). + */ +static int utf8_ish_width(const char **start) +{ + const char *old = *start; + int w = utf8_width(start, NULL); + if (!*start) { + *start = old + 1; + return 1; + } + return (w < 0) ? 0 : w; +} + static void show_stats(struct diffstat_t *data, struct diff_options *options) { int i, len, add, del, adds = 0, dels = 0; @@ -3093,8 +3115,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) if (len < 0) len = 0; - while (name_len > len) - name_len -= utf8_width((const char**)&name, NULL); + while (name_len > len && *name) + name_len -= utf8_ish_width((const char**)&name); slash = strchr(name, '/'); if (slash) diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh index 7c749062e2..84c53c1a51 100755 --- a/t/t4052-stat-output.sh +++ b/t/t4052-stat-output.sh @@ -445,4 +445,29 @@ test_expect_success 'diffstat where line_prefix contains ANSI escape codes is co test_grep "| ${FILENAME_TRIMMED} | 0" out ' +test_expect_success 'diffstat truncation with invalid UTF-8 does not crash' ' + empty_blob=$(git hash-object -w --stdin tree_file && + tree=$(cat tree_file) && + empty_tree=$(git mktree output && + test_grep "| 0" output +' + +test_expect_success FUNNYNAMES 'diffstat truncation with control chars does not crash' ' + FNAME=$(printf "aaa-\x01-aaa") && + git commit --allow-empty -m setup && + >$FNAME && + git add -- $FNAME && + git commit -m "add file with control char name" && + git -c core.quotepath=false diff --stat --stat-name-width=5 HEAD~1..HEAD >output && + test_grep "| 0" output && + rm -- $FNAME && + git rm -- $FNAME && + git commit -m "remove test file" +' + test_done base-commit: 9f223ef1c026d91c7ac68cc0211bde255dda6199 -- gitgitgadget