public inbox for git@vger.kernel.org
* [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
@ 2026-01-14 22:27 LorenzoPegorari
  2026-01-14 22:50 ` Junio C Hamano
  2026-01-16  0:04 ` [GSoC PATCH v2 0/2] " LorenzoPegorari
From: LorenzoPegorari @ 2026-01-14 22:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The `show_stats()` function tries to scale the filenames in the diffstat to
ensure they don't exceed the given `name-width`. It does so by calculating
the "display width" of the characters to be dropped, but then advances the
filename pointer by that number of bytes.

However, the "display width" of a character is not always equal to its byte
count. As a result, when a filename contains multi-byte UTF-8 characters,
the shortened name can still exceed the given `name-width`, and the pointer
often lands in the middle of a character, leaving stray bytes in the output.

For example, with the two files "HelloHi" and "Hello你好" and
`name-width=6`, the diffstat currently shows:

    ...oHi | 0
    ...<BD><A0>好 | 0

Advance the filename pointer one character at a time with `utf8_width()`,
which skips all the bytes of a character and returns its display width,
until the remaining name fits. Remove the NEEDSWORK comment that described
this problem.

Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
---
 diff.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index a68ddd2168..271ace5728 100644
--- a/diff.c
+++ b/diff.c
@@ -2859,17 +2859,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			char *slash;
 			prefix = "...";
 			len -= 3;
-			/*
-			 * NEEDSWORK: (name_len - len) counts the display
-			 * width, which would be shorter than the byte
-			 * length of the corresponding substring.
-			 * Advancing "name" by that number of bytes does
-			 * *NOT* skip over that many columns, so it is
-			 * very likely that chomping the pathname at the
-			 * slash we will find starting from "name" will
-			 * leave the resulting string still too long.
-			 */
-			name += name_len - len;
+
+			while (name_len > len)
+				name_len -= utf8_width((const char**)&name, NULL);
+
 			slash = strchr(name, '/');
 			if (slash)
 				name = slash;
-- 
2.43.0



Thread overview: 7+ messages
2026-01-14 22:27 [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars LorenzoPegorari
2026-01-14 22:50 ` Junio C Hamano
2026-01-16  0:00   ` Lorenzo Pegorari
2026-01-16  0:04 ` [GSoC PATCH v2 0/2] " LorenzoPegorari
2026-01-16  0:05   ` [GSoC PATCH v2 1/2] " LorenzoPegorari
2026-01-16  0:05   ` [GSoC PATCH v2 2/2] t4073: add test for diffstat paths length when containing " LorenzoPegorari
2026-01-17 17:52     ` Junio C Hamano
