* [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width
@ 2026-02-24 1:09 LorenzoPegorari
2026-02-24 1:11 ` [GSoC PATCH 1/2] " LorenzoPegorari
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: LorenzoPegorari @ 2026-02-24 1:09 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
This patch aims to fix a bug where the calculation of the diffstat width
incorrectly uses the strlen() of line_prefix instead of its actual
display width.
This patch addresses the NEEDSWORK item added by ce8529b2 (diff: leave
NEEDWORK notes in show_stats() function, 2022-10-21).
Also, this bug was reported and suggested to me by Junio C Hamano here:
https://lore.kernel.org/git/xmqqikd3ermt.fsf@gitster.g/
Junio, do you wish to be included in this patch by a Reported-by and/or
Suggested-by tag?
LorenzoPegorari (2):
diff: handle ANSI chars in prefix when calculating diffstat width
t4074: add test for diffstat width when prefix contains ANSI chars
diff.c | 12 ++----
t/t4074-diff-stat-width-with-line-prefix.sh | 42 +++++++++++++++++++++
2 files changed, 46 insertions(+), 8 deletions(-)
create mode 100755 t/t4074-diff-stat-width-with-line-prefix.sh
--
2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread* [GSoC PATCH 1/2] diff: handle ANSI chars in prefix when calculating diffstat width 2026-02-24 1:09 [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width LorenzoPegorari @ 2026-02-24 1:11 ` LorenzoPegorari 2026-02-24 1:20 ` [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars LorenzoPegorari ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-24 1:11 UTC (permalink / raw) To: git; +Cc: Junio C Hamano The diffstat width is calculated by taking the terminal width and incorrectly subtracting the strlen() of line_prefix, instead of the actual display width of line_prefix (which may contain ANSI chars). Utilize the display width instead, obtained via utf8_strnwidth() with the flag to skip ANSI chars. Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- diff.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 35b903a9a0..de1db28714 100644 --- a/diff.c +++ b/diff.c @@ -2749,7 +2749,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) count = i; /* where we can stop scanning in data->files[] */ /* - * We have width = stat_width or term_columns() columns total. + * We have width = stat_width or term_columns() columns total + * minus the length of line_prefix skipping ANSI chars to get the + * display width (e.g., to skip ANSI-colored strings in "log --graph"). * We want a maximum of min(max_len, stat_name_width) for the name part. * We want a maximum of min(max_change, stat_graph_width) for the +- part. * We also need 1 for " " and 4 + decimal_width(max_change) @@ -2776,14 +2778,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) * separators and this message, this message will "overflow" * making the line longer than the maximum width. */ - - /* - * NEEDSWORK: line_prefix is often used for "log --graph" output - * and contains ANSI-colored string. utf8_strnwidth() should be - * used to correctly count the display width instead of strlen(). - */ if (options->stat_width == -1) - width = term_columns() - strlen(line_prefix); + width = term_columns() - utf8_strnwidth(line_prefix, strlen(line_prefix), 1); else width = options->stat_width ? options->stat_width : 80; number_width = decimal_width(max_change) > number_width ? -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars 2026-02-24 1:09 [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width LorenzoPegorari 2026-02-24 1:11 ` [GSoC PATCH 1/2] " LorenzoPegorari @ 2026-02-24 1:20 ` LorenzoPegorari 2026-02-24 5:43 ` Junio C Hamano 2026-02-24 5:17 ` [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width Junio C Hamano 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari 3 siblings, 1 reply; 14+ messages in thread From: LorenzoPegorari @ 2026-02-24 1:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano Add test checking the calculation of the diffstat display width when the line_prefix contains ANSI characters. Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- t/meson.build | 1 + t/t4074-diff-stat-width-with-line-prefix.sh | 42 +++++++++++++++++++++ 2 files changed, 43 insertions(+) create mode 100755 t/t4074-diff-stat-width-with-line-prefix.sh diff --git a/t/meson.build b/t/meson.build index f80e366cff..2867ba8a77 100644 --- a/t/meson.build +++ b/t/meson.build @@ -502,6 +502,7 @@ integration_tests = [ 't4071-diff-minimal.sh', 't4072-diff-max-depth.sh', 't4073-diff-stat-name-width.sh', + 't4074-diff-stat-width-with-line-prefix.sh', 't4100-apply-stat.sh', 't4101-apply-nonl.sh', 't4102-apply-rename.sh', diff --git a/t/t4074-diff-stat-width-with-line-prefix.sh b/t/t4074-diff-stat-width-with-line-prefix.sh new file mode 100755 index 0000000000..69ab85bf25 --- /dev/null +++ b/t/t4074-diff-stat-width-with-line-prefix.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +test_description='git-diff check diffstat width when line_prefix is not empty' + +. ./test-lib.sh + + +# The terminal, during a test, should default to a width of 80 columns +FILE_MAX="filename-with-exact-length-to-take-the-max-amount-of-space-in-diffstat" +FILE_LONGER="...name-with-exact-length-to-take-the-max-amount-of-space-in-diffstat+" + +setup () { + rm -rf * ".git" && + git init && + git config color.diff always +} + +test_expect_success 'check width with max name-width' ' + setup && + touch "${FILE_MAX}" && + git add . && + git commit -m "init" && + echo "text" >"${FILE_MAX}" && + git add . && + git commit -m "text" && + git log --graph --stat >out && + grep "${FILE} | 1" out +' + +test_expect_success 'check width with longer name-width' ' + setup && + touch "${FILE_MAX}+" && + git add . && + git commit -m "init" && + echo "text" >"${FILE_MAX}+" && + git add . && + git commit -m "text" && + git log --graph --stat >out && + grep "${FILE_LONGER} | 1" out +' + +test_done -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars 2026-02-24 1:20 ` [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars LorenzoPegorari @ 2026-02-24 5:43 ` Junio C Hamano 2026-02-25 2:18 ` Lorenzo Pegorari 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2026-02-24 5:43 UTC (permalink / raw) To: LorenzoPegorari; +Cc: git LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes: > Add test checking the calculation of the diffstat display width when > the line_prefix contains ANSI characters. Hmph, who stuffs the "line_prefix" and with what? > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> > --- > t/meson.build | 1 + > t/t4074-diff-stat-width-with-line-prefix.sh | 42 +++++++++++++++++++++ > 2 files changed, 43 insertions(+) > create mode 100755 t/t4074-diff-stat-width-with-line-prefix.sh Do we need a brand new test script only to host just two tests? Is it too cumbersome to modify existing test scripts that already test "git log --graph"? > +setup () { > + rm -rf * ".git" && > + git init && > + git config color.diff always > +} Why? > +test_expect_success 'check width with max name-width' ' > + setup && Why? Shouldn't something like test_config color.diff always && be sufficient? > + touch "${FILE_MAX}" && Use of "touch" when you do not care about the timestamp of the resulting file is misleading. If you only care about its existence, do something like >"${FILE_MAX}" && instead. > + git add . && > + git commit -m "init" && > + echo "text" >"${FILE_MAX}" && > + git add . && > + git commit -m "text" && OK, so we have two commits, one adds an empty file, followed by another that adds one line of "text" to the file. > + git log --graph --stat >out && > + grep "${FILE} | 1" out > +' Use "test_grep" instead of "grep" to make it easier to debug when things break, perhaps? It is unclear what we are exactly looking for. What's the failure mode and how the miscounting of the width of the line-prefix contribute to that failure? We have two commits, "log" without "--reverse" would give the creation of an empty file i.e., "file | 0" first and then one line addition to the file i.e., "file | 1 +" next. We only care about "${FILE} | 1" existing in the output and we do not bother checking what comes at the beginning of the line before "${FILE}" or what comes on the line fter "| 1", because we can detect the breakage we expect to see only by looking at the middle of the line? How would that work? The test deserves a bit more comment to help readers who wonder about these things. Side note. I think I know the answer to these questions, but this project is not about me ;-) but other future contributors also would need help when they encounter this test and want to learn what it is testing. > +test_expect_success 'check width with longer name-width' ' > + setup && > + touch "${FILE_MAX}+" && > + git add . && > + git commit -m "init" && > + echo "text" >"${FILE_MAX}+" && > + git add . && > + git commit -m "text" && > + git log --graph --stat >out && > + grep "${FILE_LONGER} | 1" out > +' > + > +test_done Instead of doing the repository construction twice, you can see the same effect by running "git log --graph --stat" on the same history with different values exported on COLUMNS environment variable, which I think would be easier to see and simpler to debug. Something along the lines of ... COLUMNS=80 git log --graph --stat >out && test_decode_color out >out.decoded && test_grep "^<RED>|<RESET> ${FILE_MAX} | 1 <GREEN>+<RESET>$" out.decoded && COLUMNS=79 git log --graph --stat >out && test_decode_color out >out.decoded && test_grep "^<RED>|<RESET> ${FILE_TRUNCATED} | 1 <GREEN>+<RESET>$" out.decoded ... perhaps? > +# The terminal, during a test, should default to a width of 80 columns > +FILE_MAX="filename-with-exact-length-to-take-the-max-amount-of-space-in-diffstat" > +FILE_LONGER="...name-with-exact-length-to-take-the-max-amount-of-space-in-diffstat+" ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars 2026-02-24 5:43 ` Junio C Hamano @ 2026-02-25 2:18 ` Lorenzo Pegorari 0 siblings, 0 replies; 14+ messages in thread From: Lorenzo Pegorari @ 2026-02-25 2:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, Feb 23, 2026 at 09:43:56PM -0800, Junio C Hamano wrote: > LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes: > > > Add test checking the calculation of the diffstat display width when > > the line_prefix contains ANSI characters. > > Hmph, who stuffs the "line_prefix" and with what? Yeah, I should make the commit description clearer. Will improve that. > > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> > > --- > > t/meson.build | 1 + > > t/t4074-diff-stat-width-with-line-prefix.sh | 42 +++++++++++++++++++++ > > 2 files changed, 43 insertions(+) > > create mode 100755 t/t4074-diff-stat-width-with-line-prefix.sh > > Do we need a brand new test script only to host just two tests? Is > it too cumbersome to modify existing test scripts that already test > "git log --graph"? Mmh I guess the test could be added to "t4052-stat-output.sh", which "tests --stat output of various commands". Do you agree that this the correct place? (btw, t4052 seems to me like a quite messy test...) > > +setup () { > > + rm -rf * ".git" && > > + git init && > > + git config color.diff always > > +} > > Why? > > > +test_expect_success 'check width with max name-width' ' > > + setup && > > Why? Shouldn't something like > > test_config color.diff always && > > be sufficient? This is sufficient, and the setup() func makes no sense now that I know this. Thanks for pointing that out! > > + touch "${FILE_MAX}" && > > Use of "touch" when you do not care about the timestamp of the > resulting file is misleading. If you only care about its existence, > do something like > > >"${FILE_MAX}" && > > instead. Ack. > > + git add . && > > + git commit -m "init" && > > + echo "text" >"${FILE_MAX}" && > > + git add . && > > + git commit -m "text" && > > OK, so we have two commits, one adds an empty file, followed by > another that adds one line of "text" to the file. Yes, these are just placeholder commits. We need at least 2 commits to create a graph with "log --graph". I will make them "--allow-empty" and "--allow-empty-message" emphasize that they are placeholders. > > + git log --graph --stat >out && > > + grep "${FILE} | 1" out > > +' > > Use "test_grep" instead of "grep" to make it easier to debug when > things break, perhaps? Ack. > It is unclear what we are exactly looking for. What's the failure > mode and how the miscounting of the width of the line-prefix > contribute to that failure? > > We have two commits, "log" without "--reverse" would give the > creation of an empty file i.e., "file | 0" first and then one line > addition to the file i.e., "file | 1 +" next. We only care about > "${FILE} | 1" existing in the output and we do not bother checking > what comes at the beginning of the line before "${FILE}" or what > comes on the line fter "| 1", because we can detect the breakage we > expect to see only by looking at the middle of the line? How would > that work? The test deserves a bit more comment to help readers who > wonder about these things. I will add comments to explain the test better. The goal is to have a file with name that is just the exact length so that the line in the git-log output which includes the diffstat will take all the available space in the terminal (if the diffstat width is calculated correctly). Then we test the diffstat width calculation with the terminal being the correct size for the file name, and then 1 column short. If the file name is fully displayed first, and is then shortened by only 1 character, it means that the diffstat width was correctly calculated. > Side note. I think I know the answer to these questions, > but this project is not about me ;-) but other future > contributors also would need help when they encounter this > test and want to learn what it is testing. Thank you so much for helping me out without just giving me the solution. It must have taken longer :-) > > +test_expect_success 'check width with longer name-width' ' > > + setup && > > + touch "${FILE_MAX}+" && > > + git add . && > > + git commit -m "init" && > > + echo "text" >"${FILE_MAX}+" && > > + git add . && > > + git commit -m "text" && > > + git log --graph --stat >out && > > + grep "${FILE_LONGER} | 1" out > > +' > > + > > +test_done > > Instead of doing the repository construction twice, you can see the > same effect by running "git log --graph --stat" on the same history > with different values exported on COLUMNS environment variable, > which I think would be easier to see and simpler to debug. > > Something along the lines of ... > > COLUMNS=80 git log --graph --stat >out && > test_decode_color out >out.decoded && > test_grep "^<RED>|<RESET> ${FILE_MAX} | 1 <GREEN>+<RESET>$" out.decoded && > > COLUMNS=79 git log --graph --stat >out && > test_decode_color out >out.decoded && > test_grep "^<RED>|<RESET> ${FILE_TRUNCATED} | 1 <GREEN>+<RESET>$" out.decoded > > ... perhaps? Didn't know about this. Will use it for sure. Again, thanks a lot for the through review Junio. Just let me know if t4052 (described above) is the correct place for the test! (Sorry for the double email Junio... why does mutt not automatically add the mailing list to CC while replying... :'[ ) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width 2026-02-24 1:09 [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width LorenzoPegorari 2026-02-24 1:11 ` [GSoC PATCH 1/2] " LorenzoPegorari 2026-02-24 1:20 ` [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars LorenzoPegorari @ 2026-02-24 5:17 ` Junio C Hamano 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari 3 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2026-02-24 5:17 UTC (permalink / raw) To: LorenzoPegorari; +Cc: git LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes: > This patch aims to fix a bug where the calculation of the diffstat width > incorrectly uses the strlen() of line_prefix instead of its actual > display width. > > This patch addresses the NEEDSWORK item added by ce8529b2 (diff: leave > NEEDWORK notes in show_stats() function, 2022-10-21). > > Also, this bug was reported and suggested to me by Junio C Hamano here: > https://lore.kernel.org/git/xmqqikd3ermt.fsf@gitster.g/ > > Junio, do you wish to be included in this patch by a Reported-by and/or > Suggested-by tag? Neither. I didn't report or suggest anything recently. I merely pointed at one out of the two NEEDSWORK comments that was written long time ago when you addressed the other one. > diff: handle ANSI chars in prefix when calculating diffstat width > t4074: add test for diffstat width when prefix contains ANSI chars "ANSI chars" is a phrase that was hard to understand, as the theme of this was paged out of my working memory a long time ago ;-). I think the issue is that strings with ANSI color escape sequences are used in "log --graph" and other output, and in order to measure how many display columns these strings occupy, we must not be using strlen() but utr8_strnwidth() that knows how to count UTF-8 chars (whose display widths do not match the number of bytes) as well as that these color escape sequences do not occupy any display width? Anyway, thanks for working on these patches. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [GSoC PATCH v2 0/2] diff: handle UTF-8 chars in prefix when calculating diffstat width 2026-02-24 1:09 [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width LorenzoPegorari ` (2 preceding siblings ...) 2026-02-24 5:17 ` [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width Junio C Hamano @ 2026-02-27 16:01 ` LorenzoPegorari 2026-02-27 16:04 ` [GSoC PATCH v2 1/2] " LorenzoPegorari ` (3 more replies) 3 siblings, 4 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 16:01 UTC (permalink / raw) To: git; +Cc: Junio C Hamano This patch aims to fix a bug where the calculation of the diffstat width incorrectly uses the strlen() of line_prefix instead of its actual display width. This patch addresses the NEEDSWORK item added by ce8529b2 (diff: leave NEEDWORK notes in show_stats() function, 2022-10-21). V2 DIFF: * Changed references from "ANSI char" to "UTF-8 char" * Compacted the 2 test scripts in a single script, that I placed in the already existing test file t4052, which "tests --stat output for various commands" * Added many of the changes suggested to me by Junio C Hamano [1], in order to make the test easier to read and to debug * Added a descriptive comment for the test script, and a more complete commit message to describe the goal of the test [1]: https://lore.kernel.org/git/xmqqikbmk86b.fsf@gitster.g/ LorenzoPegorari (2): diff: handle UTF-8 chars in prefix when calculating diffstat width t4052: add test for diffstat width when prefix contains UTF-8 chars diff.c | 12 ++++-------- t/b | 0 t/t4052-stat-output.sh | 30 ++++++++++++++++++++++++++++++ 3 files changed, 34 insertions(+), 8 deletions(-) create mode 100644 t/b Range-diff against v1: 1: a798eda511 ! 1: 9e8161a700 diff: handle ANSI chars in prefix when calculating diffstat width @@ Metadata Author: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> ## Commit message ## - diff: handle ANSI chars in prefix when calculating diffstat width + diff: handle UTF-8 chars in prefix when calculating diffstat width The diffstat width is calculated by taking the terminal width and - incorrectly subtracting the strlen() of line_prefix, instead of the - actual display width of line_prefix (which may contain ANSI chars). + incorrectly subtracting the `strlen()` of `line_prefix`, instead of the + actual display width of `line_prefix`, which may contain UTF-8 + characters (e.g., ANSI-colored strings in `log --graph --stat`). - Utilize the display width instead, obtained via utf8_strnwidth() with - the flag to skip ANSI chars. + Utilize the display width instead, obtained via `utf8_strnwidth()` with + the flag `skip_ansi`. Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> @@ diff.c: static void show_stats(struct diffstat_t *data, struct diff_options *opt /* - * We have width = stat_width or term_columns() columns total. -+ * We have width = stat_width or term_columns() columns total -+ * minus the length of line_prefix skipping ANSI chars to get the -+ * display width (e.g., to skip ANSI-colored strings in "log --graph"). ++ * We have width = stat_width or term_columns() columns total minus the ++ * length of line_prefix skipping UTF-8 chars to get the display width ++ * (e.g., to skip ANSI-colored strings in "log --graph --stat"). * We want a maximum of min(max_len, stat_name_width) for the name part. * We want a maximum of min(max_change, stat_graph_width) for the +- part. * We also need 1 for " " and 4 + decimal_width(max_change) 2: ce25150593 < -: ---------- t4074: add test for diffstat width when prefix contains ANSI chars -: ---------- > 2: 984fa10d72 t4052: add test for diffstat width when prefix contains UTF-8 chars -- 2.43.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [GSoC PATCH v2 1/2] diff: handle UTF-8 chars in prefix when calculating diffstat width 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari @ 2026-02-27 16:04 ` LorenzoPegorari 2026-02-27 16:08 ` [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars LorenzoPegorari ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 16:04 UTC (permalink / raw) To: git; +Cc: Junio C Hamano The diffstat width is calculated by taking the terminal width and incorrectly subtracting the `strlen()` of `line_prefix`, instead of the actual display width of `line_prefix`, which may contain UTF-8 characters (e.g., ANSI-colored strings in `log --graph --stat`). Utilize the display width instead, obtained via `utf8_strnwidth()` with the flag `skip_ansi`. Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- diff.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 35b903a9a0..395cb464f4 100644 --- a/diff.c +++ b/diff.c @@ -2749,7 +2749,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) count = i; /* where we can stop scanning in data->files[] */ /* - * We have width = stat_width or term_columns() columns total. + * We have width = stat_width or term_columns() columns total minus the + * length of line_prefix skipping UTF-8 chars to get the display width + * (e.g., to skip ANSI-colored strings in "log --graph --stat"). * We want a maximum of min(max_len, stat_name_width) for the name part. * We want a maximum of min(max_change, stat_graph_width) for the +- part. * We also need 1 for " " and 4 + decimal_width(max_change) @@ -2776,14 +2778,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) * separators and this message, this message will "overflow" * making the line longer than the maximum width. */ - - /* - * NEEDSWORK: line_prefix is often used for "log --graph" output - * and contains ANSI-colored string. utf8_strnwidth() should be - * used to correctly count the display width instead of strlen(). - */ if (options->stat_width == -1) - width = term_columns() - strlen(line_prefix); + width = term_columns() - utf8_strnwidth(line_prefix, strlen(line_prefix), 1); else width = options->stat_width ? options->stat_width : 80; number_width = decimal_width(max_change) > number_width ? -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari 2026-02-27 16:04 ` [GSoC PATCH v2 1/2] " LorenzoPegorari @ 2026-02-27 16:08 ` LorenzoPegorari 2026-02-27 18:08 ` Junio C Hamano 2026-02-27 18:04 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 chars in prefix when calculating diffstat width Junio C Hamano 2026-02-27 21:43 ` [GSoC PATCH v3 0/2] diff: handle ANSI escape codes " LorenzoPegorari 3 siblings, 1 reply; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 16:08 UTC (permalink / raw) To: git; +Cc: Junio C Hamano Add test checking the calculation of the diffstat display width when the `line_prefix`, which is text that goes before the diffstat, contains UTF-8 characters. This situation happens, for example, when `git log --stat --graph` is executed: * `--stat` will create a diffstat for each commit * `--graph` will stuff `line_prefix` with the graph portion of the log, which contains UTF-8 characters (ANSI escape codes to color the text) Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- t/b | 0 t/t4052-stat-output.sh | 30 ++++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+) create mode 100644 t/b diff --git a/t/b b/t/b new file mode 100644 index 0000000000..e69de29bb2 diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh index 740bb97091..cc4665047d 100755 --- a/t/t4052-stat-output.sh +++ b/t/t4052-stat-output.sh @@ -413,4 +413,34 @@ test_expect_success 'merge --stat respects COLUMNS with long name' ' test_cmp expect actual ' +# git-log will print only 1 commit containing a single branch graph and a diffstat. +# The diffstat will be only one file, with a placeholder FILENAME, that, with +# enough terminal display width, will contain the following line: +# "<RED>|<RESET> ${FILENAME} | 0" +# where "<RED>" and "<RESET>" are ANSI escape codes to color the text. +# To calculate the minimium terminal display width MIN_TERM_WIDTH so that the +# FILENAME in the diffstat will not be shortened, we take the FILENAME length +# and add 9 to it. +# To check if the diffstat width, when the line_prefix (the "<RED>|<RESET>" of +# the graph) contains UTF-8 characters (the ANSI escape codes), is calculated +# correctly, we: +# 1. check if it contains the line defined before when using MIN_TERM_WIDTH +# 2. check if it contains the line defined before, but with the FILENAME +# shortened by only one character, when using MIN_TERM_WIDTH - 1 + +test_expect_success 'diffstat where line_prefix contains UTF-8 chars is correct width' ' + FILENAME="placeholder-text-placeholder-text" && + FILENAME_TRIMMED="...eholder-text-placeholder-text" && + MIN_TERM_WIDTH=$((${#FILENAME} + 9)) && + test_config color.diff always && + git commit --allow-empty --allow-empty-message && + >${FILENAME} && + git add ${FILENAME} && + git commit --allow-empty-message && + COLUMNS=$((MIN_TERM_WIDTH)) git log --graph --stat -n1 | test_decode_color >out && + test_grep "<RED>|<RESET> ${FILENAME} | 0" out && + COLUMNS=$((MIN_TERM_WIDTH - 1)) git log --graph --stat -n1 | test_decode_color >out && + test_grep "<RED>|<RESET> ${FILENAME_TRIMMED} | 0" out +' + test_done -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars 2026-02-27 16:08 ` [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars LorenzoPegorari @ 2026-02-27 18:08 ` Junio C Hamano 0 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2026-02-27 18:08 UTC (permalink / raw) To: LorenzoPegorari; +Cc: git LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes: > Add test checking the calculation of the diffstat display width when the > `line_prefix`, which is text that goes before the diffstat, contains > UTF-8 characters. This does not want to say UTF-8 for two reasons. US-ASCII is a subset of UTF-8, and more importantly, most of the problems miscounting the display width of the line-prefix part comes from the fact that ANSI color escapes are 0-width but consumes strlen() bytes. Curiously ... > +# git-log will print only 1 commit containing a single branch graph and a diffstat. > +# The diffstat will be only one file, with a placeholder FILENAME, that, with > +# enough terminal display width, will contain the following line: > +# "<RED>|<RESET> ${FILENAME} | 0" > +# where "<RED>" and "<RESET>" are ANSI escape codes to color the text. ... this does say "ANSI escape codes to color the text", which is exactly what we want to say ;-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GSoC PATCH v2 0/2] diff: handle UTF-8 chars in prefix when calculating diffstat width 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari 2026-02-27 16:04 ` [GSoC PATCH v2 1/2] " LorenzoPegorari 2026-02-27 16:08 ` [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars LorenzoPegorari @ 2026-02-27 18:04 ` Junio C Hamano 2026-02-27 21:43 ` [GSoC PATCH v3 0/2] diff: handle ANSI escape codes " LorenzoPegorari 3 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2026-02-27 18:04 UTC (permalink / raw) To: LorenzoPegorari; +Cc: git LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes: > This patch aims to fix a bug where the calculation of the diffstat width > incorrectly uses the strlen() of line_prefix instead of its actual > display width. > > This patch addresses the NEEDSWORK item added by ce8529b2 (diff: leave > NEEDWORK notes in show_stats() function, 2022-10-21). > > V2 DIFF: > * Changed references from "ANSI char" to "UTF-8 char" Is that correct? I thought these references are mostly about the ANSI color escape sequences that are used to paint strings in color e.g., printf "This is shown in \033[31mRED\033[0m color\012" uses "\033[31m" (use RED as the foreground color) and "\033[0m" (reset all styles and colors). The problem the NEEDSWORK comment talks about is that the code uses strlen() but these two sequences in the above example are 0-width as far as the terminal display width computation is concerned. And that is why we want to use utf8_strnwidth() with SKIP_ANSI bit on. > diff.c | 12 ++++-------- > t/b | 0 > t/t4052-stat-output.sh | 30 ++++++++++++++++++++++++++++++ > 3 files changed, 34 insertions(+), 8 deletions(-) > create mode 100644 t/b I doubt you meant to add a new file there ;-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [GSoC PATCH v3 0/2] diff: handle ANSI escape codes in prefix when calculating diffstat width 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari ` (2 preceding siblings ...) 2026-02-27 18:04 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 chars in prefix when calculating diffstat width Junio C Hamano @ 2026-02-27 21:43 ` LorenzoPegorari 2026-02-27 21:45 ` [GSoC PATCH v3 1/2] " LorenzoPegorari 2026-02-27 21:48 ` [GSoC PATCH v3 2/2] t4052: test for diffstat width when prefix contains ANSI escape codes LorenzoPegorari 3 siblings, 2 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 21:43 UTC (permalink / raw) To: git; +Cc: Junio C Hamano This patch aims to fix a bug where the calculation of the diffstat width incorrectly uses the strlen() of line_prefix instead of its actual display width. This patch addresses the NEEDSWORK item added by ce8529b2 (diff: leave NEEDWORK notes in show_stats() function, 2022-10-21). Thanks Junio for you guidance. V3 DIFF: * Changed references from "UTF-8 char" to "ANSI escape codes" * Removed mistakenly added empty file * Slightly improved the test script comment LorenzoPegorari (2): diff: handle ANSI escape codes in prefix when calculating diffstat width t4052: test for diffstat width when prefix contains ANSI escape codes diff.c | 12 ++++-------- t/t4052-stat-output.sh | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+), 8 deletions(-) Range-diff against v2: 1: 9e8161a700 ! 1: f75d6d779e diff: handle UTF-8 chars in prefix when calculating diffstat width @@ Metadata Author: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> ## Commit message ## - diff: handle UTF-8 chars in prefix when calculating diffstat width + diff: handle ANSI escape codes in prefix when calculating diffstat width The diffstat width is calculated by taking the terminal width and incorrectly subtracting the `strlen()` of `line_prefix`, instead of the - actual display width of `line_prefix`, which may contain UTF-8 - characters (e.g., ANSI-colored strings in `log --graph --stat`). + actual display width of `line_prefix`, which may contain ANSI escape + codes (e.g., ANSI-colored strings in `log --graph --stat`). Utilize the display width instead, obtained via `utf8_strnwidth()` with the flag `skip_ansi`. @@ diff.c: static void show_stats(struct diffstat_t *data, struct diff_options *opt /* - * We have width = stat_width or term_columns() columns total. + * We have width = stat_width or term_columns() columns total minus the -+ * length of line_prefix skipping UTF-8 chars to get the display width -+ * (e.g., to skip ANSI-colored strings in "log --graph --stat"). ++ * length of line_prefix skipping ANSI escape codes to get the display ++ * width (e.g., skip ANSI-colored strings in "log --graph --stat"). * We want a maximum of min(max_len, stat_name_width) for the name part. * We want a maximum of min(max_change, stat_graph_width) for the +- part. * We also need 1 for " " and 4 + decimal_width(max_change) 2: 984fa10d72 ! 2: 1d55bff06e t4052: add test for diffstat width when prefix contains UTF-8 chars @@ Metadata Author: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> ## Commit message ## - t4052: add test for diffstat width when prefix contains UTF-8 chars + t4052: test for diffstat width when prefix contains ANSI escape codes Add test checking the calculation of the diffstat display width when the `line_prefix`, which is text that goes before the diffstat, contains - UTF-8 characters. + ANSI escape codes. This situation happens, for example, when `git log --stat --graph` is executed: * `--stat` will create a diffstat for each commit * `--graph` will stuff `line_prefix` with the graph portion of the log, - which contains UTF-8 characters (ANSI escape codes to color the text) + which contains ANSI escape codes to color the text Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> - ## t/b (new) ## - ## t/t4052-stat-output.sh ## @@ t/t4052-stat-output.sh: test_expect_success 'merge --stat respects COLUMNS with long name' ' test_cmp expect actual ' -+# git-log will print only 1 commit containing a single branch graph and a diffstat. ++# We want git-log to print only 1 commit containing a single branch graph and a ++# diffstat (the diffstat display width, when not manually set through the ++# option "--stat-width", will be automatically calculated). +# The diffstat will be only one file, with a placeholder FILENAME, that, with +# enough terminal display width, will contain the following line: +# "<RED>|<RESET> ${FILENAME} | 0" @@ t/t4052-stat-output.sh: test_expect_success 'merge --stat respects COLUMNS with +# FILENAME in the diffstat will not be shortened, we take the FILENAME length +# and add 9 to it. +# To check if the diffstat width, when the line_prefix (the "<RED>|<RESET>" of -+# the graph) contains UTF-8 characters (the ANSI escape codes), is calculated -+# correctly, we: ++# the graph) contains ANSI escape codes (the ANSI escape codes to color the ++# text), is calculated correctly, we: +# 1. check if it contains the line defined before when using MIN_TERM_WIDTH +# 2. check if it contains the line defined before, but with the FILENAME +# shortened by only one character, when using MIN_TERM_WIDTH - 1 + -+test_expect_success 'diffstat where line_prefix contains UTF-8 chars is correct width' ' ++test_expect_success 'diffstat where line_prefix contains ANSI escape codes is correct width' ' + FILENAME="placeholder-text-placeholder-text" && + FILENAME_TRIMMED="...eholder-text-placeholder-text" && + MIN_TERM_WIDTH=$((${#FILENAME} + 9)) && -- 2.43.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [GSoC PATCH v3 1/2] diff: handle ANSI escape codes in prefix when calculating diffstat width 2026-02-27 21:43 ` [GSoC PATCH v3 0/2] diff: handle ANSI escape codes " LorenzoPegorari @ 2026-02-27 21:45 ` LorenzoPegorari 2026-02-27 21:48 ` [GSoC PATCH v3 2/2] t4052: test for diffstat width when prefix contains ANSI escape codes LorenzoPegorari 1 sibling, 0 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 21:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano The diffstat width is calculated by taking the terminal width and incorrectly subtracting the `strlen()` of `line_prefix`, instead of the actual display width of `line_prefix`, which may contain ANSI escape codes (e.g., ANSI-colored strings in `log --graph --stat`). Utilize the display width instead, obtained via `utf8_strnwidth()` with the flag `skip_ansi`. Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- diff.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 35b903a9a0..6eaf40fe2e 100644 --- a/diff.c +++ b/diff.c @@ -2749,7 +2749,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) count = i; /* where we can stop scanning in data->files[] */ /* - * We have width = stat_width or term_columns() columns total. + * We have width = stat_width or term_columns() columns total minus the + * length of line_prefix skipping ANSI escape codes to get the display + * width (e.g., skip ANSI-colored strings in "log --graph --stat"). * We want a maximum of min(max_len, stat_name_width) for the name part. * We want a maximum of min(max_change, stat_graph_width) for the +- part. * We also need 1 for " " and 4 + decimal_width(max_change) @@ -2776,14 +2778,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) * separators and this message, this message will "overflow" * making the line longer than the maximum width. */ - - /* - * NEEDSWORK: line_prefix is often used for "log --graph" output - * and contains ANSI-colored string. utf8_strnwidth() should be - * used to correctly count the display width instead of strlen(). - */ if (options->stat_width == -1) - width = term_columns() - strlen(line_prefix); + width = term_columns() - utf8_strnwidth(line_prefix, strlen(line_prefix), 1); else width = options->stat_width ? options->stat_width : 80; number_width = decimal_width(max_change) > number_width ? -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [GSoC PATCH v3 2/2] t4052: test for diffstat width when prefix contains ANSI escape codes 2026-02-27 21:43 ` [GSoC PATCH v3 0/2] diff: handle ANSI escape codes " LorenzoPegorari 2026-02-27 21:45 ` [GSoC PATCH v3 1/2] " LorenzoPegorari @ 2026-02-27 21:48 ` LorenzoPegorari 1 sibling, 0 replies; 14+ messages in thread From: LorenzoPegorari @ 2026-02-27 21:48 UTC (permalink / raw) To: git; +Cc: Junio C Hamano Add test checking the calculation of the diffstat display width when the `line_prefix`, which is text that goes before the diffstat, contains ANSI escape codes. This situation happens, for example, when `git log --stat --graph` is executed: * `--stat` will create a diffstat for each commit * `--graph` will stuff `line_prefix` with the graph portion of the log, which contains ANSI escape codes to color the text Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com> --- t/t4052-stat-output.sh | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh index 740bb97091..7c749062e2 100755 --- a/t/t4052-stat-output.sh +++ b/t/t4052-stat-output.sh @@ -413,4 +413,36 @@ test_expect_success 'merge --stat respects COLUMNS with long name' ' test_cmp expect actual ' +# We want git-log to print only 1 commit containing a single branch graph and a +# diffstat (the diffstat display width, when not manually set through the +# option "--stat-width", will be automatically calculated). +# The diffstat will be only one file, with a placeholder FILENAME, that, with +# enough terminal display width, will contain the following line: +# "<RED>|<RESET> ${FILENAME} | 0" +# where "<RED>" and "<RESET>" are ANSI escape codes to color the text. +# To calculate the minimium terminal display width MIN_TERM_WIDTH so that the +# FILENAME in the diffstat will not be shortened, we take the FILENAME length +# and add 9 to it. +# To check if the diffstat width, when the line_prefix (the "<RED>|<RESET>" of +# the graph) contains ANSI escape codes (the ANSI escape codes to color the +# text), is calculated correctly, we: +# 1. check if it contains the line defined before when using MIN_TERM_WIDTH +# 2. check if it contains the line defined before, but with the FILENAME +# shortened by only one character, when using MIN_TERM_WIDTH - 1 + +test_expect_success 'diffstat where line_prefix contains ANSI escape codes is correct width' ' + FILENAME="placeholder-text-placeholder-text" && + FILENAME_TRIMMED="...eholder-text-placeholder-text" && + MIN_TERM_WIDTH=$((${#FILENAME} + 9)) && + test_config color.diff always && + git commit --allow-empty --allow-empty-message && + >${FILENAME} && + git add ${FILENAME} && + git commit --allow-empty-message && + COLUMNS=$((MIN_TERM_WIDTH)) git log --graph --stat -n1 | test_decode_color >out && + test_grep "<RED>|<RESET> ${FILENAME} | 0" out && + COLUMNS=$((MIN_TERM_WIDTH - 1)) git log --graph --stat -n1 | test_decode_color >out && + test_grep "<RED>|<RESET> ${FILENAME_TRIMMED} | 0" out +' + test_done -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-02-27 21:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-24 1:09 [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width LorenzoPegorari 2026-02-24 1:11 ` [GSoC PATCH 1/2] " LorenzoPegorari 2026-02-24 1:20 ` [GSoC PATCH 2/2] t4074: add test for diffstat width when prefix contains ANSI chars LorenzoPegorari 2026-02-24 5:43 ` Junio C Hamano 2026-02-25 2:18 ` Lorenzo Pegorari 2026-02-24 5:17 ` [GSoC PATCH 0/2] diff: handle ANSI chars in prefix when calculating diffstat width Junio C Hamano 2026-02-27 16:01 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 " LorenzoPegorari 2026-02-27 16:04 ` [GSoC PATCH v2 1/2] " LorenzoPegorari 2026-02-27 16:08 ` [GSoC PATCH v2 2/2] t4052: add test for diffstat width when prefix contains UTF-8 chars LorenzoPegorari 2026-02-27 18:08 ` Junio C Hamano 2026-02-27 18:04 ` [GSoC PATCH v2 0/2] diff: handle UTF-8 chars in prefix when calculating diffstat width Junio C Hamano 2026-02-27 21:43 ` [GSoC PATCH v3 0/2] diff: handle ANSI escape codes " LorenzoPegorari 2026-02-27 21:45 ` [GSoC PATCH v3 1/2] " LorenzoPegorari 2026-02-27 21:48 ` [GSoC PATCH v3 2/2] t4052: test for diffstat width when prefix contains ANSI escape codes LorenzoPegorari
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox