From: Junio C Hamano <gitster@pobox.com>
To: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
Date: Wed, 14 Jan 2026 14:50:02 -0800
Message-ID: <xmqqikd3ermt.fsf@gitster.g>
In-Reply-To: <aWgYRkv-YsuekdR_@lorenzo-VM> (LorenzoPegorari's message of "Wed, 14 Jan 2026 23:27:18 +0100")

LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:

> The `show_stats()` function tries to scale the filenames in the diffstat to
> ensure they don't exceed the given `name-width`. It does so by calculating
> the "display width" of the characters to be dropped, but then advances the
> filename pointer by that number of bytes.
>
> However, the "display width" of a character is not always equal to its byte
> count. The result is that sometimes, when displaying UTF-8 characters,
> filenames exceed the given `name-width`, and frequently the bytes of the
> UTF-8 characters are truncated.
>
> The following is an example of the issue, where the 2 files are "HelloHi" and
> "Hello你好", and `name-width=6`:
>
>     ...oHi | 0
>     ...<BD><A0>好 | 0
>
> Make the filename pointer move by the actual number of bytes of the
> characters to drop from the filename, rather than their display width, using
> the `utf8_width()` function.
>
> Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> ---
>  diff.c | 15 ++++-----------
>  1 file changed, 4 insertions(+), 11 deletions(-)

Two comments and a half.

 * The change needed for this is surprisingly simple.

 * You already know about samples that exhibit the issue you are
   addressing.  Can we add them as a test case somewhere in the t/
   directory?

 * The NEEDSWORK item addressed by this patch is one of the two
   NEEDSWORK items added by ce8529b2 (diff: leave NEEDWORK notes in
   show_stats() function, 2022-10-21).  Makes me wonder how involved
   the changes would need to be to solve the other one?

Thanks.


> diff --git a/diff.c b/diff.c
> index a68ddd2168..271ace5728 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2859,17 +2859,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  			char *slash;
>  			prefix = "...";
>  			len -= 3;
> -			/*
> -			 * NEEDSWORK: (name_len - len) counts the display
> -			 * width, which would be shorter than the byte
> -			 * length of the corresponding substring.
> -			 * Advancing "name" by that number of bytes does
> -			 * *NOT* skip over that many columns, so it is
> -			 * very likely that chomping the pathname at the
> -			 * slash we will find starting from "name" will
> -			 * leave the resulting string still too long.
> -			 */
> -			name += name_len - len;
> +
> +			while (name_len > len)
> +				name_len -= utf8_width((const char**)&name, NULL);
> +
>  			slash = strchr(name, '/');
>  			if (slash)
>  				name = slash;
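
For readers following the arithmetic, here is a minimal standalone sketch of
the difference between the old and the new pointer advance.  It is not the
code in diff.c: toy_utf8_width() is a hypothetical, heavily simplified
stand-in for git's utf8_width() (it assumes valid UTF-8 and treats every
3- and 4-byte sequence as two columns wide), and the other helper names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for git's utf8_width(): consume one character
 * at *s, advance *s past it, and return its display width.
 * Assumption: input is valid UTF-8; 1- and 2-byte sequences take one
 * column, 3- and 4-byte sequences take two (a rough CJK approximation).
 */
static int toy_utf8_width(const char **s)
{
	unsigned char c = (unsigned char)**s;
	int bytes, cols, i;

	if (c < 0x80) {
		bytes = 1; cols = 1;
	} else if ((c & 0xE0) == 0xC0) {
		bytes = 2; cols = 1;
	} else if ((c & 0xF0) == 0xE0) {
		bytes = 3; cols = 2;
	} else {
		bytes = 4; cols = 2;
	}
	/* Never step past the terminating NUL. */
	for (i = 0; i < bytes && **s; i++)
		(*s)++;
	return cols;
}

/* Display width of a whole NUL-terminated string. */
static int toy_display_width(const char *s)
{
	int w = 0;

	while (*s)
		w += toy_utf8_width(&s);
	return w;
}

/* Old behaviour: treat the excess column count as a byte count. */
static const char *skip_columns_as_bytes(const char *name, int name_len, int len)
{
	return name + (name_len - len);
}

/* New behaviour: drop whole characters until the rest fits in len columns. */
static const char *skip_whole_chars(const char *name, int name_len, int len)
{
	assert(name != NULL);
	while (name_len > len && *name)
		name_len -= toy_utf8_width(&name);
	return name;
}
```

With "Hello你好" (display width 9) and name-width=6, len becomes 3 after
reserving the "..." prefix.  The old advance skips 6 bytes and lands on the
second byte of "你", which is the <BD><A0> garbage shown in the commit
message; the character-wise loop stops at "好", which is 2 columns wide.
For the pure-ASCII "HelloHi" both approaches yield "oHi".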

Thread overview: 7+ messages
2026-01-14 22:27 [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars LorenzoPegorari
2026-01-14 22:50 ` Junio C Hamano [this message]
2026-01-16  0:00   ` Lorenzo Pegorari
2026-01-16  0:04 ` [GSoC PATCH v2 0/2] " LorenzoPegorari
2026-01-16  0:05   ` [GSoC PATCH v2 1/2] " LorenzoPegorari
2026-01-16  0:05   ` [GSoC PATCH v2 2/2] t4073: add test for diffstat paths length when containing " LorenzoPegorari
2026-01-17 17:52     ` Junio C Hamano