From: Lorenzo Pegorari <lorenzo.pegorari2002@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [GSoC PATCH 1/1] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
Date: Fri, 16 Jan 2026 01:00:33 +0100
Message-ID: <aWl_oeJJxtwsUyR3@lorenzo-VM>
In-Reply-To: <xmqqikd3ermt.fsf@gitster.g>

On Wed, Jan 14, 2026 at 02:50:02PM -0800, Junio C Hamano wrote:
> LorenzoPegorari <lorenzo.pegorari2002@gmail.com> writes:
> 
> > The `show_stats()` function tries to scale the filenames in the diffstat to
> > ensure they don't exceed the given `name-width`. It does so by calculating
> > the "display width" of the characters to be dropped, but then advances the
> > filename pointer by that number of bytes.
> >
> > However, the "display width" of a character is not always equal to its byte
> > count. The result is that sometimes, when displaying UTF-8 characters,
> > filenames exceed the given `name-width`, and frequently the bytes of the
> > UTF-8 characters are truncated.
> >
> > The following is an example of the issue, where the 2 files are "HelloHi" and
> > "Hello你好", and `name-width=6`:
> >
> >     ...oHi | 0
> >     ...<BD><A0>好 | 0
> >
> > Make the filename pointer move by the actual number of bytes of the
> > characters to drop from the filename, rather than their display width, using
> > the `utf8_width()` function.
> >
> > Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
> > ---
> >  diff.c | 15 ++++-----------
> >  1 file changed, 4 insertions(+), 11 deletions(-)
> 
> Two comments and a half.
> 
>  * The change needed for this is surprisingly simple.

It is indeed surprisingly simple, I agree!

>  * You already know about samples that may exhibit the issue you are
>    addressing.  Can we add it as a test case somewhere in t/
>    directory?

Yeah, we should add a test case. I will do it in the next reroll.

>  * The NEEDSWORK item addressed by this patch is one of the two
>    NEEDSWORK items added by ce8529b2 (diff: leave NEEDWORK notes in
>    show_stats() function, 2022-10-21).  Makes me wonder how involved
>    the changes would need to be to solve the other one?

Mmh, I see. I'll take a closer look, but at first glance it doesn't
seem too involved.

>
> Thanks.
>

Thank you!

> 
> > diff --git a/diff.c b/diff.c
> > index a68ddd2168..271ace5728 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -2859,17 +2859,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> >  			char *slash;
> >  			prefix = "...";
> >  			len -= 3;
> > -			/*
> > -			 * NEEDSWORK: (name_len - len) counts the display
> > -			 * width, which would be shorter than the byte
> > -			 * length of the corresponding substring.
> > -			 * Advancing "name" by that number of bytes does
> > -			 * *NOT* skip over that many columns, so it is
> > -			 * very likely that chomping the pathname at the
> > -			 * slash we will find starting from "name" will
> > -			 * leave the resulting string still too long.
> > -			 */
> > -			name += name_len - len;
> > +
> > +			while (name_len > len)
> > +				name_len -= utf8_width((const char**)&name, NULL);
> > +
> >  			slash = strchr(name, '/');
> >  			if (slash)
> >  				name = slash;
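
For anyone who wants to reproduce the "HelloHi" / "Hello你好" example
outside of Git, here is a minimal standalone sketch of the same loop.
It is not the code in diff.c: `sketch_utf8_width()` is a hypothetical
stand-in for `utf8_width()` that assumes only U+4E00..U+9FFF is
double-width, and `sketch_display_width()` and `sketch_skip_to_fit()`
are illustrative names of my own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's utf8_width(): decode one UTF-8
 * character at *s, advance *s past it, and return its display width.
 * Assumption: only CJK Unified Ideographs (U+4E00..U+9FFF) count as
 * double-width; a real width table is much larger.
 * Malformed input is consumed one byte at a time with width 1.
 */
static int sketch_utf8_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned long cp;
	int extra, i;

	if (p[0] < 0x80) {
		cp = p[0];
		extra = 0;
	} else if ((p[0] & 0xe0) == 0xc0) {
		cp = p[0] & 0x1f;
		extra = 1;
	} else if ((p[0] & 0xf0) == 0xe0) {
		cp = p[0] & 0x0f;
		extra = 2;
	} else if ((p[0] & 0xf8) == 0xf0) {
		cp = p[0] & 0x07;
		extra = 3;
	} else {
		*s += 1;
		return 1;
	}

	for (i = 1; i <= extra; i++) {
		/* A NUL or non-continuation byte ends the sequence early. */
		if ((p[i] & 0xc0) != 0x80) {
			*s += 1;
			return 1;
		}
		cp = (cp << 6) | (p[i] & 0x3f);
	}
	*s += extra + 1;
	return (cp >= 0x4e00 && cp <= 0x9fff) ? 2 : 1;
}

/* Total display width of a NUL-terminated UTF-8 string. */
static int sketch_display_width(const char *s)
{
	int w = 0;

	while (*s)
		w += sketch_utf8_width(&s);
	return w;
}

/*
 * Drop whole characters from the front of name until what remains
 * fits in max_cols columns. Same shape as the loop in the patch:
 * subtract display width, advance by the character's byte length.
 */
static const char *sketch_skip_to_fit(const char *name, int max_cols)
{
	int name_len;

	assert(name != NULL);
	name_len = sketch_display_width(name);
	while (name_len > max_cols && *name)
		name_len -= sketch_utf8_width(&name);
	return name;
}
```

With `name-width=6` and the 3 columns taken by "...", `max_cols` is 3:
"HelloHi" is cut down to "oHi", while "Hello你好" is cut down to "好"
(2 columns), since keeping "你" as well would need 4 columns.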
