All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Kyle J. McKay" <mackyle@gmail.com>
Cc: Yi EungJun <semtlenori@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH v3] diff-highlight: do not split multibyte characters
Date: Sat, 4 Apr 2015 10:09:02 -0400	[thread overview]
Message-ID: <20150404140902.GA25455@peff.net> (raw)
In-Reply-To: <6a8dcc870e53040e1f54d7c36a1b33a@74d39fa044aa309eaea14b9f57fe79c>

On Fri, Apr 03, 2015 at 03:15:14PM -0700, Kyle J. McKay wrote:

> When the input is UTF-8 and Perl is operating on bytes instead of
> characters, a diff that changes one multibyte character to another
> that shares an initial byte sequence will result in a broken diff
> display as the common byte sequence prefix will be separated from
> the rest of the bytes in the multibyte character.
> 
> For example, if a single line contains only the unicode character
> U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then
> changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC,
> 0xA7, 0x80), when operating on bytes diff-highlight will show only
> the single byte change from 0x84 to 0x80 thus creating invalid UTF-8
> and a broken diff display.
> 
> Fix this by putting Perl into character mode when splitting the line
> and then back into byte mode after the split is finished.
> 
> The utf8::xxx functions require Perl 5.8 so we require that as well.
> 
> Also, since we are mucking with code in the split_line function, we
> change a '*' quantifier to a '+' quantifier when matching the $COLOR
> expression which has the side effect of speeding everything up while
> eliminating useless '' elements in the returned array.
> 
> Reported-by: Yi EungJun <semtlenori@gmail.com>
> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>

This version looks good to me. I looked over the diff of running "git
log -p --color" on git.git through diff-highlight before and after this
patch, and everything looks like an improvement.

  Acked-by: Jeff King <peff@peff.net>

Thanks both of you for working on this.

-Peff

  reply	other threads:[~2015-04-04 14:09 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-30 15:55 [PATCH] diff-highlight: Fix broken multibyte string Yi EungJun
2015-03-30 22:16 ` Jeff King
2015-04-03  0:49   ` Kyle J. McKay
2015-04-03  1:24     ` Jeff King
2015-04-03  1:59       ` Kyle J. McKay
2015-04-03 21:47         ` Jeff King
2015-04-03  2:19       ` Yi, EungJun
2015-04-03 22:08         ` Jeff King
2015-04-03 22:24           ` Kyle J. McKay
2015-04-04 14:10             ` Jeff King
2015-04-03 22:15         ` [PATCH v3] diff-highlight: do not split multibyte characters Kyle J. McKay
2015-04-04 14:09           ` Jeff King [this message]
2015-04-04 14:47             ` Yi, EungJun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150404140902.GA25455@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mackyle@gmail.com \
    --cc=semtlenori@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.