From: Jeff King <peff@peff.net>
To: "Yi, EungJun" <semtlenori@gmail.com>
Cc: "Kyle J. McKay" <mackyle@gmail.com>, Git List <git@vger.kernel.org>
Subject: Re: [PATCH] diff-highlight: Fix broken multibyte string
Date: Fri, 3 Apr 2015 18:08:22 -0400
Message-ID: <20150403220821.GB11220@peff.net>
In-Reply-To: <CAFT+Tg8-tUBAvgX1bTni7joye_ZuZ_NOT_mmamnnm5GdWzEhrg@mail.gmail.com>

On Fri, Apr 03, 2015 at 11:19:24AM +0900, Yi, EungJun wrote:
> > I timed this one versus the existing diff-highlight. It's about 7%
> > slower. That's not great, but is acceptable to me. The String::Multibyte
> > version was a lot faster, which was nice (but I'm still unclear on
> > _why_).
>
> I think the reason is here:
>
> > sub split_line {
> > 	local $_ = shift;
> > 	return map { /$COLOR/ ? $_ : ($mbcs ? $mbcs->strsplit('', $_) : split //) }
> > 	       split /($COLOR)/;
> > }
>
> I removed "*" from "split /($COLOR*)/". Actually I don't know why "*"
> was required but I need to remove it to make my patch works correctly.

Ah, OK, that makes more sense. The "*" was meant to handle the case of
multiple groups of ANSI colors in a row. But I think it should have been
"+" in that case, as we would otherwise split on the empty field, which
would mean character-by-character. And the second "split" in the map
would then be superfluous, which would break your patch (we've already
split the multi-byte characters before we even hit $mbcs->strsplit).

Kyle's patch does not care, because it tweaks the string so that normal
split works. Which means there is an easy speedup here. :)

Doing:

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 08c88bb..1c4b599 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -165,7 +165,7 @@ sub highlight_pair {
 sub split_line {
 	local $_ = shift;
 	return map { /$COLOR/ ? $_ : (split //) }
-	       split /($COLOR*)/;
+	       split /($COLOR+)/;
 }
 
 sub highlight_line {

gives me a 25% speed improvement, and produces the same output when
processing git.git's entire "git log -p" output.
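
As a rough illustration of the "*" versus "+" difference on a tiny input,
here is a minimal sketch. It is not part of the patch: the $COLOR
definition is assumed to match the one in diff-highlight, and the helper
names first_split_star, first_split_plus and explode are made up for the
comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: matches one ANSI color escape, e.g. "\e[31m" or "\e[m".
my $COLOR = qr/\x1b\[[0-9;]*m/;

# Old first-stage split: "*" lets the separator match the empty string,
# so split also cuts between ordinary characters.
sub first_split_star {
	my ($line) = @_;
	return split /($COLOR*)/, $line;
}

# Patched first-stage split: "+" only matches real runs of color codes.
sub first_split_plus {
	my ($line) = @_;
	return split /($COLOR+)/, $line;
}

# The second stage, which is the same before and after the patch:
# keep color chunks whole, break everything else into characters.
sub explode {
	return map { /$COLOR/ ? $_ : (split //) } @_;
}

my $line = "\x1b[31m-foo\x1b[m";
my @star = first_split_star($line);
my @plus = first_split_plus($line);

printf "fields after first split: star=%d plus=%d\n",
	scalar @star, scalar @plus;
print "final token lists are identical\n"
	if join("\0", explode(@star)) eq join("\0", explode(@plus));
```

With "+", the text between color codes reaches the map as one chunk, so
the inner "split //" does the per-character work once instead of the
outer split doing it with a more expensive pattern.
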

I thought that meant we could also optimize out the "map" call entirely,
and just use the first split (with "*") to end up with a list of $COLOR
chunks and single characters, but it does not seem to work. So maybe I
am misreading something about what is going on.
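
One possible explanation, sketched below and not checked against
diff-highlight's real input: because the pattern contains a capture
group, each zero-width separator match also adds an empty string to the
returned list, and a color code at the very start of the line produces a
leading empty field. The map would silently discard those, since
"split //" on an empty string returns nothing. The $COLOR definition and
the sample line here are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: same single-escape pattern as in diff-highlight.
my $COLOR = qr/\x1b\[[0-9;]*m/;

my $line = "\x1b[32m+ab\x1b[m";

# First split only, as in the "drop the map" experiment.
my @raw = split /($COLOR*)/, $line;

# What split_line returns once the map has run over that list.
my @tokens = map { /$COLOR/ ? $_ : (split //) } @raw;

# Count the empty fields that the bare split leaves behind.
my $empties = grep { $_ eq '' } @raw;
printf "raw fields: %d (%d empty), tokens after map: %d\n",
	scalar @raw, $empties, scalar @tokens;

# Dropping the empty fields by hand recovers the map's output.
my @filtered = grep { length } @raw;
print "filtered raw list matches the map output\n"
	if join("\0", @filtered) eq join("\0", @tokens);
```

If that is what is happening, the bare split would need something like a
"grep { length }" around it, which is no simpler than the existing map.
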

-Peff