From: Jakub Narebski <jnareb@gmail.com>
To: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Tue, 14 Feb 2012 18:31:58 +0100 [thread overview]
Message-ID: <201202141831.59699.jnareb@gmail.com> (raw)
In-Reply-To: <20120213220917.4cf14eb1@gmail.com>
A few issues I have just noticed about this series.
First, about naming. "Highlighting interesting parts of diff" is
acceptable name, but "syntax highlighting for diff" is not: gitweb
already use syntax highlighting in diff views. Call it "diff refinement
highlighting", "highlighting changes / changed sections", "intraline
highlighting".
Second, I think (but I am not sure) that there is a bug in code that
finds common suffix and prefix.
If I understand correctly the idea is to highlight changed part if
there is at least one of common non-whitespace suffix or prefix.
So syntax highlighting should look like this:
1. Both prefix and suffix are non empty and non whitespace only
-foo -{bar} baz
+foo +{quux} baz
2. Non empty and non whitespace only prefix
-foo -{bar}
+foo +{quux}
2. Non empty and non whitespace only suffix
--{bar} baz
++{quux} baz
But in your code $prefix is not the length of common prefix, but
the position of end of prefix in the original line of diff. So
you start with $prefix = 1... even though the prefix is empty.
How is that supposed to work?
On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > these ones:
> > >
> > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > http://trac.edgewall.org/changeset/10973
[...]
> > BTW GitHub is closed source, but we can check what algorithm does Trac
> > use for diff refinement highlighting (highlighting changed portions of
> > diff).
> >
>
> I think it's
> http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> (see markup intraline_changes()).
It is get_change_extent() that finds extent of changes, as a pair
containing the offset at which the changes start, and the negative offset
at which the changes end. So it is the same solution you use, only
without ignoring whitespace-only prefixes and suffixes... This code can
be easily ported to Perl, BTW.
The markup_intraline_changes() function compares lines from preimage and
from postimage pairwise, requiring that number of lines matches, the same
like in your algorithm.
[...]
> > I wonder if we can use --diff-words for diff refinement highlighting,
> > i.e. LCS on words.
>
> I think we can try it, but I worry about performance of running `git
> diff` on every diff chunk.
I was thinking about one single additional run of git-diff-tree with
`--diff-words`, not one per chunk.
Or perhaps even put it together in one git-diff-tree invocation, just like
'commitdiff' action / git_commitdiff() subroutine uses single git-diff-tree
invocation, with the option "--patch-with-raw", to generate both raw diff
for difftree and patchset.
> > Anyway Jeff's approach is a bit limited, in that it would work only for
> > change that does not involve adding newlines, for example splitting
> > overly long line when changing something.
> >
> > See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
>
> Yes, I'm aware of that. I was thinking about improving it later ("Let's
> start with a simple refinment highlightning and maybe later add more
> sophisticated algorithms").
Right.
[...]
> > BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> > and suffix are non-empty"?
> >
>
> At least one. See:
>
> -a = 42;
> +b = 42;
>
> Here prefix is empty but suffix is not.
Nb. prefix is empty but $prefix == 1, and is boolean true.
> > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > >
> > > - * I like you.
> > > + * Alice had a little lamb.
> >
> > But this patch doesn't implement this feature yet, isn't it?
>
> No, but is a matter of adding
>
> -$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> +$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
>
> (and the same for suffix)
All right. But it is better added as separate patch. Perhaps even
requiring that not only there is at least one of common prefix or common
suffix, but at least one of them is not whitespace only could be put
in a separate commit...
> > > > > +sub format_rem_add_line {
> > > > > + my ($rem, $add) = @_;
> > > > > + my @r = split(//, $rem);
> > > > > + my @a = split(//, $add);
> >
> > BTW the name of variable can be just @add and @rem.
> >
>
> I know they are different scopes but I don't like it. It makes the code
> more confusing IMO. But I won't insist.
In my opinion if the variable refers to the same entity in different
forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
used here) is all right, and even better than trying to come up with
different name for the same thing because of sigil.
> > > > Shouldn't
> > > > $prefix / $prefix_len start from 0, not from 1?
> > >
> > > It starts from 1 because it skips first +/-. It should become obvious
> > > after reading the comment from last patch :).
This means that $prefix is true even if prefix is empty ($prefix == 1).
Wouldn't it be better for $prefix_len to count length of true prefix,
without diff adornment? Or make @r / @rem skip initial characters...
> > > + # In combined diff we must ignore two +/- characters.
> > > + $prefix = 2 if ($is_combined);
> >
> > Anyway comment about that fact would be nice.
>
> Will do.
BTW. it is not "2" but "scalar @{$co{'parents'}}".
> > > The splitting and comparing by characters is taken from diff-highlight.
> > > I don't think it's worth changing here.
> >
> > You are right.
> >
> > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > and benchmark it comparing to your C-like solution, but it can be left for
> > later (simple is better than clever, usually).
>
> If you have time :).
Anyway it would be separate commit. Better to just copy tested code
from contrib/diff-highlight
BTW. would "git blame -C -C -C -w" detect this correctly as code
movement^W copying?
--
Jakub Narebski
Poland
next prev parent reply other threads:[~2012-02-14 17:32 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20 ` Jakub Narebski
2012-02-11 23:03 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53 ` Jakub Narebski
2012-02-11 23:16 ` Michał Kiedrowicz
2012-02-25 9:00 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02 ` Jakub Narebski
2012-02-10 9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29 ` Jakub Narebski
2012-02-11 23:20 ` Michał Kiedrowicz
2012-02-11 23:30 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14 ` Jakub Narebski
2012-02-11 23:38 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23 ` Jakub Narebski
2012-02-10 14:15 ` Michał Kiedrowicz
2012-02-10 14:55 ` Jakub Narebski
2012-02-10 17:33 ` Michał Kiedrowicz
2012-02-10 22:52 ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14 6:54 ` Michal Kiedrowicz
2012-02-14 7:14 ` Junio C Hamano
2012-02-14 8:20 ` Jeff King
2012-02-10 20:20 ` Jeff King
2012-02-10 21:29 ` Michał Kiedrowicz
2012-02-10 21:32 ` Jeff King
2012-02-10 21:36 ` Michał Kiedrowicz
2012-02-10 21:47 ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27 ` Jeff King
2012-02-13 22:28 ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32 ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14 6:35 ` Michal Kiedrowicz
2012-02-13 22:33 ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36 ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37 ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14 6:48 ` Michal Kiedrowicz
2012-02-14 0:05 ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14 0:22 ` Jeff King
2012-02-14 1:19 ` Junio C Hamano
2012-02-14 6:04 ` Jeff King
2012-02-14 6:28 ` Michal Kiedrowicz
2012-02-10 21:56 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45 ` Jakub Narebski
2012-02-12 10:42 ` Jakub Narebski
2012-02-13 6:54 ` Michal Kiedrowicz
2012-02-13 19:58 ` Jakub Narebski
2012-02-13 21:10 ` Michał Kiedrowicz
2012-02-13 6:41 ` Michal Kiedrowicz
2012-02-13 18:44 ` Jakub Narebski
2012-02-13 21:09 ` Michał Kiedrowicz
2012-02-14 17:31 ` Jakub Narebski [this message]
2012-02-14 18:23 ` Michał Kiedrowicz
2012-02-14 18:52 ` Jeff King
2012-02-14 20:04 ` Michał Kiedrowicz
2012-02-14 20:38 ` Jeff King
2012-02-10 9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12 0:11 ` Jakub Narebski
2012-02-13 6:46 ` Michal Kiedrowicz
2012-02-10 9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12 0:03 ` Jakub Narebski
2012-02-13 6:48 ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56 ` Michał Kiedrowicz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201202141831.59699.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=michal.kiedrowicz@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).