From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Tue, 14 Feb 2012 19:23:40 +0100 [thread overview]
Message-ID: <20120214192340.2d473866@gmail.com> (raw)
In-Reply-To: <201202141831.59699.jnareb@gmail.com>
Jakub Narebski <jnareb@gmail.com> wrote:
> A few issues I have just noticed about this series.
>
>
> First, about naming. "Highlighting interesting parts of diff" is
> acceptable name, but "syntax highlighting for diff" is not: gitweb
> already use syntax highlighting in diff views. Call it "diff refinement
> highlighting", "highlighting changes / changed sections", "intraline
> highlighting".
OK, I'll make sure the naming is correct and consistent.
>
>
> Second, I think (but I am not sure) that there is a bug in code that
> finds common suffix and prefix.
>
> If I understand correctly the idea is to highlight changed part if
> there is at least one of common non-whitespace suffix or prefix.
> So syntax highlighting should look like this:
>
> 1. Both prefix and suffix are non empty and non whitespace only
>
> -foo -{bar} baz
> +foo +{quux} baz
>
> 2. Non empty and non whitespace only prefix
>
> -foo -{bar}
> +foo +{quux}
>
> 2. Non empty and non whitespace only suffix
>
> --{bar} baz
> ++{quux} baz
>
> But in your code $prefix is not the length of common prefix, but
> the position of end of prefix in the original line of diff. So
> you start with $prefix = 1... even though the prefix is empty.
>
> How is that supposed to work?
But see the check:
+ # Mark lines that are different from each other, but have some common
+ # part that isn't whitespace. If lines are completely different, don't
+ # mark them because that would make output unreadable, especially if
+ # diff consists of multiple lines.
+ if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
^
|
This part.
+ || ($prefix_is_space && $suffix_is_space)) {
+ $esc_rem = esc_html($rem);
+ $esc_add = esc_html($add);
+ } else {
+ $esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
+ $esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
+ }
I guess it's still not correct because it should be equal to number of
parents or $prefix_length should start from 0 like you wrote later.
>
>
> On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
>
> > > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > > these ones:
> > > >
> > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > > http://trac.edgewall.org/changeset/10973
> [...]
> > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > use for diff refinement highlighting (highlighting changed portions of
> > > diff).
> > >
> >
> > I think it's
> > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > (see markup intraline_changes()).
>
> It is get_change_extent() that finds extent of changes, as a pair
> containing the offset at which the changes start, and the negative offset
> at which the changes end. So it is the same solution you use, only
> without ignoring whitespace-only prefixes and suffixes... This code can
> be easily ported to Perl, BTW.
>
> The markup_intraline_changes() function compares lines from preimage and
> from postimage pairwise, requiring that number of lines matches, the same
> like in your algorithm.
>
So using Jeff's diff-highlight we remain quite consistent with Trac
output. There's nothing we can "steal" from it.
> > > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > >
> > > > - * I like you.
> > > > + * Alice had a little lamb.
> > >
> > > But this patch doesn't implement this feature yet, isn't it?
> >
> > No, but is a matter of adding
> >
> > -$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > +$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> >
> > (and the same for suffix)
>
> All right. But it is better added as separate patch.
Sure
> Perhaps even
> requiring that not only there is at least one of common prefix or common
> suffix, but at least one of them is not whitespace only could be put
> in a separate commit...
>
> > > > > > +sub format_rem_add_line {
> > > > > > + my ($rem, $add) = @_;
> > > > > > + my @r = split(//, $rem);
> > > > > > + my @a = split(//, $add);
> > >
> > > BTW the name of variable can be just @add and @rem.
> > >
> >
> > I know they are different scopes but I don't like it. It makes the code
> > more confusing IMO. But I won't insist.
>
> In my opinion if the variable refers to the same entity in different
> forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
> used here) is all right, and even better than trying to come up with
> different name for the same thing because of sigil.
OK. I'll follow that convention then.
>
> > > > > Shouldn't
> > > > > $prefix / $prefix_len start from 0, not from 1?
> > > >
> > > > It starts from 1 because it skips first +/-. It should become obvious
> > > > after reading the comment from last patch :).
>
> This means that $prefix is true even if prefix is empty ($prefix == 1).
> Wouldn't it be better for $prefix_len to count length of true prefix,
> without diff adornment? Or make @r / @rem skip initial characters...
>
> > > > + # In combined diff we must ignore two +/- characters.
> > > > + $prefix = 2 if ($is_combined);
> > >
> > > Anyway comment about that fact would be nice.
> >
> > Will do.
>
> BTW. it is not "2" but "scalar @{$co{'parents'}}".
>
OK.
>
> > > > The splitting and comparing by characters is taken from diff-highlight.
> > > > I don't think it's worth changing here.
> > >
> > > You are right.
> > >
> > > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > > and benchmark it comparing to your C-like solution, but it can be left for
> > > later (simple is better than clever, usually).
> >
> > If you have time :).
>
> Anyway it would be separate commit. Better to just copy tested code
> from contrib/diff-highlight
>
> BTW. would "git blame -C -C -C -w" detect this correctly as code
> movement^W copying?
>
Cannot say. Have you considered what I wrote in a separate e-mail,
about using diff-highlight output directly / as a library?
next prev parent reply other threads:[~2012-02-14 18:23 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20 ` Jakub Narebski
2012-02-11 23:03 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53 ` Jakub Narebski
2012-02-11 23:16 ` Michał Kiedrowicz
2012-02-25 9:00 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02 ` Jakub Narebski
2012-02-10 9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29 ` Jakub Narebski
2012-02-11 23:20 ` Michał Kiedrowicz
2012-02-11 23:30 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14 ` Jakub Narebski
2012-02-11 23:38 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23 ` Jakub Narebski
2012-02-10 14:15 ` Michał Kiedrowicz
2012-02-10 14:55 ` Jakub Narebski
2012-02-10 17:33 ` Michał Kiedrowicz
2012-02-10 22:52 ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14 6:54 ` Michal Kiedrowicz
2012-02-14 7:14 ` Junio C Hamano
2012-02-14 8:20 ` Jeff King
2012-02-10 20:20 ` Jeff King
2012-02-10 21:29 ` Michał Kiedrowicz
2012-02-10 21:32 ` Jeff King
2012-02-10 21:36 ` Michał Kiedrowicz
2012-02-10 21:47 ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27 ` Jeff King
2012-02-13 22:28 ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32 ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14 6:35 ` Michal Kiedrowicz
2012-02-13 22:33 ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36 ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37 ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14 6:48 ` Michal Kiedrowicz
2012-02-14 0:05 ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14 0:22 ` Jeff King
2012-02-14 1:19 ` Junio C Hamano
2012-02-14 6:04 ` Jeff King
2012-02-14 6:28 ` Michal Kiedrowicz
2012-02-10 21:56 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45 ` Jakub Narebski
2012-02-12 10:42 ` Jakub Narebski
2012-02-13 6:54 ` Michal Kiedrowicz
2012-02-13 19:58 ` Jakub Narebski
2012-02-13 21:10 ` Michał Kiedrowicz
2012-02-13 6:41 ` Michal Kiedrowicz
2012-02-13 18:44 ` Jakub Narebski
2012-02-13 21:09 ` Michał Kiedrowicz
2012-02-14 17:31 ` Jakub Narebski
2012-02-14 18:23 ` Michał Kiedrowicz [this message]
2012-02-14 18:52 ` Jeff King
2012-02-14 20:04 ` Michał Kiedrowicz
2012-02-14 20:38 ` Jeff King
2012-02-10 9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12 0:11 ` Jakub Narebski
2012-02-13 6:46 ` Michal Kiedrowicz
2012-02-10 9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12 0:03 ` Jakub Narebski
2012-02-13 6:48 ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56 ` Michał Kiedrowicz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120214192340.2d473866@gmail.com \
--to=michal.kiedrowicz@gmail.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.