git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Tue, 14 Feb 2012 19:23:40 +0100	[thread overview]
Message-ID: <20120214192340.2d473866@gmail.com> (raw)
In-Reply-To: <201202141831.59699.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:

> A few issues I have just noticed about this series.
> 
> 
> First, about naming.  "Highlighting interesting parts of diff" is
> acceptable name, but "syntax highlighting for diff" is not: gitweb
> already use syntax highlighting in diff views.  Call it "diff refinement
> highlighting", "highlighting changes / changed sections", "intraline
> highlighting".

OK, I'll make sure the naming is correct and consistent.
> 
> 
> Second, I think (but I am not sure) that there is a bug in code that
> finds common suffix and prefix.
> 
> If I understand correctly the idea is to highlight changed part if
> there is at least one of common non-whitespace suffix or prefix.
> So syntax highlighting should look like this:
> 
> 1. Both prefix and suffix are non empty and non whitespace only
> 
>   -foo -{bar} baz
>   +foo +{quux} baz
> 
> 2. Non empty and non whitespace only prefix
> 
>   -foo -{bar}
>   +foo +{quux}
> 
> 2. Non empty and non whitespace only suffix
> 
>   --{bar} baz
>   ++{quux} baz
> 
> But in your code $prefix is not the length of common prefix, but
> the position of end of prefix in the original line of diff.  So
> you start with $prefix = 1... even though the prefix is empty.
> 
> How is that supposed to work?

But see the check:

+	# Mark lines that are different from each other, but have some common
+	# part that isn't whitespace.  If lines are completely different, don't
+	# mark them because that would make output unreadable, especially if
+	# diff consists of multiple lines.
+	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
                        ^
                        |
                        This part.

+		|| ($prefix_is_space && $suffix_is_space)) {
+		$esc_rem = esc_html($rem);
+		$esc_add = esc_html($add);
+	} else {
+		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
+		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
+	}

I guess it's still not correct because it should be equal to number of
parents or $prefix_length should start from 0 like you wrote later.

> 
> 
> On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> 
> > > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > > these ones:
> > > > 
> > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > > http://trac.edgewall.org/changeset/10973
> [...]
> > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > use for diff refinement highlighting (highlighting changed portions of
> > > diff).
> > > 
> > 
> > I think it's
> > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > (see markup intraline_changes()).
>  
> It is get_change_extent() that finds extent of changes, as a pair
> containing the offset at which the changes start, and the negative offset
> at which the changes end.  So it is the same solution you use, only
> without ignoring whitespace-only prefixes and suffixes...  This code can
> be easily ported to Perl, BTW.
> 
> The markup_intraline_changes() function compares lines from preimage and
> from postimage pairwise, requiring that number of lines matches, the same
> like in your algorithm.
> 

So using Jeff's diff-highlight we remain quite consistent with Trac
output. There's nothing we can "steal" from it. 

> > > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > > 
> > > > 	- * I like you.
> > > > 	+ * Alice had a little lamb.
> > > 
> > > But this patch doesn't implement this feature yet, isn't it?
> > 
> > No, but is a matter of adding
> > 
> > 	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > 	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> > 
> > (and the same for suffix)
> 
> All right.  But it is better added as separate patch.

Sure 

> Perhaps even
> requiring that not only there is at least one of common prefix or common
> suffix, but at least one of them is not whitespace only could be put
> in a separate commit...
> 
> > > > > > +sub format_rem_add_line {
> > > > > > +	my ($rem, $add) = @_;
> > > > > > +	my @r = split(//, $rem);
> > > > > > +	my @a = split(//, $add);
> > > 
> > > BTW the name of variable can be just @add and @rem.
> > >
> > 
> > I know they are different scopes but I don't like it. It makes the code
> > more confusing IMO. But I won't insist.
> 
> In my opinion if the variable refers to the same entity in different
> forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
> used here) is all right, and even better than trying to come up with
> different name for the same thing because of sigil.

OK. I'll follow that convention then.

>  
> > > > > Shouldn't
> > > > > $prefix / $prefix_len start from 0, not from 1?
> > > > 
> > > > It starts from 1 because it skips first +/-. It should become obvious
> > > > after reading the comment from last patch :).
> 
> This means that $prefix is true even if prefix is empty ($prefix == 1).
> Wouldn't it be better for $prefix_len to count length of true prefix,
> without diff adornment?  Or make @r / @rem skip initial characters...
> 
> > > > +	# In combined diff we must ignore two +/- characters.
> > > > +	$prefix = 2 if ($is_combined);
> > > 
> > > Anyway comment about that fact would be nice.
> > 
> > Will do.
> 
> BTW. it is not "2" but "scalar @{$co{'parents'}}".
> 

OK.

> 
> > > > The splitting and comparing by characters is taken from diff-highlight.
> > > > I don't think it's worth changing here.
> > > 
> > > You are right.
> > > 
> > > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > > and benchmark it comparing to your C-like solution, but it can be left for
> > > later (simple is better than clever, usually).
> > 
> > If you have time :).
> 
> Anyway it would be separate commit.  Better to just copy tested code
> from contrib/diff-highlight
> 
> BTW. would "git blame -C -C -C -w" detect this correctly as code
> movement^W copying?
>  

Cannot say. Have you considered what I wrote in a separate e-mail,
about using diff-highlight output directly / as a library? 

  reply	other threads:[~2012-02-14 18:23 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20   ` Jakub Narebski
2012-02-11 23:03     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53   ` Jakub Narebski
2012-02-11 23:16     ` Michał Kiedrowicz
2012-02-25  9:00     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02   ` Jakub Narebski
2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29   ` Jakub Narebski
2012-02-11 23:20     ` Michał Kiedrowicz
2012-02-11 23:30       ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14   ` Jakub Narebski
2012-02-11 23:38     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23   ` Jakub Narebski
2012-02-10 14:15     ` Michał Kiedrowicz
2012-02-10 14:55       ` Jakub Narebski
2012-02-10 17:33         ` Michał Kiedrowicz
2012-02-10 22:52           ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24         ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14  6:54     ` Michal Kiedrowicz
2012-02-14  7:14       ` Junio C Hamano
2012-02-14  8:20         ` Jeff King
2012-02-10 20:20   ` Jeff King
2012-02-10 21:29     ` Michał Kiedrowicz
2012-02-10 21:32       ` Jeff King
2012-02-10 21:36         ` Michał Kiedrowicz
2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27           ` Jeff King
2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14  6:35               ` Michal Kiedrowicz
2012-02-13 22:33             ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36             ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14  6:48               ` Michal Kiedrowicz
2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14  0:22               ` Jeff King
2012-02-14  1:19                 ` Junio C Hamano
2012-02-14  6:04                   ` Jeff King
2012-02-14  6:28             ` Michal Kiedrowicz
2012-02-10 21:56     ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45   ` Jakub Narebski
2012-02-12 10:42     ` Jakub Narebski
2012-02-13  6:54       ` Michal Kiedrowicz
2012-02-13 19:58         ` Jakub Narebski
2012-02-13 21:10           ` Michał Kiedrowicz
2012-02-13  6:41     ` Michal Kiedrowicz
2012-02-13 18:44       ` Jakub Narebski
2012-02-13 21:09         ` Michał Kiedrowicz
2012-02-14 17:31           ` Jakub Narebski
2012-02-14 18:23             ` Michał Kiedrowicz [this message]
2012-02-14 18:52               ` Jeff King
2012-02-14 20:04                 ` Michał Kiedrowicz
2012-02-14 20:38                   ` Jeff King
2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12  0:11   ` Jakub Narebski
2012-02-13  6:46     ` Michal Kiedrowicz
2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12  0:03   ` Jakub Narebski
2012-02-13  6:48     ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56   ` Michał Kiedrowicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120214192340.2d473866@gmail.com \
    --to=michal.kiedrowicz@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).