From: Michal Kiedrowicz <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Mon, 13 Feb 2012 07:41:25 +0100 [thread overview]
Message-ID: <20120213074125.3e8793ad@mkiedrowicz.ivo.pl> (raw)
In-Reply-To: <m3y5s9rl3g.fsf@localhost.localdomain>
Jakub Narebski <jnareb@gmail.com> wrote:
> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
>
> > Reading diff output is sometimes very hard, even if it's colored,
> > especially if lines differ only in few characters. This is often
> > true when a commit fixes a typo or renames some variables or
> > functions.
> >
> This is certainly nice feature to have. I think most tools that
> implement side-by-side diff implement this too.
>
> > This commit teaches gitweb to highlight characters that are
> > different between old and new line. This should work in the
> > similar manner as in Trac or GitHub.
> >
> Doe you have URLs with good examples of such diff refinement
> highlighting (GNU Emacs ediff/emerge terminology) / highlighting
> differing segments (diff-highlight terminology)?
I haven't found *examples* on GitHub and Trac sites, but what about
these ones:
https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
http://trac.edgewall.org/changeset/10973
>
> > The code that compares lines is based on
> > contrib/diff-highlight/diff-highlight, except that it works with
> > multiline changes too. It also won't highlight lines that are
> > completely different because that would only make the output
> > unreadable.
> >
> So what does it look like? From the contrib/diff-highlight/README I
> guess that it finds common prefix and common suffix, and highlights
> the rest, which includes change.
Yes.
> It doesn't implement LCS / diff
> algorithm like e.g. tkdiff does for its diff refinement highlighting,
> isn't it?
>
I doesn't. I share the Jeff's opinion that:
a) Jeff's approach is "good enough"
b) LCS on bytes could be very confusing if it marked few sets of
characters.
> By completly different you mean that they do not have common prefix or
> common suffix (at least one of them), isn't it?
Yes, but I also don't highlight lines which prefix/suffix consists only
of whitespace (This is quite common). I would also consider ignoring
prefixes/suffixes with punctuation, like:
- * I like you.
+ * Alice had a little lamb.
>
> > Combined diffs are not supported but a following commit will change
> > it.
> >
> O.K.
>
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > ---
> > gitweb/gitweb.perl | 82
> > ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files
> > changed, 77 insertions(+), 5 deletions(-)
> >
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index db61553..1a5b454 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header {
> > # wrap patch (diff) line into a <div> (not to be used for diff
> > headers), # the line must be esc_html()'ed
> > sub format_diff_line {
> > - my ($line, $diff_class, $from, $to) = @_;
> > + my ($line, $diff_class) = @_;
>
> Why that change?
I think it fallout from previous patch where format_diff_line() stopped
using $from and $to. Will fix that.
>
> >
> > my $diff_classes = "diff";
> > $diff_classes .= " $diff_class" if ($diff_class);
> > @@ -4923,14 +4923,85 @@ sub print_inline_diff_lines {
> > print foreach (@$add);
> > }
> >
> > +# Highlight characters from $prefix to $suffix and escape HTML.
> > +# $str is a reference to the array of characters.
> > +sub esc_html_mark_range {
> > + my ($str, $prefix, $suffix) = @_;
> > +
> > + # Don't generate empty <span> element.
> > + if ($prefix == $suffix + 1) {
> > + return esc_html(join('', @$str), -nbsp=>1);
> > + }
> > +
> > + my $before = join('', @{$str}[0..($prefix - 1)]);
> > + my $marked = join('', @{$str}[$prefix..$suffix]);
> > + my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);
>
> Eeeeeek! First you split into letters, in caller at that, then join?
> Why not pass striung ($str suggests string not array of characters),
> and use substr instead?
>
> [Please disregard this and the next paragraph at first reading]
I will rename $str to something more self describing.
>
> > +
> > + return esc_html($before, -nbsp=>1) .
> > + $cgi->span({-class=>'marked'}, esc_html($marked,
> > -nbsp=>1)) .
> > + esc_html($after,-nbsp=>1);
> > +}
>
> Anyway I have send to git mailing list a patch series, which in one of
> patches adds esc_html_match_hl($str, $regexp) to highlight matches in
> a string. Your esc_html_mark_range(), after a generalization, could
> be used as underlying "engine".
>
> Something like this, perhaps (untested):
>
> # Highlight selected fragments of string, using given CSS class,
> # and escape HTML. It is assumed that fragments do not overlap.
> # Regions are passed as list of pairs (array references).
> sub esc_html_hl {
> my ($str, $css_class, @sel) = @_;
> return esc_html($str) unless @sel;
>
> my $out = '';
> my $pos = 0;
>
> for my $s (@sel) {
> $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
> if ($s->[0] - $pos > 0);
> $out .= $cgi->span({-class => $css_class},
> esc_html(substr($str, $s->[0],
> $s->[1] - $s->[0])));
>
> $pos = $m->[1];
> }
> $out .= esc_html(substr($str, $pos))
> if ($pos < length($str));
>
> return $out;
> }
>
> > +
> > +# Format removed and added line, mark changed part and HTML-format
> > them.
>
> You should probably ad here that this code is taken from
> diff-highlight in contrib area, isn't it?
True.
>
> > +sub format_rem_add_line {
> > + my ($rem, $add) = @_;
> > + my @r = split(//, $rem);
> > + my @a = split(//, $add);
> > + my ($esc_rem, $esc_add);
> > + my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
>
> It is not as much $prefix, as $prefix_len, isn't it?
Yes.
> Shouldn't
> $prefix / $prefix_len start from 0, not from 1?
It starts from 1 because it skips first +/-. It should become obvious
after reading the comment from last patch :).
+ # In combined diff we must ignore two +/- characters.
+ $prefix = 2 if ($is_combined);
+
>
> > + my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > +
> > + while ($prefix < @r && $prefix < @a) {
> > + last if ($r[$prefix] ne $a[$prefix]);
> > +
> > + $prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > + $prefix++;
> > + }
>
> Ah, I see that it is easier to find common prefix by treating string
> as array of characters.
>
> Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> strings (see "Bitwise String Operators" in perlop(1)):
>
> my $xor = "$rem" ^ "$add";
>
> and finding starting sequence of "\0", which denote common prefix.
>
>
> Though this and the following is a nice implementation of
> algorithm... as it would be implemented in C. Nevermind, it might be
> good enough...
The splitting and comparing by characters is taken from diff-highlight.
I don't think it's worth changing here.
>
> > +
> > + while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
> > + last if ($r[$suffix_rem] ne $a[$suffix_add]);
> > +
> > + $suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
> > + $suffix_rem--;
> > + $suffix_add--;
> > + }
>
> BTW., perhaps using single negative $suffix_len instead of separate
> $suffix_rem_pos and $suffix_add_pos would make code simpler and easier
> to understand?
I'll try that.
>
> > +
> > + # Mark lines that are different from each other, but have
> > some common
> > + # part that isn't whitespace. If lines are completely
> > different, don't
> > + # mark them because that would make output unreadable,
> > especially if
> > + # diff consists of multiple lines.
> > + if (($prefix == 1 && $suffix_rem == $#r && $suffix_add ==
> > $#a)
> > + || ($prefix_is_space && $suffix_is_space)) {
>
> Micronit about style: in gitweb we put boolean operator at the end of
> continued line, not at beginning of next one.
>
> So this would be:
>
> + if (($prefix == 1 && $suffix_rem == $#r && $suffix_add ==
> $#a) ||
> + ($prefix_is_space && $suffix_is_space)) {
>
OK
> > + $esc_rem = esc_html($rem);
> > + $esc_add = esc_html($add);
> > + } else {
> > + $esc_rem = esc_html_mark_range(\@r, $prefix,
> > $suffix_rem);
> > + $esc_add = esc_html_mark_range(\@a, $prefix,
> > $suffix_add);
> > + }
> > +
> > + return format_diff_line($esc_rem, 'rem'),
> > + format_diff_line($esc_add, 'add');
>
> Please use spaces to align.
OK
>
> > +}
> > +
> > # HTML-format diff context, removed and added lines.
> > sub format_ctx_rem_add_lines {
> > - my ($ctx, $rem, $add) = @_;
> > + my ($ctx, $rem, $add, $is_combined) = @_;
> > my (@new_ctx, @new_rem, @new_add);
> > + my $num_add_lines = @$add;
>
> Why is this temporary variable needed? If you are not sure that ==
> operator enforces scalar context on both arguments you can always
> write
>
> scalar @$add == scalar @$rem
>
You read my mind.
> > +
> > + if (!$is_combined && $num_add_lines > 0 && $num_add_lines
> > == @$rem) {
> > + for (my $i = 0; $i < $num_add_lines; $i++) {
> > + my ($line_rem, $line_add) =
> > format_rem_add_line(
> > + $rem->[$i], $add->[$i]);
> > + push @new_rem, $line_rem;
> > + push @new_add, $line_add;
>
> The original contrib/diff-highlight works only for single changed line
> (single removed and single added). You make this code work also for
> the case where number of aded lines is equal to the number of removed
> lines, and assume that subsequent changed lines in preimage
> correcponds to subsequent changed lines in postimage, which is not
> always true:
>
> -foo
> -bar
> +baz
> +fooo
>
> This is not described in commit message, I think.
True. Note that in this particular case nothing should be highlighted
because corresponding lines don't have common prefix/suffix.
>
> > + }
> > + } else {
> > + @new_rem = map { format_diff_line(esc_html($_,
> > -nbsp=>1), 'rem') } @$rem;
> > + @new_add = map { format_diff_line(esc_html($_,
> > -nbsp=>1), 'add') } @$add;
> > + }
> >
> > @new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'ctx') } @$ctx;
> > - @new_rem = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'rem') } @$rem;
> > - @new_add = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'add') } @$add;
> > return (\@new_ctx, \@new_rem, \@new_add);
> > }
> > @@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines {
> > sub print_diff_lines {
> > my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> >
> > - ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem,
> > $add);
> > + ($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem,
> > $add,
> > + $is_combined);
>
> O.K., now the code depends on $is_combined
>
next prev parent reply other threads:[~2012-02-13 6:41 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20 ` Jakub Narebski
2012-02-11 23:03 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53 ` Jakub Narebski
2012-02-11 23:16 ` Michał Kiedrowicz
2012-02-25 9:00 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02 ` Jakub Narebski
2012-02-10 9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29 ` Jakub Narebski
2012-02-11 23:20 ` Michał Kiedrowicz
2012-02-11 23:30 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14 ` Jakub Narebski
2012-02-11 23:38 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23 ` Jakub Narebski
2012-02-10 14:15 ` Michał Kiedrowicz
2012-02-10 14:55 ` Jakub Narebski
2012-02-10 17:33 ` Michał Kiedrowicz
2012-02-10 22:52 ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14 6:54 ` Michal Kiedrowicz
2012-02-14 7:14 ` Junio C Hamano
2012-02-14 8:20 ` Jeff King
2012-02-10 20:20 ` Jeff King
2012-02-10 21:29 ` Michał Kiedrowicz
2012-02-10 21:32 ` Jeff King
2012-02-10 21:36 ` Michał Kiedrowicz
2012-02-10 21:47 ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27 ` Jeff King
2012-02-13 22:28 ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32 ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14 6:35 ` Michal Kiedrowicz
2012-02-13 22:33 ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36 ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37 ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14 6:48 ` Michal Kiedrowicz
2012-02-14 0:05 ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14 0:22 ` Jeff King
2012-02-14 1:19 ` Junio C Hamano
2012-02-14 6:04 ` Jeff King
2012-02-14 6:28 ` Michal Kiedrowicz
2012-02-10 21:56 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45 ` Jakub Narebski
2012-02-12 10:42 ` Jakub Narebski
2012-02-13 6:54 ` Michal Kiedrowicz
2012-02-13 19:58 ` Jakub Narebski
2012-02-13 21:10 ` Michał Kiedrowicz
2012-02-13 6:41 ` Michal Kiedrowicz [this message]
2012-02-13 18:44 ` Jakub Narebski
2012-02-13 21:09 ` Michał Kiedrowicz
2012-02-14 17:31 ` Jakub Narebski
2012-02-14 18:23 ` Michał Kiedrowicz
2012-02-14 18:52 ` Jeff King
2012-02-14 20:04 ` Michał Kiedrowicz
2012-02-14 20:38 ` Jeff King
2012-02-10 9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12 0:11 ` Jakub Narebski
2012-02-13 6:46 ` Michal Kiedrowicz
2012-02-10 9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12 0:03 ` Jakub Narebski
2012-02-13 6:48 ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56 ` Michał Kiedrowicz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120213074125.3e8793ad@mkiedrowicz.ivo.pl \
--to=michal.kiedrowicz@gmail.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).