git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Sat, 11 Feb 2012 15:45:29 -0800 (PST)	[thread overview]
Message-ID: <m3y5s9rl3g.fsf@localhost.localdomain> (raw)
In-Reply-To: <1328865494-24415-7-git-send-email-michal.kiedrowicz@gmail.com>

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Reading diff output is sometimes very hard, even if it's colored,
> especially if lines differ only in few characters.  This is often true
> when a commit fixes a typo or renames some variables or functions.
>
This is certainly nice feature to have.  I think most tools that
implement side-by-side diff implement this too.

> This commit teaches gitweb to highlight characters that are different
> between old and new line.  This should work in the similar manner as in
> Trac or GitHub.
>
Doe you have URLs with good examples of such diff refinement
highlighting (GNU Emacs ediff/emerge terminology) / highlighting
differing segments (diff-highlight terminology)?
 
> The code that compares lines is based on
> contrib/diff-highlight/diff-highlight, except that it works with
> multiline changes too.  It also won't highlight lines that are
> completely different because that would only make the output unreadable.
>
So what does it look like?  From the contrib/diff-highlight/README I
guess that it finds common prefix and common suffix, and highlights
the rest, which includes change.  It doesn't implement LCS / diff
algorithm like e.g. tkdiff does for its diff refinement highlighting,
isn't it?

By completly different you mean that they do not have common prefix or
common suffix (at least one of them), isn't it?

> Combined diffs are not supported but a following commit will change it.
> 
O.K.

> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> ---
>  gitweb/gitweb.perl |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 77 insertions(+), 5 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index db61553..1a5b454 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header {
>  # wrap patch (diff) line into a <div> (not to be used for diff headers),
>  # the line must be esc_html()'ed
>  sub format_diff_line {
> -	my ($line, $diff_class, $from, $to) = @_;
> +	my ($line, $diff_class) = @_;

Why that change?

>  
>  	my $diff_classes = "diff";
>  	$diff_classes .= " $diff_class" if ($diff_class);
> @@ -4923,14 +4923,85 @@ sub print_inline_diff_lines {
>  	print foreach (@$add);
>  }
>  
> +# Highlight characters from $prefix to $suffix and escape HTML.
> +# $str is a reference to the array of characters.
> +sub esc_html_mark_range {
> +	my ($str, $prefix, $suffix) = @_;
> +
> +	# Don't generate empty <span> element.
> +	if ($prefix == $suffix + 1) {
> +		return esc_html(join('', @$str), -nbsp=>1);
> +	}
> +
> +	my $before = join('', @{$str}[0..($prefix - 1)]);
> +	my $marked = join('', @{$str}[$prefix..$suffix]);
> +	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);

Eeeeeek!  First you split into letters, in caller at that, then join?
Why not pass striung ($str suggests string not array of characters),
and use substr instead?

[Please disregard this and the next paragraph at first reading]

> +
> +	return esc_html($before, -nbsp=>1) .
> +		$cgi->span({-class=>'marked'}, esc_html($marked, -nbsp=>1)) .
> +		esc_html($after,-nbsp=>1);
> +}

Anyway I have send to git mailing list a patch series, which in one of
patches adds esc_html_match_hl($str, $regexp) to highlight matches in
a string.  Your esc_html_mark_range(), after a generalization, could
be used as underlying "engine".

Something like this, perhaps (untested):

   # Highlight selected fragments of string, using given CSS class,
   # and escape HTML.  It is assumed that fragments do not overlap.
   # Regions are passed as list of pairs (array references).
   sub esc_html_hl {
        my ($str, $css_class, @sel) = @_;
        return esc_html($str) unless @sel;
   
        my $out = '';
        my $pos = 0;
   
        for my $s (@sel) {
                $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
                        if ($s->[0] - $pos > 0);
                $out .= $cgi->span({-class => $css_class},
                                   esc_html(substr($str, $s->[0], $s->[1] - $s->[0])));

                $pos = $m->[1];
        }
        $out .= esc_html(substr($str, $pos))
                if ($pos < length($str));
   
        return $out;
   }

> +
> +# Format removed and added line, mark changed part and HTML-format them.

You should probably ad here that this code is taken from
diff-highlight in contrib area, isn't it?

> +sub format_rem_add_line {
> +	my ($rem, $add) = @_;
> +	my @r = split(//, $rem);
> +	my @a = split(//, $add);
> +	my ($esc_rem, $esc_add);
> +	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);

It is not as much $prefix, as $prefix_len, isn't it?  Shouldn't
$prefix / $prefix_len start from 0, not from 1?

> +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> +
> +	while ($prefix < @r && $prefix < @a) {
> +		last if ($r[$prefix] ne $a[$prefix]);
> +
> +		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> +		$prefix++;
> +	}

Ah, I see that it is easier to find common prefix by treating string
as array of characters.

Though I wonder if it wouldn't be easier to use trick of XOR-ing two
strings (see "Bitwise String Operators" in perlop(1)):

        my $xor = "$rem" ^ "$add";

and finding starting sequence of "\0", which denote common prefix.


Though this and the following is a nice implementation of
algorithm... as it would be implemented in C.  Nevermind, it might be
good enough...

> +
> +	while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
> +		last if ($r[$suffix_rem] ne $a[$suffix_add]);
> +
> +		$suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
> +		$suffix_rem--;
> +		$suffix_add--;
> +	}

BTW., perhaps using single negative $suffix_len instead of separate
$suffix_rem_pos and $suffix_add_pos would make code simpler and easier
to understand?

> +
> +	# Mark lines that are different from each other, but have some common
> +	# part that isn't whitespace.  If lines are completely different, don't
> +	# mark them because that would make output unreadable, especially if
> +	# diff consists of multiple lines.
> +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
> +		|| ($prefix_is_space && $suffix_is_space)) {

Micronit about style: in gitweb we put boolean operator at the end of
continued line, not at beginning of next one.

So this would be:

  +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a) ||
  +	    ($prefix_is_space && $suffix_is_space)) {

> +		$esc_rem = esc_html($rem);
> +		$esc_add = esc_html($add);
> +	} else {
> +		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
> +		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
> +	}
> +
> +	return format_diff_line($esc_rem, 'rem'),
> +		format_diff_line($esc_add, 'add');

Please use spaces to align.

> +}
> +
>  # HTML-format diff context, removed and added lines.
>  sub format_ctx_rem_add_lines {
> -	my ($ctx, $rem, $add) = @_;
> +	my ($ctx, $rem, $add, $is_combined) = @_;
>  	my (@new_ctx, @new_rem, @new_add);
> +	my $num_add_lines = @$add;

Why is this temporary variable needed?  If you are not sure that ==
operator enforces scalar context on both arguments you can always
write

  scalar @$add == scalar @$rem

> +
> +	if (!$is_combined && $num_add_lines > 0 && $num_add_lines == @$rem) {
> +		for (my $i = 0; $i < $num_add_lines; $i++) {
> +			my ($line_rem, $line_add) = format_rem_add_line(
> +				$rem->[$i], $add->[$i]);
> +			push @new_rem, $line_rem;
> +			push @new_add, $line_add;

The original contrib/diff-highlight works only for single changed line
(single removed and single added).  You make this code work also for
the case where number of aded lines is equal to the number of removed
lines, and assume that subsequent changed lines in preimage
correcponds to subsequent changed lines in postimage, which is not
always true:

   -foo
   -bar
   +baz
   +fooo

This is not described in commit message, I think.

> +		}
> +	} else {
> +		@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> +		@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
> +	}
>  
>  	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
> -	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> -	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
>  
>  	return (\@new_ctx, \@new_rem, \@new_add);
>  }
> @@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines {
>  sub print_diff_lines {
>  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
>  
> -	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
> +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add,
> +		$is_combined);

O.K., now the code depends on $is_combined

-- 
Jakub Narębski

  parent reply	other threads:[~2012-02-11 23:45 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20   ` Jakub Narebski
2012-02-11 23:03     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53   ` Jakub Narebski
2012-02-11 23:16     ` Michał Kiedrowicz
2012-02-25  9:00     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02   ` Jakub Narebski
2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29   ` Jakub Narebski
2012-02-11 23:20     ` Michał Kiedrowicz
2012-02-11 23:30       ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14   ` Jakub Narebski
2012-02-11 23:38     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23   ` Jakub Narebski
2012-02-10 14:15     ` Michał Kiedrowicz
2012-02-10 14:55       ` Jakub Narebski
2012-02-10 17:33         ` Michał Kiedrowicz
2012-02-10 22:52           ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24         ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14  6:54     ` Michal Kiedrowicz
2012-02-14  7:14       ` Junio C Hamano
2012-02-14  8:20         ` Jeff King
2012-02-10 20:20   ` Jeff King
2012-02-10 21:29     ` Michał Kiedrowicz
2012-02-10 21:32       ` Jeff King
2012-02-10 21:36         ` Michał Kiedrowicz
2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27           ` Jeff King
2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14  6:35               ` Michal Kiedrowicz
2012-02-13 22:33             ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36             ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14  6:48               ` Michal Kiedrowicz
2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14  0:22               ` Jeff King
2012-02-14  1:19                 ` Junio C Hamano
2012-02-14  6:04                   ` Jeff King
2012-02-14  6:28             ` Michal Kiedrowicz
2012-02-10 21:56     ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45   ` Jakub Narebski [this message]
2012-02-12 10:42     ` Jakub Narebski
2012-02-13  6:54       ` Michal Kiedrowicz
2012-02-13 19:58         ` Jakub Narebski
2012-02-13 21:10           ` Michał Kiedrowicz
2012-02-13  6:41     ` Michal Kiedrowicz
2012-02-13 18:44       ` Jakub Narebski
2012-02-13 21:09         ` Michał Kiedrowicz
2012-02-14 17:31           ` Jakub Narebski
2012-02-14 18:23             ` Michał Kiedrowicz
2012-02-14 18:52               ` Jeff King
2012-02-14 20:04                 ` Michał Kiedrowicz
2012-02-14 20:38                   ` Jeff King
2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12  0:11   ` Jakub Narebski
2012-02-13  6:46     ` Michal Kiedrowicz
2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12  0:03   ` Jakub Narebski
2012-02-13  6:48     ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56   ` Michał Kiedrowicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m3y5s9rl3g.fsf@localhost.localdomain \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=michal.kiedrowicz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).