From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jakub Narebski Subject: Re: [PATCH v2 7/8] gitweb: Highlight interesting parts of diff Date: Thu, 29 Mar 2012 20:42:20 +0100 Message-ID: <201203292142.21680.jnareb@gmail.com> References: <1332543417-19664-1-git-send-email-michal.kiedrowicz@gmail.com> <1332543417-19664-8-git-send-email-michal.kiedrowicz@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: git@vger.kernel.org, Jeff King To: =?utf-8?q?Micha=C5=82_Kiedrowicz?= X-From: git-owner@vger.kernel.org Thu Mar 29 21:42:31 2012 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SDLEs-0007rs-ET for gcvg-git-2@plane.gmane.org; Thu, 29 Mar 2012 21:42:31 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758933Ab2C2TmZ convert rfc822-to-quoted-printable (ORCPT ); Thu, 29 Mar 2012 15:42:25 -0400 Received: from mail-we0-f174.google.com ([74.125.82.174]:47511 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751646Ab2C2TmY (ORCPT ); Thu, 29 Mar 2012 15:42:24 -0400 Received: by wejx9 with SMTP id x9so1233125wej.19 for ; Thu, 29 Mar 2012 12:42:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:date:user-agent:cc:references:in-reply-to :mime-version:content-type:content-transfer-encoding :content-disposition:message-id; bh=TgUOSJOz+jpHncUTu3ft7bxJsPknGs1dJitaBDCWoIs=; b=qOgCPc+ZXx9n8zDU6f6TkNVg5yibJy4OHn+VQUiC92lkRhIkk0kquYnAjeaFQ3UmKz zIFa+nN3sSWIyqKfl4oxHAZfcqML7N1ylx7OsfO2gL/EgzsqTC7f5oAH9PfH3yvbn9cA 2iNK6/6dw1KUZRzprkPqBhfENIA3+NRZ9OQ3kvoa/TmQqiJd7vaI2SI2ieJcoZZsFaG1 lbqrON6G9VvEEs3fb7hhxwchO99UrEmFSSYW1AiN4OOybdbzM/4R8/dKBw/O/MOp062U 2IDyomsURL3PM83vVfU62jjIQMOKsQFcI2mcN3foKlW5hrHSZMWosjSiq0fKZQNMufyJ xPjg== Received: by 10.180.88.67 with SMTP id be3mr8579748wib.20.1333050142575; Thu, 29 Mar 2012 12:42:22 -0700 (PDT) Received: from [192.168.1.13] (aeh224.neoplus.adsl.tpnet.pl. [83.25.111.224]) by mx.google.com with ESMTPS id fz9sm69504727wib.3.2012.03.29.12.42.20 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 29 Mar 2012 12:42:21 -0700 (PDT) User-Agent: KMail/1.9.3 In-Reply-To: <1332543417-19664-8-git-send-email-michal.kiedrowicz@gmail.com> Content-Disposition: inline Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, 23 Mar 2012, Micha=C5=82 Kiedrowicz wrote: > Reading diff output is sometimes very hard, even if it's colored, > especially if lines differ only in few characters. This is often tru= e > when a commit fixes a typo or renames some variables or functions. >=20 > This commit teaches gitweb to highlight characters that are different > between old and new line with a light green/red background. This sho= uld > work in the similar manner as in Trac or GitHub. >=20 > The algorithm that compares lines is based on contrib/diff-highlight. > Basically, it works by determining common prefix/suffix of correspond= ing > lines and highlightning only the middle part of lines. For more > information, see contrib/diff-highlight/README. > Nice description. =20 > Combined diffs are not supported but a following commit will change i= t. >=20 O.K. > Two changes in format_diff_line() were needed to allow diff refinemen= t > highlightning to work. Firstly, format_diff_line() was taught to not > esc_html() line that was passed as a reference. This was needed beca= use > refining the highlight must be performed on unescaped lines and it us= es > a element to mark interesting parts of the line. Well, actually we just need to make format_diff_line to not esc_html passed line which is already esc_html'ed or esc_html_hl_regions'ed, i.e. to avoid double escaping. Passing line as reference if it is to be treated as HTML is one possibility, and I think it is a good convention to start to use. > Secondly, the= =20 > lines are untabified before comparing because calling untabify() > after inserting 's could improperly convert tabs to spaces. Well, it is actually because untabify() must work on original text to work correctly, i.e. to convert tabs to spaces according to tab stops. It must precede esc_html'ing, and even more esc_html_hl_regions'ing. >=20 > Signed-off-by: Micha=C5=82 Kiedrowicz Anyway this is a good change, much cleaner than previous version Acked-by: Jakub Nar=C4=99bski > --- > gitweb/gitweb.perl | 84 ++++++++++++++++++++++++++++++++++++= ++++++---- > gitweb/static/gitweb.css | 8 ++++ > 2 files changed, 85 insertions(+), 7 deletions(-) >=20 > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index db32588..872ba12 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2426,14 +2426,14 @@ sub format_cc_diff_chunk_header { > } > =20 > # process patch (diff) line (not to be used for diff headers), > -# returning HTML-formatted (but not wrapped) line > +# returning HTML-formatted (but not wrapped) line. > +# If the line is already esc_html()'ed, pass it as a reference. Errr... I think the explanation here must be in reverse: +# If the line is passed as a reference, it is treated as HTML, +# and not esc_html()'ed. > sub format_diff_line { > my ($line, $diff_class, $from, $to) =3D @_; > =20 > - chomp $line; > - $line =3D untabify($line); > - > - if ($from && $to && $line =3D~ m/^\@{2} /) { > + if (ref($line)) { > + $line =3D $$line; > + } elsif ($from && $to && $line =3D~ m/^\@{2} /) { > $line =3D format_unidiff_chunk_header($line, $from, $to); > } elsif ($from && $to && $line =3D~ m/^\@{3}/) { > $line =3D format_cc_diff_chunk_header($line, $from, $to); > @@ -5054,10 +5054,80 @@ sub print_inline_diff_lines { > print foreach (@$add); > } > =20 > +# Format removed and added line, mark changed part and HTML-format t= hem. > +# Impementation is based on contrib/diff-highlight > +sub format_rem_add_line { Wouldn't a better name be format_rem_add_pair() or format_rem_add_lines= (), or format_rem_add_lines_pair()? > + my ($rem, $add) =3D @_; > + my @rem =3D split(//, $rem); > + my @add =3D split(//, $add); > + my ($esc_rem, $esc_add); > + # Ignore +/- character, thus $prefix_len is set to 1. > + my ($prefix_len, $suffix_len) =3D (1, 0); I wonder if it wouldn't be slightly cleaner to count $prefix_len from +/- rather than start of line. It means + # Prefix does not count initial +/- character. + my ($prefix_len, $suffix_len) =3D (0, 0); Perhaps even remove initial +/- from @add and @rem. + shift @rem; + shift @add; Ehh... now I see that starting $prefix_len with 1 might be not so bad idea after all. > + my ($prefix_is_space, $suffix_is_space) =3D (1, 1); It is not entirely true: $prefix_is_space is really $prefix_is_space_or= _empty. It might be a better idea to use + my ($prefix_has_nonspace, $suffix_has_nonspace); using the fact that undefined value is false. > + > + while ($prefix_len < @rem && $prefix_len < @add) { > + last if ($rem[$prefix_len] ne $add[$prefix_len]); > + > + $prefix_is_space =3D 0 if ($rem[$prefix_len] !~ /\s/); + $prefix_has_nonspace =3D 1 if ($rem[$prefix_len] =3D~ /\s/); > + $prefix_len++; > + } > + > + my $shorter =3D (@rem < @add) ? @rem : @add; > + while ($prefix_len + $suffix_len < $shorter) { > + last if ($rem[-1 - $suffix_len] ne $add[-1 - $suffix_len]); > + > + $suffix_is_space =3D 0 if ($rem[-1 - $suffix_len] !~ /\s/); + $suffix_has_nonspace =3D 1 if ($rem[-1 - $suffix_len] =3D~ /\s/); > + $suffix_len++; > + } > + > + # Mark lines that are different from each other, but have some comm= on > + # part that isn't whitespace. If lines are completely different, d= on't > + # mark them because that would make output unreadable, especially i= f > + # diff consists of multiple lines. > + if (($prefix_len =3D=3D 1 && $suffix_len =3D=3D 0) || > + ($prefix_is_space && $suffix_is_space)) { Actually ($prefix_is_space && $suffix_is_space) is enough, but cryptic. With $prefix_is_space (=3D=3D $prefix_is_space_or_empty) -> $prefix_has= _nonspace it would be + if (not ($prefix_has_nonspace || $suffix_has_nonspace)) { Anyway, it is not as if it is not clear enough. > + $esc_rem =3D esc_html($rem, -nbsp=3D>1); > + $esc_add =3D esc_html($add, -nbsp=3D>1); > + } else { > + $esc_rem =3D esc_html_hl_regions($rem, 'marked', [$prefix_len, @re= m - $suffix_len], -nbsp=3D>1); > + $esc_add =3D esc_html_hl_regions($add, 'marked', [$prefix_len, @ad= d - $suffix_len], -nbsp=3D>1); + $esc_rem =3D esc_html_hl_regions($rem, 'marked', [$prefix_len+1, @= rem+1 - $suffix_len], -nbsp=3D>1); + $esc_add =3D esc_html_hl_regions($add, 'marked', [$prefix_len+1, @= add+1 - $suffix_len], -nbsp=3D>1); So probably not worth it. > + } > + > + return format_diff_line(\$esc_rem, 'rem'), > + format_diff_line(\$esc_add, 'add'); + return format_diff_line(\$esc_rem, 'rem'), + format_diff_line(\$esc_add, 'add'); > +} > + > +# HTML-format diff context, removed and added lines. > +sub format_ctx_rem_add_lines { > + my ($ctx, $rem, $add, $is_combined) =3D @_; > + my (@new_ctx, @new_rem, @new_add); > + > + # Highlight if every removed line has a corresponding added line. > + # Combined diffs are not supported ATM. ATM =3D=3D at this moment? Please say so. > + if (!$is_combined && @$add > 0 && @$add =3D=3D @$rem) { > + for (my $i =3D 0; $i < @$add; $i++) { > + my ($line_rem, $line_add) =3D format_rem_add_line( > + $rem->[$i], $add->[$i]); + my ($line_rem, $line_add) =3D + format_rem_add_line($rem->[$i], $add->[$i]); > + push @new_rem, $line_rem; > + push @new_add, $line_add; > + } > + } else { > + @new_rem =3D map { format_diff_line($_, 'rem') } @$rem; > + @new_add =3D map { format_diff_line($_, 'add') } @$add; > + } > + > + @new_ctx =3D map { format_diff_line($_, 'ctx') } @$ctx; > + > + return (\@new_ctx, \@new_rem, \@new_add); > +} Nice. > + > # Print context lines and then rem/add lines. > sub print_diff_lines { > my ($ctx, $rem, $add, $diff_style, $is_combined) =3D @_; > =20 > + ($ctx, $rem, $add) =3D format_ctx_rem_add_lines($ctx, $rem, $add, > + $is_combined); > + + ($ctx, $rem, $add) =3D format_ctx_rem_add_lines($ctx, $rem, $add, $= is_combined); + Unless the line is too long. > if ($diff_style eq 'sidebyside' && !$is_combined) { > print_sidebyside_diff_lines($ctx, $rem, $add); > } else { > @@ -5089,11 +5159,11 @@ sub print_diff_chunk { > foreach my $line_info (@chunk) { > my ($class, $line) =3D @$line_info; > =20 > - $line =3D format_diff_line($line, $class, $from, $to); > + $line =3D untabify($line); O.K. you move untabify() out of format_diff_line(), and must now ensure that caller uses it before calling format_diff_line() or=20 format_ctx_rem_add_lines() (which uses format_diff_line() itself). I wonder if leaving untabify() (and chomp) in format_diff_line(), but not running it if passed reference, and running untabify() in format_ctx_rem_add_lines() wouldn't be a better, less fragile code. > =20 > # print chunk headers > if ($class && $class eq 'chunk_header') { > - print $line; > + print format_diff_line($line, $class, $from, $to); O.K., only 'chunk_header' are not formatted. > next; > } [I have to take a look how final version of print_diff_lines() looks like in this commit; the patch alone is not enough] Right, so we format just before printing, and print_* do formatting itself before printing. Nice and clean. > =20 > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > index c530355..3c4a3c9 100644 > --- a/gitweb/static/gitweb.css > +++ b/gitweb/static/gitweb.css > @@ -438,6 +438,10 @@ div.diff.add { > color: #008800; > } > =20 > +div.diff.add span.marked { > + background-color: #77ff77; > +} > + > div.diff.from_file a.path, > div.diff.from_file { > color: #aa0000; > @@ -447,6 +451,10 @@ div.diff.rem { > color: #cc0000; > } > =20 > +div.diff.rem span.marked { > + background-color: #ff7777; > +} > + > div.diff.chunk_header a, > div.diff.chunk_header { > color: #990099; > --=20 Nice styling. --=20 Jakub Narebski Poland