From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Mon, 13 Feb 2012 22:09:17 +0100 [thread overview]
Message-ID: <20120213220917.4cf14eb1@gmail.com> (raw)
In-Reply-To: <201202131944.50886.jnareb@gmail.com>
Jakub Narebski <jnareb@gmail.com> wrote:
> On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > I haven't found *examples* on GitHub and Trac sites, but what about
> > these ones:
> >
> > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > http://trac.edgewall.org/changeset/10973
>
> Thanks. That is what I meant by "good examples". Perhaps they should
> be put in the commit message?
OK
>
> BTW GitHub is closed source, but we can check what algorithm does Trac
> use for diff refinement highlighting (highlighting changed portions of
> diff).
>
I think it's
http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
(see markup intraline_changes()).
> > > It doesn't implement LCS / diff
> > > algorithm like e.g. tkdiff does for its diff refinement highlighting,
> > > isn't it?
> >
> > I doesn't. I share the Jeff's opinion that:
> > a) Jeff's approach is "good enough"
> > b) LCS on bytes could be very confusing if it marked few sets of
> > characters.
>
> I wonder if we can use --diff-words for diff refinement highlighting,
> i.e. LCS on words.
I think we can try it, but I worry about performance of running `git
diff` on every diff chunk.
>
> Anyway Jeff's approach is a bit limited, in that it would work only for
> change that does not involve adding newlines, for example splitting
> overly long line when changing something.
>
> See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
>
Yes, I'm aware of that. I was thinking about improving it later ("Let's
start with a simple refinment highlightning and maybe later add more
sophisticated algorithms").
> > > By completly different you mean that they do not have common prefix or
> > > common suffix (at least one of them), isn't it?
>
> BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> and suffix are non-empty"?
>
At least one. See:
-a = 42;
+b = 42;
Here prefix is empty but suffix is not.
> > I would also consider ignoring prefixes/suffixes with punctuation, like:
> >
> > - * I like you.
> > + * Alice had a little lamb.
>
> But this patch doesn't implement this feature yet, isn't it?
No, but is a matter of adding
-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
(and the same for suffix)
> Well, here is another idea: do not highlight if sum of prefix and suffix
> lengths are less than some threshold, e.g. 2 characters not including
> whitespace, or some percentage with respect to total line length.
>
That might be a good idea.
> > > Eeeeeek! First you split into letters, in caller at that, then join?
> > > Why not pass striung ($str suggests string not array of characters),
> > > and use substr instead?
> > >
> > > [Please disregard this and the next paragraph at first reading]
> >
> > I will rename $str to something more self describing.
>
> Please do.
>
> BTW. don't you assume here that both common prefix and common suffix
> are non-empty?
Yes. The caller makes sure they are correct (at least
> > > > +sub format_rem_add_line {
> > > > + my ($rem, $add) = @_;
> > > > + my @r = split(//, $rem);
> > > > + my @a = split(//, $add);
>
> BTW the name of variable can be just @add and @rem.
>
I know they are different scopes but I don't like it. It makes the code
more confusing IMO. But I won't insist.
> > > Shouldn't
> > > $prefix / $prefix_len start from 0, not from 1?
> >
> > It starts from 1 because it skips first +/-. It should become obvious
> > after reading the comment from last patch :).
> >
> > + # In combined diff we must ignore two +/- characters.
> > + $prefix = 2 if ($is_combined);
>
> Anyway comment about that fact would be nice.
Will do.
>
> > > > + my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > > > +
> > > > + while ($prefix < @r && $prefix < @a) {
> > > > + last if ($r[$prefix] ne $a[$prefix]);
> > > > +
> > > > + $prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > > > + $prefix++;
> > > > + }
> > >
> > > Ah, I see that it is easier to find common prefix by treating string
> > > as array of characters.
> > >
> > > Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> > > strings (see "Bitwise String Operators" in perlop(1)):
> > >
> > > my $xor = "$rem" ^ "$add";
> > >
> > > and finding starting sequence of "\0", which denote common prefix.
> > >
> > >
> > > Though this and the following is a nice implementation of
> > > algorithm... as it would be implemented in C. Nevermind, it might be
> > > good enough...
> >
> > The splitting and comparing by characters is taken from diff-highlight.
> > I don't think it's worth changing here.
>
> You are right.
>
> I'll try to come with hacky algorithm using string bitwise xor and regexp,
> and benchmark it comparing to your C-like solution, but it can be left for
> later (simple is better than clever, usually).
If you have time :).
> > > > # HTML-format diff context, removed and added lines.
> > > > sub format_ctx_rem_add_lines {
> > > > - my ($ctx, $rem, $add) = @_;
> > > > + my ($ctx, $rem, $add, $is_combined) = @_;
> > > > my (@new_ctx, @new_rem, @new_add);
> > > > + my $num_add_lines = @$add;
> > >
> > > Why is this temporary variable needed? If you are not sure that ==
> > > operator enforces scalar context on both arguments you can always
> > > write
> > >
> > > scalar @$add == scalar @$rem
> >
> > You read my mind.
>
> BTW. the same comment applies to patch adding support for highlighting
> changed part in combined diff.
>
OK
next prev parent reply other threads:[~2012-02-13 21:09 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20 ` Jakub Narebski
2012-02-11 23:03 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53 ` Jakub Narebski
2012-02-11 23:16 ` Michał Kiedrowicz
2012-02-25 9:00 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02 ` Jakub Narebski
2012-02-10 9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29 ` Jakub Narebski
2012-02-11 23:20 ` Michał Kiedrowicz
2012-02-11 23:30 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14 ` Jakub Narebski
2012-02-11 23:38 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23 ` Jakub Narebski
2012-02-10 14:15 ` Michał Kiedrowicz
2012-02-10 14:55 ` Jakub Narebski
2012-02-10 17:33 ` Michał Kiedrowicz
2012-02-10 22:52 ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14 6:54 ` Michal Kiedrowicz
2012-02-14 7:14 ` Junio C Hamano
2012-02-14 8:20 ` Jeff King
2012-02-10 20:20 ` Jeff King
2012-02-10 21:29 ` Michał Kiedrowicz
2012-02-10 21:32 ` Jeff King
2012-02-10 21:36 ` Michał Kiedrowicz
2012-02-10 21:47 ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27 ` Jeff King
2012-02-13 22:28 ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32 ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14 6:35 ` Michal Kiedrowicz
2012-02-13 22:33 ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36 ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37 ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14 6:48 ` Michal Kiedrowicz
2012-02-14 0:05 ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14 0:22 ` Jeff King
2012-02-14 1:19 ` Junio C Hamano
2012-02-14 6:04 ` Jeff King
2012-02-14 6:28 ` Michal Kiedrowicz
2012-02-10 21:56 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45 ` Jakub Narebski
2012-02-12 10:42 ` Jakub Narebski
2012-02-13 6:54 ` Michal Kiedrowicz
2012-02-13 19:58 ` Jakub Narebski
2012-02-13 21:10 ` Michał Kiedrowicz
2012-02-13 6:41 ` Michal Kiedrowicz
2012-02-13 18:44 ` Jakub Narebski
2012-02-13 21:09 ` Michał Kiedrowicz [this message]
2012-02-14 17:31 ` Jakub Narebski
2012-02-14 18:23 ` Michał Kiedrowicz
2012-02-14 18:52 ` Jeff King
2012-02-14 20:04 ` Michał Kiedrowicz
2012-02-14 20:38 ` Jeff King
2012-02-10 9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12 0:11 ` Jakub Narebski
2012-02-13 6:46 ` Michal Kiedrowicz
2012-02-10 9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12 0:03 ` Jakub Narebski
2012-02-13 6:48 ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56 ` Michał Kiedrowicz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120213220917.4cf14eb1@gmail.com \
--to=michal.kiedrowicz@gmail.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).