From: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
Date: Mon, 13 Feb 2012 22:09:17 +0100 [thread overview]
Message-ID: <20120213220917.4cf14eb1@gmail.com> (raw)
In-Reply-To: <201202131944.50886.jnareb@gmail.com>
Jakub Narebski <jnareb@gmail.com> wrote:
> On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > I haven't found *examples* on GitHub and Trac sites, but what about
> > these ones:
> >
> > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > http://trac.edgewall.org/changeset/10973
>
> Thanks. That is what I meant by "good examples". Perhaps they should
> be put in the commit message?
OK
>
> BTW GitHub is closed source, but we can check what algorithm does Trac
> use for diff refinement highlighting (highlighting changed portions of
> diff).
>
I think it's
http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
(see markup intraline_changes()).
> > > It doesn't implement LCS / diff
> > > algorithm like e.g. tkdiff does for its diff refinement highlighting,
> > > isn't it?
> >
> > I doesn't. I share the Jeff's opinion that:
> > a) Jeff's approach is "good enough"
> > b) LCS on bytes could be very confusing if it marked few sets of
> > characters.
>
> I wonder if we can use --diff-words for diff refinement highlighting,
> i.e. LCS on words.
I think we can try it, but I worry about performance of running `git
diff` on every diff chunk.
>
> Anyway Jeff's approach is a bit limited, in that it would work only for
> change that does not involve adding newlines, for example splitting
> overly long line when changing something.
>
> See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
>
Yes, I'm aware of that. I was thinking about improving it later ("Let's
start with a simple refinment highlightning and maybe later add more
sophisticated algorithms").
> > > By completly different you mean that they do not have common prefix or
> > > common suffix (at least one of them), isn't it?
>
> BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> and suffix are non-empty"?
>
At least one. See:
-a = 42;
+b = 42;
Here prefix is empty but suffix is not.
> > I would also consider ignoring prefixes/suffixes with punctuation, like:
> >
> > - * I like you.
> > + * Alice had a little lamb.
>
> But this patch doesn't implement this feature yet, isn't it?
No, but is a matter of adding
-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
(and the same for suffix)
> Well, here is another idea: do not highlight if sum of prefix and suffix
> lengths are less than some threshold, e.g. 2 characters not including
> whitespace, or some percentage with respect to total line length.
>
That might be a good idea.
> > > Eeeeeek! First you split into letters, in caller at that, then join?
> > > Why not pass striung ($str suggests string not array of characters),
> > > and use substr instead?
> > >
> > > [Please disregard this and the next paragraph at first reading]
> >
> > I will rename $str to something more self describing.
>
> Please do.
>
> BTW. don't you assume here that both common prefix and common suffix
> are non-empty?
Yes. The caller makes sure they are correct (at least
> > > > +sub format_rem_add_line {
> > > > + my ($rem, $add) = @_;
> > > > + my @r = split(//, $rem);
> > > > + my @a = split(//, $add);
>
> BTW the name of variable can be just @add and @rem.
>
I know they are different scopes but I don't like it. It makes the code
more confusing IMO. But I won't insist.
> > > Shouldn't
> > > $prefix / $prefix_len start from 0, not from 1?
> >
> > It starts from 1 because it skips first +/-. It should become obvious
> > after reading the comment from last patch :).
> >
> > + # In combined diff we must ignore two +/- characters.
> > + $prefix = 2 if ($is_combined);
>
> Anyway comment about that fact would be nice.
Will do.
>
> > > > + my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > > > +
> > > > + while ($prefix < @r && $prefix < @a) {
> > > > + last if ($r[$prefix] ne $a[$prefix]);
> > > > +
> > > > + $prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > > > + $prefix++;
> > > > + }
> > >
> > > Ah, I see that it is easier to find common prefix by treating string
> > > as array of characters.
> > >
> > > Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> > > strings (see "Bitwise String Operators" in perlop(1)):
> > >
> > > my $xor = "$rem" ^ "$add";
> > >
> > > and finding starting sequence of "\0", which denote common prefix.
> > >
> > >
> > > Though this and the following is a nice implementation of
> > > algorithm... as it would be implemented in C. Nevermind, it might be
> > > good enough...
> >
> > The splitting and comparing by characters is taken from diff-highlight.
> > I don't think it's worth changing here.
>
> You are right.
>
> I'll try to come with hacky algorithm using string bitwise xor and regexp,
> and benchmark it comparing to your C-like solution, but it can be left for
> later (simple is better than clever, usually).
If you have time :).
> > > > # HTML-format diff context, removed and added lines.
> > > > sub format_ctx_rem_add_lines {
> > > > - my ($ctx, $rem, $add) = @_;
> > > > + my ($ctx, $rem, $add, $is_combined) = @_;
> > > > my (@new_ctx, @new_rem, @new_add);
> > > > + my $num_add_lines = @$add;
> > >
> > > Why is this temporary variable needed? If you are not sure that ==
> > > operator enforces scalar context on both arguments you can always
> > > write
> > >
> > > scalar @$add == scalar @$rem
> >
> > You read my mind.
>
> BTW. the same comment applies to patch adding support for highlighting
> changed part in combined diff.
>
OK
next prev parent reply other threads:[~2012-02-13 21:09 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20 ` Jakub Narebski
2012-02-11 23:03 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53 ` Jakub Narebski
2012-02-11 23:16 ` Michał Kiedrowicz
2012-02-25 9:00 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02 ` Jakub Narebski
2012-02-10 9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29 ` Jakub Narebski
2012-02-11 23:20 ` Michał Kiedrowicz
2012-02-11 23:30 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14 ` Jakub Narebski
2012-02-11 23:38 ` Michał Kiedrowicz
2012-02-10 9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23 ` Jakub Narebski
2012-02-10 14:15 ` Michał Kiedrowicz
2012-02-10 14:55 ` Jakub Narebski
2012-02-10 17:33 ` Michał Kiedrowicz
2012-02-10 22:52 ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14 6:54 ` Michal Kiedrowicz
2012-02-14 7:14 ` Junio C Hamano
2012-02-14 8:20 ` Jeff King
2012-02-10 20:20 ` Jeff King
2012-02-10 21:29 ` Michał Kiedrowicz
2012-02-10 21:32 ` Jeff King
2012-02-10 21:36 ` Michał Kiedrowicz
2012-02-10 21:47 ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27 ` Jeff King
2012-02-13 22:28 ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32 ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14 6:35 ` Michal Kiedrowicz
2012-02-13 22:33 ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36 ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37 ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14 6:48 ` Michal Kiedrowicz
2012-02-14 0:05 ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14 0:22 ` Jeff King
2012-02-14 1:19 ` Junio C Hamano
2012-02-14 6:04 ` Jeff King
2012-02-14 6:28 ` Michal Kiedrowicz
2012-02-10 21:56 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45 ` Jakub Narebski
2012-02-12 10:42 ` Jakub Narebski
2012-02-13 6:54 ` Michal Kiedrowicz
2012-02-13 19:58 ` Jakub Narebski
2012-02-13 21:10 ` Michał Kiedrowicz
2012-02-13 6:41 ` Michal Kiedrowicz
2012-02-13 18:44 ` Jakub Narebski
2012-02-13 21:09 ` Michał Kiedrowicz [this message]
2012-02-14 17:31 ` Jakub Narebski
2012-02-14 18:23 ` Michał Kiedrowicz
2012-02-14 18:52 ` Jeff King
2012-02-14 20:04 ` Michał Kiedrowicz
2012-02-14 20:38 ` Jeff King
2012-02-10 9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12 0:11 ` Jakub Narebski
2012-02-13 6:46 ` Michal Kiedrowicz
2012-02-10 9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12 0:03 ` Jakub Narebski
2012-02-13 6:48 ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56 ` Michał Kiedrowicz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120213220917.4cf14eb1@gmail.com \
--to=michal.kiedrowicz@gmail.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.