From: Jeff King <peff@peff.net>
To: "Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>
Cc: git@vger.kernel.org
Subject: [PATCH 5/5] diff-highlight: document some non-optimal cases
Date: Mon, 13 Feb 2012 17:37:33 -0500
Message-ID: <20120213223733.GE19521@sigill.intra.peff.net>
In-Reply-To: <20120213222702.GA19393@sigill.intra.peff.net>

The diff-highlight script works on heuristics, so it can be
wrong. Let's document some of the wrong-ness in case
somebody feels like working on it.

Signed-off-by: Jeff King <peff@peff.net>
---
These were just some that I considered while looking at the output of
the original and the current code. Suggestions are welcome for more.
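
For anyone who wants to experiment, both documented cases can be
reproduced outside the script. What follows is a minimal Python sketch,
not the Perl implementation: it assumes the heuristic is "highlight
whatever lies between the common prefix and the common suffix of a
positionally paired line", that hunks are only paired when the removed
and added counts match, and that a pair sharing nothing is left alone.
The names mark_pair and highlight_hunk are made up for illustration, and
color codes are ignored entirely.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def wrap(line, p, s, sign):
    """Wrap the part between prefix length p and suffix length s in sign{}."""
    end = len(line) - s
    mid = line[p:end]
    if not mid:
        return line
    return f"{line[:p]}{sign}{{{mid}}}{line[end:]}"


def mark_pair(old, new):
    """Mark the differing middle of one removed/added pair (no +/- prefix)."""
    p = common_prefix_len(old, new)
    # Cap the suffix so it cannot overlap the prefix on the shorter line.
    limit = min(len(old), len(new)) - p
    s = common_prefix_len(old[::-1][:limit], new[::-1][:limit])
    # Assumed rule: if no non-blank text is shared, highlighting the whole
    # line would add nothing, so leave the pair untouched.
    if not (old[:p].strip() or old[len(old) - s:].strip()):
        return old, new
    return wrap(old, p, s, "-"), wrap(new, p, s, "+")


def highlight_hunk(removed, added):
    """Pair lines purely by position, as case 2 describes."""
    if len(removed) != len(added):  # assumption: unequal hunks are skipped
        return ["-" + r for r in removed] + ["+" + a for a in added]
    pairs = [mark_pair(o, n) for o, n in zip(removed, added)]
    return ["-" + o for o, _ in pairs] + ["+" + n for _, n in pairs]


if __name__ == "__main__":
    print("\n".join(highlight_hunk(["foo(buf, size);"],
                                   ["foo(obj->buf, obj->size);"])))
    print("\n".join(highlight_hunk(["one", "two", "three", "four"],
                                   ["two 2", "three 3", "four 4", "five 5"])))
```

On the second input this prints the same marking as the "gets
highlighted as" example in case 2. On the first it marks the added line
the same way as case 1 does; the sketch also marks "buf, " on the
removed line, which the README example does not show.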

 contrib/diff-highlight/README |   93 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 4a58579..502e03b 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -57,3 +57,96 @@ following in your git configuration:
 	show = diff-highlight | less
 	diff = diff-highlight | less
 ---------------------------------------------
+
+Bugs
+----
+
+Because diff-highlight relies on heuristics to guess which parts of
+changes are important, there are some cases where the highlighting is
+more distracting than useful. Fortunately, these cases are rare in
+practice, and when they do occur, the worst case is simply a little
+extra highlighting. This section documents some cases known to be
+sub-optimal, in case somebody feels like working on improving the
+heuristics.
+
+1. Two changes on the same line get highlighted in a blob. For example,
+   highlighting:
+
+----------------------------------------------
+-foo(buf, size);
++foo(obj->buf, obj->size);
+----------------------------------------------
+
+   yields (where the inside of "+{}" would be highlighted):
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->buf, obj->}size);
+----------------------------------------------
+
+   whereas a more semantically meaningful output would be:
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->}buf, +{obj->}size);
+----------------------------------------------
+
+   Note that doing this right would probably involve a set of
+   content-specific boundary patterns, similar to word-diff. Otherwise
+   you get junk like:
+
+-----------------------------------------------------
+-this line has some -{i}nt-{ere}sti-{ng} text on it
++this line has some +{fa}nt+{a}sti+{c} text on it
+-----------------------------------------------------
+
+   which is less readable than the current output.
+
+2. The multi-line matching assumes that lines in the pre- and post-image
+   match by position. This is often the case, but can be fooled when a
+   line is removed from the top and a new one added at the bottom (or
+   vice versa). Unless the lines in the middle are also changed, diffs
+   will show this as two hunks, and it will not get highlighted at all
+   (which is good). But if the lines in the middle are changed, the
+   highlighting can be misleading. Here's a pathological case:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two 2
++three 3
++four 4
++five 5
+-----------------------------------------------------
+
+   which gets highlighted as:
+
+-----------------------------------------------------
+-one
+-t-{wo}
+-three
+-f-{our}
++two 2
++t+{hree 3}
++four 4
++f+{ive 5}
+-----------------------------------------------------
+
+   because it matches "two" to "three 3", and so forth. It would be
+   nicer as:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two +{2}
++three +{3}
++four +{4}
++five 5
+-----------------------------------------------------
+
+   which would probably involve pre-matching the lines into pairs
+   according to some heuristic.
-- 
1.7.8.4.17.g2df81
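
Case 2 in the README text above suggests pre-matching the lines into
pairs "according to some heuristic" without naming one. As a starting
point for discussion only, here is one possible heuristic sketched in
Python: greedily pair each removed line with the most similar added line
that comes after the previous match, using difflib's similarity ratio.
The function name pair_lines and the 0.5 threshold are assumptions made
for this sketch, not anything the script does today.

```python
from difflib import SequenceMatcher


def pair_lines(removed, added, threshold=0.5):
    """Return (removed_index, added_index) pairs of similar lines.

    Lines are paired greedily and in order: each removed line takes the
    most similar added line that lies after the previous match, provided
    the similarity ratio exceeds the threshold. Unmatched lines on either
    side are simply left out of the result.
    """
    pairs = []
    start = 0  # first added line still available for matching
    for i, old in enumerate(removed):
        best_j, best_score = None, threshold
        for j in range(start, len(added)):
            score = SequenceMatcher(None, old, added[j]).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            pairs.append((i, best_j))
            start = best_j + 1
    return pairs


if __name__ == "__main__":
    removed = ["one", "two", "three", "four"]
    added = ["two 2", "three 3", "four 4", "five 5"]
    for i, j in pair_lines(removed, added):
        print(f"{removed[i]!r} <-> {added[j]!r}")
```

For the pathological hunk this pairs "two" with "two 2", "three" with
"three 3" and "four" with "four 4", leaving "one" and "five 5"
unpaired, which is the pairing the "nicer" output needs. The greedy,
order-preserving choice keeps the sketch short; it is quadratic in the
hunk size and can pick poorly when several lines are near-duplicates.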