git.vger.kernel.org archive mirror
From: Stefan Beller <sbeller@google.com>
To: jrnieder@gmail.com
Cc: git@vger.kernel.org, sbeller@google.com
Subject: [PATCH] Documentation/diff-options: explain different diff algorithms
Date: Mon,  6 Aug 2018 15:25:51 -0700
Message-ID: <20180806222551.132628-1-sbeller@google.com>
In-Reply-To: <20180724044051.GA208393@aiede.svl.corp.google.com>

As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation of the differences between the diff algorithms.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 10 +++++++---
 Documentation/git-diff.txt     | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f394608b42c..eae033a21ea 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,14 +91,18 @@ appearing as a deletion or addition in the output. It uses the "patience
 diff" algorithm internally.
 
 --diff-algorithm={patience|minimal|histogram|myers}::
-	Choose a diff algorithm. The variants are as follows:
+	Choose a diff algorithm. See the discussion of DIFF ALGORITHMS
+ifndef::git-diff[]
+	in linkgit:git-diff[1]
+endif::git-diff[]
+	for details. The variants are as follows:
 +
 --
 `default`, `myers`;;
 	The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-	Spend extra time to make sure the smallest possible diff is
-	produced.
+	The same algorithm as `myers`, but spend extra time to make
+	sure the smallest possible diff is produced.
 `patience`;;
 	Use "patience diff" algorithm when generating patches.
 `histogram`;;
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index b180f1fa5bf..b182389aaae 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -119,6 +119,40 @@ include::diff-options.txt[]
 
 include::diff-format.txt[]
 
+DIFF ALGORITHMS
+---------------
+`Myers`
+
+The basic greedy algorithm described in
+link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations],
+with a worst-case run time of O((M + N)D) and an expected run time of O(M + N + D^2), where M and N are the input lengths and D is the size of the shortest edit script.
+It employs a heuristic that makes the diff faster at the cost of occasionally slightly larger output.
+The `minimal` algorithm has that heuristic turned off.
+
+`Patience`
+
+This algorithm by Bram Cohen matches the longest common subsequence
+of lines that are unique on both sides, then recurses into the regions
+between the matched lines. It gets its name from the way the longest
+subsequence is found, which is a byproduct of the patience sorting
+algorithm. If there are no unique lines left, it falls back to `myers`.
+In practice this algorithm often produces more readable output for
+source code, but it does not guarantee the shortest output.
+
+`Histogram`
+
+This algorithm extends `patience` to also use lines that occur a few
+times: it finds the longest common run of lines among the least frequent
+ones and recursively diffs the content before and after that run.
+If there are no common lines left, it falls back to `myers`. It is
+often the fastest algorithm, but in corner cases (when there are many
+common runs of the same length) it can produce poor results, as in:
+
+	seq 1 100 >one
+	echo 99 >two
+	seq 1 2 98 >>two
+	git diff --no-index --histogram one two
+
 EXAMPLES
 --------
 
-- 
2.18.0.597.ga71716f1ad-goog

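The `Myers` paragraph in the patch describes the greedy O(ND) search only in words. Below is a minimal Python sketch of that search, assuming the inputs are Python sequences of lines; the name `myers_edit_distance` is hypothetical and not part of Git. It only computes D, the size of the shortest edit script, and it leaves out the speed heuristic mentioned in the text, so it always finds the minimum, in the spirit of `minimal`.

```python
def myers_edit_distance(a, b):
    """Length D of the shortest edit script (insertions + deletions) turning a into b.

    Greedy search from Myers' paper: for each d = 0, 1, 2, ... record, on every
    diagonal k = x - y, the furthest x reachable with exactly d edits.
    """
    n, m = len(a), len(b)
    v = {1: 0}  # v[k]: furthest x on diagonal k; seeded so the d = 0 step starts at (0, 0)
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]      # step down: insert a line from b
            else:
                x = v[k - 1] + 1  # step right: delete a line from a
            y = x - k
            # follow the "snake": runs of matching lines cost nothing
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


# Example pair from the paper: ABCABBA -> CBABAC
print(myers_edit_distance("ABCABBA", "CBABAC"))  # prints 5
```

Each round d pushes every diagonal reachable with d edits as far as free matches allow, which is why the cost grows with D rather than with a full M x N table.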

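The `Patience` paragraph can be sketched the same way. This is a simplified Python illustration, not Git's xdiff code: it matches lines that are unique on both sides, keeps the largest order-preserving subset of those matches using patience sorting, and recurses into the gaps. Two simplifications are assumptions of this sketch: it skips the usual common prefix/suffix trimming, and when no unique common lines remain it falls back to a plain dynamic-programming LCS (`lcs_pairs`, hypothetical) as a stand-in for the Myers fallback. All function names here are illustrative.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_pairs(a, b):
    """(i, j) pairs for lines occurring exactly once in a and exactly once in b, ordered by i."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b)}
    return [(i, pos_b[line]) for i, line in enumerate(a)
            if count_a[line] == 1 and count_b[line] == 1]


def patience_lis(pairs):
    """Longest subsequence of pairs whose j values strictly increase, via patience sorting."""
    pile_tops = []              # j value on top of each pile (increasing left to right)
    top_index = []              # index into pairs of the card on top of each pile
    prev = [None] * len(pairs)  # back-pointer: top of the pile to the left at placement time
    for idx, (_, j) in enumerate(pairs):
        p = bisect_left(pile_tops, j)  # leftmost pile whose top is >= j
        if p == len(pile_tops):
            pile_tops.append(j)
            top_index.append(idx)
        else:
            pile_tops[p] = j
            top_index[p] = idx
        prev[idx] = top_index[p - 1] if p > 0 else None
    result, idx = [], (top_index[-1] if top_index else None)
    while idx is not None:
        result.append(pairs[idx])
        idx = prev[idx]
    return result[::-1]


def lcs_pairs(a, b):
    """Plain dynamic-programming LCS; stands in for the Myers fallback."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            L[i][j] = L[i + 1][j + 1] + 1 if a[i] == b[j] else max(L[i + 1][j], L[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append((i, j))
            i, j = i + 1, j + 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def patience_matches(a, b):
    """Matched (i, j) line pairs; everything unmatched is a deletion or insertion."""
    anchors = patience_lis(unique_common_pairs(a, b))
    if not anchors:
        return lcs_pairs(a, b)  # no unique common lines left: fall back
    matches, lo_a, lo_b = [], 0, 0
    for i, j in anchors + [(len(a), len(b))]:  # sentinel closes the last gap
        # recurse into the gap between the previous anchor and this one
        matches += [(lo_a + x, lo_b + y) for x, y in patience_matches(a[lo_a:i], b[lo_b:j])]
        if i < len(a):
            matches.append((i, j))
        lo_a, lo_b = i + 1, j + 1
    return matches


print(patience_matches(list("abcd"), list("cdab")))  # [(2, 0), (3, 1)]
```

Here `c` and `d` become the anchors, so `a` and `b` show up as deleted from the top and added again at the bottom.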

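Finally, a simplified Python sketch of the `Histogram` idea: count how often each line occurs in the old side, pick one common run of lines to anchor on, and recurse into the content before and after it. The selection rule in `find_region` (rarest line first, then longest run, then first found) is an assumption made for illustration and is not copied from Git's xhistogram code; the function names are hypothetical.

```python
from collections import defaultdict


def find_region(a, b):
    """Pick the common run of lines to anchor on, or None if a and b share no line.

    Simplified selection rule (an assumption, not Git's exact code): prefer the run
    whose rarest line occurs least often in a, then the longest run, then the first found.
    """
    occurrences = defaultdict(list)  # line -> indices where it occurs in a
    for i, line in enumerate(a):
        occurrences[line].append(i)
    best = None  # ((rarity, -length), i, j, length)
    for j, line in enumerate(b):
        for i in occurrences.get(line, []):
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            rarity = min(len(occurrences[a[k]]) for k in range(i, i + length))
            if best is None or (rarity, -length) < best[0]:
                best = ((rarity, -length), i, j, length)
    return None if best is None else best[1:]


def histogram_matches(a, b):
    """Matched (i, j) pairs: anchor on one region, then recurse before and after it."""
    region = find_region(a, b)
    if region is None:
        return []  # no common line: every line is a deletion or insertion
    i, j, length = region
    before = histogram_matches(a[:i], b[:j])
    after = histogram_matches(a[i + length:], b[j + length:])
    return (before
            + [(i + k, j + k) for k in range(length)]
            + [(i + length + x, j + length + y) for x, y in after])


# The corner case from the text: `seq 1 100` against 99 followed by 1, 3, ..., 97.
one = [str(n) for n in range(1, 101)]
two = ["99"] + [str(n) for n in range(1, 98, 2)]
print(histogram_matches(one, two))  # [(98, 0)]: only "99" is kept
```

With this rule the sketch keeps only the line `99` in common on the `seq` example from the patch, although the 49 odd numbers 1 to 97 could all have been matched, which mirrors the poor result the text warns about.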