All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: git@vger.kernel.org
Cc: Stefan Beller <sbeller@google.com>
Subject: [PATCH] Documentation/diff-options: explain different diff algorithms
Date: Mon, 23 Jul 2018 17:36:19 -0700	[thread overview]
Message-ID: <20180724003619.185290-1-sbeller@google.com> (raw)

As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation on the differences of the diff algorithms.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972f..0d765482027 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,16 +94,34 @@ diff" algorithm internally.
 	Choose a diff algorithm. The variants are as follows:
 +
 --
-`default`, `myers`;;
-	The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-	Spend extra time to make sure the smallest possible diff is
-	produced.
+	A diff as produced by the basic greedy algorithm described in
+	link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations]
+`default`, `myers`;;
+	The same algorithm as `minimal`, extended with a heuristic to
+	reduce extensive searches. Currently, this is the default.
 `patience`;;
-	Use "patience diff" algorithm when generating patches.
+	Use "patience diff" algorithm when generating patches. This
+	matches the longest common subsequence of unique lines on
+	both sides, recursively. It obtained its name by the way the
+	longest subsequence is found, as that is a byproduct of the
+	patience sorting algorithm. If there are no unique lines left
+	it falls back to `myers`. Empirically this algorithm produces
+	a more readable output for code, but it does not garantuee
+	the shortest output.
 `histogram`;;
-	This algorithm extends the patience algorithm to "support
-	low-occurrence common elements".
+	This algorithm re-implements the `patience` algorithm with
+	"support of low-occurrence common elements" and only picks
+	one element of the LCS for the recursion. It is often the
+	fastest, but in cornercases (when there are many longest
+	common subsequences of the same length) it produces bad
+	results as seen in:
+
+		seq 1 100 >one
+		echo 99 > two
+		seq 1 2 98 >>two
+		git diff --no-index --histogram one two
+
 --
 +
 For instance, if you configured diff.algorithm variable to a
-- 
2.18.0.345.g5c9ce644c3-goog


             reply	other threads:[~2018-07-24  0:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24  0:36 Stefan Beller [this message]
2018-07-24  4:40 ` [PATCH] Documentation/diff-options: explain different diff algorithms Jonathan Nieder
2018-07-24 17:38   ` Stefan Beller
2018-07-24 20:06     ` Junio C Hamano
2018-08-06 22:25   ` Stefan Beller
2018-08-06 23:18     ` Jonathan Nieder
2018-08-07 15:56       ` Junio C Hamano
2018-08-09 19:26         ` Stefan Beller
2018-08-10 22:18           ` Stefan Beller
2018-08-09 19:51       ` Stefan Beller
2018-08-10  0:10 ` [PATCH 0/2] Getting data on different diff algorithms WAS: " Stefan Beller
2018-08-10  0:10   ` [PATCH 1/2] WIP: range-diff: take extra arguments for different diffs Stefan Beller
2018-08-10  0:10   ` [PATCH 2/2] WIP range-diff: print some statistics about the range Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724003619.185290-1-sbeller@google.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.