All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Documentation/diff-options: explain different diff algorithms
Date: Mon, 6 Aug 2018 16:18:43 -0700	[thread overview]
Message-ID: <20180806231843.GA4117@aiede.svl.corp.google.com> (raw)
In-Reply-To: <20180806222551.132628-1-sbeller@google.com>

Hi,

Stefan Beller wrote:
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -91,14 +91,18 @@ appearing as a deletion or addition in the output. It uses the "patience
>  diff" algorithm internally.
>  
>  --diff-algorithm={patience|minimal|histogram|myers}::
> -	Choose a diff algorithm. The variants are as follows:
> +	Choose a diff algorithm. See the discussion of DIFF ALGORITHMS
> +ifndef::git-diff[]
> +	in linkgit:git-diff[1]
> +endif::git-diff[]
> +	. The variants are as follows:

This means outside of git-diff(1), I'd see

	See the discussion of DIFF ALGORITHMS in git-diff(1) .

And in git-diff(1), I'd see

	See the discussion of DIFF ALGORITHMS .

Both don't seem quite right, since they have an extra space before the
period.  The git-diff(1) seems especially not quite right --- does it
intend to say something like "See the DIFF ALGORITHMS section for more
discussion"?

[...]
> --- a/Documentation/git-diff.txt
> +++ b/Documentation/git-diff.txt
> @@ -119,6 +119,40 @@ include::diff-options.txt[]
>  
>  include::diff-format.txt[]
>  
> +DIFF ALGORITHMS
> +---------------

Please add some introductory words about what the headings refer to.

> +`Myers`
> +
> +A diff as produced by the basic greedy algorithm described in
> +link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations].
> +with a run time of O(M + N + D^2). It employs a heuristic to allow for
> +a faster diff at the small cost of diff size.
> +The `minimal` algorithm has that heuristic turned off.
> +
> +`Patience`
> +
> +This algorithm by Bram Cohen matches the longest common subsequence
> +of unique lines on both sides, recursively. It obtained its name by
> +the way the longest subsequence is found, as that is a byproduct of
> +the patience sorting algorithm. If there are no unique lines left
> +it falls back to `myers`. Empirically this algorithm produces
> +a more readable output for code, but it does not garantuee

nit: s/garantuee/guarantee/

> +the shortest output.

Trivia: the `minimal` variant of Myers doesn't guarantee shortest
output, either: what it minimizes is the number of lines marked as
added or removed.  If you want to minimize context lines too, then
that would be a new variant. ;-)

[...]
> +`Histogram`
> +
> +This algorithm finds the longest common substring and recursively
> +diffs the content before and after the longest common substring.

optional: may be worth a short aside in the text about the distinction
between LCS and LCS. ;-)

It would be especially useful here, since the alphabet used in these
strings is *lines* instead of characters, so the first-time reader
could probably use some help in building their intuition.

> +If there are no common substrings left, fallback to `myers`.

nit: fallback is the noun, fall back is the verb.

> +This is often the fastest, but in corner cases (when there are
> +many common substrings of the same length) it produces bad

Can you clarify what "bad" means?  E.g. would "unexpected", or "poorly
aligned", match what you mean?

> +results as seen in:
> +
> +	seq 1 100 >one
> +	echo 99 > two
> +	seq 1 2 98 >>two
> +	git diff --no-index --histogram one two
> +
>  EXAMPLES
>  --------

Thanks,
Jonathan

  reply	other threads:[~2018-08-06 23:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24  0:36 [PATCH] Documentation/diff-options: explain different diff algorithms Stefan Beller
2018-07-24  4:40 ` Jonathan Nieder
2018-07-24 17:38   ` Stefan Beller
2018-07-24 20:06     ` Junio C Hamano
2018-08-06 22:25   ` Stefan Beller
2018-08-06 23:18     ` Jonathan Nieder [this message]
2018-08-07 15:56       ` Junio C Hamano
2018-08-09 19:26         ` Stefan Beller
2018-08-10 22:18           ` Stefan Beller
2018-08-09 19:51       ` Stefan Beller
2018-08-10  0:10 ` [PATCH 0/2] Getting data on different diff algorithms WAS: " Stefan Beller
2018-08-10  0:10   ` [PATCH 1/2] WIP: range-diff: take extra arguments for different diffs Stefan Beller
2018-08-10  0:10   ` [PATCH 2/2] WIP range-diff: print some statistics about the range Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180806231843.GA4117@aiede.svl.corp.google.com \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.