git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: "John Cai" <johncai86@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"John Cai via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org, "Patrick Steinhardt" <ps@pks.im>
Subject: Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm
Date: Sat, 11 Feb 2023 17:39:38 +0000	[thread overview]
Message-ID: <37a89e67-23f0-83f7-aaa5-0199d5db3adb@dunelm.org.uk> (raw)
In-Reply-To: <CABPp-BHQDS+AzWXtk9WV4HY2QZ8UdXrWJJDr-y6VPoLB6HuAfw@mail.gmail.com>

Hi Elijah

On 10/02/2023 09:57, Elijah Newren wrote:
> Hi Phillip,
> 
> On Thu, Feb 9, 2023 at 6:44 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi Elijah
>>
>> On 09/02/2023 09:09, Elijah Newren wrote:
>> I'd support a change to either patience or histogram as the default
>> algorithm. My personal preference would be for the patience algorithm as
>> I think it generally gives nicer diffs in the cases that the two
>> disagree (see below, I've tried changing diff.algorithm to histogram a
>> few times and I always end up changing it back to patience pretty
>> quickly). However I can see there is an advantage in having "diff" and
>> "merge" use the same algorithm as users who diffing either side to the
>> merge base will see the same diff that the merge is using. The histogram
>> algorithm is known to produce sub-optimal diffs in certain cases[1] but
>> I'm not sure how much worse it is in that respect than any of the other
>> algorithms.
> [...]
>> [1]
>> https://lore.kernel.org/git/CAGZ79kZYO6hHiAM8Sfp3J=VX11c=0-7YDSx3_EAKt5-uvvt-Ew@mail.gmail.com/
> 
> Thanks, I might have a fix, though I'm a bit worried my tweaks might
> trigger issues elsewhere or cost a bit of performance; I'll need to
> test.  Are there any other good known testcases where histogram
> produces sub-optimal diffs?

Not that I'm aware of (I've a feeling there might have been something on 
the JGit mailing list but I only managed to find a copy of Stefan's 
message). Loosely related is [1] which talks about hash collisions and 
I've never found the time to look at properly. I suspect any hash 
collision problem is more likely to affect xdl_classify_record() which 
is used by all the algorithms.

>> To see the differences between the output of patience and histogram
>> algorithms I diffed the output of "git log -p --no-merges
>> --diff-algorithm=patience" and "git log -p --no-merges
>> --diff-algorithm=histogram". The first three differences are
>>
>> - 6c065f72b8 (http: support CURLOPT_PROTOCOLS_STR, 2023-01-16)
>>     In get_curl_allowed_protocols() the patience algorithm shows the
>>     change in the return statement more clearly
>>
>> - 47cfc9bd7d (attr: add flag `--source` to work with tree-ish, 2023-01-14)
>>      The histogram algorithm shows read_attr_from_index() being moved
>>      whereas the patience algorithm does not making the diff easier to
>>      follow.
>>
>> - b0226007f0 (fsmonitor: eliminate call to deprecated FSEventStream
>> function, 2022-12-14)
>>     In fsm_listen__stop_async() the histogram algorithm shows
>>     data->shutdown_style = SHUTDOWN_EVENT;
>>     being moved, which is not as clear as the patience output which
>>     shows it as a context line.
> 
> If my current changes are "good", then they also remove the
> differences between patience and histogram for the second and third
> commits above.  (And the differences between the two algorithms for
> the first commit look really minor.)

Interesting. I agree the differences for the first commit are small. 
Interestingly I think they come from patience algorithm falling back to 
the myers implementation because it cannot find any unique context lines 
(I have a patch that removes the fallback[2] and it gives the same 
result as the histogram implementation).

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/4e0eff48-4a3e-4f0e-9ed2-d01ec38442a5@www.fastmail.com/

[2] https://github.com/phillipwood/git/commits/pure-patience-diff

  reply	other threads:[~2023-02-11 17:39 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-05  3:46 [PATCH 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-05  3:46 ` [PATCH 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-06 16:20   ` Phillip Wood
2023-02-05  3:46 ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-05 17:50   ` Eric Sunshine
2023-02-06 13:10     ` John Cai
2023-02-06 16:27   ` Phillip Wood
2023-02-06 18:14     ` Eric Sunshine
2023-02-06 19:50     ` John Cai
2023-02-09  8:26       ` Elijah Newren
2023-02-09 10:31         ` "bad" diffs (was: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm) Ævar Arnfjörð Bjarmason
2023-02-09 16:37         ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai
2023-02-06 16:39   ` Ævar Arnfjörð Bjarmason
2023-02-06 20:37     ` John Cai
2023-02-07 14:55       ` Phillip Wood
2023-02-07 17:00         ` John Cai
2023-02-09  9:09           ` Elijah Newren
2023-02-09 14:44             ` Phillip Wood
2023-02-10  9:57               ` Elijah Newren
2023-02-11 17:39                 ` Phillip Wood [this message]
2023-02-11  1:59               ` Jeff King
2023-02-15  2:35                 ` Elijah Newren
2023-02-15  4:21                   ` Jeff King
2023-02-15  5:20                     ` Junio C Hamano
2023-02-15 14:44                 ` Phillip Wood
2023-02-15 15:00                   ` Jeff King
2023-02-07 17:27         ` Ævar Arnfjörð Bjarmason
2023-02-15 14:47           ` Phillip Wood
2023-02-09  8:44       ` Elijah Newren
2023-02-14 21:16         ` John Cai
2023-02-15  3:41           ` Elijah Newren
2023-02-09  7:50     ` Elijah Newren
2023-02-09  9:41       ` Ævar Arnfjörð Bjarmason
2023-02-11  2:04         ` Jeff King
2023-02-07 17:56   ` Jeff King
2023-02-07 20:18     ` Ævar Arnfjörð Bjarmason
2023-02-07 20:47       ` Junio C Hamano
2023-02-07 21:05         ` Ævar Arnfjörð Bjarmason
2023-02-07 21:28           ` Junio C Hamano
2023-02-07 21:44             ` Ævar Arnfjörð Bjarmason
2023-02-09 16:34     ` John Cai
2023-02-11  1:39       ` Jeff King
2023-02-14 21:40 ` [PATCH v2 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-14 21:40   ` [PATCH v2 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-15  2:38     ` Junio C Hamano
2023-02-15 23:34       ` John Cai
2023-02-15 23:42         ` Junio C Hamano
2023-02-16  2:14           ` Jeff King
2023-02-16  2:57             ` Junio C Hamano
2023-02-16 20:34               ` John Cai
2023-02-14 21:40   ` [PATCH v2 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-15  2:56     ` Junio C Hamano
2023-02-15  3:20       ` Junio C Hamano
2023-02-16 20:37         ` John Cai
2023-02-17 20:21   ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-17 20:21     ` [PATCH v3 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-17 21:27       ` Junio C Hamano
2023-02-18  1:36       ` Elijah Newren
2023-02-17 20:21     ` [PATCH v3 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-17 21:50       ` Junio C Hamano
2023-02-18  2:56       ` Elijah Newren
2023-02-20 15:32         ` John Cai
2023-02-20 16:21           ` Elijah Newren
2023-02-20 16:49             ` John Cai
2023-02-20 17:32               ` Elijah Newren
2023-02-20 20:53                 ` John Cai
2023-02-22 19:47                 ` Jeff King
2023-02-24 17:44                   ` John Cai
2023-02-18  1:16     ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes Elijah Newren
2023-02-20 13:37       ` John Cai
2023-02-20 21:04     ` [PATCH v4 " John Cai via GitGitGadget
2023-02-20 21:04       ` [PATCH v4 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-20 21:04       ` [PATCH v4 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-21 17:34       ` [PATCH v4 0/2] Teach diff to honor diff algorithms set through git attributes Junio C Hamano
2023-02-21 18:05         ` Elijah Newren
2023-02-21 18:51           ` Junio C Hamano
2023-02-21 19:36             ` John Cai
2023-02-21 20:16               ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=37a89e67-23f0-83f7-aaa5-0199d5db3adb@dunelm.org.uk \
    --to=phillip.wood123@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    --cc=newren@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).