git.vger.kernel.org archive mirror
* Is --minimal ever not the right thing?
@ 2023-12-19 16:10 Tao Klerks
  2023-12-19 17:25 ` Mike Castle
  0 siblings, 1 reply; 4+ messages in thread
From: Tao Klerks @ 2023-12-19 16:10 UTC
  To: git

Hi folks,

A user today showed me a situation where `git diff` (and `git blame`)
seemed to be doing the wrong thing: where two big blocks of text were
removed from a file, leaving 4 lines untouched in the middle, the
default diff was noting all three regions as lines removed, with those
4 "untouched" lines as *added* in the same place.

We compared to another diffing tool, p4merge, and that was showing
"the right thing" - two deleted regions with untouched lines in the
middle.

We realized that `--minimal` does "the right thing" in git, and you
can set up `diff.algorithm` config to use it by default in `git diff`
(although `git blame` doesn't currently/yet support it... a small
enhancement opportunity there :) ), but that raises two questions:

1. Is there any practical reason for any user *not* to set
`diff.algorithm` to `minimal`? Has anyone ever done an analysis of the
performance cost (or "diff readability cost", if that is a thing) of
"minimal" vs "default"?

2. If "minimal" is just better, and its higher computational cost is
effectively trivial, then why wouldn't we change the default?

I suspect this comes down to situations where git does big diffs
behind the scenes...? But I don't know offhand.
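
A minimal sketch of the two ways to select the algorithm mentioned above: the `--minimal` flag and the `diff.algorithm` setting. The scratch directory and the files `old.txt` and `new.txt` are hypothetical, built only to mimic the shape of the reported case (two blocks removed, four lines kept in between), and `--no-index` is used just so the commands run outside a repository. An input this small will probably not reproduce the surprising default output; it only shows the mechanics.

```shell
# Scratch directory and file names below are hypothetical.
tmp=$(mktemp -d)
cd "$tmp"

# old.txt: removed block A, four kept lines, removed block B.
printf 'A%s\n' 1 2 3 4 5 > old.txt
printf 'keep%s\n' 1 2 3 4 >> old.txt
printf 'B%s\n' 1 2 3 4 5 >> old.txt
# new.txt: only the four kept lines remain.
printf 'keep%s\n' 1 2 3 4 > new.txt

# One-off: request a minimal diff on the command line.
# "git diff --no-index" exits with status 1 when the files differ, hence "|| true".
git diff --no-index --minimal old.txt new.txt || true

# Same request through configuration, scoped to a single invocation with -c.
stat=$(git -c diff.algorithm=minimal diff --no-index --numstat old.txt new.txt || true)
echo "$stat"   # columns: added, deleted, path

# To make it the default for every "git diff" run by the current user:
#   git config --global diff.algorithm minimal
```

Because every line of `new.txt` also appears, in order, in `old.txt`, a minimal diff here consists of deletions only.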

Any feedback would be most appreciated!

Thanks,
Tao

* Re: Is --minimal ever not the right thing?
  2023-12-19 16:10 Is --minimal ever not the right thing? Tao Klerks
@ 2023-12-19 17:25 ` Mike Castle
  2023-12-19 17:55   ` Elijah Newren
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Castle @ 2023-12-19 17:25 UTC
  To: Tao Klerks; +Cc: git

I believe that the diff algorithms available are the same ones in GNU
diff.  From https://www.gnu.org/software/diffutils/manual/html_node/diff-Performance.html:
"""
The way that GNU diff determines which lines have changed always comes
up with a near-minimal set of differences. Usually it is good enough
for practical purposes. If the diff output is large, you might want
diff to use a modified algorithm that sometimes produces a smaller set
of differences. The --minimal (-d) option does this; however, it can
also cause diff to run more slowly than usual, so it is not the
default behavior.
"""

Since it has been that way decades before git even existed, I suspect
(but do not know) that, yes, analysis has been performed, and it makes
sense to keep the current default.

Then again, in the decades since, the entire stack from hardware to
compilers has improved, and maybe it does deserve a revisit.  You
could check whatever email archive is used for diffutils and see if
there has been any discussion on it recently (say, last 5 years?).

As you pointed out, you can set it yourself and see what happens over time.

Cheers,
mrc

* Re: Is --minimal ever not the right thing?
  2023-12-19 17:25 ` Mike Castle
@ 2023-12-19 17:55   ` Elijah Newren
  2023-12-19 18:09     ` Konstantin Tokarev
  0 siblings, 1 reply; 4+ messages in thread
From: Elijah Newren @ 2023-12-19 17:55 UTC
  To: Mike Castle; +Cc: Tao Klerks, git

To add to what Mike said...

On Tue, Dec 19, 2023 at 9:25 AM Mike Castle <dalgoda@gmail.com> wrote:
>
> I believe that the diff algorithms available are the same ones in GNU
> diff.  From https://www.gnu.org/software/diffutils/manual/html_node/diff-Performance.html:
> """
> The way that GNU diff determines which lines have changed always comes
> up with a near-minimal set of differences. Usually it is good enough
> for practical purposes. If the diff output is large, you might want
> diff to use a modified algorithm that sometimes produces a smaller set
> of differences. The --minimal (-d) option does this; however, it can
> also cause diff to run more slowly than usual, so it is not the
> default behavior.
> """
>
> Since it has been that way decades before git even existed, I suspect
> (but do not know) that, yes, analysis has been performed, and it makes
> sense to keep the current default.
>
> Then again, in the decades since, the entire stack from hardware to
> compilers has improved, and maybe it does deserve a revisit.  You
> could check whatever email archive is used for diffutils and see if
> there has been any discussion on it recently (say, last 5 years?).
>
> As you pointed out, you can set it yourself and see what happens over time.

There have been various discussions of diff performance, quality of
results, what the default should be, etc.  Including within the last
year.

minimal is guaranteed to produce a minimal diff, i.e. fewest total
subtractions and additions.  That is sometimes "best" quality, but
definitely not always.  On the performance axis, minimal can be nearly
as fast as myers and the other diff algorithms, but only in special
cases.

I think patience or histogram would make better defaults, at least
with some tweaks.  I had some patches to improve some worst case
performance and quality results coming from histogram that I was
working on in early 2023, but those got put on the backburner when
$DAYJOB pulled support for my Git work.  And I'm not aware of anyone
else currently working in the area.
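
One way to compare the algorithms on a given pair of files is to count additions and deletions with `--numstat` under each `--diff-algorithm` value. The following is only a sketch: the scratch directory, `old.txt`, `new.txt`, and `results.txt` are hypothetical names, and on an input this tiny all four algorithms are likely to agree.

```shell
# Scratch directory and file names below are hypothetical.
tmp=$(mktemp -d)
cd "$tmp"
printf '%s\n' a b c d e > old.txt
printf '%s\n' a x c d y > new.txt

for alg in myers minimal patience histogram; do
    # --numstat prints "<added> <deleted> <path>".
    # Exit status 1 from --no-index only means "files differ", hence "|| true".
    stat=$(git diff --no-index --diff-algorithm="$alg" --numstat old.txt new.txt || true)
    printf '%s\t%s\n' "$alg" "$stat" >> "$tmp/results.txt"
done

# One line per algorithm: name, added, deleted, path.
cat "$tmp/results.txt"
```

The longest common subsequence of the two files is `a`, `c`, `d`, so a minimal diff needs two deletions and two additions. To look at the performance side on real, large inputs, each `git diff` call can be prefixed with `time`.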

Hope that helps,
Elijah

* Re: Is --minimal ever not the right thing?
  2023-12-19 17:55   ` Elijah Newren
@ 2023-12-19 18:09     ` Konstantin Tokarev
  0 siblings, 0 replies; 4+ messages in thread
From: Konstantin Tokarev @ 2023-12-19 18:09 UTC
  To: Elijah Newren; +Cc: Mike Castle, Tao Klerks, git

On Tue, 19 Dec 2023 09:55:34 -0800
Elijah Newren <newren@gmail.com> wrote:

> minimal is guaranteed to produce a minimal diff, i.e. fewest total
> subtractions and additions.  That is sometimes "best" quality, but
> definitely not always. 

I second this. Recently I had a case where I had to use the --anchored
option of git diff to produce a more informative diff instead of the
minimal one.
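
A small sketch of what `--anchored` does, assuming two hypothetical files `pre` and `post` in a scratch directory (this is not the case referred to above, just the smallest input that shows the effect). The option takes a line prefix; a line that starts with that text and occurs exactly once on each side is kept as context where possible, even if that makes the diff larger.

```shell
# Scratch directory and file names below are hypothetical.
tmp=$(mktemp -d)
cd "$tmp"
printf '%s\n' a b c > pre
printf '%s\n' c a b > post

# Without an anchor, the smallest diff moves the single line "c".
# Exit status 1 from --no-index only means "files differ", hence "|| true".
plain=$(git diff --no-index pre post || true)
echo "$plain"

# Anchoring on "c" keeps that line as context, so "a" and "b" are
# shown as removed and re-added instead.
anchored=$(git diff --no-index --anchored=c pre post || true)
echo "$anchored"
```

The anchored diff touches four lines instead of two, which is the trade being described: a larger diff that keeps the line the reader cares about in place.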
