git.vger.kernel.org archive mirror
* git diff-tree -r -C output inexact sometimes
@ 2012-09-21  3:20 Cristian Tibirna
  2012-09-21  6:03 ` Jeff King
  0 siblings, 1 reply; 2+ messages in thread
From: Cristian Tibirna @ 2012-09-21  3:20 UTC (permalink / raw)
  To: git

Hello

A colleague of mine discovered an inconsistency in the functioning of 

git diff-tree -r -C

in specific conditions. As contrived as these conditions might seem (once you 
run the attached script and analyse its output), please rest assured that 
they come from a real-life case.

Running the attached script produces a git repository in which a large 
number of files were renamed, and in which many of the renamed files 
(in this particular case, all of them) have the same content but different names.

The commit data from the renaming operation (last commit in the script-
generated history) is inexactly rendered by the command 

git diff-tree -r -C master

The logical result is correctly produced by the more restricted command

git diff-tree -r -M master

IMO for this particular last commit both the above commands should return the 
same result.

Note that reducing i or j in the generator script attached below makes the bug 
disappear.

Thanks a lot for your attention.

-- 
Cristian Tibirna
KDE developer .. tibirna@kde.org .. http://www.kde.org

[-- Attachment #2: generate_git_tree.sh --]
[-- Type: application/x-shellscript, Size: 418 bytes --]

* Re: git diff-tree -r -C output inexact sometimes
  2012-09-21  3:20 git diff-tree -r -C output inexact sometimes Cristian Tibirna
@ 2012-09-21  6:03 ` Jeff King
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2012-09-21  6:03 UTC (permalink / raw)
  To: Cristian Tibirna; +Cc: git

On Thu, Sep 20, 2012 at 11:20:31PM -0400, Cristian Tibirna wrote:

> Running the attached script produces a git repository in which a large 
> number of files were renamed, and in which many of the renamed files 
> (in this particular case, all of them) have the same content but different names.
> 
> The commit data from the renaming operation (last commit in the script-
> generated history) is inexactly rendered by the command 
> 
> git diff-tree -r -C master
> 
> The logical result is correctly produced by the more restricted command
> 
> git diff-tree -r -M master
> 
> IMO for this particular last commit both the above commands should return the 
> same result.

Interesting. I get the same results from both commands. But I did have
to munge your script, as my "rename" command does not seem to work like
the one you expect in your script. So I may have misinterpreted the
intent of it.

However, I would not be surprised if one could construct a situation in
which "-C" and "-M" produced different results. Since the content of all
the files is the same, git has to make a guess about which files match
up based on their filenames. The current heuristic is very stupid and
just tries to match basenames (e.g., moving "foo/Makefile" to
"bar/Makefile" is a better match than moving the same content to
"bar/foo.c"). But in this case, the basenames don't match at all.

By using "-C", we will typically have more rename sources available, and
we may therefore process the possible pairs in a different order. Since
our name heuristic is largely useless, our results depend on that order.
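
To make the order dependence concrete, here is a minimal sketch in Python.
It is an illustration only, not git's actual code: the function names
(basename_score, pick_source) and the example paths are made up, and the
only thing it models is "same basename beats different basename, and the
first candidate wins a tie".

```python
import posixpath


def basename_score(src, dst):
    """Return 1 if both paths share a basename, else 0 (hypothetical helper)."""
    return 1 if posixpath.basename(src) == posixpath.basename(dst) else 0


def pick_source(dst, candidates):
    """Pick the best-scoring source for dst; ties go to the earliest candidate."""
    best, best_score = None, -1
    for src in candidates:
        score = basename_score(src, dst)
        if score > best_score:  # strict '>' means a later tie never replaces an earlier pick
            best, best_score = src, score
    return best


# When a basename matches, the candidate order does not matter.
print(pick_source("bar/Makefile", ["foo/main.c", "foo/Makefile"]))

# When no basename matches, every candidate scores 0 and the order decides.
print(pick_source("new/b2", ["old/a1", "old/a2"]))
print(pick_source("new/b2", ["old/a2", "old/a1"]))
```

With identical content and no matching basenames, the last two calls pick
different sources purely because the candidates arrive in a different order.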

I think the real solution is to improve the name heuristic. Something
like an edit distance would make more sense (though I think it is not as
simple as an edit distance across the whole pathname, as moving a
basename across directories should probably be preferred to changing the
filename inside a directory).
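
As a rough sketch of what such a heuristic might look like (again in
Python, and again only an illustration under assumptions, not a proposed
patch): compute the edit distance on the basename and on the directory
separately, and compare basenames first so that an unchanged basename in a
new directory always beats a changed basename in the same directory. The
names edit_distance, name_cost and best_source are hypothetical, and the
strict "basename first" ordering is just one possible weighting.

```python
import posixpath


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_cost(src, dst):
    """Cost tuple (basename distance, directory distance); lower is better."""
    src_dir, src_base = posixpath.split(src)
    dst_dir, dst_base = posixpath.split(dst)
    return (edit_distance(src_base, dst_base), edit_distance(src_dir, dst_dir))


def best_source(dst, candidates):
    """Pick the candidate with the lowest cost tuple (basename compared first)."""
    return min(candidates, key=lambda src: name_cost(src, dst))


candidates = ["bar/Makefil", "foo/Makefile"]

# Edit distance over the whole pathname prefers the in-directory rename.
print(min(candidates, key=lambda src: edit_distance(src, "bar/Makefile")))

# Comparing the basename first prefers the cross-directory move.
print(best_source("bar/Makefile", candidates))
```

The two print calls disagree, which is the point of the parenthetical
above: a whole-path distance would pick "bar/Makefil", while the split
version picks "foo/Makefile".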

Largely I think nobody has cared much because this only comes up when
you move multiple identical files. Quite often there is a minor
difference even between very similar files, and that is enough to come
up with sane results.

-Peff
