git-mv redux: there must be something else going on

* git-mv redux: there must be something else going on
@ 2010-02-03 18:25 Ron Garret
  2010-02-03 18:48 ` Avery Pennarun
  0 siblings, 1 reply; 20+ messages in thread
From: Ron Garret @ 2010-02-03 18:25 UTC (permalink / raw)
  To: git

Based on my current understanding of git there should be no difference 
between a git mv and a git rm followed by a git add.  But empirically 
there is a difference.  git log -M --follow is able to track files 
through git mvs even if their content changes completely.  Likewise, it 
does *not* track files through rm/add combinations even if the content 
didn't change at all.  (See experiment transcript below.)

So something in my understanding of how git works must be wrong.  Git 
must be keeping a separate record of file renames somewhere.  But where?

Just for the record, I'm not complaining about this behavior.  In fact, 
what git does is exactly what I want.  I just want to understand how it 
works.

Thanks,
rg


---

[ron@mickey:~/devel/gittest]$ cat>file1
1
2
3
4
5
[ron@mickey:~/devel/gittest]$ cat>file2
a
b
c
d
e
[ron@mickey:~/devel/gittest]$ git init
Initialized empty Git repository in /Users/ron/devel/gittest/.git/
[ron@mickey:~/devel/gittest]$ git add file1
[ron@mickey:~/devel/gittest]$ git commit -m 'Add numbers'
[master (root-commit) 54c2e4a] Add numbers
 1 files changed, 5 insertions(+), 0 deletions(-)
 create mode 100644 file1
[ron@mickey:~/devel/gittest]$ git rm file1
rm 'file1'
[ron@mickey:~/devel/gittest]$ git add file2
[ron@mickey:~/devel/gittest]$ git commit -m 'numbers->letters'
[master fe05d12] numbers->letters
 2 files changed, 5 insertions(+), 5 deletions(-)
 delete mode 100644 file1
 create mode 100644 file2
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file2
commit fe05d1233be1bb11f4ed0e8496e4191795d515a0
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:13:38 2010 -0800

   numbers->letters

A       file2
[ron@mickey:~/devel/gittest]$ ls
file2 git/
[ron@mickey:~/devel/gittest]$ cat>file2
6
7
8
9
10
[ron@mickey:~/devel/gittest]$ git mv file2 file3
[ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
[master ae3f6d4] letters->numbers
 1 files changed, 0 insertions(+), 0 deletions(-)
 rename file2 => file3 (100%)
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file3
commit ae3f6d440483fa41cf08819237e87d567ac3a31d
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:15:00 2010 -0800

    letters->numbers

R100    file2   file3

commit fe05d1233be1bb11f4ed0e8496e4191795d515a0
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:13:38 2010 -0800

   numbers->letters

A       file2
[ron@mickey:~/devel/gittest]$ ls
file3 git/
[ron@mickey:~/devel/gittest]$ mv file3 file4
[ron@mickey:~/devel/gittest]$ git rm file3
rm 'file3'
[ron@mickey:~/devel/gittest]$ git add file4
[ron@mickey:~/devel/gittest]$ git commit -m 'rm/add identical content'
[master a3d7227] rm/add identical content
 2 files changed, 5 insertions(+), 5 deletions(-)
 delete mode 100644 file3
 create mode 100644 file4
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file4
commit a3d7227fc2edca75fff8894acd5b077d1788bb36
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:17:23 2010 -0800

    rm/add identical content

A       file4
[ron@mickey:~/devel/gittest]$

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: git-mv redux: there must be something else going on
  2010-02-03 18:25 git-mv redux: there must be something else going on Ron Garret
@ 2010-02-03 18:48 ` Avery Pennarun
  2010-02-03 19:23   ` Ron Garret
  2010-02-03 20:12   ` Pete Harlan
  0 siblings, 2 replies; 20+ messages in thread
From: Avery Pennarun @ 2010-02-03 18:48 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 1:25 PM, Ron Garret <ron1@flownet.com> wrote:
> So something in my understanding of how git works must be wrong.  Git
> must be keeping a separate record of file renames somewhere.  But where?

It doesn't.  Your experiment is wrong.

> [ron@mickey:~/devel/gittest]$ cat>file2
> 6
> 7
> 8
> 9
> 10
> [ron@mickey:~/devel/gittest]$ git mv file2 file3
> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> [master ae3f6d4] letters->numbers
>  1 files changed, 0 insertions(+), 0 deletions(-)
>  rename file2 => file3 (100%)

Whoops.  You didn't 'git add file2' (before the mv) or 'git add file3'
(after the mv), or use commit -a, so what you've committed is the
*old* content of file2 under the name file3.  The *new* content of
file2 is still uncommitted in your work tree under the name file3.
This is why git can detect the move.  (The 100% is a good clue: it
means the old and new files are 100% identical.)
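
The sequence described above can be reproduced in a throwaway repository. This is a minimal sketch rather than part of the original exchange: the temporary directory, the placeholder identity, and the variable names `summary` and `pending` are assumptions made for illustration.

```shell
set -eu
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"              # placeholder identity
git config user.email "demo@example.invalid"  # placeholder identity

printf 'a\nb\nc\nd\ne\n' > file2
git add file2
git commit -q -m 'letters'

# Overwrite the work tree copy but do NOT stage the new content.
printf '6\n7\n8\n9\n10\n' > file2
git mv file2 file3
git commit -q -m 'letters->numbers'

# The commit holds the old blob under the new name, so it is an exact rename.
summary=$(git diff --name-status -M HEAD~1 HEAD)
# The rewritten content is still an unstaged modification of file3.
pending=$(git status --porcelain)
printf '%s\n%s\n' "$summary" "$pending"
```

The first line printed should carry the `R100` status and the second should show `file3` as modified but not staged, which matches the explanation that only the old content was committed.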

Artificial tests like this are useless anyway.  If you renamed file2
to file3 *and* changed all the contents, did you *really* rename it?
If so, who cares?  What good does it do you to know this?  If someone
else tries to patch the old file2 and you merge it into a (totally
different) file3 vs a (now missing) file2, how is that any better?

On the other hand, if one guy moves file2 to file3 and changes a few
lines, you want the other guy's patch to go into file3, whether the
first guy used 'git mv' or add+rm or anything else.

As long as only a few lines changed, git does the right thing.  If
most/all of the lines have changed, then there is no right thing,
because you'll get a nasty merge conflict either way.
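
To illustrate that the commands used for the rename do not matter, here is a small sketch that renames a file by hand with mv, git rm and git add, and then asks git whether it sees a rename. The scratch directory, the placeholder identity, and the variable name `by_hand` are assumptions for the example. In the original transcript the committed file3 presumably still held the letters, which would explain why that rm/add step showed up as a plain add.

```shell
set -eu
# Scratch repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"              # placeholder identity
git config user.email "demo@example.invalid"  # placeholder identity

printf '1\n2\n3\n4\n5\n' > file3
git add file3
git commit -q -m 'numbers'

# Rename by hand: plain mv, then git rm and git add, with no git mv involved.
mv file3 file4
git rm -q file3
git add file4
git commit -q -m 'rm/add identical content'

# Rename detection works on the committed content, not on the commands used.
by_hand=$(git diff --name-status -M HEAD~1 HEAD)
printf '%s\n' "$by_hand"
```

Because the committed content really is identical here, the status column should read `R100`, the same as for a `git mv`.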

Have fun,

Avery

* Re: git-mv redux: there must be something else going on
  2010-02-03 18:48 ` Avery Pennarun
@ 2010-02-03 19:23   ` Ron Garret
  2010-02-03 19:47     ` Avery Pennarun
  2010-02-03 19:53     ` Nicolas Pitre
  2010-02-03 20:12   ` Pete Harlan
  1 sibling, 2 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-03 19:23 UTC (permalink / raw)
  To: git

In article 
<32541b131002031048i26d166d9w3567a60515235c34@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 1:25 PM, Ron Garret <ron1@flownet.com> wrote:
> > So something in my understanding of how git works must be wrong.  Git
> > must be keeping a separate record of file renames somewhere.  But where?
> 
> It doesn't.  Your experiment is wrong.
> 
> > [ron@mickey:~/devel/gittest]$ cat>file2
> > 6
> > 7
> > 8
> > 9
> > 10
> > [ron@mickey:~/devel/gittest]$ git mv file2 file3
> > [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> > [master ae3f6d4] letters->numbers
> >  1 files changed, 0 insertions(+), 0 deletions(-)
> >  rename file2 => file3 (100%)
> 
> Whoops.  You didn't 'git add file2' (before the mv) or 'git add file3'
> (after the mv), or use commit -a, so what you've committed is the
> *old* content of file2 under the name file3.  The *new* content of
> file2 is still uncommitted in your work tree under the name file3.
> This is why git can detect the move.  (The 100% is a good clue: it
> means the old and new files are 100% identical.)

Ah.  That explains everything.  Thanks.  (I thought git mv was 
equivalent to git rm followed by git add.  But it's not.)

> Artificial tests like this are useless anyway.

Yes, I know.  This was not intended to be a real-world example.  I was 
just trying to understand the heuristics that git uses to track filename 
changes, and in particular, how much a file could change before git 
decided it was a different file.  When I got to zero shared lines 
between old and new it was clear that I was missing something 
fundamental :-)

So... how *does* git decide when two blobs are different blobs and when 
they are the same blob with mods?  I asked this question before and was 
pointed to the diffcore docs, but that didn't really clear things up.  
That just describes all the different ways git can do diffs, not the 
actual heuristics that git uses to track content.

rg

* Re: git-mv redux: there must be something else going on
  2010-02-03 19:23   ` Ron Garret
@ 2010-02-03 19:47     ` Avery Pennarun
  2010-02-03 20:30       ` Ron Garret
  2010-02-03 19:53     ` Nicolas Pitre
  1 sibling, 1 reply; 20+ messages in thread
From: Avery Pennarun @ 2010-02-03 19:47 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@flownet.com> wrote:
> In article
> Ah.  That explains everything.  Thanks.  (I thought git mv was
> equivalent to git rm followed by git add.  But it's not.)

I suppose in this case it's not.  The only difference is when your
work tree differs from your index, though, and it's to be expected
that 'git rm', in removing things from the index, would lose your
ability to track those differences.

> So... how *does* git decide when two blobs are different blobs and when
> they are the same blob with mods?  I asked this question before and was
> pointed to the diffcore docs, but that didn't really clear things up.
> That just describes all the different ways git can do diffs, not the
> actual heuristics that git uses to track content.

If you really want to know the details, looking at the code really is
probably the best solution; it's not even that long.

The short version is that git chooses a set of candidate blobs, then
diffs them and figures out a percentage similarity between each pair.
(A simple way to think of the similarity index is "how long is the
diff compared to the file itself?"  If the diff is of length zero, the
similarity is 100%, and so on.) If the similarity is greater than a
certain threshold, then it's considered to be the same file.
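
The effect of the threshold can be seen with a rough experiment. This is only a sketch: the file names, the ten-line sample config, the placeholder identity, and the variables `loose` and `strict` are made up for illustration. It relies on the documented behaviour that `-M` with no number uses the default threshold and that `-M100%` limits detection to exact renames.

```shell
set -eu
# Scratch repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"              # placeholder identity
git config user.email "demo@example.invalid"  # placeholder identity

# Ten distinct lines, so that one changed line is a small fraction of the file.
for i in 1 2 3 4 5 6 7 8 9 10; do
  printf 'setting_%s = some fairly long sample value\n' "$i"
done > old.conf
git add old.conf
git commit -q -m 'add config'

git mv old.conf new.conf
# Change one line out of ten and stage the result.
sed 's/^setting_5 = .*/setting_5 = changed/' new.conf > new.conf.tmp
mv new.conf.tmp new.conf
git add new.conf
git commit -q -m 'rename with a small edit'

loose=$(git diff --name-status -M HEAD~1 HEAD)       # default threshold
strict=$(git diff --name-status -M100% HEAD~1 HEAD)  # exact renames only
printf 'default:\n%s\nexact only:\n%s\n' "$loose" "$strict"
```

With the default threshold the pair should be reported as a rename with a score below 100; with `-M100%` the same commit should appear as a separate delete and add.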

Choosing the set of candidates is actually the more interesting
problem, since detecting moves using the above algorithm is O(n^2)
with the number of candidates.  That's why 'git diff' and 'git log'
don't do it at all by default.

If you provide -M, the set of candidates is the set of files that were
removed/modified and the set of files that were added.  (Added files
are compared against removed/modified files, iirc.)  Normally that's a
very short list.  With -C, you need to compare all
added/removed/modified files with all others, which is slightly more
work.  With --find-copies-harder, it becomes potentially a *lot* of
work.
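
The difference in candidate sets can be observed with a small sketch. It assumes the documented behaviour that plain `-C` only considers files modified in the same commit as copy sources, while `--find-copies-harder` also looks at unmodified files. The file names, the placeholder identity, and the variables `plain` and `harder` are invented for the example.

```shell
set -eu
# Scratch repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"              # placeholder identity
git config user.email "demo@example.invalid"  # placeholder identity

printf 'host = example\nport = 8080\nmode = sample\n' > base.conf
git add base.conf
git commit -q -m 'add base config'

# Copy the file without touching the original.
cp base.conf copy.conf
git add copy.conf
git commit -q -m 'copy config'

# base.conf was not modified in this commit, so plain -C has no candidate source.
plain=$(git diff --name-status -C HEAD~1 HEAD)
# --find-copies-harder also considers unmodified files as possible sources.
harder=$(git diff --name-status -C --find-copies-harder HEAD~1 HEAD)
printf '%s\n%s\n' "$plain" "$harder"
```

The first result should show copy.conf as a plain add, and the second should report it as an exact copy of base.conf, at the cost of comparing against every file in the tree.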

Have fun,

Avery

* Re: git-mv redux: there must be something else going on
  2010-02-03 19:23   ` Ron Garret
  2010-02-03 19:47     ` Avery Pennarun
@ 2010-02-03 19:53     ` Nicolas Pitre
  2010-02-03 20:27       ` Ron Garret
  1 sibling, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2010-02-03 19:53 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, 3 Feb 2010, Ron Garret wrote:

> So... how *does* git decide when two blobs are different blobs and when 
> they are the same blob with mods?  I asked this question before and was 
> pointed to the diffcore docs, but that didn't really clear things up.  
> That just describes all the different ways git can do diffs, not the 
> actual heuristics that git uses to track content.

Yes, those same heuristics are used to make the decision.

|The second transformation in the chain is diffcore-break, and is
|controlled by the -B option to the 'git diff-{asterisk}' commands.  
|This is used to detect a filepair that represents "complete rewrite" 
|and break such filepair into two filepairs that represent delete and
|create.
|[...]

|This transformation is used to detect renames and copies, and is
|controlled by the -M option (to detect renames) and the -C option
|(to detect copies as well) to the 'git diff-{asterisk}' commands.  
|[...]

Note that you may use the -B, -C, -M and --find-copies-harder arguments 
with log as well as diff commands even if there is no actual diff 
output.  So the explanation is really in that document, even if simple 
rename detection involves only a fraction of what is said there.

And Git can detect copied files too.

Those semantics are not stored in the repository so they can be improved 
or even changed after the fact.

Nicolas

* Re: git-mv redux: there must be something else going on
  2010-02-03 18:48 ` Avery Pennarun
  2010-02-03 19:23   ` Ron Garret
@ 2010-02-03 20:12   ` Pete Harlan
  2010-02-03 20:34     ` Ron Garret
  1 sibling, 1 reply; 20+ messages in thread
From: Pete Harlan @ 2010-02-03 20:12 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ron Garret, git

On 02/03/2010 10:48 AM, Avery Pennarun wrote:
>> [ron@mickey:~/devel/gittest]$ git mv file2 file3
>> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
>> [master ae3f6d4] letters->numbers
>>  1 files changed, 0 insertions(+), 0 deletions(-)
>>  rename file2 => file3 (100%)
> 
> Whoops.  You didn't 'git add file2' (before the mv) or 'git add file3'
> (after the mv), or use commit -a, so what you've committed is the
> *old* content of file2 under the name file3.  The *new* content of
> file2 is still uncommitted in your work tree under the name file3.

It may be reasonable for "git mv foo bar" to print a helpful message to
the user if foo has un-checked-in changes, similarly to what "git rm" does.

Unlike "git rm", "git mv" could still perform the operation even without
"-f", but the semantics of "git mv" differ enough from plain "mv" that a
short blurb from Git in that case might help.

--Pete

* Re: git-mv redux: there must be something else going on
  2010-02-03 19:53     ` Nicolas Pitre
@ 2010-02-03 20:27       ` Ron Garret
  2010-02-03 20:31         ` Ron Garret
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-03 20:27 UTC (permalink / raw)
  To: git

In article <alpine.LFD.2.00.1002031436490.1681@xanadu.home>,
 Nicolas Pitre <nico@fluxnic.net> wrote:

> On Wed, 3 Feb 2010, Ron Garret wrote:
> 
> > So... how *does* git decide when two blobs are different blobs and when 
> > they are the same blob with mods?  I asked this question before and was 
> > pointed to the diffcore docs, but that didn't really clear things up.  
> > That just describes all the different ways git can do diffs, not the 
> > actual heuristics that git uses to track content.
> 
> Yes, those same heuristics are used to make the decision.
> 
> |The second transformation in the chain is diffcore-break, and is
> |controlled by the -B option to the 'git diff-{asterisk}' commands.  
> |This is used to detect a filepair that represents "complete rewrite" 
> |and break such filepair into two filepairs that represent delete and
> |create.
> |[...]
> 
> |This transformation is used to detect renames and copies, and is
> |controlled by the -M option (to detect renames) and the -C option
> |(to detect copies as well) to the 'git diff-{asterisk}' commands.  
> |[...]
> 
> Note that you may use the -B, -C, -M and --find-copies-harder arguments 
> with log as well as diff commands even if there is no actual diff 
> output.  So the explanation is really in that document, even if simple 
> rename detection involves only a fraction of what is said there.
> 
> And Git can detect copied files too.
> 
> Those semantics are not stored in the repository so they can be improved 
> or even changed after the fact.

OK, on closer reading I see that the information is there, but it's well 
hidden :-)  (For example, the -M option takes an optional numerical 
argument so you can tweak how much similarity is needed to be considered 
a move.  But the docs for git log don't mention this.  It's buried deep 
in the git diffcore docs.  But yes, it's there.)

So I think I'm beginning to understand how this works, but that leads me 
to another question: it seems to me that there are potential screw cases 
for this purely content-based system of tracking files.  For example, 
suppose I have a directory full of sample config files, all of which are 
similar to each other.  Will that cause diffcore to get confused?

Feel free to treat that as a rhetorical question because obviously I can 
(and probably should) get the answer by trying it.

Thanks!
rg

* Re: git-mv redux: there must be something else going on
  2010-02-03 19:47     ` Avery Pennarun
@ 2010-02-03 20:30       ` Ron Garret
  0 siblings, 0 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-03 20:30 UTC (permalink / raw)
  To: git

In article 
<32541b131002031147r367ee08fxc64c4c54165953a3@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@flownet.com> wrote:
> > In article
> > Ah.  That explains everything.  Thanks.  (I thought git mv was
> > equivalent to git rm followed by git add.  But it's not.)
> 
> I suppose in this case it's not.  The only difference is when your
> work tree differs from your index, though, and it's to be expected
> that 'git rm', in removing things from the index, would lose your
> ability to track those differences.
> 
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods?  I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
> 
> If you really want to know the details, looking at the code really is
> probably the best solution; it's not even that long.
> 
> The short version is that git chooses a set of candidate blobs, then
> diffs them and figures out a percentage similarity between each pair.
> (A simple way to think of the similarity index is "how long is the
> diff compared to the file itself?"  If the diff is of length zero, the
> similarity is 100%, and so on.) If the similarity is greater than a
> certain threshold, then it's considered to be the same file.
> 
> Choosing the set of candidates is actually the more interesting
> problem, since detecting moves using the above algorithm is O(n^2)
> with the number of candidates.  That's why 'git diff' and 'git log'
> don't do it at all by default.
> 
> If you provide -M, the set of candidates is the set of files that were
> removed/modified and the set of files that were added.  (Added files
> are compared against removed/modified files, iirc.)  Normally that's a
> very short list.  With -C, you need to compare all
> added/removed/modified files with all others, which is slightly more
> work.  With --find-copies-harder, it becomes potentially a *lot* of
> work.

Thanks!  That clarifies a lot.

rg

* Re: git-mv redux: there must be something else going on
  2010-02-03 20:27       ` Ron Garret
@ 2010-02-03 20:31         ` Ron Garret
  2010-02-03 20:40         ` Avery Pennarun
  2010-02-03 20:44         ` Nicolas Pitre
  2 siblings, 0 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-03 20:31 UTC (permalink / raw)
  To: git

In article <ron1-34F9C6.12273203022010@news.gmane.org>,
 Ron Garret <ron1@flownet.com> wrote:

> In article <alpine.LFD.2.00.1002031436490.1681@xanadu.home>,
>  Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > On Wed, 3 Feb 2010, Ron Garret wrote:
> > 
> > > So... how *does* git decide when two blobs are different blobs and when 
> > > they are the same blob with mods?  I asked this question before and was 
> > > pointed to the diffcore docs, but that didn't really clear things up.  
> > > That just describes all the different ways git can do diffs, not the 
> > > actual heuristics that git uses to track content.
> > 
> > Yes, those same heuristics are used to make the decision.
> > 
> > |The second transformation in the chain is diffcore-break, and is
> > |controlled by the -B option to the 'git diff-{asterisk}' commands.  
> > |This is used to detect a filepair that represents "complete rewrite" 
> > |and break such filepair into two filepairs that represent delete and
> > |create.
> > |[...]
> > 
> > |This transformation is used to detect renames and copies, and is
> > |controlled by the -M option (to detect renames) and the -C option
> > |(to detect copies as well) to the 'git diff-{asterisk}' commands.  
> > |[...]
> > 
> > Note that you may use the -B, -C, -M and --find-copies-harder arguments 
> > with log as well as diff commands even if there is no actual diff 
> > output.  So the explanation is really in that document, even if simple 
> > rename detection involves only a fraction of what is said there.
> > 
> > And Git can detect copied files too.
> > 
> > Those semantics are not stored in the repository so they can be improved 
> > or even changed after the fact.
> 
> OK, on closer reading I see that the information is there, but it's well 
> hidden :-)  (For example, the -M option takes an optional numerical 
> argument so you can tweak how much similarity is needed to be considered 
> a move.  But the docs for git log don't mention this.  It's buried deep 
> in the git diffcore docs.  But yes, it's there.)
> 
> So I think I'm beginning to understand how this works, but that leads me 
> to another question: it seems to me that there are potential screw cases 
> for this purely content-based system of tracking files.  For example, 
> suppose I have a directory full of sample config files, all of which are 
> similar to each other.  Will that cause diffcore to get confused?
> 
> Feel free to treat that as a rhetorical question because obviously I can 
> (and probably should) get the answer by trying it.

Actually, I think the answer is in Avery's post in another branch of 
this thread.

rg

* Re: git-mv redux: there must be something else going on
  2010-02-03 20:12   ` Pete Harlan
@ 2010-02-03 20:34     ` Ron Garret
  2010-02-03 21:12       ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
  0 siblings, 1 reply; 20+ messages in thread
From: Ron Garret @ 2010-02-03 20:34 UTC (permalink / raw)
  To: git

In article <4B69D897.2060908@pcharlan.com>,
 Pete Harlan <pgit@pcharlan.com> wrote:

> On 02/03/2010 10:48 AM, Avery Pennarun wrote:
> >> [ron@mickey:~/devel/gittest]$ git mv file2 file3
> >> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> >> [master ae3f6d4] letters->numbers
> >>  1 files changed, 0 insertions(+), 0 deletions(-)
> >>  rename file2 => file3 (100%)
> > 
> > Whoops.  You didn't 'git add file2' (before the mv) or 'git add file3'
> > (after the mv), or use commit -a, so what you've committed is the
> > *old* content of file2 under the name file3.  The *new* content of
> > file2 is still uncommitted in your work tree under the name file3.
> 
> It may be reasonable for "git mv foo bar" to print a helpful message to
> the user if foo has un-checked-in changes, similarly to what "git rm" does.
> 
> Unlike "git rm", "git mv" could still perform the operation even without
> "-f", but the semantics of "git mv" differ enough from plain "mv" that a
> short blurb from Git in that case might help.

I think that a simple tweak to the docs would be enough.  Right now it 
says:

"The index is updated after successful completion, but the change must 
still be committed."

I'm pretty sure I would have been less confused if it had said something 
like:

"The index is updated to reflect the new name of the file, but NOT any 
new content that file may contain.  Changed content must be added to the 
index separately with git add, and all changes must still be committed."

rg

* Re: git-mv redux: there must be something else going on
  2010-02-03 20:27       ` Ron Garret
  2010-02-03 20:31         ` Ron Garret
@ 2010-02-03 20:40         ` Avery Pennarun
  2010-02-03 22:33           ` Ron Garret
  2010-02-03 20:44         ` Nicolas Pitre
  2 siblings, 1 reply; 20+ messages in thread
From: Avery Pennarun @ 2010-02-03 20:40 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 3:27 PM, Ron Garret <ron1@flownet.com> wrote:
> So I think I'm beginning to understand how this works, but that leads me
> to another question: it seems to me that there are potential screw cases
> for this purely content-based system of tracking files.  For example,
> suppose I have a directory full of sample config files, all of which are
> similar to each other.  Will that cause diffcore to get confused?

Cases like that are always confusing, even to humans.  Person A
renames X to Y, but at the same time creates Z which is almost
identical.  Person B patches X, then merges in person A's changes.

What do you expect to happen?  Should Y be changed, because that's the
file X was moved from?  Or should we change Z, because it's almost the
same content anyway?  Or maybe we should change both, since a change
to the old X is probably intended to affect the copied *content* that
ended up in both Y and Z?

Simply storing whether person A has renamed vs. copied vs. added a
file makes the answer to the "what do you expect to happen" question
more obvious, but fails to answer the "what *should* happen" question.
 Thus it's more of a distraction than a feature.  It took a while for
me to accept this, but once I did, I realized that git's behaviour has
still never caused me a problem in real life, despite repeated file
renames and complicated merges.

In contrast, svn's explicit rename tracking has shot me in the foot
numerous times.  (svn remembers when I delete file X and then
subsequently re-add it with the same content.  So if I merge in
someone's change to the *old* file X, it barfs because omg omg that's
a totally different file X and it can't possibly figure out what to
do.  Gee, thanks.  It's also hopelessly incompetent at handling
"renames" in which a newbie developer didn't know to use svn mv, but
instead used svn rm, mv, and svn add.)

Have fun,

Avery

* Re: git-mv redux: there must be something else going on
  2010-02-03 20:27       ` Ron Garret
  2010-02-03 20:31         ` Ron Garret
  2010-02-03 20:40         ` Avery Pennarun
@ 2010-02-03 20:44         ` Nicolas Pitre
  2 siblings, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2010-02-03 20:44 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, 3 Feb 2010, Ron Garret wrote:

> OK, on closer reading I see that the information is there, but it's well 
> hidden :-)  (For example, the -M option takes an optional numerical 
> argument so you can tweak how much similarity is needed to be considered 
> a move.  But the docs for git log don't mention this.  It's buried deep 
> in the git diffcore docs.  But yes, it's there.)

The doc is indeed not perfect.  Probably the -M option and friends could 
be listed again in the git-log and git-diff pages with a more casual 
explanation.

> So I think I'm beginning to understand how this works, but that leads me 
> to another question: it seems to me that there are potential screw cases 
> for this purely content-based system of tracking files.  For example, 
> suppose I have a directory full of sample config files, all of which are 
> similar to each other.  Will that cause diffcore to get confused?

There are ways to fool the heuristics indeed.  But overall it is still 
more reliable than manually having to record the rename into the tool 
since humans are known for screwing these things up more often than 
machines.  And again the heuristics can be modified after the fact if 
needed, unlike the manually recorded false renames (or lack of rename 
record) which will remain wrong unless another manual correction is 
applied to the database.

Nicolas

* [PATCH] Documentation: clarify git-mv behaviour wrt dirty files
  2010-02-03 20:34     ` Ron Garret
@ 2010-02-03 21:12       ` Thomas Rast
  2010-02-03 21:56         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Rast @ 2010-02-03 21:12 UTC (permalink / raw)
  To: git; +Cc: Ron Garret, Avery Pennarun, Pete Harlan

Clearly point out that the rename happens separately for worktree and
index.  This confused users, as they are apparently told that git-mv
== git-rm && mv && git-add, which it is not.

While there, move the synopsis to the synopsis section, which so far
was rather useless, and reword the first sentence to eliminate the
mentions of 'script'.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

Ron, please don't drop the Cc lists, it's customary around here to Cc
everyone involved so far.

On Wednesday 03 February 2010 21:34:05 you wrote:
> In article <4B69D897.2060908@pcharlan.com>,
>  Pete Harlan <pgit@pcharlan.com> wrote:
> > Unlike "git rm", "git mv" could still perform the operation even without
> > "-f", but the semantics of "git mv" differ enough from plain "mv" that a
> > short blurb from Git in that case might help.
> 
> I think that a simple tweak to the docs would be enough.  Right now it 
> says:
> 
> "The index is updated after successful completion, but the change must 
> still be committed."
> 
> I'm pretty sure I would have been less confused if it had said something 
> like:
> 
> "The index is updated to reflect the new name of the file, but NOT any 
> new content that file may contain.  Changed content must be added to the 
> index separately with git add, and all changes must still be committed."

How about this change instead, which formulates it in terms of what
does happen, instead of what does not.

BTW, I'm wondering whether the "move or rename" distinction is really
worth it.  Does the user care?  I always figured it was a technical
detail whether rename() works or you actually need to move anything.


 Documentation/git-mv.txt |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-mv.txt b/Documentation/git-mv.txt
index bdcb585..eff11b7 100644
--- a/Documentation/git-mv.txt
+++ b/Documentation/git-mv.txt
@@ -8,22 +8,22 @@ git-mv - Move or rename a file, a directory, or a symlink
 
 SYNOPSIS
 --------
-'git mv' <options>... <args>...
+'git mv' [-f] [-n] <source> <destination>
+'git mv' [-f] [-n] [-k] <source>... <destination directory>
 
 DESCRIPTION
 -----------
-This script is used to move or rename a file, directory or symlink.
-
- git mv [-f] [-n] <source> <destination>
- git mv [-f] [-n] [-k] <source> ... <destination directory>
+'git-mv' renames files, directories, and symlinks in worktree and
+index.
 
 In the first form, it renames <source>, which must exist and be either
 a file, symlink or directory, to <destination>.
 In the second form, the last argument has to be an existing
 directory; the given sources will be moved into this directory.
 
-The index is updated after successful completion, but the change must still be
-committed.
+For every renamed file or symlink, the worktree and index contents are
+renamed separately, preserving both staged and unstaged changes.  You
+will still have to commit the rename.
 
 OPTIONS
 -------
-- 
1.7.0.rc1.166.g7cae7

* Re: [PATCH] Documentation: clarify git-mv behaviour wrt dirty files
  2010-02-03 21:12       ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
@ 2010-02-03 21:56         ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-02-03 21:56 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Ron Garret, Avery Pennarun, Pete Harlan

Thomas Rast <trast@student.ethz.ch> writes:

> Clearly point out that the rename happens separately for worktree and
> index.  This confused users, as they are apparently told that git-mv
> == git-rm && mv && git-add, which it is not.

I may be confused too as I had to read these three lines three times and I
do not think these two sentences mesh well together.

What happens with "git mv A B" is that it moves a work tree file A to B
and moves the index entry for A to B, hence all of:

 (1) the fact that you do not have A anymore;

 (2) the fact that you now have B instead; and

 (3) the fact that your work tree file B (which used to be A) has changes
     from its corresponding index entry

are _consistently_ kept between the work tree and the index.

I don't think "happens separately for" makes sense.  At best, it is an
implementation detail that doesn't help users better understand what the
command does and what it is used for.

Of course, it is different from

    "git rm -f --cached A && mv A B && git add B"

which would add changes that you were not prepared to add (i.e. you had
output from "git diff A" before you started).  I think that was the buggy
way the old scripted version of "git mv" used to work, by the way.
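
The difference described above can be reproduced with a small sketch.
It assumes a throwaway repository created with mktemp; the file names
A and B follow the message, while the helper setup_dirty_repo, the
identity settings, and the file contents are made up for illustration.

```shell
set -e

# Hypothetical helper: a fresh repo where A is committed and then
# edited in the work tree without being staged.
setup_dirty_repo() {
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.email "demo@example.com"
    git config user.name "Demo"
    printf 'one\n' > A
    git add A
    git commit -q -m 'add A'
    printf 'two\n' >> A        # unstaged change to A
}

# Case 1: git mv moves the work tree file and the index entry.
setup_dirty_repo
git mv A B
mv_staged=$(git diff --cached --name-status -M)   # staged: pure rename
mv_index_B=$(git show :B)                         # index still has old content
mv_unstaged=$(git diff --name-only)               # edit is still unstaged

# Case 2: the rm/mv/add sequence stages the edit as a side effect.
setup_dirty_repo
git rm -q -f --cached A
mv A B
git add B
manual_index_B=$(git show :B)                     # index now has the edit
manual_unstaged=$(git diff --name-only)           # nothing left unstaged

echo "git mv:    staged=[$mv_staged] unstaged=[$mv_unstaged]"
echo "rm/mv/add: unstaged=[$manual_unstaged]"
```

In the first case the index entry for B keeps the committed content and
the extra line stays a work tree change; in the second case "git add B"
pulls that line into the index.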

> While there, move the synopsis to the synopsis section, which so far
> was rather useless, and reword the first sentence to eliminate the
> mentions of 'script'.

That's a good change regardless.

> +For every renamed file or symlink, the worktree and index contents are
> +renamed separately, preserving both staged and unstaged changes....

I'd just say:

    While renaming paths, changes in the files in the work tree that you
    have not added are preserved.

> +....  You
> +will still have to commit the rename.

I don't understand why you want to say "You will still have to commit the
rename" here.  It is like saying in "git add" manpage that "You will still
have to commit the added contents" because "add" only affects the index
and does not make a commit.  Drop it.


* Re: git-mv redux: there must be something else going on
  2010-02-03 20:40         ` Avery Pennarun
@ 2010-02-03 22:33           ` Ron Garret
  2010-02-03 23:18             ` Avery Pennarun
  2010-02-04  0:48             ` Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-03 22:33 UTC (permalink / raw)
  To: git

In article 
<32541b131002031240p6b67536ame6b69c6d662a7968@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 3:27 PM, Ron Garret <ron1@flownet.com> wrote:
> > So I think I'm beginning to understand how this works, but that leads me
> > to another question: it seems to me that there are potential screw cases
> > for this purely content-based system of tracking files.  For example,
> > suppose I have a directory full of sample config files, all of which are
> > similar to each other.  Will that cause diffcore to get confused?
> 
> Cases like that are always confusing, even to humans.  Person A
> renames X to Y, but at the same time creates Z which is almost
> identical.  Person B patches X, then merges in person A's changes.
> 
> What do you expect to happen?  Should Y be changed, because that's the
> file X was moved from?  Or should we change Z, because it's almost the
> same content anyway?  Or maybe we should change both, since a change
> to the old X is probably intended to affect the copied *content* that
> ended up in both Y and Z?
> 
> Simply storing whether person A has renamed vs. copied vs. added a
> file makes the answer to the "what do you expect to happen" question
> more obvious, but fails to answer the "what *should* happen" question.
>  Thus it's more of a distraction than a feature.  It took a while for
> me to accept this, but once I did, I realized that git's behaviour has
> still never caused me a problem in real life, despite repeated file
> renames and complicated merges.
> 
> In contrast, svn's explicit rename tracking has shot me in the foot
> numerous times.  (svn remembers when I delete file X and then
> subsequently re-add it with the same content.  So if I merge in
> someone's change to the *old* file X, it barfs because omg omg that's
> a totally different file X and it can't possibly figure out what to
> do.  Gee, thanks.  It's also hopelessly incompetent at handling
> "renames" in which a newbie developer didn't know to use svn mv, but
> instead used svn rm, mv, and svn add.)

Here's a realistic case where keeping explicit track of renames could be 
useful.

A and B start with a file named config.  A and B both make edits.  In 
addition, B renames config to be config1 and creates a new, very similar 
file called config2.  B then merges from A with the expectation that B's 
edits to config would end up in config1 and not config2.  It seems to me 
that without tracking renames, it would be luck of the draw which file 
the patch got applied to.
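
A minimal sketch of one simplified variant of this scenario, in a
throwaway repository.  The assumptions are loud ones: B renames config
to config1 without editing it, config2 is a one-line variation, and the
branch name side-a and the file contents are invented.  In this variant
config1 is byte-identical to the old config, so it is an exact match;
if B had also edited config1, the outcome would depend on similarity
scores, which is the ambiguity raised above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Common starting point for A and B.
printf 'host=localhost\nport=80\nmode=fast\nlog=off\n' > config
git add config
git commit -q -m 'base: add config'
git branch side-a                  # A's line of work starts here

# B: rename config to config1 (unchanged) and add a similar config2.
git mv config config1
sed 's/port=80/port=8080/' config1 > config2
git add config2
git commit -q -m 'B: rename config to config1, add config2'

# A: edit the original config.
git checkout -q side-a
sed 's/log=off/log=on/' config > config.tmp
mv config.tmp config
git commit -q -a -m 'A: turn logging on'

# B merges from A.
git checkout -q -
git merge -q --no-edit side-a >/dev/null

grep log= config1 config2          # shows where A's edit ended up
```

With an exact rename available, A's edit lands in config1 and config2
is left alone.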

rg


* Re: git-mv redux: there must be something else going on
  2010-02-03 22:33           ` Ron Garret
@ 2010-02-03 23:18             ` Avery Pennarun
  2010-02-03 23:55               ` Jay Soffian
  2010-02-04  0:10               ` Ron Garret
  2010-02-04  0:48             ` Junio C Hamano
  1 sibling, 2 replies; 20+ messages in thread
From: Avery Pennarun @ 2010-02-03 23:18 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 5:33 PM, Ron Garret <ron1@flownet.com> wrote:
> Here's a realistic case where keeping explicit track of renames could be
> useful.
>
> A and B start with a file named config.  A and B both make edits.  In
> addition, B renames config to be config1 and creates a new, very similar
> file called config2.  B then merges from A with the expectation that B's
> edits to config would end up in config1 and not config2.  It seems to me
> that without tracking renames, it would be luck of the draw which file
> the patch got applied to.

The problem is that this single "realistic case" is not actually very
common, and it's dwarfed by the other realistic cases: developer
forgets to use 'git mv' to rename the file; developer accidentally
deletes a file, commits, and then readds it later; etc.

Have I been bitten by exactly your example?  Yup.  But I've been
bitten by lots of other related things too, and explicit rename
tracking (at least in svn) has quite frequently made the problems
*worse*.  In my personal experience, git screws up less often.  The
fact that it's also elegant is a nice bonus too :)

More about this: http://marc.info/?l=git&m=114123702826251

Have fun,

Avery


* Re: git-mv redux: there must be something else going on
  2010-02-03 23:18             ` Avery Pennarun
@ 2010-02-03 23:55               ` Jay Soffian
  2010-02-04  0:10                 ` Ron Garret
  2010-02-04  0:10               ` Ron Garret
  1 sibling, 1 reply; 20+ messages in thread
From: Jay Soffian @ 2010-02-03 23:55 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ron Garret, git

On Wed, Feb 3, 2010 at 6:18 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
> More about this: http://marc.info/?l=git&m=114123702826251

I think the canonical email on the subject is this one:

http://article.gmane.org/gmane.comp.version-control.git/217

:-)

j.


* Re: git-mv redux: there must be something else going on
  2010-02-03 23:55               ` Jay Soffian
@ 2010-02-04  0:10                 ` Ron Garret
  0 siblings, 0 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-04  0:10 UTC (permalink / raw)
  To: git

In article 
<76718491002031555i2c1558f9qe0c97d07ceb86bb6@mail.gmail.com>,
 Jay Soffian <jaysoffian@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 6:18 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
> > More about this: http://marc.info/?l=git&m=114123702826251
> 
> I think the canonical email on the subject is this one:
> 
> http://article.gmane.org/gmane.comp.version-control.git/217
> 

The upshot seems to be this:

> And that "where did this come from" decision should be done at _search_ 
> time, not commit time.

And I'm mostly convinced, except for the one screw case that I outlined 
above.  In that case the search-time result is ambiguous, and file 
tracking information could be used to resolve the ambiguity.  But it 
certainly does seem like a rare enough situation that it's not worth 
worrying about.
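
The "search time, not commit time" point can be seen by asking two
different questions about the same commit.  This is only a sketch in a
throwaway repository; the file names and contents are made up, and the
rename is done by hand with rm/add rather than git mv to show that the
commit records nothing about how the file was moved.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'host=localhost\nport=80\n' > config
git add config
git commit -q -m 'add config'

# Rename by hand: no git mv involved.
git rm -q --cached config
mv config config1
git add config1
git commit -q -m 'rename config to config1 by hand'

# Same commit, two views: rename detection is a property of the query.
as_stored=$(git diff --no-renames --name-status HEAD~1 HEAD)
as_rename=$(git diff -M --name-status HEAD~1 HEAD)

echo "without rename detection:"
echo "$as_stored"
echo "with -M:"
echo "$as_rename"
```

Without rename detection the commit is a delete plus an add; with -M
the same two trees are reported as a 100% rename.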

I think I'm starting to git it :-)

rg


* Re: git-mv redux: there must be something else going on
  2010-02-03 23:18             ` Avery Pennarun
  2010-02-03 23:55               ` Jay Soffian
@ 2010-02-04  0:10               ` Ron Garret
  1 sibling, 0 replies; 20+ messages in thread
From: Ron Garret @ 2010-02-04  0:10 UTC (permalink / raw)
  To: git

In article 
<32541b131002031518t1017d351xcf9071f0a937474e@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 5:33 PM, Ron Garret <ron1@flownet.com> wrote:
> > Here's a realistic case where keeping explicit track of renames could be
> > useful.
> >
> > A and B start with a file named config.  A and B both make edits.  In
> > addition, B renames config to be config1 and creates a new, very similar
> > file called config2.  B then merges from A with the expectation that B's
> > edits to config would end up in config1 and not config2.  It seems to me
> > that without tracking renames, it would be luck of the draw which file
> > the patch got applied to.
> 
> The problem is that this single "realistic case" is not actually very
> common, and it's dwarfed by the other realistic cases: developer
> forgets to use 'git mv' to rename the file; developer accidentally
> deletes a file, commits, and then readds it later; etc.

Makes sense.

> Have I been bitten by exactly your example?  Yup.  But I've been
> bitten by lots of other related things too, and explicit rename
> tracking (at least in svn) has quite frequently made the problems
> *worse*.  In my personal experience, git screws up less often.  The
> fact that it's also elegant is a nice bonus too :)
> 
> More about this: http://marc.info/?l=git&m=114123702826251

Thanks, that's a great read!

rg


* Re: git-mv redux: there must be something else going on
  2010-02-03 22:33           ` Ron Garret
  2010-02-03 23:18             ` Avery Pennarun
@ 2010-02-04  0:48             ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-02-04  0:48 UTC (permalink / raw)
  To: Ron Garret; +Cc: git

Ron Garret <ron1@flownet.com> writes:

> A and B start with a file named config.  A and B both make edits.  In 
> addition, B renames config to be config1 and creates a new, very similar 
> file called config2.  B then merges from A with the expectation that B's 
> edits to config would end up in config1 and not config2.  It seems to me 
> that without tracking renames, it would be luck of the draw which file 
> the patch got applied to.

I don't think the above is necessarily a "rename" issue, but it touches an
interesting point -- it is so "interesting" that no sane SCM would even
consider it a problem they need to solve.

If config1 and config2 are about two different ways to configure the
software (e.g. two different builds for different customers), and the
change made by A was to accommodate a new configuration option added
upstream, B might even want to have that addition reflected in _both_ of
his configuration files, config1 and config2.

Earlier in this message, I said that this is not an issue SCM should even
be solving, because a sane way to handle this would _not_ be to copy and
edit config1/config2 and keep track of them in SCM; instead, saner people
would maintain a build procedure (e.g. Makefile target) to transform the
template "config" into necessary "config1" and "config2" customized
variants.
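
A minimal sketch of such a build step, written as a shell function
rather than a Makefile target.  Everything specific here is an
assumption for illustration: the @CUSTOMER@ and @PORT@ placeholders,
the customer names, and the helper name generate.  Only the template
"config" would be tracked; config1 and config2 are generated output.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# The tracked template.  An upstream change (say, a new option) is
# made here once and reaches every generated variant.
cat > config <<'EOF'
customer=@CUSTOMER@
port=@PORT@
log=off
EOF

# Hypothetical helper: generate <customer> <port> <output file>
generate() {
    sed -e "s/@CUSTOMER@/$1/" -e "s/@PORT@/$2/" config > "$3"
}

generate acme 8080 config1
generate globex 9090 config2

cat config1 config2
```

Because both variants are derived from one file, a merge only ever has
to deal with changes to "config" itself.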


Thread overview: 20+ messages
2010-02-03 18:25 git-mv redux: there must be something else going on Ron Garret
2010-02-03 18:48 ` Avery Pennarun
2010-02-03 19:23   ` Ron Garret
2010-02-03 19:47     ` Avery Pennarun
2010-02-03 20:30       ` Ron Garret
2010-02-03 19:53     ` Nicolas Pitre
2010-02-03 20:27       ` Ron Garret
2010-02-03 20:31         ` Ron Garret
2010-02-03 20:40         ` Avery Pennarun
2010-02-03 22:33           ` Ron Garret
2010-02-03 23:18             ` Avery Pennarun
2010-02-03 23:55               ` Jay Soffian
2010-02-04  0:10                 ` Ron Garret
2010-02-04  0:10               ` Ron Garret
2010-02-04  0:48             ` Junio C Hamano
2010-02-03 20:44         ` Nicolas Pitre
2010-02-03 20:12   ` Pete Harlan
2010-02-03 20:34     ` Ron Garret
2010-02-03 21:12       ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
2010-02-03 21:56         ` Junio C Hamano
