git.vger.kernel.org archive mirror
* detecting rename->commit->modify->commit
@ 2008-05-01 14:10 Ittay Dror
  2008-05-01 14:45 ` Jeff King
  2008-05-01 14:54 ` Ittay Dror
  0 siblings, 2 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 14:10 UTC (permalink / raw)
  To: git

Hi,

Say I have a file A, I rename to 'B', commit, then change file B and 
commit. Does 'git diff -M HEAD^^..' detect that? From what I see now, it 
will show 'B' as new (all of it with '+' prefix in the output). Am I right?

Thank you,
Ittay

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:10 detecting rename->commit->modify->commit Ittay Dror
@ 2008-05-01 14:45 ` Jeff King
  2008-05-01 15:08   ` Ittay Dror
  2008-05-01 14:54 ` Ittay Dror
  1 sibling, 1 reply; 49+ messages in thread
From: Jeff King @ 2008-05-01 14:45 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 01, 2008 at 05:10:24PM +0300, Ittay Dror wrote:

> Say I have a file A, I rename to 'B', commit, then change file B and  
> commit. Does 'git diff -M HEAD^^..' detect that? From what I see now, it  
> will show 'B' as new (all of it with '+' prefix in the output). Am I 
> right?

Yes, it should find it, assuming the changes to B leave it recognizable.
Try:

  mkdir repo && cd repo && git init
  cp /usr/share/dict/words A
  git add . && git commit -m added
  mv A B && git add B && git commit -a -m rename
  echo change >>B && git commit -a -m change
  git diff -M HEAD^^.. | head -n 7

You should see something like:

  diff --git a/A b/B
  similarity index 99%
  rename from A
  rename to B
  index 8e50f11..6525618 100644
  --- a/A
  +++ b/B

However, note the similarity index. If you change B so much that it
doesn't look close to the original A, then the rename is not detected
(and intentionally so -- the argument is that it is no longer a rename
in that context, but a rewritten file).
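
A minimal sketch of how the threshold affects the result, assuming a
reasonably recent git. The temporary directory, the demo identity and
the seq-generated contents are illustrative stand-ins (used here instead
of /usr/share/dict/words, which may not exist everywhere):

```shell
set -e
# Throwaway identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

seq 1 2000 > A                      # a file large enough to stay recognizable
git add A && git commit -q -m added
git mv A B && git commit -q -m rename
echo change >> B && git commit -q -a -m change

# Default threshold: the small edit leaves B recognizable as A.
default=$(git diff -M --name-status HEAD~2 HEAD)
# -M100% restricts detection to exact (byte-identical) renames.
exact_only=$(git diff -M100% --name-status HEAD~2 HEAD)

echo "default threshold:"; echo "$default"
echo "exact renames only:"; echo "$exact_only"
```

With the default threshold the pair is reported as a rename (status
R); with -M100% the same two commits show up as a deletion of A and an
addition of B, because the appended line makes the contents differ.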

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:10 detecting rename->commit->modify->commit Ittay Dror
  2008-05-01 14:45 ` Jeff King
@ 2008-05-01 14:54 ` Ittay Dror
  2008-05-01 15:09   ` Jeff King
                     ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 14:54 UTC (permalink / raw)
  To: git

Also, would anyone like to comment on: 
http://www.markshuttleworth.com/archives/123 (Renaming is the killer app 
of distributed version control 
<http://www.markshuttleworth.com/archives/123>)?

Thank you,
Ittay


-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:45 ` Jeff King
@ 2008-05-01 15:08   ` Ittay Dror
  2008-05-01 15:20     ` Jeff King
  2008-05-01 15:24     ` Ittay Dror
  0 siblings, 2 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 15:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git

But it doesn't work across directories :-(.

Try:
$ mkdir foo
$ echo "hello" > foo/A
$ git add foo/A
$ git commit -m 'foo/A'
$ mkdir bar
$ git mv foo/A bar
$ git commit -m 'bar/A'
$ echo "world" >> bar/A
$ git add bar/A
$ git commit -m 'bar/A world'
$ git diff HEAD^^..HEAD^ | cat
diff --git a/foo/A b/bar/A
similarity index 100%
rename from foo/A
rename to bar/A
$ git diff HEAD^^.. | cat
diff --git a/bar/A b/bar/A
new file mode 100644
index 0000000..94954ab
--- /dev/null
+++ b/bar/A
@@ -0,0 +1,2 @@
+hello
+world
diff --git a/foo/A b/foo/A
deleted file mode 100644
index ce01362..0000000
--- a/foo/A
+++ /dev/null
@@ -1 +0,0 @@
-hello






-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:54 ` Ittay Dror
@ 2008-05-01 15:09   ` Jeff King
  2008-05-01 15:20     ` Ittay Dror
  2008-05-01 15:30     ` David Tweed
  2008-05-01 15:27   ` Avery Pennarun
  2008-05-01 16:39   ` Sitaram Chamarty
  2 siblings, 2 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 15:09 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 01, 2008 at 05:54:06PM +0300, Ittay Dror wrote:

> Also, would anyone like to comment on:  
> http://www.markshuttleworth.com/archives/123 (Renaming is the killer app  
> of distributed version control  
> <http://www.markshuttleworth.com/archives/123>)?

My two cents:

1. I think he is overly obsessed with renaming. He seems concerned that
somebody will show up, make a big renaming patch, and then break your
system. Guess what? They can also show up, make a big code change patch,
and then break your system. In either case you have to review the
changes before accepting them, and it is up to the version control
system to show you the changes in a way you can understand.

2. I see the same old "git developers decided renaming wasn't
important" argument. I think this is bogus. I think renaming _is_
important, but I actually prefer git's approach of deducing renames,
because it reflects a fundamental property of git: we track states, not
changes, and git doesn't care how you arrive at each state. So I am free
to use a combination of git commands, editors, patch application tools,
or anything else to get my tree to the right place.

3. He doesn't like that git doesn't track _directory_ renames. This is
not a fundamental problem with git's approach (which could deduce
directory renames after the fact), but rather comes from the fact that
directory renames are controversial. That is, even if you know (through
deduction or because an explicit rename was recorded) that "subdir1"
moved to "subdir2", that doesn't necessarily mean that new files added
into "subdir1" should make that move, as well.

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:09   ` Jeff King
@ 2008-05-01 15:20     ` Ittay Dror
  2008-05-01 15:30     ` David Tweed
  1 sibling, 0 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 15:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git


Jeff King wrote:
> My two cents:
>
> 1. I think he is overly obsessed with renaming. He seems concerned that
> somebody will show up, make a big renaming patch, and then break your
> system. Guess what? They can also show up, make a big code change patch,
> and then break your system. In either case you have to review the
> changes before accepting them, and it is up to the version control
> system to show you the changes in a way you can understand.

I think he was more concerned that merges will break after such a change.

Ittay

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:08   ` Ittay Dror
@ 2008-05-01 15:20     ` Jeff King
  2008-05-01 15:30       ` Ittay Dror
  2008-05-01 20:39       ` Teemu Likonen
  2008-05-01 15:24     ` Ittay Dror
  1 sibling, 2 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 15:20 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 01, 2008 at 06:08:33PM +0300, Ittay Dror wrote:

> But it doesn't work across directories :-(.

Yes, it does.

> Try:
> >mkdir foo
> >echo "hello" > foo/A
> >git add foo/A
> >git commit -m 'foo/A'
> >mkdir bar
> >git mv foo/A bar
> >git commit -m 'bar/A'
> >echo "world" >> bar/A
> >git add bar/A
> >git commit -m 'bar/A world'
> >git diff HEAD^^..HEAD^ | cat
> diff --git a/foo/A b/bar/A
> similarity index 100%
> rename from foo/A
> rename to bar/A

See, it just worked across directories.

> > git diff HEAD^^.. | cat
> diff --git a/bar/A b/bar/A
> new file mode 100644
> index 0000000..94954ab
> --- /dev/null
> +++ b/bar/A
> @@ -0,0 +1,2 @@
> +hello
> +world
> diff --git a/foo/A b/foo/A
> deleted file mode 100644
> index ce01362..0000000
> --- a/foo/A
> +++ /dev/null
> @@ -1 +0,0 @@
> -hello

Of course it doesn't work here. You have two files, one containing
"hello\n" and one containing "hello\nworld\n". Their similarity is 50%,
which is not enough to consider it a rename. And I would argue that's
reasonable, since the files have only one line in common. The problem is
that you are using a toy example (which is why my example used
/usr/share/dict/words, which has enough content to definitively call it
a rename).

...

Hmm, looking at the code, though, 50% is supposed to be the default
minimum. So there might actually be a bug.
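
To poke at this, one could rebuild the toy repository and pass an
explicit threshold. This is only a sketch: the 20% value is an arbitrary
choice for illustration, and what the second diff reports depends on how
the installed git scores such tiny files, so no output is claimed for it.

```shell
set -e
# Throwaway identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir foo
echo hello > foo/A
git add foo/A && git commit -q -m 'foo/A'
mkdir bar
git mv foo/A bar
git commit -q -m 'bar/A'
echo world >> bar/A
git commit -q -a -m 'bar/A world'

# Across the rename commit alone the contents are identical.
one_step=$(git diff -M --name-status HEAD~2 HEAD~1)
# Across both commits, with a deliberately permissive threshold.
two_step=$(git diff -M20% --name-status HEAD~2 HEAD)

echo "rename commit only:"; echo "$one_step"
echo "both commits, -M20%:"; echo "$two_step"
```

The first diff is an exact rename, so it is reported regardless of the
threshold; comparing the second against a plain -M run shows where the
cutoff falls for a two-line file.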

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:08   ` Ittay Dror
  2008-05-01 15:20     ` Jeff King
@ 2008-05-01 15:24     ` Ittay Dror
  2008-05-01 15:28       ` Jeff King
  1 sibling, 1 reply; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 15:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Btw, this happened to me in a real use case. I wanted to restructure a 
source tree. So I put it under git and started to happily move things 
around, always committing after a move. I thought that git will 
correctly identify these moves and show me the differences I made after 
(in a separate commit). But it doesn't, and now that I want to prepare a 
summary of the changes I've made, I'm stuck with a huge diff that is 
hard to make sense of.

Ittay


-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:54 ` Ittay Dror
  2008-05-01 15:09   ` Jeff King
@ 2008-05-01 15:27   ` Avery Pennarun
  2008-05-01 15:34     ` Jeff King
  2008-05-01 16:39   ` Sitaram Chamarty
  2 siblings, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-01 15:27 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On 5/1/08, Ittay Dror <ittayd@tikalk.com> wrote:
> Also, would anyone like to comment on:
> http://www.markshuttleworth.com/archives/123 (Renaming is
> the killer app of distributed version control
> <http://www.markshuttleworth.com/archives/123>)?

One of the comments linked to this:
http://automatthias.wordpress.com/2007/06/07/directory-renaming-in-scm/

Which points out that git doesn't really handle directory renames at
all.  If someone creates file A/X then renames A to B, then merges
with someone who both added the file A/Y and modified A/X, git will
produce a tree containing (modified) B/Y and (new) A/Y.

Technically this is "correct" in that no data is lost and there are no
conflicts, but it is obviously not what was "intended", which was that
the new file Y should have ended up in folder B.

Before you say this is not a realistic use case, I've personally had
this exact problem:

- I had a project with all of my work in a folder "src"
- I decided that the 'src' folder was redundant, so I moved it all to
the root folder
- Someone else was working on an old maintenance branch which still had 'src'
- When I merged from that person, some new files were created under
'src', and of course didn't work.

Since the maintenance branch was long-lived, this problem happened
repeatedly.  That said, it's also pretty easy to work around, so it's
not the end of the world.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:24     ` Ittay Dror
@ 2008-05-01 15:28       ` Jeff King
  0 siblings, 0 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 15:28 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 01, 2008 at 06:24:30PM +0300, Ittay Dror wrote:

> Btw, this happened to me in a real use case. I wanted to restructure a  
> source tree. So I put it under git and started to happily move things  
> around, always committing after a move. I thought that git will correctly 
> identify these moves and show me the differences I made after (in a 
> separate commit). But it doesn't, and now that I want to prepare a  
> summary of the changes I've made, I'm stuck with a huge diff that is hard 
> to make sense of.

If you have a specific case where you think renames should have been
detected but they weren't, by all means, please share it. It's possible
that there is a bug in the rename detection, or that the limits are not
set correctly, and we could improve it.

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:09   ` Jeff King
  2008-05-01 15:20     ` Ittay Dror
@ 2008-05-01 15:30     ` David Tweed
  1 sibling, 0 replies; 49+ messages in thread
From: David Tweed @ 2008-05-01 15:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Ittay Dror, git

On Thu, May 1, 2008 at 4:09 PM, Jeff King <peff@peff.net> wrote:
> On Thu, May 01, 2008 at 05:54:06PM +0300, Ittay Dror wrote:
>
>  > Also, would anyone like to comment on:
>  > http://www.markshuttleworth.com/archives/123 (Renaming is the killer app
>  > of distributed version control
>  > <http://www.markshuttleworth.com/archives/123>)?

I'll just make the obvious point that he's talking about a problem and
an underlying cause:

The problem is not being able to successfully merge branches as time
goes by when one branch has had some renaming. He's decided the root
cause is not having an explicit representation of renames, which would
enable the merges to succeed. So there are two questions:

1. Does development often happen where files get renamed and then
modified significantly in a distributed fashion but it is still
sensible to automatically merge the results?

2. Do you need explicit rename tracking to do an automatic merge in those cases?

I suspect that for 2 you don't in theory but considering all the
non-obvious possibilities would slow down the normal case of a
standard merge.

-- 
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:20     ` Jeff King
@ 2008-05-01 15:30       ` Ittay Dror
  2008-05-01 15:38         ` Jeff King
  2008-05-01 15:47         ` Jakub Narebski
  2008-05-01 20:39       ` Teemu Likonen
  1 sibling, 2 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 15:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> Of course it doesn't work here. You have two files, one containing
> "hello\n" and one containing "hello\nworld\n". Their similarity is 50%,
> which is not enough to consider it a rename. And I would argue that's
> reasonable, since the files have only one line in common. The problem is
> that you are using a toy example (which is why my example used
> /usr/share/dict/words, which has enough content to definitively call it
> a rename).
>
>   
Well, I would have expected git to notice that the file was renamed in 
one commit and keep tracking changes afterwards.

Also, as I wrote in another post, this happened to me with real files of 
a real source tree, and with very small changes (and sometimes not at 
all) to these files.

Ittay

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:27   ` Avery Pennarun
@ 2008-05-01 15:34     ` Jeff King
  2008-05-01 15:50       ` Avery Pennarun
  2008-05-01 19:12       ` Steven Grimm
  0 siblings, 2 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 15:34 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ittay Dror, git

On Thu, May 01, 2008 at 11:27:34AM -0400, Avery Pennarun wrote:

> Before you say this is not a realistic use case, I've personally had
> this exact problem:
> 
> - I had a project with all of my work in a folder "src"
> - I decided that the 'src' folder was redundant, so I moved it all to
> the root folder
> - Someone else was working on an old maintenance branch which still had 'src'
> - When I merged from that person, some new files were created under
> 'src', and of course didn't work.

Sure. But we've also had the exact case of:

  - there are some files in subdir/, but that is not a good name, and
    there is something else that you are going to add that would be
    better named as subdir/.
  - you rename subdir/ to bettername/
  - you create subdir/newfile

but you _don't_ want newfile to go into bettername/. It's _replacing_
what went into bettername/.

So I don't think you can always track the intent automatically.

Though if you could specify the intent to the SCM, you could
differentiate at the time of move between these two cases, and the merge
could do the right thing later. Or alternatively, you could specify at
time of merge which to do.  It's just that nobody has implemented it.

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:30       ` Ittay Dror
@ 2008-05-01 15:38         ` Jeff King
  2008-05-01 15:47         ` Jakub Narebski
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 15:38 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 01, 2008 at 06:30:46PM +0300, Ittay Dror wrote:

> Well, I would have expected git to notice that the file was renamed in  
> one commit and keep tracking changes afterwards.

That's not how git works, and that's not what you asked it to do. You
gave it two states and asked it to diff between them. It never even
looked at the intermediate steps (and that's generally why git is so
fast). If you want to follow the history and look at every commit, then
that is something that _can_ be done, and does get done with things like
"git log --follow". But there is no diff mode currently implemented that
will crawl the history looking for interesting things.
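
As a rough illustration of the difference between a path-limited log and
one that follows renames commit by commit (the directory names mirror
the earlier toy example; the seq contents and commit messages are made
up for the sketch):

```shell
set -e
# Throwaway identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir foo
seq 1 200 > foo/A
git add foo/A && git commit -q -m 'add foo/A'
mkdir bar
git mv foo/A bar/A && git commit -q -m 'move to bar/A'
echo world >> bar/A && git commit -q -a -m 'edit bar/A'

# Path-limited log: stops at the commit where bar/A first appears.
plain=$(git log --format=%s -- bar/A)
# --follow: checks each commit for a rename and continues under foo/A.
followed=$(git log --follow --format=%s -- bar/A)

echo "without --follow:"; echo "$plain"
echo "with --follow:";    echo "$followed"
```

The plain log lists two commits and the --follow log lists all three.
For reviewing a restructuring that was committed step by step,
"git log -M -p" similarly runs rename detection on each commit's own
diff rather than on the two endpoints.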

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:30       ` Ittay Dror
  2008-05-01 15:38         ` Jeff King
@ 2008-05-01 15:47         ` Jakub Narebski
  1 sibling, 0 replies; 49+ messages in thread
From: Jakub Narebski @ 2008-05-01 15:47 UTC (permalink / raw)
  To: Ittay Dror; +Cc: Jeff King, git

Ittay Dror <ittayd@tikalk.com> writes:

> Jeff King wrote:
> >
> > Of course it doesn't work here. You have two files, one containing
> > "hello\n" and one containing "hello\nworld\n". Their similarity is 50%,
> > which is not enough to consider it a rename. And I would argue that's
> > reasonable, since the files have only one line in common. The problem is
> > that you are using a toy example (which is why my example used
> > /usr/share/dict/words, which has enough content to definitively call it
> > a rename).
> >
> >
> Well, I would have expected git to notice that the file was renamed in
> one commit and keep tracking changes afterwards.
> 
> Also, as I wrote in another post, this happened to me with real files
> of a real source tree, and with very small changes (and sometimes not
> at all) to these files.

The idea of rename detection is to help with merges.  If the files are
different enough that content based (similarity based) rename
detection doesn't detect rename, they are usually too different to
merge automatically anyway.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:34     ` Jeff King
@ 2008-05-01 15:50       ` Avery Pennarun
  2008-05-01 16:48         ` Jeff King
  2008-05-01 19:12       ` Steven Grimm
  1 sibling, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-01 15:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Ittay Dror, git

On 5/1/08, Jeff King <peff@peff.net> wrote:
> On Thu, May 01, 2008 at 11:27:34AM -0400, Avery Pennarun wrote:
>
>  > Before you say this is not a realistic use case, I've personally had
>  > this exact problem:
>  >
>  > - I had a project with all of my work in a folder "src"
>  > - I decided that the 'src' folder was redundant, so I moved it all to
>  > the root folder
>  > - Someone else was working on an old maintenance branch which still had 'src'
>  > - When I merged from that person, some new files were created under
>  > 'src', and of course didn't work.
>
>
> Sure. But we've also had the exact case of:
>
>   - there are some files in subdir/ [1], but that is not a good name, and
>     there is something else that you are going to add that would be
>     better named as subdir/.
>   - you rename subdir/ to bettername/ [2]
>   - you create subdir/newfile [3]
>
>  but you _don't_ want newfile to go into bettername/. It's _replacing_
>  what went into bettername/.

I would argue that this is a sort of "directory splitting" operation.
That is, all anyone ever did was add some files to a subdir/ that
already existed [1], *or* move all the files from subdir/ to a
previously-empty bettername/ [2], *or* create a new subdir/ and add
files to it [3]. In each case, no merge operation was necessary and it
is completely obvious by comparing "before and after" trees which case
it was.

I guess my argument here is just that it should be *possible* to
deduce and implement both cases at merge time just fine using git's
existing storage model.  It just hasn't been implemented yet.  (And
incidentally, I think that's totally awesome and I'd never want to go
back to an explicit rename tracking model.)

I should shut up now because the actual merge machinery scares me and
I'm not willing to volunteer to write a patch for this one :)

Have fun,

Avery

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 14:54 ` Ittay Dror
  2008-05-01 15:09   ` Jeff King
  2008-05-01 15:27   ` Avery Pennarun
@ 2008-05-01 16:39   ` Sitaram Chamarty
  2008-05-01 18:58     ` Ittay Dror
  2 siblings, 1 reply; 49+ messages in thread
From: Sitaram Chamarty @ 2008-05-01 16:39 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On Thu, May 1, 2008 at 8:24 PM, Ittay Dror <ittayd@tikalk.com> wrote:
> Also, would anyone like to comment on:
> http://www.markshuttleworth.com/archives/123 (Renaming is the killer app of
> distributed version control <http://www.markshuttleworth.com/archives/123>)?

someone already did, albeit in just discussion form rather than
examples, in a comment on that same page:

http://www.markshuttleworth.com/archives/123#comment-118655

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:50       ` Avery Pennarun
@ 2008-05-01 16:48         ` Jeff King
  2008-05-01 19:45           ` Avery Pennarun
  0 siblings, 1 reply; 49+ messages in thread
From: Jeff King @ 2008-05-01 16:48 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ittay Dror, git

On Thu, May 01, 2008 at 11:50:31AM -0400, Avery Pennarun wrote:

> I would argue that this is a sort of "directory splitting" operation.
> That is, all anyone ever did was add some files to a subdir/ that
> already existed [1], *or* move all the files from subdir/ to a
> previously-empty bettername/ [2], *or* create a new subdir/ and add
> files to it [3]. In each case, no merge operation was necessary and it
> is completely obvious by comparing "before and after" trees which case
> it was.

I don't see it. I think the steps are exactly the same as in your
example. Consider:

  1. You have some files in src/
  2. All of the files from src/ get moved away
  3. You merge in somebody else's work which adds a file in src/, but
     their work is based on a commit which predates 2.

The question is: if they had seen 2., would they have put the file into
src/, or into the new location? I think the answer depends on the
semantics of the file. If it is semantically an addition to the source
code that got moved, then it belongs in the new location. If it is a
_replacement_ for the source code that got moved, then it belongs in
src/.

> I guess my argument here is just that it should be *possible* to
> deduce and implement both cases at merge time just fine using git's
> existing storage model.  It just hasn't been implemented yet.  (And
> incidentally, I think that's totally awesome and I'd never want to go
> back to an explicit rename tracking model.)

I think you lack information to decide automatically between the two
cases listed above. But I think in most cases it would be sufficient for
the tool to say "this directory seems to have moved, but this new file
was added in it" and let the user decide which makes sense.

> I should shut up now because the actual merge machinery scares me and
> I'm not willing to volunteer to write a patch for this one :)

It would probably start not with merge machinery, but with diff
machinery to detect "directory has moved". But that is also scary. :)

You could also do this totally _outside_ of git, similar to
git-mergetool. Wait until you get a conflict, and then run a script
which looks at the two endpoints and the merge base and says "Oh, maybe
this is a good way of resolving."
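
A very rough sketch of the information such an outside script could
gather, assuming nothing beyond stock git commands. The branch names
"ours" and "theirs", the lib/ directory and the file contents are
hypothetical; the script only reports facts and does not attempt to
resolve anything:

```shell
set -e
# Throwaway identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Common ancestor: one file in src/.
mkdir src
seq 1 200 > src/X
git add src/X && git commit -q -m 'add src/X'
git branch theirs

# Our side: everything in src/ moves to lib/.
git checkout -q -b ours
mkdir lib
git mv src/X lib/X
git commit -q -m 'move src to lib'

# Their side: still on the old layout, adds a new file under src/.
git checkout -q theirs
seq 300 400 > src/Y
git add src/Y && git commit -q -m 'add src/Y'

# Compare each endpoint against the merge base.
base=$(git merge-base ours theirs)
our_renames=$(git diff -M --name-status --diff-filter=R "$base" ours)
their_adds=$(git diff --name-only --diff-filter=A "$base" theirs)

echo "renamed on our side:"; echo "$our_renames"
echo "added on their side:"; echo "$their_adds"
```

If every rename on one side leaves a directory that the other side adds
files to, the script has found the ambiguous case and can ask the user
which location was intended.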

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 16:39   ` Sitaram Chamarty
@ 2008-05-01 18:58     ` Ittay Dror
  0 siblings, 0 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-01 18:58 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: git

Sitaram Chamarty wrote:
> http://www.markshuttleworth.com/archives/123#comment-118655
>
Here is the comment from the thread, my comment on it is below:

> This is a very strong point for renaming, but it is not necessarily
> an universal one.
>
> Here is one example of the issue: one developer renaming a directory
> in his branch, and another adding a file to the original directory in
> his branch. What happens at the merge ?
> - Bazaar renames the directory and puts the new file in the _renamed_
>   directory.
> - Git renames the directory with its files, but keeps the old
>   directory too and adds the new file there.
>
> Bazaar’s behavior certainly is better for C. However it is not
> universally better.
>
> For example in Java you cannot rename a file without changing its
> contents. So, moving a file to a directory different from where its
> author put it will almost certainly break the build.
>
> The bottom line is, both behaviors can seem valid or broken,
> depending on the case. Neither is perfect. At the very abstract level
> file renames are _not_ a first-class operation. This is especially
> apparent in a language like Java.
>
> Content movement is the first class operation. Things like moving
> functions, etc. The question is how one can handle that and whether
> the current strategy has a path for improvement. It could be argued
> that once you commit yourself to explicitly tracking file renames, you
> are giving up a slew of opportunities for handling the more general
> cases.
>
> One thing is for certain, a 100% ideal solution is impossible. It
> would have to be aware of the target programming language _and_ the
> build environment.

And my comment is that in this Java example, I think that manually
fixing the package name in the file (after noticing the build is
broken) is easy. On the other hand, if the other developer changed one
of the renamed files, then manually merging the change from the file in
the old location into the file in the new location is not so easy: you
first need to discover that this happened, then merge the two files
(and you still need to fix the package name).

ittay

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:34     ` Jeff King
  2008-05-01 15:50       ` Avery Pennarun
@ 2008-05-01 19:12       ` Steven Grimm
  2008-05-01 23:14         ` Jeff King
  1 sibling, 1 reply; 49+ messages in thread
From: Steven Grimm @ 2008-05-01 19:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Ittay Dror, git

On May 1, 2008, at 8:34 AM, Jeff King wrote:
> So I don't think you can always track the intent automatically.

That is absolutely true. You have to pick one case or the other as the  
default unless there's some way to tell the system your intent either  
at merge time or at move time.

However, that leaves the question of which default will be wrong the  
least often.

In my personal experience, I think a directory rename has almost  
always meant that I would want new files to appear in the new  
directory rather than to recreate the old directory. I can't think of  
a single time when I've wanted git's current behavior (though maybe  
it's happened on occasion) but the current behavior has tripped me up  
more than once and forced me to do extra work shuffling things around  
by hand post-merge. I acknowledge that there exist cases where the  
current behavior is correct -- but in my experience they're the  
minority.

Of course, the discussion is moot anyway until someone writes code to  
detect the situation; my impression is the current behavior is the way  
it is simply because it's what naturally happens in the absence of  
merge-time detection of a directory getting renamed.

-Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 16:48         ` Jeff King
@ 2008-05-01 19:45           ` Avery Pennarun
  2008-05-01 22:42             ` Jeff King
  0 siblings, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-01 19:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Ittay Dror, git

On Thu, May 1, 2008 at 12:48 PM, Jeff King <peff@peff.net> wrote:
>  I don't see it. I think the steps are exactly the same as in your
>  example. Consider:
>
>   1. You have some files in src/
>   2. All of the files from src/ get moved away
>   3. You merge in somebody else's work which adds a file in src/, but
>      their work is based on a commit which predates 2.
>
>  The question is: if they had seen 2., would they have put the file into
>  src/, or into the new location? I think the answer depends on the
>  semantics of the file. If it is semantically an addition to the source
>  code that got moved, then yes. If it is a _replacement_ for the
>  source code that got moved, then no.

I promised I would shut up, and I apparently didn't.  Sorry :)

I think this case isn't so hard.  Basically, a merge involves three
commits; the merge-base, my branch, and your branch.

In your example above, we compare the merge-base to the new version;
in that case, the new file is in an *existing* directory which
definitely corresponds to src/ in #1, because the new version has
never even heard about src/ being deleted.  Thus, the file must be
intended to be part of the original src/, wherever it may now be.

In contrast, if the merge-base already had src/ being renamed, and
someone put something into src/, we'd know that they're putting it
into a fundamentally different directory than the moved src/.

Exactly how you track the "identity" of a directory without breaking
things down by individual commit sounds a little complicated, but it
feels to me like it should be possible.
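
As a rough illustration of the merge-base comparison described above, here
is a minimal Python sketch. It is not git code: the function names
(directory_renames, suggest_moves) and the representation of each tree as a
set of paths are assumptions made for the example, and it only looks at the
immediate parent directory of each file. It proposes a new location for an
added file only when the directory it was added to existed in the merge-base
and was moved wholesale on the other side.

```python
import posixpath


def directory_renames(base_paths, side_paths, file_renames):
    """Map old_dir -> new_dir for directories that moved wholesale on one side.

    file_renames maps old file paths to new file paths, as detected between
    the merge-base and that side (hypothetical input for this sketch).
    """
    moved_to = {}
    for old, new in file_renames.items():
        old_dir, new_dir = posixpath.dirname(old), posixpath.dirname(new)
        if old_dir != new_dir:
            moved_to.setdefault(old_dir, set()).add(new_dir)

    side_dirs = {posixpath.dirname(p) for p in side_paths}
    result = {}
    for old_dir, targets in moved_to.items():
        base_files = [p for p in base_paths if posixpath.dirname(p) == old_dir]
        all_moved = all(p in file_renames for p in base_files)
        # Only call it a directory rename if everything left for one place
        # and the old directory no longer exists on this side.
        if len(targets) == 1 and all_moved and old_dir not in side_dirs:
            result[old_dir] = next(iter(targets))
    return result


def suggest_moves(base_paths, ours_paths, ours_renames, theirs_paths):
    """Suggest new paths for files that 'theirs' added to a directory 'ours' renamed."""
    renames = directory_renames(base_paths, ours_paths, ours_renames)
    added = [p for p in theirs_paths if p not in base_paths]
    return {
        p: posixpath.join(renames[posixpath.dirname(p)], posixpath.basename(p))
        for p in added
        if posixpath.dirname(p) in renames
    }
```

If the merge-base already contains the rename, no directory looks renamed
relative to the base, so nothing is suggested; that corresponds to the
contrasting case above. The sketch only suggests moves and leaves the
decision to the user.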

I suspect this is a generalization of the earlier discussion (a few
months ago) that I read in the archive about git's handling of empty
directories.  Right now git does weird things with directory
creation/deletion because directories are not first-class citizens.

Anyway, as with the empty directory stuff, if I occasionally have to
mkdir/rmdir a couple things and rename a few files after doing a
merge, I'm not going to cry too much.  It sure beats explicitly
tracking renames and then having an oops-I-forgot-to-explicitly-track
rename throw a monkey wrench into my merges, which svn has saddled me
with lots of times.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 15:20     ` Jeff King
  2008-05-01 15:30       ` Ittay Dror
@ 2008-05-01 20:39       ` Teemu Likonen
  2008-05-01 23:09         ` Jeff King
  2008-05-02  2:06         ` Sitaram Chamarty
  1 sibling, 2 replies; 49+ messages in thread
From: Teemu Likonen @ 2008-05-01 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Ittay Dror, git

Jeff King wrote (2008-05-01 11:20 -0400):

> Hmm, looking at the code, though, 50% is supposed to be the default
> minimum. So there might actually be a bug.

I did some testing... A file, containing 10 lines (about 200 bytes),
renamed and then modified (similarity index being a bit over 50%). Git
detected the rename just fine with "git diff -M" over the rename and
change. When I edited the file even more (similarity only 40%) "git diff
-M" didn't detect the rename but "git diff -M4" did. To me it looks like
this works nicely, better than I expected, actually.

Smaller files than that do not seem to work with "git diff -M" over the
rename and changes. They can be followed with "git log --follow -p"
which works even with the two-line "hello\nworld". And of course there
is always

  git diff commit1:path1/file1 commit2:path2/file2

I'd conclude that for logs and diffs renames are detected very nicely
and there's no problem at all to get wanted information from the repo.
I wonder how this rename detection/tracking has become such a big thing,
a debate even. But maybe merges are different.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 19:45           ` Avery Pennarun
@ 2008-05-01 22:42             ` Jeff King
  0 siblings, 0 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 22:42 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ittay Dror, git

On Thu, May 01, 2008 at 03:45:07PM -0400, Avery Pennarun wrote:

> In your example above, we compare the merge-base to the new version;
> in that case, the new file is in an *existing* directory which
> definitely corresponds to src/ in #1, because the new version has
> never even heard about src/ being deleted.  Thus, the file must be
> intended to be part of the original src/, wherever it may now be.

I disagree with the final statement of the quoted paragraph above.

Just because you didn't build on the commit that moved src/* doesn't
mean the thing you put in src/ was intended to be moved along with src/.
For example:

  - it might have been a new work unrelated to the existing work in src/
    that got moved

  - it might have been a replacement for the work in src/ that was
    started before the movement. E.g., developer1 begins the replacement
    work. developer2 moves the old work out of the way. When the
    branches are merged, you don't want developer1's work moved.

And yes, I think those are probably less common than "it should be moved
along with src/*". My point isn't that this isn't a valuable construct,
but that we should stop short of mind-reading, and focus on making it
_easy_ to see what happened and to concisely specify the choice and
proceed.

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 20:39       ` Teemu Likonen
@ 2008-05-01 23:09         ` Jeff King
  2008-05-02  2:06         ` Sitaram Chamarty
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 23:09 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: Junio C Hamano, Ittay Dror, git

[cc'd Junio for comments on this rename optimization]

On Thu, May 01, 2008 at 11:39:40PM +0300, Teemu Likonen wrote:

> > Hmm, looking at the code, though, 50% is supposed to be the default
> > minimum. So there might actually be a bug.
> 
> I did some testing... A file, containing 10 lines (about 200 bytes),
> renamed and then modified (similarity index being a bit over 50%). Git

Ah, OK. The problem comes because the toy example is so tiny. It hits
this code chunk:

  if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
          return 0;

where base_size is the size of the smaller file in bytes, and delta_size
is the difference between the size of the two files. This is an
optimization so that we don't even have to look at the contents.

But it is basing the percentage off of the smaller file, so even though
file B ("hello\nworld\n") is 50% made up of file A ("hello\n"), we
actually end up saying "there must be at least as much content added to
make B as there is in A already". IOW, the "percentage similarity" is
based off of the smaller file for this optimization.

Obviously this is a toy case, but I wonder if there are other larger
cases where you end up with a file which has substantial copied content,
but also _grows_ a lot (not just changes). For example, consider the
file:

  1
  2
  3
  4
  5
  6
  7
  8
  9

that is, nine lines each with a number. Now rename it, and start adding
more numbers. We detect the addition of 10, 11, 12. But adding 13 means
we no longer match. So even with only 4 lines added, we fail to match.

But again, this is a bit of a toy case. It relies on the line length
being a significant factor compared to number of lines.
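
To make the arithmetic concrete, here is a small Python model of the
size-only check quoted above. It is a sketch, not git's implementation: the
function names are made up for the example, MAX_SCORE = 60000 is an assumed
scale (only the ratio minimum_score / MAX_SCORE matters), and the default
minimum is taken to be 50% as discussed in this thread.

```python
MAX_SCORE = 60000  # assumed scale; only minimum_score / MAX_SCORE matters


def rejected_by_size(src_size, dst_size, minimum_score=MAX_SCORE // 2):
    """Return True if the size-only pre-filter rules out a rename.

    Mirrors the quoted condition:
        base_size * (MAX_SCORE - minimum_score) < delta_size * MAX_SCORE
    where base_size is the smaller file's size in bytes and delta_size is
    the difference between the two sizes.
    """
    base_size = min(src_size, dst_size)
    delta_size = abs(src_size - dst_size)
    return base_size * (MAX_SCORE - minimum_score) < delta_size * MAX_SCORE


def numbers_file(last):
    """Contents of a file holding 1..last, one number per line."""
    return "".join(f"{n}\n" for n in range(1, last + 1))


def size(text):
    """Size in bytes of an ASCII test file."""
    return len(text.encode("ascii"))
```

With the file listed above (1 through 9, 18 bytes), growing it to 12 gives
27 bytes, so delta_size is 9 and the check passes (9 is not less than 9).
Adding 13 makes delta_size 12 and the pair is rejected without looking at
the contents. The "hello\n" versus "hello\nworld\n" pair (6 and 12 bytes) is
rejected the same way.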

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 19:12       ` Steven Grimm
@ 2008-05-01 23:14         ` Jeff King
  2008-05-03 17:56           ` merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit) Ittay Dror
  2008-05-08 18:17           ` detecting rename->commit->modify->commit Jeff King
  0 siblings, 2 replies; 49+ messages in thread
From: Jeff King @ 2008-05-01 23:14 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Avery Pennarun, Ittay Dror, git

On Thu, May 01, 2008 at 12:12:33PM -0700, Steven Grimm wrote:

> However, that leaves the question of which default will be wrong the  
> least often.
>
> In my personal experience, I think a directory rename has almost always 
> meant that I would want new files to appear in the new directory rather 

I do agree that the rename is probably more often desired.

> Of course, the discussion is moot anyway until someone writes code to  
> detect the situation; my impression is the current behavior is the way it 
> is simply because it's what naturally happens in the absence of  
> merge-time detection of a directory getting renamed.

Yes, I think that is largely a correct impression (although I think
Linus has spoken out against directory renaming in the past, so there is
at least a little bit of conscious effort). I suspect the right sequence
of steps to implement this would be:

  1. Write a proof-of-concept that shows directory renaming after the
     fact (e.g., take a conflicted merge, scan the diff for directory
     renames, and then fix up the files). That way it is available, but
     doesn't impact git at all.

  2. If people think it is useful, build it into the diff and merge
     machinery so that it can happen automagically, but make it
     optional. Thus git fully supports it, but the policy decision is
     left up to the user.

  3. Make it the default if it is the common choice.

So we just need somebody to volunteer to work on 1. ;)

-Peff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-01 20:39       ` Teemu Likonen
  2008-05-01 23:09         ` Jeff King
@ 2008-05-02  2:06         ` Sitaram Chamarty
  2008-05-02  2:38           ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Sitaram Chamarty @ 2008-05-02  2:06 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: Jeff King, Ittay Dror, git

On Fri, May 2, 2008 at 2:09 AM, Teemu Likonen <tlikonen@iki.fi> wrote:

>  -M" didn't detect the rename but "git diff -M4" did. To me it looks like
>  this works nicely, better than I expected, actually.

err... I didn't realise -M had an option, and I just double checked
the man pages for diff, diff-files, diff-index, and diff-tree.  What
does the 4 mean?

Sitaram

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-02  2:06         ` Sitaram Chamarty
@ 2008-05-02  2:38           ` Junio C Hamano
  2008-05-02 16:59             ` Sitaram Chamarty
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2008-05-02  2:38 UTC (permalink / raw)
  To: Sitaram Chamarty; +Cc: Teemu Likonen, Jeff King, Ittay Dror, git

"Sitaram Chamarty" <sitaramc@gmail.com> writes:

> On Fri, May 2, 2008 at 2:09 AM, Teemu Likonen <tlikonen@iki.fi> wrote:
>
>>  -M" didn't detect the rename but "git diff -M4" did. To me it looks like
>>  this works nicely, better than I expected, actually.
>
> err... I didn't realise -M had an option, and I just double checked
> the man pages for diff, diff-files, diff-index, and diff-tree.  What
> does the 4 mean?

The arguments to -M<num>, -C<num>, and -B<num>/<num> mean "raise or lower
the similarity threshold to <num> / 10^N", where N is the number of digits
in <num>.  IOW, you will always be expressing a number between 0 and 1.

You should also be able to say -M40% but that is an ancient part of the
code base so I might be misremembering things.
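
The digits rule described above can be written out as a short Python
sketch. This only models the "<num> / 10^N" interpretation; the function
name is invented for the example, and the "-M40%" form mentioned above is
deliberately not handled because the message itself is unsure about it.

```python
from fractions import Fraction


def threshold_from_digits(num: str) -> Fraction:
    """Interpret the digits after -M/-C as <num> / 10^N, N = number of digits."""
    if not (num.isascii() and num.isdigit()):
        raise ValueError(f"expected a string of decimal digits, got {num!r}")
    return Fraction(int(num), 10 ** len(num))
```

Under this rule "4" and "40" both mean 0.4, "75" means 0.75, and "05" means
0.05, so the value always lies between 0 and 1.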

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: detecting rename->commit->modify->commit
  2008-05-02  2:38           ` Junio C Hamano
@ 2008-05-02 16:59             ` Sitaram Chamarty
  0 siblings, 0 replies; 49+ messages in thread
From: Sitaram Chamarty @ 2008-05-02 16:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Teemu Likonen, Jeff King, Ittay Dror, git

On Fri, May 2, 2008 at 8:08 AM, Junio C Hamano <gitster@pobox.com> wrote:
>  The option to -M<num>, -C<num>, -B<num>/<num> are "raise or lower the
>  similarity threshold to <num> / 10^N" where N is the number of digits in
>  <num>.  IOW, you will always be expressing number between 0 and 1.

Thanks.  The only mention of this I find (now) is in a file called
diffcore.txt, which appears to exist only in the HTML documentation,
but not in the "man" pages anywhere, as of 1.5.5.

[ I pulled a few hairs out trying to find it in the man pages :-) ]

I'd submit a patch, but a guy who takes the easy way out even to get
the documentation (essentially doing a checkout of the "man" branch)
would certainly not be able to test it :-(

^ permalink raw reply	[flat|nested] 49+ messages in thread

* merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit)
  2008-05-01 23:14         ` Jeff King
@ 2008-05-03 17:56           ` Ittay Dror
  2008-05-03 18:11             ` Avery Pennarun
  2008-05-08 18:17           ` detecting rename->commit->modify->commit Jeff King
  1 sibling, 1 reply; 49+ messages in thread
From: Ittay Dror @ 2008-05-03 17:56 UTC (permalink / raw)
  To: git

Can someone comment whether supporting merges after renames will be on 
the Git roadmap?

As a Java developer, I can say that refactoring of class names and 
packages happens quite often. Having to remember I've made this change 
throughout the lifetime of a branch (or master, until pushed to a 
central repository), and needing to manually merge changes to files / 
packages (directories) I've refactored is something that I want my VCS 
to do.

Thank you,
Ittay

Jeff King wrote:
> On Thu, May 01, 2008 at 12:12:33PM -0700, Steven Grimm wrote:
>
>   
>> However, that leaves the question of which default will be wrong the  
>> least often.
>>
>> In my personal experience, I think a directory rename has almost always 
>> meant that I would want new files to appear in the new directory rather 
>>     
>
> I do agree that the rename is probably more often desired.
>
>   
>> Of course, the discussion is moot anyway until someone writes code to  
>> detect the situation; my impression is the current behavior is the way it 
>> is simply because it's what naturally happens in the absence of  
>> merge-time detection of a directory getting renamed.
>>     
>
> Yes, I think that is largely a correct impression (although I think
> Linus has spoken out against directory renaming in the past, so there is
> at least a little bit of conscious effort). I suspect the right sequence
> of steps to implement this would be:
>
>   1. write a proof-of-concept that shows directory renaming after the
>     fact (e.g., take a conflicted merge, scan the diff for directory
>     renames, and then fix up the files). That way it is available, but
>     doesn't impact git at all.
>
>   2. If people think it is useful, build it into the diff and merge
>      machinery so that it can happen automagically, but make it
>      optional. Thus git fully supports it, but the policy decision is
>      left up to the user.
>
>   3. Make it the default if it is the common choice.
>
> So we just need somebody to volunteer to work on 1. ;)
>
> -Peff

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit)
  2008-05-03 17:56           ` merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit) Ittay Dror
@ 2008-05-03 18:11             ` Avery Pennarun
  2008-05-04  6:08               ` merge renamed files/directories? Ittay Dror
  0 siblings, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-03 18:11 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On 5/3/08, Ittay Dror <ittayd@tikalk.com> wrote:
> Can someone comment whether supporting merges after renames will be on the
> Git roadmap?
>
>  As a Java developer, I can say that refactoring of class names and packages
> happens quite often. Having to remember I've made this change throughout the
> lifetime of a branch (or master, until pushed to a central repository), and
> needing to manually merge changes to files / packages (directories) I've
> refactored is something that I want my VCS to do.

Git already works fine for renames.  The only situation where
something funny happens is if you rename a whole directory and someone
else creates a file in the old directory.  (In that case, the new file
ends up in the old place instead of the new place.)  However, even in
that case, there is still no conflict and no manual merging necessary.

In fact, as someone else pointed out, renaming a java file requires
you to modify the file anyhow, so having git auto-move the file to
another directory *still* wouldn't make it work any better.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-03 18:11             ` Avery Pennarun
@ 2008-05-04  6:08               ` Ittay Dror
  2008-05-04  9:34                 ` Jakub Narebski
  2008-05-05 16:40                 ` Avery Pennarun
  0 siblings, 2 replies; 49+ messages in thread
From: Ittay Dror @ 2008-05-04  6:08 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git



Avery Pennarun wrote:
> Git already works fine for renames.  The only situation where
> something funny happens is if you rename a whole directory and someone
> else creates a file in the old directory.  (In that case, the new file
> ends up in the old place instead of the new place.)  However, even in
> that case, there is still no conflict and no manual merging necessary.

Sorry, but this is not the situation as I have experienced it with a
local repository I have. I renamed a directory (without changing any
files in it). 'git diff <commit>^ <commit>' shows the rename fine, but
'git log -p -M -C <initial commit>..' does not (that is, the history for
files in that directory is shown from the rename commit only). Obviously
git-diff is not any better.

> In fact, as someone else pointed out, renaming a java file requires
> you to modify the file anyhow, so having git auto-move the file to
> another directory *still* wouldn't make it work any better.

Sure it will, because otherwise I need to move it and still need to fix 
it. And there are many other file formats and languages where such a 
move will not require any change (I think it is funny that Java is a 
justification for not doing something for a tool primarily used by C 
people). Also, what happens if I change the file in the new location and 
someone else changes it in the old location? Will I need to do a manual 
merge?

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-04  6:08               ` merge renamed files/directories? Ittay Dror
@ 2008-05-04  9:34                 ` Jakub Narebski
  2008-05-05 16:40                 ` Avery Pennarun
  1 sibling, 0 replies; 49+ messages in thread
From: Jakub Narebski @ 2008-05-04  9:34 UTC (permalink / raw)
  To: Ittay Dror; +Cc: Avery Pennarun, git

Ittay Dror <ittayd@tikalk.com> writes:
> Avery Pennarun wrote:

> > Git already works fine for renames.  The only situation where
> > something funny happens is if you rename a whole directory and someone
> > else creates a file in the old directory.  (In that case, the new file
> > ends up in the old place instead of the new place.)  However, even in
> > that case, there is still no conflict and no manual merging necessary.
>
> Sorry, but this is not the situation as I have experienced it with a
> local repository I have. I renamed a directory (without changing any
> files in it). 'git diff <commit>^ <commit>' shows the rename fine, but
> 'git log -p -M -C <initial commit>..' does not (that is, the history
> for files in that directory is shown from the rename commit
> only). Obviously git-diff is not any better.

This is one thing where git differs from other SCMs.  In "git log --
<path>" (that is what I assume you have used) the <path> argument is
a path limiter.  It allows you to specify more than one directory or file.

Unfortunately, "git log --follow <file>" currently works only for single
files, and doesn't yet work for directories; which is caused, among
other things, by the lack of directory rename detection in git.

> [...] Also, what happens if I change the file in the new location
> and someone else changes it in the old location? Will I need to do a
> manual merge?

No, rename detection should make automatic merge possible.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-04  6:08               ` merge renamed files/directories? Ittay Dror
  2008-05-04  9:34                 ` Jakub Narebski
@ 2008-05-05 16:40                 ` Avery Pennarun
  2008-05-05 21:49                   ` Robin Rosenberg
  1 sibling, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-05 16:40 UTC (permalink / raw)
  To: Ittay Dror; +Cc: git

On 5/4/08, Ittay Dror <ittayd@tikalk.com> wrote:
>  Avery Pennarun wrote:
> > In fact, as someone else pointed out, renaming a java file requires
> > you to modify the file anyhow, so having git auto-move the file to
> > another directory *still* wouldn't make it work any better.
>
> Sure it will, because otherwise I need to move it and still need to fix it.
> And there are many other file formats and languages where such a move will
> not require any change (I think it is funny that Java is a justification for
> not doing something for a tool primarily used by C people).

I mentioned Java because you mentioned you were working in java.

The particular problem with Java doesn't happen to C people.  Imagine,
for example, that I add a new file, lib/foo.c, to lib/lib.a (thus I
have to modify lib/Makefile), while someone else renames "lib" to
"bettername".

When I merge, if git would create bettername/foo.c (it currently
won't) and properly automerge bettername/Makefile (it will), then the
program would still compile correctly.  However this doesn't work in
Java: lib/foo.java would include the word "lib" in its contents (in
the namespace declaration) and so there's no way automatic merging
would have resulted in a version that compiles correctly.

So what I said isn't to *justify* git's behaviour, merely to point out
that in java's case, there seems to be no way to get fully automatic
merging that would work.  In C, this case would have worked, if only
git supported directory renames.
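
The Java half of this argument can be shown with a small Python sketch. It
assumes the usual convention that a file's package declaration mirrors its
directory, and that the source root is the repository root; the function
names are invented for the example.

```python
import posixpath
import re

# Matches a declaration such as "package lib.util;" at the start of a line.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def declared_package(source: str) -> str:
    """Package named in the file contents, or "" if there is none."""
    match = PACKAGE_RE.search(source)
    return match.group(1) if match else ""


def expected_package(path: str) -> str:
    """Package implied by the file's directory (source root assumed to be '.')."""
    return posixpath.dirname(path).replace("/", ".")


def package_matches_path(path: str, source: str) -> bool:
    """True if the declaration inside the file agrees with where the file lives."""
    return declared_package(source) == expected_package(path)
```

A file containing "package lib;" is consistent at lib/Foo.java but not at
bettername/Foo.java, so moving it automatically would still leave a
contents fix to be done by hand.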

In neither case is it very much work to fix by hand, though :)

Have fun,

Avery

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-05 16:40                 ` Avery Pennarun
@ 2008-05-05 21:49                   ` Robin Rosenberg
  2008-05-05 22:20                     ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Robin Rosenberg @ 2008-05-05 21:49 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Ittay Dror, git

On Monday, 5 May 2008, at 18:40:24, Avery Pennarun wrote:
> On 5/4/08, Ittay Dror <ittayd@tikalk.com> wrote:
> >  Avery Pennarun wrote:
> > > In fact, as someone else pointed out, renaming a java file requires
> > > you to modify the file anyhow, so having git auto-move the file to
> > > another directory *still* wouldn't make it work any better.
> >
> > Sure it will, because otherwise I need to move it and still need to fix
> > it. And there are many other file formats and languages where such a move
> > will not require any change (I think it is funny that Java is a
> > justification for not doing something for a tool primarily used by C
> > people).
>
> I mentioned Java because you mentioned you were working in java.
>
> The particular problem with Java doesn't happen to C people.  Imagine,
> for example, that I add a new file, lib/foo.c, to lib/lib.a (thus I
> have to modify lib/Makefile), while someone else renames "lib" to
> "bettername".
>
> When I merge, if git would create bettername/foo.c (it currently
> won't) and properly automerge bettername/Makefile (it will), then the
> program would still compile correctly.  However this doesn't work in
> Java: lib/foo.java would include the word "lib" in its contents (in
> the namespace declaration) and so there's no way automatic merging
> would have resulted in a version that compiles correctly.

You will always find corner cases. Line-by line merge happens to
work, not because it is the theoretically correct way, but because we
have discovered that it nearly always works so our need for more
specialized merging is not huge. We have also adapted our development
practices to the way line-by-line merging works, i.e. we avoid binary
files and funny text file formats.

> So what I said isn't to *justify* git's behaviour, merely to point out
> that in java's case, there seems to be no way to get fully automatic
> merging that would work.  In C, this case would have worked, if only
> git supported directory renames.

Sure there is: a merge that understands this is Java and does the correct
thing. Even your case for C (with hypothetical directory rename detection)
would fail if the renamed directory was used in an #include statement (like
#include <lib/foo.h>). Say someone thinks xxdiff should move to lib/xxdiff,
while someone else adds a new reference to <xxdiff/xxdiff.h>. To resolve all
cases you must have tools that understand what they are doing. Directory
rename detection only solves a few cases, but it may be easy enough to
implement to warrant the effort to get the tick in the box.

>
> In neither case is it very much work to fix by hand, though :)

I agree on that.

-- robin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-05 21:49                   ` Robin Rosenberg
@ 2008-05-05 22:20                     ` Linus Torvalds
  2008-05-05 23:07                       ` Steven Grimm
  2008-05-06  1:38                       ` Avery Pennarun
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-05 22:20 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Avery Pennarun, Ittay Dror, git



On Mon, 5 May 2008, Robin Rosenberg wrote:
> 
> You will always find corner cases.

.. and btw, this is why merging should always 

 - be predictable (which implies "simple": overly clever merging, and 
   especially merging that takes complex history into account is *bad*, 
   because it's still going to do the wrong thing, but now it's going to 
   do so much less predictably)

 - be amenable to manual fixes even when it succeeds (ie even if an 
   automatic merge completes without errors, a subsequent build may find 
   problems, and a "git commit --amend" may well be the right thing to 
   do!)

 - aim for (preferably easily-handled) conflicts when the unusual cases 
   happen.

   Conflicts for *common* things are bad, because they just cause more 
   work, and people get too complacent about fixing them. But similarly, 
   thinking that the unusual cases should be handled automatically is also 
   wrong - because the unusual cases are likely the ones that need some 
   manual resolution anyway.

Git will never do merges "perfectly", if only because it's fundamentally 
impossible to do that. But one thing git *does* do is to make it pretty 
damn easy to handle it.

I really don't understand why people expect a directory rename to be 
handled automatically, when it is (a) not that common and (b) not obvious 
what the solution is, but MOST OF ALL (c) so damn _easy_ to handle it 
manually after-the-fact when you notice that something doesn't compile!

Really. If you have a file that was created in the wrong subdirectory (and 
please admit that this is not common - it requires not just a directory 
rename, but also a file create in another branch at the same time), what's 
so hard with just doing

	make
	.. oh, oops, that was pretty obvious, the expected source file 
	   didn't exist ..
	git mv olddir/file newdir/file
	git commit --amend

and "Tadaa! All done". Your merge that was *fundamentally impossible* to 
do automatically, was trivially done manually, with no actual big 
head-scratching involved.

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: merge renamed files/directories?
  2008-05-05 22:20                     ` Linus Torvalds
@ 2008-05-05 23:07                       ` Steven Grimm
  2008-05-06  0:29                         ` Linus Torvalds
  2008-05-06  1:38                       ` Avery Pennarun
  1 sibling, 1 reply; 49+ messages in thread
From: Steven Grimm @ 2008-05-05 23:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Robin Rosenberg, Avery Pennarun, Ittay Dror, git

On May 5, 2008, at 3:20 PM, Linus Torvalds wrote:
> I really don't understand why people expect a directory rename to be
> handled automatically, when it is (a) not that common and (b) not  
> obvious
> what the solution is, but MOST OF ALL (c) so damn _easy_ to handle it
> manually after-the-fact when you notice that something doesn't  
> compile!

Assuming all you track with git is source code that has dependencies  
such that a compile command fails cleanly when things end up in the  
wrong directory, sure.

If you're using git to, say, track a tree of documentation files or  
images that are referred to using relative URLs in HTML pages,  
detecting the breakage is less trivial unless you have a really solid  
automated QA process that can check for dangling references.
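
A very small version of such a check might look like the Python sketch
below. It is only an illustration under stated assumptions: the tree is
passed in as a dict from path to file contents, only href and src
attributes are examined, and absolute or external references are skipped.
The names dangling_references and _RefCollector are invented for the
example.

```python
from html.parser import HTMLParser
import posixpath


class _RefCollector(HTMLParser):
    """Collect the values of href/src attributes from one HTML document."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def dangling_references(files):
    """Return (page, reference) pairs whose relative target is not in the tree.

    files maps repository paths to file contents (contents of non-HTML
    files are ignored, so they can be empty strings).
    """
    missing = []
    for path, text in sorted(files.items()):
        if not path.endswith(".html"):
            continue
        collector = _RefCollector()
        collector.feed(text)
        for ref in collector.refs:
            # Skip external, absolute, and same-page references.
            if "://" in ref or ref.startswith(("#", "/", "mailto:")):
                continue
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), ref.split("#")[0])
            )
            if target not in files:
                missing.append((path, ref))
    return missing
```

Run against a tree where one side renamed docs/ to manual/ and the other
side added docs/new.html with a relative image link, the new page is
reported because its image now lives under manual/.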

Are directory renames as common as file renames? Certainly not. But  
they happen often enough that it's annoying to have to manually clean  
up after them. Note that I did not say it is difficult or impossible  
to manually clean up after them. I think the number of people who've  
mentioned this on the list should stand as some kind of refutation of  
the idea that directory renames are so vanishingly rare as to not be  
worth mentioning. I've run into the problem a few times myself.

> and "Tadaa! All done". Your merge that was *fundamentally impossible* to
> do automatically, was trivially done manually, with no actual big
> head-scratching involved.

$ mkdir parent
$ cd parent
$ hg init
$ mkdir subdir1
$ echo "I am the walrus" > subdir1/file1
$ hg add subdir1/file1
$ hg commit -m 'initial commit'
$ cd ..
$ hg clone parent child
$ cd child
$ hg mv subdir1 subdir2
$ hg commit -m 'rename subdir1 to subdir2'
$ cd ../parent
$ echo 'I love prunes' > subdir1/file2
$ hg add subdir1/file2
$ hg commit -m 'new file in subdir'
$ cd ../child
$ hg pull
$ hg merge
$ ls subdir2
file1   file2

Doesn't seem *fundamentally* impossible to produce the results that  
are most likely to be what people want. (Which doesn't equal  
"guaranteed to be 100% correct 100% of the time or your money back" --  
as you say, merging is an inexact science.)

-Steve

* Re: merge renamed files/directories?
  2008-05-05 23:07                       ` Steven Grimm
@ 2008-05-06  0:29                         ` Linus Torvalds
  2008-05-06  0:40                           ` Linus Torvalds
  2008-05-06 15:47                           ` Theodore Tso
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06  0:29 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Robin Rosenberg, Avery Pennarun, Ittay Dror, git



On Mon, 5 May 2008, Steven Grimm wrote:
>
> Doesn't seem *fundamentally* impossible to produce the results that are most
> likely to be what people want.

You didn't understand what was fundamentally impossible.

And btw, this has nothing to do with directory renames either. There are 
tons of these kinds of merge issues that bad SCM developers have been 
masturbating over for YEARS. There's a whole science of making idiotic new 
merging models, one fancier than the other. The fact is, you cannot do a 
perfect job, the best thing you can do is pick a simple model, and try to 
make it repeatable and easy to fix up.

Maybe somebody bothers to implement some directory rename heuristic some 
day. Quite frankly, I personally cannot care less. It really is mental 
masturbation, and has absolutely no relevance for any real-world problem.

		Linus

* Re: merge renamed files/directories?
  2008-05-06  0:29                         ` Linus Torvalds
@ 2008-05-06  0:40                           ` Linus Torvalds
  2008-05-06 15:47                           ` Theodore Tso
  1 sibling, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06  0:40 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Robin Rosenberg, Avery Pennarun, Ittay Dror, git



On Mon, 5 May 2008, Linus Torvalds wrote:
> 
> There are tons of these kinds of merge issues that bad SCM developers 
> have been masturbating over for YEARS.

.. and if I sound rather less than enthused about these kinds of issues, 
it's because of having seen years and years of people talking about merge 
strategies, and then at the same time using SVN which doesn't even record 
the parenthood of the resulting merges, or thinking that code always 
moves with whole files.

In other words, the details don't even matter. What matters is not being a 
total piece of sh*t in the big picture. 

		Linus

* Re: merge renamed files/directories?
  2008-05-05 22:20                     ` Linus Torvalds
  2008-05-05 23:07                       ` Steven Grimm
@ 2008-05-06  1:38                       ` Avery Pennarun
  2008-05-06  1:46                         ` Shawn O. Pearce
  2008-05-06  2:19                         ` Linus Torvalds
  1 sibling, 2 replies; 49+ messages in thread
From: Avery Pennarun @ 2008-05-06  1:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Robin Rosenberg, Ittay Dror, git

On 5/5/08, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>  I really don't understand why people expect a directory rename to be
>  handled automatically, when it is (a) not that common and (b) not obvious
>  what the solution is, but MOST OF ALL (c) so damn _easy_ to handle it
>  manually after-the-fact when you notice that something doesn't compile!

In general I agree with your point here, but I still find it surprising
how hard the directory-rename problem is made out to be.  As far as I
can see, the right implementation exactly parallels the single-file
rename implementation.

I think the same problem that prevents git from knowing the difference
between empty and nonexistent directories (eg.
http://kerneltrap.org/mailarchive/git/2007/7/18/251976) is the one
that prevents it from handling directory renames: git doesn't
acknowledge that it's *already* treating directories as first-class
objects.

What if you thought of a directory as simply a list of filenames?
(This is more or less what unix does anyway.)  Then an *empty*
directory is a tree of zero length; a nonexistent (or not tracked)
directory is simply not listed in the parent; a directory with
untracked files is like a file with patches not yet added to the
index(*); and trying to merge a file into a nonexistent directory
(when the original patch *didn't* create the directory fresh) would
trigger similar logic to the existing rename handling.  That is, put
the new file with the content that used to be next to it, by looking
for a tree with contents (names, not so much sha1's) similar to the
one it was expected to be in.
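
A small sketch of that name-similarity idea, working on plain
directories rather than Git trees. The function names name_overlap and
best_match_dir and the scoring rule (shared names as a percentage of
the expected names) are assumptions made for illustration, not
anything Git implements.

```shell
# Percentage of the names listed in file $1 (one per line, sorted) that
# also appear as entries of directory $2. Only names are compared.
name_overlap() {
  wanted=$1 dir=$2
  total=$(( $(wc -l <"$wanted") ))
  [ "$total" -gt 0 ] || { echo 0; return; }
  shared=$(( $(ls -A "$dir" | sort | comm -12 "$wanted" - | wc -l) ))
  echo $(( shared * 100 / total ))
}

# Print the directory below $2 whose entries best match the names in $1.
best_match_dir() {
  wanted=$1 root=$2
  find "$root" -mindepth 1 -type d | while IFS= read -r d; do
    printf '%s\t%s\n' "$(name_overlap "$wanted" "$d")" "$d"
  done | sort -rn | head -n 1 | cut -f2
}

# Example: the incoming file used to sit next to file1..file3.
tree=$(mktemp -d)
mkdir -p "$tree/new" "$tree/docs"
touch "$tree/new/file1" "$tree/new/file2" "$tree/new/file3" "$tree/docs/README"
siblings=$(mktemp)
printf '%s\n' file1 file2 file3 | sort >"$siblings"
best_match_dir "$siblings" "$tree"
```

A real implementation would also need a threshold below which no
directory counts as a match, and a rule for ties.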

> It really is mental
> masturbation, and has absolutely no relevance for any real-world problem.

I personally don't get very interested in non-real-world problems.
Here's the actual case I tried to use a few months ago, but couldn't,
because git doesn't track directory renames.  (Note that I was quite
happily able to do this in svn, as much as you can do anything happily
in svn.)

I have a branch called 'mylib' with my library project in its root
directory.  What I wanted was to maintain my library in the 'mylib'
branch, then merge my library into the "libs/mylib" directory of my
application, which is in the 'myapp' branch.  (Of course, in real
life, there's more than one app using mylib in more than one
repository, and I'm actually doing 'git pull' of the mylib branch from
elsewhere.)

This actually works like magic in git - except when you create a file
in the 'mylib' branch, in which case it gets merged to the wrong path
every single time.  It seems to me like it should be very easy to put
it in the right place instead, making one more interesting use case
possible.

I realize git-submodule is the way you're supposed to do something
like this, but git-submodule doesn't really do what I want (yet) for
reasons discussed in other threads.

Have fun,

Avery

(*) Applying the same metaphor in reverse, operations that are valid
on directories are also valid for file contents.  I can think of
immediate uses for a .gitignore-style list that talks about file
*contents*.  Imagine if I could make a local patch to my Makefile,
mark that one patch as "ignored", and never accidentally check it in.

* Re: merge renamed files/directories?
  2008-05-06  1:38                       ` Avery Pennarun
@ 2008-05-06  1:46                         ` Shawn O. Pearce
  2008-05-06  1:58                           ` Avery Pennarun
  2008-05-06  2:19                         ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2008-05-06  1:46 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Linus Torvalds, Robin Rosenberg, Ittay Dror, git

Avery Pennarun <apenwarr@gmail.com> wrote:
> 
> I have a branch called 'mylib' with my library project in its root
> directory.  What I wanted was to maintain my library in the 'mylib'
> branch, then merge my library into the "libs/mylib" directory of my
> application, which is in the 'myapp' branch. [...]
> 
> This actually works like magic in git - except when you create a file
> in the 'mylib' branch, in which case it gets merged to the wrong path
> every single time.  It seems to me like it should be very easy to put
> it in the right place instead, making one more interesting use case
> possible.
> 
> I realize git-submodule is the way you're supposed to do something
> like this, but git-submodule doesn't really do what I want (yet) for
> reasons discussed in other threads.

`git pull -s subtree mylib` ?

This is how git-gui and gitk are merged into git.git, and it avoids
this case by looking for a subdirectory rename, more specifically
a rename of "/" to "mylib/".

It also can go the other way, that is rename "mylib/" to "/", but
this path is never used as far as I know as git-gui and gitk don't
ever merge in the git.git history.
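
A sketch of that library-into-subdirectory workflow, under a few
assumptions: the repository names mylib and myapp, the file names and
the throwaway identity are made up; --allow-unrelated-histories needs a
reasonably recent Git; and the final step uses "git fetch" plus
"git merge -X subtree=<path>", which names the prefix explicitly,
instead of letting "git pull -s subtree" guess it.

```shell
set -e
# Throwaway identity so commits work in a scratch environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)

# The library lives at the root of its own repository.
git init -q "$top/mylib"
cd "$top/mylib"
echo 'int one(void) { return 1; }' >one.c
git add one.c
git commit -q -m 'mylib: initial'
lib_branch=$(git symbolic-ref --short HEAD)

# The application imports it under libs/mylib/.
git init -q "$top/myapp"
cd "$top/myapp"
echo 'int main(void) { return 0; }' >main.c
git add main.c
git commit -q -m 'myapp: initial'
git remote add mylib "$top/mylib"
git fetch -q mylib
git merge -q -s ours --no-commit --allow-unrelated-histories "mylib/$lib_branch"
git read-tree --prefix=libs/mylib/ -u "mylib/$lib_branch"
git commit -q -m 'import mylib under libs/mylib'

# A new file appears in the library ...
cd "$top/mylib"
echo 'int two(void) { return 2; }' >two.c
git add two.c
git commit -q -m 'mylib: add two.c'

# ... and the shifted merge places it under the prefix.
cd "$top/myapp"
git fetch -q mylib
git merge -q -X subtree=libs/mylib -m 'merge mylib' "mylib/$lib_branch"
git ls-files
```

The point of the exercise is the last listing: two.c should show up as
libs/mylib/two.c rather than at the top level of the application.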

-- 
Shawn.

* Re: merge renamed files/directories?
  2008-05-06  1:46                         ` Shawn O. Pearce
@ 2008-05-06  1:58                           ` Avery Pennarun
  2008-05-06  2:12                             ` Shawn O. Pearce
  0 siblings, 1 reply; 49+ messages in thread
From: Avery Pennarun @ 2008-05-06  1:58 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Linus Torvalds, Robin Rosenberg, Ittay Dror, git

On 5/5/08, Shawn O. Pearce <spearce@spearce.org> wrote:
> Avery Pennarun <apenwarr@gmail.com> wrote:
>  >
>  > I have a branch called 'mylib' with my library project in its root
>  > directory.  What I wanted was to maintain my library in the 'mylib'
>  > branch, then merge my library into the "libs/mylib" directory of my
>  > application, which is in the 'myapp' branch. [...]
>  >
>  > This actually works like magic in git - except when you create a file
>  > in the 'mylib' branch, in which case it gets merged to the wrong path
>  > every single time.  It seems to me like it should be very easy to put
>  > it in the right place instead, making one more interesting use case
>  > possible.
>  >
>  > I realize git-submodule is the way you're supposed to do something
>  > like this, but git-submodule doesn't really do what I want (yet) for
>  > reasons discussed in other threads.
>
> `git pull -s subtree mylib` ?

First, I thought: wow!  How can that possibly work?  These guys are geniuses!

Then I found out that git-merge-subtree is a git builtin, and git.c says this:

  { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
  { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },

And then my head exploded. :)

Still scraping the pieces of my brain back off the floor... but does
this mean the subtree merge strategy would fail exactly like
merge-recursive when new files are created?

Have fun,

Avery

* Re: merge renamed files/directories?
  2008-05-06  1:58                           ` Avery Pennarun
@ 2008-05-06  2:12                             ` Shawn O. Pearce
  0 siblings, 0 replies; 49+ messages in thread
From: Shawn O. Pearce @ 2008-05-06  2:12 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Linus Torvalds, Robin Rosenberg, Ittay Dror, git

Avery Pennarun <apenwarr@gmail.com> wrote:
> On 5/5/08, Shawn O. Pearce <spearce@spearce.org> wrote:
> >
> > `git pull -s subtree mylib` ?
> 
> First, I thought: wow!  How can that possibly work?  These guys are geniuses!
> 
> Then I found out that git-merge-subtree is a git builtin, and git.c says this:
> 
>   { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
>   { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE },
> 
> And then my head exploded. :)
> 
> Still scraping the pieces of my brain back off the floor... but does
> this mean the subtree merge strategy would fail exactly like
> merge-recursive when new files are created?

Nope.  If you go look at cmd_merge_recursive you will see it has
different behavior based upon the name it was invoked as, even
though it is the same C function and has the same implementation.

If it is started with the name "merge-subtree" it tries to find
a matching subtree prefix to insert in front of all names, or
to remove from all names, such that a merge will correctly fully
include a set of files in a subdirectory, or fully pull out a set
of files from a subdirectory.
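
To make the two ideas concrete, here is a toy sketch in shell. It
illustrates the idea only and is not Git's code: merge_driver,
add_prefix and strip_prefix are hypothetical names, and the invoked
name is passed as an argument where a real program would look at the
name it was started under.

```shell
# One implementation, two names: the behavior depends on the name it is
# called under.
merge_driver() {
  invoked_as=$1
  case $invoked_as in
    merge-subtree)   echo 'shift trees by a prefix, then merge' ;;
    merge-recursive) echo 'merge trees as they are' ;;
    *) echo "unknown name: $invoked_as" >&2; return 1 ;;
  esac
}

# Insert a prefix in front of a path (library -> application direction).
add_prefix() {
  p=${1%/}
  printf '%s/%s\n' "$p" "$2"
}

# Remove a prefix from a path (application -> library direction);
# paths outside the prefix are printed unchanged.
strip_prefix() {
  p=${1%/}
  case $2 in
    "$p"/*) rest=${2#"$p"/}; printf '%s\n' "$rest" ;;
    *)      printf '%s\n' "$2" ;;
  esac
}

merge_driver merge-subtree
add_prefix libs/mylib two.c
strip_prefix libs/mylib/ libs/mylib/two.c
```

Once every path on one side has been shifted this way, the two trees
have the same shape and the ordinary merge logic can run unchanged.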

Junio is the genius that implemented this.  Works quite well for
this library->application merge case that I think you were trying
to describe.

-- 
Shawn.

* Re: merge renamed files/directories?
  2008-05-06  1:38                       ` Avery Pennarun
  2008-05-06  1:46                         ` Shawn O. Pearce
@ 2008-05-06  2:19                         ` Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06  2:19 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Robin Rosenberg, Ittay Dror, git



On Mon, 5 May 2008, Avery Pennarun wrote:
>
> In general I agree with your point here, but I still find it surprising
> how hard the directory-rename problem is made out to be.

I do agree that it's probably not that hard.

But I disagree with people who whine about pointless stuff, and don't send 
patches.

		Linus

* Re: merge renamed files/directories?
  2008-05-06  0:29                         ` Linus Torvalds
  2008-05-06  0:40                           ` Linus Torvalds
@ 2008-05-06 15:47                           ` Theodore Tso
  2008-05-06 16:10                             ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Theodore Tso @ 2008-05-06 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Grimm, Robin Rosenberg, Avery Pennarun, Ittay Dror, git

On Mon, May 05, 2008 at 05:29:12PM -0700, Linus Torvalds wrote:
> 
> Maybe somebody bothers to implement some directory rename heuristic some 
> day. Quite frankly, I personally cannot care less. It really is mental 
> masturbation, and has absolutely no relevance for any real-world problem.
> 

Actually, the directory rename heuristic *does* have relevance in at
least some real-world cases.  For example, MySQL has plugin
directories, and occasionally the plugins get renamed, for whatever
reason.  If a plugin gets renamed, so does its directory, and if the
rename operation happens in an experimental (or devel) branch, but
then for whatever reason, a new file is created in the devel (or
maint) branch, without the directory rename heuristic, when the
changeset is pulled into the experimental (or devel) branch, the file
will be created in the wrong directory.

So it may be rare, but this kind of thing does happen in the real
world.

							- Ted

* Re: merge renamed files/directories?
  2008-05-06 15:47                           ` Theodore Tso
@ 2008-05-06 16:10                             ` Linus Torvalds
  2008-05-06 16:15                               ` Linus Torvalds
  2008-05-06 16:32                               ` Ittay Dror
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06 16:10 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Steven Grimm, Robin Rosenberg, Avery Pennarun, Ittay Dror, git



On Tue, 6 May 2008, Theodore Tso wrote:
> 
> Actually, the directory rename heuristic *does* have relevance in at
> least some real-world cases.  For example, MySQL has plugin
> directories, and occasionally the plugins get renamed, for whatever
> reason.

I'm not saying that directory renames don't happen.

I don't even say that merges across directory renames don't happen.

I *am* saying that it's not a problem.

It's like data conflicts. Do they happen? Sure as hell. I can pretty much 
guarantee that any sane project will have more data conflicts than they 
will have rename conflicts (whether single-file or directory), and it's 
not only a problem, it's something that is absolutely *required* from a 
source control management system!

So are data conflicts a problem?

I claim that they aren't. They are a *positive* resource that you need to 
handle. Some of the "handling" is obviously going to be to try to avoid 
them, and if you get too much of them, the real "problem" is that you 
merge too seldom, or more commonly that you have a piece of code that is 
simply not done well enough, so many different people have to muck around 
in that area.

But fundamentally, you should always have data conflicts, and they aren't 
a problem in themselves. They are a problem only

 - If they are hard to understand and see, and *unexpected*. The SCM
   should explain what is going on, and explain why a conflict happens 
   (and that may perhaps mean after-the fact! I love "gitk --merge" 
   exactly because it tends to be very good at explaining what was going 
   on!).

 - If they are hard to fix.

   For example, one of the main problems I had with BK merging was the 
   fact that while the mergetool was wonderful, you effectively *had* to 
   merge using it, and you couldn't sanely do an "incremental" merge 
   where you first did a first merge job, then checked that it at 
   least compiles, then tested it, and finally looked at the diffs from 
   both parents and looked at whether those all made sense, and you could 
   "refine" or fix the merge along the different phases.

   Of course, you hope that all merges are pretty obvious, and you can do 
   it right in one go, but no, they're not. They'll never be. They'll 
   never be fully automatic, but even when they aren't automatic, they'll 
   not even be trivial to do manually. But that's OK, as long as the 
   tool at least doesn't fight you, and lets you do whatever you want to 
   do as part of fixing things up.
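
As a command-line counterpart to both points, the sketch below provokes
a small conflict, asks "git log --merge" which commits on either side
touched the conflicted path, and then reviews the hand-made resolution
against each parent in turn. The file name notes.txt, the branch name
"side" and the throwaway identity are illustrative assumptions.

```shell
set -e
# Throwaway identity so commits work in a scratch environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo_h=$(mktemp -d)
cd "$repo_h"
git init -q
printf 'one\ntwo\nthree\n' >notes.txt
git add notes.txt
git commit -q -m initial
trunk=$(git symbolic-ref --short HEAD)

# Both sides change the same line.
git checkout -q -b side
printf 'one\nTWO (side)\nthree\n' >notes.txt
git commit -q -a -m 'side: edit line two'
git checkout -q "$trunk"
printf 'one\nTWO (trunk)\nthree\n' >notes.txt
git commit -q -a -m 'trunk: edit line two'

# The merge stops with a conflict; that is expected here.
git merge -q side >/dev/null 2>&1 || true

# Which commits on either side touched the conflicted paths?
conflict_commits=$(git log --merge --oneline)
printf '%s\n' "$conflict_commits"

# Resolve by hand, then review the result against each parent.
printf 'one\nTWO (merged)\nthree\n' >notes.txt
git add notes.txt
git commit -q -m 'merge side'
git diff HEAD^1 HEAD
git diff HEAD^2 HEAD
```

Nothing forces the resolution to happen in one sitting: between the
failed merge and the final commit the working tree is ordinary files
that can be edited, compiled and tested as often as needed.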

Now, take a look back at directory renames.

Do they happen?

Yes.

Do they potentially mis-merge?

Yes.

But are they common and/or hard to fix and handle?

No.

And that's why I don't think people should call them "problems". The only 
_real_ issue here, I think, is that git just does things differently from 
other SCM's. Git does a _lot_ of things differently. You get used to it.

			Linus

* Re: merge renamed files/directories?
  2008-05-06 16:10                             ` Linus Torvalds
@ 2008-05-06 16:15                               ` Linus Torvalds
  2008-05-06 16:32                               ` Ittay Dror
  1 sibling, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06 16:15 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Steven Grimm, Robin Rosenberg, Avery Pennarun, Ittay Dror, git



On Tue, 6 May 2008, Linus Torvalds wrote:
> 
>							I can pretty much 
> guarantee that any sane project will have more data conflicts than they 
> will have rename conflicts (whether single-file or directory), and it's 
> not only a problem, it's something that is absolutely *required* from a 
         ^^^-- not
> source control management system!

Oops. That didn't read well.

			Linus

* Re: merge renamed files/directories?
  2008-05-06 16:10                             ` Linus Torvalds
  2008-05-06 16:15                               ` Linus Torvalds
@ 2008-05-06 16:32                               ` Ittay Dror
  2008-05-06 16:39                                 ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Ittay Dror @ 2008-05-06 16:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Steven Grimm, Robin Rosenberg, Avery Pennarun, git



Linus Torvalds wrote:
>
>  - If they are hard to understand and see, and *unexpected*. The SCM
>    should explain what is going on, and explain why a conflict happens 
>    (and that may perhaps mean after-the fact! I love "gitk --merge" 
>    exactly because it tends to be very good at explaining what was going 
>    on!).
>
>   
So does git tell me what is going on with directory renames? Or should I 
just discover them when I try to compile (assuming that when the old 
directory name appears it will even get compiled, and that the file in 
it is something that gets compiled)

And no, it's not a common problem, but I don't like the fact that a 
merge conflict happens and the SCM doesn't tell me about it.

-- 
Ittay Dror <ittayd@tikalk.com>
Tikal <http://www.tikalk.com>
Tikal Project <http://tikal.sourceforge.net>

* Re: merge renamed files/directories?
  2008-05-06 16:32                               ` Ittay Dror
@ 2008-05-06 16:39                                 ` Linus Torvalds
  0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2008-05-06 16:39 UTC (permalink / raw)
  To: Ittay Dror
  Cc: Theodore Tso, Steven Grimm, Robin Rosenberg, Avery Pennarun, git



On Tue, 6 May 2008, Ittay Dror wrote:
> 
> And no, it's not a common problem, but I don't like the fact that a merge
> conflict happens and the SCM doesn't tell me about it.

I do agree that the most irritating feature of it is the silent clean 
merge. When it's not obvious what the right thing to do is, generally a 
merge strategy should try to warn, or even generate a conflict.

That said, anybody who thinks that "merge was automatic and successful" 
means that the merge was _correct_ is sadly mistaken. So you really 
shouldn't depend on it, and yeah, I strongly suggest building and testing 
after a merge (and before you push the result out), so that you can fix 
any issues.
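
One way to build that habit into the workflow is to hold the merge back
until a check has passed. The helper below is a sketch: checked_merge
is a hypothetical name, the check command stands in for whatever build
or test suite the project has, and the "already up to date" case is
not handled.

```shell
# Merge a branch, but only record the merge if a check command succeeds.
# Usage: checked_merge <branch> <command> [args...]
checked_merge() {
  branch=$1; shift
  # On conflicts, stop and leave them in place for manual resolution.
  git merge -q --no-commit --no-ff "$branch" || return 1
  if "$@"; then
    git commit -q --no-edit
  else
    git merge --abort
    return 1
  fi
}

# Example in a scratch repository (identity and file names are made up).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo_i=$(mktemp -d)
cd "$repo_i"
git init -q
echo 'exit 0' >check.sh
git add check.sh
git commit -q -m initial
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b broken
echo 'exit 1' >check.sh
git commit -q -a -m 'break the check'

git checkout -q -b good "$trunk"
echo '# harmless comment' >>check.sh
git commit -q -a -m 'harmless change'

git checkout -q "$trunk"
checked_merge broken sh check.sh || echo 'broken: merge rejected'
checked_merge good sh check.sh && echo 'good: merge recorded'
```

The check runs against the merged working tree, so it sees exactly
what would be pushed.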

			Linus

* Re: detecting rename->commit->modify->commit
  2008-05-01 23:14         ` Jeff King
  2008-05-03 17:56           ` merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit) Ittay Dror
@ 2008-05-08 18:17           ` Jeff King
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff King @ 2008-05-08 18:17 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Avery Pennarun, Ittay Dror, git

On Thu, May 01, 2008 at 07:14:27PM -0400, Jeff King wrote:

>   1. write a proof-of-concept that shows directory renaming after the
>     fact (e.g., take a conflicted merge, scan the diff for directory
>     renames, and then fix up the files). That way it is available, but
>     doesn't impact git at all.

Here's a toy script that finds directory renames. I'm sure there are a
ton of corner cases it doesn't handle (like directory renames inside of
directory renames). My test case was the very trivial:

  mkdir repo && cd repo && git init

  mkdir subdir
  for i in 1 2 3; do
    echo content $i >subdir/file$i
  done
  git add subdir
  git commit -m initial

  git mv subdir new
  git commit -m move

  git checkout -b other HEAD^
  echo content 4 >subdir/file4
  git add subdir
  git commit -m new

  git merge --no-commit master
  perl ../find-dir-rename.pl
  git commit

At which point you should see the merged commit with new/file4.

Script is below.

-- >8 --
#!/usr/bin/perl
#
# Find renamed directories, and move any files in the "old"
# directory into the "new".
#
# usage:
#   git merge --no-commit <whatever>
#   find-dir-rename
#   git commit

use strict;

foreach my $r (renamed_dirs()) {
  move_dir_contents($r->{from}, $r->{to});
}
exit 0;

sub renamed_dirs {
  my $base = `git merge-base HEAD MERGE_HEAD`;
  chomp $base;
  return grep {
    $_->{score} == 1
  } (renamed_dirs_between($base, 'HEAD'),
     renamed_dirs_between($base, 'MERGE_HEAD'));
}

sub renamed_dirs_between {
  my ($base, $commit) = @_;

  my %sources;
  foreach my $pair (renamed_files($base, $commit)) {
    my $d1 = dir_of($pair->[0]);
    my $d2 = dir_of($pair->[1]);
    next unless defined($d1) && defined($d2);

    $sources{$d1}->{total}++;
    $sources{$d1}->{dests}->{$d2}++;
  }

  return map {
    my $from = $_;
    map {
      {
        from => $from,
        to => $_,
        score => $sources{$from}->{dests}->{$_} / $sources{$from}->{total},
      }
    } keys(%{$sources{$from}->{dests}});
  } removed_directories($base, $commit);
}

sub dir_of {
  local $_ = shift;
  s{/[^/]+$}{} or return undef;
  return $_;
}

sub renamed_files {
  my ($from, $to) = @_;
  open(my $fh, '-|', qw(git diff-tree -r -M), $from, $to)
    or die "unable to open diff-tree: $!";
  return map {
    chomp;
    m/ R\d+\t([^\t]+)\t(.*)/ ? [$1 => $2] : ()
  } <$fh>;
}

sub removed_directories {
  my ($base, $commit) = @_;
  my %new_dirs = map { $_ => 1 } directories($commit);
  return grep { !exists $new_dirs{$_} } directories($base);
}

sub directories {
  my $commit = shift;
  return uniq(
    map {
      s{/[^/]+$}{} ? $_ : ()
    } files($commit)
  );
}

sub files {
  my $commit = shift;
  open(my $fh, '-|', qw(git ls-tree -r), $commit)
    or die "unable to open ls-tree: $!";
  return map {
    chomp;
    s/^[^\t]*\t//;
    $_
  } <$fh>;
}

sub uniq {
  my %seen;
  return grep { !$seen{$_}++ } @_;
}

sub move_dir_contents {
  my ($from, $to) = @_;

  my @files = glob("$from/*");
  return unless @files;

  system(qw(git mv), @files, "$to/")
    and die "unable to move $from/* to $to";
  rmdir($from); # ignore error since there may be untracked files
}
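
For readers who want to see the script's first stage in isolation, the
shell sketch below lists file renames with "git diff-tree -r -M" and
counts how many files went from each old directory to each new one.
It mirrors the pairing step only, not the scoring or the move; the
repository layout copies the test case above, and the throwaway
identity is an assumption.

```shell
set -e
# Throwaway identity so commits work in a scratch environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo_k=$(mktemp -d)
cd "$repo_k"
git init -q
mkdir subdir
for i in 1 2 3; do
  echo "content $i" >subdir/file$i
done
git add subdir
git commit -q -m initial
git mv subdir new
git commit -q -m move

# For every renamed file, reduce both paths to their directories and
# count how often each (old dir -> new dir) pair occurs.
dir_pairs=$(
  git diff-tree -r -M --name-status HEAD^ HEAD |
    awk -F'\t' '
      $1 ~ /^R/ && $2 ~ /\// && $3 ~ /\// {
        sub(/\/[^\/]*$/, "", $2)   # strip the file name, keep the directory
        sub(/\/[^\/]*$/, "", $3)
        count[$2 " -> " $3]++
      }
      END { for (pair in count) print count[pair], pair }'
)
printf '%s\n' "$dir_pairs"
```

A directory whose files all moved to the same destination is the
easy case; mixed destinations are where a score threshold would have to
decide.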

Thread overview: 49+ messages
2008-05-01 14:10 detecting rename->commit->modify->commit Ittay Dror
2008-05-01 14:45 ` Jeff King
2008-05-01 15:08   ` Ittay Dror
2008-05-01 15:20     ` Jeff King
2008-05-01 15:30       ` Ittay Dror
2008-05-01 15:38         ` Jeff King
2008-05-01 15:47         ` Jakub Narebski
2008-05-01 20:39       ` Teemu Likonen
2008-05-01 23:09         ` Jeff King
2008-05-02  2:06         ` Sitaram Chamarty
2008-05-02  2:38           ` Junio C Hamano
2008-05-02 16:59             ` Sitaram Chamarty
2008-05-01 15:24     ` Ittay Dror
2008-05-01 15:28       ` Jeff King
2008-05-01 14:54 ` Ittay Dror
2008-05-01 15:09   ` Jeff King
2008-05-01 15:20     ` Ittay Dror
2008-05-01 15:30     ` David Tweed
2008-05-01 15:27   ` Avery Pennarun
2008-05-01 15:34     ` Jeff King
2008-05-01 15:50       ` Avery Pennarun
2008-05-01 16:48         ` Jeff King
2008-05-01 19:45           ` Avery Pennarun
2008-05-01 22:42             ` Jeff King
2008-05-01 19:12       ` Steven Grimm
2008-05-01 23:14         ` Jeff King
2008-05-03 17:56           ` merge renamed files/directories? (was: Re: detecting rename->commit->modify->commit) Ittay Dror
2008-05-03 18:11             ` Avery Pennarun
2008-05-04  6:08               ` merge renamed files/directories? Ittay Dror
2008-05-04  9:34                 ` Jakub Narebski
2008-05-05 16:40                 ` Avery Pennarun
2008-05-05 21:49                   ` Robin Rosenberg
2008-05-05 22:20                     ` Linus Torvalds
2008-05-05 23:07                       ` Steven Grimm
2008-05-06  0:29                         ` Linus Torvalds
2008-05-06  0:40                           ` Linus Torvalds
2008-05-06 15:47                           ` Theodore Tso
2008-05-06 16:10                             ` Linus Torvalds
2008-05-06 16:15                               ` Linus Torvalds
2008-05-06 16:32                               ` Ittay Dror
2008-05-06 16:39                                 ` Linus Torvalds
2008-05-06  1:38                       ` Avery Pennarun
2008-05-06  1:46                         ` Shawn O. Pearce
2008-05-06  1:58                           ` Avery Pennarun
2008-05-06  2:12                             ` Shawn O. Pearce
2008-05-06  2:19                         ` Linus Torvalds
2008-05-08 18:17           ` detecting rename->commit->modify->commit Jeff King
2008-05-01 16:39   ` Sitaram Chamarty
2008-05-01 18:58     ` Ittay Dror