* git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 18:25 UTC
To: git

Based on my current understanding of git there should be no difference
between a git mv and a git rm followed by a git add. But empirically
there is a difference. git log -M --follow is able to track files
through git mvs even if their content changes completely. Likewise, it
does *not* track files through rm/add combinations even if the content
didn't change at all. (See experiment transcript below.)

So something in my understanding of how git works must be wrong. Git
must be keeping a separate record of file renames somewhere. But where?

Just for the record, I'm not complaining about this behavior. In fact,
what git does is exactly what I want. I just want to understand how it
works.

Thanks,
rg

---

[ron@mickey:~/devel/gittest]$ cat>file1
1
2
3
4
5
[ron@mickey:~/devel/gittest]$ cat>file2
a
b
c
d
e
[ron@mickey:~/devel/gittest]$ git init
Initialized empty Git repository in /Users/ron/devel/gittest/.git/
[ron@mickey:~/devel/gittest]$ git add file1
[ron@mickey:~/devel/gittest]$ git commit -m 'Add numbers'
[master (root-commit) 54c2e4a] Add numbers
 1 files changed, 5 insertions(+), 0 deletions(-)
 create mode 100644 file1
[ron@mickey:~/devel/gittest]$ git rm file1
rm 'file1'
[ron@mickey:~/devel/gittest]$ git add file2
[ron@mickey:~/devel/gittest]$ git commit -m 'numbers->letters'
[master fe05d12] numbers->letters
 2 files changed, 5 insertions(+), 5 deletions(-)
 delete mode 100644 file1
 create mode 100644 file2
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file2
commit fe05d1233be1bb11f4ed0e8496e4191795d515a0
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:13:38 2010 -0800

    numbers->letters

A       file2
[ron@mickey:~/devel/gittest]$ ls
file2 git/
[ron@mickey:~/devel/gittest]$ cat>file2
6
7
8
9
10
[ron@mickey:~/devel/gittest]$ git mv file2 file3
[ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
[master ae3f6d4] letters->numbers
 1 files changed, 0 insertions(+), 0 deletions(-)
 rename file2 => file3 (100%)
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file3
commit ae3f6d440483fa41cf08819237e87d567ac3a31d
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:15:00 2010 -0800

    letters->numbers

R100    file2   file3

commit fe05d1233be1bb11f4ed0e8496e4191795d515a0
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:13:38 2010 -0800

    numbers->letters

A       file2
[ron@mickey:~/devel/gittest]$ ls
file3 git/
[ron@mickey:~/devel/gittest]$ mv file3 file4
[ron@mickey:~/devel/gittest]$ git rm file3
rm 'file3'
[ron@mickey:~/devel/gittest]$ git add file4
[ron@mickey:~/devel/gittest]$ git commit -m 'rm/add identical content'
[master a3d7227] rm/add identical content
 2 files changed, 5 insertions(+), 5 deletions(-)
 delete mode 100644 file3
 create mode 100644 file4
[ron@mickey:~/devel/gittest]$ git log --name-status -M --follow file4
commit a3d7227fc2edca75fff8894acd5b077d1788bb36
Author: rongarret <ron@mickey>
Date:   Wed Feb 3 10:17:23 2010 -0800

    rm/add identical content

A       file4
[ron@mickey:~/devel/gittest]$

* Re: git-mv redux: there must be something else going on
From: Avery Pennarun @ 2010-02-03 18:48 UTC
To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 1:25 PM, Ron Garret <ron1@flownet.com> wrote:
> So something in my understanding of how git works must be wrong. Git
> must be keeping a separate record of file renames somewhere. But where?

It doesn't. Your experiment is wrong.

> [ron@mickey:~/devel/gittest]$ cat>file2
> 6
> 7
> 8
> 9
> 10
> [ron@mickey:~/devel/gittest]$ git mv file2 file3
> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> [master ae3f6d4] letters->numbers
>  1 files changed, 0 insertions(+), 0 deletions(-)
>  rename file2 => file3 (100%)

Whoops. You didn't 'git add file2' (before the mv) or 'git add file3'
(after the mv), or use commit -a, so what you've committed is the
*old* content of file2 under the name file3. The *new* content of
file2 is still uncommitted in your work tree under the name file3.
This is why git can detect the move. (The 100% is a good clue: it
means the old and new files are 100% identical.)

Artificial tests like this are useless anyway. If you renamed file2
to file3 *and* changed all the contents, did you *really* rename it?
If so, who cares? What good does it do you to know this? If someone
else tries to patch the old file2 and you merge it into a (totally
different) file3 vs a (now missing) file2, how is that any better?

On the other hand, if one guy moves file2 to file3 and changes a few
lines, you want the other guy's patch to go into file3, whether the
first guy used 'git mv' or add+rm or anything else. As long as only a
few lines changed, git does the right thing. If most/all of the lines
have changed, then there is no right thing, because you'll get a nasty
merge conflict either way.

Have fun,

Avery
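
The pitfall described above can be reproduced in a throwaway repository. This is only a sketch: the temporary directory and the placeholder identity are assumptions made so the script runs anywhere, and the file names mirror the ones in the transcript.

```shell
#!/bin/sh
# Sketch: an unstaged edit followed by `git mv` leaves the OLD content
# in the index under the new name.
set -e
repo=$(mktemp -d)                      # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Example User"            # placeholder identity
git config user.email "user@example.invalid"   # placeholder identity

printf 'a\nb\nc\nd\ne\n' > file2
git add file2
git commit -q -m 'letters'

printf '6\n7\n8\n9\n10\n' > file2      # rewrite the content, but do NOT stage it
git mv file2 file3                     # renames the index entry and the work-tree file

staged=$(git diff --cached -M --name-status)   # what a commit would record
unstaged=$(git diff --name-only)               # what exists only in the work tree
echo "staged:   $staged"
echo "unstaged: $unstaged"
```

The staged change is a pure rename of the old blob, while the rewritten content shows up only as an unstaged modification of file3, which matches the "rename file2 => file3 (100%)" line in the original commit output.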

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 19:23 UTC
To: git

In article
<32541b131002031048i26d166d9w3567a60515235c34@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 1:25 PM, Ron Garret <ron1@flownet.com> wrote:
> > So something in my understanding of how git works must be wrong. Git
> > must be keeping a separate record of file renames somewhere. But where?
>
> It doesn't. Your experiment is wrong.
>
> > [ron@mickey:~/devel/gittest]$ cat>file2
> > 6
> > 7
> > 8
> > 9
> > 10
> > [ron@mickey:~/devel/gittest]$ git mv file2 file3
> > [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> > [master ae3f6d4] letters->numbers
> >  1 files changed, 0 insertions(+), 0 deletions(-)
> >  rename file2 => file3 (100%)
>
> Whoops. You didn't 'git add file2' (before the mv) or 'git add file3'
> (after the mv), or use commit -a, so what you've committed is the
> *old* content of file2 under the name file3. The *new* content of
> file2 is still uncommitted in your work tree under the name file3.
> This is why git can detect the move. (The 100% is a good clue: it
> means the old and new files are 100% identical.)

Ah. That explains everything. Thanks. (I thought git mv was
equivalent to git rm followed by git add. But it's not.)

> Artificial tests like this are useless anyway.

Yes, I know. This was not intended to be a real-world example. I was
just trying to understand the heuristics that git uses to track
filename changes, and in particular, how much a file could change
before git decided it was a different file. When I got to zero shared
lines between old and new it was clear that I was missing something
fundamental :-)

So... how *does* git decide when two blobs are different blobs and when
they are the same blob with mods? I asked this question before and was
pointed to the diffcore docs, but that didn't really clear things up.
That just describes all the different ways git can do diffs, not the
actual heuristics that git uses to track content.

rg

* Re: git-mv redux: there must be something else going on
From: Avery Pennarun @ 2010-02-03 19:47 UTC
To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@flownet.com> wrote:
> Ah. That explains everything. Thanks. (I thought git mv was
> equivalent to git rm followed by git add. But it's not.)

I suppose in this case it's not. The only difference is when your
work tree differs from your index, though, and it's to be expected
that 'git rm', in removing things from the index, would lose your
ability to track those differences.

> So... how *does* git decide when two blobs are different blobs and when
> they are the same blob with mods? I asked this question before and was
> pointed to the diffcore docs, but that didn't really clear things up.
> That just describes all the different ways git can do diffs, not the
> actual heuristics that git uses to track content.

If you really want to know the details, looking at the code really is
probably the best solution; it's not even that long.

The short version is that git chooses a set of candidate blobs, then
diffs them and figures out a percentage similarity between each pair.
(A simple way to think of the similarity index is "how long is the
diff compared to the file itself?" If the diff is of length zero, the
similarity is 100%, and so on.) If the similarity is greater than a
certain threshold, then it's considered to be the same file.

Choosing the set of candidates is actually the more interesting
problem, since detecting moves using the above algorithm is O(n^2)
with the number of candidates. That's why 'git diff' and 'git log'
don't do it at all by default.

If you provide -M, the set of candidates is the set of files that were
removed/modified and the set of files that were added. (Added files
are compared against removed/modified files, iirc.) Normally that's a
very short list. With -C, you need to compare all
added/removed/modified files with all others, which is slightly more
work. With --find-copies-harder, it becomes potentially a *lot* of
work.

Have fun,

Avery
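
A small experiment can make the similarity idea concrete. The sketch below assumes a throwaway repository, a placeholder identity, and hypothetical file names (old.conf, new.conf); it renames a file by hand with rm/add, changes it slightly, and then asks git to detect the rename after the fact with -M.

```shell
#!/bin/sh
# Sketch: a hand-made rename (mv + git rm + git add) with a small edit is
# still reported as a rename when -M is given, because detection is based
# on content similarity rather than on how the rename was performed.
set -e
repo=$(mktemp -d)                      # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Example User"            # placeholder identity
git config user.email "user@example.invalid"   # placeholder identity

for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo "option_$i = value_$i"
done > old.conf
git add old.conf
git commit -q -m 'add old.conf'

mv old.conf new.conf                       # plain mv, not git mv
echo "option_13 = value_13" >> new.conf    # small edit on top of the rename
git rm -q old.conf
git add new.conf
git commit -q -m 'rename by hand with a small edit'

# Rename detection happens here, at query time.
status=$(git diff -M --name-status HEAD~1 HEAD)
history=$(git log --follow --format=%s -- new.conf | wc -l | tr -d ' ')
echo "$status"
echo "commits found by --follow: $history"
```

The status line starts with R followed by a similarity score below 100, and --follow walks back through the commit that created old.conf.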

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 20:30 UTC
To: git

In article
<32541b131002031147r367ee08fxc64c4c54165953a3@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@flownet.com> wrote:
> > Ah. That explains everything. Thanks. (I thought git mv was
> > equivalent to git rm followed by git add. But it's not.)
>
> I suppose in this case it's not. The only difference is when your
> work tree differs from your index, though, and it's to be expected
> that 'git rm', in removing things from the index, would lose your
> ability to track those differences.
>
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods? I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
>
> If you really want to know the details, looking at the code really is
> probably the best solution; it's not even that long.
>
> The short version is that git chooses a set of candidate blobs, then
> diffs them and figures out a percentage similarity between each pair.
> (A simple way to think of the similarity index is "how long is the
> diff compared to the file itself?" If the diff is of length zero, the
> similarity is 100%, and so on.) If the similarity is greater than a
> certain threshold, then it's considered to be the same file.
>
> Choosing the set of candidates is actually the more interesting
> problem, since detecting moves using the above algorithm is O(n^2)
> with the number of candidates. That's why 'git diff' and 'git log'
> don't do it at all by default.
>
> If you provide -M, the set of candidates is the set of files that were
> removed/modified and the set of files that were added. (Added files
> are compared against removed/modified files, iirc.) Normally that's a
> very short list. With -C, you need to compare all
> added/removed/modified files with all others, which is slightly more
> work. With --find-copies-harder, it becomes potentially a *lot* of
> work.

Thanks! That clarifies a lot.

rg

* Re: git-mv redux: there must be something else going on
From: Nicolas Pitre @ 2010-02-03 19:53 UTC
To: Ron Garret; +Cc: git

On Wed, 3 Feb 2010, Ron Garret wrote:

> So... how *does* git decide when two blobs are different blobs and when
> they are the same blob with mods? I asked this question before and was
> pointed to the diffcore docs, but that didn't really clear things up.
> That just describes all the different ways git can do diffs, not the
> actual heuristics that git uses to track content.

Yes, those same heuristics are used to make the decision.

|The second transformation in the chain is diffcore-break, and is
|controlled by the -B option to the 'git diff-{asterisk}' commands.
|This is used to detect a filepair that represents "complete rewrite"
|and break such filepair into two filepairs that represent delete and
|create.
|[...]

|This transformation is used to detect renames and copies, and is
|controlled by the -M option (to detect renames) and the -C option
|(to detect copies as well) to the 'git diff-{asterisk}' commands.
|[...]

Note that you may use the -B, -C, -M and --find-copies-harder arguments
with log as well as diff commands even if there is no actual diff
output. So the explanation is really in that document even if simple
rename detection is concerned only by a fraction of what is said there.

And Git can detect copied files too.

Those semantics are not stored in the repository so they can be improved
or even changed after the facts.


Nicolas

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 20:27 UTC
To: git

In article <alpine.LFD.2.00.1002031436490.1681@xanadu.home>,
 Nicolas Pitre <nico@fluxnic.net> wrote:

> On Wed, 3 Feb 2010, Ron Garret wrote:
>
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods? I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
>
> Yes, those same heuristics are used to make the decision.
>
> |The second transformation in the chain is diffcore-break, and is
> |controlled by the -B option to the 'git diff-{asterisk}' commands.
> |This is used to detect a filepair that represents "complete rewrite"
> |and break such filepair into two filepairs that represent delete and
> |create.
> |[...]
>
> |This transformation is used to detect renames and copies, and is
> |controlled by the -M option (to detect renames) and the -C option
> |(to detect copies as well) to the 'git diff-{asterisk}' commands.
> |[...]
>
> Note that you may use the -B, -C, -M and --find-copies-harder arguments
> with log as well as diff commands even if there is no actual diff
> output. So the explanation is really in that document even if simple
> rename detection is concerned only by a fraction of what is said there.
>
> And Git can detect copied files too.
>
> Those semantics are not stored in the repository so they can be improved
> or even changed after the facts.

OK, on closer reading I see that the information is there, but it's well
hidden :-) (For example, the -M option takes an optional numerical
argument so you can tweak how much similarity is needed to be considered
a move. But the docs for git log don't mention this. It's buried deep
in the git diffcore docs. But yes, it's there.)

So I think I'm beginning to understand how this works, but that leads me
to another question: it seems to me that there are potential screw cases
for this purely content-based system of tracking files. For example,
suppose I have a directory full of sample config files, all of which are
similar to each other. Will that cause diffcore to get confused?

Feel free to treat that as a rhetorical question because obviously I can
(and probably should) get the answer by trying it.

Thanks!
rg
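
The optional numeric argument to -M can be tried out directly. The following is a sketch under stated assumptions: a throwaway repository, a placeholder identity, and hypothetical file names; the new file keeps seven of ten equally long lines, so its similarity to the old one sits between the two thresholds used here.

```shell
#!/bin/sh
# Sketch: the same pair of files is or is not a "rename" depending on the
# similarity threshold passed to -M.
set -e
repo=$(mktemp -d)                      # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Example User"            # placeholder identity
git config user.email "user@example.invalid"   # placeholder identity

for i in 0 1 2 3 4 5 6 7 8 9; do
  echo "setting_$i = value_$i"
done > old.conf
git add old.conf
git commit -q -m 'add old.conf'

# new.conf keeps seven lines of old.conf and replaces the other three.
{
  for i in 0 1 2 3 4 5 6; do echo "setting_$i = value_$i"; done
  for i in 7 8 9; do echo "changed_$i = other_$i"; done
} > new.conf
git rm -q old.conf
git add new.conf
git commit -q -m 'replace old.conf with new.conf'

# First status letter of each reported change, under two thresholds.
strict=$(git diff -M90% --name-status HEAD~1 HEAD | cut -c1 | tr -d '\n')
loose=$(git diff -M50% --name-status HEAD~1 HEAD | cut -c1 | tr -d '\n')
echo "with -M90%: $strict"
echo "with -M50%: $loose"
```

With the strict threshold the change is reported as an unrelated add and delete; with the loose one it collapses into a single rename entry. Nothing in the repository changes between the two queries.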

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 20:31 UTC
To: git

In article <ron1-34F9C6.12273203022010@news.gmane.org>,
 Ron Garret <ron1@flownet.com> wrote:

> In article <alpine.LFD.2.00.1002031436490.1681@xanadu.home>,
>  Nicolas Pitre <nico@fluxnic.net> wrote:
>
> > On Wed, 3 Feb 2010, Ron Garret wrote:
> >
> > > So... how *does* git decide when two blobs are different blobs and when
> > > they are the same blob with mods? I asked this question before and was
> > > pointed to the diffcore docs, but that didn't really clear things up.
> > > That just describes all the different ways git can do diffs, not the
> > > actual heuristics that git uses to track content.
> >
> > Yes, those same heuristics are used to make the decision.
> >
> > |The second transformation in the chain is diffcore-break, and is
> > |controlled by the -B option to the 'git diff-{asterisk}' commands.
> > |This is used to detect a filepair that represents "complete rewrite"
> > |and break such filepair into two filepairs that represent delete and
> > |create.
> > |[...]
> >
> > |This transformation is used to detect renames and copies, and is
> > |controlled by the -M option (to detect renames) and the -C option
> > |(to detect copies as well) to the 'git diff-{asterisk}' commands.
> > |[...]
> >
> > Note that you may use the -B, -C, -M and --find-copies-harder arguments
> > with log as well as diff commands even if there is no actual diff
> > output. So the explanation is really in that document even if simple
> > rename detection is concerned only by a fraction of what is said there.
> >
> > And Git can detect copied files too.
> >
> > Those semantics are not stored in the repository so they can be improved
> > or even changed after the facts.
>
> OK, on closer reading I see that the information is there, but it's well
> hidden :-) (For example, the -M option takes an optional numerical
> argument so you can tweak how much similarity is needed to be considered
> a move. But the docs for git log don't mention this. It's buried deep
> in the git diffcore docs. But yes, it's there.)
>
> So I think I'm beginning to understand how this works, but that leads me
> to another question: it seems to me that there are potential screw cases
> for this purely content-based system of tracking files. For example,
> suppose I have a directory full of sample config files, all of which are
> similar to each other. Will that cause diffcore to get confused?
>
> Feel free to treat that as a rhetorical question because obviously I can
> (and probably should) get the answer by trying it.

Actually, I think the answer is in Avery's post in another branch of
this thread.

rg

* Re: git-mv redux: there must be something else going on
From: Avery Pennarun @ 2010-02-03 20:40 UTC
To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 3:27 PM, Ron Garret <ron1@flownet.com> wrote:
> So I think I'm beginning to understand how this works, but that leads me
> to another question: it seems to me that there are potential screw cases
> for this purely content-based system of tracking files. For example,
> suppose I have a directory full of sample config files, all of which are
> similar to each other. Will that cause diffcore to get confused?

Cases like that are always confusing, even to humans. Person A
renames X to Y, but at the same time creates Z which is almost
identical. Person B patches X, then merges in person A's changes.

What do you expect to happen? Should Y be changed, because that's the
file X was moved from? Or should we change Z, because it's almost the
same content anyway? Or maybe we should change both, since a change
to the old X is probably intended to affect the copied *content* that
ended up in both Y and Z?

Simply storing whether person A has renamed vs. copied vs. added a
file makes the answer to the "what do you expect to happen" question
more obvious, but fails to answer the "what *should* happen" question.
Thus it's more of a distraction than a feature. It took a while for
me to accept this, but once I did, I realized that git's behaviour has
still never caused me a problem in real life, despite repeated file
renames and complicated merges.

In contrast, svn's explicit rename tracking has shot me in the foot
numerous times. (svn remembers when I delete file X and then
subsequently re-add it with the same content. So if I merge in
someone's change to the *old* file X, it barfs because omg omg that's
a totally different file X and it can't possibly figure out what to
do. Gee, thanks. It's also hopelessly incompetent at handling
"renames" in which a newbie developer didn't know to use svn mv, but
instead used svn rm, mv, and svn add.)

Have fun,

Avery

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-03 22:33 UTC
To: git

In article
<32541b131002031240p6b67536ame6b69c6d662a7968@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 3:27 PM, Ron Garret <ron1@flownet.com> wrote:
> > So I think I'm beginning to understand how this works, but that leads me
> > to another question: it seems to me that there are potential screw cases
> > for this purely content-based system of tracking files. For example,
> > suppose I have a directory full of sample config files, all of which are
> > similar to each other. Will that cause diffcore to get confused?
>
> Cases like that are always confusing, even to humans. Person A
> renames X to Y, but at the same time creates Z which is almost
> identical. Person B patches X, then merges in person A's changes.
>
> What do you expect to happen? Should Y be changed, because that's the
> file X was moved from? Or should we change Z, because it's almost the
> same content anyway? Or maybe we should change both, since a change
> to the old X is probably intended to affect the copied *content* that
> ended up in both Y and Z?
>
> Simply storing whether person A has renamed vs. copied vs. added a
> file makes the answer to the "what do you expect to happen" question
> more obvious, but fails to answer the "what *should* happen" question.
> Thus it's more of a distraction than a feature. It took a while for
> me to accept this, but once I did, I realized that git's behaviour has
> still never caused me a problem in real life, despite repeated file
> renames and complicated merges.
>
> In contrast, svn's explicit rename tracking has shot me in the foot
> numerous times. (svn remembers when I delete file X and then
> subsequently re-add it with the same content. So if I merge in
> someone's change to the *old* file X, it barfs because omg omg that's
> a totally different file X and it can't possibly figure out what to
> do. Gee, thanks. It's also hopelessly incompetent at handling
> "renames" in which a newbie developer didn't know to use svn mv, but
> instead used svn rm, mv, and svn add.)

Here's a realistic case where keeping explicit track of renames could be
useful.

A and B start with a file named config. A and B both make edits. In
addition, B renames config to be config1 and creates a new, very similar
file called config2. B then merges from A with the expectation that B's
edits to config would end up in config1 and not config2. It seems to me
that without tracking renames, it would be luck of the draw which file
the patch got applied to.

rg

* Re: git-mv redux: there must be something else going on
From: Avery Pennarun @ 2010-02-03 23:18 UTC
To: Ron Garret; +Cc: git

On Wed, Feb 3, 2010 at 5:33 PM, Ron Garret <ron1@flownet.com> wrote:
> Here's a realistic case where keeping explicit track of renames could be
> useful.
>
> A and B start with a file named config. A and B both make edits. In
> addition, B renames config to be config1 and creates a new, very similar
> file called config2. B then merges from A with the expectation that B's
> edits to config would end up in config1 and not config2. It seems to me
> that without tracking renames, it would be luck of the draw which file
> the patch got applied to.

The problem is that this single "realistic case" is not actually very
common, and it's dwarfed by the other realistic cases: developer
forgets to use 'git mv' to rename the file; developer accidentally
deletes a file, commits, and then readds it later; etc.

Have I been bitten by exactly your example? Yup. But I've been
bitten by lots of other related things too, and explicit rename
tracking (at least in svn) has quite frequently made the problems
*worse*. In my personal experience, git screws up less often. The
fact that it's also elegant is a nice bonus too :)

More about this: http://marc.info/?l=git&m=114123702826251

Have fun,

Avery

* Re: git-mv redux: there must be something else going on
From: Jay Soffian @ 2010-02-03 23:55 UTC
To: Avery Pennarun; +Cc: Ron Garret, git

On Wed, Feb 3, 2010 at 6:18 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
> More about this: http://marc.info/?l=git&m=114123702826251

I think the canonical email on the subject is this one:

http://article.gmane.org/gmane.comp.version-control.git/217

:-)

j.

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-04 0:10 UTC
To: git

In article
<76718491002031555i2c1558f9qe0c97d07ceb86bb6@mail.gmail.com>,
 Jay Soffian <jaysoffian@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 6:18 PM, Avery Pennarun <apenwarr@gmail.com> wrote:
> > More about this: http://marc.info/?l=git&m=114123702826251
>
> I think the canonical email on the subject is this one:
>
> http://article.gmane.org/gmane.comp.version-control.git/217
>

The upshot seems to be this:

> And that "where did this come from" decision should be done at _search_
> time, not commit time.

And I'm mostly convinced, except for the one screw case that I outlined
above. In that case the search-time result is ambiguous, and file
tracking information could be used to resolve the ambiguity. But it
certainly does seem like a rare enough situation that it's not worth
worrying about.

I think I'm starting to git it :-)

rg

* Re: git-mv redux: there must be something else going on
From: Ron Garret @ 2010-02-04 0:10 UTC
To: git

In article
<32541b131002031518t1017d351xcf9071f0a937474e@mail.gmail.com>,
 Avery Pennarun <apenwarr@gmail.com> wrote:

> On Wed, Feb 3, 2010 at 5:33 PM, Ron Garret <ron1@flownet.com> wrote:
> > Here's a realistic case where keeping explicit track of renames could be
> > useful.
> >
> > A and B start with a file named config. A and B both make edits. In
> > addition, B renames config to be config1 and creates a new, very similar
> > file called config2. B then merges from A with the expectation that B's
> > edits to config would end up in config1 and not config2. It seems to me
> > that without tracking renames, it would be luck of the draw which file
> > the patch got applied to.
>
> The problem is that this single "realistic case" is not actually very
> common, and it's dwarfed by the other realistic cases: developer
> forgets to use 'git mv' to rename the file; developer accidentally
> deletes a file, commits, and then readds it later; etc.

Makes sense.

> Have I been bitten by exactly your example? Yup. But I've been
> bitten by lots of other related things too, and explicit rename
> tracking (at least in svn) has quite frequently made the problems
> *worse*. In my personal experience, git screws up less often. The
> fact that it's also elegant is a nice bonus too :)
>
> More about this: http://marc.info/?l=git&m=114123702826251

Thanks, that's a great read!

rg

* Re: git-mv redux: there must be something else going on
From: Junio C Hamano @ 2010-02-04 0:48 UTC
To: Ron Garret; +Cc: git

Ron Garret <ron1@flownet.com> writes:

> A and B start with a file named config. A and B both make edits. In
> addition, B renames config to be config1 and creates a new, very similar
> file called config2. B then merges from A with the expectation that B's
> edits to config would end up in config1 and not config2. It seems to me
> that without tracking renames, it would be luck of the draw which file
> the patch got applied to.

I don't think the above is necessarily a "rename" issue, but it touches
an interesting point -- it is so "interesting" that no sane SCM would
even consider it a problem they need to solve.

If config1 and config2 are about two different ways to configure the
software (e.g. two different builds for different customers), and the
change made by A was to accommodate a new configuration option made in
the upstream, B might even want to have that addition reflected in
_both_ of his configuration files, config1 and config2.

Earlier in this message, I said that this is not an issue an SCM should
even be solving, because a sane way to handle this would _not_ be to
copy and edit config1/config2 and keep track of them in the SCM;
instead, saner people would maintain a build procedure (e.g. a Makefile
target) to transform the template "config" into the necessary "config1"
and "config2" customized variants.
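
The template approach suggested above might look roughly like this. It is only a sketch: the template name config.in, the @CUSTOMER@ placeholder, and the customer names are all hypothetical, and a real project would more likely put the same sed call in a Makefile target.

```shell
#!/bin/sh
# Sketch: keep ONE tracked template and generate the customized variants,
# so upstream changes to the template reach every variant automatically.
set -e
work=$(mktemp -d)                      # scratch directory (assumed location)
cd "$work"

# The only file that would be tracked in the SCM (hypothetical content).
cat > config.in <<'EOF'
# shared settings
log_level = info
customer = @CUSTOMER@
EOF

# Generate config1 and config2 from the template; names are placeholders.
sed 's/@CUSTOMER@/alpha/' config.in > config1
sed 's/@CUSTOMER@/beta/'  config.in > config2

cat config1
cat config2
```

With this layout a new option added to config.in shows up in both generated files on the next build, and there is no question of which copy a merged patch should land in.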

* Re: git-mv redux: there must be something else going on
From: Nicolas Pitre @ 2010-02-03 20:44 UTC
To: Ron Garret; +Cc: git

On Wed, 3 Feb 2010, Ron Garret wrote:

> OK, on closer reading I see that the information is there, but it's well
> hidden :-) (For example, the -M option takes an optional numerical
> argument so you can tweak how much similarity is needed to be considered
> a move. But the docs for git log don't mention this. It's buried deep
> in the git diffcore docs. But yes, it's there.)

The doc is indeed not perfect. Probably the -M option and friends could
be listed again in the git-log and git-diff pages with a more casual
explanation.

> So I think I'm beginning to understand how this works, but that leads me
> to another question: it seems to me that there are potential screw cases
> for this purely content-based system of tracking files. For example,
> suppose I have a directory full of sample config files, all of which are
> similar to each other. Will that cause diffcore to get confused?

There are ways to fool the heuristics indeed. But overall it is still
more reliable than manually having to record the rename into the tool
since humans are known for screwing these things up more often than
machines. And again the heuristics can be modified after the fact if
needed, unlike the manually recorded false renames (or lack of rename
record) which will remain wrong unless another manual correction is
applied to the database.


Nicolas
* Re: git-mv redux: there must be something else going on
  2010-02-03 18:48 ` Avery Pennarun
  2010-02-03 19:23 ` Ron Garret
@ 2010-02-03 20:12 ` Pete Harlan
  2010-02-03 20:34 ` Ron Garret
  1 sibling, 1 reply; 20+ messages in thread
From: Pete Harlan @ 2010-02-03 20:12 UTC
To: Avery Pennarun; +Cc: Ron Garret, git

On 02/03/2010 10:48 AM, Avery Pennarun wrote:
>> [ron@mickey:~/devel/gittest]$ git mv file2 file3
>> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
>> [master ae3f6d4] letters->numbers
>> 1 files changed, 0 insertions(+), 0 deletions(-)
>> rename file2 => file3 (100%)
>
> Whoops. You didn't 'git add file2' (before the mv) or 'git add file3'
> (after the mv), or use commit -a, so what you've committed is the
> *old* content of file2 under the name file3. The *new* content of
> file2 is still uncommitted in your work tree under the name file3.

It may be reasonable for "git mv foo bar" to print a helpful message to
the user if foo has un-checked-in changes, similarly to what "git rm" does.

Unlike "git rm", "git mv" could still perform the operation even without
"-f", but the semantics of "git mv" differ enough from plain "mv" that a
short blurb from Git in that case might help.

--Pete
* Re: git-mv redux: there must be something else going on
  2010-02-03 20:12 ` Pete Harlan
@ 2010-02-03 20:34 ` Ron Garret
  2010-02-03 21:12 ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
  0 siblings, 1 reply; 20+ messages in thread
From: Ron Garret @ 2010-02-03 20:34 UTC
To: git

In article <4B69D897.2060908@pcharlan.com>,
 Pete Harlan <pgit@pcharlan.com> wrote:

> On 02/03/2010 10:48 AM, Avery Pennarun wrote:
> >> [ron@mickey:~/devel/gittest]$ git mv file2 file3
> >> [ron@mickey:~/devel/gittest]$ git commit -m 'letters->numbers'
> >> [master ae3f6d4] letters->numbers
> >> 1 files changed, 0 insertions(+), 0 deletions(-)
> >> rename file2 => file3 (100%)
> >
> > Whoops. You didn't 'git add file2' (before the mv) or 'git add file3'
> > (after the mv), or use commit -a, so what you've committed is the
> > *old* content of file2 under the name file3. The *new* content of
> > file2 is still uncommitted in your work tree under the name file3.
>
> It may be reasonable for "git mv foo bar" to print a helpful message to
> the user if foo has un-checked-in changes, similarly to what "git rm" does.
>
> Unlike "git rm", "git mv" could still perform the operation even without
> "-f", but the semantics of "git mv" differ enough from plain "mv" that a
> short blurb from Git in that case might help.

I think that a simple tweak to the docs would be enough. Right now it
says:

"The index is updated after successful completion, but the change must
still be committed."

I'm pretty sure I would have been less confused if it had said something
like:

"The index is updated to reflect the new name of the file, but NOT any
new content that file may contain. Changed content must be added to the
index separately with git add, and all changes must still be committed."

rg
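The situation discussed above (renaming a file that has unstaged edits) can be reproduced in a scratch repository. A minimal sketch follows; the file contents mirror the original experiment, while the temporary directory and the demo identity are assumptions made only to keep the script self-contained.

```shell
#!/bin/sh
# Sketch: git mv on a file with unstaged edits stages the rename only.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'a\nb\nc\n' > file2
git add file2
git commit -q -m 'letters'

# Edit the file but do NOT stage the edit, then rename it.
printf '6\n7\n8\n9\n10\n' > file2
git mv file2 file3

# What the next commit would record (index vs. HEAD):
staged=$(git diff --cached -M --name-status)
# What would be left behind in the work tree (work tree vs. index):
unstaged=$(git diff --name-status)

echo "staged:   $staged"
echo "unstaged: $unstaged"
```

The staged side shows a 100% rename of the old content, and the new content still appears as an unstaged modification of `file3`; running `git add file3` before committing would put both into the same commit.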
* [PATCH] Documentation: clarify git-mv behaviour wrt dirty files
  2010-02-03 20:34 ` Ron Garret
@ 2010-02-03 21:12 ` Thomas Rast
  2010-02-03 21:56 ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Rast @ 2010-02-03 21:12 UTC
To: git; +Cc: Ron Garret, Avery Pennarun, Pete Harlan

Clearly point out that the rename happens separately for worktree and
index. This confused users, as they are apparently told that git-mv
== git-rm && mv && git-add, which it is not.

While there, move the synopsis to the synopsis section, which so far
was rather useless, and reword the first sentence to eliminate the
mentions of 'script'.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

Ron, please don't drop the Cc lists, it's customary around here to Cc
everyone involved so far.

On Wednesday 03 February 2010 21:34:05 you wrote:
> In article <4B69D897.2060908@pcharlan.com>,
> Pete Harlan <pgit@pcharlan.com> wrote:
> > Unlike "git rm", "git mv" could still perform the operation even without
> > "-f", but the semantics of "git mv" differ enough from plain "mv" that a
> > short blurb from Git in that case might help.
>
> I think that a simple tweak to the docs would be enough. Right now it
> says:
>
> "The index is updated after successful completion, but the change must
> still be committed."
>
> I'm pretty sure I would have been less confused if it had said something
> like:
>
> "The index is updated to reflect the new name of the file, but NOT any
> new content that file may contain. Changed content must be added to the
> index separately with git add, and all changes must still be committed."

How about this change instead, which formulates it in terms of what
does happen, instead of what does not.

BTW, I'm wondering whether the "move or rename" distinction is really
worth it. Does the user care? I always figured it was a technical
detail whether rename() works or you actually need to move anything.
 Documentation/git-mv.txt | 14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-mv.txt b/Documentation/git-mv.txt
index bdcb585..eff11b7 100644
--- a/Documentation/git-mv.txt
+++ b/Documentation/git-mv.txt
@@ -8,22 +8,22 @@ git-mv - Move or rename a file, a directory, or a symlink
 
 SYNOPSIS
 --------
-'git mv' <options>... <args>...
+'git mv' [-f] [-n] <source> <destination>
+'git mv' [-f] [-n] [-k] <source>... <destination directory>
 
 DESCRIPTION
 -----------
-This script is used to move or rename a file, directory or symlink.
-
- git mv [-f] [-n] <source> <destination>
- git mv [-f] [-n] [-k] <source> ... <destination directory>
+'git-mv' renames files, directories, and symlinks in worktree and
+index.
 
 In the first form, it renames <source>, which must exist and be either
 a file, symlink or directory, to <destination>.
 In the second form, the last argument has to be an existing
 directory; the given sources will be moved into this directory.
 
-The index is updated after successful completion, but the change must still be
-committed.
+For every renamed file or symlink, the worktree and index contents are
+renamed separately, preserving both staged and unstaged changes. You
+will still have to commit the rename.
 
 OPTIONS
 -------
-- 
1.7.0.rc1.166.g7cae7
* Re: [PATCH] Documentation: clarify git-mv behaviour wrt dirty files
  2010-02-03 21:12 ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
@ 2010-02-03 21:56 ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-02-03 21:56 UTC
To: Thomas Rast; +Cc: git, Ron Garret, Avery Pennarun, Pete Harlan

Thomas Rast <trast@student.ethz.ch> writes:

> Clearly point out that the rename happens separately for worktree and
> index. This confused users, as they are apparently told that git-mv
> == git-rm && mv && git-add, which it is not.

I may be confused too, as I had to read these three lines three times, and
I do not think these two sentences mesh well together.

What happens with "git mv A B" is that it moves a work tree file A to B
and moves the index entry for A to B, hence all of:

 (1) the fact that you do not have A anymore;

 (2) the fact that you now have B instead; and

 (3) the fact that your work tree file B (which used to be A) has changes
     from its corresponding index entry

are _consistently_ kept between the work tree and the index. I don't think
"happens separately for" makes sense. At best, it is an implementation
detail that doesn't help users better understand what the command does and
what it is used for.

Of course, it is different from "git rm -f --cached A && mv A B && git add
B", which would add changes that you were not prepared to add (i.e. you had
output from "git diff A" before you started). I think that was the buggy
way the old scripted version of "git mv" used to work, by the way.

> While there, move the synopsis to the synopsis section, which so far
> was rather useless, and reword the first sentence to eliminate the
> mentions of 'script'.

That's a good change regardless.

> +For every renamed file or symlink, the worktree and index contents are
> +renamed separately, preserving both staged and unstaged changes....
I'd just say:

    While renaming paths, changes in the files in the work tree that
    you have not added are preserved.

> +.... You
> +will still have to commit the rename.

I don't understand why you want to say "You will still have to commit the
rename" here. It is like saying in "git add" manpage that "You will still
have to commit the added contents" because "add" only affects the index
and does not make a commit.

Drop it.
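The contrast drawn in this review, between "git mv A B" and the "git rm -f --cached A && mv A B && git add B" sequence, can be observed in a scratch repository. This is a sketch under assumptions: the file contents, the demo identity and the `numbers` helper are made up for illustration. It runs the three-command sequence on a file with an unstaged edit and then inspects what ended up staged.

```shell
#!/bin/sh
# Sketch: rm --cached + mv + add stages the pending edit along with the
# rename, unlike git mv, which leaves unstaged changes unstaged.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git init -q

# numbers N: print 1..N, one per line.
numbers() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

numbers 20 > A
git add A
git commit -q -m 'add A'

# An edit the user has not chosen to stage yet.
echo 21 >> A

# The sequence quoted in the review.
git rm -q -f --cached A
mv A B
git add B

staged=$(git diff --cached -M --name-status)
unstaged=$(git diff --name-status)

echo "staged:   $staged"
echo "unstaged: $unstaged"
```

Here nothing is left unstaged, and the staged rename is reported with a similarity below 100% because the extra line was swept into the index by `git add B`. With `git mv A B` in place of the three commands, the same inspection should show a 100% rename staged and `B` still modified in the work tree.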
end of thread, other threads:[~2010-02-04 0:48 UTC | newest]

Thread overview: 20+ messages
2010-02-03 18:25 git-mv redux: there must be something else going on Ron Garret
2010-02-03 18:48 ` Avery Pennarun
2010-02-03 19:23 ` Ron Garret
2010-02-03 19:47 ` Avery Pennarun
2010-02-03 20:30 ` Ron Garret
2010-02-03 19:53 ` Nicolas Pitre
2010-02-03 20:27 ` Ron Garret
2010-02-03 20:31 ` Ron Garret
2010-02-03 20:40 ` Avery Pennarun
2010-02-03 22:33 ` Ron Garret
2010-02-03 23:18 ` Avery Pennarun
2010-02-03 23:55 ` Jay Soffian
2010-02-04 0:10 ` Ron Garret
2010-02-04 0:10 ` Ron Garret
2010-02-04 0:48 ` Junio C Hamano
2010-02-03 20:44 ` Nicolas Pitre
2010-02-03 20:12 ` Pete Harlan
2010-02-03 20:34 ` Ron Garret
2010-02-03 21:12 ` [PATCH] Documentation: clarify git-mv behaviour wrt dirty files Thomas Rast
2010-02-03 21:56 ` Junio C Hamano