* Comments on recursive merge.. @ 2005-11-07 16:48 Linus Torvalds 2005-11-07 16:56 ` Linus Torvalds 2005-11-07 22:58 ` Comments on recursive merge Fredrik Kuivinen 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-07 16:48 UTC (permalink / raw) To: Junio C Hamano, Fredrik Kuivinen; +Cc: Git Mailing List Guys, I just hit my first real rename conflict, and very timidly tried the "recursive" strategy in the hopes that I wouldn't need to do things by hand. It resolved things beautifully. Good job. My only worry is that I don't read python, so I don't really know how it does what it does, which makes me nervous. Can somebody (Fredrik?) add some documentation about the merge strategy and how it works. Considering that the stupid resolve strategy really requires you to know how git works when rename conflicts happen (things left in unmerged state are really quite hard to handle by hand unless you know exactly what you're doing), I'd almost suggest making "recursive" the default. I'm a bit nervous about it, but knowing how it works would probably put most of that to rest. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-07 16:48 Comments on recursive merge Linus Torvalds @ 2005-11-07 16:56 ` Linus Torvalds 2005-11-07 23:19 ` [PATCH] merge-recursive: Only print relevant rename messages Fredrik Kuivinen 2005-11-07 22:58 ` Comments on recursive merge Fredrik Kuivinen 1 sibling, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-07 16:56 UTC (permalink / raw) To: Junio C Hamano, Fredrik Kuivinen; +Cc: Git Mailing List On Mon, 7 Nov 2005, Linus Torvalds wrote: > > I just hit my first real rename conflict, and very timidly tried the > "recursive" strategy in the hopes that I wouldn't need to do things by > hand. > > It resolved things beautifully. Good job. Btw, one thing that it does is print out too much information. In particular, I had renames on both sides of the merge (in case anybody wants to see which one I'm talking about: it's the current top-of-head commit in the kernel archives: 333c47c847c90aaefde8b593054d9344106333b5). Now, renames that you've done yourself you really don't want to hear about, at least if the other side didn't change anything in that file. Renames that the _other_ side has done (the one you're merging) you may or may not want to know about, regardless of whether they happened to files that are changed. But since "git pull" will do a "git-apply --stat" at the end and show the renames there, I'd argue that the merge strategy itself should be quiet about any renames that are trivial. So how about talking about renames only if you end up also doing a file-level merge? As it is, doing the merge talked about renames that I had merged earlier in my own branch, which is just confusing. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] merge-recursive: Only print relevant rename messages 2005-11-07 16:56 ` Linus Torvalds @ 2005-11-07 23:19 ` Fredrik Kuivinen 2005-11-07 23:54 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Fredrik Kuivinen @ 2005-11-07 23:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, Fredrik Kuivinen, Git Mailing List On Mon, Nov 07, 2005 at 08:56:07AM -0800, Linus Torvalds wrote: > > Btw, one thing that it does is print out too much information. > > In particular, I had renames on both sides of the merge (in case anybody > wants to see which one I'm talking about: it's the current top-of-head > commit in the kernel archives: 333c47c847c90aaefde8b593054d9344106333b5). > > Now, renames that you've done yourself you really don't want to hear > about, at least if the other side didn't change anything in that file. > > Renames that the _other_ side has done (the one you're merging) you may or > may not want to know about, regardless of whether they happened to files > that are changed. But since "git pull" will do a "git-apply --stat" at the > end and show the renames there, I'd argue that the merge strategy itself > should be quiet about any renames that are trivial. > > So how about talking about renames only if you end up also doing a > file-level merge? As it is, doing the merge talked about renames that I > had merged earlier in my own branch, which is just confusing. > Sounds like a good idea. How about something like the following? -- It isn't really interesting to know about the renames that have already been committed to the branch you are working on. Furthermore, the 'git-apply --stat' at the end of git-(merge|pull) will tell us about any renames in the other branch. With this commit only renames which require a file-level merge will be printed. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> --- git-merge-recursive.py | 22 +++++++++++++++------- 1 files changed, 15 insertions(+), 7 deletions(-) applies-to: 5af1b5b93257ecfe993bb24975bf596faa342758 89c029b439603630a53ee4e4d0cb7931111afd2a diff --git a/git-merge-recursive.py b/git-merge-recursive.py index 626d854..9983cd9 100755 --- a/git-merge-recursive.py +++ b/git-merge-recursive.py @@ -162,10 +162,13 @@ def mergeTrees(head, merge, common, bran # Low level file merging, update and removal # ------------------------------------------ +MERGE_NONE = 0 +MERGE_TRIVIAL = 1 +MERGE_3WAY = 2 def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode, branch1Name, branch2Name): - merge = False + merge = MERGE_NONE clean = True if stat.S_IFMT(aMode) != stat.S_IFMT(bMode): @@ -178,7 +181,7 @@ def mergeFile(oPath, oSha, oMode, aPath, sha = bSha else: if aSha != oSha and bSha != oSha: - merge = True + merge = MERGE_TRIVIAL if aMode == oMode: mode = bMode @@ -207,7 +210,8 @@ def mergeFile(oPath, oSha, oMode, aPath, os.unlink(orig) os.unlink(src1) os.unlink(src2) - + + merge = MERGE_3WAY clean = (code == 0) else: assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode)) @@ -577,14 +581,16 @@ def processRenames(renamesA, renamesB, b updateFile(False, ren1.dstSha, ren1.dstMode, dstName1) updateFile(False, ren2.dstSha, ren2.dstMode, dstName2) else: - print 'Renaming', fmtRename(path, ren1.dstName) [resSha, resMode, clean, merge] = \ mergeFile(ren1.srcName, ren1.srcSha, ren1.srcMode, ren1.dstName, ren1.dstSha, ren1.dstMode, ren2.dstName, ren2.dstSha, ren2.dstMode, branchName1, branchName2) - if merge: + if merge or not clean: + print 'Renaming', fmtRename(path, ren1.dstName) + + if merge == MERGE_3WAY: print 'Auto-merging', ren1.dstName if not clean: @@ -653,14 +659,16 @@ def processRenames(renamesA, renamesB, b tryMerge = True if tryMerge: - print 'Renaming', fmtRename(ren1.srcName, ren1.dstName) [resSha, resMode, clean, merge] = \ mergeFile(ren1.srcName, ren1.srcSha, ren1.srcMode, ren1.dstName, ren1.dstSha, ren1.dstMode, ren1.srcName, srcShaOtherBranch, srcModeOtherBranch, branchName1, branchName2) - if merge: + if merge or not clean: + print 'Renaming', fmtRename(ren1.srcName, ren1.dstName) + + if merge == MERGE_3WAY: print 'Auto-merging', ren1.dstName if not clean: ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] merge-recursive: Only print relevant rename messages 2005-11-07 23:19 ` [PATCH] merge-recursive: Only print relevant rename messages Fredrik Kuivinen @ 2005-11-07 23:54 ` Junio C Hamano 2005-11-09 10:36 ` Fredrik Kuivinen 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-07 23:54 UTC (permalink / raw) To: Fredrik Kuivinen; +Cc: Linus Torvalds, Git Mailing List Fredrik Kuivinen <freku045@student.liu.se> writes: > @@ -178,7 +181,7 @@ def mergeFile(oPath, oSha, oMode, aPath, > sha = bSha > else: > if aSha != oSha and bSha != oSha: > - merge = True > + merge = MERGE_TRIVIAL The rest looks good to me, but are you sure about this part? I have a feeling that the above "and" should be "or", meaning, we check to see if there is _any_ change, and default to TRIVIAL, but later we would find that we need a real merge and then promote it to MERGE_3WAY. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] merge-recursive: Only print relevant rename messages 2005-11-07 23:54 ` Junio C Hamano @ 2005-11-09 10:36 ` Fredrik Kuivinen 0 siblings, 0 replies; 58+ messages in thread From: Fredrik Kuivinen @ 2005-11-09 10:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: Fredrik Kuivinen, Linus Torvalds, Git Mailing List On Mon, Nov 07, 2005 at 03:54:57PM -0800, Junio C Hamano wrote: > Fredrik Kuivinen <freku045@student.liu.se> writes: > > > @@ -178,7 +181,7 @@ def mergeFile(oPath, oSha, oMode, aPath, > > sha = bSha > > else: > > if aSha != oSha and bSha != oSha: > > - merge = True > > + merge = MERGE_TRIVIAL > > The rest looks good to me, but are you sure about this part? I > have a feeling that the above "and" should be "or", meaning, we > check to see if there is _any_ change, and default to TRIVIAL, > but later we would find that we need a real merge and then > promote it to MERGE_3WAY. > You are right. The code actually do the right thing, but it does it by accident. Please apply the following patch. --- merge-recursive: Fix limited output of rename messages The previous code did the right thing, but it did it by accident. Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se> --- git-merge-recursive.py | 12 ++++-------- 1 files changed, 4 insertions(+), 8 deletions(-) applies-to: bb7dd65e1d945edbe0137a761ebc388c7394067a f56613498cd7fb7013f532a04e63b580314ed957 diff --git a/git-merge-recursive.py b/git-merge-recursive.py index 9983cd9..3657875 100755 --- a/git-merge-recursive.py +++ b/git-merge-recursive.py @@ -162,13 +162,10 @@ def mergeTrees(head, merge, common, bran # Low level file merging, update and removal # ------------------------------------------ -MERGE_NONE = 0 -MERGE_TRIVIAL = 1 -MERGE_3WAY = 2 def mergeFile(oPath, oSha, oMode, aPath, aSha, aMode, bPath, bSha, bMode, branch1Name, branch2Name): - merge = MERGE_NONE + merge = False clean = True if stat.S_IFMT(aMode) != stat.S_IFMT(bMode): @@ -181,7 +178,7 @@ def mergeFile(oPath, oSha, oMode, aPath, sha = bSha else: if aSha != oSha and bSha != oSha: - merge = MERGE_TRIVIAL + merge = True if aMode == oMode: mode = bMode @@ -211,7 +208,6 @@ def mergeFile(oPath, oSha, oMode, aPath, os.unlink(src1) os.unlink(src2) - merge = MERGE_3WAY clean = (code == 0) else: assert(stat.S_ISLNK(aMode) and stat.S_ISLNK(bMode)) @@ -590,7 +586,7 @@ def processRenames(renamesA, renamesB, b if merge or not clean: print 'Renaming', fmtRename(path, ren1.dstName) - if merge == MERGE_3WAY: + if merge: print 'Auto-merging', ren1.dstName if not clean: @@ -668,7 +664,7 @@ def processRenames(renamesA, renamesB, b if merge or not clean: print 'Renaming', fmtRename(ren1.srcName, ren1.dstName) - if merge == MERGE_3WAY: + if merge: print 'Auto-merging', ren1.dstName if not clean: --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-07 16:48 Comments on recursive merge Linus Torvalds 2005-11-07 16:56 ` Linus Torvalds @ 2005-11-07 22:58 ` Fredrik Kuivinen 2005-11-08 0:13 ` Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Fredrik Kuivinen @ 2005-11-07 22:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, Fredrik Kuivinen, Git Mailing List On Mon, Nov 07, 2005 at 08:48:06AM -0800, Linus Torvalds wrote: > > Guys, > > I just hit my first real rename conflict, and very timidly tried the > "recursive" strategy in the hopes that I wouldn't need to do things by > hand. > > It resolved things beautifully. Good job. I'm glad that it worked. > My only worry is that I don't read python, so I don't really know how it > does what it does, which makes me nervous. Can somebody (Fredrik?) add > some documentation about the merge strategy and how it works. I will write something up. > Considering that the stupid resolve strategy really requires you to know > how git works when rename conflicts happen (things left in unmerged state > are really quite hard to handle by hand unless you know exactly what > you're doing), I'd almost suggest making "recursive" the default. I'm a > bit nervous about it, but knowing how it works would probably put most of > that to rest. It would be great if the recursive strategy could get some more testing. I have tested it on a thousand commits or so in a few kernel repositories and haven't found any bugs, but it could be due to errors in the test setup, testing the wrong repositories or just being lucky. Some real-world testing would be great. - Fredrik ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-07 22:58 ` Comments on recursive merge Fredrik Kuivinen @ 2005-11-08 0:13 ` Junio C Hamano 2005-11-08 0:33 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-08 0:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, Fredrik Kuivinen Fredrik Kuivinen <freku045@student.liu.se> writes: > On Mon, Nov 07, 2005 at 08:48:06AM -0800, Linus Torvalds wrote: >> >> Guys, >> >> I just hit my first real rename conflict, and very timidly tried the >> "recursive" strategy in the hopes that I wouldn't need to do things by >> hand. >> >> It resolved things beautifully. Good job. > > I'm glad that it worked. This is the first time I see you pleased by something in git that was done without very close supervision from you. All the credits for this one goes to Fredrik, of course, but it is a small victory for me as the maintainer as well, and I am very happy about it. >> ..., I'd almost suggest making "recursive" the default. I'm a >> bit nervous about it, but knowing how it works would probably >> put most of that to rest. Another thing to consider is if it is fast enough for everyday trivial merges. In any case, I've been thinking about teaching git-merge to look into .git/config to make it overridable which strategy to use by default. This would eliminate the hardcoded 'resolve for two-head, octopus for more' rule from git-pull. Then we could ship git-merge with the default rule of 'recursive for two-head, octopus for more', and if it turns out to be premature, you can update your config file to use resolve for two-head case while we sort things out. That way, recursive would get wider test coverage, and people who really need a working merge this minute can choose to run resolve in emergency without specifying '-s resolve' on the command line of 'git pull' every time. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 0:13 ` Junio C Hamano @ 2005-11-08 0:33 ` Linus Torvalds 2005-11-08 0:59 ` Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-08 0:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Fredrik Kuivinen On Mon, 7 Nov 2005, Junio C Hamano wrote: > > This is the first time I see you pleased by something in git > that was done without very close supervision from you. That sounds like a backhanded way of saying that I'm micromanagering, picky and difficult to work with ;) > Another thing to consider is if it is fast enough for everyday > trivial merges. Hmm. True. The _really_ trivial in-index case triggers for me pretty often, but I haven't done any statistics. It might be only 50% of the time. Is the recursive thing noticeably slower for the "easy" cases (ie things that the old regular resolve strategy does well)? It's certainly an option to just do what I just did, namely use the default one until it breaks, and then just do "git reset --hard" and re-do the pull with "-s recursive". A bit sad, and it would be good to have coverage on the recursive strategy.. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 0:33 ` Linus Torvalds @ 2005-11-08 0:59 ` Junio C Hamano 2005-11-08 11:58 ` Johannes Schindelin ` (2 subsequent siblings) 3 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-08 0:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, Fredrik Kuivinen Linus Torvalds <torvalds@osdl.org> writes: > On Mon, 7 Nov 2005, Junio C Hamano wrote: >> >> This is the first time I see you pleased by something in git >> that was done without very close supervision from you. > > That sounds like a backhanded way of saying that I'm micromanagering, > picky and difficult to work with ;) Sorry, that is not what I meant to say at all. You used to micromanage, but it was _very_ good for git back then. I admit that only once I found you too picky and difficult to work with while I was fixing a bad premature-free bug in the diffcore-rename code, but overall your attention to detail well paid off. Since I inherited the project, we added quite a lot of stuff, but I was still unsure if we are making good progress, or just stagnating with only small enhancements and obvious fixes. But Fredrik merge turns out to be a spectacular success as you found out, which is a triumph for Fredrik, but at the same time it means I was not doing too bad myself ;-). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 0:33 ` Linus Torvalds 2005-11-08 0:59 ` Junio C Hamano @ 2005-11-08 11:58 ` Johannes Schindelin 2005-11-08 21:02 ` Fredrik Kuivinen 2005-11-08 16:21 ` [RFC/PATCH] Make git-recursive the default strategy for git-pull Junio C Hamano 2005-11-11 22:25 ` Comments on recursive merge Junio C Hamano 3 siblings, 1 reply; 58+ messages in thread From: Johannes Schindelin @ 2005-11-08 11:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git, Fredrik Kuivinen Hi, On Mon, 7 Nov 2005, Linus Torvalds wrote: > Is the recursive thing noticeably slower for the "easy" cases (ie things > that the old regular resolve strategy does well)? IIRC recursive does nothing else than recursively merging the merge-bases (granted, in a clever way). So if there is only one merge-base, the only slow-down would be the startup of python (which is probably worth it, anyway). > It's certainly an option to just do what I just did, namely use the > default one until it breaks, and then just do "git reset --hard" and re-do > the pull with "-s recursive". A bit sad, and it would be good to have > coverage on the recursive strategy.. We already have a fallback list: after really-trivial, try automatic, ..., try resolve. Why not just add recursive? So, if even resolve failed, just try once more, with recursive. Ciao, Dscho ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 11:58 ` Johannes Schindelin @ 2005-11-08 21:02 ` Fredrik Kuivinen 2005-11-08 21:47 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Fredrik Kuivinen @ 2005-11-08 21:02 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, git, Fredrik Kuivinen On Tue, Nov 08, 2005 at 12:58:50PM +0100, Johannes Schindelin wrote: > Hi, > > On Mon, 7 Nov 2005, Linus Torvalds wrote: > > > Is the recursive thing noticeably slower for the "easy" cases (ie things > > that the old regular resolve strategy does well)? > > IIRC recursive does nothing else than recursively merging the merge-bases > (granted, in a clever way). So if there is only one merge-base, the only > slow-down would be the startup of python (which is probably worth it, > anyway). > I haven't done any real measurements but my feeling is that the recursive strategy is at least not very much slower than the resolve strategy. In the single-common-ancestor case I can think of the following things which may make a difference speed wise: * The recursive strategy is written in Python * The code for finding common ancestors is also written in Python and is probably a bit slower than git-merge-base. * git-diff-tree -M --diff-filter=R <common ancestor> <branch> is executed twice, once for each branch. On the positive side the code which corresponds to git-merge-one-file in the git-resolve case is also written in python, we can therefore avoid some forks and execs. > > It's certainly an option to just do what I just did, namely use the > > default one until it breaks, and then just do "git reset --hard" and re-do > > the pull with "-s recursive". A bit sad, and it would be good to have > > coverage on the recursive strategy.. > > We already have a fallback list: after really-trivial, try automatic, ..., > try resolve. Why not just add recursive? So, if even resolve failed, just > try once more, with recursive. > I don't think this is a very good idea for two reasons. The first one is that there are some merge scenarios involving renames which should be conflicts but are cleanly merged by git-resolve. The second reason is that with the fall back list the recursive strategy will only be used in the strange corner cases and will thus not get nearly the same amount of testing it would get if it was the first choice (or directly after the really-trivial merge). - Fredrik ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 21:02 ` Fredrik Kuivinen @ 2005-11-08 21:47 ` Junio C Hamano 2005-11-08 21:52 ` Linus Torvalds 2005-11-08 23:04 ` Comments on recursive merge Johannes Schindelin 2 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-08 21:47 UTC (permalink / raw) To: Fredrik Kuivinen; +Cc: Johannes Schindelin, Linus Torvalds, git Fredrik Kuivinen <freku045@student.liu.se> writes: > The second reason is that with the fall back list the recursive > strategy will only be used in the strange corner cases and will thus > not get nearly the same amount of testing it would get if it was the > first choice (or directly after the really-trivial merge). There are two reasons to avoid git-merge choose from more than one strategy. 1. The whole idea that git-merge implements "goodness" metric is bogus. It does not know what merge strategy is good and that is the reason it punts and has the user choose his preferred strategy. 2. When it is going to loop over more than one strategy, it stashes away the current working tree state, so that the second and subsequent strategies can begin from a clean slate (including local modifications since the current head). If we try only one, there is no such cost involved. I think the patch I sent out last night to change the recursive as the default strategy and make it overridable from the configuration mechanism would be a better way to give people more exposure to the greatness of recursive while protecting them from potential glitches if any. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 21:02 ` Fredrik Kuivinen 2005-11-08 21:47 ` Junio C Hamano @ 2005-11-08 21:52 ` Linus Torvalds 2005-11-08 22:36 ` Fredrik Kuivinen 2005-11-08 23:04 ` Comments on recursive merge Johannes Schindelin 2 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-08 21:52 UTC (permalink / raw) To: Fredrik Kuivinen; +Cc: Johannes Schindelin, Junio C Hamano, git On Tue, 8 Nov 2005, Fredrik Kuivinen wrote: > > * The code for finding common ancestors is also written in Python and > is probably a bit slower than git-merge-base. Btw, what part of git-merge-bases is it that makes it not be practical? Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 21:52 ` Linus Torvalds @ 2005-11-08 22:36 ` Fredrik Kuivinen 2005-11-08 23:05 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Fredrik Kuivinen @ 2005-11-08 22:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Fredrik Kuivinen, Johannes Schindelin, Junio C Hamano, git On Tue, Nov 08, 2005 at 01:52:06PM -0800, Linus Torvalds wrote: > > > On Tue, 8 Nov 2005, Fredrik Kuivinen wrote: > > > > * The code for finding common ancestors is also written in Python and > > is probably a bit slower than git-merge-base. > > Btw, what part of git-merge-bases is it that makes it not be practical? > The problem is in the multiple-common-ancestors case. If we have three common ancestors, A, B and C, we will start with merging A with B. The result is a new 'virtual' commit object (not stored in the object database), lets call it V. We are then going to merge V with C. To do that we need to get the common ancestor(s) of V and C, and as V doesn't exist in the database we can't use git-merge-base. I haven't given it a lot of thought though, it might be possible to use git-merge-base in some way and get the same results as we get now. It would certainly be possible to use git-merge-base in the first iteration and use the python code only when we actually have any 'virtual' commit objects. - Fredrik ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 22:36 ` Fredrik Kuivinen @ 2005-11-08 23:05 ` Linus Torvalds 2005-11-08 23:18 ` Johannes Schindelin 2005-11-09 0:32 ` Petr Baudis 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-08 23:05 UTC (permalink / raw) To: Fredrik Kuivinen; +Cc: Johannes Schindelin, Junio C Hamano, git On Tue, 8 Nov 2005, Fredrik Kuivinen wrote: > > The problem is in the multiple-common-ancestors case. If we have three > common ancestors, A, B and C, we will start with merging A with B. The > result is a new 'virtual' commit object (not stored in the object > database), lets call it V. We are then going to merge V with C. To do > that we need to get the common ancestor(s) of V and C, and as V > doesn't exist in the database we can't use git-merge-base. Hmm. That's really the same as the merge-base of "C _and_ (A _or_ B)", isn't it? So we should be able to do that even without ever seeing the virtual merge. In fact, it really is pretty trivial from a technical standpoint: the "A _or_ B" part is really just inserting both A and B with the same "flags" value (see the "merge_base()" function in merge-base.c). So in general, "merge-base" could trivially be extended to have any number of "OR commits" on either side, as long as there is just one "and". It could also be extended to have multiple "and" cases, but that has actually already been done by "git-show-branch", I think. It's all the same logic, except it uses more than just two bits. So _technically_ it should be easy to do, it would just need some sane command line syntax to specify the grouping. > I haven't given it a lot of thought though, it might be possible to > use git-merge-base in some way and get the same results as we get now. > > It would certainly be possible to use git-merge-base in the first > iteration and use the python code only when we actually have any > 'virtual' commit objects. Just handling the first case specially would be sufficient for 99% of all uses. And if the multi-parent cases are slightly slower, I don't think anybody cares. In fact, we _always_ do the first git-merge-base in git-merge.sh anyway (actually, not git-merge-base, but "git-show-branch --merge-base"). So that's really done already for the "test if it's trivial" case.. Junio, that points out that "git-merge-base" is another program that could just be removed, since it's really supreceded by git-show-branch. Or did I miss something? Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 23:05 ` Linus Torvalds @ 2005-11-08 23:18 ` Johannes Schindelin 2005-11-09 0:18 ` Linus Torvalds 2005-11-09 0:32 ` Petr Baudis 1 sibling, 1 reply; 58+ messages in thread From: Johannes Schindelin @ 2005-11-08 23:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Fredrik Kuivinen, Junio C Hamano, git Hi, On Tue, 8 Nov 2005, Linus Torvalds wrote: > Junio, that points out that "git-merge-base" is another program that could > just be removed, since it's really supreceded by git-show-branch. Or did I > miss something? IIRC, git-show-branch has a limit on the number of refs it can take. Ciao, Dscho ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 23:18 ` Johannes Schindelin @ 2005-11-09 0:18 ` Linus Torvalds 2005-11-09 6:10 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 0:18 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Fredrik Kuivinen, Junio C Hamano, git On Wed, 9 Nov 2005, Johannes Schindelin wrote: > > On Tue, 8 Nov 2005, Linus Torvalds wrote: > > > Junio, that points out that "git-merge-base" is another program that could > > just be removed, since it's really supreceded by git-show-branch. Or did I > > miss something? > > IIRC, git-show-branch has a limit on the number of refs it can take. Well, git-merge-base does too. git-merge-base only takes two refs ;) In general, you need to keep track of one bit per ref, and since we have a 32-bit "flags" word and need a couple of bits for other maintenance info, pretty much anything that figures out common heads will be limited some way. This is only a limit for the "and" logic - the "or" logic (if we implement it) will just share the same status bit for all the refs that are "ored together" and thus has no limits. Oh, and the "and" logic can be extended by running the program multiple times, so it's not a "hard" limit, it's just an issue of convenience. That said, anybody who ever does an octopus of more than just a few heads deserves to be shot, so I don't think the limit should matter. The recursive strategy should only add the "or" kind of refs, and it shouldn't be a problem (apart from just how to describe them). Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 0:18 ` Linus Torvalds @ 2005-11-09 6:10 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 6:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > In general, you need to keep track of one bit per ref, and since we have > a 32-bit "flags" word and need a couple of bits for other maintenance > info, pretty much anything that figures out common heads will be limited > some way. > > This is only a limit for the "and" logic - the "or" logic (if we implement > it) will just share the same status bit for all the refs that are "ored > together" and thus has no limits. > > Oh, and the "and" logic can be extended by running the program multiple > times, so it's not a "hard" limit, it's just an issue of convenience. > > That said, anybody who ever does an octopus of more than just a few heads > deserves to be shot, so I don't think the limit should matter. The > recursive strategy should only add the "or" kind of refs, and it > shouldn't be a problem (apart from just how to describe them). Come to think of it, git-merge-octopus does AND. If I am merging topic branches 1, 2, 3,... N into my master, internally it does an equivalent of merging 1 into master, then 2 into the result, then C into that result,..., and it uses merge-base of all the heads merged so far and the original master to pivot on. And I think this is *wrong*. The merge base of each step when merging head N does not have to be older than merge base of the original master and head N, but currently that is not what it does. I should be ORing them ideally, but even if I do not, I should be able to just use the merge base of head N and original master. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 23:05 ` Linus Torvalds 2005-11-08 23:18 ` Johannes Schindelin @ 2005-11-09 0:32 ` Petr Baudis 2005-11-09 0:51 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Petr Baudis @ 2005-11-09 0:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Fredrik Kuivinen, Johannes Schindelin, Junio C Hamano, git Dear diary, on Wed, Nov 09, 2005 at 12:05:43AM CET, I got a letter where Linus Torvalds <torvalds@osdl.org> said that... > Junio, that points out that "git-merge-base" is another program that could > just be removed, since it's really supreceded by git-show-branch. Or did I > miss something? Wow, I didn't know git-show-branch could do that (even though it's a bit unnatural to expect this from command named this way; then again, there's git-rev-parse...). BTW, git-show-branch is also by orders of magnitude faster (not that this would be any major timesaver). Median 0.006s vs. median 0.124s. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 0:32 ` Petr Baudis @ 2005-11-09 0:51 ` Linus Torvalds 2005-11-09 0:59 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 0:51 UTC (permalink / raw) To: Petr Baudis; +Cc: Fredrik Kuivinen, Johannes Schindelin, Junio C Hamano, git On Wed, 9 Nov 2005, Petr Baudis wrote: > > BTW, git-show-branch is also by orders of magnitude faster (not that > this would be any major timesaver). Median 0.006s vs. median 0.124s. Ouch. That makes me suspicious. One reason git-merge-base is slow is because it's being pretty careful about some pathological examples of dates being just the wrong way around, and it might just be that the reason git-show-branch is faster is because it isn't doing that part right. So yes, git-merge-base does extra work, but it does so because I think it needs to. Junio? You even wrote the comment about the case in git-merge-base, I'm wondering whether it's a bug that we use the fast-and-cheap algorithm in git-show-branch.. Of course, arguably you can first try the fast-and-cheap thing, and if that gives a merge parent that is acceptable, why not? So maybe it's the right thing for the "let's see if this is trivial" case, but I think it might _think_ some cases are trivial that really shouldn't, because they actually have two merge parents. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 0:51 ` Linus Torvalds @ 2005-11-09 0:59 ` Junio C Hamano 2005-11-09 1:22 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 0:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > Junio? You even wrote the comment about the case in git-merge-base, I'm > wondering whether it's a bug that we use the fast-and-cheap algorithm in > git-show-branch.. I did show-branch soon after we worked on those pathlogical merge-base fix, so I would be a bit surprised if I did it without using all the knowledge from that exercise, but I do not remember offhand. The core logic should be simple generalization of two-head merge-base to N heads. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 0:59 ` Junio C Hamano @ 2005-11-09 1:22 ` Linus Torvalds 2005-11-09 1:42 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 1:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, 8 Nov 2005, Junio C Hamano wrote: > > I did show-branch soon after we worked on those pathlogical > merge-base fix, so I would be a bit surprised if I did it > without using all the knowledge from that exercise, but I do not > remember offhand. The core logic should be simple > generalization of two-head merge-base to N heads. Hmm. Look at the "join_revs()" logic, and tell me I'm crazy. It does: struct commit *commit = pop_one_commit(list_p); int still_interesting = !!interesting(*list_p); in that order: it looks whether there are any interesting commits left _after_ it has popped the top-of-stack. Which means that "still_interesting" can go down to zero if we just popped the last interesting thing off the stack. Which seems wrong, because the thing we just popped off the stack could easily itself be interesting (in fact, it should be so, 99% of the time), and can cause other interesting commits to be populated back onto the list. So the "still_interesting" flag seems to be wrongly computed: the way it is computed now, it's meaningless. In contrast, the "merge_base()" thing does while (interesting(list)) { .. } which means that we really will walk the list until there is nothing interesting left. Which is admittedly expensive, but it was how we got rid of the pathological case. But maybe I'm just missing something really subtle. Maybe git-show-branch does some really clever optimization that is valid. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 1:22 ` Linus Torvalds @ 2005-11-09 1:42 ` Junio C Hamano 2005-11-09 10:20 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 1:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > It does: > > struct commit *commit = pop_one_commit(list_p); > int still_interesting = !!interesting(*list_p); > > in that order: it looks whether there are any interesting commits left > _after_ it has popped the top-of-stack. Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh. You are right. The problem is most of the time hidden, because we usually do one extra round (extra usually starts from 0 and we break out after we say "not interesting anymore" and extra < 0). Obviously, I was not thinking clearly. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 1:42 ` Junio C Hamano @ 2005-11-09 10:20 ` Junio C Hamano 2005-11-09 14:59 ` Petr Baudis 2005-11-09 16:30 ` Linus Torvalds 0 siblings, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 10:20 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Junio C Hamano <junkio@cox.net> writes: > Linus Torvalds <torvalds@osdl.org> writes: > >> It does: >> >> struct commit *commit = pop_one_commit(list_p); >> int still_interesting = !!interesting(*list_p); >> >> in that order: it looks whether there are any interesting commits left >> _after_ it has popped the top-of-stack. > > The problem is most of the time hidden,... As you pointed out, still_interesting means "after we are done with this commit, do we still have something interesting to be processed?", and the later "extra < 0" check compensates for this. After I pop the last interesting commit, I still look at its parents and push them back into the list. It seems to be doing the right thing after all. I hate to admit it, but I have been having hard time figuring out how this thing works X-<. In the meantime, I've checked commits from linux-2.6 history that have more than one merge-base candidates. "git-merge-base --all" and "git-show-branch --merge-base" give the same answer to all of them [*1*]. I do not think "git-show-branch --merge-base" can be any more efficient than "git-merge-base --all". It does _more_ things (probably unnecessary things as well). Pasky's number could be just an artifact of hot/cold cache difference. [Footnote] *1* Here are the commits I used from linux-2.6 repository that have more than one commits: ba9b543d5bec0a7605952e2ba501fb8b0f3b6407 84ffa747520edd4556b136bdfc9df9eb1673ce12 da28c12089dfcfb8695b6b555cdb8e03dda2b690 3190186362466658f01b2e354e639378ce07e1a9 0c168775709faa74c1b87f1e61046e0c51ade7f3 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 467ca22d3371f132ee225a5591a1ed0cd518cb3d ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 10:20 ` Junio C Hamano @ 2005-11-09 14:59 ` Petr Baudis 2005-11-09 16:30 ` Linus Torvalds 1 sibling, 0 replies; 58+ messages in thread From: Petr Baudis @ 2005-11-09 14:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, git Dear diary, on Wed, Nov 09, 2005 at 11:20:22AM CET, I got a letter where Junio C Hamano <junkio@cox.net> said that... > I do not think "git-show-branch --merge-base" can be any more > efficient than "git-merge-base --all". It does _more_ things > (probably unnecessary things as well). Pasky's number could be > just an artifact of hot/cold cache difference. Certainly not that. But I've fetched in the meantime and now show-branch takes much longer - median 0.078s (git-merge-base's median still stays around 0.128s). So possibly git-show-branch did some smart optimization right away in the previous case. I can try to track down the particular commits if there's any interest. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 10:20 ` Junio C Hamano 2005-11-09 14:59 ` Petr Baudis @ 2005-11-09 16:30 ` Linus Torvalds 2005-11-09 20:13 ` Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 16:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, 9 Nov 2005, Junio C Hamano wrote: > > As you pointed out, still_interesting means "after we are done > with this commit, do we still have something interesting to be > processed?", and the later "extra < 0" check compensates for > this. After I pop the last interesting commit, I still look at > its parents and push them back into the list. That "extra" check only helps once. If we ever hit the "extra--", it's gone. In other words, follow this: - we start out with "extra = 0" (default value) - we've got one "interesting" commit left, and we just popped it. - we now have "still_interesting = 0" - the commit has just one parent, and it's not something we've seen before, so we add it to the seen list and decrement "extra", which is now -1. We then insert it back to the list. - we go back up, pop the thing we just got, and now there are again no interesting commits on the list any more, so "still_interesting = 0". - now "extra" is -1, and we break out of the loop without ever percolating the flags of this commit to its parents. No? > It seems to be doing the right thing after all. I hate to admit it, but > I have been having hard time figuring out how this thing works X-<. In > the meantime, I've checked commits from linux-2.6 history that have more > than one merge-base candidates. I'm not very impressed by "it works for the seven cases I tried". It's entirely possible that there _is_ some reason it always works, but if so, I'd like to understand it. More likely, it works in _practice_ because the only way to trigger anything else is likely such a perverse commit history that you'd never see it, but hey.. Also, I don't think this has necessarily anything to do with "multiple merge bases". As far as I can tell, we can find a potential "merge base" that starts the culling of uniniteresting things, but some other branch (that we haven't followed yet - perhaps the one we just broke out of early) may end up causing an _earlier_ commit to turn out to also be a merge-base, and the merge-base we found originally turns out to be a parent of the new one, and thus totally uninteresting. See what I'm saying? Even with just _one_ well-defined merge base, we might hit it. It so happens that because we traverse the commit history in date order, we almost never (but the keyword here is _almost_) hit the case where a child of a commit ends up being parsed _after_ the commit that is its parent. That only happens when there are non-synchronized clocks etc, and there are very few cases of that in the kernel tree. Just to see how rare that is, do this: git-rev-list --pretty=raw HEAD | grep '^committer' | cut -d'>' -f2 | cut -d' ' -f2 > date-list which basically generates the list of dates of commits in the kernel tree, sorted in the natural order that we always traverse the commits in. Now, do sort -nr date-list | diff -u date-list - to see how often the dates are off. I'm seeing only _three_ commits that have time-warps (ie they were "earlier" than one of their parents). Out of 13,000+. So walking things in date order _almost_ always does the right thing just by mistake (well, it's not "mistake", of course. It's by design: it's the closest we can get to a nice balanced walk. But the point is that it's still just a heuristic, not something we can absolutely depend on). And THAT was the reason for the problem with the original git-merge-base algorithm. Not multiple merge-bases (which was admittedly another problem), but the fact that it didn't give the right merge-base at all due to time warps. (Again - it may be that there's something in show-branch that makes the optimization valid, but I just don't understand it). Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 16:30 ` Linus Torvalds @ 2005-11-09 20:13 ` Junio C Hamano 2005-11-09 21:58 ` Linus Torvalds 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 20:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > That "extra" check only helps once. If we ever hit the "extra--", it's > gone. I think you are right here, but while digging into this I found an interesting case. The current show-branch code does the same as merge-base in the pathological example depicted in merge-base.c, but they seem to do different things to this picture (commit grows from bottom to top, time flows alphabetically; find base between G and H). H / \ G A \ |\ / \ | B \ | \ \ \ C F \ \ / \ D / \ | / \| / E "git-merge-base --all" says the merge bases are B and E, while "show-branch --merge-base" mentions only B. In this case the latter is probably the better answer. Actually git-merge-base without --all only mentions E. This is because we give up when we find the list elements are all uninteresting. And this is very expensive to fix (I recall mentioning "horizon effect" last time we worked on this --- around August 12th). G gets bit 1 and H gets bit 2. Here is what happens in each iteration: List A B C D E F G H Result G1 H2 - - - - - - 1 2 H2 E1 B1 - 1 - - 1 - 1 2 F2 E1 B1 A2 2 1 - - 1 2 1 2 E3 B1 A2 2 1 - - 3 2 1 2 E3 B1 A2 2 1 - - 3 2 1 2 E3 C1 A2 2 1 1 - 3 2 1 2 E3 D1 A2 2 1 1 1 3 2 1 2 E3 A2 2 1 1 1 3 2 1 2 E3 B3 2 3 1 1 3 2 1 2 E3 B3 C7 2 3 7 1 3 2 1 2 E3 B3 We popped B with flag 3, and started contaminating the well by reinjecting its parent C with flag 7. That is all good, but "while (interesting(list))" check stops us from going further. Ideally the following two steps would have found out that E is also uninteresting. D7 2 3 7 7 3 2 1 2 E7 2 3 7 7 7 2 1 2 But that is expensive -- we would not know when to stop. A reproduction recipe is attached here, primarily so I do not have to worry about losing it from /var/tmp/. -- >8 -- cut here -- >8 -- #!/bin/sh rm -fr .git && git-init-db T=$(git-write-tree) M=1130000000 Z=+0000 export GIT_COMMITTER_EMAIL=git@comm.iter.xz export GIT_COMMITTER_NAME='C O Mmiter' export GIT_AUTHOR_NAME='A U Thor' export GIT_AUTHOR_EMAIL=git@au.thor.xz doit() { OFFSET=$1; shift NAME=$1; shift PARENTS= for P do PARENTS="${PARENTS}-p $P " done GIT_COMMITTER_DATE="$(($M + $OFFSET)) $Z" GIT_AUTHOR_DATE=$GIT_COMMITTER_DATE export GIT_COMMITTER_DATE GIT_AUTHOR_DATE commit=$(echo $NAME | git-commit-tree $T $PARENTS) echo $commit >.git/refs/tags/$NAME echo $commit } checkit() { echo MB git-merge-base --all "$@" | xargs git-name-rev echo SB git-show-branch --merge-base "$@" | xargs git-name-rev git-show-branch --sha1-name --more=99 "$@" } E=$(doit 5 E) D=$(doit 4 D $E) F=$(doit 6 F $E) C=$(doit 3 C $D) B=$(doit 2 B $C) A=$(doit 1 A $B) G=$(doit 7 G $B $E) H=$(doit 8 H $A $F) checkit $G $H exit ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 20:13 ` Junio C Hamano @ 2005-11-09 21:58 ` Linus Torvalds 2005-11-09 22:56 ` Junio C Hamano 2005-11-11 2:58 ` merge-base: fully contaminate the well Junio C Hamano 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 21:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, 9 Nov 2005, Junio C Hamano wrote: > > The current show-branch code does the same as merge-base in the > pathological example depicted in merge-base.c, but they seem to > do different things to this picture (commit grows from bottom to > top, time flows alphabetically; find base between G and H). > > H > / \ > G A \ > |\ / \ > | B \ > | \ \ > \ C F > \ \ / > \ D / > \ | / > \| / > E > > "git-merge-base --all" says the merge bases are B and E, while > "show-branch --merge-base" mentions only B. In this case the > latter is probably the better answer. I don't agree. Sure, B _may_ be the right answer for a particular merge strategt, but there's no way of knowing. Maybe all the big changes came in through F, and H is the merge that sorted that out, and E actually ends up being the better base. So I think from a correctness standpoint, the only thing that matters is "git-merge-base --all", and anything that doesn't know to return both E and B looks potentially buggy. > Actually git-merge-base without --all only mentions E. Well, we should really consider anything that doesn't take them all into account to be a bug waiting to happen (or rather, a merge waiting for a disaster), but E is the right one, since it's the more recent one). Now, this case obviously depends on history being almost maximally insane (ie pretty much _all_ the dates are wrong). So in practice we probably don't care. So maybe "git-show-branch --merge-base" ends up acceptable as a faster way to do the quick "let's see if we can find _some_ merge-base to do the in-index merge with", but personally I'd much rather always do a "git-merge-base --all", and only do the fast index merge if we only have one potential parent. That way there would never any question about what the "quick merge" does. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 21:58 ` Linus Torvalds @ 2005-11-09 22:56 ` Junio C Hamano 2005-11-09 23:34 ` Linus Torvalds 2005-11-11 2:58 ` merge-base: fully contaminate the well Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-09 22:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: >> >> H >> / \ >> G A \ >> |\ / \ >> | B \ >> | \ \ >> \ C F >> \ \ / >> \ D / >> \ | / >> \| / >> E >> > So I think from a correctness standpoint, the only thing that matters is > "git-merge-base --all", and anything that doesn't know to return both E > and B looks potentially buggy. But the point of well-poisoning you did in merge-base was to detect that E is an ancestor of B and exclude it in the first place. If it matters what F does, it means checking ancestry among B C D E and declare that B is a better ancestor than C, D, E does not help or is sometimes harmful. No question that B is always superiour ancestor than C and D, but arguably the presence of F _might_ change situation for B vs E. I however do not see merge-base trying to take that into account and treat E differently from C and D in any way. Only because F and E had newer timestamp than C and D, we ended up finding E first and did not poison E through B, and that's why you got both B and E. I think it was just an accident. If F were older than B, I suspect the result would have been very different. > Now, this case obviously depends on history being almost maximally insane > (ie pretty much _all_ the dates are wrong). So in practice we probably > don't care. I agree. The above example was to answer my own question in this message: http://marc.theaimsgroup.com/?l=git&m=112382448222823 > ... personally I'd much rather always do a > "git-merge-base --all", and only do the fast index merge if we only have > one potential parent. > > That way there would never any question about what the "quick merge" does. I agree we should try to stay away from "heuristic" and make things safer, but after seeing the above, I'd need a bit more time to convince myself that what 'git-merge-base --all' does is *the* safe approach. Right now, it looks to me that both are heuristic that work most of the time (merge-base --all 99.99999% of the time, show-branch 99% of the time, or something like that). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-09 22:56 ` Junio C Hamano @ 2005-11-09 23:34 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-09 23:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, 9 Nov 2005, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > >> > >> H > >> / \ > >> G A \ > >> |\ / \ > >> | B \ > >> | \ \ > >> \ C F > >> \ \ / > >> \ D / > >> \ | / > >> \| / > >> E > >> > > So I think from a correctness standpoint, the only thing that matters is > > "git-merge-base --all", and anything that doesn't know to return both E > > and B looks potentially buggy. > > But the point of well-poisoning you did in merge-base was to > detect that E is an ancestor of B and exclude it in the first > place. Ahh, you're right, and I'm wrong. That "E" is not a real merge-base, since there _is_ a valid merge-base that is a direct descendant of it and thus objectively better. And as to why git-merge-base returns E in the first place: it really shouldn't, but when it sees B, it can decide that C is uninteresting, and so there are no interesting commits left. So it never continues to walk D and thus never notices that D covers E and E is _also_ uninteresting. So it thinks both E and B are interesting, and since E has a more recent date, it will select that one when only showing one (and then show both when asked to). > I however do not see merge-base trying to take that into account > and treat E differently from C and D in any way. It doesn't. git-merge-base simply walks the chain in as close to date order as it can, and when it decides that the rest of the chain is provably uninteresting, it stops. Which means that sometimes it can stop with too _many_ merge heads, just because it hasn't realized that they are reachable through a chain that is otherwise provably uninteresting. This is because we define "uninteresting" as meaning "cannot reach any more _new_ merge-heads". Which is true. The fact that such a chain could reach some heads we found earlier and mark them as being pointless never enters the picture ;) > I agree we should try to stay away from "heuristic" and make > things safer, but after seeing the above, I'd need a bit more > time to convince myself that what 'git-merge-base --all' does is > *the* safe approach. Well, "git-merge-base --all" will be "safer" in the sense that it's guaranteed to give a superset of the merge-heads (which itself is "safe" in that it flags potentially interesting cases early). Then, the recursive merge strategy could notice (in fact, _will_ notice, if it tries to merge the merge-heads) that the merge of such a pair of merge-heads is one of the heads itself (just a fast-forward), and thus the recursive strategy should correctly have chosen "B" as the merge-head. So yes, "git-merge-base --all" really is safe. Sometimes (under fairly odd circumstances) a bit unnecessarily conservative, but always safe. > Right now, it looks to me that both are heuristic that work most of the > time (merge-base --all 99.99999% of the time, show-branch 99% of the > time, or something like that). The thing is, I don't see what guarantees that the show-branch brhaviour is safe or conservative. It happened to pick B in this case, which was the right choice, but I don't see how that was anything but just luck and happenstance. IOW, I can see that "git-merge-base --all" can return some unnecessary heads, but I can also argue for how that becomes safe and fixes itself. With git-show-branch --merge-base, I don't know what that argument is, because I can't see how it _guarantees_ that it would always pick B over E. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* merge-base: fully contaminate the well. 2005-11-09 21:58 ` Linus Torvalds 2005-11-09 22:56 ` Junio C Hamano @ 2005-11-11 2:58 ` Junio C Hamano 2005-11-11 5:36 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-11 2:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: git The discussion on the list demonstrated a pathological case where an ancestor of a merge-base can be left interesting. This commit introduces a postprocessing phase to fix it. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Linus Torvalds <torvalds@osdl.org> writes: > On Wed, 9 Nov 2005, Junio C Hamano wrote: > >> But the point of well-poisoning you did in merge-base was to >> detect that E is an ancestor of B and exclude it in the first >> place. > > Ahh, you're right, and I'm wrong. That "E" is not a real merge-base, since > there _is_ a valid merge-base that is a direct descendant of it and thus > objectively better. > > Which means that sometimes it can stop with too _many_ merge heads, just > because it hasn't realized that they are reachable through a chain that is > otherwise provably uninteresting. I am not particularly proud of this change, but here is an attempt to fully contaminate the well without going all the way down to root. It adds a postprocessing phase which does not parse any new commits. > The thing is, I don't see what guarantees that the show-branch brhaviour > is safe or conservative. You are right about this; I have a separate patch to fix it. merge-base.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 77 insertions(+), 1 deletions(-) applies-to: 8eacf17303188e55375f76bea8051555ba1baf02 a2fd94f707128f3f390362725c8d8b0802940111 diff --git a/merge-base.c b/merge-base.c index 286bf0e..43a6818 100644 --- a/merge-base.c +++ b/merge-base.c @@ -80,6 +80,45 @@ static struct commit *interesting(struct * Now, list does not have any interesting commit. So we find the newest * commit from the result list that is not marked uninteresting. Which is * commit B. + * + * + * Another pathological example how this thing can fail to mark an ancestor + * of a merge base as UNINTERESTING without the postprocessing phase. + * + * 2 + * H + * 1 / \ + * G A \ + * |\ / \ + * | B \ + * | \ \ + * \ C F + * \ \ / + * \ D / + * \ | / + * \| / + * E + * + * list A B C D E F G H + * G1 H2 - - - - - - 1 2 + * H2 E1 B1 - 1 - - 1 - 1 2 + * F2 E1 B1 A2 2 1 - - 1 2 1 2 + * E3 B1 A2 2 1 - - 3 2 1 2 + * B1 A2 2 1 - - 3 2 1 2 + * C1 A2 2 1 1 - 3 2 1 2 + * D1 A2 2 1 1 1 3 2 1 2 + * A2 2 1 1 1 3 2 1 2 + * B3 2 3 1 1 3 2 1 2 + * C7 2 3 7 1 3 2 1 2 + * + * At this point, unfortunately, everybody in the list is + * uninteresting, so we fail to complete the following two + * steps to fully marking uninteresting commits. + * + * D7 2 3 7 7 3 2 1 2 + * E7 2 3 7 7 7 2 1 2 + * + * and we end up showing E as an interesting merge base. */ static int show_all = 0; @@ -88,6 +127,7 @@ static int merge_base(struct commit *rev { struct commit_list *list = NULL; struct commit_list *result = NULL; + struct commit_list *tmp = NULL; if (rev1 == rev2) { printf("%s\n", sha1_to_hex(rev1->object.sha1)); @@ -104,9 +144,10 @@ static int merge_base(struct commit *rev while (interesting(list)) { struct commit *commit = list->item; - struct commit_list *tmp = list, *parents; + struct commit_list *parents; int flags = commit->object.flags & 7; + tmp = list; list = list->next; free(tmp); if (flags == 3) { @@ -130,6 +171,41 @@ static int merge_base(struct commit *rev if (!result) return 1; + /* + * Postprocess to fully contaminate the well. + */ + for (tmp = result; tmp; tmp = tmp->next) { + struct commit *c = tmp->item; + /* Reinject uninteresting ones to list, + * so we can scan their parents. + */ + if (c->object.flags & UNINTERESTING) + commit_list_insert(c, &list); + } + while (list) { + struct commit *c = list->item; + struct commit_list *parents; + + tmp = list; + list = list->next; + free(tmp); + + /* Anything taken out of the list is uninteresting, so + * mark all its parents uninteresting. We do not + * parse new ones (we already parsed all the relevant + * ones). + */ + parents = c->parents; + while (parents) { + struct commit *p = parents->item; + parents = parents->next; + if (!(p->object.flags & UNINTERESTING)) { + p->object.flags |= UNINTERESTING; + commit_list_insert(p, &list); + } + } + } + while (result) { struct commit *commit = result->item; result = result->next; --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: merge-base: fully contaminate the well. 2005-11-11 2:58 ` merge-base: fully contaminate the well Junio C Hamano @ 2005-11-11 5:36 ` Linus Torvalds 2005-11-11 6:04 ` Junio C Hamano 2005-11-11 8:28 ` Junio C Hamano 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-11 5:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 10 Nov 2005, Junio C Hamano wrote: > > The discussion on the list demonstrated a pathological case where > an ancestor of a merge-base can be left interesting. This commit > introduces a postprocessing phase to fix it. Hmm. I'd suggest only doing this for the (relatively unlikely) case of there being more than one merge-base.. You don't even have to be exact about it: just see if the result list is bigger than one. (ie "result->next != NULL"). Yeah, sometimes there are result commits that get turned UNINTERESTING after they are added to the result list, and your "contaminate" phase would be strictly not needed then, but that's already a pretty unusual case. So the cheap test is to just say /* Do we have multiple results? */ if (result->next) contaminate_well(result); no? Btw, I don't think your contamination logic is necessarily complete. We may not even have parsed some of the commits that end up being on that strange corner case. I think you catch the particular case you tried, but I think that in theory, with long chains of commits out of date order, you could be in the situation of having determined that everything was uninteresting before you even parsed enough to see the chain from one merge-base to another. In fact, I think it would happen with your pathological example if it had just one more commit out-of-order in the E-D-C-B chain. But I didn't walk it through. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: merge-base: fully contaminate the well. 2005-11-11 5:36 ` Linus Torvalds @ 2005-11-11 6:04 ` Junio C Hamano 2005-11-11 16:18 ` Linus Torvalds 2005-11-11 8:28 ` Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-11 6:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > So the cheap test is to just say > > /* Do we have multiple results? */ > if (result->next) > contaminate_well(result); > > no? Correct. And this is only for really artificial corner case so we should try to avoid for normal cases as much as possible, cheaply. > Btw, I don't think your contamination logic is necessarily complete. We > may not even have parsed some of the commits that end up being on that > strange corner case. I haven't tried walking any other test cases, but wouldn't that be arguing that the our assumption that the current merge-base is at least complete if not optimum? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: merge-base: fully contaminate the well. 2005-11-11 6:04 ` Junio C Hamano @ 2005-11-11 16:18 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2005-11-11 16:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 10 Nov 2005, Junio C Hamano wrote: > > > Btw, I don't think your contamination logic is necessarily complete. We > > may not even have parsed some of the commits that end up being on that > > strange corner case. > > I haven't tried walking any other test cases, but wouldn't that > be arguing that the our assumption that the current merge-base > is at least complete if not optimum? Oh yes, it's always complete, even though it may not be optimal. And it's going to be optimal in all realistic cases. And even in the unrealistic cases if we end up returning a commit that is actually reachable by another one (through at least two levels of other commits that were in the wrong date-order with the _other_ ways of reaching that commit), the recursive strategy of merging the merge-bases will always end up boiling it doing to the optimal thing. So I think this is absolutely 100% correct. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: merge-base: fully contaminate the well. 2005-11-11 5:36 ` Linus Torvalds 2005-11-11 6:04 ` Junio C Hamano @ 2005-11-11 8:28 ` Junio C Hamano 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-11 8:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > Btw, I don't think your contamination logic is necessarily complete. We > may not even have parsed some of the commits that end up being on that > strange corner case.... You are right. And the situation seems really bad. The full-contaminator is not full at all, and fails miserably in not so pathlogical case. If we have something like this: 1 2 List A B C D E F G F E F1 E2 - - - - 2 1 - |\ /| G1 E2 D1 C1 - - 1 1 2 1 1 \ \ / | E2 D1 C1 - - 1 1 2 1 1 |\ D /| G3 D3 C3 - - 3 3 2 1 3 | \ | / | D3 C3 - - 3 3 2 1 3 | C | C7 - - 7 3 2 1 3 | | | | B | | | / \ A / \ | / G we would end up finding D and G and stop there, without ever seeing A or B. B _might_ be touched when we look at C at the last round, but there is no way for us to find G is reachable from D (or C) without parsing more than what we parsed in the main loop. The worst part of this is that you can indefinitely extend C-B-A chain trivially, and all it takes is the one, initial commit G, that has a screwed-up timestamp. All the other commits in this example are in the right time order. Very sad. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 21:02 ` Fredrik Kuivinen 2005-11-08 21:47 ` Junio C Hamano 2005-11-08 21:52 ` Linus Torvalds @ 2005-11-08 23:04 ` Johannes Schindelin 2 siblings, 0 replies; 58+ messages in thread From: Johannes Schindelin @ 2005-11-08 23:04 UTC (permalink / raw) To: Fredrik Kuivinen; +Cc: Linus Torvalds, Junio C Hamano, git Hi, On Tue, 8 Nov 2005, Fredrik Kuivinen wrote: > On Tue, Nov 08, 2005 at 12:58:50PM +0100, Johannes Schindelin wrote: > > > > We already have a fallback list: after really-trivial, try automatic, ..., > > try resolve. Why not just add recursive? So, if even resolve failed, just > > try once more, with recursive. > > > > I don't think this is a very good idea for two reasons. The first one > is that there are some merge scenarios involving renames which should > be conflicts but are cleanly merged by git-resolve. > > The second reason is that with the fall back list the recursive > strategy will only be used in the strange corner cases and will thus > not get nearly the same amount of testing it would get if it was the > first choice (or directly after the really-trivial merge). Two very valid points. You convinced me. Ciao, Dscho ^ permalink raw reply [flat|nested] 58+ messages in thread
* [RFC/PATCH] Make git-recursive the default strategy for git-pull. 2005-11-08 0:33 ` Linus Torvalds 2005-11-08 0:59 ` Junio C Hamano 2005-11-08 11:58 ` Johannes Schindelin @ 2005-11-08 16:21 ` Junio C Hamano 2005-11-11 22:25 ` Comments on recursive merge Junio C Hamano 3 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-08 16:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: git This does two things: - It changes the hardcoded default merge strategy for two-head git-pull from resolve to recursive. - .git/config file acquires two configuration items. pull.twohead names the strategy for two-head case, and pull.octopus names the strategy for octopus merge. IOW you are paranoid, you can have the following lines in your .git/config file and keep using git-merge-resolve when pulling one remote: [pull] twohead = resolve OTOH, you can say this: [pull] twohead = resolve twohead = recursive to try quicker resolve first, and when it fails, fall back to recursive. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Linus Torvalds <torvalds@osdl.org> writes: > Hmm. True. The _really_ trivial in-index case triggers for me pretty > often, but I haven't done any statistics. It might be only 50% of the > time. >... > It's certainly an option to just do what I just did, namely use the > default one until it breaks, and then just do "git reset --hard" and re-do > the pull with "-s recursive". A bit sad, and it would be good to have > coverage on the recursive strategy.. Hopefully something like this would make people aware of recursive and give it a wider coverage and chance to mature. git-pull.sh | 16 ++++++++++++++-- 1 files changed, 14 insertions(+), 2 deletions(-) applies-to: 75922cf23cc070e2d5220d961a8f645f1bc8bb60 3acc20beaf0df9ce11a1b7aabf8c9dc7507a9b44 diff --git a/git-pull.sh b/git-pull.sh index 2358af6..3b875ad 100755 --- a/git-pull.sh +++ b/git-pull.sh @@ -79,10 +79,22 @@ case "$merge_head" in exit 0 ;; ?*' '?*) - strategy_default_args='-s octopus' + var=`git-var -l | sed -ne 's/^pull\.octopus=/-s /p'` + if test '' = "$var" + then + strategy_default_args='-s octopus' + else + strategy_default_args=$var + fi ;; *) - strategy_default_args='-s resolve' + var=`git-var -l | sed -ne 's/^pull\.twohead=/-s /p'` + if test '' = "$var" + then + strategy_default_args='-s recursive' + else + strategy_default_args=$var + fi ;; esac --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-08 0:33 ` Linus Torvalds ` (2 preceding siblings ...) 2005-11-08 16:21 ` [RFC/PATCH] Make git-recursive the default strategy for git-pull Junio C Hamano @ 2005-11-11 22:25 ` Junio C Hamano 2005-11-11 22:53 ` Linus Torvalds 2005-11-12 6:35 ` Ryan Anderson 3 siblings, 2 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-11 22:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: >> Another thing to consider is if it is fast enough for everyday >> trivial merges. > > Hmm. True. The _really_ trivial in-index case triggers for me pretty > often, but I haven't done any statistics. It might be only 50% of the > time. Just for fun, I randomly picked two heads/master commits from linux-2.6 repository (one was when I happened to have pulled the last time, and the other was when I thought this might be an interesting exercise and pulled again), and fed the commits between the two to a little script that looks at commits and tries to stat what they did (the script ignores renames so they appear as deletes and adds). Here is what the script spitted out: Total commit objects: 3957 Trivial Merges: 72 (1.82%) Merges: 225 (5.69%) Number of paths touched by non-merge commits: average 4.50, median 2, min 2, max 199 Number of merge parents: average 2.00, median 2, min 2, max 2 Number of merge bases: average 1.00, median 1, min 1, max 1 File level merges: average 37.61, median 8, min 0, max 555 Number of changed paths from the first parent: average 379.09, median 66, min 1, max 7553 File level 3-ways: average 1.96, median 1, min 0, max 37 Paths deleted: average 47.56, median 15, min 0, max 554 This counts what happened in individual devleoper's trees, subsystem maintainer trees and your tree, not just what you saw yourself. Some observations. - Trivial Merges count is surprisingly high. About 1/3 of merges are pure in-index merges. - Most of the commits (developer commits, not merges) are small and touches only a couple of paths. - Nobody does octopus ;-). - We did not have multi-base merge case during the period looked at (but the sample count is very low). - merge-one-file was called for only a handful (median 8) files, which is negligibly small compared to the total 17K files in the kernel tree, and fairly small compared to the number of changed paths from the first parent (meaning, read-tree trivial collapsing helped majorly). Among them, the number of paths that needed real file-level 3-way merges were even smaller (avg 1.96). All three of these points together is a fine demonstration that you designed git really right. The samples were between these two commits: commit 6693e74a16ef563960764bd963f1048392135c3c Author: Linus Torvalds <torvalds@g5.osdl.org> Date: Tue Oct 25 20:40:09 2005 -0700 commit 388f7ef720a982f49925e7b4e96f216f208f8c03 Author: Linus Torvalds <torvalds@g5.osdl.org> Date: Fri Nov 11 09:26:39 2005 -0800 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-11 22:25 ` Comments on recursive merge Junio C Hamano @ 2005-11-11 22:53 ` Linus Torvalds 2005-11-12 0:42 ` Junio C Hamano 2005-11-12 6:35 ` Ryan Anderson 1 sibling, 1 reply; 58+ messages in thread From: Linus Torvalds @ 2005-11-11 22:53 UTC (permalink / raw) To: Junio C Hamano, Paul Mackerras; +Cc: Git Mailing List On Fri, 11 Nov 2005, Junio C Hamano wrote: > > Some observations. > > - Trivial Merges count is surprisingly high. About 1/3 of > merges are pure in-index merges. I actually don't think that is surprisingly high, and would actually have expected it to be closer to 50%. On the other hand, the merges that end up being pure fast-forwards aren't counted as merges at all (since they don't show up as commits), so maybe that's what skews my preception of a big percentage of merges as being really trivial. > - Most of the commits (developer commits, not merges) are > small and touches only a couple of paths. This is something where I think the kernel is perhaps unusual, especially for a big project. We really do encourage people to make lots of small and well-defined changes, and the whole flow of development has been geared towards it. > - Nobody does octopus ;-). I do think octopus is really cool, and still think seeing that five-way octopus-merge in gitk in the git history was really really cool. It doesn't look as good any more, btw: do "gitk" on the current git tree, and search for "Octopus merge", and you'll see some of the history lines crossing each other. Paul? But yeah, it's a pretty special thing. I think its coolness factor way outweighs its usefullness factor ;^p > - We did not have multi-base merge case during the period > looked at (but the sample count is very low). Again, this is possibly because the kernel has already had a few years of distributed SCM usage under its belt, and we've tried to not only merge in a timely manner, but also try to keep history reasonably clean and not have a lot of cross-merging back and forth. That cuts down on multi-base possibilities. > - merge-one-file was called for only a handful (median 8) > files, which is negligibly small compared to the total 17K > files in the kernel tree, and fairly small compared to the > number of changed paths from the first parent (meaning, > read-tree trivial collapsing helped majorly). Among them, > the number of paths that needed real file-level 3-way merges > were even smaller (avg 1.96). I definitely think this is true for any big project. Small projects will inevitably have changes that modify large portions of the source base. But with small projects, it doesn't really matter _what_ you do, you can do it fast. Big projects (at least the sane kind) will never have lots of changes that modify a very big percentage of the source-tree. It's just too painful (and I'm not talking from a SCM angle, just from a developer angle). > All three of these points together is a fine demonstration > that you designed git really right. Well, it's self-re-inforcing. It was designed for the kernel usage patterns, so using the kernel to confirm that it's the "right design" is a bit self-serving. Sure, it's a good sign that my mental model of what the usage patters are does actually match reality, but at the same time it might be more interesting to see if other projects that use git end up using it the same way and/or have different statistics. I do expect that the size of the project will impact the statistics a lot. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-11 22:53 ` Linus Torvalds @ 2005-11-12 0:42 ` Junio C Hamano 0 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-12 0:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > I actually don't think that is surprisingly high, and would actually have > expected it to be closer to 50%. > > On the other hand, the merges that end up being pure fast-forwards aren't > counted as merges at all (since they don't show up as commits), so maybe > that's what skews my preception of a big percentage of merges as being > really trivial. That is what I meant by "it does not show just what you saw". It shows the whole community experience by everybody who has commit there. Being _the_ integration point, I imagine that most of the pure fast-forwards you saw were real merges for somebody else, and that merge being fast/safe/convenient counts, not for you but for your subsystem people. >> All three of these points together is a fine demonstration >> that you designed git really right. > > Well, it's self-re-inforcing. It was designed for the kernel > usage patterns, so using the kernel to confirm that it's the > "right design" is a bit self-serving. That is true. My point was that it's not like git was done right only for _you_, sacrificing subsystem people. The sample 4k commits show that the assumption of the usage pattern git is optimized for actually holds (commits being small, merges being mostly trivial) for kernel people other than you. It is yet to be seen if the same assumption holds for other projects. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Comments on recursive merge.. 2005-11-11 22:25 ` Comments on recursive merge Junio C Hamano 2005-11-11 22:53 ` Linus Torvalds @ 2005-11-12 6:35 ` Ryan Anderson 2005-11-12 7:44 ` [PATCH] GIT commit statistics Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Ryan Anderson @ 2005-11-12 6:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, git [-- Attachment #1: Type: text/plain, Size: 875 bytes --] Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > >>>Another thing to consider is if it is fast enough for everyday >>>trivial merges. >> >>Hmm. True. The _really_ trivial in-index case triggers for me pretty >>often, but I haven't done any statistics. It might be only 50% of the >>time. > > > Just for fun, I randomly picked two heads/master commits from > linux-2.6 repository (one was when I happened to have pulled the > last time, and the other was when I thought this might be an > interesting exercise and pulled again), and fed the commits > between the two to a little script that looks at commits and > tries to stat what they did (the script ignores renames so they > appear as deletes and adds). Mind sharing the script? It'be nice to know if these stats are typical, or unusual when you get numbers from a variety of other trees. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 256 bytes --] ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] GIT commit statistics. 2005-11-12 6:35 ` Ryan Anderson @ 2005-11-12 7:44 ` Junio C Hamano 2005-11-12 12:19 ` Martin Langhoff 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-12 7:44 UTC (permalink / raw) To: Ryan Anderson; +Cc: Linus Torvalds, git Ryan Anderson <ryan@michonline.com> writes: > Junio C Hamano wrote: > >> Just for fun, I randomly picked two heads/master commits from >> linux-2.6 repository ... and fed the commits >> between the two to a little script that looks at commits and >> tries to stat what they did (the script ignores renames so they >> appear as deletes and adds). > > Mind sharing the script? > > It'be nice to know if these stats are typical, or unusual when you get > numbers from a variety of other trees. Very unpolished but here they are. I misread the trivial count in my original message. Trivial and Merge are counted separately, so among 3957 commits, merges were 297 (72 trivials and 225 others). -- >8 -- cut here -- >8 -- Subject: [PATCH] GIT commit statistics A set of scripts that read the existing commit history, and show various stats. Sample usage: # Arguments are given to git-rev-list; defaults to ORIG.. # if not given, to retrace what was just pulled. $ ./contrib/jc-git-stat-1.sh v0.99.9g..maint | ./contrib/jc-git-stat-1-log.perl Total commit objects: 43 Trivial Merges: 1 (2.33%) Merges: 1 (2.33%) Number of paths touched by non-merge commits: average 3.00, median 2, min 2, max 18 Number of merge parents: average 2.50, median 3, min 2, max 3 Number of merge bases: average 1.00, median 1, min 1, max 1 File level merges: average 0.50, median 1, min 0, max 1 Number of changed paths from the first parent: average 28.00, median 52, min 4, max 52 File level 3-ways: average 1.00, median 1, min 1, max 1 * "Trivial Merges" are the ones done by read-tree --trivial; * "Merges" are other merges; * "File level merges" are paths not collapsed by read-tree 3-way (i.e. given to merge-one-file); * "File level 3-ways" are paths merge-one-file would have run 'merge'; Signed-off-by: Junio C Hamano <junkio@cox.net> --- contrib/jc-git-stat-1-log.perl | 87 ++++++++++++++++++++++++++++++++++++++++ contrib/jc-git-stat-1-mof.sh | 59 +++++++++++++++++++++++++++ contrib/jc-git-stat-1.sh | 74 ++++++++++++++++++++++++++++++++++ 3 files changed, 220 insertions(+), 0 deletions(-) create mode 100755 contrib/jc-git-stat-1-log.perl create mode 100755 contrib/jc-git-stat-1-mof.sh create mode 100755 contrib/jc-git-stat-1.sh applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a 2cb3da4b260ed82dc379a11d91f55fe774a2ea49 diff --git a/contrib/jc-git-stat-1-log.perl b/contrib/jc-git-stat-1-log.perl new file mode 100755 index 0000000..b70af2b --- /dev/null +++ b/contrib/jc-git-stat-1-log.perl @@ -0,0 +1,87 @@ +#!/usr/bin/perl + +my ($patches, $failures, $merges, $trivials) = (0, 0, 0, 0); +my (@patch_paths, + @parent_counts, + @base_counts, + @merge_counts, + @path_counts, + @res_counts, + @merge_m, + @merge_a, + @merge_d, + @merge_c, + @merge_u); + +sub avg_median { + my ($ary) = shift; + my ($msg) = shift; + my @a = sort { $a <=> $b } @$ary; + my $sum = 0; + for (@a) { $sum += $_ } + return unless (@a && $sum); + my ($avg, $med) = ($sum/@a, $a[(@a/2)]); + my ($min, $max) = ($a[0], $a[$#a]); + printf "%s:\n\taverage %.2f, median %d, min %d, max %d\n", + $msg, $avg, $med, $min, $max; +} + +while (<>) { + next unless (s/^([MCFT]) [0-9a-f]{40} //); + chomp; + my $type = $1; + if ($type eq 'F') { + $failures++; + next; + } + if ($type eq 'C') { + $patches++; + push @patch_paths, $_; + next; + } + if ($type eq 'M') { + $merges++; + } + elsif ($type eq 'T') { + $trivials++; + } + else { + die "?? $type"; + } + s/^(\d+) (\d+) (\d+) (\d+) (\d+) *//; + push @parent_counts, $1; + push @base_counts, $2; + push @merge_counts, $3; + push @path_counts, $4; + push @res_counts, $5; + if ($type eq 'M') { + /M=(\d+) A=(\d+) D=(\d+) C=(\d+) U=(\d+)/ or die; + push @merge_m, $1; + push @merge_a, $2; + push @merge_d, $3; + push @merge_c, $4; + push @merge_u, $5; + } +} + +my $total = ($failures+$patches+$merges+$trivials); +print "Total commit objects: $total\n"; +printf "Trivial Merges: $trivials (%.2f%%)\n", ($trivials * 100.0/$total); +printf "Merges: $merges (%.2f%%)\n", ($merges * 100.0/$total); +if ($failures) { + print "Failures: $failures\n"; +} + +avg_median(\@patch_paths, "Number of paths touched by non-merge commits"); +avg_median(\@parent_counts, "Number of merge parents"); +avg_median(\@base_counts, "Number of merge bases"); +avg_median(\@merge_counts, "File level merges"); +avg_median(\@path_counts, "Number of changed paths from the first parent"); +#avg_median(\@res_counts, ""); +avg_median(\@merge_m, "File level 3-ways"); +avg_median(\@merge_a, "Paths added"); +avg_median(\@merge_d, "Paths deleted"); +avg_median(\@merge_c, "Paths identically added with wrong permission"); +avg_median(\@merge_u, "Paths added differently"); + + diff --git a/contrib/jc-git-stat-1-mof.sh b/contrib/jc-git-stat-1-mof.sh new file mode 100755 index 0000000..2be6d8b --- /dev/null +++ b/contrib/jc-git-stat-1-mof.sh @@ -0,0 +1,59 @@ +#!/bin/sh +# +# Copyright (c) Linus Torvalds, 2005 +# Copyright (c) Junio C Hamano, 2005 +# +# This is modified from the git per-file merge script, called with +# +# $1 - original file SHA1 (or empty) +# $2 - file in branch1 SHA1 (or empty) +# $3 - file in branch2 SHA1 (or empty) +# $4 - pathname in repository +# $5 - orignal file mode (or empty) +# $6 - file in branch1 mode (or empty) +# $7 - file in branch2 mode (or empty) +# +# Handle some trivial cases.. The _really_ trivial cases have +# been handled already by git-read-tree, but that one doesn't +# do any merges that might change the tree layout. + +case "${1:-.}${2:-.}${3:-.}" in +# +# Deleted in both or deleted in one and unchanged in the other +# +"$1.." | "$1.$1" | "$1$1.") + echo D + ;; + +# +# Added in one. +# +".$2." | "..$3" ) + echo A + ;; + +# +# Added in both (check for same permissions). +# +".$3$2") + if [ "$6" != "$7" ]; then + echo C + else + echo A + fi + ;; + +# +# Modified in both, but differently. +# +"$1$2$3") + echo M + ;; + +".$2$3") + echo U + ;; +*) + echo C + ;; +esac diff --git a/contrib/jc-git-stat-1.sh b/contrib/jc-git-stat-1.sh new file mode 100755 index 0000000..b03c69d --- /dev/null +++ b/contrib/jc-git-stat-1.sh @@ -0,0 +1,74 @@ +#!/bin/sh + +MOF=`dirname "$0"`/jc-git-stat-1-mof.sh + +GIT_INDEX_FILE=.tmp-index +export GIT_INDEX_FILE +LF=' +' + +check_merge () { + rm -f $GIT_INDEX_FILE + commit=$1 + shift + + case "$#" in + 2) + MB=$(git-merge-base --all "$@") + ;; + *) + MB=$(git-show-branch --merge-base "$@") + ;; + esac + basecnt=$(echo $MB | wc -l) + + if git-read-tree --trivial -m $MB "$@" 2>/dev/null + then + type=T + pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l) + rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l) + echo "T $commit $# $basecnt 0 $pathcnt $rescnt" + elif git-read-tree -m $MB "$@" 2>/dev/null + then + script='s/^ *\([0-9]*\) *\([A-Z]\)/\2=\1/' + type=M + mergecnt=$(git-ls-files --unmerged | sort -k 4,4 -u | wc -l) + pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l) + + C=0 A=0 M=0 U=0 D=0 + eval `git-merge-index -o "$MOF" -a | + sort | + uniq -c | + sed -e "$script"` + rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l) + echo "M $commit $# $basecnt $mergecnt $pathcnt $rescnt M=$M A=$A D=$D C=$C U=$U" + else + echo "F $commit $# $basecnt" + fi +} + +check_patch () { + pathcnt=$(git-diff-tree --name-status -r "$1" | wc -l) + echo "C $1 $pathcnt" +} + +case "$#" in +0) + set ORIG_HEAD.. ;; +esac + +git-rev-list --parents "$@" | +while read commit parents +do + case "$parents" in + ?*' '?*) + # Merge + check_merge $commit $parents + ;; + *) + # Change + check_patch $commit $parents + ;; + esac +done + --- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 7:44 ` [PATCH] GIT commit statistics Junio C Hamano @ 2005-11-12 12:19 ` Martin Langhoff 2005-11-12 12:53 ` Petr Baudis ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread From: Martin Langhoff @ 2005-11-12 12:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ryan Anderson, Linus Torvalds, git On 11/12/05, Junio C Hamano <junkio@cox.net> wrote: > Ryan Anderson <ryan@michonline.com> writes: > > > Junio C Hamano wrote: > > > >> Just for fun, I randomly picked two heads/master commits from > >> linux-2.6 repository ... and fed the commits > >> between the two to a little script that looks at commits and > >> tries to stat what they did (the script ignores renames so they > >> appear as deletes and adds). Related to this, I've been wondering whether it'd be possible to teach git to rebase local patches, even if that means rewriting local history. When you are dealing with team shared repo, the sequences of pull/push end up being quite messy, full of little meaningless merges. Similarly, when dealing with an upstream, my tree gets slowly out of sync and slightly messy. Eventually I get a new checkout, and rebase any pending patches with git-format-patch and git-am. The same process would be much easier if I could just cg-update from the repo and get it to try and actually rebase my local commits -- rewriting history as if I had committed them after the update. Of course, it'd be cheating... but we cheat all the time anyway, we only sweat harder at it ;-) cheers, martin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 12:19 ` Martin Langhoff @ 2005-11-12 12:53 ` Petr Baudis 2005-11-15 10:04 ` Catalin Marinas 2005-11-12 19:04 ` Johannes Schindelin ` (2 subsequent siblings) 3 siblings, 1 reply; 58+ messages in thread From: Petr Baudis @ 2005-11-12 12:53 UTC (permalink / raw) To: Martin Langhoff Cc: Junio C Hamano, Ryan Anderson, Linus Torvalds, git, catalin.marinas Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter where Martin Langhoff <martin.langhoff@gmail.com> said that... > The same process would be much easier if I could just cg-update from > the repo and get it to try and actually rebase my local commits -- > rewriting history as if I had committed them after the update. Of > course, it'd be cheating... but we cheat all the time anyway, we only > sweat harder at it ;-) I'm a bit reluctant about this functionality available in cg-update, but then people will start to want commit stack and stuff, while they should be just already long using StGIT for tracking their patches. Actually, I wanted to also implement e-mail functionality to cg-mkpatch, but I'm not sure now - perhaps people wanting that should really just use StGIT. Cogito or GIT core is not very suitable for keeping your patches against someone else's tree if he is not going to GIT-merge with you, exactly because it's not really very convenient to update your patches. On the same note, I would like StGIT to drop functionality not really belonging to patch stack manager (stg add, stg rm, stg status, ...) so that its commandset gets smaller and more focused - but before I would suggest dropping stg status, cg-status must be able to do conflicts tracking, so I will dedicate another mail to this sometime in the future, with a more detailed proposal. So, is there any reason why you want this in GIT/Cogito and don't want to use StGIT? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 12:53 ` Petr Baudis @ 2005-11-15 10:04 ` Catalin Marinas 2005-11-15 15:29 ` Chuck Lever 0 siblings, 1 reply; 58+ messages in thread From: Catalin Marinas @ 2005-11-15 10:04 UTC (permalink / raw) To: Petr Baudis Cc: Martin Langhoff, Junio C Hamano, Ryan Anderson, Linus Torvalds, git On 12/11/05, Petr Baudis <pasky@suse.cz> wrote: > On the same note, I would like StGIT to drop functionality not really > belonging to patch stack manager (stg add, stg rm, stg status, ...) so > that its commandset gets smaller and more focused This was the case with the first StGIT implementations but I slowly began to want to only use StGIT and not switch to something else for trivial SCM operations. I eventually added 'stg commit' which stores the patches permanently into the base of the stack to enable some kind of maintainer mode for StGIT. My main use for this was to import patches directly into the main branch and not keep a separate one and pull between them. > - but before I would > suggest dropping stg status, cg-status must be able to do conflicts > tracking, so I will dedicate another mail to this sometime in the > future, with a more detailed proposal. The gitmergeonefile.py script in StGIT adds every conflict to the .git/conflicts file which is read by 'stg status'. My goal is not to leave any unmerged entried in the index even if there are conflicts. Maybe this could be changed and .git/conflicts file avoided entirely. Anyway, while I'll try not to add more SCM functionality to StGIT, I don't think I should remove the existing add/rm/status functionality. It's just handy not to use a different command when you want a new file added to a patch. -- Catalin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-15 10:04 ` Catalin Marinas @ 2005-11-15 15:29 ` Chuck Lever 0 siblings, 0 replies; 58+ messages in thread From: Chuck Lever @ 2005-11-15 15:29 UTC (permalink / raw) To: Petr Baudis Cc: Catalin Marinas, Martin Langhoff, Junio C Hamano, Ryan Anderson, Linus Torvalds, git [-- Attachment #1: Type: text/plain, Size: 1324 bytes --] Catalin Marinas wrote: > On 12/11/05, Petr Baudis <pasky@suse.cz> wrote: > >>On the same note, I would like StGIT to drop functionality not really >>belonging to patch stack manager (stg add, stg rm, stg status, ...) so >>that its commandset gets smaller and more focused > > > This was the case with the first StGIT implementations but I slowly > began to want to only use StGIT and not switch to something else for > trivial SCM operations. I eventually added 'stg commit' which stores > the patches permanently into the base of the stack to enable some kind > of maintainer mode for StGIT. My main use for this was to import > patches directly into the main branch and not keep a separate one and > pull between them. ... > Anyway, while I'll try not to add more SCM functionality to StGIT, I > don't think I should remove the existing add/rm/status functionality. > It's just handy not to use a different command when you want a new > file added to a patch. petr, currently it isn't recommended to use StGIT with other porcelains. so either: make it completely safe to use StGIT with other porcelains, or add a minimal amount of SCM-like functionality so users don't miss it and try to use other porcelains and trash their repositories. overall i agree with catalin-- it's just easier to use a single tool. [-- Attachment #2: cel.vcf --] [-- Type: text/x-vcard, Size: 439 bytes --] begin:vcard fn:Chuck Lever n:Lever;Charles org:Network Appliance, Incorporated;Linux NFS Client Development adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA email;internet:cel@citi.umich.edu title:Member of Technical Staff tel;work:+1 734 763-4415 tel;fax:+1 734 763 4434 tel;home:+1 734 668-1089 x-mozilla-html:FALSE url:http://www.monkey.org/~cel/ version:2.1 end:vcard ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 12:19 ` Martin Langhoff 2005-11-12 12:53 ` Petr Baudis @ 2005-11-12 19:04 ` Johannes Schindelin 2005-11-13 10:59 ` Junio C Hamano 2005-11-13 11:11 ` Petr Baudis 3 siblings, 0 replies; 58+ messages in thread From: Johannes Schindelin @ 2005-11-12 19:04 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Hi, On Sun, 13 Nov 2005, Martin Langhoff wrote: > [...] I've been wondering whether it'd be possible to teach git to > rebase local patches, even if that means rewriting local history. When > you are dealing with team shared repo, the sequences of pull/push end up > being quite messy, full of little meaningless merges. Similarly, when > dealing with an upstream, my tree gets slowly out of sync and slightly > messy. Eventually I get a new checkout, and rebase any pending patches > with git-format-patch and git-am. I thought about this problem for a while. You described a typical use case (in a small team with one central repo), where pushes are done only from a cleaned up branch, but the work branches stay. If the tree corresponding to the central HEAD matches the tree corresponding of your private merge branch at a given stage, a simple graft should be your solution. Example: You pull and push from/to origin on a central server. Your (dirty) work branch is master. Now, at a given time, origin^{tree}==master~15^{tree}. You could then generate a graft which tells git that all parents of origin are also parents of master~15. If at a given stage, origin^{tree}==master^{tree}, that graft would make your next merge a fast forward, effectively cleaning up your branch without loosing your history. The graft could be generated by a simple script. Would this help you? Ciao, Dscho ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 12:19 ` Martin Langhoff 2005-11-12 12:53 ` Petr Baudis 2005-11-12 19:04 ` Johannes Schindelin @ 2005-11-13 10:59 ` Junio C Hamano 2005-11-13 20:42 ` Martin Langhoff 2005-11-13 11:11 ` Petr Baudis 3 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-13 10:59 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin Langhoff <martin.langhoff@gmail.com> writes: > Similarly, when dealing with an upstream, my tree gets slowly out of > sync and slightly messy. Eventually I get a new checkout, and rebase > any pending patches with git-format-patch and git-am. The key is not to let your tree go "slowly" out of sync, from my experience. When Linus was the maintainer, I used to do the equivalent of the following all the time to keep up with his tree while keeping my history clean [*1*]. Frequently [*2*], I tried to see if Linus made something new and interesting. My "origin" branch was always copy of Linus head. $ git fetch origin The command would say "Fast forward". So what did he do? $ git show-branch master origin ! [origin] Separate LDFLAGS and CFLAGS. * [master] Rename lost+found to lost-found. -- + [master] Rename lost+found to lost-found. + [master^] Fix compilation warnings in pack-redundant.c + [master~2] Debian: build-depend on libexpat-dev. + [origin] Separate LDFLAGS and CFLAGS. ++ [master~3] Split gitk into seperate RPM package Ah, the commit master~3 was what he had the last time I pulled from him, and since then he made a commit while I did three. I could do "git pull . origin" at this point, but that would result in a useless mini-merge. My tree is not public so I can freely rebase to clean things up. $ git rebase origin $ git show-branch ! [origin] Separate LDFLAGS and CFLAGS. * [master] Rename lost+found to lost-found. -- + [master] Rename lost+found to lost-found. + [master^] Fix compilation warnings in pack-redundant.c + [master~2] Debian: build-depend on libexpat-dev. ++ [origin] Separate LDFLAGS and CFLAGS. Now I am fast-forward, so I could ask him to pull from me [*3*]. I think each of your developers can do the same, treating the "project shared repository" as "Linus repository" and pull that into the "origin" branch, and when the "master" is ready, push it back into the shared repository (which is equivalent of Linus pulling everything from me while doing nothing else in his repository). For a sizable change that deserves a topic branch with a long sequence of commits, rebasing is not always the optimum solution; and you may want to keep the full merge history of such a branch pushed into the public repository as is. But for simpler cases that 'git rebase' can handle easily without conflicts, the above procedure would help you keeping the history of your shared repository less cluttered. [Footnotes] *1* Back then we did not have multi-head fetch, show-branch nor rebase, so I did these using a homebrew Porcelain. *2* Unlike CVS which always mucks with the working tree, 'git fetch' into a branch that is not current one is an operation and can be done even when I am in the middle of a heavy hackery. Being able to peek into what others are up even when your tree is in a messy state (the fetch is often followed by log and diff) helps you to avoid doing duplicated work or going in a wrong direction, which was great. *3* Even back then almost all changes were fed via e-mail to the maintainer. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-13 10:59 ` Junio C Hamano @ 2005-11-13 20:42 ` Martin Langhoff 2005-11-14 3:33 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Martin Langhoff @ 2005-11-13 20:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 11/13/05, Junio C Hamano <junkio@cox.net> wrote: > Ah, the commit master~3 was what he had the last time I pulled > from him, and since then he made a commit while I did three. I > could do "git pull . origin" at this point, but that would > result in a useless mini-merge. My tree is not public so I can > freely rebase to clean things up. > > $ git rebase origin > $ git show-branch What happens if there are conflicts during git-rebase? I'm thinking of adding an '-r' option to cg-update that will rebase instead of merging, if the rebase is clean. Is there a cheap way to ask from a shell script whether the merge is truly trivial? I thought git-diff-tree would help me here, but it doesn't... martin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-13 20:42 ` Martin Langhoff @ 2005-11-14 3:33 ` Junio C Hamano 2005-11-14 4:01 ` Martin Langhoff 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-14 3:33 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin Langhoff <martin.langhoff@gmail.com> writes: > On 11/13/05, Junio C Hamano <junkio@cox.net> wrote: >> .... I >> could do "git pull . origin" at this point, but that would >> result in a useless mini-merge. My tree is not public so I can >> freely rebase to clean things up. >> >> $ git rebase origin >> $ git show-branch > > What happens if there are conflicts during git-rebase? Well, obviously you could resolve them ;-). But if you are rebasing just to reduce trivial mini-merges, it might make more sense to honestly record the merge if the rebase involves conflict resolution. After all, the reason rebase got conflicts is because the development trail by somebody else that has been already committed to the shared "master" branch overlapped what you were doing in your "master" branch, isn't it? In your message you indicated that you use "format-patch" piped to "am". I think that is a better approach than "rebase" these days; the conflict can be handled easier with that approach, and if you use "--3way" flag you do not even have to worry about patches in your branch that is already there in the shared "master" (your "origin") branch. So instead of running "git rebase origin" at this point, I may do something like this [*1*]: $ git-reset --hard origin $ git-format-patch -k --stdout origin ORIG_HEAD | git am -3 -k The first step rewinds my "master" (the original is stored in ORIG_HEAD), and the second step extracts the commits that were in my master but not in origin in a patch form, an replay them on top of the "master" (which was rewound to "origin"). "git-am" would stop at the first unapplicable patch if there is a conflict, leaving the conflicting patch in .dotest/patch. I have to fix it up before going further. Here is how. 1. "git am" 3-way fallback would have kicked in, because I have all the blobs the patch is supposed to apply to, and my working tree and index is in a state just like when I am resolving a conflicting merge after a pull. Clean up the conflict in the working tree, build-test and all as usual. 2. Run "git diff HEAD >.dotest/patch" to record what the patch should have been if it were to apply cleanly on top of the previous state. If I did a noteworthy adjustment to the patch, I might also edit .dotest/final-commit to update the commit log message. 3. Then reset the working tree and index before the failed application of this patch with "git reset --hard". After that: $ git am -3 would let me restart from that commit that did not replay well. > Is there a cheap way to ask from a shell script whether the merge is > truly trivial? I thought git-diff-tree would help me here, but it > doesn't... This was recently added by Linus to help git-merge do that: git-read-tree --trivial -m -u $O $A $B The command exits with a non-zero status, without touching index nor working tree, when the merge is not "truly trivial". Otherwise it does its thing -- the trivial in-index merge is done, files in working tree updated and the only thing left for you to do is to create a commit having parent $A and $B. Would that help? [Footnote] *1* This is what the "make rebase restartable" comment in TODO list is about, and I wanted to rewrite "rebase" to do exactly these two commands, but I got distracted ;-). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 3:33 ` Junio C Hamano @ 2005-11-14 4:01 ` Martin Langhoff 2005-11-14 6:06 ` Junio C Hamano 0 siblings, 1 reply; 58+ messages in thread From: Martin Langhoff @ 2005-11-14 4:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 11/14/05, Junio C Hamano <junkio@cox.net> wrote: > In your message you indicated that you use "format-patch" piped > to "am". I think that is a better approach than "rebase" these > days Hmmm. But doesn't deal well with binary changes. We deal with a large set of projects, and while we don't manage that many binary files, it is just enough that I'll have to pass on only using format-patch. > This was recently added by Linus to help git-merge do that: > > git-read-tree --trivial -m -u $O $A $B Cool! Abusing that, perhaps I could teach git-rebase to take a '--trivial-only' flag, and then cg-update --rebase could try git-rebase --trivial-only, and fall back on cg-merge... cheers, martin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 4:01 ` Martin Langhoff @ 2005-11-14 6:06 ` Junio C Hamano 2005-11-14 8:51 ` Martin Langhoff 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2005-11-14 6:06 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin Langhoff <martin.langhoff@gmail.com> writes: > On 11/14/05, Junio C Hamano <junkio@cox.net> wrote: >> In your message you indicated that you use "format-patch" piped >> to "am". I think that is a better approach than "rebase" these >> days > > Hmmm. But doesn't deal well with binary changes. We deal with a large > set of projects, and while we don't manage that many binary files, it > is just enough that I'll have to pass on only using format-patch. It shouldn't be too tricky to enhance "git am" (git-apply called at around line 49 in it) to grok binary differences for this purpose, because you would have both pre- and post-image blob in your object database, because the patch is being used only to replay what you have in your reository, and it records their abbreviated SHA1 name. I've never felt need to "merge" the binary files myself and had never got around doing this, but if you are interested, it would go something like this: . Make "index %.7s..%.7s" that abbreviates pre- and post- image blob SHA1s in diff.c configurable to spit out full 40 bytes. Call that option --full-index-sha1. . The updated "git rebase" that uses the "format-patch | am" I outlined would pass the --full-index-sha1 to format-patch (which is pased onto underlying diff-tree -p). . In apply.c, check if all of the following holds: * we have both the full 40-byte old_sha1_prefix[] and new_sha1_prefix[]; and * what the index records matches old_sha1_prefix[]; and * the new blob is found in the object database; and for such a path: * change parse_chunk() not to barf even on a binary patch. * change apply_data() to just declare the patch application result is the new blob recorded in the patch. Hmm. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 6:06 ` Junio C Hamano @ 2005-11-14 8:51 ` Martin Langhoff 2005-11-14 9:25 ` Petr Baudis ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Martin Langhoff @ 2005-11-14 8:51 UTC (permalink / raw) To: Junio C Hamano, Petr Baudis; +Cc: git On 11/14/05, Junio C Hamano <junkio@cox.net> wrote: > It shouldn't be too tricky to enhance "git am" (git-apply called > at around line 49 in it) to grok binary differences for this > purpose, because you would have both pre- and post-image blob in > your object database, because the patch is being used only to > replay what you have in your reository, and it records their > abbreviated SHA1 name. I'm curious. What would be the advantages of this over git-read-tree -m for use within a single repo? I keep thinking that I need an intra-repo way of doing it (arguably faster and more reliable), instead of git-format-patch|git-am, which is less reliable and slower. OTOH, if this is heading towards teaching git-am how to apply changes to binary files based on known SHA1s, this will give birth to a type of patch that applies only if you have the objects beforehand. Is that enough to get by? Perhaps we need a format to fully describe binary files? > I've never felt need to "merge" the binary files myself and had > never got around doing this, but if you are interested, it would > go something like this: That's quite a bit of C hacking... I'm game for all the Perl and shell scripts in git, but I know better than start learning C in _this_ project with you, Linus and the whole list watching me make the fool ;-) So noone hacks this bit of C you are proposing I'll eventually get something done with cg-update and git-rebase. In the meantime, the user experience of working with a small team and a shared repo has improved significantly by switching from cg-update to cg-fetch && git-rebase origin. So much so that I suspect that it'd be a big win for cg-update to default to rebase on merges from 'origin'. cheers, martin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 8:51 ` Martin Langhoff @ 2005-11-14 9:25 ` Petr Baudis 2005-11-14 21:25 ` Martin Langhoff 2005-11-14 9:27 ` Junio C Hamano 2005-11-15 3:00 ` Junio C Hamano 2 siblings, 1 reply; 58+ messages in thread From: Petr Baudis @ 2005-11-14 9:25 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, git Dear diary, on Mon, Nov 14, 2005 at 09:51:29AM CET, I got a letter where Martin Langhoff <martin.langhoff@gmail.com> said that... > In the meantime, the user experience of working with a small team and > a shared repo has improved significantly by switching from cg-update > to cg-fetch && git-rebase origin. So much so that I suspect that it'd > be a big win for cg-update to default to rebase on merges from > 'origin'. I still don't understand what's the point. And it is confusing for the user, since the history suddenly doesn't reflect how the GIT development happened (which is what the history is about), and it is downright deadly as soon as more than one branch gets around, which makes it even more confusing for the user (you can do cg-update, yeah - oh wait, unless you do X, then you need to remember to always do cg-update -R or whatever). All the woes just to get rid of the merge commits. What's wrong on merge commits? If they irritate you so much, cg-log -M. ;-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 9:25 ` Petr Baudis @ 2005-11-14 21:25 ` Martin Langhoff 0 siblings, 0 replies; 58+ messages in thread From: Martin Langhoff @ 2005-11-14 21:25 UTC (permalink / raw) To: Petr Baudis; +Cc: Junio C Hamano, git On 11/14/05, Petr Baudis <pasky@suse.cz> wrote: > All the woes just to get rid of the merge commits. What's wrong on merge > commits? If they irritate you so much, cg-log -M. ;-) Well, if you have a team of 5 working closely, doing commit/update several times a day, soon the following things happen: - everyone has sightly different histories, even if they all have the same tree - in the repo, there are 'update' merges galore. - as soon as you _actually_ have branches, it's really hard to distinguish branch merges from 'same branch update before commit' merges. by replacing the daily/hourly intra-team, self-branch `cg-update` with `cg-fetch && git-rebase` our history makes much more sense _and_ looking at gitk I can see clearly again the interesting merges (ie: merges with other branches). In practice, it's exactly what happens with git's history -- the merges are actually from rebased patches, that's why the history is "readable". does that make sense? cheers, martin ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 8:51 ` Martin Langhoff 2005-11-14 9:25 ` Petr Baudis @ 2005-11-14 9:27 ` Junio C Hamano 2005-11-15 3:00 ` Junio C Hamano 2 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-14 9:27 UTC (permalink / raw) To: Martin Langhoff; +Cc: Petr Baudis, git Martin Langhoff <martin.langhoff@gmail.com> writes: > I'm curious. What would be the advantages of this over git-read-tree > -m for use within a single repo? On a large tree I had an impression that applying patch and then falling back on 3-way is faster, but other than that, nothing, really. Just being able to use a single mechanism, which does not buy us much. > OTOH, if this is heading towards teaching git-am how to apply changes > to binary files based on known SHA1s, this will give birth to a type > of patch that applies only if you have the objects beforehand. Is that > enough to get by? Perhaps we need a format to fully describe binary > files? I'd rather not see our "patch" go in the direction of recording both pre- and post-image of blob for binary files, which is what we would end up doing if we really want to do binary flexibly. Well, that may be nice as an option, but not by default. An option halfway in between would be to record the pre-image SHA1 and post- blob, perhaps compressed-uuencoded. This would limit us to the case that the recipient has not touched the binary file and replacing it, but in practice that might be enough. I'm willing to do the C hackery myself if there is enough interest in it. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-14 8:51 ` Martin Langhoff 2005-11-14 9:25 ` Petr Baudis 2005-11-14 9:27 ` Junio C Hamano @ 2005-11-15 3:00 ` Junio C Hamano 2 siblings, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2005-11-15 3:00 UTC (permalink / raw) To: Martin Langhoff; +Cc: git Martin Langhoff <martin.langhoff@gmail.com> writes: >> I've never felt need to "merge" the binary files myself and had >> never got around doing this, but if you are interested, it would >> go something like this: > > That's quite a bit of C hacking... I'll be pushing out something along the lines I said in the proposed updates branch tonight. I've even run 1 (one) test ;-). The relevant commits are these: + [pu~5^2] rebase: make it usable for binary files as well. + [pu~5^2^] diff: --full-index + [pu~5^2~2] apply: allow-binary-replacement. + [pu~5^2~3] Rewrite rebase to use git-format-patch piped to git-am. Note that this does not make generated "binary patch" usable across repositories yet. Since our patches are reversible out of principle, making it usable across repositories involves recording both the pre- and post-image of binary blobs in the patch output, which may be a useful option in some cases but not necessary for the immediate application of intra-repo rebasing. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] GIT commit statistics. 2005-11-12 12:19 ` Martin Langhoff ` (2 preceding siblings ...) 2005-11-13 10:59 ` Junio C Hamano @ 2005-11-13 11:11 ` Petr Baudis 3 siblings, 0 replies; 58+ messages in thread From: Petr Baudis @ 2005-11-13 11:11 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, Ryan Anderson, Linus Torvalds, git Dear diary, on Sat, Nov 12, 2005 at 01:19:45PM CET, I got a letter where Martin Langhoff <martin.langhoff@gmail.com> said that... > Similarly, when dealing with an upstream, my tree gets slowly out of > sync and slightly messy. I've been replying only to this, but missed the team shared repo case, where StGIT obviously does not make much sense. > Related to this, I've been wondering whether it'd be possible to teach > git to rebase local patches, even if that means rewriting local > history. When you are dealing with team shared repo, the sequences of > pull/push end up being quite messy, full of little meaningless merges. Well, only pulls make up for merges. But what's wrong with that? This is just what you get when using distributed VCS, and what's the point in skewing the history to look different than how did it happen? I'd just get used to it. :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ VI has two modes: the one in which it beeps and the one in which it doesn't. ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2005-11-15 15:30 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-11-07 16:48 Comments on recursive merge Linus Torvalds 2005-11-07 16:56 ` Linus Torvalds 2005-11-07 23:19 ` [PATCH] merge-recursive: Only print relevant rename messages Fredrik Kuivinen 2005-11-07 23:54 ` Junio C Hamano 2005-11-09 10:36 ` Fredrik Kuivinen 2005-11-07 22:58 ` Comments on recursive merge Fredrik Kuivinen 2005-11-08 0:13 ` Junio C Hamano 2005-11-08 0:33 ` Linus Torvalds 2005-11-08 0:59 ` Junio C Hamano 2005-11-08 11:58 ` Johannes Schindelin 2005-11-08 21:02 ` Fredrik Kuivinen 2005-11-08 21:47 ` Junio C Hamano 2005-11-08 21:52 ` Linus Torvalds 2005-11-08 22:36 ` Fredrik Kuivinen 2005-11-08 23:05 ` Linus Torvalds 2005-11-08 23:18 ` Johannes Schindelin 2005-11-09 0:18 ` Linus Torvalds 2005-11-09 6:10 ` Junio C Hamano 2005-11-09 0:32 ` Petr Baudis 2005-11-09 0:51 ` Linus Torvalds 2005-11-09 0:59 ` Junio C Hamano 2005-11-09 1:22 ` Linus Torvalds 2005-11-09 1:42 ` Junio C Hamano 2005-11-09 10:20 ` Junio C Hamano 2005-11-09 14:59 ` Petr Baudis 2005-11-09 16:30 ` Linus Torvalds 2005-11-09 20:13 ` Junio C Hamano 2005-11-09 21:58 ` Linus Torvalds 2005-11-09 22:56 ` Junio C Hamano 2005-11-09 23:34 ` Linus Torvalds 2005-11-11 2:58 ` merge-base: fully contaminate the well Junio C Hamano 2005-11-11 5:36 ` Linus Torvalds 2005-11-11 6:04 ` Junio C Hamano 2005-11-11 16:18 ` Linus Torvalds 2005-11-11 8:28 ` Junio C Hamano 2005-11-08 23:04 ` Comments on recursive merge Johannes Schindelin 2005-11-08 16:21 ` [RFC/PATCH] Make git-recursive the default strategy for git-pull Junio C Hamano 2005-11-11 22:25 ` Comments on recursive merge Junio C Hamano 2005-11-11 22:53 ` Linus Torvalds 2005-11-12 0:42 ` Junio C Hamano 2005-11-12 6:35 ` Ryan Anderson 2005-11-12 7:44 ` [PATCH] GIT commit statistics Junio C Hamano 2005-11-12 12:19 ` Martin Langhoff 2005-11-12 12:53 ` Petr Baudis 2005-11-15 10:04 ` Catalin Marinas 2005-11-15 15:29 ` Chuck Lever 2005-11-12 19:04 ` Johannes Schindelin 2005-11-13 10:59 ` Junio C Hamano 2005-11-13 20:42 ` Martin Langhoff 2005-11-14 3:33 ` Junio C Hamano 2005-11-14 4:01 ` Martin Langhoff 2005-11-14 6:06 ` Junio C Hamano 2005-11-14 8:51 ` Martin Langhoff 2005-11-14 9:25 ` Petr Baudis 2005-11-14 21:25 ` Martin Langhoff 2005-11-14 9:27 ` Junio C Hamano 2005-11-15 3:00 ` Junio C Hamano 2005-11-13 11:11 ` Petr Baudis
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).