* "git merge" merges too much!
From: Greg A. Woods @ 2009-11-29 3:21 UTC
To: The Git Mailing List

I'm trying to learn to use Git to manage local changes.

I'm very new to Git (hi all!), but not new at all to version tracking
tools in general.

Hope I've found the right list on which to ask potentially naive
questions!  I've been doing _lots_ of reading about Git, but I can't
seem to find anything about the problems I relate below.

One task I'm working on is to try to find the best way to merge changes
made from one branch to another, e.g. to propagate local fixes from one
release to another.

However in at least one simple case "git merge" merges too much.

I have something like this started from a remote-cloned repository where
BL1.2 is a branch from the remote master HEAD, which happens to
correspond to a tag "TR1.2", the release-1.2 tag, and I've made three
local commits to my local BL1.2 branch: A, B, and C:

    BL1.2                                    - A - B - C   <- BL1.2 HEAD
                                            /
    master  1 - 2 - TR1.1 - 3 - 4 - 5 - TR1.2              <- master HEAD

(there are no "release" branches in this project, just tags on the
master branch to represent release points -- is there any way to get
"git log" to show which tags are associated with a given commit and/or
branch?  The real project is freedesktop.org's xinit repo, but the real
tree is too messy to diagram here -- hopefully I've extracted the
essence of the problem correctly)

I now want to create a branch "BL1.1" and merge commits A, B, and C to
it in order to back-port my local fixes to the TR1.1 release.  "TR1.1"
is simply a tag on the origin/master trunk.

I do the following:

    git checkout -b BL1.1 TR1.1
    git merge BL1.2

However this seems to merge all of 3, 4, and 5, as well as A, B, and C.

I think I can (barely) understand why it's doing what it's doing, but
that's not what I want it to do.  However it looks like Git doesn't have
the same idea of a branch "base" point as I think I do.

Running "git log TR1.2..BL1.2" does show me exactly the changes I wish
to propagate, but "git merge TR1.2..BL1.2" says "not something we can
merge".  Sigh.

How can I get it to merge just the changes from the "base" of the BL1.2
branch to its head?

Is using either git-cherry-pick or "git log -p | git-am", the only way
to do this?  Which way best preserves Git's ability to realize if a
change has already been included on the target branch, if any?

Is this the kind of "problem" that drove the creators of Stacked-Git to
invent their tools?

Is there any way to get "git log --graph" (and/or gitk) to show me all
the branch heads, not just the current/specified one?

--
						Greg A. Woods

+1 416 218-0098                VE3TCP          RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>      Secrets of the Weird <woods@weird.com>
* Re: "git merge" merges too much! 2009-11-29 3:21 "git merge" merges too much! Greg A. Woods @ 2009-11-29 5:14 ` Jeff King 2009-11-30 18:12 ` Greg A. Woods 2009-11-29 5:15 ` "git merge" merges too much! Junio C Hamano 1 sibling, 1 reply; 33+ messages in thread From: Jeff King @ 2009-11-29 5:14 UTC (permalink / raw) To: The Git Mailing List On Sat, Nov 28, 2009 at 10:21:25PM -0500, Greg A. Woods wrote: > Hope I've found the right list on which to ask potentially naive > questions! I've been doing _lots_ of reading about Git, but I can't > seem to find anything about the problems I relate below. Yep, you're in the right place. > master branch to represent release points -- is there any way to get > "git log" to show which tags are associated with a given commit and/or Try "git log --decorate". > BL1.2 - A - B - C <- BL1.2 HEAD > / > master 1 - 2 - TR1.1 - 3 - 4 - 5 - TR1.2 <- master HEAD > > [...] > > git checkout -b BL1.1 TR1.1 > git merge BL1.2 > > However this seems to merge all of 3, 4, and 5, as well as A, B, and C. > > I think I can (barely) understand why it's doing what it's doing, but > that's not what I want it to do. However it looks like Git doesn't have > the same idea of a branch "base" point as I think I do. Yes. Git doesn't really view history as branches in the way you are thinking. History is simply a directed graph, and when you merge two nodes in the graph, it takes into account _everything_ that happened to reach those two points since the last time they diverged (which in your case is simply TR1.1, as BL1.2 is a strict superset). There is no way in the history graph to represent "we have these commits, but not this subsequence". You have to create new commits A', B', and C' which introduce more or less the same changes as their counterparts (and they may even be _exactly_ the same except for the parentage, but then again, they may not if the changes they make do not apply in the same way on top of TR1.1). 
> Running "git log TR1.2..BL1.2" does show me exactly the changes I wish > to propagate, but "git merge TR1.2..BL1.2" says "not something we can > merge". Sigh. > > How can I get it to merge just the changes from the "base" of the BL1.2 > branch to its head? > > Is using either git-cherry-pick or "git log -p | git-am", the only way > to do this? Which way best preserves Git's ability to realize if a > change has already been included on the target branch, if any? Yes, you must cherry-pick or use rebase (which is a more featureful version of the pipeline you mentioned). Either way will produce an equivalent set of commits (cherry-pick is useful when you are picking a couple of commits; rebase is useful for rewriting a whole stretch of history. It sounds like you want to do the latter). The resulting commits will have different commit ids, but git generally does a good job at merging such things, because it looks only at the result state and not the intermediate commits. If both sides have made an equivalent change, then there is no conflict. > Is there any way to get "git log --graph" (and/or gitk) to show me all > the branch heads, not just the current/specified one? Try "--all" with either gitk or "git log". Or if you want a subset of heads, just name them. Hope that helps, -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
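
A minimal sketch of the situation discussed above, assuming a throwaway repository whose file names and commit contents are invented purely for illustration. It rebuilds the 1 - 2 - TR1.1 - 3 - 4 - 5 - TR1.2 history with A, B and C on BL1.2, shows that merging BL1.2 into a branch started at TR1.1 brings in six commits, and then copies only TR1.2..BL1.2 with cherry-pick. Passing a range to "git cherry-pick" is an assumption about the Git version in use; releases contemporary with this thread accepted only single commits.

```shell
#!/bin/sh
# Toy reproduction of the history from the first message (hypothetical files).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# commit NAME: record one new file per commit so nothing can conflict
commit() {
    echo "$1" > "file-$1"
    git add "file-$1"
    git commit -q -m "$1"
}

commit 1; commit 2; git tag TR1.1
commit 3; commit 4; commit 5; git tag TR1.2
git checkout -q -b BL1.2 TR1.2
commit A; commit B; commit C

# What was tried: TR1.1 is an ancestor of BL1.2, so the merge takes
# everything between them, i.e. 3, 4, 5 as well as A, B, C.
git checkout -q -b merged TR1.1
git merge -q BL1.2
merged_count=$(git rev-list --count TR1.1..merged)

# What was wanted: new commits on top of TR1.1 carrying only A, B and C.
git checkout -q -b BL1.1 TR1.1
git cherry-pick TR1.2..BL1.2 >/dev/null
picked=$(git log --reverse --format=%s TR1.1..BL1.1 | tr '\n' ' ')

echo "merge brought in $merged_count commits; cherry-pick brought in: $picked"
git log --graph --oneline --decorate --all
```

The final "git log" line uses the "--decorate" and "--all" options suggested in the reply, so the tags and every branch head show up in one graph.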
* Re: "git merge" merges too much! 2009-11-29 5:14 ` Jeff King @ 2009-11-30 18:12 ` Greg A. Woods 2009-11-30 19:22 ` Dmitry Potapov 2009-12-02 20:09 ` Jeff King 0 siblings, 2 replies; 33+ messages in thread From: Greg A. Woods @ 2009-11-30 18:12 UTC (permalink / raw) To: Jeff King; +Cc: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 3740 bytes --] Thank you very much for confirming my understanding of why "git merge" was merging more changes than I had desired it to merge. The way "git merge" works does concern me somewhat though as I try to figure out how I might use "topic" branches to develop local features and then merge them onto each supported release branch. Some guides I've read suggest this methodology, but I'm sure how well this will work, even when the remote project uses release branches to manage official releases. Perhaps I really should look at StGit, but I'm not sure about it either. At Sun, 29 Nov 2009 00:14:27 -0500, Jeff King <peff@peff.net> wrote: Subject: Re: "git merge" merges too much! > > On Sat, Nov 28, 2009 at 10:21:25PM -0500, Greg A. Woods wrote: > > > > master branch to represent release points -- is there any way to get > > "git log" to show which tags are associated with a given commit and/or > > Try "git log --decorate". Excellent! That's exactly what I was looking for. (From a first pass through the documentation I would never have guessed that "tags" were also a form of "refs". All these different names for things in the Git vs. many other VCS's, like "ref names" _really_ confusing to anyone like me with too much experience using those other revision control systems. Part of the problem is that Git documentation seems to be two-or-more minded about several of its concepts and features. Even the gitglossary(7) is somewhat inconsistent on how it uses "ref" and "refs". Perhaps all that's needed is some firm editing and clean-up of the manuals and documentation by a good strong technical editor.) 
> Yes, you must cherry-pick or use rebase (which is a more featureful > version of the pipeline you mentioned). "git rebase" will not work for me unless it grows a "copy" option , i.e. one which does not delete the original branch (i.e. avoids the "reset" phase of its operation). This option would likely only make sense when used with the "--onto" option, I would guess. "git rebase -i" would certainly give me all the control I could possibly want when copying changes from branch to branch. It likely wouldn't make sense to base this new "copy" feature directly on "git rebase" though, especially in light of all the warnings about how "git rebase" isn't friendly when applied to already published branches. I think in theory this "copy" feature won't cause problems for already-published branches. Perhaps it should be a whole new top-level command such as "git copy", but then it's so much like "git merge" I'm not sure.... Ideally if "git merge" could be taught how to work with just the changes from the base of a branch then it would do what I'm looking for directly. > The resulting commits will have different commit ids, but git generally > does a good job at merging such things, because it looks only at the > result state and not the intermediate commits. If both sides have made > an equivalent change, then there is no conflict. > > Is there any way to get "git log --graph" (and/or gitk) to show me all > > the branch heads, not just the current/specified one? > > Try "--all" with either gitk or "git log". Or if you want a subset of > heads, just name them. Awsome! Those options provide just what I wanted to see! (git-log(1) is worse than ls(1) for having too many options, but worst of all in the release I'm still using it doesn't respond sensibly nor consistently with other commands when given the "-?" option.) -- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca> Planix, Inc. 
<woods@planix.com> Secrets of the Weird <woods@weird.com>
* Re: "git merge" merges too much! 2009-11-30 18:12 ` Greg A. Woods @ 2009-11-30 19:22 ` Dmitry Potapov 2009-12-01 18:52 ` Greg A. Woods 2009-12-02 20:09 ` Jeff King 1 sibling, 1 reply; 33+ messages in thread From: Dmitry Potapov @ 2009-11-30 19:22 UTC (permalink / raw) To: The Git Mailing List; +Cc: Jeff King On Mon, Nov 30, 2009 at 01:12:31PM -0500, Greg A. Woods wrote: > > The way "git merge" works does concern me somewhat though as I try to > figure out how I might use "topic" branches to develop local features > and then merge them onto each supported release branch. The basic idea of using topic branches is development is done on separate branches are merged to the release branch only when they are ready to be released. These branches are based on the oldest branch in what they may be included. It means that fixes are normally based on the stable branch and new feature are based on the master branch, i.e. the branch that contains changes for the next new feature release. Not all branches got merged immediately into master. For instance, the git project has 'pu' (proposed updates) and 'next' branches. Only when a new feature proved itself to be useful and reliable, it is "graduated" to the master branch. Thus the master branch is rather stable and it is released on regular intervals (no need for a long stabilization period). The key difference comparing to what you may got used is that branches are normally based on the oldest branch in what this feature may be included. Thus normally changes are not backported to old branches, because you can merge them directly. > > Yes, you must cherry-pick or use rebase (which is a more featureful > > version of the pipeline you mentioned). > > "git rebase" will not work for me unless it grows a "copy" option , > i.e. one which does not delete the original branch (i.e. avoids the > "reset" phase of its operation). There is no reset phase... It is just reassigning the head of branch to point to a different commit-id. 
If you want to copy a branch instead of rebasing the old one, you create
a new branch (a new name) that points to the same commit as the branch
that you want to copy; after that you rebase this new branch. You can do
that like this:

$ git branch new-foo foo

$ git rebase --onto newbase oldbase new-foo

> It likely wouldn't make sense to base this new "copy" feature directly
> on "git rebase" though, especially in light of all the warnings about
> how "git rebase" isn't friendly when applied to already published
> branches.  I think in theory this "copy" feature won't cause problems
> for already-published branches.

The "copy" does not have the problem of rebase, but it has a different
problem: you have two series of commits instead of one. If you find a
bug in one of those commits, you will have to patch each series
separately. Also, git merge may produce additional conflicts... So,
copying commits is not something that I would recommend doing often.


Dmitry
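
The copy-then-rebase recipe above can be tried in a scratch repository. This is only a sketch: the branch names follow the message (foo, new-foo, newbase, oldbase), while the files and commit subjects are hypothetical. It checks that "foo" still points at the same commit afterwards and that only the commits in oldbase..foo were copied.

```shell
#!/bin/sh
# Copy branch "foo" onto "newbase" without moving "foo" itself.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# commit NAME: one new file per commit, so the rebase cannot conflict
commit() {
    echo "$1" > "file-$1"
    git add "file-$1"
    git commit -q -m "$1"
}

commit base
git checkout -q -b newbase;        commit n1
git checkout -q -b oldbase master; commit o1
git checkout -q -b foo oldbase;    commit f1; commit f2
foo_before=$(git rev-parse foo)

# The two commands from the message:
git branch new-foo foo
git rebase -q --onto newbase oldbase new-foo

foo_after=$(git rev-parse foo)
copied=$(git log --reverse --format=%s newbase..new-foo | tr '\n' ' ')
echo "foo unchanged: $foo_before = $foo_after"
echo "commits copied onto newbase: $copied"
```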
* Re: "git merge" merges too much! 2009-11-30 19:22 ` Dmitry Potapov @ 2009-12-01 18:52 ` Greg A. Woods 2009-12-01 20:50 ` Dmitry Potapov 2009-12-02 10:20 ` Nanako Shiraishi 0 siblings, 2 replies; 33+ messages in thread From: Greg A. Woods @ 2009-12-01 18:52 UTC (permalink / raw) To: Dmitry Potapov; +Cc: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 4904 bytes --] At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: "git merge" merges too much! > > The key difference comparing to what you may got used is that branches > are normally based on the oldest branch in what this feature may be > included. Thus normally changes are not backported to old branches, > because you can merge them directly. Hmmm... the idea of creating topic branches based on the oldest branch where the feature might be used is indeed neither intuitive, nor is it mentioned anywhere I've so far read about using topic branches in Git. To use topic branches effectively this way, especially in managing local and custom changes to a large remote project where separate working directories are needed for long-running builds, I think some additional software configuration management tool must be used to create "configuration" branches where all the desired change sets (topic branches) are merged. I spent half my dreaming time early this morning running through scenarios of how to use topic branches, with true merging (not re-basing), in a usable work-flow. At the moment I'm leaning towards a process where the configuration branch is re-created for every build -- i.e. the merges are redone from every topic branch to a freshly configured branch forked from the locally supported release branch, hopefully making use of git-rerere to solve most conflicts in as automated a fashion as is possible. This may not be a sane thing to do though -- it may be too much work to do for every fix. 
It somewhat goes against the current natural trend in many of the projects I work on to develop changes on the trunk and then back-port (some of) them to release branches. Perhaps Stacked-Git really is the best answer. I will have to investigate more. > > > Yes, you must cherry-pick or use rebase (which is a more featureful > > > version of the pipeline you mentioned). > > > > "git rebase" will not work for me unless it grows a "copy" option , > > i.e. one which does not delete the original branch (i.e. avoids the > > "reset" phase of its operation). > > There is no reset phase... By "reset phase" I meant this part, from git-rebase(1): The current branch is reset to <upstream>, or <newbase> if the --onto option was supplied. This has the exact same effect as git reset --hard <upstream> (or <newbase>). > It is just reassigning the head of branch to > point to a different commit-id. If you want to copy a branch instead of > rebasing the old one, you create a new branch (a new name) that points > to the same commit as the branch that you want to copy, after that you > rebase this new branch. You can do that like this: > > $ git branch new-foo foo > > $ git rebase --onto newbase oldbase new-foo Hmmm.... I'll have to think about that. It makes some sense, but I don't intuitively read the command-line parameters well enough to predict the outcome in all of the scenarios I'm interested in. what is "oldbase" there? I'm guessing it means "base of foo" (and for the moment, "new-foo" too)? It's confusing because the manual page uses the word "upstream" to describe this parameter. 
From my experiments it looks like what I might want to do to copy a local branch to port its changes from one release branch to another is something like this (where local-v2.0 is a branch with local changes forked from release branch REL-v2.0, and I want to back-port these changes to a new local branch forked from the release branch REL-v1.0): $ git branch local-base-v1.0 REL-v1.0 # mark base of new branch $ git branch local-v1.0 local-v2.0 # dup head of src branch $ git rebase --onto local-base-v1.0 REL-v2.0 local-v1.0 $ git branch -d local-base-v1.0 The first and last steps may not be necessary if REL-v1.0 really is a branch, but in my play project it is just a tag on the trunk. In the case that it were really already a branch then hopefully this would do: $ git branch local-v1.0 local-v2.0 # dup head of src branch $ git rebase --onto REL-v1.0 REL-v2.0 local-v1.0 The trick here seems to be to invent the name of the new branch based on where it's going to be rebased to. I think this does suffice very nicely as a "git copy" operation! > The "copy" does not have the problem of rebase, but it has a different > problem: You have two series of commits instead of one. If you found > a bug in one of those commits, you will have to patch each series > separately. Also, git merge may produce additional conflicts... So, > copying commits is not something that I would recommend to do often. Indeed. -- Greg A. Woods +1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca> Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com> [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-12-01 18:52 ` Greg A. Woods @ 2009-12-01 20:50 ` Dmitry Potapov 2009-12-01 21:58 ` Greg A. Woods 2009-12-02 10:20 ` Nanako Shiraishi 1 sibling, 1 reply; 33+ messages in thread From: Dmitry Potapov @ 2009-12-01 20:50 UTC (permalink / raw) To: The Git Mailing List On Tue, Dec 01, 2009 at 01:52:18PM -0500, Greg A. Woods wrote: > At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: > Subject: Re: "git merge" merges too much! > > > > The key difference comparing to what you may got used is that branches > > are normally based on the oldest branch in what this feature may be > > included. Thus normally changes are not backported to old branches, > > because you can merge them directly. > > Hmmm... the idea of creating topic branches based on the oldest branch > where the feature might be used is indeed neither intuitive, nor is it > mentioned anywhere I've so far read about using topic branches in Git. Most things that we consider "intuitive" are those that we got used to. Git is different in many aspect than other VCSes (such as CVS/SVN), and the workflow that good for those VCSes may not be optimal for Git. There is a good description that provide basic knowledge how to use Git: man gitworkflows or online: http://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html If you do not base your changes on the oldest branch then you will not be able to merge changes, which implies you will have to cherry-pick manually without ability automatic to track what changes were merged and what were not, this is a recipe for a disaster... > At the moment I'm leaning towards a process where the configuration > branch is re-created for every build -- i.e. the merges are redone from > every topic branch to a freshly configured branch forked from the > locally supported release branch, hopefully making use of git-rerere to > solve most conflicts in as automated a fashion as is possible. 

I am not quite sure that I fully understood your idea of configuration
branches, but I want to warn you about one serious limitation of
git-rerere -- it stores conflict resolutions on a per-file basis. This
means that if the resolution of some conflict implies some change to
another file then git-rerere will not help you here. So, it handles
maybe 80-90% of cases, but not all of them.

>
> Perhaps Stacked-Git really is the best answer.  I will have to
> investigate more.

There is also TopGit. I have never used either of them, but if you are
interested in a patch management system, you probably should look at
both of them. StGit is modelled after quilt, while TopGit is aimed to be
better integrated with Git and a better fit for working in a distributed
environment. But as I said, I do not have any first-hand experience with
either of them. (Personally, I would look at TopGit first, but maybe I
am biased here).

> >
> > $ git branch new-foo foo
> >
> > $ git rebase --onto newbase oldbase new-foo
>
> Hmmm.... I'll have to think about that.  It makes some sense, but I
> don't intuitively read the command-line parameters well enough to
> predict the outcome in all of the scenarios I'm interested in.
>
> what is "oldbase" there?  I'm guessing it means "base of foo" (and for
> the moment, "new-foo" too)?

You have:

    o---o---o---o---o  newbase
         \
          o---o---o---o---o  oldbase
                           \
                            o---o---o  foo

and you want this:

    o---o---o---o---o  newbase
        |            \
        |             o´--o´--o´  new-foo
         \
          o---o---o---o---o  oldbase
                           \
                            o---o---o  foo


Dmitry
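
The advice about basing a change on the oldest branch that needs it can be illustrated with a small experiment. This is a sketch under assumptions: the names "maint", "fix" and the tag "v1.0" are hypothetical and do not come from the thread. The fix is committed once on top of the old release and then merged into both lines, so both contain the very same commit and nothing is copied.

```shell
#!/bin/sh
# Fork a fix from the oldest branch, then merge it upward into each branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# commit NAME: one new file per commit keeps the merges conflict-free
commit() {
    echo "$1" > "file-$1"
    git add "file-$1"
    git commit -q -m "$1"
}

commit 1; git tag v1.0          # hypothetical old release
git branch maint v1.0           # hypothetical maintenance branch
commit 2                        # master moves on after the release

git checkout -q -b fix maint    # topic starts at the OLDEST branch
commit fix
fix_tip=$(git rev-parse fix)

git checkout -q maint
git merge -q fix                              # fast-forward
git checkout -q master
git merge -q -m "Merge branch 'fix'" fix      # true merge

# Every branch that contains the single fix commit:
holders=$(git for-each-ref --format='%(refname:short)' \
              --contains "$fix_tip" refs/heads | tr '\n' ' ')
echo "branches containing the fix: $holders"
```

Because "maint" never receives master's later commit, the merge into "maint" stays small, which is the opposite of what happens when the fix is first written on top of the newest release.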
* Re: "git merge" merges too much!
From: Greg A. Woods @ 2009-12-01 21:58 UTC
To: The Git Mailing List

At Tue, 1 Dec 2009 23:50:57 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
Subject: Re: "git merge" merges too much!
>
> > >
> > > $ git branch new-foo foo
> > >
> > > $ git rebase --onto newbase oldbase new-foo
> >
> > Hmmm.... I'll have to think about that.  It makes some sense, but I
> > don't intuitively read the command-line parameters well enough to
> > predict the outcome in all of the scenarios I'm interested in.
> >
> > what is "oldbase" there?  I'm guessing it means "base of foo" (and for
> > the moment, "new-foo" too)?
>
> You have:
>
>     o---o---o---o---o  newbase
>          \
>           o---o---o---o---o  oldbase
>                            \
>                             o---o---o  foo

Yes, sort of -- in the ideal situation, but not in my particular example
where "oldbase" is just a tag, not a real branch.

So yes, "oldbase" is in fact "base of foo".  Trickier still is when the
"oldbase" branch has one or more commits newer than "base of foo".  Does
Git not have a symbolic name for the true base of a branch?  I.e. is
there not some form of symbolic name for "N" in the following?

    o---o---o---o---o---o---o---o  master
         \
          o---o---N---o---o  release-1
                   \
                    o---o---o  local-release-1

(now of course if it is discovered that "release-1" has progressed since
the base of "foo" then "foo" should be rebased first, but perhaps there
is not time to do this before the other release has to be supported)

> and you want this:
>
>     o---o---o---o---o  newbase
>         |            \
>         |             o'--o'--o'  new-foo
>          \
>           o---o---o---o---o  oldbase
>                            \
>                             o---o---o  foo

Yes, sort of I suppose, if you trim all the non-relevant branches.

What I really want, I think, is something like this where at least the
non-relevant "master" branch is still shown:

                  1'--2'--3'  new-foo
                 /
        o---o---o  newbase
       /
    o---o---o---o---o---o---o---o  master
                 \
                  o---o---o  oldbase
                           \
                            1---2---3  foo

Here's part of my confusion -- "newbase" as used above is actually older
than "oldbase".  :-)  so ideally "oldbase" should always be described in
terms of "foo", not just given an arbitrary unrelated name.

Of course that doesn't rule out the following scenario either where
"newbase" really is newer than "oldbase" -- in my world a given project
might become locally supported first on either a newer release, or an
older release, so both above and below might happen:

                              1'--2'--3'  new-foo
                             /
                    o---o---o  newbase
                   /
    o---o---o---o---o---o---o---o  master
         \
          o---o---o  oldbase
                   \
                    1---2---3  foo

And eventually I want to also merge whatever is still relevant from foo
to a "local" branch off master so that those changes can be sent
(usually as patches) upstream.

Sometimes I want to do development on a topic branch as close as
possible to the tip of "master" so that it can most easily be pushed
upstream, and then back-port those changes to older release branches.

In fact the latter is exactly how I picture release branches to work in
normal development, and this is how several of the big projects I'd like
to get using Git are doing development (now usually with CVS).

Note too that in these kinds of projects "topic" branches are _always_
forked from the current tip of "master", long-running ones sometimes
rebased to keep up with "master", small fixes and changes are made
directly to the master branch; and small fixes, as well as relevant
features, sometimes those developed on "topic" branches, are back-ported
to release branches.
Note I'm not talking about ideals of best practises specific for Git here -- I'm talking about actual working operational practises that people are _very_ familiar with and which have been well proven using a vast wide variety of different VCS's in the past. For example I seriously doubt any of the developers of the projects I'm thinking of that I'd like to switch to using Git are ever going to want to fork their topic branches from the oldest release branch base that they intend to support, and many such projects will necessarily always have at least a few long-running topic branches that will have to be frequently rebased to keep up with the trunk so that their eventual merging will go as smoothly as possible, and yet once any of these topic branches is finally "closed" their changes may also have to be back-ported to release branches. To me the natural way to do these kinds of back-porting "merges" is to restrict the merge to select only the commits on the branch, i.e. from its base to its tip, thus the motivation for the topic of my thread (and I think the motivation for the "What is the best way to backport a feature?" thread as well). I think if Git could do this kind of "partial" merging directly without having to "copy" deltas with "rebase" or "cherry-pick" or "am" or whatever, and thus create separate histories for them, then it would be much better at supporting this traditional practice of using branches to manage releases. Without such ability it truly does look as though some form of "patch" management tool is also a necessary thing(evil?), as "rebase" and "cherry-pick" could quickly get way out of control and be way too much work otherwise. -- Greg A. Woods Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-12-01 21:58 ` Greg A. Woods @ 2009-12-02 0:22 ` Dmitry Potapov 0 siblings, 0 replies; 33+ messages in thread From: Dmitry Potapov @ 2009-12-02 0:22 UTC (permalink / raw) To: The Git Mailing List On Tue, Dec 01, 2009 at 04:58:34PM -0500, Greg A. Woods wrote: > At Tue, 1 Dec 2009 23:50:57 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: > Subject: Re: "git merge" merges too much! > > > > > > > > > > $ git branch new-foo foo > > > > > > > > $ git rebase --onto newbase oldbase new-foo > > > > > > Hmmm.... I'll have to think about that. It makes some sense, but I > > > don't intuitively read the command-line parameters well enough to > > > predict the outcome in all of the scenarios I'm interested in. > > > > > > what is "oldbase" there? I'm guessing it means "base of foo" (and for > > > the moment, "new-foo" too)? > > > > You have: > > > > o---o---o---o---o newbase > > \ > > o---o---o---o---o oldbase > > \ > > o---o---o foo > > Yes, sort of -- in the ideal situation, but not in my particular example > where "oldbase" is just a tag, not a real branch. It does not matter whether it is tag or branch or just SHA-1. You can use any two reference as newbase and oldbase. They specify two points in DAG. The only thing that has to be a branch in my example is new-foo. > > So yes, "oldbase" is in fact "base of foo". Trickier still is when the > "oldbase" branch has one or more commits newer then "base of foo". Does > Git not have a symbolic name for the true base of a branch? I.e. is > there not some form of symbolic name for "N" in the following? > > o---o---o---o---o---o---o---o master > \ > o---o---N---o---o release-1 > \ > o---o---o local-release-1 You can always find SHA-1 for N using the following command: git merge-base release-1 local-release-1 but you do not have to do that to rebase your changes. 
You can just run:

# create a copy of local-release-1, so it will not disappear
git branch copy-release-1 local-release-1
# rebase the branch to master
git rebase --onto master release-1 copy-release-1


Dmitry
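
A small experiment covering the "trickier" case raised earlier, where release-1 has gained commits after the fork point N. It is a sketch only: the branch names follow the messages above, and the commit subjects and files are made up. It confirms that "git merge-base" recovers N, and that the rebase copies just the local commits even though it is given the moved-on release-1 rather than N.

```shell
#!/bin/sh
# release-1 moves past the fork point N; the copy still contains only L1, L2.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# commit NAME: one new file per commit, so the rebase cannot conflict
commit() {
    echo "$1" > "file-$1"
    git add "file-$1"
    git commit -q -m "$1"
}

commit m1
git checkout -q -b release-1;       commit r1; commit N
base_n=$(git rev-parse HEAD)        # remember N for comparison
git checkout -q -b local-release-1; commit L1; commit L2
git checkout -q release-1;          commit r2   # release-1 progresses past N
git checkout -q master;             commit m2

# N can be found by name-free lookup, though the rebase does not need it:
found=$(git merge-base release-1 local-release-1)

# The two commands from the message:
git branch copy-release-1 local-release-1
git rebase -q --onto master release-1 copy-release-1

copied=$(git log --reverse --format=%s master..copy-release-1 | tr '\n' ' ')
echo "merge-base found N: $found"
echo "commits copied onto master: $copied"
```

The range release-1..copy-release-1 excludes everything reachable from release-1, including N and its ancestors, which is why naming the branch tip works as well as naming N itself.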
* Re: "git merge" merges too much!
From: Nanako Shiraishi @ 2009-12-02 10:20 UTC
To: The Git Mailing List; +Cc: Dmitry Potapov

Quoting "Greg A. Woods" <woods@planix.com> writes:

> At Mon, 30 Nov 2009 22:22:12 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: "git merge" merges too much!
>>
>> The key difference comparing to what you may got used is that branches
>> are normally based on the oldest branch in what this feature may be
>> included. Thus normally changes are not backported to old branches,
>> because you can merge them directly.
>
> Hmmm... the idea of creating topic branches based on the oldest branch
> where the feature might be used is indeed neither intuitive, nor is it
> mentioned anywhere I've so far read about using topic branches in Git.

You may want to add the result of googling

    "Fun with" site:gitster.livejournal.com

to the list of Git documents you read. "Fork from the oldest branch" is
one of the techniques Junio teaches often, and many of his other
techniques are built upon it.

He not only teaches useful techniques but also explains a lot about the
reasoning behind them in his Git book. His blog articles have the same
explanations on many topics that I saw in his book but not in other
places. It is a useful substitute, until his book gets translated to
English, for people who don't read Japanese.

--
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/
* Re: "git merge" merges too much! 2009-11-30 18:12 ` Greg A. Woods 2009-11-30 19:22 ` Dmitry Potapov @ 2009-12-02 20:09 ` Jeff King 2009-12-03 1:21 ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods 1 sibling, 1 reply; 33+ messages in thread From: Jeff King @ 2009-12-02 20:09 UTC (permalink / raw) To: The Git Mailing List On Mon, Nov 30, 2009 at 01:12:31PM -0500, Greg A. Woods wrote: > (From a first pass through the documentation I would never have guessed > that "tags" were also a form of "refs". All these different names for I find git is much simpler to use and understand if you start "at the bottom" with the basic concepts (because for the most part, git is really a set of tools for manipulating the few basic data structures). For a short intro, try: http://eagain.net/articles/git-for-computer-scientists/ I think Scott Chacon's "Pro Git" book also takes a similar approach, but I confess that I have not actually read it carefully. At this point, I know enough about git to make reading it not very interesting. :) You can find it online at: http://progit.org/book/ > features. Even the gitglossary(7) is somewhat inconsistent on how it > uses "ref" and "refs". Perhaps all that's needed is some firm editing > and clean-up of the manuals and documentation by a good strong technical > editor.) I skimmed it and didn't see any inconsistency. If you have something specific in mind, please point it out so we can fix it. > "git rebase" will not work for me unless it grows a "copy" option , > i.e. one which does not delete the original branch (i.e. avoids the > "reset" phase of its operation). This option would likely only make > sense when used with the "--onto" option, I would guess. I think Dmitry already mentioned this, but you probably want to create a new branch to hold your rebased history if you don't want to modify the existing branch. 
> (git-log(1) is worse than ls(1) for having too many options, but worst
> of all in the release I'm still using it doesn't respond sensibly nor
> consistently with other commands when given the "-?" option.)

  $ ls -?
  ls: invalid option -- '?'
  Try `ls --help' for more information.
  $ ls --help ;# or ls -h
  [copious usage information]

  $ git log -?
  fatal: unrecognized argument: -?
  $ git log --help
  [the man page]
  $ git log -h
  usage: git log [<options>] [<since>..<until>] [[--] <path>...]
     or: git show [options] <object>...

  $ cd /outside/of/git/repo
  $ git log -?
  fatal: Not a git repository (or any of the parent directories): .git

So "-?" is bogus for both ls and git. But there are two failings I see:

  1. Outside of a repository, "git log" does not even get to the argument-parsing phase to see that "-?" is bogus. We short-circuit "-h" and "--help" to avoid actually looking for a git repository, but obviously cannot do so for every "--bogus" argument we see. We could potentially also short-circuit "-?" (and probably map it to "-h" if we were going to do that). However, I didn't think "-?" was in common use.

  2. "git log -h" doesn't mention any of the options specifically, though other git commands do (e.g., try "git archive -h"). This is because the option list is generated by our parseopt library, but the revision and diff options (which are the only ones that "git log" takes) do not use parseopt. Maybe we should point to "--help" for the full list in that case.

-Peff

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency (was: "git merge" merges too much!) 2009-12-02 20:09 ` Jeff King @ 2009-12-03 1:21 ` Greg A. Woods 2009-12-03 1:34 ` Git documentation consistency Junio C Hamano 2009-12-03 2:07 ` Git documentation consistency (was: "git merge" merges too much!) Jeff King 0 siblings, 2 replies; 33+ messages in thread From: Greg A. Woods @ 2009-12-03 1:21 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 3500 bytes --] At Wed, 2 Dec 2009 15:09:04 -0500, Jeff King <peff@peff.net> wrote: Subject: Re: "git merge" merges too much! > > I find git is much simpler to use and understand if you start "at the > bottom" with the basic concepts (because for the most part, git is > really a set of tools for manipulating the few basic data structures). I think that's the problem actually -- I don't really want to know too much about how it works under the hood (yet), I just want to use it in the most effective way for my purposes. There's lots of talk about using Git as the basis for a true high-level VCS and SCM system, yet it doesn't look to me that anyone has created such a VCS or SCMS using Git. > I skimmed it and didn't see any inconsistency. If you have something > specific in mind, please point it out so we can fix it. I think anyone who's been participating on this list for any significant amount of time is far too close to the subject to be able to serve as a candid independent technical editor who could really help clean things up and make the documentation much more consistent. Obviously such an editor would also require the help of experts at all the details too. :-) Unfortunately I'm not a very good technical editor, and I don't really have time to devote to doing such editing of documentation either. > > (git-log(1) is worse than ls(1) for having too many options, but worst > > of all in the release I'm still using it doesn't respond sensibly nor > > consistently with other commands when given the "-?" 
option.)
>
>   $ ls -?
>   ls: invalid option -- '?'
>   Try `ls --help' for more information.

Please keep in mind all the world is not GNU:

  $ ls -?
  ls: unknown option -- ?
  usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]

My point was that _most_ other Git sub-commands already do respond to "-?" sensibly with real, helpful, information; usually a summary of the command options and parameters. I.e. this is yet another form of inconsistency in "documentation" in Git. :-)

The reference to "ls" was just as a comparison with its somewhat extensive variety of options. In fact "git log" is way more complex than "ls" because its parameters are not all just simple flags like those for "ls" -- they often have their own parameters too.

>   $ git log -h
>   usage: git log [<options>] [<since>..<until>] [[--] <path>...]
>      or: git show [options] <object>...

Indeed, so why the heck can't it do something similar with '-?'? That's just sloppy programming, no? Most other commands know '-?', and despite the silliness with GNU Ls, use of '-?' to request summary usage information is pretty much a de facto standard for unix commands.

Your point about mentioning "--help" in the summary usage information is a good one though -- especially for a command with a very complex set of command-line parameters. However that alone isn't sufficient -- users still need the summary, as that alone helps trigger associations and may be sufficient to allow a user to proceed quickly to get the command to do what they want.

(the whole "fatal: not a git repository" error for "git foo -[h?]" handling is also a rather silly one -- but I guess when something grows quickly and from many inputs there's not always time to keep some of these basic things clean and consistent)

-- 
Greg A. Woods
Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency 2009-12-03 1:21 ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods @ 2009-12-03 1:34 ` Junio C Hamano 2009-12-03 7:22 ` Greg A. Woods 2009-12-03 2:07 ` Git documentation consistency (was: "git merge" merges too much!) Jeff King 1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-12-03 1:34 UTC (permalink / raw)
To: Greg A. Woods; +Cc: The Git Mailing List

"Greg A. Woods" <woods@planix.com> writes:

>   $ ls -?
>   ls: unknown option -- ?
>   usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]
> ...
> Most other commands know '-?', and despite
> the silliness with GNU Ls, use of '-?' to request summary usage
> information is pretty much a de facto standard for unix commands.

I think you are showing ignorance here, as -? is *not* even close to standard, nor even widely used practice at all. I somehow doubt your ls would respond to "ls -X" any differently from "ls -?"; it is simply giving the same canned response to any unknown option.

The "usage: ls [-AaBbC...] [file...]" indeed is much better than abstract "usage: frotz <options> <args>" that does not list what <options> are, but that is a totally different thing. On that point, I think Peff already made a good suggestion of giving the full help text in such a case.

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency 2009-12-03 1:34 ` Git documentation consistency Junio C Hamano @ 2009-12-03 7:22 ` Greg A. Woods 2009-12-03 7:45 ` Jeff King 2009-12-03 16:22 ` Marko Kreen 0 siblings, 2 replies; 33+ messages in thread From: Greg A. Woods @ 2009-12-03 7:22 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 2399 bytes --] At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote: Subject: Re: Git documentation consistency > > I think you are showing ignorance here, as -? is *not* even close to > standard, nor even widely used practice at all. I think I should know something about Unix command line and option parsers, having used them for some 25 years or so now. In fact I've used most every kind of unix that ever was, and I've worked on the source to more than a few. > I somehow doubt your ls > would respond to "ls -X" any differently from "ls -?", but is giving the > same canned response to any unknown option. Indeed. Whether it is useful for "-?" to give a more detailed summary than the basic one-line usage message often depends on how complex the command-line features are. My preference is for '-?' (and if possible '-h' as well) to give detailed usage info, though hopefully limited to just one screen full if possible (though with Git's free use of $PAGER in many contexts it might be OK to feed longer usage info to $PAGER too). So, the basic usage messages for command-line "syntax" problems should be a one-line summary (following the error message). If more detail could be useful then '-?' (and if possible '-h') should display a more detailed usage message. (and if one line suffices then '-?' and '-h' don't really have to be treated specially) > The "usage: ls [-AaBbC...] [file...]" indeed is much better than abstract > "usage: frotz <options> <args>" that does not list what <options> are, but > that is a totally different thing. 
On that point, I think Peff already
> made a good suggestion of giving the full help text in such a case.

Indeed the abstract content-free "<options>" style message is almost useless in most cases.

It would also be useful to reflect in the usage message what the driver program saw as its argv[0], not what the internal command saw:

  $ git rebase -X
  Usage: /usr/pkg/libexec/git-core//git-rebase [--interactive | -i] [-v] [--onto <newbase>] <upstream> [<branch>]

Or perhaps this could be cleaned up with something as simple as always stripping off the pathname with the likes of:

  argv0 = (argv0 = strrchr(argv[0], '/')) ? argv0 + 1 : argv[0];

-- 
Greg A. Woods
Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
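The pathname-stripping idea in the message above can be sketched as a standalone C helper. This is only an illustration under stated assumptions: the names `program_basename` and `short_usage` are hypothetical and are not taken from Git's source, and the usage text is copied from the `git rebase -X` output quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from Git's source): return the last path
 * component of argv[0], so that a usage message can show "git-rebase"
 * rather than "/usr/pkg/libexec/git-core//git-rebase".
 */
const char *
program_basename(const char *argv0)
{
	const char *slash = strrchr(argv0, '/');

	return slash ? slash + 1 : argv0;
}

/* Print a one-line usage message using the stripped program name. */
void
short_usage(FILE *out, const char *argv0)
{
	fprintf(out,
	    "Usage: %s [--interactive | -i] [-v] [--onto <newbase>] <upstream> [<branch>]\n",
	    program_basename(argv0));
}
```

A caller would pass its own `argv[0]`, for example `short_usage(stderr, argv[0])`. Splitting the assignment out of the conditional, as done here, behaves the same as the one-liner but is arguably easier to read.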
* Re: Git documentation consistency 2009-12-03 7:22 ` Greg A. Woods @ 2009-12-03 7:45 ` Jeff King 2009-12-03 15:24 ` Uri Okrent 2009-12-03 16:22 ` Marko Kreen 1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2009-12-03 7:45 UTC (permalink / raw)
To: The Git Mailing List

On Thu, Dec 03, 2009 at 02:22:41AM -0500, Greg A. Woods wrote:

> > I think you are showing ignorance here, as -? is *not* even close to
> > standard, nor even widely used practice at all.
>
> I think I should know something about Unix command line and option
> parsers, having used them for some 25 years or so now. In fact I've
> used most every kind of unix that ever was, and I've worked on the
> source to more than a few.

I don't want to nitpick, because the main thrust of your point (that "git foo --bogus" should show a more useful message) is not affected by this subpoint at all.

But if we are considering special-casing "-?", I would like to see some evidence that it is actually in use. I can't seem to find it respected anywhere, as shown below (and note that yes, some of this output does show a useful help message, but that is because --bogus would show the same message, and I am not disputing that we should handle that case):

  # My linux box
  $ uname -sr
  Linux 2.6.31-1-686
  $ ls -?
  ls: invalid option -- '?'
  Try `ls --help' for more information.

  # Solaris
  $ uname -sr
  SunOS 5.8
  $ /bin/ls -?
  /bin/ls: illegal option -- ?
  usage: ls -1RaAdCxmnlogrtucpFbqisfL [files]
  $ /usr/ucb/ls -? ;# appears to ignore bogus options entirely?
  foo
  $ /usr/ucb/fold -?
  /usr/ucb/fold: illegal option -- ?
  Usage: fold [-bs] [-w width | -width ] [file...]
  $ /usr/xpg4/bin/ls -?
  /usr/xpg4/bin/ls: illegal option -- ?
  usage: ls -1RaAdCxmnlogrtucpFbqisfL [files]
  $ /usr/xpg6/bin/ls -?
  bash: /usr/xpg6/bin/ls: No such file or directory

  # AIX
  $ uname -svr
  AIX 1 6
  $ /bin/ls -?
  /bin/ls: illegal option -- ?
  usage: ls [-1ACFHLNRabcdefgilmnopqrstuxEUX] [File...]

So what systems _do_ treat "-?" specially?
-Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency 2009-12-03 7:45 ` Jeff King @ 2009-12-03 15:24 ` Uri Okrent 0 siblings, 0 replies; 33+ messages in thread
From: Uri Okrent @ 2009-12-03 15:24 UTC (permalink / raw)
To: Jeff King; +Cc: The Git Mailing List

On Wed, Dec 2, 2009 at 11:45 PM, Jeff King <peff@peff.net> wrote:
> So what systems _do_ treat "-?" specially?
>
> -Peff

Windows seems to (of course you need to use '/' instead of '-'):

Microsoft Windows [Version 6.0.6001]

c:\>dir -h
 Volume in drive C has no label.
 Volume Serial Number is B09A-B49F

 Directory of c:\

File Not Found

c:\>dir /h
Invalid switch - "h".

c:\>dir /help
Invalid switch - "help".

c:\>dir /?
Displays a list of files and subdirectories in a directory.

DIR [drive:][path][filename] [/A[[:]attributes]] [/B] [/C] [/D] [/L] [/N]
  [/O[[:]sortorder]] [/P] [/Q] [/R] [/S] [/T[[:]timefield]] [/W] [/X] [/4]

  [drive:][path][filename]
              Specifies drive, directory, and/or files to list.

  /A          Displays files with specified attributes.
  attributes   D  Directories                R  Read-only files
               H  Hidden files               A  Files ready for archiving
               S  System files               I  Not content indexed files
               L  Reparse Points             -  Prefix meaning not
  /B          Uses bare format (no heading information or summary).
  /C          Display the thousand separator in file sizes. This is the
              default. Use /-C to disable display of separator.
  /D          Same as wide but files are list sorted by column.
  /L          Uses lowercase.
  /N          New long list format where filenames are on the far right.
  /O          List by files in sorted order.
  sortorder    N  By name (alphabetic)       S  By size (smallest first)
               E  By extension (alphabetic)  D  By date/time (oldest first)
               G  Group directories first    -  Prefix to reverse order
  /P          Pauses after each screenful of information.
  /Q          Display the owner of the file.
  /R          Display alternate data streams of the file.
  /S          Displays files in specified directory and all subdirectories.
  /T          Controls which time field displayed or used for sorting
  timefield   C  Creation
              A  Last Access
              W  Last Written
  /W          Uses wide list format.
  /X          This displays the short names generated for non-8dot3 file
              names. The format is that of /N with the short name inserted
              before the long name. If no short name is present, blanks are
              displayed in its place.
  /4          Displays four-digit years

Switches may be preset in the DIRCMD environment variable. Override
preset switches by prefixing any switch with - (hyphen)--for example, /-W.

c:\>

-- 
Uri

Please consider the environment before printing this message.
http://www.panda.org/how_you_can_help/

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency 2009-12-03 7:22 ` Greg A. Woods 2009-12-03 7:45 ` Jeff King @ 2009-12-03 16:22 ` Marko Kreen 2009-12-09 19:56 ` Greg A. Woods 1 sibling, 1 reply; 33+ messages in thread From: Marko Kreen @ 2009-12-03 16:22 UTC (permalink / raw) To: The Git Mailing List On 12/3/09, Greg A. Woods <woods@planix.com> wrote: > At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote: > Subject: Re: Git documentation consistency > > I think you are showing ignorance here, as -? is *not* even close to > > standard, nor even widely used practice at all. > > I think I should know something about Unix command line and option > parsers, having used them for some 25 years or so now. In fact I've > used most every kind of unix that ever was, and I've worked on the > source to more than a few. '?' is what getopt(3) is supposed to return for unknown options. -- marko ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency 2009-12-03 16:22 ` Marko Kreen @ 2009-12-09 19:56 ` Greg A. Woods 0 siblings, 0 replies; 33+ messages in thread From: Greg A. Woods @ 2009-12-09 19:56 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 2174 bytes --] At Thu, 3 Dec 2009 18:22:27 +0200, Marko Kreen <markokr@gmail.com> wrote: Subject: Re: Git documentation consistency > > On 12/3/09, Greg A. Woods <woods@planix.com> wrote: > > At Wed, 02 Dec 2009 17:34:01 -0800, Junio C Hamano <gitster@pobox.com> wrote: > > Subject: Re: Git documentation consistency > > > I think you are showing ignorance here, as -? is *not* even close to > > > standard, nor even widely used practice at all. > > > > I think I should know something about Unix command line and option > > parsers, having used them for some 25 years or so now. In fact I've > > used most every kind of unix that ever was, and I've worked on the > > source to more than a few. > > '?' is what getopt(3) is supposed to return for unknown options. Indeed, which is why it cannot ever, in general, be used as a valid option with some command-specific meaning, and so why the one-line form of the usage message should always be displayed when the user explicitly gives a '-?' option (i.e. in the same way as if any unknown option is given). If the one-line usage message is too terse to explain more complex command line syntax then '-h' (and --help) should display a short multi-line summary of the command's usage. I guess what I should have suggested in the first place is that all Git sub-commands should respond with a one-line[*] usage message when they encounter an unknown option, such as '-?', and that they should (only) display a more detailed multi-line help summary when given '-h' or '--help' options. I was most surprised when I didn't get a one-line usage summary from "git log -?", just getopt(3)'s error message. 
[*] commands which have multiple distinct modes of operation with separate and unique command-line syntax for each mode, should of course display a one-line summary for each command mode, such as in this example:

	void
	usage(void)
	{
		fprintf(stderr, "Usage: %s [-abcdef]\n", getprogname());
		fprintf(stderr, "       %s [-l]\n", getprogname());
		exit(2);
	}

-- 
Greg A. Woods
Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/

[-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Git documentation consistency (was: "git merge" merges too much!) 2009-12-03 1:21 ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods 2009-12-03 1:34 ` Git documentation consistency Junio C Hamano @ 2009-12-03 2:07 ` Jeff King 1 sibling, 0 replies; 33+ messages in thread From: Jeff King @ 2009-12-03 2:07 UTC (permalink / raw) To: The Git Mailing List On Wed, Dec 02, 2009 at 08:21:39PM -0500, Greg A. Woods wrote: > > I find git is much simpler to use and understand if you start "at the > > bottom" with the basic concepts (because for the most part, git is > > really a set of tools for manipulating the few basic data structures). > > I think that's the problem actually -- I don't really want to know too > much about how it works under the hood (yet), I just want to use it in > the most effective way for my purposes. > > There's lots of talk about using Git as the basis for a true high-level > VCS and SCM system, yet it doesn't look to me that anyone has created > such a VCS or SCMS using Git. Sure, I can understand that. And I invite you (or anyone) to work on such a VCS, and I am sure I am not alone on the list in sincerely hoping you succeed and offering to help support however core git developers can. But we have seen people try this in the past, and it never quite seems to work. All of the cogito people ended up migrating to git. I was one of them. In my case, the git tools offered better access to the fundamental operations, which is what I found interesting and powerful about it. I suspect some others migrated for the same reasons, though perhaps many did simply because cogito did not keep up with core git in terms of features. There is also "eg" these days, which attempts to do what you're saying. I don't know how big a userbase it has; I've never been personally interested in it. 
> I think anyone who's been participating on this list for any significant
> amount of time is far too close to the subject to be able to serve as a
> candid independent technical editor who could really help clean things
> up and make the documentation much more consistent. Obviously such an
> editor would also require the help of experts at all the details too. :-)

Sure, I think an outsider doing a really nice job of overhauling the documentation would be nice. There are some git books, some by insiders, and some not. For the same reason that you mention, it would be hard for me to assess their quality with too much objectivity. :)

My question was more of a "leaving aside overhauling the documentation, did you see something obvious that we can fix right now" kind of thing.

> >   $ ls -?
> >   ls: invalid option -- '?'
> >   Try `ls --help' for more information.
>
> Please keep in mind all the world is not GNU:
>
>   $ ls -?
>   ls: unknown option -- ?
>   usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]

Right, but my point is unchanged. Neither ls actually _recognizes_ "-?". They do the same for "--bogosity".

> My point was that _most_ other Git sub-commands already do respond to
> "-?" sensibly with real, helpful, information; usually a summary of the
> command options and parameters.

Yes, for the same reason that "ls" does: they don't recognize it. But if you are asking for "git log" to produce the short usage message, then that is part of my issue (1) from the last message: "log" doesn't use the same parseopt library as (most of) the rest of git[1]. Yes, it's inconsistent. Those inconsistencies were introduced over time (before we had parseopt), and we are slowly fixing them over time. Patches welcome. :)

[1] There are other inconsistencies because of this, too. You can't say "git log -pz", but must say "git log -p -z".

> >   $ git log -h
> >   usage: git log [<options>] [<since>..<until>] [[--] <path>...]
> >      or: git show [options] <object>...
> > Indeed, so why the heck can't it do something similar with '-?'. That's > just sloppy programming, no? Most other commands know '-?', and despite > the silliness with GNU Ls, use of '-?' to request summary usage > information is pretty much a de facto standard for unix commands. Nothing "knows" -?, but it is true that the parseopt-ified commands behave differently (and IMHO, better). You can call it sloppy, I guess; it is really an artifact of commands being written and changing over time. As I said, we are slowly converging on consistency. And no, I don't want to get into a big debate over whether it is better to plan your software up front, or to let it evolve over time. I have opinions that do not necessarily line up with how git came into being, but the end product is useful enough that I like to use it and hack on it. :) > (the whole "fatal: not a git repository" error for "git foo -[h?]" > handling is also a rather silly one -- but I guess when something grows > quickly and from many inputs there's not always time to keep some of > these basic things clean and consistent) The problem here is that there are two chunks of code: the "git" wrapper, and the "log" command. The wrapper knows a few things about each command like "does it need to be in a git repository?", and checks that before we even look at the command-line options. There is an explicit "check for --help" hack. Fixing that startup procedure to be more sane would be possible, but there are a lot of hidden demons lurking in changing the order of the startup sequence. Again, patches welcome. :) -Peff ^ permalink raw reply [flat|nested] 33+ messages in thread
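The startup ordering described above (the wrapper checks for a help request before it checks for a repository) can be sketched as follows. This is a hypothetical illustration, not Git's actual startup code: `is_help_request`, `startup_check`, and the `find_repo` callback are invented names, and the sketch assumes that only `-h` and `--help` are short-circuited, as the message says.

```c
#include <string.h>

/*
 * Hypothetical sketch, not Git's actual startup code: report whether a
 * help request is present. Scanning stops at "--", which by convention
 * ends the options.
 */
int
is_help_request(int argc, char *const argv[])
{
	int i;

	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			break;
		if (strcmp(argv[i], "-h") == 0 ||
		    strcmp(argv[i], "--help") == 0)
			return 1;
	}
	return 0;
}

/*
 * Return 0 when help was requested (no repository lookup is made);
 * otherwise return whatever the caller-supplied lookup reports.
 */
int
startup_check(int argc, char *const argv[], int (*find_repo)(void))
{
	if (is_help_request(argc, argv))
		return 0;
	return find_repo();
}
```

Under this ordering an unknown argument such as `-?` still reaches the repository lookup first, which matches the "Not a git repository" message shown earlier in the thread for `git log -?` outside a repository.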
* Re: "git merge" merges too much! 2009-11-29 3:21 "git merge" merges too much! Greg A. Woods 2009-11-29 5:14 ` Jeff King @ 2009-11-29 5:15 ` Junio C Hamano 2009-11-30 18:40 ` Greg A. Woods 1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2009-11-29 5:15 UTC (permalink / raw)
To: Greg A. Woods; +Cc: The Git Mailing List

In order to make things smoother and easier in the future, you may want to learn "topic branch" workflows (found in many git tutorial materials).

But it is too late for the history you already created; "cherry-pick" is your friend to recover from the shape of your existing history.

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-11-29 5:15 ` "git merge" merges too much! Junio C Hamano @ 2009-11-30 18:40 ` Greg A. Woods 2009-11-30 20:50 ` Junio C Hamano 2009-11-30 21:17 ` Dmitry Potapov 0 siblings, 2 replies; 33+ messages in thread From: Greg A. Woods @ 2009-11-30 18:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 4105 bytes --] At Sat, 28 Nov 2009 21:15:05 -0800, Junio C Hamano <gitster@pobox.com> wrote: Subject: Re: "git merge" merges too much! > > In order to make things smoother and easier in the future, you may want to > learn "topic branch" workflows (found in many git tutorial material). I was thinking hard about topic branches too, but I'm still having a hard time figuring out how they might work best for my purposes. One hard problem is how to create "clean" topic branches and use them effectively when the local working environment requires all or many of the whole set of local changes. Bootstrapping these local changes as topic branches may be one thing; but going further with topic branches to create local features or fixes, some of which are to be submitted upstream for eventual inclusion in the origin/master branch, and others of which will only ever be ported forward to future official releases, is quite another thing all together. These new-feature topic branches will have to be worked on from the main (most well supported) local branch, which will be forked from the main release branch (or based on the trunk at the main release tag), but yet care will have to be taken to make sure the merge doesn't include dependencies on local-only changes (i.e. so that it can be safely submitted upstream). 
Some projects also only wish to receive patches that work against their trunk branch, so a given local topic branch will have to be merged onto yet another branch forking from the trunk where it may likely encounter conflicts (since the topic branch is based on older code from an existing release). This isn't really a Git problem I suppose, except for the fact that it means the lack of easy support for multiple working directories that track different branches makes this kind of development somewhat more difficult to do with Git than with, say, CVS. While it may be quite convenient in small projects to quickly move a single working directory from one branch to another and do various builds and tests from the result, large projects (say where a compile takes the better part of a working day or more and where testing requires multi-day processes) demand that working directories remain "stable", and multiple lines of development therefore demand multiple working directories. Developing procedures around Git to manage this with "push" and "pull" into multiple local copies of the repository, each with their own working directory, is of course possible (though not necessarily easy), but once again if the repositories are similarly huge then it may not be possible to support multiple repo copies for each developer in a given working environment. So, ideally it seems from my understanding at this point that I want to be able to repeatedly merge topic branch changes (as they are worked on) into multiple local configuration branches (each of which perhaps includes multiple local topics and other changes), each of which pushes its changes out into possibly multiple working directories. > But it is too late for the history you already created; "cherry-pick" is > your friend to recover from the shape of your existing history. The problem is that it's not _my_ existing history -- it's from the remote project I'm trying to work with. 
I think you'll agree there's nothing wrong with a project using tags alone to manage its releases.

However this means I've got to work out how to do merges of my local changes onto multiple locally created branches which fork off from these tags. Perhaps using this cherry-pick tool is truly the best I can do in this situation.

(it makes me worry though about how I might manage a super-project which pulls from many remote sub-projects (and which includes large amounts of its own code) where some of those remote projects use release branches, and some just use tags, etc.)

-- 
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-11-30 18:40 ` Greg A. Woods @ 2009-11-30 20:50 ` Junio C Hamano 2009-11-30 21:17 ` Dmitry Potapov 1 sibling, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2009-11-30 20:50 UTC (permalink / raw) To: The Git Mailing List; +Cc: Junio C Hamano "Greg A. Woods" <woods@planix.com> writes: > .... This isn't really a Git problem I suppose, except > for the fact that it means the lack of easy support for multiple working > directories that track different branches makes this kind of development > somewhat more difficult to do with Git than with, say, CVS. You want to google for git-new-workdir then; it is found in contrib/workdir and fairly widely used. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-11-30 18:40 ` Greg A. Woods 2009-11-30 20:50 ` Junio C Hamano @ 2009-11-30 21:17 ` Dmitry Potapov 2009-12-01 0:24 ` Greg A. Woods 1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Potapov @ 2009-11-30 21:17 UTC (permalink / raw)
To: The Git Mailing List; +Cc: Junio C Hamano

On Mon, Nov 30, 2009 at 01:40:38PM -0500, Greg A. Woods wrote:
>
> While it may be quite convenient in small projects to quickly move a
> single working directory from one branch to another and do various
> builds and tests from the result, large projects (say where a compile
> takes the better part of a working day or more and where testing
> requires multi-day processes) demand that working directories remain
> "stable", and multiple lines of development therefore demand multiple
> working directories.

It depends on the project and what tools are used, but using ccache and proper dependencies helps a lot to reduce the cost of switching. In fact, it may be faster to switch to another branch and have to recompile a few files than to go into another working directory, because when you go to another working directory, you hit a cold cache and things get very slow.

And then if a project is huge and takes a lot of time to compile and test everything, I do not think it is a good idea to build that in your work tree. Instead, you make a snapshot using git-archive and then run a full build and test on it. In this way, you know that you test exactly what you have committed (you can amend any commit later until you publish it).

Dmitry

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-11-30 21:17 ` Dmitry Potapov @ 2009-12-01 0:24 ` Greg A. Woods 2009-12-01 5:47 ` Dmitry Potapov 0 siblings, 1 reply; 33+ messages in thread From: Greg A. Woods @ 2009-12-01 0:24 UTC (permalink / raw) To: The Git Mailing List; +Cc: Dmitry Potapov [-- Attachment #1: Type: text/plain, Size: 2349 bytes --] At Tue, 1 Dec 2009 00:17:44 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: "git merge" merges too much! > > It depends on the project and what tools are used, but using ccache and > proper dependencies help a lot to reduce the cost of switching. In fact, > it may be faster to switch to another branch and have to recompile a few > files than to go into another working directory, because when you go to > another working directory, you hit cold cache and things get very slow. perhaps, sometimes, but at least with some tools that can more often than not just end up with a confusing mess if you happen to change something at exactly the wrong time, and with Git it seems almost too easy to wildly change many files in the working directory, even if you can get them back into their previous state relatively quickly Things get even weirder if you happen to be playing with older branches too -- most build tools don't have ability to follow files that go back in time as they assume any product files newer than the sources are already up-to-date, no matter how much older the sources might become on a second build. From a good software hygiene perspective the only safe way I can see to build a Git working directory after manipulating any branches with Git is to do a complete "make clean && make" cycle. That is until Git also incorporates, or integrates with, really good build tools.... :-) > And then if a project is huge and takes a lot of time to compile and > test everything, I do not think, it is a good idea to build that in your > work tree. 
Instead, you make a snapshot using git-archive and then run > full build and test on it. In this way, you know that you test exactly > what you have committed (you can amend any commit later until you > publish it). I think this "git-new-workdir" script is the thing to try. It probably means keeping separate "configuration" branches, one for each build working directory, but I think that's OK. These source trees are big enough that one doesn't just go throwing around entire copies of them willy-nilly. Disk bandwidth is also a limited resource we are very concerned about, not just disk space. -- Greg A. Woods Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: "git merge" merges too much! 2009-12-01 0:24 ` Greg A. Woods @ 2009-12-01 5:47 ` Dmitry Potapov 2009-12-01 17:59 ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Potapov @ 2009-12-01 5:47 UTC (permalink / raw) To: The Git Mailing List On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote: > > Things get even weirder if you happen to be playing with older branches > too -- most build tools don't have ability to follow files that go back > in time as they assume any product files newer than the sources are > already up-to-date, no matter how much older the sources might become on > a second build. No, files do not go back in time when you switch between branches. The timestamp on files is the time when they are written to your working tree (and it is so with other VCSes that I worked with, so I do not know where you get the idea of file timestamps going back in time). Thus any build tool such as 'make' should work fine provided you have correct dependencies. I have switched between widely divergent branches, and I have never had any problem with that. Dmitry ^ permalink raw reply [flat|nested] 33+ messages in thread
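The timestamp behaviour described above can be checked with a small experiment. This is a sketch under stated assumptions: a throwaway repository with two hypothetical files, `a.c` and `b.c`, and a hypothetical branch named `topic` that changes only `a.c`. A `marker` file records the moment just before the branch switch, and `find -newer` then lists the files the checkout actually rewrote:

```shell
#!/bin/sh
# Sketch only: repository, file, and branch names are made up.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'int a;\n' > a.c
printf 'int b;\n' > b.c
git add a.c b.c
git commit -q -m 'base'
base=$(git symbolic-ref --short HEAD)   # default branch name varies by setup

# A second branch that differs from the base in a.c only.
git checkout -q -b topic
printf 'int a2;\n' > a.c
git commit -q -am 'change a.c only'
git checkout -q "$base"

# Record "now", wait long enough for coarse timestamps, then switch branches.
touch marker
sleep 1
git checkout -q topic

# Files written by the checkout get the current time; untouched files keep theirs.
rewritten=$(find . -name '*.c' -newer marker | sort)
echo "rewritten by checkout: $rewritten"
```

Only the file whose content differs between the two branches should be listed, which is why a timestamp-based tool such as make sees it as changed and leaves the rest alone.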
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 5:47 ` Dmitry Potapov @ 2009-12-01 17:59 ` Greg A. Woods 2009-12-01 18:51 ` Dmitry Potapov 0 siblings, 1 reply; 33+ messages in thread From: Greg A. Woods @ 2009-12-01 17:59 UTC (permalink / raw) To: The Git Mailing List; +Cc: Dmitry Potapov [-- Attachment #1: Type: text/plain, Size: 2257 bytes --] At Tue, 1 Dec 2009 08:47:34 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: "git merge" merges too much! > > On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote: > > > > Things get even weirder if you happen to be playing with older branches > > too -- most build tools don't have ability to follow files that go back > > in time as they assume any product files newer than the sources are > > already up-to-date, no matter how much older the sources might become on > > a second build. > > No, files do not go back in time when you switch between branches. The > timestamp on files is the time when they are written to your working > tree Hmmm, I didn't really say anything in particular about file timestamps -- I meant the file content may go back in time. More correctly I should have said that the file content may become inconsistent with the state of other files that have just been compiled. If the timestamps do not get set back to commit time, but rather are simply updated to move the last modify time to the time each change is made to a working file (which is as you said, to be expected), regardless of whether its content goes back in time or not, then this may or may not help a currently running build to figure out what really needs to be re-compiled. Likely it won't even for a recursive-make style update build, but certainly not for one where all build actions are pre-determined before any of them are started. 
If the content of one or more files goes back in time to an earlier state while the compile is happening then ultimately the result must be considered to be undefined. The best you can hope for is a break in the compile. This is why I agreed with you that a build should never be done in a working directory where any file editing or VCS action is occurring simultaneously. I just disagreed that "git archive" was a reasonable alternative to leaving the working directory alone during the entire time of the build. It is not really reasonable for large projects any more than stopping all work on the sources is reasonable. -- Greg A. Woods Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 17:59 ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods @ 2009-12-01 18:51 ` Dmitry Potapov 2009-12-01 18:58 ` Greg A. Woods 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Potapov @ 2009-12-01 18:51 UTC (permalink / raw) To: The Git Mailing List On Tue, Dec 01, 2009 at 12:59:58PM -0500, Greg A. Woods wrote: > At Tue, 1 Dec 2009 08:47:34 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: > Subject: Re: "git merge" merges too much! > > > > On Mon, Nov 30, 2009 at 07:24:14PM -0500, Greg A. Woods wrote: > > > > > > Things get even weirder if you happen to be playing with older branches > > > too -- most build tools don't have ability to follow files that go back > > > in time as they assume any product files newer than the sources are > > > already up-to-date, no matter how much older the sources might become on > > > a second build. > > > > No, files do not go back in time when you switch between branches. The > > timestamp on files is the time when they are written to your working > > tree > > Hmmm, I didn't really say anything in particular about file timestamps > -- I meant the file content may go back in time. More correctly I > should have said that the file content may become inconsistent with the > state of other files that have just been compiled. It makes no difference whether the content goes back or forward in time. If a file is changed, any decent build system should recompile the corresponding files. If the build does not handle dependencies properly, you can end up with an inconsistent state just by editing some files. 
> If the timestamps do not get set back to commit time, but rather are > simply updated to move the last modify time to the time each change is > made to a working file (which is as you said, to be expected), More precisely, Git does not do anything about modification time during checkout. The system automatically updates the modification time when a file is written, and Git does not mess with it. > regardless of whether its content goes back in time or not, then this > may or may not help a currently running build to figure out what really > needs to be re-compiled. Obviously, switching branches while running a build may produce very confusing results, but it is not any different than editing files by hand during a build -- any concurrent modification may confuse the build system. > I just disagreed that "git archive" was a reasonable alternative to > leaving the working directory alone during the entire time of the build. Using "git archive" allows you to avoid running a long procedure such as a full clean build and testing in the working tree. Also, it is guaranteed that you test exactly what you put in Git and some other garbage in your working tree does not affect the result. But my point was that switching between branches and recompiling a few changed files may be faster than going to another working tree. Dmitry ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 18:51 ` Dmitry Potapov @ 2009-12-01 18:58 ` Greg A. Woods 2009-12-01 21:18 ` Dmitry Potapov 0 siblings, 1 reply; 33+ messages in thread From: Greg A. Woods @ 2009-12-01 18:58 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 1701 bytes --] At Tue, 1 Dec 2009 21:51:14 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!) > > Obviously, switching branches while running build may produce very > confusing results, but it is not any different than editing files by > hands during built -- any concurrent modification may confuse the build > system. That's what I said. This is why multiple working directories is an essential feature for any significantly large project. > > I just disagreed that "git archive" was a reasonable alternative to > > leaving the working directory alone during the entire time of the build. > > Using "git archive" allows you avoid running long time procedure such as > full clean build and testing in the working tree. Also, it is guaranteed > that you test exactly what you put in Git and some other garbage in your > working tree does not affect the result. Sure, but let's be very clear here: "git archive" is likely even more impossible for some large projects to use than "git clone" would be to use to create build directories. Disk bandwidth is almost always more expensive than disk space. > But my point was that switching > between branches and recompile a few changed files may be faster than > going to another working tree. That's possibly going to generate even more unnecessary churn in the working directory, and thus even more unnecessary re-compiles. Multiple working directories are really the only sane solution sometimes. -- Greg A. Woods Planix, Inc. 
<woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 18:58 ` Greg A. Woods @ 2009-12-01 21:18 ` Dmitry Potapov 2009-12-01 22:25 ` Jeff Epler ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Dmitry Potapov @ 2009-12-01 21:18 UTC (permalink / raw) To: The Git Mailing List On Tue, Dec 01, 2009 at 01:58:05PM -0500, Greg A. Woods wrote: > > > > I just disagreed that "git archive" was a reasonable alternative to > > > leaving the working directory alone during the entire time of the build. > > > > Using "git archive" allows you avoid running long time procedure such as > > full clean build and testing in the working tree. Also, it is guaranteed > > that you test exactly what you put in Git and some other garbage in your > > working tree does not affect the result. > > Sure, but let's be very clear here: "git archive" is likely even more > impossible for some large projects to use than "git clone" would be to > use to create build directories. AFAIK, "git archive" is cheaper than git clone. I do not say it is fast for a huge project, but if you want to run a process such as clean build and test that takes a long time anyway, it does not add much to the total time. > > Disk bandwidth is almost always more expensive than disk space. Disk bandwidth is certainly more expensive than disk space, and the whole point was to avoid a lot of disk bandwidth by using hot cache. If you have two working trees then it is likely that only one will be in the hot cache, that is why you can switch faster (and recompile a few files) than going to another working tree. It has never been about disk space, it is about disk cache and keeping it hot. > > Multiple working directories are really the only sane solution > sometimes. Sure, sometimes... I do not know the details to say what will be better in your case, but I just wanted to say that you should weigh that against switching, because switching in Git is very fast. 
Much faster than with any other VCS... Another thing to consider is that if you put a really huge project in one Git repo then Git may not be as fast as you may want, because Git tracks the whole project as a whole. So, you may want to split your project into a few relatively independent modules (See git submodule). Dmitry ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 21:18 ` Dmitry Potapov @ 2009-12-01 22:25 ` Jeff Epler 2009-12-01 22:44 ` Greg A. Woods 2009-12-02 2:09 ` multiple working directories for long-running builds Junio C Hamano 2 siblings, 0 replies; 33+ messages in thread From: Jeff Epler @ 2009-12-01 22:25 UTC (permalink / raw) To: The Git Mailing List

On Wed, Dec 02, 2009 at 12:18:30AM +0300, Dmitry Potapov wrote:
> AFAIK, "git archive" is cheaper than git clone. I do not say it is fast
> for huge project, but if you want to run a process such as clean build
> and test that takes a long time anyway, it does not add much to the
> total time.

If you want to keep a separate copy of your source tree in order to get
consistent builds, "git archive" is not much cheaper in disk space or in
time, at least on this unix system:

    $ find orig -exec md5sum {} + > /dev/null 2>&1  # ensure hot cache
    $ time git clone orig temp-clone
    Initialized empty Git repository in /usr/local/jepler/src/temp-clone/.git/
    0.6 real 0.3 user 0.6 system
    $ time (GIT_DIR=orig/.git git archive --format tar --prefix temp-archive/ HEAD | tar xf -)
    0.5 real 0.2 user 0.5 system
    $ du -s orig temp-clone temp-archive
    41880 orig
    14640 temp-clone  # du excludes files already accounted for by 'orig'
    14304 temp-archive

.. and the next run to bring temp-clone up to date can be even faster,
since it's just 'git pull' and will only touch changed files.

Jeff

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 21:18 ` Dmitry Potapov 2009-12-01 22:25 ` Jeff Epler @ 2009-12-01 22:44 ` Greg A. Woods 2009-12-02 0:10 ` Dmitry Potapov 2009-12-02 2:09 ` multiple working directories for long-running builds Junio C Hamano 2 siblings, 1 reply; 33+ messages in thread From: Greg A. Woods @ 2009-12-01 22:44 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 4046 bytes --] At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!) > > AFAIK, "git archive" is cheaper than git clone. It depends on what you mean by "cheaper" It's clearly going to require less disk space. However it's also clearly going to require more disk bandwidth, potentially a _LOT_ more disk bandwidth. > I do not say it is fast > for huge project, but if you want to run a process such as clean build > and test that takes a long time anyway, it does not add much to the > total time. I think you need to try throwing around an archive of, say, 50,000 small files a few times simultaneously on your system to appreciate the issue. (i.e. consider the load on a storage subsystem, say a SAN or NAS, where with your suggestion there might be a dozen or more developers running "git archive" frequently enough that even three or four might be doing it at the same time, and this on top of all the i/o bandwidth required for the builds all of the other developers are also running at the same time.) > > Disk bandwidth is almost always more expensive than disk space. > > Disk bandwidth is certainly more expensive than disk space, and the > whole point was to avoid a lot of disk bandwidth by using hot cache. Huh? Throwing around the archive has nothing to do with the build system in this case. 
Please let me worry about optimizing the builds -- that's well under control already and is not really yet an issue for the VCS, at least not yet, and maybe never in many cases. I'm just not willing to even consider using what would really be the most simplistic and most expensive form of updating a working directory as could ever be imagined. "Git archive" is truly unintelligent, as-is. Perhaps if "git archive" could talk intelligently to an rsync process and be smart about updating an existing working directory it would be the ideal answer, but _NEVER_ with the current method of just unpacking an archive over an existing directory! (Now there's a good Google SoC, or masters, project for someone eager to learn about rsync & git internals!) Local filesystem "git clone" is usable in many scenarios, but it just won't work nearly so efficiently in a scenario where users have local repos on their workstations and use an NFS NAS to feed the build servers. As I understand it this 'git-new-workdir' script will work though since it uses symlinks that can be pointed across the mount back to the local disk on the user's workstation. They can just mount the build directory and go into it and run a "git checkout" and start another build on the build server(s). A major further advantage of multiple working directories is that this eliminates one more point of failure -- i.e. you don't end up with multiple copies of the repo that _should_ be effectively read-only for everything but "push", and perhaps then only to one branch. I don't like giving developers too much rope, especially in all the wrong places. "git archive" does achieve the same even better I suppose, but without something like a "--format=rsync" option it's completely out of the question. > Another thing to consider is that if you put a really huge project in one > Git repo than Git may not be as fast as you may want, because Git tracks > the whole project as the whole. 
So, you may want to split your project in > a few relatively independent modules (See git submodule). Indeed -- but sometimes I think this is not feasible either. I know of at least three very real-world projects where there are tens of thousands of small files that really must be managed as one unit, and where running a build in that tree could take a whole day or two on even the fastest currently available dedicated build server. Eg. pkgsrc. -- Greg A. Woods Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-01 22:44 ` Greg A. Woods @ 2009-12-02 0:10 ` Dmitry Potapov 2009-12-03 5:11 ` Greg A. Woods 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Potapov @ 2009-12-02 0:10 UTC (permalink / raw) To: The Git Mailing List On Tue, Dec 01, 2009 at 05:44:15PM -0500, Greg A. Woods wrote: > At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: > Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!) > > > > AFAIK, "git archive" is cheaper than git clone. > > It depends on what you mean by "cheaper" You said: >>> "git archive" is likely even more >>> impossible for some large projects to use than "git clone" My point was that I do not see why you believe "git archive" is more expensive than "git clone". According to Jeff Epler's numbers, "git archive" is 20% faster than "git clone"... > It's clearly going to require > less disk space. However it's also clearly going to require more disk > bandwidth, potentially a _LOT_ more disk bandwidth. Well, it does, but as I said earlier it is not the case where I expect things being instantaneous. If you do a full build and test, then it is going to take a lot of time anyway, and git-archive is not likely to be a big overhead in relative terms. > > I do not say it is fast > > for huge project, but if you want to run a process such as clean build > > and test that takes a long time anyway, it does not add much to the > > total time. > > I think you need to try throwing around an archive of, say, 50,000 small > files a few times simultaneously on your system to appreciate the issue. > > (i.e. 
consider the load on a storage subsystem, say a SAN or NAS, where > with your suggestion there might be a dozen or more developers running > "git archive" frequently enough that even three or four might be doing > it at the same time, and this on top of all the i/o bandwidth required > for the builds all of the other developers are also running at the same > time.) First of all, I do not see why it should be done frequently. The full build and test may be run once or twice a day, and the full test and build may take an hour. git-archive will probably take less than a minute for 50,000 small files (especially if you use tmpfs). In other words, it is 1% overhead, but you get a clean build, which is fully tested. You can be sure that no garbage is left in the worktree that could influence the result. > > > > Disk bandwidth is almost always more expensive than disk space. > > > > Disk bandwidth is certainly more expensive than disk space, and the > > whole point was to avoid a lot of disk bandwidth by using hot cache. > > Huh? Throwing around the archive has nothing to do with the build > system in this case. "git archive" is for doing a full build and test, which is rarely done. Normally, you just switch between branches, which means a few files are changed and rebuilt, and no archive is involved here. > > I'm just not willing to even consider using what would really be the > most simplistic and most expensive form of updating a working directory > as could ever be imagined. "Git archive" is truly unintelligent, as-is. "git archive" is NOT for updating your working tree. You use "git checkout" to switch between branches. "git checkout" is intelligent enough to overwrite only those files that actually differ between two versions. > A major further advantage of multiple working directories is that this > eliminates one more point of failure -- i.e. 
you don't end up with > multiple copies of the repo that _should_ be effectively read-only for > everything but "push", and perhaps then only to one branch. Multiple copies of the same repo are never a problem (except for taking some disk space). I really do not understand why you say that some copies should be effectively read-only... You can start to work on some feature at one place (using one repo) and then continue in another place using another repo. (Obviously, it will require fetching changes from the first repo before you will be able to continue, but it is just one command). In other words, I really do not understand what you are talking about here. > > I know of at least three very real-world projects where there are tens > of thousands of small files that really must be managed as one unit, and > where running a build in that tree could take a whole day or two on even > the fastest currently available dedicated build server. Eg. pkgsrc. Tens of thousands of files should not be a problem... For instance, the Linux kernel has around 30 thousand files and Git works very well in this case. But I would consider splitting if it has hundreds of thousands... Dmitry ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: multiple working directories for long-running builds (was: "git merge" merges too much!) 2009-12-02 0:10 ` Dmitry Potapov @ 2009-12-03 5:11 ` Greg A. Woods 0 siblings, 0 replies; 33+ messages in thread From: Greg A. Woods @ 2009-12-03 5:11 UTC (permalink / raw) To: The Git Mailing List [-- Attachment #1: Type: text/plain, Size: 4674 bytes --] At Wed, 2 Dec 2009 03:10:21 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote: Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!) > > My point was that I do not see why you believe "git archive" is more > expensive than "git clone". Accordingly to Jeff Epler's numbers, > "git archive" is 20% faster than "git clone"... Really!?!?!? You don't see it? Why is this so hard to understand? Sorry for my incredulity, but I thought this issue was obvious. The slightly more expensive "git clone" happens only _ONCE_. After that you just run "git pull" I think (plus maybe "git reset --hard"?), but in any case it's a heck of a lot less I/O and CPU than "git archive". And of course you skip even the one-time "git clone" operation if you use the even faster and simpler git-new-workdir script. "git archive" has to be run _EVERY_ time you need to update a working directory and it currently has no choice but to toss every bit of the whole working directory, up from the filesystem, across a pipe, and back down to the filesystem. It literally couldn't be more expensive! Sure, no matter how you do it, updating the working directory might not always be the biggest part of the operation, but it's insane to use the most expensive mechanism ever possible when there are far cheaper alternatives. BTW, there cannot, and MUST NOT, be any integrity advantage to using "git archive" over using multiple working directories. "git archive branch" must, by definition, produce exactly the same result as if you did "git checkout branch; rm -rf .git" or else it is buggy. 
Note also that the build directories created with git-new-workdir can be treated as read-only, and perhaps even forced to be read-only by mount options or maybe just by a corporate policy directive. (in all projects I'm working on the source tree can be read-only -- product files are always generated elsewhere) > Multiple copies of the same repo is never a problem (except taking some > disks space). Exactly -- gigabytes of disk space per copy in the cases I'm concerned about (i.e. where hard links are impossible). I've heard that at least one very large project has an 8GB repository currently. Three of the large projects I work on now are about a gigabyte per copy. That's just what's under .git too, not including the whole working directory as well. I can't even manage a "git clone" from HTTP of one of them without increasing my default process limits as it is so big and uses up too much memory. I guess one could skip the initial more-expensive "git clone" operation by copying the repo using low-level bit moving commands, like "cp -r" or whatever, and then tweak the result to make it appear as if it had been cloned, but even that requires moving gigabytes of data unnecessarily across what is likely to be a network connection of some sort. Are you fighting against git-new-workdir, or the concept of multiple working directories? > > A major further advantage of multiple working directories is that this > > eliminates one more point of failure -- i.e. you don't end up with > > multiple copies of the repo that _should_ be effectively read-only for > > everything but "push", and perhaps then only to one branch. > > I really do not understand why you say that some copies > should be effectively read-only... You can start to work on some feature > at one place (using one repo) and then continue in another place using > another repo. (Obviously, it will require to fetch changes from the > first repo, before you will be able to continue, but it is just one > command). 
In other words, I really do not understand what are you > talking about here. Developers, especially more junior ones, work on code, and they (are supposed to) spend almost all of their intellectual energy on the issues to do with creating and modifying code -- they are not expected to be integration engineers, nor are they expected to be VCS and SCM experts. The more steps you put in place for them to do, and the more places you allow them to store changes, etc., etc., etc., the more mistakes that they will make. Besides, in some scenarios build directories will be checked out from integration branches which shouldn't have any direct commits made to them, especially not to fix a problem in a build. BTW, pkgsrc has well over 50,000 files, FreeBSD ports is over 100,000. Neither can really be split in any rational way. -- Greg A. Woods Planix, Inc. <woods@planix.com> +1 416 218 0099 http://www.planix.com/ [-- Attachment #2: Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
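The "clone once, then just pull" workflow argued for above can be sketched as follows. This is an illustration only: the `origin` and `build` directories and the file name are hypothetical, and a plain local `git clone` stands in for whatever transport a real build server would use. The clone is a one-time cost; each later refresh is an incremental `git pull` that transfers and rewrites only what changed:

```shell
#!/bin/sh
# Sketch only: repository locations and file names are made up.
set -eu

origin=$(mktemp -d)          # stands in for the developer's repository
build=$(mktemp -d)/build     # stands in for a long-lived build directory

cd "$origin"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'version 1\n' > src.txt
git add src.txt
git commit -q -m 'version 1'

# One-time cost: create the build directory as a clone.
git clone -q "$origin" "$build"

# Development continues in the original repository.
printf 'version two\n' > src.txt
git commit -q -am 'version two'

# Every later refresh of the build directory is incremental.
cd "$build"
git pull -q --ff-only

origin_head=$(git -C "$origin" rev-parse HEAD)
build_head=$(git rev-parse HEAD)
echo "origin: $origin_head"
echo "build:  $build_head"
```

Using `--ff-only` is a choice made for this sketch: it makes the refresh fail loudly if someone has committed directly in the build directory, which matches the wish that such directories be treated as read-only.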
* Re: multiple working directories for long-running builds 2009-12-01 21:18 ` Dmitry Potapov 2009-12-01 22:25 ` Jeff Epler 2009-12-01 22:44 ` Greg A. Woods @ 2009-12-02 2:09 ` Junio C Hamano 2 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2009-12-02 2:09 UTC (permalink / raw) To: Dmitry Potapov; +Cc: The Git Mailing List

Dmitry Potapov <dpotapov@gmail.com> writes:

> On Tue, Dec 01, 2009 at 01:58:05PM -0500, Greg A. Woods wrote:
>>
>> > > I just disagreed that "git archive" was a reasonable alternative to
>> > > leaving the working directory alone during the entire time of the build.
>> >
>> > Using "git archive" allows you avoid running long time procedure such as
>> > full clean build and testing in the working tree. Also, it is guaranteed
>> > that you test exactly what you put in Git and some other garbage in your
>> > working tree does not affect the result.
>>
>> Sure, but let's be very clear here: "git archive" is likely even more
>> impossible for some large projects to use than "git clone" would be to
>> use to create build directories.
>
> AFAIK, "git archive" is cheaper than git clone. I do not say it is fast
> for huge project, but if you want to run a process such as clean build
> and test that takes a long time anyway, it does not add much to the
> total time.

I do not understand people who advocate for "git archive" to be used in
this manner at all.

I do use a set of separate build directories, and I typically run 5 to 10
full builds (in each) per day, but I rarely if ever make fixes in them.
Perhaps the usage pattern expected by people who want others to use "git
archive" to prepare separate build directories may be different from how I
use them.

I see two downsides in using "git archive":

 - "archive" piped to "tar xf -" will overwrite _all_ files every time you
   refresh the build area, causing extra work on "make" and any build
   procedure based on file timestamps. Sure, you can work around it by
   using ccache but why make your life complicated?

 - When a build in these separate build areas fails, you would want to go
   there and try to diagnose or even fix the problem in there, not in your
   primary working area (after all, the whole point of keeping a separate
   build area is so that you do not have to switch branches too much in
   the primary working area).

   A directory structure prepared by "archive" piped to "tar xf -" however
   is not a work tree, and any experimental changes (e.g. "debugf()") or
   fixes you make there need to be reverted or taken back manually to be
   placed in the primary working area.

If your build area is prepared with new-workdir, then you share the
history and you even share the ref namespace, so that "reset --hard" will
remove all the debugf() added while diagnosing, and "diff" will give you
the patch you need to take home.

You could even make a commit from your build area, but this cuts both
ways. You need to be aware that after committing on a branch in one
repository other repositories that have the same branch checked out will
become out of sync. It is however less of an issue in practice, because
the build areas are typically used to check out integration branches
(e.g. 'master' and 'next' in git.git) that you do not directly commit
anyway, and you will become very aware of the tentative nature of the
tree, as the update procedure for such a build area prepared with
new-workdir is always:

    cd /buildfarm/<branch>/ && git reset --hard

This will not touch any file that does not have to get updated, so your
"make" won't get confused.

^ permalink raw reply [flat|nested] 33+ messages in thread
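The diagnose-then-reset cycle described above can be tried out with a short sketch. One loud assumption: it uses a single plain repository created for the demonstration rather than a build area prepared with the contrib `git-new-workdir` script, so it shows only the `diff` and `reset --hard` steps, not the shared history and refs. The `area` directory, `main.c`, and the debugging line are all hypothetical:

```shell
#!/bin/sh
# Sketch only: a plain throwaway repository stands in for a real build area.
set -eu

area=$(mktemp -d)        # hypothetical build area
patch_file=$(mktemp)     # where the "patch to take home" is saved

cd "$area"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'int main(void) { return 0; }\n' > main.c
git add main.c
git commit -q -m 'initial commit'

# Experimental change made while diagnosing a failed build.
printf '/* debugf("reached here"); */\n' >> main.c

# Capture the experiment as a patch, then throw it away in the build area.
git diff > "$patch_file"
git reset -q --hard

# An empty status means the tree is back to the committed state.
remaining=$(git status --porcelain)
echo "uncommitted changes left: ${remaining:-none}"
```

The saved patch can then be applied in the primary working area (for example with `git apply`), while the build area returns to exactly the committed state.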