* Hey - A Conceptual Simplication.... @ 2009-11-18 12:55 George Dennie 2009-11-18 13:18 ` Jonathan del Strother ` (4 more replies) 0 siblings, 5 replies; 25+ messages in thread From: George Dennie @ 2009-11-18 12:55 UTC (permalink / raw) To: git; +Cc: torvalds A Clean checkout command might be... The Git model does not seem to go far enough conceptually, for some unexplainable reason... In particular, why is Git not treating the entire working tree as the versioned document (qualified of course by the .gitignore file). Instead, Git is treating a manually maintained list of files within the working tree as the versioned document, this list being initialized and manually amended by the "Git add/rm/mv" commands, etc. The result is conceptual complexity and rather counter-intuitive behavior. For example, adding and renaming files outside of Git is not considered editing the version until you subsequently do a "Git Add ." Contrast that with editing or deleting files outside of Git. Yet adding and renaming files and folders is a significant part of substantive projects, especially in the early stages and experimental branches. Granted, this is not a big deal functionally, but what is being lost is conceptual simplicity (and consistency, in my book) and conceptual simplicity is a key value point, if not THE key. Also can we augment checkout to totally CLEAN the working directory prior to a restore? If necessary we can augment .gitignore to stipulate those files or folders that should be excluded from the cleaning. This suggestion is in recognition of the fact that if you are not versioning the file, it is typically trash; which becomes the case when the entire working tree is treated as the versioned document. Consequently, I recommend the following new commands: "Git commit -x" -- performs a "Git add ." then a "Git commit" "Git checkout -x" -- cleans the working tree prior to performing a checkout P.S. Great work. 
George Dennie, BMath The Point Of Sale People www.pospeople.com BUS: 416-496-2921 FAX: 416-496-9496 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie @ 2009-11-18 13:18 ` Jonathan del Strother 2009-11-18 13:25 ` Jan Krüger ` (3 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: Jonathan del Strother @ 2009-11-18 13:18 UTC (permalink / raw) To: George Dennie; +Cc: git, torvalds 2009/11/18 George Dennie <gdennie@pospeople.com>: > A Clean checkout command might be... > > The Git model does not seem to go far enough conceptually, for some > unexplainable reason... > > In particular, why is Git not treating the entire working tree as the > versioned document (qualified of course by the .gitignore file). > > Instead, Git is treating a manually maintained list of files within the > working tree as the versioned document, this list being initialized and > manually amended by the "Git add/rm/mv" commands, etc. > > The result is conceptual complexity and rather counter-intuitive behavior. > For example, adding and renaming files outside of Git is not considered > editing the version until you subsequently do a "Git Add ." Contrast that > with editing or deleting files outside of Git. Yet adding and renaming files > and folders is a significant part of substantive projects, especially in the > early stages and experimental branches. > > Granted, this is not a big deal functionally, but what is being lost is > conceptual simplicity (and consistency, in my book) and conceptual > simplicity is a key value point, if not THE key. > > Also can we augment checkout to totally CLEAN the working directory prior to > a restore. If necessary we can augment .gitignore to stipulate those files > or folders that should be excluded from the cleaning. This suggestion is in > recognition of the fact that if you are not versioning the file, it is > typically trash; which becomes the case when the entire working treat is > treated as the versioned document. 
>
> Consequently, I recommend the following new commands:
> "Git commit -x" -- performs a "Git add ." then a "Git commit"
> "Git checkout -x" -- that clean the working tree prior to perform a
> checkout
>

Perhaps try 'git commit -a' and 'git checkout -f' ?

^ permalink raw reply [flat|nested] 25+ messages in thread
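A minimal sketch of how far these two existing options go, run in a throwaway repository (the file names and the committer identity are placeholders, not anything from the thread). It suggests that `git commit -a` picks up edits to tracked files but not brand-new files, and that `git checkout -f` discards local edits to tracked files without deleting untracked ones:

```shell
#!/bin/sh
# Throwaway repository; all names below are illustrative placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo two >> tracked.txt      # edit a file git already tracks
echo new > untracked.txt     # create a file git has never been told about

git commit -q -a -m "Commit with -a"

# The edit to tracked.txt was committed; untracked.txt is still untracked.
git status --porcelain

echo scratch >> tracked.txt  # another uncommitted edit to a tracked file
git checkout -q -f           # discards that edit...
ls                           # ...but untracked.txt is still on disk
```

So neither option is quite the proposed "-x" behaviour: new files still need an explicit `git add`, and removing untracked files is the job of `git clean`.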
* Re: Hey - A Conceptual Simplication.... 2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie 2009-11-18 13:18 ` Jonathan del Strother @ 2009-11-18 13:25 ` Jan Krüger 2009-11-18 18:51 ` George Dennie 2009-11-18 13:30 ` Thomas Rast ` (2 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Jan Krüger @ 2009-11-18 13:25 UTC (permalink / raw) To: George Dennie; +Cc: git, torvalds Hi, > The result is conceptual complexity and rather counter-intuitive > behavior. For example, adding and renaming files outside of Git is > not considered editing the version until you subsequently do a "Git > Add ." Contrast that with editing or deleting files outside of Git. > Yet adding and renaming files and folders is a significant part of > substantive projects, especially in the early stages and experimental > branches. yet even now, people routinely add huge amounts of files they didn't actually want to add, and then have to expend a huge amount of effort to get them out of the history again (particularly if that history has already been published). What you are describing is a workflow that is even fuller of potential for wrong turns than the current standard workflow is. If simplicity leads to a greater potential for errors, how is it a good thing? This kind of workflow actually involves more work for the user. She now has to meticulously maintain an accurate list of ignore patterns, particularly because of this: > Also can we augment checkout to totally CLEAN the working directory > prior to a restore. If necessary we can augment .gitignore to > stipulate those files or folders that should be excluded from the > cleaning. So if I forget to add a certain pattern, my file is lost forever? Uhh... > This suggestion is in recognition of the fact that if you > are not versioning the file, it is typically trash Just how typical is that, though? I wouldn't want to be the one to judge that. 
In light of my concerns, I oppose adding your suggestions to the official CLI of git and I suggest that you create your own commands to enable this kind of workflow. For example:

git config --global alias.commitx '!git add . && git commit'
git config --global alias.checkoutx '!git clean && git checkout'

Jan

^ permalink raw reply [flat|nested] 25+ messages in thread
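A minimal sketch of trying these aliases out safely, assuming a throwaway repository and repository-local configuration instead of `--global` (file names and identity are placeholders). One deliberate deviation, which is an assumption of this sketch and not part of the mail above: `git clean` is given `-f -d`, because with default settings plain `git clean` refuses to delete anything unless `clean.requireForce` is set to false.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Repository-local versions of the two aliases (no --global).
git config alias.commitx '!git add . && git commit'
# Assumption: -f -d added so that clean actually removes untracked
# files and directories under the default clean.requireForce setting.
git config alias.checkoutx '!git clean -f -d && git checkout'

echo hello > a.txt
git commitx -q -m "Add a.txt"   # stages everything, then commits

echo junk > scratch.tmp         # untracked and not ignored
git checkoutx -q -f             # removes scratch.tmp, then checks out HEAD
ls
```

Arguments given after the alias name are appended to the last command in the alias, which is why `-q -m ...` ends up on `git commit` and `-q -f` on `git checkout`. Note that `checkoutx` deletes untracked files without asking, which is exactly the risk raised earlier in this message.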
* RE: Hey - A Conceptual Simplication.... 2009-11-18 13:25 ` Jan Krüger @ 2009-11-18 18:51 ` George Dennie 2009-11-18 19:40 ` Jakub Narebski 2009-11-20 1:35 ` Dmitry Potapov 0 siblings, 2 replies; 25+ messages in thread From: George Dennie @ 2009-11-18 18:51 UTC (permalink / raw) To: 'Jan Krüger'; +Cc: git Thanks Jan, Jason, Jonathan, and Thomas for your response, your thoughts and concerns are enlightening.... Jan Kruger wrote... > git config --global alias.commitx '!git add . && git commit' > git config --global alias.checkoutx '!git clean && git checkout' Thank you. Being new to git, I did not know that such aliasing was available within it. Jason Sewell wrote... > If you have a bunch of debugging code sitting around in your working tree after you've tracked down a > problem, you don't want to commit all of those printfs, etc. - you want to commit the fix. This has > ramifications from making diffs of history cleaner to making git bisect actually useful. One of the concerns I have with the manual pick-n-commit is that you can forget a file or two. Consequently, unless you do a clean checkout and test of the commit, you don't know that your publishable version even compiles. It seems safer to commit the entirety of your work in its working state and then do a clean checkout from a dedicated publishable branch and manually merge the changes in that, test, and commit. It seems the intuitive model is to treat version control as applying to the whole document, not parts of it. In this respect the document is defined by the IDE, namely the entire solution, warts and all. When you start selectively saving parts of the document then you are doing two things, versioning and publishing; and at the same time. This was a critical flaw in older version control approaches because the software solution document is a file system sub-tree. What you termed the debugging/printf's I would treat as a distinctions between a debug vs. 
a release version that may be suitably delineated by #define's or preferably separate unit tests assemblies. If I must prune prior to committing; however, then it seems reverting spurious printf's may offer a more reliable and automatable technique than ensuring that I have added all the new class files, resource files, text files, sub projects, etc; that may constitute the "fix." Once so selectively reverted I can test and commit such a publishable version. Jason Sewell wrote... > Isn't fastidiously maintaining a .gitignore file to contain everything you *don't* want in the project more confusing > than explicitly specifying things you *do* want in the project? This is git ignore for "cleaning prior to a check" and git ignore for "adding to index" and is not an either or. You would specify what you don't want to version tracked as normal but you can also stipulate what you don't want to be deleted during a clean restore (which should otherwise completely wipe the folder prior to restoring a specific commit). This would permit embedding non-version elements within the version tree for whatever reason you find necessary. Thomas Rast wrote... > That would require supernaturally good maintenance of your .gitignore to avoid adding or (worse) nuking files by accident. On the contrary, the approach would all but eliminate the possibility of loss of data since you would not manually (and therefore error prone-ingly) pruning until after a commit. In fact, one might default automatic commits (if required) prior to checkouts or at least an alert system when uncommitted changes exists. Thanks again for your input. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-18 18:51 ` George Dennie @ 2009-11-18 19:40 ` Jakub Narebski 2009-11-18 19:52 ` Jason Sewall 2009-11-20 1:35 ` Dmitry Potapov 1 sibling, 1 reply; 25+ messages in thread From: Jakub Narebski @ 2009-11-18 19:40 UTC (permalink / raw) To: George Dennie; +Cc: 'Jan Krüger', git "George Dennie" <gdennie@pospeople.com> writes: > Thanks Jan, Jason, Jonathan, and Thomas for your response, your thoughts and > concerns are enlightening.... > Jason Sewell wrote... > > > If you have a bunch of debugging code sitting around in your working tree > > after you've tracked down a problem, you don't want to commit all > > of those printfs, etc. - you want to commit the fix. This has > > ramifications from making diffs of history cleaner to making git > > bisect actually useful. > > One of the concerns I have with the manual pick-n-commit is that you can > forget a file or two. I don't think that this concern is valid. The files which make up a project are those defined in the Makefile or equivalent project file, _not_ all files (or even all files of a specific type / extension) that happen to reside in a given directory. And those files would be known to git, either added when importing the project into git, or added when they were created. And if they are known it is enough to use "git commit -a" to pick up all changes. So I don't see how you can 'forget a file or two'. Are those *theoretical* concerns, or is it something that happened to you during using git? > Consequently, unless you do a clean checkout and test > of the commit, you don't know that your publishable version even compiles. > It seems safer to commit the entirety of your work in its working state and > then do a clean checkout from a dedicated publishable branch and manually > merge the changes in that, test, and commit. That's what git stash --keep-index is for. That, and a continuous integration repository, with its hooks. 
> > It seems the intuitive model is to treat version control as applying to the > whole document, not parts of it. In this respect the document is defined by > the IDE, namely the entire solution, warts and all. Yes, and the IDE has a project file which defines which files are in the project, just like a version control system has its tracked files. > When you start > selectively saving parts of the document then you are doing two things, > versioning and publishing; and at the same time. This was a critical flaw in > older version control approaches because the software solution document is a > file system sub-tree. Atomic commits are important, but the distinction between tracked files, (untracked) ignored files, and files in "limbo" state (neither tracked nor ignored) is orthogonal to having atomic commits. > Jason Sewell wrote... > > > Isn't fastidiously maintaining a .gitignore file to contain > > everything you *don't* want in the project more confusing than > > explicitly specifying things you *do* want in the project? > > This is git ignore for "cleaning prior to a check" and git ignore for > "adding to index" and is not an either or. You would specify what you don't > want to version tracked as normal but you can also stipulate what you don't > want to be deleted during a clean restore (which should otherwise completely > wipe the folder prior to restoring a specific commit). This would permit > embedding non-version elements within the version tree for whatever reason > you find necessary. And this is supposedly easier to use? I don't think so. > Thomas Rast wrote... > > > That would require supernaturally good maintenance of your > > .gitignore to avoid adding or (worse) nuking files by accident. > > On the contrary, the approach would all but eliminate the possibility of > loss of data since you would not manually (and therefore error prone-ingly) > pruning until after a commit. 
In fact, one might default automatic commits > (if required) prior to checkouts or at least an alert system when > uncommitted changes exists. What? I cannot understand you here. I think that automatic pruning of non-versioned files is _more_ error prone than manual deleting of files. And much more error prone than just keeping non-ignored and non-tracked files. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-18 19:40 ` Jakub Narebski @ 2009-11-18 19:52 ` Jason Sewall 2009-11-19 2:03 ` George Dennie 0 siblings, 1 reply; 25+ messages in thread From: Jason Sewall @ 2009-11-18 19:52 UTC (permalink / raw) To: Jakub Narebski; +Cc: George Dennie, Jan Krüger, git Sorry for the 2x post, George; forgot to include the list in my reply.... On Wed, Nov 18, 2009 at 1:51 PM, George Dennie <gdennie@pospeople.com> wrote: [some cleanup of quote line wrapping] > Jason Sewell wrote... >> If you have a bunch of debugging code sitting around in your >> working tree after you've tracked down a problem, you don't want to >> commit all of those printfs, etc. - you want to commit the >> fix. This has ramifications from making diffs of history cleaner to >> making git bisect actually useful. > One of the concerns I have with the manual pick-n-commit is that you > can forget a file or two. Consequently, unless you do a clean > checkout and test of the commit, you don't know that your > publishable version even compiles. It seems safer to commit the > entirety of your work in its working state and then do a clean > checkout from a dedicated publishable branch and manually merge the > changes in that, test, and commit. I find git status very useful in preparing a commit; untracked (and 'un-ignored') files are listed right there and I can see if there are new source files that are present but not tracked. You could even add a 'pre-commit hook' to make sure that you don't have any untracked *.c (or whatever) files before you actually make the commit. As to 'publishable' version, it's probably a good idea to run 'make distcheck' or the equivalent before making a release anyway. > It seems the intuitive model is to treat version control as applying > to the whole document, not parts of it. In this respect the document > is defined by the IDE, namely the entire solution, warts and > all. 
When you start selectively saving parts of the document then > you are doing two things, versioning and publishing; and at the same > time. This was a critical flaw in older version control approaches > because the software solution document is a file system sub-tree. I find this leads to big, shapeless commits and, as I mentioned before, it seriously limits the utility of 'git bisect'. I also fail to see how 'selectively saving parts of the document' is versioning and publishing - what is the publishing part? The act of committing is one thing (and 'saving parts of the document' is one conceivable name for it) and publishing another. Your workflow may vary, but before actually 'publishing' (perhaps pushing out to a public repo, or merging into a public branch), it's probably a good idea to test the code with whatever system you use anyway. > What you termed the debugging/printf's I would treat as a > distinctions between a debug vs. a release version that may be > suitably delineated by #define's or preferably separate unit tests > assemblies. If I must prune prior to committing; however, then it > seems reverting spurious printf's may offer a more reliable and > automatable technique than ensuring that I have added all the new > class files, resource files, text files, sub projects, etc; that may > constitute the "fix." Once so selectively reverted I can test and > commit such a publishable version. What if you are hacking away and make changes to several parts of the code at once? Making the commits as fine-grained as possible makes it easier to cherry-pick, bisect, and understand the history. As to debugging code, I admit I sometimes will use git gui or git add -p to stage just what I want and then put whatever is 'left over' in a branch that I might use again later if another bug comes up. Then I can reset --hard my 'working' branch and the debugging code is gone. > Jason Sewell wrote... 
>> Isn't fastidiously maintaining a .gitignore file to contain
>> everything you *don't* want in the project more confusing than
>> explicitly specifying things you *do* want in the project?
>
> This is git ignore for "cleaning prior to a check" and git ignore
> for "adding to index" and is not an either or. You would specify
> what you don't want to version tracked as normal but you can also
> stipulate what you don't want to be deleted during a clean restore
> (which should otherwise completely wipe the folder prior to
> restoring a specific commit). This would permit embedding
> non-version elements within the version tree for whatever reason you
> find necessary.

Perhaps I don't understand your scheme, but it sounds like you're advocating 2 .gitignores:

* .gitignore_track; with everything you don't want automatically staged but which can be trashed by your cleaning checkout
* .gitignore_keep; with things you don't want staged but which shouldn't be deleted by git during cleaning

That seems even more confusing. I'm actually having trouble seeing why you want this untracked-file nuking checkout at all. Care to give an example?

> Thomas Rast wrote...
>> That would require supernaturally good maintenance of your
>> .gitignore to avoid adding or (worse) nuking files by accident.
>
> On the contrary, the approach would all but eliminate the
> possibility of loss of data since you would not manually (and
> therefore error prone-ingly) pruning until after a commit. In fact,
> one might default automatic commits (if required) prior to checkouts
> or at least an alert system when uncommitted changes exists.

Who is pruning after a commit? One nice thing about checkout is that it will refuse to move to a different commit if there are files that will get trashed. Then you can say 'oops, I should stash/commit/nuke that stuff before I change HEAD.'

Jason

^ permalink raw reply [flat|nested] 25+ messages in thread
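The pre-commit hook idea mentioned earlier in this message could look roughly like the following. This is only a sketch under stated assumptions: the `*.c` pattern, the file names, and the wording of the error message are illustrative, and the hook is installed into a throwaway repository so it can be tried without touching real work.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Install a pre-commit hook that aborts the commit when untracked,
# non-ignored *.c files exist anywhere in the working tree.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
untracked=$(git ls-files --others --exclude-standard -- '*.c')
if [ -n "$untracked" ]; then
    echo "Untracked source files; add or ignore them first:" >&2
    echo "$untracked" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo 'int main(void) { return 0; }' > main.c
echo 'int helper(void) { return 1; }' > helper.c
git add main.c                  # helper.c has been forgotten

if git commit -q -m "Add main"; then
    result=committed
else
    result=blocked              # the hook noticed helper.c
fi
echo "$result"
```

`git ls-files --others --exclude-standard` lists files that are neither tracked nor covered by the usual ignore rules, which is the same "limbo" set that `git status` reports as untracked. Once `helper.c` is either added or ignored, the commit goes through.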
* RE: Hey - A Conceptual Simplication.... 2009-11-18 19:52 ` Jason Sewall @ 2009-11-19 2:03 ` George Dennie 2009-11-19 7:42 ` Björn Steinbrink ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: George Dennie @ 2009-11-19 2:03 UTC (permalink / raw) To: 'Jason Sewall', 'Jakub Narebski' Cc: 'Jan Krüger', git Thanks Linus, Jason, and Jakub... Linus Torvalds wrote.... >On Wed, 18 Nov 2009, George Dennie wrote: >> >> The Git model does not seem to go far enough conceptually, for some >> unexplainable reason... > > Others already mentioned this, but the concept you missed is the git 'index', which is actually very > central (it is actually the first part of git written, before even the object database) but is something > that most people who get started with git can (and do) ignore. Uhmmm, subtle. I hear you. Thanks for the heads up. But before that, I just put these two cents down... One of the persistent problems with software documentation is that it often fails to define the "functional or usage" model, apart from a dry list of commands. I am sure there are many good reasons for this. For one thing, explaining stuff is hard. Now, I have not had occasions to do merges, as such. So I am finding the justification for the index vague. I am wondering whether this might be a great space to describe the functional model of git in a way that more clearly justifies the index... Specifically, can there be a succinct description of the usage or functional model of Git that necessarily incorporates the index. For example, the functional notion of the repository seems well defined: a growing web of immutable commits each created as either an isolated commit or more typically an update and/or merger of one or more pre-existing commits. With such a description the rest of the structure becomes almost implicit: Commits may be annotated such as with release number labels. 
Commits that have not been linked to such as by an update or merger remain dangling like loose threads in the web and are called branches. Branches may be given special labels that the repository will then automatically update so as to refer to the latest commit to that branch. I don't yet have such a clear model for the index. Yes it is a staging platform, but so is the IDE....I'll do more reading. Jason Sewell wrote.... > I find this leads to big, shapeless commits and, as I mentioned before, it seriously limits the utility > of 'git bisect'. I also fail to see how 'selectively saving parts of the document' is versioning and > publishing - what is the publishing part? The act of committing is one thing (and 'saving... The notion of a shapeless commit is curious. Intuitively, I consider a commit as capturing the state of my work at a transactional boundary (i.e. a successful unit test...or even lunch break). However, your characterization of "shape" suggests that you are constructing something other than the immediate functionality of the software. Consequently, your software document is not really the solution files alone but also this commit history that you meticulously craft. Further, the participation of the IDE is not to compose within itself the committable document but rather to contribute to such a document in pieces. In fact, the closest metaphor to this process/workflow seems to be submitting articles to a magazine; except you are both the writer and editor/graphic artist; and each edition of the magazine becoming the committable version. With this metaphor the index does play a clear role as a layout board of sorts for the complete magazine. And also clearly, the IDE does not "functionally" edit the entire committable document but rather parts of it. Even though it may effectively have the entirety of the index in its working tree; Git requires that it be submitted to the index which is the true committable document. 
It begs the question, why is the working tree (the IDE document) so closely tied to the repository since it really amounts to a scratch pad? In fact, while the index may be attached to the working tree, the repository can be anywhere and have more than one index attached...yeah, I know, having a personal dedicated repository is cheap. (A great example of how expediency, the proximity of the repository, might obscure the functional model by making what is arbitrary and due to convention appear a functional necessity...; if, in fact, my above conclusion is correct of course :) > What if you are hacking away and make changes to several parts of the code at once? Making the commits > as fine-grained as possible makes it easier to cherry-pick, bisect, and understand the history. You know Jason, it is often hard to isolate my changes to specific files. I have come to appreciate unit tests as a means of delineating changes. However, clearly the historical record of your solution tree is of substantial value to you. It is something I will have to pay closer attention to in my case. > Perhaps I don't understand your scheme, but it sounds like you're advocating 2 .gitignores: > > * .gitignore_track; with everything you don't automatically staged but which can be trashed by your cleaning checkout > * .gitignore_keep; with things you don't want staged but which shouldn't be deleted by git during cleaning Yep, that may be one implementation...but essentially the current .gitignores list exclusionary filters for the "git add ." command. The suggestion was to augment it to also include exclusionary filters for the proposed "git checkout -clean" command. By perhaps prefixing "+" and "-" symbols to the listed elements you can designate each filter's participation in the "do not add" and "do not delete" activities, respectively. However, this suggestion was with the presumption that the work tree was the committable document, but clearly it is not. 
> Who is pruning after a commit? 
One nice thing about checkout is that it will refuse to move to a > different commit if there are files that will get trashed. Then you can say 'oops, I should > stash/commit/nuke that stuff before I change HEAD.' Not trashing files is a nice thing by checkout. However, are you referring to changes added to the index or changes made in the working tree but not yet added to the index? Based on my current understanding of the functional model, you would be referring to the index since the working tree is little more than a scratch pad. The pruning comment was in recognition that the working tree was not expected to be committable in its entirety. George. Thanks again for your input and if you have the time I welcome your response. ^ permalink raw reply [flat|nested] 25+ messages in thread
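On the question of whether checkout's refusal covers the index only or the working tree too, a small experiment can help. The sketch below assumes a throwaway repository with placeholder names; it makes an edit that is never added to the index and then tries to switch to a branch where the same file differs:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo v1 > notes.txt
git add notes.txt
git commit -q -m "v1"
start=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

git checkout -q -b topic
echo v2 > notes.txt
git commit -q -a -m "v2 on topic"
git checkout -q "$start"

echo "local edit" > notes.txt            # working-tree change, never staged

# Switching to topic would overwrite notes.txt, so git declines.
if git checkout -q topic 2>/dev/null; then
    outcome=switched
else
    outcome=refused
fi
echo "$outcome"
cat notes.txt
```

In this setup the switch is declined even though nothing was staged, so the protection applies to unstaged working-tree edits as well as to the index. Files that the switch would not touch are simply carried along.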
* Re: Hey - A Conceptual Simplication.... 2009-11-19 2:03 ` George Dennie @ 2009-11-19 7:42 ` Björn Steinbrink 2009-11-19 20:12 ` George Dennie 2009-11-19 10:27 ` Jakub Narebski 2009-11-20 1:48 ` Dmitry Potapov 2 siblings, 1 reply; 25+ messages in thread From: Björn Steinbrink @ 2009-11-19 7:42 UTC (permalink / raw) To: George Dennie Cc: 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git On 2009.11.18 21:03:31 -0500, George Dennie wrote: > Jason Sewell wrote.... > > I find this leads to big, shapeless commits and, as I mentioned > > before, it seriously limits the utility of 'git bisect'. I also > > fail to see how 'selectively saving parts of the document' is > > versioning and publishing - what is the publishing part? The act of > > committing is one thing (and 'saving... > > The notion of a shapeless commit is curious. Intuitively, I consider a > commit as capturing the state of my work at a transactional boundary > (i.e. a successful unit test...or even lunch break). However, your > characterization of "shape" suggest that you are constructing > something other than the immediate functionality of the software. > Consequently, your software document is not really the solution files > alone but also this commit history that you meticulously craft. Your "lunch break" as a transaction boundary is a great example of something that probably most people on this list would consider to create commits that need rewriting before publishing them. Let's take an extreme example: You work on adding a feature to some webmail site that adds colors to the mail being displayed, using different colors for the headers, quoted sections and the text from the sender. The colors should be configurable by the user. 
*work*
git commit -m "Go for a coffee"
*work*
git commit -m "Lunch break"
*work*
git commit -m "Meeting"
*work*
git commit -m "Time to go home"
*come back to work*
*work*
git commit -m "Finished the mail coloring support"

This gives you:

* Finished the mail coloring support
|
* Time to go home
|
* Meeting
|
* Lunch break
|
* Go for a coffee

Such a history is basically completely useless. It's (ab)using the VCS as a plain code dump. In a week, you'll be able to see that you had a meeting that day, but it doesn't tell you anything about what you did to the project. And even with less "insane" commit messages, the "transactional boundaries" are totally arbitrary. They're aligned to things you did that have absolutely nothing to do with the stuff you're tracking in your VCS.

A far more useful history might look like this:

* Colorize quoted text in a mail, depending on its quoting depth
|
* Parse mails into a tree structure to represent sections of quoted text
|
* Colorize mail headers
|
* Add support for the user to change the colors used for mails
|
* Add configuration variable for the colors used for mails

At each step, something functionally changed about the software. The commit messages tell you something about how the software evolved. And if you get bogus values for the colors in the configuration, you can be 90% sure, by only looking at the commit messages, that you have a bug in the "Add support for the user to change the colors ..." commit, and not in one of the others. So you can run "git show $that_commit" to see the diff of the changes you made in that commit and quickly check them for your bug.

And while that's not sooo useful for commits that added new functionality, it's extremely useful for commits that just made small changes to existing functionality. Finding a bug in a large piece of code (say 2000 lines) isn't trivial.
But if you know that a commit that changed 5 lines in that code is responsible for the breakage, all you have to do is to identify the faulty change, which is a lot easier. And with a large history, where it's not obvious in which commit something got broken, "git bisect" can help to quickly find the bad commit. Now consider "git bisect" finding your "Lunch break" commit. Looking at the commit message tells nothing. The diff is pretty much arbitrary, might be huge. Not much help. Finding the "Add support for the user to change the colors ..." commit already tells you something just because of the commit message. And the diff is about just one specific change. It's all nicely separated, and that's a huge value. Using git and producing nice commits is about _documenting_ the history of your code. And having small, self-contained and well separated commits is key to that. And the index can be a great help with that. Given the above example, you might already have some code to use the configured colors, just for testing, so things aren't so boring. Maybe even some hack-up of the code you'll be using later. If that part of the code would be committed right away, you'd mess up your commit, because it wouldn't be about a single change anymore, but would also have your testing code in there. Bad. But you don't want to throw the testing code away either, because it's useful right now, and you might need it later, because it might evolve into the final code used for the actual coloring. So, what now? You have code that you want to commit, and some code you don't want to commit, and which needs to go away temporarily, so you can test without it. No problem, here comes the index. 
Say you have:

    config.c    # Has changes for the colors
    show_mail.c # Has changes to use the colors
    whatever.c  # Has some changes for both

You do:

    git add config.c       # Add to the index
    git add -p whatever.c  # Only add some hunks to the index

So now the index has what you want to commit, and the working tree
still has everything.

    git stash save --keep-index

Now your working tree and index only have the things you want to
commit. You run your unit tests, everything's fine. You commit and get
a nice clean commit, for which you write a useful commit message.

    git stash pop

You've got your changes back that you didn't want to commit just yet,
and you can continue working.

Another use-case I have found for myself is to use the index to
separate reviewed and not-yet-reviewed changes. Before I commit, I
always review the diff of the things I'm going to commit. So I start
out with "git diff" and start reading. When I finished reviewing a
file, I can do "git add $that_file", so the diff for that file will no
longer be shown by "git diff". That nicely cuts down the size of the
"git diff" output to things I'm still interested in. Quite useful when
you are forced to do a large commit, because you did some refactoring.
If I find a bug during the review, I can fix that and re-run
"git diff", which will only show changes to me that I didn't declare as
"good" already by adding them to the index.

Sure, it takes some practice and discipline to generate a nice, useful
history. But that's not much different from writing code. Others will
hate you for writing unreadable spaghetti code, and so will they hate
you for producing a useless history that tells them that you had lunch,
instead of telling them what you did to the code ;-)

Björn
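The index-and-stash sequence described in this mail can be replayed in a throwaway repository. What follows is only a minimal sketch under stated assumptions: the file contents, the identity settings and the temporary directory are placeholders, the interactive "git add -p" step is left out so the script runs unattended, and "git stash push" is used as the current spelling of "git stash save".

```shell
# Sketch: commit only the staged change, with the testing hack stashed away.
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Baseline commit containing two of the files from the example.
echo "base config" > config.c
echo "base show_mail" > show_mail.c
git add config.c show_mail.c
git commit -q -m "Initial import"

# Work in progress: one finished change and one testing hack.
echo "color configuration variable" >> config.c
echo "testing hack that uses the colors" >> show_mail.c

git add config.c                           # stage only the finished change
git stash push -q --keep-index             # hide everything that is not staged

# While stashed, show_mail.c is back to its committed state.
hack_hidden=$(git status --porcelain -- show_mail.c)

# (run the tests here), then commit the staged change on its own.
git commit -q -m "Add configuration variable for the colors used for mails"
committed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)

git stash pop -q                           # bring the testing hack back
hack_restored=$(git status --porcelain -- show_mail.c)

echo "while stashed: '$hack_hidden'"
echo "committed:     $committed_files"
echo "after pop:     '$hack_restored'"
```

The commit ends up containing only config.c, while the uncommitted edit to show_mail.c survives the round trip through the stash.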
* RE: Hey - A Conceptual Simplication.... 2009-11-19 7:42 ` Björn Steinbrink @ 2009-11-19 20:12 ` George Dennie 2009-11-19 21:27 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: George Dennie @ 2009-11-19 20:12 UTC (permalink / raw) To: git Cc: B.Steinbrink, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', torvalds Thanks Jakub Narebski and Björn Steinbrink...Nice description Björn. I think an important piece of conceptual information missing from the docs is a concise list of the conceptual properties defining the context of the working tree, index, and repository during normal use. This itemization would go far in explaining the synergies between the various commands. Functionally, all the commands merely manipulate these properties. If these properties were summarize in context one would expect that would represent a very complete functional model of Git. A user could review the description figure what they wanted to do and then find the command(s) to accomplish it. Presently this knowledge is accreted over time as oppose to merely being read and in the space of a few minutes "groked" (of course it could be that I am particularly limited :). For example, towards a functional model, is this close? (note: all properties can be blank/empty)... 
REPOSITORIES
    Collection of Commits
    Collection of Branches
        -- collection of commits without children
        -- as a result each commits either augments
        -- and existing branch or creates a new one
    Master Branch
        -- typically the publishable development history

INDEX
    Collections of Parent/Merge Commits
        -- the commit will use all these as its parent
    Staged Commit
        -- these changes are shown relative to the working tree
    Default Branch
        -- the history the staged commit is suppose to augment
    Collection of Stashes
        -- these are not copies of the working tree since they
        -- only contain "versioned" files/folders and so is not
        -- a backup

WORKING_TREE
    Collection of Files and Folders

As far as I can tell, the working tree is not suppose to be stateful,
but it seems the commands treat it as such.

What is interesting is that branches serve to encourage a serialized
view of commits. More than structure, they are like books in a library
narrating a development story. Consequently, and interestingly, they
are as much the purpose of the repository as the commits they
organize...which is interesting.

Again, thanks for your patients.

George.
* Re: Hey - A Conceptual Simplication....
  2009-11-19 20:12 ` George Dennie
@ 2009-11-19 21:27 ` Junio C Hamano
  2009-11-20 0:49 ` Jakub Narebski
  2009-11-20 2:31 ` Dmitry Potapov
  2 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2009-11-19 21:27 UTC (permalink / raw)
  To: George Dennie
  Cc: git, B.Steinbrink, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', torvalds

"George Dennie" <gdennie@pospeople.com> writes:

> REPOSITORIES
>     Collection of Commits

Ok.

>     Collection of Branches
>         -- collection of commits without children

Wrong.

>         -- as a result each commits either augments
>         -- and existing branch or creates a new one

Ok.

>     Master Branch
>         -- typically the publishable development history

Not necessarily.

> INDEX
>     Collections of Parent/Merge Commits
>         -- the commit will use all these as its parent

Wrong.

>     Staged Commit
>         -- these changes are shown relative to the working tree

A new word for me. I doubt we need to have such a concept.

>     Default Branch
>         -- the history the staged commit is suppose to augment

We typically call it "the current branch". It is "the branch whose tip
will advance by one commit when you make a new commit" and determined
by HEAD.

>     Collection of Stashes
>         -- these are not copies of the working tree since they
>         -- only contain "versioned" files/folders and so is not
>         -- a backup

I think it is better to say what these _are_, instead of saying what
they are not. These are not yoghurt cups, these are not bicycles, these
are not knitting needles. Listing what they are not does not give you
more information.

> WORKING_TREE
>     Collection of Files and Folders

Ok.

> As far as I can tell, the working tree is not suppose to be stateful, but it
> seems the commands treat it as such.

I am not sure what you are trying to say by "stateful" here. A work
tree has files and directories, and if you edit one of the files of
course it changes its state.
----------------------------------------------------------------

A branch is just a pointer to one commit (or nothingness, if it is
unborn, but that is such a special case you do not have to worry about
yet until you understand git more). The commit can have many children,
but you do not care about them when looking at the branch, as there is
no "parent-to-children" pointer.

The pointer that represents a branch moves to another commit by
different operations.

 - If you make a new commit while on the branch, it points to the new
   commit. This is the most typical, and is done by many every-day
   commands, such as "commit", "am", "merge", "cherry-pick", "revert".
   Typically the new commit B is a direct child of the commit A the
   branch used to point at, and B has A as its first parent.

 - There are commands that let you violate the above, i.e. you can
   change what commit the branch pointer points at, and the new commit
   does not have to be a direct child of the commit currently pointed
   at by the branch. "reset" and "rebase" are examples of such commands
   and are used to rewrite the history.

There is the "current branch" that you are on. It is recorded in HEAD
(cat .git/HEAD to see it). When you create a new commit, the tip of the
branch HEAD points at is updated to point at the new commit. Since the
new commit is made a direct child of the current commit, this will
appear to the users as "advancing the branch".

The state (contents of files and symlinks together with where they are
in the tree) to be committed next is recorded in the index. "git add"
and friends are used to update this state in the index, and "git diff"
with various options allow you to view the difference between this
state and work tree or arbitrary commit.
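The three pieces named in this mail (the branch pointer, HEAD, and the state recorded in the index) can be observed directly. The script below is a minimal sketch and not part of the original message; the file name, identity settings and temporary directory are assumptions made for illustration.

```shell
# Sketch: watch the current branch advance, and compare HEAD, index and work tree.
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

head_ref=$(git symbolic-ref HEAD)          # the branch recorded in .git/HEAD
tip_before=$(git rev-parse "$head_ref")

echo "two" >> file.txt
git add file.txt                           # the index now holds the next state
echo "three" >> file.txt                   # a later edit that is not added

# Three different states of the same file (line counts):
lines_head=$(git show HEAD:file.txt | grep -c '')    # last commit
lines_index=$(git show :file.txt | grep -c '')       # to be committed next
lines_worktree=$(grep -c '' file.txt)                # what is on disk

git commit -q -m "Second commit"           # commits the index, not the work tree
tip_after=$(git rev-parse "$head_ref")
parent=$(git rev-parse "$tip_after^")
lines_committed=$(git show HEAD:file.txt | grep -c '')

echo "$head_ref moved from $tip_before to $tip_after"
echo "HEAD/index/worktree lines before commit: $lines_head/$lines_index/$lines_worktree"
```

Before the commit, "git diff --cached" would show the difference between the index and HEAD, and plain "git diff" the difference between the work tree and the index.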
* Re: Hey - A Conceptual Simplication.... 2009-11-19 20:12 ` George Dennie 2009-11-19 21:27 ` Junio C Hamano @ 2009-11-20 0:49 ` Jakub Narebski 2009-11-20 6:27 ` Junio C Hamano 2009-11-20 2:31 ` Dmitry Potapov 2 siblings, 1 reply; 25+ messages in thread From: Jakub Narebski @ 2009-11-20 0:49 UTC (permalink / raw) To: George Dennie Cc: git, B.Steinbrink, 'Jason Sewall', 'Jan Krüger', torvalds On Thu, 19 Nov 2009, George Dennie wrote: > Thanks Jakub Narebski and Björn Steinbrink...Nice description Björn. > > I think an important piece of conceptual information missing from the docs > is a concise list of the conceptual properties defining the context of the > working tree, index, and repository during normal use. This itemization > would go far in explaining the synergies between the various commands. If you didn't find sufficient description of underlying concepts behind git in "Git User's Manual" (distributed with Git), "Git Community Book" or "Pro Git", take a look at the following documents: * "Git for Computer Scientists" * "Git From Bottom's Up" * "The Git Parable" > Functionally, all the commands merely manipulate these properties. If these > properties were summarize in context one would expect that would represent a > very complete functional model of Git. A user could review the description > figure what they wanted to do and then find the command(s) to accomplish it. I disagree. While understanding underlying concepts of Git helps with finding a way to get what one wants to achieve, I don't think that the way presented here would work in practice. > Presently this knowledge is accreted over time as oppose to merely being > read and in the space of a few minutes "groked" (of course it could be that > I am particularly limited :). It is documented, see referenced mentioned above. > For example, towards a functional model, is this close? (note: all > properties can be blank/empty)... 
> > REPOSITORIES > Collection of Commits Direct Acyclic Graph of Commits, where edges in graph point from commit to zero or more its parents. > Collection of Branches > -- collection of commits without children Errr... what? Commit doesn't *have* [pointer to] children. Also branch can point to commit for which there exists other commit which has given commit as parent (up-to-date or fast-forward situation, e.g.) a---b---c <--- branch_a \ \-d---e <--- branch_b Branches (or branch heads / branch tips) are named references into DAG of commits, points where DAG of commits grow. > -- as a result each commits either augments > -- and existing branch or creates a new one Commits do not create a new branch. New commits must be crated on existing branch (or on unnamed branch aka detached HEAD, but that is advanced usage). > Master Branch > -- typically the publishable development history TANSTAAMB. There ain't such thing as a master branch. ;-))))) Well, at least not in a sense of there being a branch that is a trunk branch distinguished by _technical_ means. > > INDEX > Collections of Parent/Merge Commits > -- the commit will use all these as its parent No. The index is set of versions of files (blobs) that would go as a contents (tree) of a next commit (if you use "git commit', not "git commit -a"). > > Staged Commit > -- these changes are shown relative to the working tree Errr.... what? > > Default Branch > -- the history the staged commit is suppose to augment Errr... what? If by "default branch" you mean "current branch", it is currently checked out branch, where new commit would go, pointed by HEAD symbolic reference. > WORKING_TREE > Collection of Files and Folders > > > As far as I can tell, the working tree is not suppose to be stateful, but it > seems the commands treat it as such. Stateful? Working tree / working area is a working area. 
It can be disconnected from repository via core.worktree, --work-tree
option and GIT_WORK_TREE environment, see also
contrib/workdir/git-new-workdir

> Again, thanks for your patients.

patience.

--
Jakub Narebski
Poland
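Jakub's point that a branch may point at a commit which already has children (the fast-forward situation in his a/b/c/d/e diagram) can be reproduced with empty commits. This is a minimal sketch rather than anything from the original mail: it assumes branch_b forks off at commit c, and the identity settings and temporary directory are placeholders.

```shell
# Sketch: branch_a points at c, which already has a child (d) on branch_b.
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git symbolic-ref HEAD refs/heads/branch_a  # name the initial, still unborn branch

git commit -q --allow-empty -m a
git commit -q --allow-empty -m b
git commit -q --allow-empty -m c

git checkout -q -b branch_b                # branch_b starts at c
git commit -q --allow-empty -m d
git commit -q --allow-empty -m e

tip_a_before=$(git log -1 --format=%s branch_a)   # still "c"

# branch_a is an ancestor of branch_b, so merging is a pure fast-forward.
if git merge-base --is-ancestor branch_a branch_b; then ff=yes; else ff=no; fi

git checkout -q branch_a
git merge -q --ff-only branch_b            # only moves the branch_a pointer
tip_a_after=$(git log -1 --format=%s branch_a)

echo "branch_a: $tip_a_before -> $tip_a_after (fast-forward: $ff)"
git log --graph --oneline --all
```

No new commit is created by the merge; the named reference branch_a simply moves along the existing graph from c to e.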
* Re: Hey - A Conceptual Simplication.... 2009-11-20 0:49 ` Jakub Narebski @ 2009-11-20 6:27 ` Junio C Hamano 0 siblings, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2009-11-20 6:27 UTC (permalink / raw) To: Jakub Narebski Cc: George Dennie, git, B.Steinbrink, 'Jason Sewall', 'Jan Krüger', torvalds Jakub Narebski <jnareb@gmail.com> writes: > If you didn't find sufficient description of underlying concepts behind > git in "Git User's Manual" (distributed with Git), "Git Community Book" > or "Pro Git", take a look at the following documents: > > * "Git for Computer Scientists" > * "Git From Bottom's Up" > * "The Git Parable" > ... > It is documented, see referenced mentioned above. I actually would want ourselves step back a bit and make sure that anybody who is completely new to git won't get confused with the concepts after s/he reads our "Git User's Manual" and nothing else. Listing five or six documents and "you'll find information somewhere among these" *might* be the best thing we could do at this very second, but we should strive to do better than that. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-19 20:12 ` George Dennie 2009-11-19 21:27 ` Junio C Hamano 2009-11-20 0:49 ` Jakub Narebski @ 2009-11-20 2:31 ` Dmitry Potapov 2 siblings, 0 replies; 25+ messages in thread From: Dmitry Potapov @ 2009-11-20 2:31 UTC (permalink / raw) To: George Dennie Cc: git, B.Steinbrink, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', torvalds On Thu, Nov 19, 2009 at 03:12:35PM -0500, George Dennie wrote: > > I think an important piece of conceptual information missing from the docs > is a concise list of the conceptual properties defining the context of the > working tree, index, and repository during normal use. This itemization > would go far in explaining the synergies between the various commands. Speaking about "normal use"... I suggest you read about Git workflows: $ git help gitworkflows > > Functionally, all the commands merely manipulate these properties. If these > properties were summarize in context one would expect that would represent a > very complete functional model of Git. A user could review the description > figure what they wanted to do and then find the command(s) to accomplish it. It is like to say that driving a car merely means to manipulate its components, so if these components were summarized, it would be all that one needs to know to drive a car... While I don't dispute that basic understanding of key Git concepts is important, understanding of a typical Git workflow cannot be deduced from knowledge of separate parts. Now if I were to describe Git just in a few words, I would say that Git repository is just a DAG of objects, the working tree is the place where you work, and the index is what helps you to create fine-grained commits and do merges. But it says very little (if anything) about how to use it. Dmitry ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-19 2:03 ` George Dennie 2009-11-19 7:42 ` Björn Steinbrink @ 2009-11-19 10:27 ` Jakub Narebski 2009-11-20 1:48 ` Dmitry Potapov 2 siblings, 0 replies; 25+ messages in thread From: Jakub Narebski @ 2009-11-19 10:27 UTC (permalink / raw) To: George Dennie; +Cc: 'Jason Sewall', 'Jan Krüger', git Side-note: you are employing very strange line wrapping... you should word wrap your lines so they do not exceed 70-76 characters, and you should not (except when required for readability) rewrap quoted text. On Thu, 19 Nov 2009, George Dennie wrote: > Thanks Linus, Jason, and Jakub... > > Linus Torvalds wrote.... >>On Wed, 18 Nov 2009, George Dennie wrote: >>> >>> The Git model does not seem to go far enough conceptually, for some >>> unexplainable reason... >> >> Others already mentioned this, but the concept you missed is the git >> 'index', which is actually very central (it is actually the first >> part of git written, before even the object database) but is >> something that most people who get started with git can (and do) >> ignore. > > Uhmmm, subtle. I hear you. Thanks for the heads up. But before that, > I just put these two cents down... > [...] Now, I have not had occasions to do merges, as such. So I am > finding the justification for the index vague. [...] Errr... you didn't do any merges? What is then your experience with using version control, then? As for using index during merge: merge is joining two (or more) lines of history (lines of development), bringing contents of another branch into current branch. Some of changes are independent, for example if one branch changes one file, and other branch changed other file. This is so called trivial merge, example of tree-level merge. Even if branches merged touch the same file, if changes were made in separate sections of file git can merge changes (using three-way merge / diff3 algorithm). 
The problem starts if there are changes which touch the same sections of a file. This generates so called merge conflict (contents conflict), and you have to resolve such conflict manually. During merge index helps to manage information about yet unmerged parts. Let's assume for example that you made a mistake in merge resolution in some file, and you want to scratch your attempt and try it anew. Without index it would be very hard to do without trashing resolutions of other conflicts. > For example, the functional notion of the repository seems well > defined: a growing web of immutable commits each created as either > an isolated commit or more typically an update and/or merger of > one or more pre-existing commits. If by "web" you mean DAG (Directed Acyclic Graph) of commits, then yes, it is _part_ of repository. There are also refs (branches, tags, remote-tracking branches), which are also part of repository, very important part. Those are named references into DAG of commits. As to commits being created as update of existing commit or from scratch: that would depend on the way of development. Merge commits are much, much more rare than ordinary commits (especially that git favors fast-forwards by default when there is no need for merge). > > With such a description the rest of the structure becomes almost > implicit: Commits may be annotated such as with release number labels. > Commits that have not been linked to such as by an update or merger > remain dangling like loose threads in the web and are called branches. > Branches may be given special labels that the repository will then > automatically update so as to refer to the latest commit to that > branch. Almost right. > I don't yet have such a clear model for the index. Yes it is a staging > platform, but so is the IDE....I'll do more reading. The index is area where you prepare commits, if needed. 
But you don't need to care that there is something like the index; you
can prepare your commits in the working area alone. But when you need
it, it is there.

--
Jakub Narebski
Poland
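How the index tracks unmerged parts during a conflicted merge, and how one file's resolution can be scratched and retried, can be sketched as follows. This is a minimal illustration and not part of the original mail; the file name, branch name "topic", identity settings and temporary directory are assumptions.

```shell
# Sketch: a content conflict leaves three stages in the index; retry one file.
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "original line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
base_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo "topic version" > notes.txt
git commit -q -a -m "Change notes on topic"

git checkout -q "$base_branch"
echo "mainline version" > notes.txt
git commit -q -a -m "Change notes on mainline"

# Both sides changed the same line, so the merge stops with a conflict.
git merge topic >/dev/null 2>&1 || true

# Unmerged index entries: stage 1 = common ancestor, 2 = ours, 3 = theirs.
stages=$(git ls-files -u -- notes.txt | awk '{ printf "%s", $3 }')

echo "botched resolution" > notes.txt      # a resolution attempt we regret
git checkout -m -- notes.txt               # recreate the conflict for this path only
markers=$(grep -c '^<<<<<<< ' notes.txt)

echo "stages in index: $stages, conflict blocks restored: $markers"
```

Because the retry is limited to one path, resolutions already made in other conflicted files would be left alone.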
* Re: Hey - A Conceptual Simplication.... 2009-11-19 2:03 ` George Dennie 2009-11-19 7:42 ` Björn Steinbrink 2009-11-19 10:27 ` Jakub Narebski @ 2009-11-20 1:48 ` Dmitry Potapov 2009-11-20 1:55 ` david 2009-11-20 2:35 ` Björn Steinbrink 2 siblings, 2 replies; 25+ messages in thread From: Dmitry Potapov @ 2009-11-20 1:48 UTC (permalink / raw) To: George Dennie Cc: 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote: > > For example, the functional notion of the repository seems well > defined: a growing web of immutable commits each created as either an > isolated commit or more typically an update and/or merger of one or > more pre-existing commits. In Git, commits are not immutable. One thing that many Git users do is git-rebase, which in essense is re-writing or re-ordering exising commits. So, you can change history in Git, but you should never change the published history. (Of course, that leads to the question what is considered as published history. For instance, commits merged on the proposed-updates branch are usually not considered to be "published", so they can be re-written or discarded later). So, the correct way to use Git is to find the right balance between the need to clean up after mistakes (using git-rebase) and not doing too much, so you will not lose important history or create problems for other peoples. > > The notion of a shapeless commit is curious. Intuitively, I consider a > commit as capturing the state of my work at a transactional boundary > (i.e. a successful unit test...or even lunch break). No, it is not what Git commits were intended for. In Git, a commit is a change intended to achieve some goal. Basically, you send a patch to maintainer, and you should explain what this patch does and why it is useful... If your explanation is "I have a lunch break now", it is very bad explanation, thus a bad patch. Dmitry ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-20 1:48 ` Dmitry Potapov @ 2009-11-20 1:55 ` david 2009-11-20 2:56 ` Dmitry Potapov 2009-11-20 2:35 ` Björn Steinbrink 1 sibling, 1 reply; 25+ messages in thread From: david @ 2009-11-20 1:55 UTC (permalink / raw) To: Dmitry Potapov Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git On Fri, 20 Nov 2009, Dmitry Potapov wrote: > On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote: >> >> For example, the functional notion of the repository seems well >> defined: a growing web of immutable commits each created as either an >> isolated commit or more typically an update and/or merger of one or >> more pre-existing commits. > > In Git, commits are not immutable. One thing that many Git users do > is git-rebase, which in essense is re-writing or re-ordering exising > commits. So, you can change history in Git, but you should never change > the published history. (Of course, that leads to the question what is > considered as published history. For instance, commits merged on the > proposed-updates branch are usually not considered to be "published", > so they can be re-written or discarded later). > > So, the correct way to use Git is to find the right balance between > the need to clean up after mistakes (using git-rebase) and not doing > too much, so you will not lose important history or create problems > for other peoples. the typical advice is to clean up before you make changes public, but not afterwords. David Lang >> >> The notion of a shapeless commit is curious. Intuitively, I consider a >> commit as capturing the state of my work at a transactional boundary >> (i.e. a successful unit test...or even lunch break). > > No, it is not what Git commits were intended for. In Git, a commit is > a change intended to achieve some goal. Basically, you send a patch > to maintainer, and you should explain what this patch does and why it > is useful... 
> If your explanation is "I have a lunch break now", it is
> very bad explanation, thus a bad patch.
>
>
> Dmitry
* Re: Hey - A Conceptual Simplication.... 2009-11-20 1:55 ` david @ 2009-11-20 2:56 ` Dmitry Potapov 0 siblings, 0 replies; 25+ messages in thread From: Dmitry Potapov @ 2009-11-20 2:56 UTC (permalink / raw) To: david Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git On Thu, Nov 19, 2009 at 05:55:21PM -0800, david@lang.hm wrote: > On Fri, 20 Nov 2009, Dmitry Potapov wrote: > >> So, the correct way to use Git is to find the right balance between >> the need to clean up after mistakes (using git-rebase) and not doing >> too much, so you will not lose important history or create problems >> for other peoples. > > the typical advice is to clean up before you make changes public, but not > afterwords. True, except patches may get additional clean up or improvements based on review feedback, or even get some small fix-ups while they live on 'pu'. But re-writing something that other people may base their work on is clearly wrong. On the other hand, rebasing a large series of patches even if it has never been published may be a wrong way to go, because you replace well tested states with some others, which were not tested. So if it is a long and complex series of patches, chances are high that you can break something in it. So, it requires some judgement when to use git-rebase and when git-merge. Dmitry ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication.... 2009-11-20 1:48 ` Dmitry Potapov 2009-11-20 1:55 ` david @ 2009-11-20 2:35 ` Björn Steinbrink 2009-11-20 3:08 ` Dmitry Potapov 1 sibling, 1 reply; 25+ messages in thread From: Björn Steinbrink @ 2009-11-20 2:35 UTC (permalink / raw) To: Dmitry Potapov Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git On 2009.11.20 04:48:44 +0300, Dmitry Potapov wrote: > On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote: > > > > For example, the functional notion of the repository seems well > > defined: a growing web of immutable commits each created as either an > > isolated commit or more typically an update and/or merger of one or > > more pre-existing commits. > > In Git, commits are not immutable. Commit _are_ immutable. Like all git objects (blob, tree, commits, tag). "Rewriting" history actually means creating a new history (adding objects), and then changing a ref (most often a branch head) to reference the new instead of the old history. Björn ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication....
  2009-11-20 2:35 ` Björn Steinbrink
@ 2009-11-20 3:08 ` Dmitry Potapov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20 3:08 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski', 'Jan Krüger', git

On Fri, Nov 20, 2009 at 03:35:40AM +0100, Björn Steinbrink wrote:
> On 2009.11.20 04:48:44 +0300, Dmitry Potapov wrote:
> > On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote:
> > >
> > > For example, the functional notion of the repository seems well
> > > defined: a growing web of immutable commits each created as either an
> > > isolated commit or more typically an update and/or merger of one or
> > > more pre-existing commits.
> >
> > In Git, commits are not immutable.
>
> Commit _are_ immutable. Like all git objects (blob, tree, commits, tag).
> "Rewriting" history actually means creating a new history (adding
> objects), and then changing a ref (most often a branch head) to
> reference the new instead of the old history.

I stand corrected. All objects in a Git repository are actually
immutable, but because references can be changed (and tools like
git-rebase change them automatically), it _appears_ as if existing
commits were edited. In fact, old commits do not disappear immediately.
Even if no other branches or tags refer to the old commits, the reflog
keeps references to them for 30 days by default; only after that can
the garbage collector remove them.

Dmitry
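The conclusion reached here (rewriting creates new objects and moves a ref, while the old commit stays reachable through the reflog) can be checked with "git commit --amend". The script is a minimal sketch, not part of the original exchange; the file name, commit messages, identity settings and temporary directory are assumptions.

```shell
# Sketch: amending creates a new commit; the old one is still in the object store.
set -eu
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Original message"
old=$(git rev-parse HEAD)

# "Rewrite" the commit: a new commit object is written and the branch moves.
git commit -q --amend -m "Better message"
new=$(git rev-parse HEAD)

old_type=$(git cat-file -t "$old")                 # the old object still exists
old_subject=$(git log -1 --format=%s "$old")       # and is unchanged
in_reflog=$(git reflog --format=%H | grep -c "$old")

echo "old: $old ($old_subject)"
echo "new: $new"
echo "old commit listed in reflog: $in_reflog time(s)"
```

The same holds for "git rebase": the branch ends up pointing at freshly created commits, and the previous tip can be recovered from the reflog until it expires.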
* Re: Hey - A Conceptual Simplication.... 2009-11-18 18:51 ` George Dennie 2009-11-18 19:40 ` Jakub Narebski @ 2009-11-20 1:35 ` Dmitry Potapov 2009-11-20 6:33 ` Junio C Hamano 1 sibling, 1 reply; 25+ messages in thread From: Dmitry Potapov @ 2009-11-20 1:35 UTC (permalink / raw) To: George Dennie; +Cc: 'Jan Krüger', git On Wed, Nov 18, 2009 at 01:51:56PM -0500, George Dennie wrote: > > One of the concerns I have with the manual pick-n-commit is that you can > forget a file or two. It is more difficult to make this mistake with Git than many others VCSes, because Git shows the list of files that are changed but not committed as well as the list of untracked files when you try to commit something. So, it has never been a real issue for me in practice... > Consequently, unless you do a clean checkout and test > of the commit, you don't know that your publishable version even compiles. If you want to be sure that clean checkout will be compiled, the only way to guarantee that is to do a clean checkout. Even if you commit all files except those that are specified in .gitignore, it is not enough to be sure that a clean checkout will be compiled... But in most cases, you do not need to do that to be *reasonable* sure that a clean checkout will be compiled later, and if you have any doubts, you can do a clean checkout and testing _after_ committing your changes. There is no reason to be afraid to commit something that may not work if you can amend that later (until you publish your changes). > It seems safer to commit the entirety of your work in its working state and > then do a clean checkout from a dedicated publishable branch and manually > merge the changes in that, test, and commit. Maybe I did not understand your words, but I am not sure what is gained in this way... Clearly there is no reason to publish a work that you have not tested yet. And no one cares about crap that you keep in your working tree either... 
So, a better approach is to commit your changes as a series of patches that can be reviewed easily, then do all testing and then publish them for integration with the main development branch. > > It seems the intuitive model is to treat version control as applying to the > whole document, not parts of it. In this respect the document is defined by > the IDE, namely the entire solution, warts and all. This is a very bogus idea. If you want to preserve all warts etc, you just do backup of the whole disk and now you have a state that can be compiled any time later (provided that your hardware do not change too much). In my experience, in most cases when I was not able to compile an old version were caused not by forgetting to commit something, but changing in the environment (like new compiler, new libraries, etc). But when your commits are fine-grained, you can always cherry-pick the corresponding fix-up and compile this old version if it is necessary. In my experience, the value of VCS history is the ability to look at it (sometimes many years later) and understand who wrote this line and why. Also, nearly all cases when I had to compile some old version were due to bisecting some tricky bug. In both cases, having fine-grained commits was crucial to success. > When you start > selectively saving parts of the document then you are doing two things, > versioning and publishing; and at the same time. No, you don't. Committing some changes and publishing them are two separated operations in Git, and that it is pretty much fundamental. Normally, you commit changes in a few separated patches, review them to make sure that changes match commit messages, do all testing, and only then you publish them. Dmitry ^ permalink raw reply [flat|nested] 25+ messages in thread
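Dmitry's suggestion to test a clean checkout after committing, and to amend if something was forgotten, might look like the sketch below. It is an illustration under stated assumptions rather than a prescribed procedure: the file names, the directory layout and the identity settings are invented, and a local "git clone" stands in for the clean checkout.

```shell
# Sketch: find a forgotten file by testing a clean clone, then amend before publishing.
set -eu
work=$(mktemp -d)                          # scratch area (assumption)
cd "$work"
git init -q project
cd project
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "main code" > main.c
echo "helper code" > helper.c
git add main.c                             # helper.c is forgotten
git commit -q -m "Add main program"

# Git already lists the forgotten file as untracked.
untracked=$(git status --porcelain --untracked-files=all)

# A clean checkout contains only what was committed.
git clone -q . "$work/clean"
if [ -e "$work/clean/helper.c" ]; then missing=no; else missing=yes; fi

# Nothing has been published yet, so the commit can simply be amended.
git add helper.c
git commit -q --amend --no-edit
git clone -q . "$work/clean2"

echo "untracked before amend: $untracked"
echo "helper.c missing from first clean checkout: $missing"
```

A real project would run its build and test suite inside the clean checkout instead of only checking that a file exists.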
* Re: Hey - A Conceptual Simplication.... 2009-11-20 1:35 ` Dmitry Potapov @ 2009-11-20 6:33 ` Junio C Hamano 2009-11-20 15:07 ` Dmitry Potapov 0 siblings, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2009-11-20 6:33 UTC (permalink / raw) To: Dmitry Potapov; +Cc: George Dennie, 'Jan Krüger', git Dmitry Potapov <dpotapov@gmail.com> writes: > It is more difficult to make this mistake with Git than many others > VCSes, because Git shows the list of files that are changed but not > committed as well as the list of untracked files when you try to commit > something. Not really in practice. Too many people carry their existing practice of using -m to write a useless single liner commit log message that they acquired while using their previous SCM. Arguably, useless log messages are less of a problem on systems like CVS/SVN because they do not do useful log summarization such as "log -- paths..." or "shortlog", so they can be excused for learning the practice in the first place, though. That incidentally is exactly why earlier we (mostly me and Linus) recommended people not to teach "commit -m" to new people, but of course nobody listened ;-). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Hey - A Conceptual Simplication....
  2009-11-20  6:33 ` Junio C Hamano
@ 2009-11-20 15:07 ` Dmitry Potapov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20 15:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: George Dennie, 'Jan Krüger', git

On Thu, Nov 19, 2009 at 10:33:05PM -0800, Junio C Hamano wrote:
> Dmitry Potapov <dpotapov@gmail.com> writes:
>
> > It is more difficult to make this mistake with Git than many others
> > VCSes, because Git shows the list of files that are changed but not
> > committed as well as the list of untracked files when you try to commit
> > something.
>
> Not really in practice. Too many people carry their existing practice of
> using -m to write a useless single liner commit log message that they
> acquired while using their previous SCM.

Well, at least Git allows you to avoid this mistake and produce good
commit messages, but you are right, it is difficult to break old bad
habits...

> Arguably, useless log messages
> are less of a problem on systems like CVS/SVN because they do not do
> useful log summarization such as "log -- paths..." or "shortlog", so they
> can be excused for learning the practice in the first place, though.

I think quite often commits in CVS/SVN cannot be summarized, because a
single commit often contains what would be a short series of patches in
Git plus a few separate fix-ups that are completely unrelated to the
whole series. It is trivial to split your changes into a few separate
commits in Git, but it is difficult to do that with CVS/SVN.

> That incidentally is exactly why earlier we (mostly me and Linus)
> recommended people not to teach "commit -m" to new people, but of course
> nobody listened ;-).

Those who got used to '-m' in another VCS will quickly find it on their
own... BTW, the Git User's Manual uses "git commit -m" 8 times in
different examples, largely to explain what is committed here, and I
think it is similar with other introductions to Git.

Though, clearly '-m' is rarely useful in practice...

Dmitry
* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
  2009-11-18 13:18 ` Jonathan del Strother
  2009-11-18 13:25 ` Jan Krüger
@ 2009-11-18 13:30 ` Thomas Rast
  2009-11-18 13:31 ` Jason Sewall
  2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Thomas Rast @ 2009-11-18 13:30 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

George Dennie wrote:
>
> Instead, Git is treating a manually maintained list of files within the
> working tree as the versioned document, this list being initialized and
> manually amended by the "Git add/rm/mv" commands, etc.

This feature is called the "index", and is not merely a list of the
files, but also their content. Please read

  http://tomayko.com/writings/the-thing-about-git

for a nice explanation why this is a good and useful thing.

> "Git commit -x" -- performs a "Git add ." then a "Git commit"
> "Git checkout -x" -- that clean the working tree prior to perform a checkout

That would require supernaturally good maintenance of your .gitignore to
avoid adding or (worse) nuking files by accident.

--
Thomas Rast
trast@{inf,student}.ethz.ch
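
[Editorial illustration] The risk described above, that cleaning the
working tree would delete any file not covered by .gitignore, can be
previewed safely with the dry-run mode of the existing "git clean"
command. This is a minimal sketch, not the proposed "checkout -x": the
file names are invented for the example, and nothing is deleted because
only "git clean -n" is run.

```shell
#!/bin/sh
# Sketch: what a "clean before checkout" would delete, shown as a dry run.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"

# Track a source file and an ignore rule for build output.
printf '*.o\n' > .gitignore
printf 'int main(void) { return 0; }\n' > main.c
git add .gitignore main.c
git commit -q -m "Track source and ignore rules"

# main.o matches the ignore rule; notes.txt is untracked and NOT ignored.
printf 'build output\n' > main.o
printf 'an idea I have not saved anywhere else\n' > notes.txt

# -n is a dry run: list what would be removed, delete nothing.
would_remove=$(git clean -n)
echo "$would_remove"
```

Here notes.txt is listed for removal because nobody remembered to add it
to .gitignore, while main.o is spared by the ignore rule. Running
"git clean -f" would really delete notes.txt, and adding "-x" would
delete the ignored main.o as well.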
* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
                   ` (2 preceding siblings ...)
  2009-11-18 13:30 ` Thomas Rast
@ 2009-11-18 13:31 ` Jason Sewall
  2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Jason Sewall @ 2009-11-18 13:31 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

On Wed, Nov 18, 2009 at 7:55 AM, George Dennie <gdennie@pospeople.com> wrote:
>
> In particular, why is Git not treating the entire working tree as the
> versioned document (qualified of course by the .gitignore file).
>
> Instead, Git is treating a manually maintained list of files within the
> working tree as the versioned document, this list being initialized and
> manually amended by the "Git add/rm/mv" commands, etc.

Isn't fastidiously maintaining a .gitignore file to contain everything
you *don't* want in the project more confusing than explicitly
specifying things you *do* want in the project?

> The result is conceptual complexity and rather counter-intuitive behavior.
> For example, adding and renaming files outside of Git is not considered
> editing the version until you subsequently do a "Git Add ." Contrast that
> with editing or deleting files outside of Git. Yet adding and renaming files
> and folders is a significant part of substantive projects, especially in the
> early stages and experimental branches.
>
> Granted, this is not a big deal functionally, but what is being lost is
> conceptual simplicity (and consistency, in my book) and conceptual
> simplicity is a key value point, if not THE key.

In fact, it's a big deal in functionality, but the utility is in being
able to specify exactly what I want to be part of each commit.

One of git's great features is the ability to specify *exactly* what you
want to be part of each commit, down to the line. This means that each
commit can be extremely fine grained and represent specific bug fixes
and/or features.

If you have a bunch of debugging code sitting around in your working
tree after you've tracked down a problem, you don't want to commit all
of those printfs, etc. - you want to commit the fix. This has
ramifications from making diffs of history cleaner to making git bisect
actually useful.

> Also can we augment checkout to totally CLEAN the working directory prior to
> a restore. If necessary we can augment .gitignore to stipulate those files
> or folders that should be excluded from the cleaning. This suggestion is in
> recognition of the fact that if you are not versioning the file, it is
> typically trash; which becomes the case when the entire working treat is
> treated as the versioned document.

This is even worse. It's already pretty easy to trash your working
directory by reflexively typing git checkout -f, and you want to

> Consequently, I recommend the following new commands:
> "Git commit -x" -- performs a "Git add ." then a "Git commit"
> "Git checkout -x" -- that clean the working tree prior to perform a
> checkout

I see that Jan has replied with some loaded guns, *ahem* aliases. Go
ahead and use them, but I recommend you look at the diffs in git.git or
some other repository that takes advantage of making commits as compact
as possible, and learn how to use git add -p.

Jason
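
[Editorial illustration] Committing the fix while leaving debugging
leftovers out of history, as described above, might look like the
following minimal sketch. The file names and commit messages are
invented for the example. To stay non-interactive it stages a whole
file with "git add <path>"; "git add -p", which the message recommends,
makes the same choice interactively, hunk by hunk.

```shell
#!/bin/sh
# Sketch: only what is staged in the index goes into the commit.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"

printf 'int main(void) { return 1; }\n' > main.c
printf 'scratch\n' > notes.txt
git add main.c notes.txt
git commit -q -m "Initial import"

# Two unrelated edits now sit in the working tree:
# the real fix in main.c and a debugging leftover in notes.txt.
printf 'int main(void) { return 0; }\n' > main.c
printf 'scratch\nDEBUG: tried a printf here\n' > notes.txt

# Stage and commit only the fix.
git add main.c
git commit -q -m "Return 0 from main on success"

# Files touched by the new commit, and files still modified but uncommitted.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git diff --name-only)

echo "committed: $committed"
echo "still uncommitted: $pending"
```

The debugging edit stays in the working tree, where it can be discarded
or committed separately later, and the recorded commit contains nothing
but the fix.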
* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
                   ` (3 preceding siblings ...)
  2009-11-18 13:31 ` Jason Sewall
@ 2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2009-11-18 20:36 UTC (permalink / raw)
  To: George Dennie; +Cc: git

On Wed, 18 Nov 2009, George Dennie wrote:
>
> The Git model does not seem to go far enough conceptually, for some
> unexplainable reason...

Others already mentioned this, but the concept you missed is the git
'index', which is actually very central (it is actually the first part
of git written, before even the object database) but is something that
most people who get started with git can (and do) ignore.

Now, admittedly, for casual use it's not always clear _why_ the index is
so central, so the fact that you overlooked it is certainly easy to
understand. Just take my word for it: to truly understand git, you do
need to understand the index. You can ignore it for a long time, because
one of the primary reasons for it existing is about performance. That
happens to be a primary goal of git, of course, but some people always
think it's "just performance". It's way more fundamental than that.

So the way you can start getting used to the index is to think of it as
a way to avoid having to do a full 'readdir()' on the whole tree to
figure out what is in there, and avoiding having to read all the files
to check that their contents still match.

Of course, if that was _all_ the index did, it could be seen purely as a
cache, and have no semantic visibility at all. And that's not the case:
the index does have real semantic visibility.

The first time you'll see it is when you decide to stage your changes in
parts. The index is what allows you to _not_ always commit all your
changes exactly because git keeps track of something more than _just_
your whole current working tree.

A special case (but a really useful one) of the "staging your changes in
parts" is when you do merges. Now, most people don't do merges like I do
(what, average of 5 merges per day, day in and day out), so most people
don't care quite as deeply as I do, but if you ever do a merge where 99%
merged cleanly, and 1% did not (which is the common case for conflicts),
you'll really understand why having a system that keeps track of the
parts that merged cleanly is _critical_.

So for merges, the index keeps track of what merged cleanly, and what
didn't, and what the original state for the not-clean stuff was. And as
somebody who probably does more merges than likely any other human in
the history of the world, I can state with some authority that any
source control model that doesn't have this is fundamentally broken.

So the index is really _really_ important. Even if you can ignore it
most of the time. And the index is why you don't have a model of "always
just track the exact tree state".

		Linus
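
[Editorial illustration] The way the index records both the cleanly
merged paths and the three versions of a conflicting path can be
observed with a small experiment. This is a minimal sketch under
assumptions: the branch name "side", the file names and the commit
messages are invented, and the conflict is deliberately trivial.

```shell
#!/bin/sh
# Sketch: after a conflicting merge, inspect what the index keeps track of.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"

printf 'shared line\n' > conflict.txt
printf 'original\n' > clean.txt
git add conflict.txt clean.txt
git commit -q -m "Base version"
trunk=$(git symbolic-ref --short HEAD)

# The side branch changes both files.
git checkout -q -b side
printf 'side version\n' > conflict.txt
printf 'changed on side only\n' > clean.txt
git commit -q -a -m "Side changes both files"

# The original branch changes only conflict.txt, in a different way.
git checkout -q "$trunk"
printf 'trunk version\n' > conflict.txt
git commit -q -a -m "Trunk changes conflict.txt"

# The merge stops with a conflict in conflict.txt; clean.txt merges cleanly.
git merge side >/dev/null 2>&1 || true

# Unmerged paths appear in the index at stages 1 (base), 2 (ours), 3 (theirs).
git ls-files -u
conflict_stages=$(git ls-files -u conflict.txt | wc -l | tr -d ' ')

# The cleanly merged path has no unmerged entries; it is already resolved.
clean_unmerged=$(git ls-files -u clean.txt | wc -l | tr -d ' ')

echo "index entries for conflict.txt: $conflict_stages"
echo "unmerged entries for clean.txt: $clean_unmerged"
```

Only conflict.txt needs attention: its base, "ours" and "theirs"
versions are all kept in the index, while clean.txt has already been
merged and staged without any manual step.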
end of thread, other threads:[~2009-11-20 15:08 UTC | newest]

Thread overview: 25+ messages
2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
2009-11-18 13:18 ` Jonathan del Strother
2009-11-18 13:25 ` Jan Krüger
2009-11-18 18:51 ` George Dennie
2009-11-18 19:40 ` Jakub Narebski
2009-11-18 19:52 ` Jason Sewall
2009-11-19  2:03 ` George Dennie
2009-11-19  7:42 ` Björn Steinbrink
2009-11-19 20:12 ` George Dennie
2009-11-19 21:27 ` Junio C Hamano
2009-11-20  0:49 ` Jakub Narebski
2009-11-20  6:27 ` Junio C Hamano
2009-11-20  2:31 ` Dmitry Potapov
2009-11-19 10:27 ` Jakub Narebski
2009-11-20  1:48 ` Dmitry Potapov
2009-11-20  1:55 ` david
2009-11-20  2:56 ` Dmitry Potapov
2009-11-20  2:35 ` Björn Steinbrink
2009-11-20  3:08 ` Dmitry Potapov
2009-11-20  1:35 ` Dmitry Potapov
2009-11-20  6:33 ` Junio C Hamano
2009-11-20 15:07 ` Dmitry Potapov
2009-11-18 13:30 ` Thomas Rast
2009-11-18 13:31 ` Jason Sewall
2009-11-18 20:36 ` Linus Torvalds