* [RFC] git-add update with all-0 object
@ 2006-11-30 22:08 Daniel Barkalow
  2006-11-30 22:32 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Daniel Barkalow @ 2006-11-30 22:08 UTC
To: git; +Cc: Junio C Hamano

One thing that I think is non-intuitive to a lot of users (either novice or who just don't do it much) is that it matters where in the process you do "git add <path>" if you're also changing the file. Even if you understand the index, you may not realize (or may not have internalized the fact) that what git-add does is update the index with what's there now.

I think the more obvious behavior is to have it record the fact that you want to have the path tracked, but require one of the usual updating mechanisms to get a particular content into the index.

This should be pretty simple to implement: use --cacheinfo 0 0 $path instead of --add -- $path, and teach programs that look at the objects recorded in the index (rather than just hashes or other info) about all-0 hashes meaning "but no content there". write-tree would probably just skip the entry (and then you could add a file, but still produce commits without it until you actually do either an update-index explicitly or one of the commit option sets that updates it); diff would treat it as empty; checkout would ignore it.

    -Daniel
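
The proposal above can be pictured with a small toy model. This is only a sketch in Python, not git's code: the index is assumed to be a plain path -> object-id mapping, and the names NULL_ID, intent_to_add, update_index and write_tree are made up for illustration. The one real detail is how a blob's id is computed.

```python
import hashlib

# Hypothetical placeholder id: 40 zeros standing for "tracked, no content yet".
NULL_ID = "0" * 40


def blob_id(data: bytes) -> str:
    """Id git gives a blob: SHA-1 over a 'blob <size>' header, a NUL byte, then the data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def intent_to_add(index: dict, path: str) -> None:
    """The proposed 'git add': record only the path, with the all-0 id."""
    index.setdefault(path, NULL_ID)


def update_index(index: dict, path: str, data: bytes) -> None:
    """Stand-in for one of the usual updating steps: record real content."""
    index[path] = blob_id(data)


def write_tree(index: dict) -> dict:
    """Snapshot for a commit, skipping entries that have no content yet."""
    return {p: h for p, h in sorted(index.items()) if h != NULL_ID}


index = {}
intent_to_add(index, "file.c")
tree_before = write_tree(index)   # the placeholder entry is skipped
update_index(index, "file.c", b"int main(void) { return 0; }\n")
tree_after = write_tree(index)    # now the path would be committed
```

In this model a commit made right after the proposed "git add" would not contain file.c at all; it only appears once some updating step has recorded content for it.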

* Re: [RFC] git-add update with all-0 object
From: Johannes Schindelin @ 2006-11-30 22:32 UTC
To: Daniel Barkalow; +Cc: git, Junio C Hamano

Hi,

On Thu, 30 Nov 2006, Daniel Barkalow wrote:

> I think the more obvious behavior is to have it record the fact that you
> want to have the path tracked, but require one of the usual updating
> mechanisms to get a particular content into the index.

I fear that this is just your being used to the CVS mindset. Please see

http://article.gmane.org/gmane.comp.version-control.git/32792

for details.

Hth,
Dscho

* Re: [RFC] git-add update with all-0 object
From: Nicolas Pitre @ 2006-11-30 22:34 UTC
To: Daniel Barkalow; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Daniel Barkalow wrote:

> One thing that I think is non-intuitive to a lot of users (either novice
> or who just don't do it much) is that it matters where in the process you
> do "git add <path>" if you're also changing the file. Even if you
> understand the index, you may not realize (or may not have internalized
> the fact) that what git-add does is update the index with what's there
> now.

And actually I think this is a good thing. This is what makes the index worth it. Better find a way to make it obvious to people what's happening.

* Re: [RFC] git-add update with all-0 object
From: Jakub Narebski @ 2006-11-30 22:41 UTC
To: git

Nicolas Pitre wrote:

> On Thu, 30 Nov 2006, Daniel Barkalow wrote:
>
>> One thing that I think is non-intuitive to a lot of users (either novice
>> or who just don't do it much) is that it matters where in the process you
>> do "git add <path>" if you're also changing the file. Even if you
>> understand the index, you may not realize (or may not have internalized
>> the fact) that what git-add does is update the index with what's there
>> now.
>
> And actually I think this is a good thing. This is what makes the index
> worth it. Better find a way to make it obvious to people what's
> happening.

Still, perhaps (perhaps!) it would be useful to have "intent to add" git-add.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: [RFC] git-add update with all-0 object
From: Nicolas Pitre @ 2006-11-30 22:49 UTC
To: Jakub Narebski; +Cc: git

On Thu, 30 Nov 2006, Jakub Narebski wrote:

> Nicolas Pitre wrote:
>
> > On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> >
> >> One thing that I think is non-intuitive to a lot of users (either novice
> >> or who just don't do it much) is that it matters where in the process you
> >> do "git add <path>" if you're also changing the file. Even if you
> >> understand the index, you may not realize (or may not have internalized
> >> the fact) that what git-add does is update the index with what's there
> >> now.
> >
> > And actually I think this is a good thing. This is what makes the index
> > worth it. Better find a way to make it obvious to people what's
> > happening.
>
> Still, perhaps (perhaps!) it would be useful to have "intent to add"
> git-add.

Well, sure. It could be an argument to git-add. But surely not the default?

git-add --latest maybe?

* Re: [RFC] git-add update with all-0 object
From: Linus Torvalds @ 2006-11-30 22:46 UTC
To: Daniel Barkalow; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Daniel Barkalow wrote:
>
> I think the more obvious behavior is to have it record the fact that you
> want to have the path tracked, but require one of the usual updating
> mechanisms to get a particular content into the index.

While this certainly matches the git model better than just automatically taking whatever state exists at commit time (you instead introduce it as a special "empty state" case), I don't think you really want it.

Why? Two reasons:

 - you're still left with all the same issues (ie you do need to use "git commit -a" because that is simply fundamental, and if you don't, "git commit" now causes an ERROR, which is just illogical - you just added the data!)

   So it's simply better to just tell people "git add" adds the whole state. Explain to them that git doesn't track "filenames", it tracks state, and when you do a "git add", it really adds the _data_ and the permissions too.

   Really, if you didn't come from years of broken SCM's, you'd think that it's _natural_ that when you add a file for tracking, you add its contents too. It's not that git is surprising or unnatural, it's that CVS is.

 - you generally really don't want to see "git diff" show you the big diff for a new creation. You only think you do, but trust me, you generally don't.

   It's the same thing as with doing merges - keeping the automerged state in the index is actually nice, because it means that the default "git diff" can just shut the heck up about the things that may be the _bulk_ of the change, but it's not the interesting part.

So I would suggest that if people are irritated with "git diff" for example not showing newly added files AT ALL, then the solution to that isn't that they should be added as "empty" or "all zeroes".

We do have other state bits in the index already (we need them for marking things as being unmerged etc), and if the problem is that you want to see that you have a pending add, it's easy enough to have "git add" always set a bit saying "this file is new".

A normal "read tree object" would populate index entries with that bit cleared, and so it would be possible to have

    git add file.c
    git diff

show something like

    diff --git a/file.c b/file.c
    added file <mode> <sha1>

rather than show the whole big diff (which I _really_ don't think you want to see, and which is actually against the whole point, which is that you add _content_ to the index, and "git diff" will always show you the stuff that is _not_ added to the index yet).

(Of course, if you _also_ had changed it between the "git add" and the "git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff that is the diff between the thing you added, and the status it has now).

So showing a real diff after "git add" would really be wrong. The index really is important. But if it's _only_ an issue of worrying about seeing added files at all, we can add a "people comfort" bit to do that.

(Quite frankly, I don't think it's worthwhile. I really think this is a documentation issue. Make people understand that "git add" adds the contents too, and that git never tracks filenames on their own at all).

So it is always going to be true that

    git add file
    echo New line >> file
    git commit

must commit the old contents of the file. That really _does_ follow from the whole "track contents" model. Anything that doesn't do this is fundamentally broken, and has broken the notion of what "git add" means.
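
The last point (add, then edit, then commit records the old contents) can be traced in a toy model. This is a sketch in Python under simplifying assumptions, not git's implementation: the working tree and the index are plain path -> content maps, a commit is a copy of the index, and the function names are illustrative only.

```python
# Toy model: working tree and index are both path -> content maps.
worktree = {}
index = {}
commits = []


def add(path):
    """'git add': copy the file's content as it is right now into the index."""
    index[path] = worktree[path]


def commit():
    """'git commit' (without -a): snapshot the index, not the working tree."""
    commits.append(dict(index))


def diff():
    """'git diff': tracked paths whose working-tree content differs from the index."""
    return sorted(p for p in worktree if p in index and worktree[p] != index[p])


worktree["file"] = "first line\n"
add("file")                        # git add file
worktree["file"] += "New line\n"   # echo New line >> file
pending = diff()                   # the later edit is not staged
commit()                           # git commit
```

After this sequence the commit holds only "first line", while the extra line still shows up as an unstaged change, which is the behaviour described as following from the "track contents" model.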

* Re: [RFC] git-add update with all-0 object
From: Daniel Barkalow @ 2006-12-01 0:12 UTC
To: Linus Torvalds; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Linus Torvalds wrote:

> A normal "read tree object" would populate index entries with that bit
> cleared, and so it would be possible to have
>
>     git add file.c
>     git diff
>
> show something like
>
>     diff --git a/file.c b/file.c
>     added file <mode> <sha1>
>
> rather than show the whole big diff (which I _really_ don't think you want
> to see, and which is actually against the whole point, which is that you
> add _content_ to the index, and "git diff" will always show you the stuff
> that is _not_ added to the index yet).

I'm not sure I want to see the whole added file more when diffing two trees, or when I do "git diff --cached" after "git update-index --add", than when I do "git diff" after "git add", but I'll concede that viewing the content of a new file as a diff is no fun. (Maybe diff-against-nothing for display needs work in general? It'd solve the whole root commit thing, too.)

> (Of course, if you _also_ had changed it between the "git add" and the
> "git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff
> that is the diff between the thing you added, and the status it has now).
>
> So showing a real diff after "git add" would really be wrong. The index
> really is important. But if it's _only_ an issue of worrying about seeing
> added files at all, we can add a "people comfort" bit to do that.

This is where I think "git add" is really broken. For every other git command, if the command causes the index to not match HEAD, the command contains "index" either in the name of the command or in an option.

So, if you understand the index, and you understand git's model, but you don't know this one weird corner case, you will come to the conclusion that "git add <path>" leaves <path> such that the index matches HEAD.

Now *you* know that "git add" really is "git update-index --add", because you were typing the latter (well, "git update-cache --add", anyway) before "git add" existed at all. But for new users, and anyone who wasn't adding a lot of files back then, it's a surprising exception that has to be learned and internalized.

"git checkout" leaves the index matching HEAD or its original state.

"git commit" leaves the index matching HEAD (the new HEAD) or its original state.

"git reset" (all options) leaves the index matching HEAD or its original state.

"git pull/merge" does disrupt the index, but it also starts to prepare a commit based on multiple *HEAD files, and it leaves every stage of the index matching some *HEAD or its original state. And new users still seem to wonder where the merge happens, because it doesn't say "in the index".

"git apply" leaves the index alone.

"git update-index" says it works on the index.

"git apply --index" says it works on the index.

Am I missing any violations of the rule? I guess "git rm", but that's just for the CVS-damaged, unnecessary anyway, and it still doesn't care about the state of the working directory at any particular point in time. And I still prefer "git update-index --force-remove" as a command for that operation.

So it's obvious that the "add" functionality is properly called "git add --index", because whatever "git add" would do, it would have to leave the index matching HEAD or its original state.

(Well, okay, '"git commit -i path" ^C', violates the rule. But I forgot until recently that -i stands for --include, not --index, which would make a reasonable expansion, too)

> (Quite frankly, I don't think it's worthwhile. I really think this is a
> documentation issue. Make people understand that "git add" adds the
> contents too, and that git never tracks filenames on their own at all).

I think people's model is likely to be closer to "touch" for the index, especially since it has no effect if the file is already in the index.

> So it is always going to be true that
>
>     git add file
>     echo New line >> file
>     git commit
>
> must commit the old contents of the file. That really _does_ follow from
> the whole "track contents" model. Anything that doesn't do this is
> fundamentally broken, and has broken the notion of what "git add" means.

"git add" doesn't *say* it changes the index, and nothing else there *says* it changes the index, so "git commit" there should say "nothing to commit", because you never did "git update-index file", either before or after the change, and you didn't do "git commit file" or "git commit -a".

Just tossing the words in commands around, it's obvious that what "git add file" should do is mean that you can now do "git update-index file" instead of "git update-index --add file". Saying you shouldn't need "update-index" after adding a file is like saying you shouldn't need "update-index" after modifying a file. But it shouldn't change my index any more than "git apply" should, because it doesn't say it updates the index. (Of course, it would be good to have "git add --index file", matching "git apply --index patch", which does what "git add" does now.)

Now, in order to interact correctly with resetting, checking out a different branch, etc, it wants to have the information in the index file, so there isn't a separate file with a list to lose stuff from. And it patterns naturally as an adjunct to the index for some things (like ls-files, which doesn't care at all what the content associated with filenames is). But that's fundamentally an implementation detail, not an aspect of the model.

    -Daniel

* Re: [RFC] git-add update with all-0 object
From: Theodore Tso @ 2006-12-01 4:57 UTC
To: Daniel Barkalow; +Cc: Linus Torvalds, git, Junio C Hamano

On Thu, Nov 30, 2006 at 07:12:31PM -0500, Daniel Barkalow wrote:
> This is where I think "git add" is really broken. For every other git
> command, if the command causes the index to not match HEAD, the command
> contains "index" either in the name of the command or in an option.
>
> So, if you understand the index, and you understand git's model, but you
> don't know this one weird corner case, you will come to the conclusion
> that "git add <path>" leaves <path> such that the index matches HEAD.

But it's not just this one weird corner case. You yourself said that "git pull/merge" leave the index where it's != HEAD. I have serious trouble believing that "if the command leaves index != HEAD, the command must contain 'index' in either the name of the command or the option" is all that important of a consistent rule or principle that must be maintained at all costs.

By the way, after thinking about this for a while, part of the problem is that the name "index" really sucks. Which is perhaps why Linus is now trying to stop us from actually using the term "index" in these discussions. :-) If we called it a "staging area", as our Great Leader has suggested, I think it would be a lot easier for novice users to understand. Consider what is in the git man page:

    The index is a simple binary file, which contains an efficient representation of a virtual directory content at some random time. It does so by a simple array that associates a set of names, dates, permissions and content (aka "blob") objects together. The cache is always kept ordered by name, and names are unique (with a few very specific rules) at any point in time, but the cache has no long-term meaning, and can be partially updated at any time.....

    In particular, the index file can have the representation of an intermediate tree that has not yet been instantiated. So the index can be thought of as a write-back cache, which can contain dirty information that has not yet been written back to the backing store.

For a kernel programmer, this might not be understandable --- but for your typical application programmer, this is enough to cause him or her to conclude that git is simply not meant for use by mere mortals.

So as Junio and Linus have both said, it's all about your mental model, and if we think about it in terms of a staging area for a commit, and we think about what commands are most natural given that model, it's far more important than whether a command has "index" in its name or specified in an option.

Put another way, the reason why I think people are liking the whole "git add" and "git rm" suggestion is that it's a nice middle ground between the "hide the index" and the "shove the index in the user's face" approaches. It's not that we are hiding the fact that there is this thing with the horribly chosen name "index", but instead we talk about this concept of a staging area and we don't dwell on things like the fact that it is a binary file which stores an efficient representation of a virtual directory.... blah blah blah.

Once this is done, the only command which is still problematic to describe is "git diff". Yes, it almost always does the right thing. But if you read the man page, even though we are now using "<tree-ish>" instead of "<ent>" to describe it, it still forces the user who is reading the man page to prove to him- or her-self that it really always does the right thing. The EXAMPLES section really helps, but even so, the man page is in terrible need of help.

For example, exactly what "git diff" does is described in terms of "git diff-files", "git diff-index", and "git diff-tree". (And the command names git-diff-index, git-diff-tree and git-diff-files in the DESCRIPTION aren't even hotlinks, making it hard to get to the plumbing man pages, which is the only place where you can get documentation of the options accepted by git-diff.)

OK, so once the novice user gets past this hurdle, he/she says, OK, what does "git diff <tree-ish>" do? Hmm, according to EXAMPLES, this diffs the working tree with the named tree. What options can I give? Well, with one <tree-ish>, I have to go to read the man page for "git-diff-index", whose synopsis says, "Compares content and mode of blobs between the index and repository". But wait! According to git-diff's EXAMPLES section, "git diff <tree-ish>" doesn't involve the index at all! Why does the synopsis say anything about the index?

And this leaves the novice confused and bewildered. And why not? If the user spends time puzzling through the man page, he/she will discover that:

1) "git diff-index <tree>" compares the tree with the working directory, and doesn't involve the index at all, even though it is in the command name. WTF?!?

2) If you want to really diff the index, you have to use the command "git diff-index --cached <tree>"

If you look at this from the point of the novice user, it becomes very clear why the index and commands that operate on the index are hopelessly confusing. Yes, if you the grasshopper read and meditate very deeply on the low-level meaning of the plumbing, and then someone like Linus slaps you upside the head with one of his e-mail messages, it will suddenly make sense to you. The problem with this method is that it doesn't scale terribly well. :-)

But if you are just reading the "git-diff" man page for the first time, and are then forced to read the "git-diff-index" man page to puzzle out what a particular "git diff" option does, and then have to confront the notion that something like "git diff HEAD" involves a command "git diff-index", even though this confusing thing called the index is never involved unless the --cached option is given --- can you see how this might cause the beginning user of git to conclude that git is hopelessly confusing and too hard to use?

The question then is how can we fix the "git diff" man page, and how do we explain "git diff" in a tutorial so that users can understand what in the world it does? For a starting point, I'd recommend moving the EXAMPLES to the beginning of the man page, and moving any mention of git-diff-index, git-diff-files, and git-diff-tree to the very end of the man page, and to put the most commonly used options in the git-diff man page, so that most users don't have to look at the low-level plumbing man pages to figure out how the high-level git-diff works.

* Re: [RFC] git-add update with all-0 object
From: Junio C Hamano @ 2006-12-01 6:20 UTC
To: Theodore Tso; +Cc: Daniel Barkalow, Linus Torvalds, git

Theodore Tso <tytso@mit.edu> writes:

> The question then is how can we fix the "git diff" man page, and how
> do we explain "git diff" in a tutorial so that users can understand
> what in the world it does? For a starting point, I'd recommend
> moving the EXAMPLES to the beginning of the man page, and moving
> any mention of git-diff-index, git-diff-files, and git-diff-tree to
> the very end of the man page, and to put the most commonly used
> options in the git-diff man page, so that most users don't have to
> look at the low-level plumbing man pages to figure out how the
> high-level git-diff works.

All good points. The only slight worry I have is that just moving EXAMPLE up deviates from the traditional UNIX manpage order of presenting information.

I think the plumbing manuals can (and probably should) stay as the technical manual for Porcelain writers. "git diff", "git add" and friends that are clearly Porcelain should talk about what they do in terms of end user operation in the DESCRIPTION section and put less stress on how things work behind the scenes in technical terms.

For example, from git-diff(1):

    DESCRIPTION
    -----------
    Show changes between two trees, a tree and the working tree, a tree and the index file, or the index file and the working tree. The combination of what is compared with what is determined by the number of trees given to the command.

That may be an accurate description of what the command does in technical terms, but it does not tell why the user may want to compare "a tree and the working tree". The users would want to know which case applies to their current situation and we should make it easier for them to find that information.

For example, although --cached is technically speaking one of the --diff-options, it should be separated out from other options when we talk about 'git-diff'. Also, although 'git-diff' is designed to work on tree-ish, Porcelain users will use it with commit-ish (either a commit or an annotated signed tag that points at a commit) 99.9% of the time, so we should mention <tree-ish> at the end as a sidenote and talk about <commit>.

    DESCRIPTION
    -----------
    This command shows changes between four combinations of states.

    * 'git-diff' [--options] [--] [<path>...]

      is to see the changes you made relative to the index (staging area for the next commit). In other words, the differences are what you _could_ tell git to further add to the index but you still haven't. You can stage these changes by using gitlink:git-update-index[1].

    * 'git-diff' [--options] --cached [<commit>] [--] [<path>...]

      is to see the changes you staged for the next commit relative to the named <tree-ish>. Typically you would want comparison with the latest commit, so if you do not give <commit>, it defaults to HEAD.

    * 'git-diff' [--options] <commit> -- [<path>...]

      is to see the changes you have in your working tree, regardless of whether you staged them or not, relative to the named <commit>.

    * 'git-diff' [--options] <commit> <commit> -- [<path>...]

      is to see the changes between two <commit>.

    Just in case if you are doing something exotic, it should be noted that all of the <commit> in the above descriptoin can be any <tree-ish>.
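
The four combinations in the proposed DESCRIPTION can be lined up against three states (HEAD, index, working tree) in a small sketch. This is a Python toy model, not how git computes diffs: each state is assumed to be a path -> content map in which every path is tracked, and the file names and version strings are made up.

```python
def changed(old: dict, new: dict) -> list:
    """Paths whose content differs between two path -> content snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# a.c was edited, staged, then edited again; b.c was edited but never staged.
head = {"a.c": "v1", "b.c": "v1"}
index = {"a.c": "v2", "b.c": "v1"}
worktree = {"a.c": "v3", "b.c": "v2"}
older = {"a.c": "v0"}                 # some earlier commit

unstaged = changed(index, worktree)   # git diff
staged = changed(head, index)         # git diff --cached
since_head = changed(head, worktree)  # git diff HEAD
between = changed(older, head)        # git diff <commit> <commit>
```

Here only a.c counts as staged, while both files show up in the plain "git diff" and "git diff HEAD" comparisons; the sketch ignores untracked files, which the real commands leave out of these comparisons.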

* Re: [RFC] git-add update with all-0 object
From: Jakub Narebski @ 2006-12-02 8:55 UTC
To: git

Junio C Hamano wrote:

> * 'git-diff' [--options] <commit> <commit> -- [<path>...]
>
> is to see the changes between two <commit>.
>
> Just in case if you are doing something exotic, it
> should be noted that all of the <commit> in the above
> descriptoin can be any <tree-ish>.

s/descriptoin/description/

It _might_ be worth mentioning that you can compare two arbitrary files using

    git diff [--options] <blob1 sha> <blob2 sha>

where <blob sha> can be entered as <tree-ish>:<filename>, usually <commit>:<filename> (<filename> is HEAD:<filename>) to compare blob (file) from a named tree/from a given commit, or as :<stage>:<filename> (or just ::<filename> if file is not in merge conflict) to compare blob (file) from an index.

If I understand correctly there is currently no way to compare files from a working tree, not to mention files outside working tree (including /dev/null) with that syntax.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: [RFC] git-add update with all-0 object
From: Linus Torvalds @ 2006-12-01 7:10 UTC
To: Theodore Tso; +Cc: Daniel Barkalow, git, Junio C Hamano

On Thu, 30 Nov 2006, Theodore Tso wrote:
>
> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks.

Hey, it was originally called "cache".

I don't care _what_ it's called, I just want people knowing about it, because hiding it will just cripple git (ie at the very least, when you hit a merge conflict, you really do want to understand it if you ever want to go to the "next level").

If people are more comfortable just calling it the "staging area", and talking about it in those terms, I'll be happy.

> Put another way, the reason why I think people are liking the whole
> "git add" and "git rm" suggestion is that it's a nice middle ground
> between the "hide the index" and the "shove the index in the user's
> face" approaches. It's not that we are hiding the fact that there is
> this thing with the horribly chosen name "index", but instead we talk
> about this concept of a staging area and we don't dwell on things like
> the fact that it is a binary file which stores an efficient
> representation of a virtual directory.... blah blah blah.

Yes. And even "git diff" isn't really a problem once you understand the staging area. If people feel worried, let them use "git diff HEAD". You won't need to use git for _that_ long until you realize that since the staging area is going to match the HEAD under normal circumstances (and when it doesn't, you actually tend to prefer to get the diff against the staging area _anyway_), you'll find people just starting to use "git diff" and not worry about it.

* Re: [RFC] git-add update with all-0 object
From: Daniel Barkalow @ 2006-12-01 8:10 UTC
To: Theodore Tso; +Cc: Linus Torvalds, git, Junio C Hamano

On Thu, 30 Nov 2006, Theodore Tso wrote:

> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks. Which is perhaps why Linus is
> now trying to stop us from actually using the term "index" in these
> discussions. :-) If we called it a "staging area", as our Great
> Leader has suggested, I think it would be a lot easier for novice
> users to understand. Consider what is in the git man page:
>
> The index is a simple binary file, which contains an efficient
> representation of a virtual directory content at some random
> time. It does so by a simple array that associates a set of
> names, dates, permissions and content (aka "blob") objects
> together. The cache is always kept ordered by name, and names
> are unique (with a few very specific rules) at any point in
> time, but the cache has no long-term meaning, and can be
> partially updated at any time.....
>
> In particular, the index file can have the representation of
> an intermediate tree that has not yet been instantiated. So
> the index can be thought of as a write-back cache, which can
> contain dirty information that has not yet been written back
> to the backing store.
>
> For a kernel programmer, this might not be understandable --- but for
> your typical application programmer, this is enough to cause him or
> her to conclude that git is simply not meant for use by mere mortals.

My position on this subject is that "index" is a good name, but that description is a terrible description, and "index" is a word that needs a good description in context. If we just said up front:

    Git's "index" is a staging area that you use to prepare commits. It maps filenames to content. It allows git to remember changes you want to put into the next commit while you do more work. For normal commits, it is not necessary to use the index, but it is very helpful for complicated commits, because it lets you focus on the part you're still working on while git remembers the part you're done with.

I think people would get it. (If it were called the "cache" still, it would be hopeless, because "cache" implies false things; "index" doesn't imply anything initially.)

Of course, we'd still have to disabuse people of the notion that the index can store the information "there's nothing at this path yet, but I'm interested in it", because that's a piece of information people often know before a file is ready, and think git would be able to remember in a staging area.

    -Daniel
* Re: [RFC] git-add update with all-0 object 2006-12-01 8:10 ` Daniel Barkalow @ 2006-12-01 9:37 ` Andy Parkins 2006-12-02 8:35 ` Jakub Narebski 2006-12-02 8:26 ` Jakub Narebski 1 sibling, 1 reply; 15+ messages in thread From: Andy Parkins @ 2006-12-01 9:37 UTC To: git On Friday 2006 December 01 08:10, Daniel Barkalow wrote: > My position on this subject is that "index" is a good name, but that > description is a terrible description, and "index" is a word that needs a > good description in context. If we just said up front: If we need to explain what "index" means in the context of diff then it's not a good name :-) An index /everywhere else/ is a lookup table: topic->page number; author->book title; record id->byte position. There is never any content in an index; indices just point at content. I imagine that's how git's index got its name. (I'm only guessing as I've not looked at what's actually inside git's "index".) Here's my guess: git update-index file1 hashes file1, stores it somewhere under that hash and writes the hash->filename connection to .git/index. That is why git's index is called an index. It's a hash->filename index. Unfortunately, "index" in colloquial git actually means the combination of .git/index plus the hashed file itself. That's no longer an index, it's a "book". :-) It's made worse, I think, by the fact that git doesn't want to do any index-like things with the "index". Being content-oriented rather than name-oriented means that an entry like "file1->NOTHING" is impossible in git. This leads to the sort of "git-add means track this filename" confusion that turns up a lot with new users. It's probably all too late to change the nomenclature, but I've always been of the opinion that names are important; they confer meaning. When we use a common word with a common meaning and deviate from that common meaning, we are bound to create confusion. 
New users don't have any "git-way-of-thinking" knowledge when they begin, so when they hear "index" they can only fall back on their standard understanding of that word. We shouldn't be surprised then when new users don't get "the index". Andy -- Dr Andy Parkins, M Eng (hons), MIEE
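The hashing step guessed at above can be sketched roughly as follows. The blob id is the SHA-1 of a short "blob <size>" header, a NUL byte, and the file content; everything else here (`toy_update_index`, the two dictionaries) is a hypothetical stand-in for the object store and .git/index, and the mapping is written path->hash rather than hash->filename.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_update_index(index: dict, objects: dict, path: str, content: bytes) -> str:
    """Hypothetical stand-in for 'git update-index --add path'."""
    blob_id = git_blob_id(content)
    objects[blob_id] = content   # the content lives in the object store
    index[path] = blob_id        # the index only points at it
    return blob_id


index, objects = {}, {}
empty_id = toy_update_index(index, objects, "file1", b"")
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note that even an empty file gets a real object id, so "file1 with no content yet" and "file1 with empty content" are different things in this scheme.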
* Re: [RFC] git-add update with all-0 object 2006-12-01 9:37 ` Andy Parkins @ 2006-12-02 8:35 ` Jakub Narebski 0 siblings, 0 replies; 15+ messages in thread From: Jakub Narebski @ 2006-12-02 8:35 UTC To: git Andy Parkins wrote: > On Friday 2006 December 01 08:10, Daniel Barkalow wrote: > >> My position on this subject is that "index" is a good name, but that >> description is a terrible description, and "index" is a word that needs a >> good description in context. If we just said up front: > > If we need to explain what "index" means in the context of diff then it's not > a good name :-) But "staging area", or the more descriptive "staging area for commits", is a bit long. But we no longer call the "index" "dircache". > An index /everywhere else/ is a lookup table. topic->page number; > author->book title. record id->byte position. There is never any content in > an index, indices just point at content. Just like git's index. > I imagine that's how git's index got it's name. (I'm only guessing as I've > not looked at what's actually inside git's "index"). Here's my guess: > > git update-index file1 hashes file1, stores it somewhere under that hash and > writes the hash->filename connection to .git/index. That is why git's index > is called an index. It's a hash->filename index. This "somewhere" is the object repository. And it is the reverse: it is a filename->(stat + hash) index, from a file in the working area to the blob (or tree) in the repository. > Unfortunately, "index" in colloquial git actually means the combination > of .git/index plus the hashed file itself. That's no longer an index, it's > a "book". :-) Yes, it is true that "index" in colloquial git means "index version" (the version pointed to by the "index"). > It's made worse, I think, by the fact that git doesn't want to do any > index-like things with the "index". Being content-oriented rather than > name-oriented means that an entry like "file1->NOTHING" is impossible in git. 
> This leads to the sort of "git-add means track this filename" confusion that > turns up a lot with new users. It is possible. By convention, an all-0 hash means 'no such object'. The very first message in this thread tried to make use of it... but using "git add" to mark a filename as interesting, instead of to add the _current_ contents of the file, goes a bit against git's ideas. > It's probably all too late to change the nomenclature, but I've always been of > the opinion that names are important, they confer meaning. When we use a > common word, with common meaning and deviate from that common meaning we are > bound to create confusion. New users don't have any "git-way-of-thinking" > knowledge when they begin, so when they hear "index" they can only fall back > on their standard understanding of that word. We shouldn't be surprised then > when new users don't get "the index". Well, "dircache" was changed to "index". "<ent>" was axed in preference to "<tree-ish>". I think using the name "staging area" in the git man pages would be a good idea (as would making --index an alias for --cached). -- Jakub Narebski Warsaw, Poland ShadeHawk on #git
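The filename->(stat + hash) shape described above might be sketched like this. It is a simplification under stated assumptions: `ToyEntry`, `stage` and `looks_modified` are hypothetical names, and only size and mtime are kept from the stat data. The point it illustrates is that cached stat information lets a tool notice a changed file without rehashing it.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ToyEntry:
    size: int        # cached stat data, used for cheap change detection
    mtime_ns: int
    blob_id: str     # id of the staged content in the object repository


def blob_id_of(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def stage(index: dict, path: str) -> None:
    """Record stat data plus content hash for the file as it is now."""
    with open(path, "rb") as f:
        content = f.read()
    st = os.stat(path)
    index[path] = ToyEntry(st.st_size, st.st_mtime_ns, blob_id_of(content))


def looks_modified(index: dict, path: str) -> bool:
    """Compare current stat data with the cached copy; no hashing needed."""
    st = os.stat(path)
    entry = index[path]
    return (st.st_size, st.st_mtime_ns) != (entry.size, entry.mtime_ns)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file1")
    with open(path, "wb") as f:
        f.write(b"one\n")
    index = {}
    stage(index, path)
    clean = looks_modified(index, path)   # nothing changed since staging
    with open(path, "ab") as f:
        f.write(b"two\n")                 # edit after staging
    dirty = looks_modified(index, path)   # size differs from the cached stat
    staged_id = index[path].blob_id       # still the id of the old content
```

After the edit, the entry still points at the blob for the old content, which is the "index version" of the file in the colloquial sense used above.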
* Re: [RFC] git-add update with all-0 object 2006-12-01 8:10 ` Daniel Barkalow 2006-12-01 9:37 ` Andy Parkins @ 2006-12-02 8:26 ` Jakub Narebski 1 sibling, 0 replies; 15+ messages in thread From: Jakub Narebski @ 2006-12-02 8:26 UTC To: git Daniel Barkalow wrote: > Of course, we'd still have to disabuse people of the notion that the index > can store the information "there's nothing at this path yet, but I'm > interested in it", because that's a piece of information people often know > before a file is ready, and think git would be able to remember in a > staging area. Well, that was what the first message in this thread was about: marking a file "interesting" (so 'git commit -a' would pick it up) using all-0 for the object hash... which of course requires review and, if necessary, modification of all core tools which touch the index. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git
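For illustration only, the proposal from the first message could be modeled as below. This is a sketch of the idea as described in the thread, not of anything git implements here: `add_intent`, `update_tracked` and `write_tree` are hypothetical helpers, with `update_tracked` standing in for the refresh that 'git commit -a' would do and `write_tree` skipping all-0 entries as the RFC suggests.

```python
import hashlib

NULL_ID = "0" * 40   # all-0 hash: "tracked, but no content recorded yet"


def blob_id_of(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def add_intent(index: dict, path: str) -> None:
    """Proposed 'git add': mark the path as interesting, record no content."""
    index[path] = NULL_ID


def update_tracked(index: dict, worktree: dict) -> None:
    """Refresh every tracked path from the working tree (like 'commit -a')."""
    for path in index:
        if path in worktree:
            index[path] = blob_id_of(worktree[path])


def write_tree(index: dict) -> dict:
    """Build the tree for a commit, skipping entries with no content."""
    return {p: h for p, h in sorted(index.items()) if h != NULL_ID}


index = {}
worktree = {"new.c": b"int x;\n"}
add_intent(index, "new.c")
tree_before = write_tree(index)    # the file is tracked but not committed
update_tracked(index, worktree)
tree_after = write_tree(index)     # now the real content is recorded
```

Every tool that reads object ids out of the index would need the same kind of `NULL_ID` check that `write_tree` has here, which is the review burden the message above refers to.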
end of thread. Thread overview: 15+ messages 2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow 2006-11-30 22:32 ` Johannes Schindelin 2006-11-30 22:34 ` Nicolas Pitre 2006-11-30 22:41 ` Jakub Narebski 2006-11-30 22:49 ` Nicolas Pitre 2006-11-30 22:46 ` Linus Torvalds 2006-12-01 0:12 ` Daniel Barkalow 2006-12-01 4:57 ` Theodore Tso 2006-12-01 6:20 ` Junio C Hamano 2006-12-02 8:55 ` Jakub Narebski 2006-12-01 7:10 ` Linus Torvalds 2006-12-01 8:10 ` Daniel Barkalow 2006-12-01 9:37 ` Andy Parkins 2006-12-02 8:35 ` Jakub Narebski 2006-12-02 8:26 ` Jakub Narebski