* [RFC] git-add update with all-0 object
@ 2006-11-30 22:08 Daniel Barkalow
2006-11-30 22:32 ` Johannes Schindelin
From: Daniel Barkalow @ 2006-11-30 22:08 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
One thing that I think is non-intuitive to a lot of users (either novice
or who just don't do it much) is that it matters where in the process you
do "git add <path>" if you're also changing the file. Even if you
understand the index, you may not realize (or may not have internalized
the fact) that what git-add does is update the index with what's there
now.
I think the more obvious behavior is to have it record the fact that you
want to have the path tracked, but require one of the usual updating
mechanisms to get a particular content into the index.
This should be pretty simple to implement: use --cacheinfo 0 0 $path
instead of --add -- $path, and teach programs that look at the objects
recorded in the index (rather than just hashes or other info) about all-0
hashes meaning "but no content there". write-tree would probably just
skip the entry (and then you could add a file, but still produce commits
without it until you actually do either an update-index explicitly or one
of the commit option sets that updates it); diff would treat it as empty;
checkout would ignore it.
-Daniel
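The proposal can be sketched as a toy Python model, purely for illustration. The index is a dictionary from path to blob id, and `ZERO_ID` (forty zeros) stands in for the entry that `--cacheinfo 0 0 $path` would create. `ToyIndex` and its method names are hypothetical. They model the write-tree, diff and checkout behavior described above, not git's actual C code.

```python
import hashlib

# Hypothetical sentinel for the proposed "tracked, but no content yet" entry.
ZERO_ID = "0" * 40


def blob_id(data: bytes) -> str:
    # git-style blob id: SHA-1 over "blob <len>\0" followed by the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ToyIndex:
    """Toy model of the all-0 proposal; not git's real data structures."""

    def __init__(self):
        self.entries = {}  # path -> blob id, or ZERO_ID
        self.objects = {}  # blob id -> content

    def add_intent(self, path):
        # Proposed "git add <path>": remember the path, store no content.
        self.entries.setdefault(path, ZERO_ID)

    def update(self, path, worktree):
        # "git update-index <path>" (or a commit option that updates
        # tracked paths): snapshot the file as it is right now.
        oid = blob_id(worktree[path])
        self.objects[oid] = worktree[path]
        self.entries[path] = oid

    def write_tree(self):
        # Proposed write-tree: skip paths that have no content yet.
        return {p: o for p, o in sorted(self.entries.items()) if o != ZERO_ID}

    def diff_base(self, path):
        # Proposed diff: an all-0 entry compares as an empty file.
        oid = self.entries[path]
        return b"" if oid == ZERO_ID else self.objects[oid]

    def checkout(self):
        # Proposed checkout: ignore all-0 entries entirely.
        return {p: self.objects[o] for p, o in self.entries.items() if o != ZERO_ID}
```

Until `update` runs, `write_tree()` returns an empty tree, so commits made in the meantime would not include the new path, which is what the proposal intends.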
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
@ 2006-11-30 22:32 ` Johannes Schindelin
2006-11-30 22:34 ` Nicolas Pitre
2006-11-30 22:46 ` Linus Torvalds
From: Johannes Schindelin @ 2006-11-30 22:32 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Junio C Hamano
Hi,
On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> I think the more obvious behavior is to have it record the fact that you
> want to have the path tracked, but require one of the usual updating
> mechanisms to get a particular content into the index.
I fear that this is just your being used to the CVS mindset. Please see
http://article.gmane.org/gmane.comp.version-control.git/32792 for details.
Hth,
Dscho
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
2006-11-30 22:32 ` Johannes Schindelin
@ 2006-11-30 22:34 ` Nicolas Pitre
2006-11-30 22:41 ` Jakub Narebski
2006-11-30 22:46 ` Linus Torvalds
From: Nicolas Pitre @ 2006-11-30 22:34 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Junio C Hamano
On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> One thing that I think is non-intuitive to a lot of users (either novice
> or who just don't do it much) is that it matters where in the process you
> do "git add <path>" if you're also changing the file. Even if you
> understand the index, you may not realize (or may not have internalized
> the fact) that what git-add does is update the index with what's there
> now.
And actually I think this is a good thing. This is what makes the index
worth it. Better find a way to make it obvious to people what's
happening.
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:34 ` Nicolas Pitre
@ 2006-11-30 22:41 ` Jakub Narebski
2006-11-30 22:49 ` Nicolas Pitre
From: Jakub Narebski @ 2006-11-30 22:41 UTC (permalink / raw)
To: git
Nicolas Pitre wrote:
> On Thu, 30 Nov 2006, Daniel Barkalow wrote:
>
>> One thing that I think is non-intuitive to a lot of users (either novice
>> or who just don't do it much) is that it matters where in the process you
>> do "git add <path>" if you're also changing the file. Even if you
>> understand the index, you may not realize (or may not have internalized
>> the fact) that what git-add does is update the index with what's there
>> now.
>
> And actually I think this is a good thing. This is what makes the index
> worth it. Better find a way to make it obvious to people what's
> happening.
Still, perhaps (perhaps!) it would be useful to have "intent to add"
git-add.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
2006-11-30 22:32 ` Johannes Schindelin
2006-11-30 22:34 ` Nicolas Pitre
@ 2006-11-30 22:46 ` Linus Torvalds
2006-12-01 0:12 ` Daniel Barkalow
From: Linus Torvalds @ 2006-11-30 22:46 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Junio C Hamano
On Thu, 30 Nov 2006, Daniel Barkalow wrote:
>
> I think the more obvious behavior is to have it record the fact that you
> want to have the path tracked, but require one of the usual updating
> mechanisms to get a particular content into the index.
While this certainly matches the git model better than just automatically
taking whatever state exists at commit time (you instead introduce it as a
special "empty state" case), I don't think you really want it.
Why?
Two reasons:
- you're still left with all the same issues (ie you do need to use "git
commit -a" because that is simply fundamental, and if you don't, "git
commit" now causes an ERROR, which is just illogical - you just added
the data!)
So it's simply better to just tell people "git add" adds the whole
state. Explain to them that git doesn't track "filenames", it tracks
state, and when you do a "git add", it really adds the _data_ and the
permissions too.
Really, if you didn't come from years of broken SCM's, you'd think that
it's _natural_ that when you add a file for tracking, you add its
contents too. It's not that git is surprising or unnatural, it's that
CVS is.
- you generally really don't want to see "git diff" show you the big diff
for a new creation. You only think you do, but trust me, you generally
don't. It's the same thing as with doing merges - keeping the
automerged state in the index is actually nice, because it means that
the default "git diff" can just shut the heck up about the things that
may be the _bulk_ of the change, but it's not the interesting part.
So I would suggest that if people are irritated with "git diff" for
example not showing newly added files AT ALL, then the solution to that
isn't that they should be added as "empty" or "all zeroes". We do have
other state bits in the index already (we need them for marking things as
being unmerged etc), and if the problem is that you want to see that you
have a pending add, it's easy enough to have "git add" always set a bit
saying "this file is new".
A normal "read tree object" would populate index entries with that bit
cleared, and so it would be possible to have
git add file.c
git diff
show something like
diff --git a/file.c b/file.c
added file <mode> <sha1>
rather than show the whole big diff (which I _really_ don't think you want
to see, and which is actually against the whole point, which is that you
add _content_ to the index, and "git diff" will always show you the stuff
that is _not_ added to the index yet).
(Of course, if you _also_ had changed it between the "git add" and the
"git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff
that is the diff between the thing you added, and the status it has now).
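Linus's suggested "this file is new" bit can likewise be sketched as a toy model. This is a hypothetical illustration, not git code: `Entry.is_new`, `read_tree`, `add` and `diff` are invented names, every entry is assumed to have mode 100644, and the content diff uses Python's `difflib` rather than git's own diff machinery.

```python
import difflib
import hashlib
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    # git-style blob id: SHA-1 over "blob <len>\0" followed by the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


@dataclass
class Entry:
    mode: str
    content: bytes
    is_new: bool = False  # hypothetical "added with git add" bit


def read_tree(tree):
    # Entries populated from a tree object start with the bit cleared.
    return {path: Entry("100644", data) for path, data in tree.items()}


def add(index, worktree, path):
    # "git add" snapshots the current content and sets the bit.
    index[path] = Entry("100644", worktree[path], is_new=True)


def diff(index, worktree):
    # Index vs. working tree, as plain "git diff" does, plus a one-line
    # summary for entries that carry the "new" bit.
    lines = []
    for path in sorted(index):
        entry = index[path]
        current = worktree.get(path, b"")
        changed = current != entry.content
        if not (entry.is_new or changed):
            continue  # nothing to report for this path
        lines.append(f"diff --git a/{path} b/{path}")
        if entry.is_new:
            lines.append(f"added file {entry.mode} {blob_id(entry.content)}")
        if changed:
            # Only edits made after "git add" show up as a content diff.
            lines.extend(difflib.unified_diff(
                entry.content.decode().splitlines(),
                current.decode().splitlines(),
                f"a/{path}", f"b/{path}", lineterm=""))
    return lines
```

Calling `add` and then `diff` on an unchanged file yields just the two header lines, matching the output sketched above; a later edit adds a unified diff after them.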
So showing a real diff after "git add" would really be wrong. The index
really is important. But if it's _only_ an issue of worrying about seeing
added files at all, we can add a "people comfort" bit to do that.
(Quite frankly, I don't think it's worthwhile. I really think this is a
documentation issue. Make people understand that "git add" adds the
contents too, and that git never tracks filenames on their own at all).
So it is always going to be true that
git add file
echo New line >> file
git commit
must commit the old contents of the file. That really _does_ follow from
the whole "track contents" model. Anything that doesn't do this is
fundamentally broken, and has broken the notion of what "git add" means.
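The "track contents" behavior described above can be checked against a minimal model of the three states (working tree, index, last commit). `ToyRepo` is a hypothetical illustration that stores whole file contents instead of blob ids. It shows why the commit records the content as of `git add`, and how a `commit -a`-style refresh differs.

```python
class ToyRepo:
    """Hypothetical model of git's three states, holding whole file contents."""

    def __init__(self):
        self.worktree = {}  # path -> bytes currently on disk
        self.index = {}     # path -> bytes staged for the next commit
        self.head = {}      # path -> bytes recorded by the last commit

    def add(self, path):
        # "git add": copy the file's content as it is right now.
        self.index[path] = self.worktree[path]

    def commit(self, all_tracked=False):
        if all_tracked:
            # "git commit -a": first re-stage every path already in the
            # index, dropping ones deleted from the working tree.
            for path in list(self.index):
                if path in self.worktree:
                    self.index[path] = self.worktree[path]
                else:
                    del self.index[path]
        # The commit records exactly what the index holds.
        self.head = dict(self.index)
```

Replaying `git add file`, `echo New line >> file`, `git commit` against this model commits the old contents; only the `all_tracked=True` call picks up the appended line.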
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:41 ` Jakub Narebski
@ 2006-11-30 22:49 ` Nicolas Pitre
From: Nicolas Pitre @ 2006-11-30 22:49 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
On Thu, 30 Nov 2006, Jakub Narebski wrote:
> Nicolas Pitre wrote:
>
> > On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> >
> >> One thing that I think is non-intuitive to a lot of users (either novice
> >> or who just don't do it much) is that it matters where in the process you
> >> do "git add <path>" if you're also changing the file. Even if you
> >> understand the index, you may not realize (or may not have internalized
> >> the fact) that what git-add does is update the index with what's there
> >> now.
> >
> > And actually I think this is a good thing. This is what makes the index
> > worth it. Better find a way to make it obvious to people what's
> > happening.
>
> Still, perhaps (perhaps!) it would be useful to have "intent to add"
> git-add.
Well, sure. It could be an argument to git-add. But surely not the
default?
git-add --latest maybe?
* Re: [RFC] git-add update with all-0 object
2006-11-30 22:46 ` Linus Torvalds
@ 2006-12-01 0:12 ` Daniel Barkalow
2006-12-01 4:57 ` Theodore Tso
From: Daniel Barkalow @ 2006-12-01 0:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, Junio C Hamano
On Thu, 30 Nov 2006, Linus Torvalds wrote:
> A normal "read tree object" would populate index entries with that bit
> cleared, and so it would be possible to have
>
> git add file.c
> git diff
>
> show something like
>
> diff --git a/file.c b/file.c
> added file <mode> <sha1>
>
> rather than show the whole big diff (which I _really_ don't think you want
> to see, and which is actually against the whole point, which is that you
> add _content_ to the index, and "git diff" will always show you the stuff
> that is _not_ added to the index yet).
I'm not sure I want to see the whole added file more when diffing two
trees, or when I do "git diff --cached" after "git update-index --add",
than when I do "git diff" after "git add", but I'll concede that viewing
the content of a new file as a diff is no fun. (Maybe diff-against-nothing
for display needs work in general? It'd solve the whole root commit thing,
too.)
> (Of course, if you _also_ had changed it between the "git add" and the
> "git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff
> that is the diff between the thing you added, and the status it has now).
>
> So showing a real diff after "git add" would really be wrong. The index
> really is important. But if it's _only_ an issue of worrying about seeing
> added files at all, we can add a "people comfort" bit to do that.
This is where I think "git add" is really broken. For every other git
command, if the command causes the index to not match HEAD, the command
contains "index" either in the name of the command or in an option.
So, if you understand the index, and you understand git's model, but you
don't know this one weird corner case, you will come to the conclusion
that "git add <path>" leaves <path> such that the index matches HEAD.
Now *you* know that "git add" really is "git update-index --add", because
you were typing the latter (well, "git update-cache --add", anyway) before
"git add" existed at all. But for new users, and anyone who wasn't adding
a lot of files back then, it's a surprising exception that has to be
learned and internalized.
"git checkout" leaves the index matching HEAD or its original state.
"git commit" leaves the index matching HEAD (the new HEAD) or its original
state.
"git reset" (all options) leaves the index matching HEAD or its original
state.
"git pull/merge" does disrupt the index, but it also starts to prepare a
commit based on multiple *HEAD files, and it leaves every stage of the
index matching some *HEAD or its original state. And new users still seem
to wonder where the merge happens, because it doesn't say "in the index".
"git apply" leaves the index alone.
"git update-index" says it works on the index.
"git apply --index" says it works on the index.
Am I missing any violations of the rule? I guess "git rm", but that's just
for the CVS-damaged, unnecessary anyway, and it still doesn't care about
the state of the working directory at any particular point in time. And I
still prefer "git update-index --force-remove" as a command for that
operation.
So it's obvious that the "add" functionality is properly called "git add
--index", because whatever "git add" would do, it would have to leave the
index matching HEAD or its original state.
(Well, okay, '"git commit -i path" ^C', violates the rule. But I forgot
until recently that -i stands for --include, not --index, which would make
a reasonable expansion, too)
> (Quite frankly, I don't think it's worthwhile. I really think this is a
> documentation issue. Make people understand that "git add" adds the
> contents too, and that git never tracks filenames on their own at all).
I think people's model is likely to be closer to "touch" for the index,
especially since it has no effect if the file is already in the index.
> So it is always going to be true that
>
> git add file
> echo New line >> file
> git commit
>
> must commit the old contents of the file. That really _does_ follow from
> the whole "track contents" model. Anything that doesn't do this is
> fundamentally broken, and has broken the notion of what "git add" means.
"git add" doesn't *say* it changes the index, and nothing else there
*says* it changes the index, so "git commit" there should say "nothing to
commit", because you never did "git update-index file", either before or
after the change, and you didn't do "git commit file" or "git commit -a".
Just tossing the words in commands around, it's obvious that what
"git add file" should do is mean that you can now do
"git update-index file" instead of
"git update-index --add file". Saying you shouldn't need "update-index"
after adding a file is like saying you shouldn't need "update-index" after
modifying a file.
But it shouldn't change my index any more than "git apply" should, because
it doesn't say it updates the index. (Of course, it would be good to have
"git add --index file", matching "git apply --index patch", which does
what "git add" does now.)
Now, in order to interact correctly with resetting, checking out a
different branch, etc, it wants to have the information in the index
file, so there isn't a separate file with a list to lose stuff from. And
it patterns naturally as an adjunct to the index for some things (like
ls-files, which doesn't care at all what the content associated with
filenames is). But that's fundamentally an implementation detail, not an
aspect of the model.
-Daniel
* Re: [RFC] git-add update with all-0 object
2006-12-01 0:12 ` Daniel Barkalow
@ 2006-12-01 4:57 ` Theodore Tso
2006-12-01 6:20 ` Junio C Hamano
From: Theodore Tso @ 2006-12-01 4:57 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Linus Torvalds, git, Junio C Hamano
On Thu, Nov 30, 2006 at 07:12:31PM -0500, Daniel Barkalow wrote:
> This is where I think "git add" is really broken. For every other git
> command, if the command causes the index to not match HEAD, the command
> contains "index" either in the name of the command or in an option.
>
> So, if you understand the index, and you understand git's model, but you
> don't know this one weird corner case, you will come to the conclusion
> that "git add <path>" leaves <path> such that the index matches HEAD.
But it's not just this one weird corner case. You yourself said that
"git pull/merge" leave the index where it's != HEAD.
I have serious trouble believing that "if the command leaves index !=
HEAD, the command must contain 'index' in either the name of the
command or the option" is all that important of a consistent rule or
principle that must be maintained at all costs.
By the way, after thinking about this for a while, part of the problem
is that the name "index" really sucks. Which is perhaps why Linus is
now trying to stop us from actually using the term "index" in these
discussions. :-) If we called it a "staging area", as our Great
Leader has suggested, I think it would be a lot easier for novice
users to understand. Consider what is in the git man page:
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random
time. It does so by a simple array that associates a set of
names, dates, permissions and content (aka "blob") objects
together. The cache is always kept ordered by name, and names
are unique (with a few very specific rules) at any point in
time, but the cache has no long-term meaning, and can be
partially updated at any time.....
In particular, the index file can have the representation of
an intermediate tree that has not yet been instantiated. So
the index can be thought of as a write-back cache, which can
contain dirty information that has not yet been written back
to the backing store.
For a kernel programmer, this might be understandable --- but for
your typical application programmer, this is enough to cause him or
her to conclude that git is simply not meant for use by mere mortals.
So as Junio and Linus have both said, it's all about your mental
model, and if we think about it in terms of a staging area for a
commit, and we think about what commands are most natural given that
model, it's far more important than whether a command has "index" in
its name or specified in an option.
Put another way, the reason why I think people are liking the whole
"git add" and "git rm" suggestion is that it's a nice middle ground
between the "hide the index" and the "shove the index in the user's
face" approaches. It's not that we are hiding the fact that there is
this thing with the horribly chosen name "index", but instead we talk
about this concept of a staging area and we don't dwell on things like
the fact that it is a binary file which stores an efficient
representation of a virtual directory.... blah blah blah.
Once this is done, the only command which is still problematic to
describe is "git diff". Yes, it almost always does the right thing.
But if you read the man page, even though we are now using "<tree-ish>"
instead of "<ent>" to describe it, it still forces the user who is
reading the man page to prove to him- or her-self that it really
always does the right thing. The EXAMPLES section really helps, but
even so, the man page is in terrible need of help.
For example, exactly what "git diff" does is described in terms of
"git diff-files", "git diff-index", and "git diff-tree". (And the
command names git-diff-index, git-diff-tree and git-diff-files in the
DESCRIPTION aren't even hotlinks, making it hard to get to the
plumbing man pages, which is the only place where you can get
documentation of the options accepted by git-diff.)
OK, so once the novice user gets past this hurdle, he/she says, OK,
what does "git diff <tree-ish>" do? Hmm, according to EXAMPLES,
this diffs the working tree with the named tree. What options can I
give? Well, with only one <tree-ish>, I have to go read the man
page for "git-diff-index", whose synopsis says, "Compares content and
mode of blobs between the index and repository". But wait! According
to git-diff's EXAMPLES section, "git diff <tree-ish>" doesn't involve
the index at all! Why does the synopsis say anything about the index?
And this leaves the novice confused and bewildered. And why not? If
the user spends time puzzling through the man page, he/she will
discover that:
1) "git diff-index <tree>" compares the tree with the working
directory, and doesn't involve the index at all, even though it is in
the command name. WTF?!?
2) If you want to really diff the index, you have to use the command
"git diff-index --cached <tree>"
If you look at this from the point of the novice user, it becomes very
clear why the index and commands that operate on the index are
hopelessly confusing. Yes, if you the grasshopper read and meditate
very deeply the low-level meaning of the plumbing, and then someone
like Linus slaps you upside the head with one of his e-mail messages,
it will suddenly make sense to you. The problem with this method is
that it doesn't scale terribly well. :-)
But if you are just reading the "git-diff" man page for the first
time, and are then forced to read the "git-diff-index" man page to
puzzle out what a particular "git diff" option does, and then have to
confront the notion that something as simple as "git diff HEAD" involves a
command "git diff-index", even though this confusing thing called the
index is never involved unless the --cached option is given --- can you
see how this might cause the beginning user of git to conclude that
git is hopelessly confusing and too hard to use?
The question then is how can we fix the "git diff" man page, and how
do we explain "git diff" in a tutorial so that users can understand
what in the world does it do? For a starting point, I'd recommend
moving the EXAMPLES to the beginning of the man page, and moving
any mention of git-diff-index, git-diff-files, and git-diff-tree to
the very end of the man page, and to put the most commonly used
options in the git-diff man page, so that most users don't have to
look at the low-level plumbing man pages to figure out how the
high-level git-diff works.
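The man-page passage quoted in the message above describes the index as a sorted array of entries whose names are unique "with a few very specific rules". A toy sketch of that data structure follows, assuming the rules in question are git's merge stages (stage 0 for a normal entry, stages 1 to 3 for the base, ours and theirs versions of a conflicted path). `IndexEntry` and `SortedIndex` are hypothetical names and omit most of the stat data real entries carry.

```python
import bisect
from dataclasses import dataclass


@dataclass
class IndexEntry:
    path: str
    stage: int       # 0 = normal; 1/2/3 = base/ours/theirs during a conflict
    mode: str
    oid: str         # blob id of the content
    mtime: float = 0.0


class SortedIndex:
    def __init__(self):
        self.entries = []  # always sorted by (path, stage)

    def put(self, entry):
        def keep(e):
            if e.path != entry.path:
                return True
            # The same (path, stage) is replaced, and a path never has a
            # stage-0 entry alongside stage 1-3 entries.
            return e.stage != entry.stage and (e.stage == 0) == (entry.stage == 0)

        self.entries = [e for e in self.entries if keep(e)]
        keys = [(e.path, e.stage) for e in self.entries]
        pos = bisect.bisect_left(keys, (entry.path, entry.stage))
        self.entries.insert(pos, entry)

    def unmerged_paths(self):
        return sorted({e.path for e in self.entries if e.stage != 0})
```

Putting a stage-0 entry for a conflicted path collapses its stages, which is roughly what marking a conflict as resolved does in this model.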
* Re: [RFC] git-add update with all-0 object
2006-12-01 4:57 ` Theodore Tso
@ 2006-12-01 6:20 ` Junio C Hamano
2006-12-02 8:55 ` Jakub Narebski
2006-12-01 7:10 ` Linus Torvalds
2006-12-01 8:10 ` Daniel Barkalow
From: Junio C Hamano @ 2006-12-01 6:20 UTC (permalink / raw)
To: Theodore Tso; +Cc: Daniel Barkalow, Linus Torvalds, git
Theodore Tso <tytso@mit.edu> writes:
> The question then is how can we fix the "git diff" man page, and how
> do we explain "git diff" in a tutorial so that users can understand
> what in the world does it do? For a starting point, I'd recommend
> moving the EXAMPLES to the beginning of the man page, and moving
> any mention of git-diff-index, git-diff-files, and git-diff-tree to
> the very end of the man page, and to put the most commonly used
> options in the git-diff man page, so that most users don't have to
> look at the low-level plumbing man pages to figure out how the
> high-level git-diff works.
All good points. The only slight worry I have is that just
moving EXAMPLES up deviates from the traditional UNIX manpage
order of presenting information.
I think the plumbing manuals can (and probably should) stay as
the technical manual for Porcelain writers. "git diff", "git
add" and friends that are clearly Porcelain should talk about
what it does in the terms of end user operation in the
DESCRIPTION section and put less stress on how things work
behind the scene in technical terms. For example, from
git-diff(1):
DESCRIPTION
-----------
Show changes between two trees, a tree and the working tree, a
tree and the index file, or the index file and the working tree.
The combination of what is compared with what is determined by
the number of trees given to the command.
That may be an accurate description of what the command does in
technical terms, but it does not tell why the user may want to
compare "a tree and the working tree". The users would want to
know which case applies to their current situation and we should
make it easier for them to find that information.
For example, although --cached is technically speaking one of
the --diff-options, it should be separated out from other
options when we talk about 'git-diff'. Also, although 'git-diff'
is designed to work on tree-ish, Porcelain users will use it with
commit-ish (either a commit or an annotated signed tag that
points at a commit) 99.9% of the time, so we should mention
<tree-ish> at the end as a sidenote and talk about <commit>.
DESCRIPTION
-----------
This command shows changes between four combinations
of states.
* 'git-diff' [--options] [--] [<path>...]
is to see the changes you made relative to the index
(staging area for the next commit). In other words, the
differences are what you _could_ tell git to further add
to the index but you still haven't. You can stage
these changes by using gitlink:git-update-index[1].
* 'git-diff' [--options] --cached [<commit>] [--] [<path>...]
is to see the changes you staged for the next commit
relative to the named <commit>. Typically you would
want comparison with the latest commit, so if you do
not give <commit>, it defaults to HEAD.
* 'git-diff' [--options] <commit> -- [<path>...]
is to see the changes you have in your working tree,
regardless of whether you staged them or not, relative to the
named <commit>.
* 'git-diff' [--options] <commit> <commit> -- [<path>...]
is to see the changes between two <commit>.
Just in case you are doing something exotic, it
should be noted that all of the <commit> in the above
description can be any <tree-ish>.
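The four forms in the draft above can be summarized by which two snapshots each one compares. The sketch below is a hypothetical model (snapshots are plain `{path: bytes}` dictionaries, and `commits` maps names such as `"HEAD"` to snapshots), not git's implementation. It also illustrates the point made later in the thread: when the index matches HEAD, `git diff` and `git diff HEAD` compare the same pair of states.

```python
def diff_sides(commits, index, worktree, cached=False, revs=()):
    """Return the (old, new) snapshots compared by each 'git diff' form."""
    if cached:
        # git diff --cached [<commit>]: <commit> (default HEAD) vs. index
        base = revs[0] if revs else "HEAD"
        return commits[base], index
    if len(revs) == 0:
        return index, worktree             # git diff: index vs. working tree
    if len(revs) == 1:
        return commits[revs[0]], worktree  # git diff <commit>: vs. working tree
    if len(revs) == 2:
        return commits[revs[0]], commits[revs[1]]  # git diff <commit> <commit>
    raise ValueError("at most two commits")


def changed_paths(old, new):
    # Paths whose content differs (or exists on only one side).
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))
```

Staging a change moves it from the plain `git diff` output to the `--cached` output, which is the "what you could still add" versus "what you already staged" split the draft describes.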
* Re: [RFC] git-add update with all-0 object
2006-12-01 4:57 ` Theodore Tso
2006-12-01 6:20 ` Junio C Hamano
@ 2006-12-01 7:10 ` Linus Torvalds
2006-12-01 8:10 ` Daniel Barkalow
From: Linus Torvalds @ 2006-12-01 7:10 UTC (permalink / raw)
To: Theodore Tso; +Cc: Daniel Barkalow, git, Junio C Hamano
On Thu, 30 Nov 2006, Theodore Tso wrote:
>
> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks.
Hey, it was originally called "cache".
I don't care _what_ it's called, I just want people knowing about it,
because hiding it will just cripple git (ie at the very least, when you
hit a merge conflict, you really do want to understand it if you ever
want to go to the "next level").
If people are more comfortable just calling it the "staging area", and
talking about it in those terms, I'll be happy.
> Put another way, the reason why I think people are liking the whole
> "git add" and "git rm" suggestion is that it's a nice middle ground
> between the "hide the index" and the "shove the index in the user's
> face" approaches. It's not that we are hiding the fact that there is
> this thing with the horribly chosen name "index", but instead we talk
> about this concept of a staging area and we don't dwell on things like
> the fact that it is a binary file which stores an efficient
> representation of a virtual directory.... blah blah blah.
Yes.
And even "git diff" isn't really a problem once you understand the staging
area. If people feel worried, let them use "git diff HEAD". You won't need
to use git for _that_ long until you realize that since the staging area
is going to match the HEAD under normal circumstances (and when it
doesn't, you actually tend to prefer to get the diff against the staging
area _anyway_), you'll find people just starting to use "git diff" and not
worry about it.
* Re: [RFC] git-add update with all-0 object
2006-12-01 4:57 ` Theodore Tso
2006-12-01 6:20 ` Junio C Hamano
2006-12-01 7:10 ` Linus Torvalds
@ 2006-12-01 8:10 ` Daniel Barkalow
2006-12-01 9:37 ` Andy Parkins
2006-12-02 8:26 ` Jakub Narebski
From: Daniel Barkalow @ 2006-12-01 8:10 UTC (permalink / raw)
To: Theodore Tso; +Cc: Linus Torvalds, git, Junio C Hamano
On Thu, 30 Nov 2006, Theodore Tso wrote:
> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks. Which is perhaps why Linus is
> now trying to stop us from actually using the term "index" in these
> discussions. :-) If we called it a "staging area", as our Great
> Leader has suggested, I think it would be a lot easier for novice
> users to understand. Consider what is in the git man page:
>
> The index is a simple binary file, which contains an efficient
> representation of a virtual directory content at some random
> time. It does so by a simple array that associates a set of
> names, dates, permissions and content (aka "blob") objects
> together. The cache is always kept ordered by name, and names
> are unique (with a few very specific rules) at any point in
> time, but the cache has no long-term meaning, and can be
> partially updated at any time.....
>
> In particular, the index file can have the representation of
> an intermediate tree that has not yet been instantiated. So
> the index can be thought of as a write-back cache, which can
> contain dirty information that has not yet been written back
> to the backing store.
>
> For a kernel programmer, this might be understandable --- but for
> your typical application programmer, this is enough to cause him or
> her to conclude that git is simply not meant for use by mere mortals.
My position on this subject is that "index" is a good name, but that
description is a terrible description, and "index" is a word that needs a
good description in context. If we just said up front:
Git's "index" is a staging area that you use to prepare commits. It maps
filenames to content. It allows git to remember changes you want to put
into the next commit while you do more work. For normal commits, it is
not necessary to use the index, but it is very helpful for complicated
commits, because it lets you focus on the part you're still working on
while git remembers the part you're done with.
I think people would get it. (If it were called the "cache" still, it
would be hopeless, because "cache" implies false things; "index" doesn't
imply anything initially.)
Of course, we'd still have to disabuse people of the notion that the index
can store the information "there's nothing at this path yet, but I'm
interested in it", because that's a piece of information people often know
before a file is ready, and think git would be able to remember in a
staging area.
-Daniel
* Re: [RFC] git-add update with all-0 object
2006-12-01 8:10 ` Daniel Barkalow
@ 2006-12-01 9:37 ` Andy Parkins
2006-12-02 8:35 ` Jakub Narebski
2006-12-02 8:26 ` Jakub Narebski
From: Andy Parkins @ 2006-12-01 9:37 UTC (permalink / raw)
To: git
On Friday 2006 December 01 08:10, Daniel Barkalow wrote:
> My position on this subject is that "index" is a good name, but that
> description is a terrible description, and "index" is a word that needs a
> good description in context. If we just said up front:
If we need to explain what "index" means in the context of diff then it's not
a good name :-)
An index /everywhere else/ is a lookup table. topic->page number;
author->book title. record id->byte position. There is never any content in
an index, indices just point at content.
I imagine that's how git's index got its name. (I'm only guessing as I've
not looked at what's actually inside git's "index"). Here's my guess:
git update-index file1 hashes file1, stores it somewhere under that hash and
writes the hash->filename connection to .git/index. That is why git's index
is called an index. It's a hash->filename index.
Unfortunately, "index" in colloquial git actually means the combination
of .git/index plus the hashed file itself. That's no longer an index, it's
a "book". :-)
It's made worse, I think, by the fact that git doesn't want to do any
index-like things with the "index". Being content-oriented rather than
name-oriented means that an entry like "file1->NOTHING" is impossible in git.
This leads to the sort of "git-add means track this filename" confusion that
turns up a lot with new users.
It's probably all too late to change the nomenclature, but I've always been of
the opinion that names are important, they confer meaning. When we use a
common word, with common meaning and deviate from that common meaning we are
bound to create confusion. New users don't have any "git-way-of-thinking"
knowledge when they begin, so when they hear "index" they can only fall back
on their standard understanding of that word. We shouldn't be surprised then
when new users don't get "the index".
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
* Re: [RFC] git-add update with all-0 object
2006-12-01 8:10 ` Daniel Barkalow
2006-12-01 9:37 ` Andy Parkins
@ 2006-12-02 8:26 ` Jakub Narebski
1 sibling, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02 8:26 UTC (permalink / raw)
To: git
Daniel Barkalow wrote:
> Of course, we'd still have to disabuse people of the notion that the index
> can store the information "there's nothing at this path yet, but I'm
> interested in it", because that's a piece of information people often know
> before a file is ready, and think git would be able to remember in a
> staging area.
Well, that was what the very first message in this thread was about: marking
a file "interesting" (so 'git commit -a' would pick it up) by using an all-0
object hash... which of course would require reviewing (and, where necessary,
modifying) all the core tools that touch the index.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [RFC] git-add update with all-0 object
2006-12-01 9:37 ` Andy Parkins
@ 2006-12-02 8:35 ` Jakub Narebski
0 siblings, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02 8:35 UTC (permalink / raw)
To: git
Andy Parkins wrote:
> On Friday 2006 December 01 08:10, Daniel Barkalow wrote:
>
>> My position on this subject is that "index" is a good name, but that
>> description is a terrible description, and "index" is a word that needs a
>> good description in context. If we just said up front:
>
> If we need to explain what "index" means in the context of diff then it's not
> a good name :-)
But "staging area", or the more descriptive "staging area for commits", is
a bit long. Still, we no longer call the "index" "dircache".
> An index /everywhere else/ is a lookup table: topic->page number;
> author->book title; record id->byte position. There is never any content in
> an index; indices just point at content.
Just like the git index.
> I imagine that's how git's index got its name. (I'm only guessing, as I've
> not looked at what's actually inside git's "index".) Here's my guess:
>
> git update-index file1 hashes file1, stores it somewhere under that hash and
> writes the hash->filename connection to .git/index. That is why git's index
> is called an index. It's a hash->filename index.
This "somewhere" is the object repository. And it is the reverse: it is a
filename->(stat + hash) index, from a file in the working area to the blob
(or tree) in the repository.
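A minimal Python sketch of that direction of lookup, assuming a toy in-memory index (a dict from path to a simplified (mtime, size) stat tuple plus object name) and a dict standing in for the object repository. The helper name update_index is hypothetical and the real .git/index is a binary file with more fields; only the blob naming follows git's actual scheme, SHA-1 over a "blob <size>\0" header followed by the file contents.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # git names a blob by SHA-1 over "blob <size>\0" + raw contents
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def update_index(index: dict, objects: dict, path: str,
                 content: bytes, mtime: int) -> str:
    # Hypothetical helper: store the content in the "object repository"
    # (here just a dict keyed by object name) ...
    oid = blob_id(content)
    objects[oid] = content
    # ... and record path -> (simplified stat info, object name).
    index[path] = ((mtime, len(content)), oid)
    return oid


index, objects = {}, {}
oid = update_index(index, objects, "file1", b"hello\n", mtime=0)
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a
```

Note that the toy index is keyed by filename; only the object store is keyed by hash.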
> Unfortunately, "index" in colloquial git actually means the combination
> of .git/index plus the hashed file itself. That's no longer an index, it's
> a "book". :-)
Yes, it is true that "index" in colloquial git means "index version"
(the version pointed to by the index).
> It's made worse, I think, by the fact that git doesn't want to do any
> index-like things with the "index". Being content-oriented rather than
> name-oriented means that an entry like "file1->NOTHING" is impossible in git.
> This leads to the sort of "git-add means track this filename" confusion that
> turns up a lot with new users.
It is possible. By convention, an all-0 hash means 'no such object'. The very
first message in this thread tried to make use of that... but using "git add"
to mark a filename as interesting, instead of to add the _current_ contents
of the file, goes somewhat against git's ideas.
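A sketch of how the all-0 convention could express "file1->NOTHING", as proposed at the start of this thread. Everything here is a hypothetical toy model (a dict from path to (stat, object name), with made-up helpers mark_interesting and tree_entries), not git's actual index code; it only shows the proposed behaviour of write-tree skipping such entries.

```python
NULL_ID = "0" * 40  # by convention, the all-0 hash means "no such object"


def mark_interesting(index: dict, path: str) -> None:
    # Hypothetical: record the path with no content yet, leaving an
    # already-tracked entry (and its real object name) untouched.
    index.setdefault(path, ((0, 0), NULL_ID))


def tree_entries(index: dict) -> list:
    # What a write-tree step would see under the proposal: entries
    # whose object name is all-0 are skipped.
    return sorted((path, oid) for path, (_stat, oid) in index.items()
                  if oid != NULL_ID)


index = {"README": ((0, 0), "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")}
mark_interesting(index, "file1")
print(tree_entries(index))
# [('README', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391')]
```

Every other tool that reads object names from the index would need the same kind of check, which is the review cost mentioned earlier in the thread.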
> It's probably all too late to change the nomenclature, but I've always been of
> the opinion that names are important, they confer meaning. When we use a
> common word, with common meaning and deviate from that common meaning we are
> bound to create confusion. New users don't have any "git-way-of-thinking"
> knowledge when they begin, so when they hear "index" they can only fall back
> on their standard understanding of that word. We shouldn't be surprised then
> when new users don't get "the index".
Well, "dircache" was renamed to "index", and "<ent>" was dropped in favour
of "<tree-ish>". I think using the name "staging area" in the git man pages
would be a good idea (as would making --index an alias for --cached).
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [RFC] git-add update with all-0 object
2006-12-01 6:20 ` Junio C Hamano
@ 2006-12-02 8:55 ` Jakub Narebski
0 siblings, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02 8:55 UTC (permalink / raw)
To: git
Junio C Hamano wrote:
> * 'git-diff' [--options] <commit> <commit> -- [<path>...]
>
> is to see the changes between two <commit>.
>
> Just in case if you are doing something exotic, it
> should be noted that all of the <commit> in the above
> descriptoin can be any <tree-ish>.
s/descriptoin/description/
It _might_ be worth mentioning that you can compare two arbitrary
files using
git diff [--options] <blob1 sha> <blob2 sha>
where <blob sha> can be entered as <tree-ish>:<filename>, usually
<commit>:<filename> (<filename> is HEAD:<filename>), to compare a blob (file)
from a named tree or a given commit, or as :<stage>:<filename> (or
just ::<filename> if the file is not in a merge conflict) to compare a blob
(file) from the index.
If I understand correctly, there is currently no way to compare files from
the working tree, let alone files outside the working tree (including
/dev/null), with that syntax.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
Thread overview: 15+ messages
2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
2006-11-30 22:32 ` Johannes Schindelin
2006-11-30 22:34 ` Nicolas Pitre
2006-11-30 22:41 ` Jakub Narebski
2006-11-30 22:49 ` Nicolas Pitre
2006-11-30 22:46 ` Linus Torvalds
2006-12-01 0:12 ` Daniel Barkalow
2006-12-01 4:57 ` Theodore Tso
2006-12-01 6:20 ` Junio C Hamano
2006-12-02 8:55 ` Jakub Narebski
2006-12-01 7:10 ` Linus Torvalds
2006-12-01 8:10 ` Daniel Barkalow
2006-12-01 9:37 ` Andy Parkins
2006-12-02 8:35 ` Jakub Narebski
2006-12-02 8:26 ` Jakub Narebski