[RFC] git-add update with all-0 object


* [RFC] git-add update with all-0 object
@ 2006-11-30 22:08 Daniel Barkalow
  2006-11-30 22:32 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Daniel Barkalow @ 2006-11-30 22:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

One thing that I think is non-intuitive to a lot of users (either novice 
or who just don't do it much) is that it matters where in the process you 
do "git add <path>" if you're also changing the file. Even if you 
understand the index, you may not realize (or may not have internalized 
the fact) that what git-add does is update the index with what's there 
now.

I think the more obvious behavior is to have it record the fact that you 
want to have the path tracked, but require one of the usual updating 
mechanisms to get a particular content into the index.

This should be pretty simple to implement: use --cacheinfo 0 0 $path 
instead of --add -- $path, and teach programs that look at the objects 
recorded in the index (rather than just hashes or other info) about all-0 
hashes meaning "but no content there". write-tree would probably just 
skip the entry (and then you could add a file, but still produce commits 
without it until you actually do either an update-index explicitly or one 
of the commit option sets that updates it); diff would treat it as empty; 
checkout would ignore it.

	-Daniel


* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
@ 2006-11-30 22:32 ` Johannes Schindelin
  2006-11-30 22:34 ` Nicolas Pitre
  2006-11-30 22:46 ` Linus Torvalds
  2 siblings, 0 replies; 15+ messages in thread
From: Johannes Schindelin @ 2006-11-30 22:32 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Junio C Hamano

Hi,

On Thu, 30 Nov 2006, Daniel Barkalow wrote:

> I think the more obvious behavior is to have it record the fact that you 
> want to have the path tracked, but require one of the usual updating 
> mechanisms to get a particular content into the index.

I fear that this is just your being used to the CVS mindset. Please see 
http://article.gmane.org/gmane.comp.version-control.git/32792 for details.

Hth,
Dscho


* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
  2006-11-30 22:32 ` Johannes Schindelin
@ 2006-11-30 22:34 ` Nicolas Pitre
  2006-11-30 22:41   ` Jakub Narebski
  2006-11-30 22:46 ` Linus Torvalds
  2 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2006-11-30 22:34 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Daniel Barkalow wrote:

> One thing that I think is non-intuitive to a lot of users (either novice 
> or who just don't do it much) is that it matters where in the process you 
> do "git add <path>" if you're also changing the file. Even if you 
> understand the index, you may not realize (or may not have internalized 
> the fact) that what git-add does is update the index with what's there 
> now.

And actually I think this is a good thing.  This is what makes the index 
worth it.  Better find a way to make it obvious to people what's 
happening.




* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:34 ` Nicolas Pitre
@ 2006-11-30 22:41   ` Jakub Narebski
  2006-11-30 22:49     ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Narebski @ 2006-11-30 22:41 UTC (permalink / raw)
  To: git

Nicolas Pitre wrote:

> On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> 
>> One thing that I think is non-intuitive to a lot of users (either novice 
>> or who just don't do it much) is that it matters where in the process you 
>> do "git add <path>" if you're also changing the file. Even if you 
>> understand the index, you may not realize (or may not have internalized 
>> the fact) that what git-add does is update the index with what's there 
>> now.
> 
> And actually I think this is a good thing.  This is what makes the index 
> worth it.  Better find a way to make it obvious to people what's 
> happening.

Still, perhaps (perhaps!) it would be useful to have "intent to add"
git-add.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
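
An "intent to add" mode along these lines did later appear in git as `git add -N` (`--intent-to-add`). A minimal sketch of how it behaves, assuming a reasonably recent git and a throwaway repository (the file names and the committer identity are placeholders made up for illustration):

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

echo "readme" > README
git add README
git commit -q -m "initial"

echo "draft" > notes.txt
git add -N notes.txt               # record the path only, not its content

tracked=$(git ls-files)            # paths the index knows about
unstaged=$(git diff --name-only)   # working tree compared with the index
echo "tracked:"
echo "$tracked"
echo "unstaged: $unstaged"
```

The path is tracked, but its content still counts as an unstaged change until it is added for real.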



* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:41   ` Jakub Narebski
@ 2006-11-30 22:49     ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2006-11-30 22:49 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Thu, 30 Nov 2006, Jakub Narebski wrote:

> Nicolas Pitre wrote:
> 
> > On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> > 
> >> One thing that I think is non-intuitive to a lot of users (either novice 
> >> or who just don't do it much) is that it matters where in the process you 
> >> do "git add <path>" if you're also changing the file. Even if you 
> >> understand the index, you may not realize (or may not have internalized 
> >> the fact) that what git-add does is update the index with what's there 
> >> now.
> > 
> > And actually I think this is a good thing.  This is what makes the index 
> > worth it.  Better find a way to make it obvious to people what's 
> > happening.
> 
> Still, perhaps (perhaps!) it would be useful to have "intent to add"
> git-add.

Well, sure.  It could be an argument to git-add.  But surely not the 
default?

git-add --latest maybe?




* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
  2006-11-30 22:32 ` Johannes Schindelin
  2006-11-30 22:34 ` Nicolas Pitre
@ 2006-11-30 22:46 ` Linus Torvalds
  2006-12-01  0:12   ` Daniel Barkalow
  2 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-11-30 22:46 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Daniel Barkalow wrote:
> 
> I think the more obvious behavior is to have it record the fact that you 
> want to have the path tracked, but require one of the usual updating 
> mechanisms to get a particular content into the index.

While this certainly matches the git model better than just automatically 
taking whatever state exists at commit time (you instead introduce it as a 
special "empty state" case), I don't think you really want it.

Why? 

Two reasons:

 - you're still left with all the same issues (ie you do need to use "git 
   commit -a" because that is simply fundamental, and if you don't, "git 
   commit" now causes an ERROR, which is just illogical - you just added 
   the data!)

   So it's simply better to just tell people "git add" adds the whole 
   state. Explain to them that git doesn't track "filenames", it tracks 
   state, and when you do a "git add", it really adds the _data_ and the 
   permissions too.

   Really, if you didn't come from years of broken SCM's, you'd think that 
   it's _natural_ that when you add a file for tracking, you add its 
   contents too. It's not that git is surprising or unnatural, it's that 
   CVS is.

 - you generally really don't want to see "git diff" show you the big diff 
   for a new creation. You only think you do, but trust me, you generally 
   don't. It's the same thing as with doing merges - keeping the 
   automerged state in the index is actually nice, because it means that 
   the default "git diff" can just shut the heck up about the things that 
   may be the _bulk_ of the change, but it's not the interesting part.

So I would suggest that if people are irritated with "git diff" for 
example not showing newly added files AT ALL, then the solution to that 
isn't that they should be added as "empty" or "all zeroes". We do have 
other state bits in the index already (we need them for marking things as 
being unmerged etc), and if the problem is that you want to see that you 
have a pending add, it's easy enough to have "git add" always set a bit 
saying "this file is new".

A normal "read tree object" would populate index entries with that bit 
cleared, and so it would be possible to have

	git add file.c
	git diff

show something like

	diff --git a/file.c b/file.c
	added file <mode> <sha1>

rather than show the whole big diff (which I _really_ don't think you want 
to see, and which is actually against the whole point, which is that you 
add _content_ to the index, and "git diff" will always show you the stuff 
that is _not_ added to the index yet).

(Of course, if you _also_ had changed it between the "git add" and the 
"git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff 
that is the diff between the thing you added, and the status it has now).

So showing a real diff after "git add" would really be wrong. The index 
really is important. But if it's _only_ an issue of worrying about seeing 
added files at all, we can add a "people comfort" bit to do that.

(Quite frankly, I don't think it's worthwhile. I really think this is a 
documentation issue. Make people understand that "git add" adds the 
contents too, and that git never tracks filenames on their own at all).

So it is always going to be true that

	git add file
	echo New line >> file
	git commit

must commit the old contents of the file. That really _does_ follow from 
the whole "track contents" model. Anything that doesn't do this is 
fundamentally broken, and has broken the notion of what "git add" means.
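
The three-command sequence above can be tried in a scratch repository. This is only a sketch, assuming a reasonably recent git; the temporary directory, file content and committer identity are placeholders:

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

echo "Old line" > file
git add file                    # the index now holds a blob containing "Old line"
echo "New line" >> file         # the working tree moves on; the index does not
git commit -q -m "add file"     # commits exactly what was staged

committed=$(git show HEAD:file)    # content recorded in the commit
pending=$(git diff --name-only)    # paths whose working copy differs from the index
echo "committed content: $committed"
echo "still unstaged: $pending"
```

The commit contains only the content that was present when "git add" ran; the appended line remains an unstaged change.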


* Re: [RFC] git-add update with all-0 object
  2006-11-30 22:46 ` Linus Torvalds
@ 2006-12-01  0:12   ` Daniel Barkalow
  2006-12-01  4:57     ` Theodore Tso
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Barkalow @ 2006-12-01  0:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano

On Thu, 30 Nov 2006, Linus Torvalds wrote:

> A normal "read tree object" would populate index entries with that bit 
> cleared, and so it would be possible to have
> 
> 	git add file.c
> 	git diff
> 
> show something like
> 
> 	diff --git a/file.c b/file.c
> 	added file <mode> <sha1>
> 
> rather than show the whole big diff (which I _really_ don't think you want 
> to see, and which is actually against the whole point, which is that you 
> add _content_ to the index, and "git diff" will always show you the stuff 
> that is _not_ added to the index yet).

I'm not sure I want to see the whole added file more when diffing two 
trees, or when I do "git diff --cached" after "git update-index --add", 
than when I do "git diff" after "git add", but I'll concede that viewing 
the content of a new file as a diff is no fun. (Maybe diff-against-nothing 
for display needs work in general? It'd solve the whole root commit thing, 
too.)

> (Of course, if you _also_ had changed it between the "git add" and the 
> "git diff", you'd get both the "added file <mode> <sha1>" _and_ the diff 
> that is the diff between the thing you added, and the status it has now).
> 
> So showing a real diff after "git add" would really be wrong. The index 
> really is important. But if it's _only_ an issue of worrying about seeing 
> added files at all, we can add a "people comfort" bit to do that.

This is where I think "git add" is really broken. For every other git 
command, if the command causes the index to not match HEAD, the command 
contains "index" either in the name of the command or in an option.

So, if you understand the index, and you understand git's model, but you 
don't know this one weird corner case, you will come to the conclusion 
that "git add <path>" leaves <path> such that the index matches HEAD.

Now *you* know that "git add" really is "git update-index --add", because 
you were typing the latter (well, "git update-cache --add", anyway) before 
"git add" existed at all. But for new users, and anyone who wasn't adding 
a lot of files back then, it's a surprising exception that has to be 
learned and internalized.

"git checkout" leaves the index matching HEAD or its original state.
"git commit" leaves the index matching HEAD (the new HEAD) or its original 
state.
"git reset" (all options) leaves the index matching HEAD or its original 
state.
"git pull/merge" does disrupt the index, but it also starts to prepare a 
commit based on multiple *HEAD files, and it leaves every stage of the 
index matching some *HEAD or its original state. And new users still seem 
to wonder where the merge happens, because it doesn't say "in the index".
"git apply" leaves the index alone.

"git update-index" says it works on the index.
"git apply --index" says it works on the index.

Am I missing any violations of the rule? I guess "git rm", but that's just 
for the CVS-damaged, unnecessary anyway, and it still doesn't care about 
the state of the working directory at any particular point in time. And I 
still prefer "git update-index --force-remove" as a command for that 
operation.

So it's obvious that the "add" functionality is properly called "git add 
--index", because whatever "git add" would do, it would have to leave the 
index matching HEAD or its original state.

(Well, okay, '"git commit -i path" ^C', violates the rule. But I forgot 
until recently that -i stands for --include, not --index, which would make 
a reasonable expansion, too)

> (Quite frankly, I don't think it's worthwhile. I really think this is a 
> documentation issue. Make people understand that "git add" adds the 
> contents too, and that git never tracks filenames on their own at all).

I think people's model is likely to be closer to "touch" for the index, 
especially since it has no effect if the file is already in the index.

> So it is always going to be true that
> 
> 	git add file
> 	echo New line >> file
> 	git commit
> 
> must commit the old contents of the file. That really _does_ follow from 
> the whole "track contents" model. Anything that doesn't do this is 
> fundamentally broken, and has broken the notion of what "git add" means.

"git add" doesn't *say* it changes the index, and nothing else there 
*says* it changes the index, so "git commit" there should say "nothing to 
commit", because you never did "git update-index file", either before or 
after the change, and you didn't do "git commit file" or "git commit -a". 

Just tossing the words in commands around, it's obvious that what 
"git add file" should do is mean that you can now do
"git update-index file" instead of
"git update-index --add file". Saying you shouldn't need "update-index" 
after adding a file is like saying you shouldn't need "update-index" after 
modifying a file.

But it shouldn't change my index any more than "git apply" should, because 
it doesn't say it updates the index. (Of course, it would be good to have 
"git add --index file", matching "git apply --index patch", which does 
what "git add" does now.)

Now, in order to interact correctly with resetting, checking out a 
different branch, etc, it wants to have the information in the index 
file, so there isn't a separate file with a list to lose stuff from. And 
it patterns naturally as an adjunct to the index for some things (like 
ls-files, which doesn't care at all what the content associated with 
filenames is). But that's fundamentally an implementation detail, not an 
aspect of the model.

	-Daniel
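
Whether a command leaves the index matching HEAD can be checked mechanically: `git diff --cached --quiet` exits non-zero when the index differs from HEAD. A small sketch, assuming a reasonably recent git and a scratch repository with placeholder file names, shows the state before and after a plain "git add" of a new path:

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "initial"

# Right after a commit the index and HEAD agree.
if git diff --cached --quiet; then before=matches; else before=differs; fi

echo "two" > new.txt
git add new.txt                 # no "index" in the command or its options

# The new path is now in the index but not in HEAD.
if git diff --cached --quiet; then after=matches; else after=differs; fi

echo "before git add: index $before HEAD"
echo "after git add:  index $after HEAD"
```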


* Re: [RFC] git-add update with all-0 object
  2006-12-01  0:12   ` Daniel Barkalow
@ 2006-12-01  4:57     ` Theodore Tso
  2006-12-01  6:20       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Theodore Tso @ 2006-12-01  4:57 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, git, Junio C Hamano

On Thu, Nov 30, 2006 at 07:12:31PM -0500, Daniel Barkalow wrote:
> This is where I think "git add" is really broken. For every other git 
> command, if the command causes the index to not match HEAD, the command 
> contains "index" either in the name of the command or in an option.
> 
> So, if you understand the index, and you understand git's model, but you 
> don't know this one weird corner case, you will come to the conclusion 
> that "git add <path>" leaves <path> such that the index matches HEAD.

But it's not just this one weird corner case.  You yourself said that
"git pull/merge" leave the index where it's != HEAD.   

I have serious trouble believing that "if the command leaves index !=
HEAD, the command must contain 'index' in either the name of the
command or the option" is all that important of a consistent rule or
principle that must be maintained at all costs.

By the way, after thinking about this for a while, part of the problem
is that the name "index" really sucks.  Which is perhaps why Linus is
now trying to stop us from actually using the term "index" in these
discussions.  :-)    If we called it a "staging area", as our Great
Leader has suggested, I think it would be a lot easier for novice
users to understand.    Consider what is in the git man page:

	The index is a simple binary file, which contains an efficient
	representation of a virtual directory content at some random
	time.  It does so by a simple array that associates a set of
	names, dates, permissions and content (aka "blob") objects
	together. The cache is always kept ordered by name, and names
	are unique (with a few very specific rules) at any point in
	time, but the cache has no long-term meaning, and can be
	partially updated at any time.....

	In particular, the index file can have the representation of
	an intermediate tree that has not yet been instantiated. So
	the index can be thought of as a write-back cache, which can
	contain dirty information that has not yet been written back
	to the backing store.

For a kernel programmer, this might be understandable --- but for
your typical application programmer, this is enough to cause him or
her to conclude that git is simply not meant for use by mere mortals.

So as Junio and Linus have both said, it's all about your mental
model, and if we think about it in terms of a staging area for a
commit, and we think about what commands are most natural given that
model, it's far more important than whether a command has "index" in
its name or specified in an option.

Put another way, the reason why I think people are liking the whole
"git add" and "git rm" suggestion is that it's a nice middle ground
between the "hide the index" and the "shove the index in the user's
face" approaches.  It's not that we are hiding the fact that there is
this thing with the horribly chosen name "index", but instead we talk
about this concept of a staging area and we don't dwell on things like
the fact that it is a binary file which stores an efficient
representation of a virtual directory.... blah blah blah.

Once this is done, the only command which is still problematic to
describe is "git diff".  Yes, it almost always does the right thing.
But if you read the man page, even though we are now using "<tree-ish>"
instead of "<ent>" to describe it, it still forces the user who is
reading the man page to prove to him- or her-self that it really
always does the right thing.  The EXAMPLES section really helps, but
even so, the man page is in terrible need of help.

For example, exactly what "git diff" does is described in terms of
"git diff-files", "git diff-index", and "git diff-tree".  (And the
command names git-diff-index, git-diff-tree and git-diff-files in the
DESCRIPTION aren't even hotlinks, making it hard to get to the
plumbing man pages, which are the only place where you can get
documentation of the options accepted by git-diff.)

OK, so once the novice user gets past this hurdle, he/she says, OK,
what does "git diff <tree-ish>" do?  Hmm, according to EXAMPLES,
this diffs the working tree with the named tree.  What options can I
give?  Well, with only one <tree-ish>, I have to go read the man
page for "git-diff-index", whose synopsis says, "Compares content and
mode of blobs between the index and repository".  But wait!  According
to git-diff's EXAMPLES section, "git diff <tree-ish>" doesn't involve
the index at all!  Why does the synopsis say anything about the index?
And this leaves the novice confused and bewildered.  And why not?  If
the user spends time puzzling through the man page, he/she will
discover that:

1) "git diff-index <tree>" compares the tree with the working
directory, and doesn't involve the index at all, even though it is in
the command name.  WTF?!?

2) If you want to really diff the index, you have to use the command
"git diff-index --cached <tree>"

If you look at this from the point of the novice user, it becomes very
clear why the index and commands that operate on the index are
hopelessly confusing.  Yes, if you, the grasshopper, read and meditate
very deeply on the low-level meaning of the plumbing, and then someone
like Linus slaps you upside the head with one of his e-mail messages,
it will suddenly make sense to you.  The problem with this method is
that it doesn't scale terribly well.  :-)

But if you are just reading the "git-diff" man page for the first
time, and are then forced to read the "git-diff-index" man page to
puzzle out what a particular "git diff" option does, and then have to
confront the notion that something like "git diff HEAD" involves a
command "git diff-index", even though this confusing thing called the
index is never involved unless the --cached option is given --- can you
see how this might cause the beginning user of git to conclude that
git is hopelessly confusing and too hard to use?

The question then is how can we fix the "git diff" man page, and how
do we explain "git diff" in a tutorial so that users can understand
what in the world it does?  For a starting point, I'd recommend
moving the EXAMPLES to the beginning of the man page, moving any
mention of git-diff-index, git-diff-files, and git-diff-tree to
the very end of the man page, and putting the most commonly used
options in the git-diff man page, so that most users don't have to
look at the low-level plumbing man pages to figure out how the
high-level git-diff works.
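
The two "git diff-index" behaviours numbered above can be seen side by side. A minimal sketch, assuming a reasonably recent git and a scratch repository (the file name and identity are placeholders):

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

echo "one" > file
git add file
git commit -q -m "initial"

echo "two" >> file              # edit the working copy, but do not stage it

# Without --cached: the tree is compared with the working tree files.
worktree_vs_head=$(git diff-index --name-only HEAD)
# With --cached: the tree is compared with what is staged in the index.
index_vs_head=$(git diff-index --cached --name-only HEAD)

echo "diff-index HEAD:          [$worktree_vs_head]"
echo "diff-index --cached HEAD: [$index_vs_head]"
```

Only the first form reports the unstaged edit; the `--cached` form reports nothing because the index still matches HEAD.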


* Re: [RFC] git-add update with all-0 object
  2006-12-01  4:57     ` Theodore Tso
@ 2006-12-01  6:20       ` Junio C Hamano
  2006-12-02  8:55         ` Jakub Narebski
  2006-12-01  7:10       ` Linus Torvalds
  2006-12-01  8:10       ` Daniel Barkalow
  2 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2006-12-01  6:20 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Daniel Barkalow, Linus Torvalds, git

Theodore Tso <tytso@mit.edu> writes:

> The question then is how can we fix the "git diff" man page, and how
> do we explain "git diff" in a tutorial so that users can understand
> what in the world it does?  For a starting point, I'd recommend
> moving the EXAMPLES to the beginning of the man page, moving any
> mention of git-diff-index, git-diff-files, and git-diff-tree to
> the very end of the man page, and putting the most commonly used
> options in the git-diff man page, so that most users don't have to
> look at the low-level plumbing man pages to figure out how the
> high-level git-diff works.  

All good points.  The only slight worry I have is that just
moving EXAMPLES up deviates from the traditional UNIX manpage
order of presenting information.

I think the plumbing manuals can (and probably should) stay as
the technical manual for Porcelain writers.  "git diff", "git
add" and friends that are clearly Porcelain should describe what
they do in terms of end-user operation in the DESCRIPTION
section and put less stress on how things work behind the
scenes in technical terms.  For example, from
git-diff(1):

        DESCRIPTION
        -----------
        Show changes between two trees, a tree and the working tree, a
        tree and the index file, or the index file and the working tree.
        The combination of what is compared with what is determined by
        the number of trees given to the command.

That may be an accurate description of what the command does in
technical terms, but it does not tell why the user may want to
compare "a tree and the working tree".  The users would want to
know which case applies to their current situation and we should
make it easier for them to find that information.

For example, although --cached is technically speaking one of
the --diff-options, it should be separated out from other
options when we talk about 'git-diff'.  Also, although 'git-diff'
is designed to work on tree-ish, Porcelain users will use it with
commit-ish (either a commit or an annotated signed tag that
points at a commit) 99.9% of the time, so we should mention
<tree-ish> at the end as a sidenote and talk about <commit>.

	DESCRIPTION
	-----------
	This command shows changes between four combinations
	of states.

	* 'git-diff' [--options] [--] [<path>...]

	  is to see the changes you made relative to the index
	  (staging area for the next commit).  In other words, the
	  differences are what you _could_ tell git to further add
	  to the index but you still haven't.  You can stage
	  these changes by using gitlink:git-update-index[1].

	* 'git-diff' [--options] --cached [<commit>] [--] [<path>...]

	  is to see the changes you staged for the next commit
	  relative to the named <commit>.  Typically you would
	  want comparison with the latest commit, so if you do
	  not give <commit>, it defaults to HEAD.

	* 'git-diff' [--options] <commit> -- [<path>...]

	  is to see the changes you have in your working tree,
	  regardless of whether you staged them or not, relative
	  to the named <commit>.

	* 'git-diff' [--options] <commit> <commit> -- [<path>...]

	  is to see the changes between two <commit>.

	Just in case you are doing something exotic, it
	should be noted that all of the <commit> in the above
	description can be any <tree-ish>.
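
The four forms described above can be told apart with a file that gains one line at each stage. This is a sketch only, assuming a reasonably recent git; the `added` helper, the file name and the identity are made up for illustration:

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

printf 'one\n' > file;                git add file; git commit -q -m "v1"
printf 'one\ntwo\n' > file;           git add file; git commit -q -m "v2"
printf 'one\ntwo\nthree\n' > file;    git add file    # staged, not committed
printf 'one\ntwo\nthree\nfour\n' > file               # working tree only

# Hypothetical helper: list the lines a diff adds, on a single line.
added() { sed -n 's/^+\([^+]\)/\1/p' | xargs; }

unstaged=$(git diff | added)               # working tree vs. index
staged=$(git diff --cached | added)        # index vs. HEAD
since_head=$(git diff HEAD | added)        # working tree vs. HEAD
between=$(git diff HEAD~1 HEAD | added)    # commit vs. commit

echo "git diff:             $unstaged"
echo "git diff --cached:    $staged"
echo "git diff HEAD:        $since_head"
echo "git diff HEAD~1 HEAD: $between"
```

Each form reports a different set of added lines, which is the distinction the proposed DESCRIPTION is trying to spell out in user terms.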


* Re: [RFC] git-add update with all-0 object
  2006-12-01  6:20       ` Junio C Hamano
@ 2006-12-02  8:55         ` Jakub Narebski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02  8:55 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

>         * 'git-diff' [--options] <commit> <commit> -- [<path>...]
> 
>           is to see the changes between two <commit>.
> 
>         Just in case if you are doing something exotic, it
>         should be noted that all of the <commit> in the above
>         descriptoin can be any <tree-ish>.

s/descriptoin/description/
          
It _might_ be worth mentioning that you can compare two arbitrary
files using

   git diff [--options] <blob1 sha> <blob2 sha>

where <blob sha> can be entered as <tree-ish>:<filename>, usually
<commit>:<filename> (<filename> is HEAD:<filename>) to compare blob (file)
from a named tree/from a given commit, or as :<stage>:<filename> (or
just ::<filename> if file is not in merge conflict) to compare blob (file)
from an index.

If I understand correctly there is currently no way to compare files from a
working tree, not to mention files outside working tree
(including /dev/null) with that syntax.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
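
The blob-to-blob form can be sketched as follows, comparing the committed version of a file with the staged one. It assumes a reasonably recent git; the file name, contents and identity are placeholders, and `:0:file` names the stage-0 (non-conflicted) index entry:

```shell
set -eu
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "editor@example.com"
git config user.name "Example User"

echo "committed" > file
git add file
git commit -q -m "initial"

echo "staged" > file
git add file                    # the index now differs from HEAD
echo "working" > file           # the working tree differs from both

# Compare two blobs directly: the one in HEAD and the one in the index.
blob_diff=$(git diff HEAD:file :0:file)
echo "$blob_diff"
```

The working-tree content plays no part in this comparison, which matches the limitation noted above.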



* Re: [RFC] git-add update with all-0 object
  2006-12-01  4:57     ` Theodore Tso
  2006-12-01  6:20       ` Junio C Hamano
@ 2006-12-01  7:10       ` Linus Torvalds
  2006-12-01  8:10       ` Daniel Barkalow
  2 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2006-12-01  7:10 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Daniel Barkalow, git, Junio C Hamano

On Thu, 30 Nov 2006, Theodore Tso wrote:
>
> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks.

Hey, it was originally called "cache".

I don't care _what_ it's called, I just want people knowing about it, 
because hiding it will just cripple git (ie at the very least, when you 
hit a merge conflict, you really do want to understand it if you ever 
want to go to the "next level").

If people are more comfortable just calling it the "staging area", and 
talking about it in those terms, I'll be happy.

> Put another way, the reason why I think people are liking the whole
> "git add" and "git rm" suggestion is that it's a nice middle ground
> between the "hide the index" and the "shove the index in the user's
> face" approaches.  It's not that we are hiding the fact that there is
> this thing with the horribly chosen name "index", but instead we talk
> about this concept of a staging area and we don't dwell on things like
> the fact that it is a binary file which stores an efficient
> representation of a virtual directory.... blah blah blah.

Yes.

And even "git diff" isn't really a problem once you understand the staging 
area. If people feel worried, let them use "git diff HEAD". You won't need 
to use git for _that_ long until you realize that since the staging area 
is going to match the HEAD under normal circumstances (and when it 
doesn't, you actually tend to prefer to get the diff against the staging 
area _anyway_), you'll find people just starting to use "git diff" and not 
worry about it.


* Re: [RFC] git-add update with all-0 object
  2006-12-01  4:57     ` Theodore Tso
  2006-12-01  6:20       ` Junio C Hamano
  2006-12-01  7:10       ` Linus Torvalds
@ 2006-12-01  8:10       ` Daniel Barkalow
  2006-12-01  9:37         ` Andy Parkins
  2006-12-02  8:26         ` Jakub Narebski
  2 siblings, 2 replies; 15+ messages in thread
From: Daniel Barkalow @ 2006-12-01  8:10 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Linus Torvalds, git, Junio C Hamano

On Thu, 30 Nov 2006, Theodore Tso wrote:

> By the way, after thinking about this for a while, part of the problem
> is that the name "index" really sucks.  Which is perhaps why Linus is
> now trying to stop us from actually using the term "index" in these
> discussions.  :-)    If we called it a "staging area", as our Great
> Leader has suggested, I think it would be a lot easier for novice
> users to understand.    Consider what is in the git man page:
> 
> 	The index is a simple binary file, which contains an efficient
> 	representation of a virtual directory content at some random
> 	time.  It does so by a simple array that associates a set of
> 	names, dates, permissions and content (aka "blob") objects
> 	together. The cache is always kept ordered by name, and names
> 	are unique (with a few very specific rules) at any point in
> 	time, but the cache has no long-term meaning, and can be
> 	partially updated at any time.....
> 
> 	In particular, the index file can have the representation of
> 	an intermediate tree that has not yet been instantiated. So
> 	the index can be thought of as a write-back cache, which can
> 	contain dirty information that has not yet been written back
> 	to the backing store.
> 
> For a kernel programmer, this might be understandable --- but for
> your typical application programmer, this is enough to cause him or
> her to conclude that git is simply not meant for use by mere mortals.

My position on this subject is that "index" is a good name, but that 
description is a terrible description, and "index" is a word that needs a 
good description in context. If we just said up front:

 Git's "index" is a staging area that you use to prepare commits. It maps 
 filenames to content. It allows git to remember changes you want to put 
 into the next commit while you do more work. For normal commits, it is 
 not necessary to use the index, but it is very helpful for complicated 
 commits, because it lets you focus on the part you're still working on 
 while git remembers the part you're done with.

I think people would get it. (If it were called the "cache" still, it 
would be hopeless, because "cache" implies false things; "index" doesn't 
imply anything initially.)
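
A minimal sketch of that description, assuming plain Python dicts stand in for the working tree and the index (the names `working_tree`, `index`, `add` and `commit` are illustrative only, not git internals). It shows the one property that matters here: staging takes a snapshot, so later edits do not reach the next commit unless they are staged again.

```python
# Toy model: both the working tree and the index map filenames to content.
working_tree = {"notes.txt": b"first draft\n"}
index = {}


def add(path):
    """Stage the content that is in the working tree right now."""
    index[path] = working_tree[path]


def commit():
    """A commit records what is staged, not what is in the working tree."""
    return dict(index)


add("notes.txt")                                # remember the finished part
working_tree["notes.txt"] = b"second draft\n"   # keep working on the file
snapshot = commit()                             # still holds the staged content
```

Here `snapshot["notes.txt"]` is the first draft; the second draft only goes in after another `add("notes.txt")`.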

Of course, we'd still have to disabuse people of the notion that the index 
can store the information "there's nothing at this path yet, but I'm 
interested in it", because that's a piece of information people often know 
before a file is ready, and think git would be able to remember in a 
staging area.

	-Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] git-add update with all-0 object
  2006-12-01  8:10       ` Daniel Barkalow
@ 2006-12-01  9:37         ` Andy Parkins
  2006-12-02  8:35           ` Jakub Narebski
  2006-12-02  8:26         ` Jakub Narebski
  1 sibling, 1 reply; 15+ messages in thread
From: Andy Parkins @ 2006-12-01  9:37 UTC (permalink / raw)
  To: git

On Friday 2006 December 01 08:10, Daniel Barkalow wrote:

> My position on this subject is that "index" is a good name, but that
> description is a terrible description, and "index" is a word that needs a
> good description in context. If we just said up front:

If we need to explain what "index" means in the context of diff then it's not 
a good name :-)

An index /everywhere else/ is a lookup table.  topic->page number; 
author->book title.  record id->byte position.  There is never any content in 
an index, indices just point at content.

I imagine that's how git's index got its name.  (I'm only guessing as I've 
not looked at what's actually inside git's "index").  Here's my guess:

git update-index file1 hashes file1, stores it somewhere under that hash and 
writes the hash->filename connection to .git/index.  That is why git's index 
is called an index.  It's a hash->filename index.

Unfortunately, "index" in colloquial git actually means the combination 
of .git/index plus the hashed file itself.  That's no longer an index, it's 
a "book". :-)

It's made worse, I think, by the fact that git doesn't want to do any 
index-like things with the "index".  Being content-oriented rather than 
name-oriented means that an entry like "file1->NOTHING" is impossible in git.  
This leads to the sort of "git-add means track this filename" confusion that 
turns up a lot with new users.

It's probably all too late to change the nomenclature, but I've always been of 
the opinion that names are important, they confer meaning.  When we use a 
common word, with common meaning and deviate from that common meaning we are 
bound to create confusion.  New users don't have any "git-way-of-thinking" 
knowledge when they begin, so when they hear "index" they can only fall back 
on their standard understanding of that word.  We shouldn't be surprised then 
when new users don't get "the index".

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] git-add update with all-0 object
  2006-12-01  9:37         ` Andy Parkins
@ 2006-12-02  8:35           ` Jakub Narebski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02  8:35 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

> On Friday 2006 December 01 08:10, Daniel Barkalow wrote:
> 
>> My position on this subject is that "index" is a good name, but that
>> description is a terrible description, and "index" is a word that needs a
>> good description in context. If we just said up front:
> 
> If we need to explain what "index" means in the context of diff then it's not 
> a good name :-)

But "staging area", or the more descriptive "staging area for commits", is
a bit long. Then again, we no longer call the "index" the "dircache".

> An index /everywhere else/ is a lookup table.  topic->page number; 
> author->book title.  record id->byte position.  There is never any content in 
> an index, indices just point at content.

Just like git index.

> I imagine that's how git's index got its name.  (I'm only guessing as I've 
> not looked at what's actually inside git's "index").  Here's my guess:
> 
> git update-index file1 hashes file1, stores it somewhere under that hash and 
> writes the hash->filename connection to .git/index.  That is why git's index 
> is called an index.  It's a hash->filename index.

This "somewhere" is the object repository. And it is the reverse: it is a
filename->(stat + hash) index; from a file in the working area to the blob
(or tree) in the repository.
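
A rough sketch of that direction of lookup, in Python. The blob id is the SHA-1 of a "blob <size>\0" header followed by the file content, which is how git names blobs. Everything else is an assumption made for illustration: the `Index` class, its `entries` dict and the in-memory `object_store` are hypothetical stand-ins, not git's on-disk format.

```python
import hashlib
import os


def blob_id(content: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class Index:
    """Toy model of the index: path -> (stat info, blob hash)."""

    def __init__(self):
        self.entries = {}       # path -> (mtime_ns, size, hash)
        self.object_store = {}  # hash -> content (the "somewhere")

    def update(self, path):
        """Hash the file, store the blob, and record path -> (stat, hash)."""
        with open(path, "rb") as f:
            content = f.read()
        st = os.stat(path)
        oid = blob_id(content)
        self.object_store[oid] = content
        self.entries[path] = (st.st_mtime_ns, st.st_size, oid)
        return oid
```

The content itself lives only in the object store; the index entry holds the name, the stat data and the hash that points at it.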

> Unfortunately, "index" in colloquial git actually means the combination 
> of .git/index plus the hashed file itself.  That's no longer an index, it's 
> a "book". :-)

Yes, it is true that "index" in colloquial git means "index version"
(the version pointed to by the "index").

> It's made worse, I think, by the fact that git doesn't want to do any 
> index-like things with the "index".  Being content-oriented rather than 
> name-oriented means that an entry like "file1->NOTHING" is impossible in git.  
> This leads to the sort of "git-add means track this filename" confusion that 
> turns up a lot with new users.

It is possible. By convention an all-0 hash means 'no such object'. The very
first message in this thread tried to make use of it... but using "git add"
to mark a filename as interesting, instead of adding the _current_ contents
of the file, goes a bit against git's ideas.
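
To make the proposal concrete, here is a sketch of what tools that read the index would have to do with such an entry. This is not existing git behaviour: it assumes a plain dict (path -> hash) stands in for the index, and the function names are hypothetical.

```python
NULL_SHA1 = "0" * 40  # all-0 hash, read as "no such object"


def entries_for_write_tree(index):
    """Return (path, hash) pairs for the tree, skipping placeholder entries."""
    return [(path, oid) for path, oid in sorted(index.items())
            if oid != NULL_SHA1]


def content_for_diff(index, object_store, path):
    """Treat a placeholder entry as empty content when diffing."""
    oid = index[path]
    if oid == NULL_SHA1:
        return b""
    return object_store[oid]
```

Every reader of the index that dereferences the hash needs a check like the ones above, which is where the cost of the proposal lies.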

> It's probably all too late to change the nomenclature, but I've always been of 
> the opinion that names are important, they confer meaning.  When we use a 
> common word, with common meaning and deviate from that common meaning we are 
> bound to create confusion.  New users don't have any "git-way-of-thinking" 
> knowledge when they begin, so when they hear "index" they can only fall back 
> on their standard understanding of that word.  We shouldn't be surprised then 
> when new users don't get "the index".

Well, "dircache" was changed to "index". "<ent>" was axed in favor of
"<tree-ish>". I think using the name "staging area" in the git man pages
would be a good idea (as would making --index an alias for --cached).
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] git-add update with all-0 object
  2006-12-01  8:10       ` Daniel Barkalow
  2006-12-01  9:37         ` Andy Parkins
@ 2006-12-02  8:26         ` Jakub Narebski
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2006-12-02  8:26 UTC (permalink / raw)
  To: git

Daniel Barkalow wrote:

> Of course, we'd still have to disabuse people of the notion that the index 
> can store the information "there's nothing at this path yet, but I'm 
> interested in it", because that's a piece of information people often know 
> before a file is ready, and think git would be able to remember in a 
> staging area.

Well, that was what the first message in this thread was about: marking
a file "interesting" (so 'git commit -a' would pick it up) by using all-0
for the object hash... which of course requires review and, if necessary,
modification of all core tools which touch the index.
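
A sketch of the 'git commit -a' side of that, under the same assumptions as the proposal (a dict stands in for the index, the working tree is a dict of path -> bytes, and `commit_all` is a made-up name): every path the index knows about is refreshed from the working tree, so a path marked with the all-0 hash gets real content at that point.

```python
import hashlib

NULL_SHA1 = "0" * 40  # placeholder hash for "tracked, no content yet"


def blob_id(content: bytes) -> str:
    """SHA-1 of a git blob: 'blob <size>\\0' header plus the content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def commit_all(index, working_tree):
    """Toy 'commit -a': refresh all tracked paths, then snapshot the index."""
    for path in list(index):
        if path in working_tree:
            index[path] = blob_id(working_tree[path])
        else:
            del index[path]  # a tracked file that is gone is staged as deleted
    # Any placeholder that somehow survives is still left out of the commit.
    return {path: oid for path, oid in index.items() if oid != NULL_SHA1}
```

Paths that are not in the index are never looked at, which is why the file has to be marked somehow before 'commit -a' can notice it.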
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-12-02  8:53 UTC | newest]

Thread overview: 15+ messages
2006-11-30 22:08 [RFC] git-add update with all-0 object Daniel Barkalow
2006-11-30 22:32 ` Johannes Schindelin
2006-11-30 22:34 ` Nicolas Pitre
2006-11-30 22:41   ` Jakub Narebski
2006-11-30 22:49     ` Nicolas Pitre
2006-11-30 22:46 ` Linus Torvalds
2006-12-01  0:12   ` Daniel Barkalow
2006-12-01  4:57     ` Theodore Tso
2006-12-01  6:20       ` Junio C Hamano
2006-12-02  8:55         ` Jakub Narebski
2006-12-01  7:10       ` Linus Torvalds
2006-12-01  8:10       ` Daniel Barkalow
2006-12-01  9:37         ` Andy Parkins
2006-12-02  8:35           ` Jakub Narebski
2006-12-02  8:26         ` Jakub Narebski
