Terminology

* Terminology
@ 2005-07-31 13:52 Johannes Schindelin
  2005-07-31 18:33 ` Terminology Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2005-07-31 13:52 UTC (permalink / raw)
  To: git

Hi,

the other day I got confused by the terminology. Maybe I'm not the only
one:

The GIT equivalent of a CVS branch is sometimes called a branch
(git-new-branch), sometimes a tree (git-switch-tree), and sometimes a
head (which seems counterintuitive to CVS people: they only have one
HEAD; pun(s) intended).

What is worse: a tree often refers to something different, namely a
directory structure corresponding to a certain commit (which SVN people
would call revision). And in $GIT_DIR/branches, short cuts for remote
addresses are stored (and therefore I would have preferred
$GIT_DIR/remotes).

Maybe we should decide on a common terminology before kicking out 1.0, and
look through all files in Documentation/ to have a consistent vocabulary.
And poor me does not get confused no more.

Ciao,
Dscho

--
Git-R-Done

* Re: Terminology
  2005-07-31 13:52 Terminology Johannes Schindelin
@ 2005-07-31 18:33 ` Junio C Hamano
  2005-07-31 22:38   ` Terminology Johannes Schindelin
  2005-08-05 14:57   ` Terminology Johannes Schindelin
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2005-07-31 18:33 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Maybe we should decide on a common terminology before kicking out 1.0, and
> look through all files in Documentation/ to have a consistent vocabulary.
> And poor me does not get confused no more.

Glad to see you started the discussion on this one.  I have a
slight worry and suspicion that this might open a can of worms,
but I agree we need to get this done.  We probably would end up
splitting the Terminology section in Documentation/git.txt into a
separate "Glossary" document.

Care to volunteer drafting a strawman, listing the concepts we
need terms for, marking the ones we seem to use the same word
for?   You do not have to suggest which candidate term to use
for all of them.  Something along these lines...

 - The unit of storage in GIT is called "object"; no other word
   is used and the word "object" is used only for this purpose
   so this one is OK.
  
 - A 20-byte SHA1 to uniquely identify "objects"; README and
   early Linus messages call this "object name" so does
   tutorial.  Many places say "object SHA1" or just "SHA1".

 - An "object database" stores a set of "objects", and an
   individual object can be retrieved by giving its object
   name.

 - Storing a regular file or a symlink in the object database
   results in a "blob object" being created.  You cannot directly
   store a filesystem directory, but a collection of blob objects
   and other tree objects can be recorded as a "tree object"
   which corresponds to this notion.

 - $GIT_INDEX_FILE is "index file", which is a collection of
   "cache entries".  The former is sometimes called "cache
   file", the latter just "cache".

 - the directory which corresponds to the top of the hierarchy
   described in the index file; I've seen words like "working
   tree", "working directory", "work tree" used.

 - When the stat information a cache entry records matches what
   is in the work tree, the entry is called "clean" or
   "up-to-date".  The opposite is "dirty" or "not up-to-date".

 - An index file can be in "merged" or "unmerged" state.  The
   former is when it does not have anything but stage 0 entries,
   the latter otherwise.

 - A merged index file can be written as a "tree object", which
   is technically a set of interconnected tree objects but we
   equate it with the toplevel tree object with this set.

 - A "tree object" can be recorded as a part of a "commit
   object".  The tree object is said to be "associated with" the
   commit object.

 - A "tag object" can be recorded as a pointer to another object
   of any type. The act of following the pointer a tag object
   holds (this can go recursively) until we get to a non-tag
   object is sometimes called "resolving the tag".

 - The following objects are collectively called "tree-ish": a
   tree object, a commit object, a tag object that resolves to
   either a commit or a tree object, and can be given to
   commands that expect to work on a tree object.

 - The files under $GIT_DIR/refs record object names, and are
   called "refs".  What is under refs/heads/ are called "heads",
   refs/tags/ "tags".  Typically, they are either object names
   of commit objects or tag objects that resolve to commit
   objects, but a tag can point at any object.

 - A "head" is always an object name of a commit, and marks the
   latest commit in one line of development.  A line of
   development is often called a "branch".  We sometimes use the
   word "branch head" to stress the fact that we are talking
   about a single commit that is the latest one in a "branch".

 - Combining the states from more than one line of development
   is called "merging" and is typically done between two branch
   heads.  This is called "resolving" in the tutorial and there
   is a git-resolve-script command for it.

 - A set of "refs" with the set of objects reachable from them
   constitute a "repository".  Although currently there is no
   provision for a repository to say that its objects are stored
   in this and that object database, multiple repositories can
   share the same object database, and there is not a conceptual
   limit that a repository must retrieve its objects from a
   single object database.

 - The act of finding out the object names recorded in "refs" a
   different repository records, optionally updating a local
   "refs" with their values, and retrieving the objects
   reachable from them is called "fetching".  Fetching immediately
   followed by merging is called "pulling".

 - The act of updating "refs" in a different repository with new
   value and populating the object database(s) associated with
   the repository is called "pushing".

 - Currently refs/heads records branch heads of both locally
   created branches and branches fetched from other
   repositories.

 - Currently, fetching always happens against a single branch
   head on a remote repository, and (a remote repository, name
   of the branch) is stored in $GIT_DIR/branches/ as a
   short-hand mechanism.  A file in this directory identifies
   a remote repository by its URL, and the branch to fetch/pull
   from is identified with the URL fragment notation, absence of
   which makes it default to "master".
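
The short-hand lookup in the last item can be sketched as follows. This
is only an illustration in Python, not code from git: the function name
parse_branches_entry and the example URLs are hypothetical, and it
assumes the file holds a single URL with an optional "#branch" fragment.

```python
def parse_branches_entry(text):
    """Split the contents of a $GIT_DIR/branches/<name> file into (url, branch)."""
    line = text.strip()
    # Everything after the first '#' is the branch; no fragment means "master".
    url, _, branch = line.partition("#")
    return url, branch or "master"


# Hypothetical file contents, one with a fragment and one without.
print(parse_branches_entry("http://example.org/project.git#devel\n"))
print(parse_branches_entry("http://example.org/project.git\n"))
```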

-jc

* Re: Terminology
  2005-07-31 18:33 ` Terminology Junio C Hamano
@ 2005-07-31 22:38   ` Johannes Schindelin
  2005-08-05 14:57   ` Terminology Johannes Schindelin
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2005-07-31 22:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

I tried to avoid the work. But I'll do it.

Ciao,
Dscho

* Re: Terminology
  2005-07-31 18:33 ` Terminology Junio C Hamano
  2005-07-31 22:38   ` Terminology Johannes Schindelin
@ 2005-08-05 14:57   ` Johannes Schindelin
  2005-08-05 18:57     ` Terminology Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2005-08-05 14:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

I am finally finished with my preliminary survey: I took what you sent as 
a strawman, and inserted what I found (I tried to say only something about 
ambiguous naming):

  - The unit of storage in GIT is called "object"; no other word
    is used and the word "object" is used only for this purpose
    so this one is OK.

  - A 20-byte SHA1 to uniquely identify "objects"; README and
    early Linus messages call this "object name" so does
    tutorial.  Many places say "object SHA1" or just "SHA1".

"Object" is short for "immutable object". git-cat-file.txt says
"repository object".
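
To make "object name" concrete, here is a minimal Python sketch. It
assumes the hashing scheme current Git uses (SHA-1 over "<type> <size>",
a NUL byte, then the content); the function name object_name is made up
for the illustration.

```python
import hashlib


def object_name(obj_type, data):
    """Return the 40-hex-digit object name for an object of the given type."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


name = object_name("blob", b"hello\n")
print(name)
# The hex form is 40 characters, i.e. the 20 raw bytes mentioned above.
print(len(bytes.fromhex(name)))
```

The same content always yields the same name, which is why an object can
be treated as immutable.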

  - An "object database" stores a set of "objects", and an
    individual object can be retrieved by giving its object
    name.

Tutorial calls it an "object store". git-fsck-cache.txt names it
"database" at first, but then also uses "object pool".
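
Whatever name is chosen, the concept is a store addressed by object
name. A toy in-memory sketch in Python (the class ObjectStore is
hypothetical, and for brevity it hashes the raw content without any
type header, which is a simplification and not what git does):

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: objects are looked up by their name."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        # Simplification: the name is the SHA-1 of the raw bytes.
        name = hashlib.sha1(data).hexdigest()
        # Storing identical content twice keeps a single copy.
        self._objects.setdefault(name, data)
        return name

    def get(self, name):
        return self._objects[name]

    def __len__(self):
        return len(self._objects)


store = ObjectStore()
first = store.put(b"some file content\n")
second = store.put(b"some file content\n")
print(first == second, len(store), store.get(first))
```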

  - Storing a regular file or a symlink in the object database
    results in a "blob object" being created.  You cannot directly
    store a filesystem directory, but a collection of blob objects
    and other tree objects can be recorded as a "tree object"
    which corresponds to this notion.

  - $GIT_INDEX_FILE is "index file", which is a collection of
    "cache entries".  The former is sometimes called "cache
    file", the latter just "cache".

Tutorial says "cache" aka "index". Though technically, a cache
is the index file _plus_ the related objects in the object database.
git-update-cache.txt even makes the difference between the "index"
and the "directory cache".

  - the directory which corresponds to the top of the hierarchy
    described in the index file; I've seen words like "working
    tree", "working directory", "work tree" used.

The tutorial initially says "working tree", but then "working
directory". Usually, a directory does not include its
subdirectories, though. git-apply-patch-script.txt, git-apply.txt,
git-hash-object.txt, git-read-tree.txt
use "work tree". git-checkout-cache.txt, git-commit-tree.txt,
git-diff-cache.txt, git-ls-tree.txt, git-update-cache.txt contain
"working directory". git-diff-files.txt talks about a "working tree".

  - When the stat information a cache entry records matches what
    is in the work tree, the entry is called "clean" or
    "up-to-date".  The opposite is "dirty" or "not up-to-date".

  - An index file can be in "merged" or "unmerged" state.  The
    former is when it does not have anything but stage 0 entries,
    the latter otherwise.

That seems to be unambiguous (sometimes it's called "index",
sometimes "index file"; I don't think that matters).
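
The "clean"/"dirty" distinction above can be sketched in Python. This
is an assumption-laden toy: it records only modification time and size,
whereas a real cache entry records more stat fields, and the helper
names stat_signature and is_clean are invented here.

```python
import os
import tempfile


def stat_signature(path):
    """Return the stat fields this sketch records in a cache entry."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def is_clean(entry, path):
    """An entry is clean when its recorded stat data still matches the file."""
    return entry == stat_signature(path)


with tempfile.TemporaryDirectory() as work_tree:
    path = os.path.join(work_tree, "file.txt")
    with open(path, "w") as f:
        f.write("one\n")
    entry = stat_signature(path)      # what "updating the index" would record
    print(is_clean(entry, path))      # nothing changed since recording
    with open(path, "a") as f:
        f.write("two\n")              # the size changes, so the entry is dirty
    print(is_clean(entry, path))
```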

  - A merged index file can be written as a "tree object", which
    is technically a set of interconnected tree objects but we
    equate it with the toplevel tree object with this set.

  - A "tree object" can be recorded as a part of a "commit
    object".  The tree object is said to be "associated with" the
    commit object.

In diffcore.txt, "changeset" is used in place of "commit".

  - A "tag object" can be recorded as a pointer to another object
    of any type. The act of following the pointer a tag object
    holds (this can go recursively) until we get to a non-tag
    object is sometimes called "resolving the tag".

  - The following objects are collectively called "tree-ish": a
    tree object, a commit object, a tag object that resolves to
    either a commit or a tree object, and can be given to
    commands that expect to work on a tree object.

We could call this category an "ent".
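
Resolving a tag and testing for "tree-ish" can be sketched like this in
Python. The object table and its short names are invented for the
example; real objects are of course named by SHA1.

```python
# name -> (type, name of the object it points to, if any)
objects = {
    "c1": ("commit", "t1"),    # a commit and its associated tree
    "t1": ("tree", None),
    "b1": ("blob", None),
    "tag1": ("tag", "c1"),     # tag pointing at a commit
    "tag2": ("tag", "tag1"),   # tag pointing at another tag
    "tag3": ("tag", "b1"),     # tag pointing at a blob
}


def resolve_tag(name):
    """Follow tag pointers until a non-tag object is reached."""
    obj_type, target = objects[name]
    while obj_type == "tag":
        name = target
        obj_type, target = objects[name]
    return name


def is_tree_ish(name):
    """True for a tree, a commit, or a tag that resolves to one of them."""
    return objects[resolve_tag(name)][0] in ("tree", "commit")


print(resolve_tag("tag2"), is_tree_ish("tag2"), is_tree_ish("tag3"))
```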

  - The files under $GIT_DIR/refs record object names, and are
    called "refs".  What is under refs/heads/ are called "heads",
    refs/tags/ "tags".  Typically, they are either object names
    of commit objects or tag objects that resolve to commit
    objects, but a tag can point at any object.

The tutorial never calls them "refs", but instead "references".

  - A "head" is always an object name of a commit, and marks the
    latest commit in one line of development.  A line of
    development is often called a "branch".  We sometimes use the
    word "branch head" to stress the fact that we are talking
    about a single commit that is the latest one in a "branch".

In the tutorial, the latter is used in reverse: it talks about a
"HEAD development branch" and a "HEAD branch".

I find it a little bit troublesome that $GIT_DIR/branches does not
really refer to a branch, but rather to a (possibly remote) repository.

  - Combining the states from more than one line of development
    is called "merging" and is typically done between two branch
    heads.  This is called "resolving" in the tutorial and there
    is a git-resolve-script command for it.

  - A set of "refs" with the set of objects reachable from them
    constitute a "repository".  Although currently there is no
    provision for a repository to say that its objects are stored
    in this and that object database, multiple repositories can
    share the same object database, and there is not a conceptual
    limit that a repository must retrieve its objects from a
    single object database.

This is referred to as "git archive" in the tutorial at first,
and later as "repository". However, in "Copying archives", a
very confusing explanation tells us that a "repository" normally
is a "working tree", where I would rather say that the repository
lives inside a hidden subdirectory of the working tree.

git-fsck-cache.txt talks about an "archive", too. However, it then
says "valid tree", when surely a repository is meant.

Everywhere else, it is called "repository".

  - The act of finding out the object names recorded in "refs" a
    different repository records, optionally updating a local
    "refs" with their values, and retrieving the objects
    reachable from them is called "fetching".  Fetching immediately
    followed by merging is called "pulling".

In that sense, git-http-pull would be more appropriately named
git-http-fetch, and analogously git-ssh-pull.

Also, git-pull-script.txt says "Pull and merge", contradicting this
definition.

  - The act of updating "refs" in a different repository with new
    value and populating the object database(s) associated with
    the repository is called "pushing".

  - Currently refs/heads records branch heads of both locally
    created branches and branches fetched from other
    repositories.

  - Currently, fetching always happens against a single branch
    head on a remote repository, and (a remote repository, name
    of the branch) is stored in $GIT_DIR/branches/ as a
    short-hand mechanism.  A file in this directory identifies
    a remote repository by its URL, and the branch to fetch/pull
    from is identified with the URL fragment notation, absence of
    which makes it default to "master".

  - a "pack" usually consists of two files: a file containing objects
    in a compressed format, and an index to the first file. If the
    pack is uncompressed at once (e.g. when git-clone is called), the
    index is not necessary.

git-pack-objects calls this a "packed archive" first, but then reverts
to "pack". git-show-index.txt and git-verify-pack.txt call the .pack file
"packed GIT archive", and the index "idx file". git-unpack-objects.txt
calls the .pack file "pack archive".

I'd also add a short explanation of the following unambiguous terms:

"plumbing", also referred to as "core": the basic set of programs and
scripts usable to half-gods like Linus.

"porcelain", also referred to as "SCM": a thin layer over the plumbing
making GIT usage nice to regular people.

"type": one of the identifiers "commit","tree","tag" and "blob" describing
the type of an object.

Ciao,
Dscho

* Re: Terminology
  2005-08-05 14:57   ` Terminology Johannes Schindelin
@ 2005-08-05 18:57     ` Linus Torvalds
  2005-08-05 19:53       ` Terminology barkalow
  2005-08-05 23:07       ` Terminology Johannes Schindelin
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2005-08-05 18:57 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Fri, 5 Aug 2005, Johannes Schindelin wrote:
> 
> Tutorial says "cache" aka "index". Though technically, a cache
> is the index file _plus_ the related objects in the object database.
> git-update-cache.txt even makes the difference between the "index"
> and the "directory cache".

I think we should globally rename it to "index".

The "directory cache" and later "cache" naming came from when I started
doing the work - before git was even git at all, and had no backing store
what-so-ever, I started out writing "cache.h" and "read-cache.c", and it
was really first a trial at doing a totally SCM-neutral directory cache
front-end.

You don't even see that in the git revision history, because that was 
before git was self-hosting - the project was partly started to also work 
as possibly just a fast front-end to something that wasn't as fast (ie 
think something like a front-end to make "monotone" work better).

So the "directory cache" and "cache" naming comes from that historical 
background: it was really started as a front-end cache, and in fact the 
".git" directory was called ".dircache" initially. You can see some of 
that in the very earliest git releases: by then I had already done the 
backing store, and the thing was already called "git", but the "dircache" 
naming still remains in places.

For example, here's my "backup" target in the initial checkin:

	backup: clean
		cd .. ; tar czvf dircache.tar.gz dir-cache

which shows that not only did I call the resulting tar file "dircache", 
the directory I was developing stuff in was called "dir-cache" as well ;)

The index obviously ended up doing a lot more, and especially with the
different stages it became much more than just a directory cache thing:  
it's integral to how git does the fast part of a merge. So we should call
it "index" and edit out the old "cache" and "directory cache" naming
entirely.

>   - the directory which corresponds to the top of the hierarchy
>     described in the index file; I've seen words like "working
>     tree", "working directory", "work tree" used.
> 
> The tutorial initially says "working tree", but then "working
> directory". Usually, a directory does not include its
> subdirectories, though. git-apply-patch-script.txt, git-apply.txt,
> git-hash-object.txt, git-read-tree.txt
> use "work tree". git-checkout-cache.txt, git-commit-tree.txt,
> git-diff-cache.txt, git-ls-tree.txt, git-update-cache.txt contain
> "working directory". git-diff-files.txt talks about a "working tree".

I think we should use "working tree" throughout, since "working directory" 
is unix-speak for "pwd" and has a totally different meaning.

>   - When the stat information a cache entry records matches what
>     is in the work tree, the entry is called "clean" or
>     "up-to-date".  The opposite is "dirty" or "not up-to-date".
>
>   - An index file can be in "merged" or "unmerged" state.  The
>     former is when it does not have anything but stage 0 entries,
>     the latter otherwise.

I think the "unmerged" case should be mentioned in the "cache entry" 
thing, since it's really a per-entry state, exactly like "dirty/clean".

Then, explaining an "unmerged index" as being an index file with some
entries being unmerged makes more sense. 

As it is, the above "explains" an index file as being unmerged by talking 
about "stage 0 entries", which in turn haven't been explained at all.
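
A per-entry view of "unmerged" could look like the following Python
sketch. The Entry type and function names are hypothetical; it assumes
the usual convention that stage 0 is a merged entry, while stages 1, 2
and 3 hold the common ancestor and the two sides of a three-way merge.

```python
from collections import namedtuple

# One index entry: a path plus its merge stage (0 means merged).
Entry = namedtuple("Entry", "path stage")


def entry_is_unmerged(entry):
    """Per-entry state, in the same spirit as clean/dirty."""
    return entry.stage != 0


def index_is_unmerged(entries):
    """An index is unmerged as soon as any one of its entries is."""
    return any(entry_is_unmerged(e) for e in entries)


index = [
    Entry("Makefile", 0),
    Entry("cache.h", 1),
    Entry("cache.h", 2),
    Entry("cache.h", 3),
]
print(index_is_unmerged(index))
print(index_is_unmerged([Entry("Makefile", 0)]))
```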

>   - A "tree object" can be recorded as a part of a "commit
>     object".  The tree object is said to be "associated with" the
>     commit object.
> 
> In diffcore.txt, "changeset" is used in place of "commit".

We really should use "commit" throughout. ex-BK users sometimes slip into
"changeset" (which in turn is probably because BK had these per-file
commits too - deltas), but there's no point in the distinction in git. A 
commit is a commit.

>   - The following objects are collectively called "tree-ish": a
>     tree object, a commit object, a tag object that resolves to
>     either a commit or a tree object, and can be given to
>     commands that expect to work on a tree object.
> 
> We could call this category an "ent".

LOL. You are a total geek.

>   - The files under $GIT_DIR/refs record object names, and are
>     called "refs".  What is under refs/heads/ are called "heads",
>     refs/tags/ "tags".  Typically, they are either object names
>     of commit objects or tag objects that resolve to commit
>     objects, but a tag can point at any object.
> 
> The tutorial never calls them "refs", but instead "references".

It might be worth saying explicitly that a reference is nothing but the
same thing as an "object name" aka "sha1". And make it very clear that it
can point to any object type, although commits tend to be the most common
thing you want to reference. That then leads naturally into a very specific
_subcase_ of refs, namely a "head":

>   - A "head" is always an object name of a commit, and marks the
>     latest commit in one line of development.  A line of
>     development is often called a "branch".  We sometimes use the
>     word "branch head" to stress the fact that we are talking
>     about a single commit that is the latest one in a "branch".
> 
> In the tutorial, the latter is used in reverse: it talks about a
> "HEAD development branch" and a "HEAD branch".
> 
> I find it a little bit troublesome that $GIT_DIR/branches does not
> really refer to a branch, but rather to a (possibly remote) repository.

Yes, I find the $GIT_DIR/branches naming to be confusing too. 

They are really pointers to external repositories, and "branch" is
confusing. But I suspect the confusion is partly due to me, since I used
to think (and argue) that we should aim for a "separate repositories for
separate branches" approach.

That single-branch-multiple-repository mentality came from my BK
background, and from me thinking that local branches would be confusing.  

Jeff has been dragging me into the "local branches are good" camp, and
these days I'm obviously a big believer. But the blame for this confusion
falls squarely on me, with Pasky picking it up from there into cogito and
using the "branches" name, and then git picking it back up from cogito
through trying to be compatible.

>   - The act of finding out the object names recorded in "refs" a
>     different repository records, optionally updating a local
>     "refs" with their values, and retrieving the objects
>     reachable from them is called "fetching".  Fetching immediately
>     followed by merging is called "pulling".
> 
> In that sense, git-http-pull would be more appropriately named
> git-http-fetch, and analogous git-ssh-pull.
> 
> Also, git-pull-script.txt says "Pull and merge", contradicting this
> definition.

To confuse things even more, cogito calls a fetch "pull" and a pull 
"update".

I personally think "fetch" is unambiguous: it's just the act of fetching,
with no "merge" activity at all. So we should use that.

What to call a "fetch+merge" is a bit ambiguous. I obviously prefer
"pull", but cogito disagrees, and you're right, "git-http-pull" and
"git-ssh-pull" both really do just fetches.

But I think "update" isn't right either: to me, update would be the 
non-merging kind (ie I think "update" implies "refresh" which in turn 
implies a "fetch"-like behaviour).

So I'd vote for making the suggested definition official: "fetch" means
fetching the data, and "pull" means "fetch + merge". 

And "update" would be just something that happens to refs: A "fetch" will
obviously update the reference that we had to the external tree we fetched
from, while a commit/merge/whatever will obviously update the current
branch reference (HEAD).

So "update" would really be just a small technical detail.
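
The proposed split can be sketched with a toy model in Python. This is
not how git implements anything: the Repo class, the commit names and
the "merge(...)" naming are invented for the illustration, and the
merge step only records a two-parent commit (or fast-forwards) without
touching any file content.

```python
class Repo:
    def __init__(self):
        self.objects = {}   # commit name -> tuple of parent commit names
        self.refs = {}      # ref name -> commit name


def reachable(repo, start):
    """Return every commit reachable from start by following parents."""
    seen, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(repo.objects[name])
    return seen


def fetch(local, remote, branch, tracking_ref):
    """Copy the objects and update a ref; no merge activity at all."""
    head = remote.refs[branch]
    for name in reachable(remote, head):
        local.objects[name] = remote.objects[name]
    local.refs[tracking_ref] = head
    return head


def pull(local, remote, branch, tracking_ref, current="master"):
    """fetch + merge into the current branch."""
    theirs = fetch(local, remote, branch, tracking_ref)
    ours = local.refs[current]
    if ours in reachable(local, theirs):
        local.refs[current] = theirs              # fast-forward
    elif theirs not in reachable(local, ours):
        merge = f"merge({ours},{theirs})"         # toy merge commit name
        local.objects[merge] = (ours, theirs)
        local.refs[current] = merge
    return local.refs[current]


remote = Repo()
remote.objects = {"a": (), "b": ("a",)}
remote.refs = {"master": "b"}

local = Repo()
local.objects = {"a": (), "c": ("a",)}
local.refs = {"master": "c"}

fetch(local, remote, "master", "origin")
print(local.refs)   # "origin" is updated, "master" is untouched
pull(local, remote, "master", "origin")
print(local.refs)   # "master" now names a merge of both heads
```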

>   - a "pack" usually consists of two files: a file containing objects
>     in a compressed format, and an index to the first file. If the
>     pack is uncompressed at once (e.g. when git-clone is called), the
>     index is not necessary.
> 
> git-pack-objects calls this a "packed archive" first, but then reverts
> to "pack". git-show-index.txt and git-verify-pack.txt call the .pack file
> "packed GIT archive", and the index "idx file". git-unpack-objects.txt
> calls the .pack file "pack archive".

We should just call them packs. An archive can be multiple packs and lots 
of non-packed objects too. 

> "plumbing", also referred to as "core": the basic set of programs and
> scripts usable to half-gods like Linus.
> 
> "porcelain", also referred to as "SCM": a thin layer over the plumbing
> making GIT usage nice to regular people.
> 
> "type": one of the identifiers "commit","tree","tag" and "blob" describing
> the type of an object.

Yes. Some old docs may call this type a "tag", since I was really thinking
of it not in the SCM meaning at all, but in the _computer_architecture_
meaning, where people usually call objects with enforced types "tagged".

Ie from a computer architecture standpoint you can have "tagged memory" or
"tagged pointers", and LISP machines are often implemented with the
pointers containing the type ("tag") of the thing they point to (for
example, the low two bits might be the "tag" on the pointer). So I was
talking about "tagged objects" when I just meant that the type of the
object was embedded in the object itself, the way tagged memory
architectures work.
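
For readers who have not met tagged pointers, a small Python sketch of
the low-two-bits scheme mentioned above. The type names and their
numbering are invented for the example, and it assumes addresses are
4-byte aligned so that the low two bits are free.

```python
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1                   # 0b11
TYPES = ["fixnum", "cons", "symbol", "other"]    # hypothetical tag values


def make_tagged(address, type_name):
    """Pack a type tag into the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be 4-byte aligned")
    return address | TYPES.index(type_name)


def split_tagged(pointer):
    """Recover (address, type name) from a tagged pointer."""
    return pointer & ~TAG_MASK, TYPES[pointer & TAG_MASK]


p = make_tagged(0x1000, "cons")
print(hex(p), split_tagged(p))
```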

In retrospect, that naming _really_ confused some people, I know I had
trouble explaining git concepts to David Wheeler because I used "tagged
objects" _not_ to mean a SCM style "tag", but to mean "typed objects".

If somebody sees an old reference to "object tags", those should all be 
fixed to say "object types".

			Linus

* Re: Terminology
  2005-08-05 18:57     ` Terminology Linus Torvalds
@ 2005-08-05 19:53       ` barkalow
  2005-08-05 23:08         ` Terminology Johannes Schindelin
  2005-08-05 23:07       ` Terminology Johannes Schindelin
  1 sibling, 1 reply; 9+ messages in thread
From: barkalow @ 2005-08-05 19:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Junio C Hamano, git

On Fri, 5 Aug 2005, Linus Torvalds wrote:

> On Fri, 5 Aug 2005, Johannes Schindelin wrote:
> 
> >   - The files under $GIT_DIR/refs record object names, and are
> >     called "refs".  What is under refs/heads/ are called "heads",
> >     refs/tags/ "tags".  Typically, they are either object names
> >     of commit objects or tag objects that resolve to commit
> >     objects, but a tag can point at any object.
> > 
> > The tutorial never calls them "refs", but instead "references".
> 
> It might be worth saying explicitly that a reference is nothing but the
> same thing as an "object name" aka "sha1".

Well, it's an object name stored in a file. This adds a layer of 
indirection and a meaningful name.
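
That indirection can be sketched in Python. This assumes the layout
discussed in this thread, one file per ref under $GIT_DIR/refs holding
a 40-character object name; the helper read_ref and the object name
used below are placeholders.

```python
import os
import tempfile


def read_ref(git_dir, ref):
    """Map a meaningful name such as refs/heads/master to an object name."""
    with open(os.path.join(git_dir, ref)) as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as git_dir:
    # Build a throwaway $GIT_DIR containing a single head.
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("0123456789abcdef0123456789abcdef01234567\n")
    print(read_ref(git_dir, "refs/heads/master"))
```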

> So I'd vote for making the suggested definition official: "fetch" means
> fetching the data, and "pull" means "fetch + merge". 

So what's the converse of "fetch" (to rename git-ssh-push to)? 
Maybe "ship"?

	-Daniel
*This .sig left intentionally blank*

* Re: Terminology
  2005-08-05 18:57     ` Terminology Linus Torvalds
  2005-08-05 19:53       ` Terminology barkalow
@ 2005-08-05 23:07       ` Johannes Schindelin
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2005-08-05 23:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

wow! What a long mail! But I probably deserved it, quoting that lengthy 
mail from Junio...

On Fri, 5 Aug 2005, Linus Torvalds wrote:

> On Fri, 5 Aug 2005, Johannes Schindelin wrote:
> > 
> > Tutorial says "cache" aka "index". Though technically, a cache
> > is the index file _plus_ the related objects in the object database.
> > git-update-cache.txt even makes the difference between the "index"
> > and the "directory cache".
> 
> I think we should globally rename it to "index".

Totally agree. The index is a central concept. But let's keep in mind -- 
and make future Documentation/ readers do the same -- that the index,
without the referenced objects in the objects database, is only a 
skeleton.

> The "directory cache" and later "cache" naming came from when I started
> doing the work - before git was even git at all, and had no backing store
> what-so-ever, I started out writing "cache.h" and "read-cache.c", and it
> was really first a trial at doing a totally SCM-neutral directory cache
> front-end.
> 
> You don't even see that in the git revision history, because that was 
> before git was self-hosting - the project was partly started to also work 
> as possibly just a fast front-end to something that wasn't as fast (ie 
> think something like a front-end to make "monotone" work better).
> 
> So the "directory cache" and "cache" naming comes from that historical 
> background: it was really started as a front-end cache, and in fact the 
> ".git" directory was called ".dircache" initially. You can see some of 
> that in the very earliest git releases: by then I had already done the 
> backing store, and the thing was already called "git", but the "dircache" 
> naming still remains in places.
> 
> For example, here's my "backup" target in the initial checkin:
> 
> 	backup: clean
> 		cd .. ; tar czvf dircache.tar.gz dir-cache
> 
> which shows that not only did I call the resulting tar file "dircache", 
> the directory I was developing stuff in was called "dir-cache" as well ;)
> 
> The index obviously ended up doing a lot more, and especially with the
> different stages it became much more than just a directory cache thing:  
> it's integral to how git does the fast part of a merge. So we should call
> it "index" and edit out the old "cache" and "directory cache" naming
> entirely.

I quoted this entirely, for a good reason: Linus, one day you really
should write a Wikibook about all the "small" projects you started. I
still remember the words "I'm doing a (free) operating system (just a
hobby, won't be big...)". There's so much to be learnt about good
engineering. And people do want to add their anecdotes to it.

> >   - the directory which corresponds to the top of the hierarchy
> >     described in the index file; I've seen words like "working
> >     tree", "working directory", "work tree" used.
> > 
> > The tutorial initially says "working tree", but then "working
> > directory". Usually, a directory does not include its
> > subdirectories, though. git-apply-patch-script.txt, git-apply.txt,
> > git-hash-object.txt, git-read-tree.txt
> > use "work tree". git-checkout-cache.txt, git-commit-tree.txt,
> > git-diff-cache.txt, git-ls-tree.txt, git-update-cache.txt contain
> > "working directory". git-diff-files.txt talks about a "working tree".
> 
> I think we should use "working tree" throughout, since "working directory" 
> is unix-speak for "pwd" and has a totally different meaning.

I hoped so much.

> >   - An index file can be in "merged" or "unmerged" state.  The
> >     former is when it does not have anything but stage 0 entries,
> >     the latter otherwise.
> 
> I think the "unmerged" case should be mentioned in the "cache entry" 
> thing, since it's really a per-entry state, exactly like "dirty/clean".
> 
> Then, explaining an "unmerged index" as being an index file with some
> entries being unmerged makes more sense. 
> 
> As it is, the above "explains" an index file as being unmerged by talking 
> about "stage 0 entries", which in turn haven't been explained at all.

That's right. We probably should copy a bit from git-read-tree.txt, or at 
least reference it in the glossary.

> >   - A "tree object" can be recorded as a part of a "commit
> >     object".  The tree object is said to be "associated with" the
> >     commit object.
> > 
> > In diffcore.txt, "changeset" is used in place of "commit".
> 
> We really should use "commit" throughout. ex-BK users sometimes slip into
> "changeset" (which in turn is probably because BK had these per-file
> commits too - deltas), but there's no point in the distinction in git. A 
> commit is a commit.

That is, if you don't do "git-update-cache <single-file>" (which is not 
possible with some porcelains).

Apart from that: I think that it is quite important to make the 
distinction between a "commit" and a "commit object". Newbies (in this 
case, people working with CVS are newbies to the concepts of git, too) 
tend to understand better what you say if you make that distinction very 
clear, IMHO.

> >   - The following objects are collectively called "tree-ish": a
> >     tree object, a commit object, a tag object that resolves to
> >     either a commit or a tree object, and can be given to
> >     commands that expect to work on a tree object.
> > 
> > We could call this category an "ent".
> 
> LOL. You are a total geek.

I take that as a compliment :-)
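
The "tree-ish" definition above amounts to following pointers until a tree
is reached. A rough sketch in Python, assuming a toy object store that maps
an object name to a (type, target) pair; the store layout and the
peel_to_tree name are made up for illustration and are not git's on-disk
format or API.

```python
def peel_to_tree(objects, name):
    """Resolve a tree-ish (tree, commit or tag) to the name of a tree.

    `objects` is a hypothetical store: name -> (type, target), where the
    target of a commit is its associated tree and the target of a tag is
    the object the tag points at.
    """
    seen = set()
    while True:
        if name in seen:
            raise ValueError("cycle while resolving " + name)
        seen.add(name)
        kind, target = objects[name]
        if kind == "tree":
            return name
        if kind in ("commit", "tag"):
            name = target  # follow the pointer one step
        else:
            raise ValueError(name + " is a " + kind + ", not a tree-ish")
```

A tag that points at a blob is still a valid tag, but it is not a tree-ish,
which is why the blob case raises instead of returning.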

> >   - The files under $GIT_DIR/refs record object names, and are
> >     called "refs".  What is under refs/heads/ are called "heads",
> >     refs/tags/ "tags".  Typically, they are either object names
> >     of commit objects or tag objects that resolve to commit
> >     objects, but a tag can point at any object.
> > 
> > The tutorial never calls them "refs", but instead "references".
> 
> It might be worth saying explicitly that a reference is nothing but the 
> same thing as an "object name" aka "sha1". And make it very clear that it 
> can point to any object type, although commits tend to be the most common 
> thing you want to reference. That then leads naturally into a very specific 
> _subcase_ of refs, namely a "head":

Do not forget signed tags! Strictly speaking, these are references to 
references which are signed.
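
To illustrate "files under $GIT_DIR/refs record object names", here is a
small Python sketch that walks a refs/ directory and collects what each
file contains. The function names are hypothetical, and the sketch assumes
every file under refs/ holds nothing but one object name; it is not how any
git tool is implemented.

```python
import os


def read_refs(git_dir):
    """Return {"refs/heads/master": "<object name>", ...} for git_dir."""
    refs = {}
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Name the ref relative to git_dir, with "/" separators.
            ref_name = os.path.relpath(full_path, git_dir).replace(os.sep, "/")
            with open(full_path) as ref_file:
                refs[ref_name] = ref_file.read().strip()
    return refs


def heads(refs):
    """Pick out the "heads" subcase: everything under refs/heads/."""
    prefix = "refs/heads/"
    return {name[len(prefix):]: value
            for name, value in refs.items() if name.startswith(prefix)}
```

The file name is the meaningful part a bare object name lacks; the content
is just the object name itself.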

> >   - A "head" is always an object name of a commit, and marks the
> >     latest commit in one line of development.  A line of
> >     development is often called a "branch".  We sometimes use the
> >     word "branch head" to stress the fact that we are talking
> >     about a single commit that is the latest one in a "branch".
> > 
> > In the tutorial, the latter is used in reverse: it talks about a
> > "HEAD development branch" and a "HEAD branch".

Actually, I don't think it is a good idea to talk about a "HEAD branch" or 
"development branch". I'd prefer "branch".

> > I find it a little bit troublesome that $GIT_DIR/branches does not
> > really refer to a branch, but rather to a (possibly remote) repository.
> 
> Yes, I find the $GIT_DIR/branches naming to be confusing too. 

I don't know if we can hide it from the users, or if we should bite the 
bullet and rename it to "remotes/", or even better "repositories/".

> Jeff has been dragging me into the "local branches are good" camp, and
> these days I'm obviously a big believer.

I think that Jeff really deserves the credit for this. Yours truly was 
convinced that one repository should hold one branch only. But I was 
convinced otherwise, too.

> >   - The act of finding out the object names recorded in "refs" a
> >     different repository records, optionally updating a local
> >     "refs" with their values, and retrieving the objects
> >     reachable from them is called "fetching".  Fetching immediately
> >     followed by merging is called "pulling".
> > 
> > In that sense, git-http-pull would be more appropriately named
> > git-http-fetch, and analogous git-ssh-pull.
> > 
> > Also, git-pull-script.txt says "Pull and merge", contradicting this
> > definition.
> 
> To confuse things even more, cogito calls a fetch "pull" and a pull 
> "update".

I really think this should be unified. Pasky?

> I personally think "fetch" is unambiguous: it's just the act of fetching, 
> with no "merge" activity at all. So we should use that.

Agree.

> What to call a "fetch+merge" is a bit ambiguous. I obviously prefer
> "pull", but cogito disagrees, and you're right, "git-http-pull" and
> "git-ssh-pull" both really do just fetches.

Let's rename them before 1.0.

> But I think "update" isn't right either: to me, update would be the 
> non-merging kind (ie I think "update" implies "refresh" which in turn 
> implies a "fetch"-like behaviour).
> 
> So I'd vote for making the suggested definition official: "fetch" means
> fetching the data, and "pull" means "fetch + merge". 

This should be discussed. Obviously, I come from CVS and understand 
"update" to be what cogito says it is. But then, it is also true that 
CVS's usage of "update" is misleading, because it really does a merge, not 
forcing the user to do a commit before merge (because that is not possible 
in CVS). So basically, I agree: "pull" is unambiguous as far as I am 
concerned.
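
The proposed definitions can be modelled in a few lines of Python. This is
only a toy: a repository is a dict with an "objects" set and a "refs" dict,
the "fetched_head" key is invented for the example, the fetch copies every
remote object instead of only the reachable ones, and the merge step is
passed in as a callable because merging itself is out of scope here.

```python
def fetch(local, remote, ref):
    """Fetch: look up the remote ref and retrieve objects. No merge."""
    remote_head = remote["refs"][ref]
    # Toy shortcut: copy all remote objects, not just the reachable ones.
    local["objects"].update(remote["objects"])
    local["fetched_head"] = remote_head  # hypothetical bookkeeping key
    return remote_head


def pull(local, remote, ref, merge):
    """Pull: a fetch immediately followed by a merge."""
    remote_head = fetch(local, remote, ref)
    local_head = local["refs"].get(ref)
    local["refs"][ref] = merge(local_head, remote_head)
    return local["refs"][ref]
```

With this split, a fetch never touches the local head, while a pull always
does, which is the distinction the glossary entry is after.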

> >   - a "pack" usually consists of two files: a file containing objects
> >     in a compressed format, and an index to the first file. If the
> >     pack is uncompressed at once (e.g. when git-clone is called), the
> >     index is not necessary.
> > 
> > git-pack-objects calls this a "packed archive" first, but then reverts
> > to "pack". git-show-index.txt and git-verify-pack.txt call the .pack file
> > "packed GIT archive", and the index "idx file". git-unpack-objects.txt
> > calls the .pack file "pack archive".
> 
> We should just call them packs. An archive can be multiple packs and lots 
> of non-packed objects too. 

And they can have pack indices which do not relate at all to the central 
GIT index.

> > "type": one of the identifiers "commit","tree","tag" and "blob" describing
> > the type of an object.
> 
> Yes. Some old docs may call this type a "tag", since I was really thinking
> of it not in the SCM meaning at all, but in the _computer_architecture_
> meaning, where people usually call objects with enforced types "tagged".
> 
> Ie from a computer architecture standpoint you can have "tagged memory" or
> "tagged pointers", and LISP machines are often implemented with the
> pointers containing the type ("tag") of the thing they point to (for
> example, the low two bits might be the "tag" on the pointer). So I was
> talking about "tagged objects" when I just meant that the type of the
> object was embedded in the object itself, the way tagged memory
> architectures work.
> 
> In retrospect, that naming _really_ confused some people, I know I had
> trouble explaining git concepts to David Wheeler because I used "tagged
> objects" _not_ to mean a SCM style "tag", but to mean "typed objects".
> 
> If somebody sees an old reference to "object tags", those should all be 
> fixed to say "object types".

Agree.
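
To make "the type is embedded in the object itself" concrete, here is a
short Python sketch of how an object name can be computed from a type and
a payload. It assumes the layout present-day git uses (the type, a space,
the payload length in decimal, a NUL byte, then the payload, all hashed
with SHA-1); whether every 2005 snapshot used exactly this layout is not
something this thread states. The function name is hypothetical.

```python
import hashlib

OBJECT_TYPES = ("commit", "tree", "tag", "blob")


def object_name(object_type, payload):
    """Return the hex SHA-1 name of an object with the given type."""
    if object_type not in OBJECT_TYPES:
        raise ValueError("unknown object type: " + object_type)
    # The type travels with the data: it is part of what gets hashed.
    header = object_type.encode("ascii") + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()
```

Because the type is part of the hashed bytes, the same payload stored as a
"blob" and as a "tree" gets two different object names.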

Ciao,
Dscho

* Re: Terminology
  2005-08-05 19:53       ` Terminology barkalow
@ 2005-08-05 23:08         ` Johannes Schindelin
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2005-08-05 23:08 UTC (permalink / raw)
  To: barkalow; +Cc: Linus Torvalds, Junio C Hamano, git

Hi,

On Fri, 5 Aug 2005, barkalow@iabervon.org wrote:

> On Fri, 5 Aug 2005, Linus Torvalds wrote:
> 
> > On Fri, 5 Aug 2005, Johannes Schindelin wrote:
> > 
> > >   - The files under $GIT_DIR/refs record object names, and are
> > >     called "refs".  What is under refs/heads/ are called "heads",
> > >     refs/tags/ "tags".  Typically, they are either object names
> > >     of commit objects or tag objects that resolve to commit
> > >     objects, but a tag can point at any object.
> > > 
> > > The tutorial never calls them "refs", but instead "references".
> > 
> > > It might be worth saying explicitly that a reference is nothing but the 
> > > same thing as an "object name" aka "sha1".
> 
> Well, it's an object name stored in a file. This adds a layer of 
> indirection and a meaningful name.

Yes.

> > So I'd vote for making the suggested definition official: "fetch" means
> > fetching the data, and "pull" means "fetch + merge". 
> 
> So what's the converse of "fetch" (to rename git-ssh-push to)? 
> Maybe "ship"?

I actually like "push". You know, not everybody agrees that "push" is the 
opposite of "pull"...

Ciao,
Dscho

* Re: Terminology
@ 2005-08-06  0:54 linux
  0 siblings, 0 replies; 9+ messages in thread
From: linux @ 2005-08-06  0:54 UTC (permalink / raw)
  To: barkalow; +Cc: git

> So what's the converse of "fetch" (to rename git-ssh-push to)? 
> Maybe "ship"?

The opposite of "fetch" is "throw" or "toss".
(Just avoid tossing cookies or off.)
