git.vger.kernel.org archive mirror
* A Basic Git Question About File Tracking
@ 2011-10-04  0:53 Jon Forrest
  2011-10-04  1:10 ` Jonathan Nieder
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Forrest @ 2011-10-04  0:53 UTC (permalink / raw)
  To: git

I've been reading the Pro Git book. I'm having trouble really
understanding the concept of file tracking. Here's where
my confusion starts.

The Pro Git book says "Untracked basically means that Git sees a
file you didn’t have in the previous snapshot (commit)".

Is this right? I can easily think of a counter example.
Let's say you put a new file in the working directory of a
Git repo. Then you "git add" it. At this point, the file hasn't
been in any commit. Yet, 'git status' doesn't show the file
as being untracked. Should that statement be "Untracked basically
means that Git sees a file you didn’t have in the previous
snapshot (commit) or a file that hasn't been staged."?

One additional confusing thing is that "git add" apparently
both starts tracking a file and puts it in the index the
first time a file is added. Thereafter, "git add" only puts
the file in the index. One of my research projects is to understand
what goes on internally when a file is tracked.

Jon Forrest


* Re: A Basic Git Question About File Tracking
  2011-10-04  0:53 A Basic Git Question About File Tracking Jon Forrest
@ 2011-10-04  1:10 ` Jonathan Nieder
  2011-10-04  1:14   ` Jon Forrest
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2011-10-04  1:10 UTC (permalink / raw)
  To: Jon Forrest; +Cc: git

Hi!

Jon Forrest wrote:

> The Pro Git book says "Untracked basically means that Git sees a
> file you didn’t have in the previous snapshot (commit)".

Yep, that's a bug in the Pro Git book.  "Untracked" means "not in
the index", nothing more, nothing less.

I believe Scott takes patches[1]. :)

Hope that helps,
Jonathan

[1] https://github.com/progit/progit
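
[Editorial sketch, not part of the original message.] The counterexample
from the first mail can be checked in a throwaway repository. This is a
minimal sketch, assuming git is installed; the temporary directory and the
file name "x" are arbitrary choices made for illustration. It compares what
git lists as untracked before and after "git add", with no commit made.

```shell
# Scratch repository in a temporary directory (location is arbitrary).
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "hello" > x

# Before "git add": x has no index entry, so it is listed as untracked.
untracked_before=$(git ls-files --others --exclude-standard)

git add x

# After "git add" (still no commit): x is in the index and no longer untracked.
index_after=$(git ls-files)
untracked_after=$(git ls-files --others --exclude-standard)

echo "untracked before add: $untracked_before"
echo "index after add:      $index_after"
echo "untracked after add:  $untracked_after"
```

The file never appears in any commit here, yet it stops being untracked as
soon as it has an index entry.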


* Re: A Basic Git Question About File Tracking
  2011-10-04  1:10 ` Jonathan Nieder
@ 2011-10-04  1:14   ` Jon Forrest
  2011-10-04  1:22     ` Jonathan Nieder
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Forrest @ 2011-10-04  1:14 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On 10/3/2011 6:10 PM, Jonathan Nieder wrote:
> Hi!

Thanks for the quick reply.

> Jon Forrest wrote:
>
>> The Pro Git book says "Untracked basically means that Git sees a
>> file you didn’t have in the previous snapshot (commit)".
>
> Yep, that's a bug in the Pro Git book.  "Untracked" means "not in
> the index", nothing more, nothing less.

But your definition doesn't include files that
have been committed. In the following trivial case
in a new git repository

cp /etc/passwd x
git add x
git commit -m"fooling around"

is "x" tracked? Your definition says it isn't
but "git status" makes me think it is.

Sorry to be so pedantic.

Jon


* Re: A Basic Git Question About File Tracking
  2011-10-04  1:14   ` Jon Forrest
@ 2011-10-04  1:22     ` Jonathan Nieder
  2011-10-09  0:08       ` Jon Forrest
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2011-10-04  1:22 UTC (permalink / raw)
  To: Jon Forrest; +Cc: git

Jon Forrest wrote:
> On 10/3/2011 6:10 PM, Jonathan Nieder wrote:

>> "Untracked" means "not in
>> the index", nothing more, nothing less.
>
> But your definition doesn't include files that
> have been committed. In the following trivial case
> in a new git repository
>
> cp /etc/passwd x
> git add x
> git commit -m"fooling around"
>
> is "x" tracked? Your definition says it isn't
> but "git status" makes me think it is.

Yes, "x" is tracked.  Moreover, "x" is in the index.  You can
list files in the index with the "git ls-files -s" command.

Does that help?

> Sorry to be so pedantic.

No problem --- it's good to clarify these things (especially if it
results in finding documentation that should be clarified, too).

Jonathan
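
[Editorial sketch, not part of the original message.] The claim that "x" is
still in the index after the commit can be checked directly. This is a
minimal sketch, assuming git is installed; the user name and e-mail are
placeholder values, and a one-line file stands in for the /etc/passwd copy
used in the thread.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Placeholder identity so that "git commit" works in a fresh environment.
git config user.name "Demo" && git config user.email "demo@example.com"

echo "some content" > x    # stand-in for "cp /etc/passwd x"
git add x
git commit -q -m "fooling around"

# The commit does not empty the index: x is still listed, with its blob id.
git ls-files -s

index_names=$(git ls-files)
untracked=$(git ls-files --others --exclude-standard)
echo "in index after commit: $index_names"
echo "untracked after commit: ${untracked:-<none>}"
```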


* Re: A Basic Git Question About File Tracking
  2011-10-04  1:22     ` Jonathan Nieder
@ 2011-10-09  0:08       ` Jon Forrest
  2011-10-09  1:17         ` Jakub Narebski
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Forrest @ 2011-10-09  0:08 UTC (permalink / raw)
  To: git

On 10/3/2011 6:22 PM, Jonathan Nieder wrote:

[I'm just getting back to this question. I had accidentally
sent this follow up directly to Jonathan but I want to
continue this on the email list.]

> Yes, "x" is tracked.  Moreover, "x" is in the index.  You can
> list files in the index with the "git ls-files -s" command.

This spoils my understanding of what the index
is. I had been thinking that after you add files
to the index, and then commit, the index is then
empty. In other words, whatever's in the index
gets committed, and then the index is cleaned.

On the other hand, if the definition of a tracked
file is a file that's in the index, then this definitely
clears up my understanding of tracked files.

If every file that's 'git add'ed stays in the
index, how does git know which files to commit?

I can't prove it but I suspect that many git beginners
also are confused by this.

Thanks for your replies.

Jon Forrest


* Re: A Basic Git Question About File Tracking
  2011-10-09  0:08       ` Jon Forrest
@ 2011-10-09  1:17         ` Jakub Narebski
  2011-10-09  2:42           ` A Basic Git Question About File Tracking [ANSWERED] Jon Forrest
  2011-10-09 16:57           ` A Basic Git Question About File Tracking Scott Chacon
  0 siblings, 2 replies; 9+ messages in thread
From: Jakub Narebski @ 2011-10-09  1:17 UTC (permalink / raw)
  To: Jon Forrest; +Cc: git, Jonathan Nieder

Jon Forrest <nobozo@gmail.com> writes:

> On 10/3/2011 6:22 PM, Jonathan Nieder wrote:
> 
> [I'm just getting back to this question. I had accidentally
> sent this follow up directly to Jonathan but I want to
> continue this on the email list.]
> 
> > Yes, "x" is tracked.  Moreover, "x" is in the index.  You can
> > list files in the index with the "git ls-files -s" command.
> 
> This spoils my understanding of what the index
> is. I had been thinking that after you add files
> to the index, and then commit, the index is then
> empty. In other words, whatever's in the index
> gets committed, and then the index is cleaned.
> 
> On the other hand, if the definition of a tracked
> file is a file that's in the index, then this definitely
> clears up my understanding of tracked files.
> 
> If every file that's 'git add'ed stays in the
> index, how does git know which files to commit?
> 
> I can't prove it but I suspect that many git beginners
> also are confused by this.

You seem to be under the [false] impression that a git commit is about
_changes_ / a _changeset_.

That is not true.  What is stored in a git commit object is (a pointer to)
a _snapshot_ of the state of the project at a given time.  This means that
"git commit" creates a tree object out of the state of the index, and
creates a commit object that points to that newly created tree and has
the version you started work from as its parent.  It is the commit's
record of the previous version that allows a commit to be turned into
a changeset.

Hopefully that clears up your confusion.
-- 
Jakub Narębski
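
[Editorial sketch, not part of the original message.] The snapshot model
described above can be observed with plumbing commands. This is a minimal
sketch, assuming git is installed; the file names "a" and "b" and the
identity values are made up for illustration. It shows that the second
commit's tree lists every file, that the index still describes that same
tree after the commit, and that the changeset view comes from comparing a
commit with its parent.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo one > a && git add a && git commit -q -m "first"
echo two > b && git add b && git commit -q -m "second"

# The second commit's tree lists BOTH files, not only the newly added one.
snapshot=$(git ls-tree --name-only HEAD)

# With nothing newly staged, the tree built from the index equals HEAD's tree.
index_tree=$(git write-tree)
head_tree=$(git rev-parse 'HEAD^{tree}')

# The changeset is derived by comparing the commit with its parent.
changed=$(git diff --name-only HEAD~1 HEAD)

echo "files in snapshot: $(echo $snapshot)"
echo "index tree: $index_tree"
echo "HEAD tree:  $head_tree"
echo "changed since parent: $changed"
```

This also answers "how does git know which files to commit?": it does not
pick files, it records the whole index as the next tree.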


* Re: A Basic Git Question About File Tracking [ANSWERED]
  2011-10-09  1:17         ` Jakub Narebski
@ 2011-10-09  2:42           ` Jon Forrest
  2011-10-09  9:37             ` Jakub Narebski
  2011-10-09 16:57           ` A Basic Git Question About File Tracking Scott Chacon
  1 sibling, 1 reply; 9+ messages in thread
From: Jon Forrest @ 2011-10-09  2:42 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On 10/8/2011 6:17 PM, Jakub Narebski wrote:

> You seem to be under [false] impression that git commit is about
> _changes_ / _changeset_.

This is correct. The Pro Git book says:

"You stage these modified files and then commit
all your staged changes"

Plus, even "git status" tells me

$ git status
# On branch master
# Changes to be committed:

But I see my error. Below is what I hope is a clear
explanation of what I didn't understand. It presumes
that the reader understands the git objects model.
Please let me know if anything is incorrect.
----------
When you "git add" a file two things happen:

1) The file is copied to the git objects tree.
The location where the file is copied depends
on the hash of the file's content.

2) An entry for the file is added to the git index.
This entry includes the same hash that was mentioned
in #1.

A tracked file has an entry in the git index file.
A copy of the file also exists in the objects tree.

When you run 'git status', git computes the hash of
every file in your working directory and looks
up each file in the index. If the file isn't found
then the file is shown as untracked.

When you do a commit, the hash values of everything
in the index are copied into a tree object. The hash
value of the tree object is then placed in a commit object.
No copies of tracked files in the working directory are
made at commit time. This is because the files were already
copied into the objects tree when 'git add' was run.
This is one reason why git commits are so fast.

-----

How's that?

Thanks to everyone for sticking with me on this.

Jon


* Re: A Basic Git Question About File Tracking [ANSWERED]
  2011-10-09  2:42           ` A Basic Git Question About File Tracking [ANSWERED] Jon Forrest
@ 2011-10-09  9:37             ` Jakub Narebski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Narebski @ 2011-10-09  9:37 UTC (permalink / raw)
  To: Jon Forrest; +Cc: git, Jonathan Nieder

On Sun, 9 Oct 2011, Jon Forrest wrote:
> On 10/8/2011 6:17 PM, Jakub Narebski wrote:
> 
> > You seem to be under [false] impression that git commit is about
> > _changes_ / _changeset_.
> 
> This is correct. The Pro Git book says:
> 
> "You stage these modified files and then commit
> all your staged changes"
> 
> Plus, even "git status" tells me
> 
> $ git status
> # On branch master
> # Changes to be committed:

Well, that is because the two representations, delta / changeset
("differential") and snapshot ("integral"), are related, and
[in practice] one can be transformed into the other.

Sometimes it is better to think of a commit as representing a changeset
from its parent commit; sometimes it is better to think of a commit as a
snapshot of the state of the project.

But under the hood, git's model is snapshot-based.

> But I see my error. Below is what I hope is a clear
> explanation of what I didn't understand. It presumes
> that the reader understands the git objects model.
> Please let me know if anything is incorrect.
> ----------
> When you "git add" a file two things happen:
> 
> 1) The file is copied to the git objects tree.

Actually it is the file's _contents_ that are copied to the git object _store_.

> This location where the file is copied depends
> on the hash of the file's content.

I'd say that this is an unnecessary implementation detail of the "loose"
object format.  I would say that the _identifier_ of the added object is
based on its contents.
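
[Editorial sketch, not part of the original message.] That the identifier
depends only on the contents can be checked as follows. This is a minimal
sketch, assuming git is installed; the file names "one" and "two" are
arbitrary. Two files with identical contents get the same object id, and
the object exists in the store right after "git add", before any commit.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

printf 'same content\n' > one
printf 'same content\n' > two

# Compute the blob id from the contents without writing anything.
expected=$(git hash-object one)

git add one two

# The second field of "git ls-files -s" is the object id stored in the index.
id_one=$(git ls-files -s one | awk '{print $2}')
id_two=$(git ls-files -s two | awk '{print $2}')

# The blob is already in the object store, although nothing was committed.
obj_type=$(git cat-file -t "$id_one")

echo "hash-object: $expected"
echo "index (one): $id_one"
echo "index (two): $id_two"
echo "object type: $obj_type"
```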

> 
> 2) An entry for the file is added to the git index.
> This entry includes the same hash that was mentioned
> in #1.

Yes.
 
> A tracked file has an entry in the git index file.

Yes.

> A copy of the file also exists in the objects tree.

A copy of the _contents_ of a file at a specific point in time
exists in the object _store_ (not necessarily the object tree, as it
can be packed).
 
> When you run 'git status', git computes the hash of
> every file in your working directory and looks
> up each file in the index. If the file isn't found
> then the file is shown as untracked.

Sidenote: git also stores a file's stat information (modification
time etc.) in the index, so it can avoid recomputing the hash of
every file.
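
[Editorial sketch, not part of the original message.] The cached stat data
can be inspected with "git ls-files --debug". This is a minimal sketch,
assuming git is installed; the git documentation describes the --debug
output as intended for manual inspection and subject to change, so the
field names matched below are an assumption about current output rather
than a stable interface.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "hello" > x
git add x

# Show the index entry for x together with its cached stat fields
# (timestamps, size and similar), as recorded at "git add" time.
debug=$(git ls-files --debug x)
echo "$debug"
```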
 
> When you do a commit, the hash values of everything
> in the index are copied into a tree object. The hash
> value of the tree object is then placed in a commit object.

True, though I would probably state it a bit differently.

> No copies of tracked files in the working directory are
> made at commit time. This is because the files were already
> copied into the objects tree when 'git add' was run.
> This is one reason why git commits are so fast.

Well, there is also "git commit -a", but it is true that git
copies into the object store only those tracked files that changed.

Also, I think the main reason git commits are fast is that they
are a local operation, not one that goes over the network as in
centralized version control systems.
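
[Editorial sketch, not part of the original message.] What each step writes
can be counted with "git count-objects". This is a minimal sketch, assuming
git is installed and that new objects are stored loose, which is the usual
behaviour for a small fresh repository; the identity values are
placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "hello" > x
git add x
# "git add" has already written the blob: one loose object exists.
after_add=$(git count-objects | awk '{print $1}')

git commit -q -m "snapshot"
# The commit adds only a tree object and a commit object, no new blob.
after_commit=$(git count-objects | awk '{print $1}')

echo "loose objects after add:    $after_add"
echo "loose objects after commit: $after_commit"
```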
 
-- 
Jakub Narebski
Poland


* Re: A Basic Git Question About File Tracking
  2011-10-09  1:17         ` Jakub Narebski
  2011-10-09  2:42           ` A Basic Git Question About File Tracking [ANSWERED] Jon Forrest
@ 2011-10-09 16:57           ` Scott Chacon
  1 sibling, 0 replies; 9+ messages in thread
From: Scott Chacon @ 2011-10-09 16:57 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jon Forrest, git, Jonathan Nieder

Jon,

2011/10/8 Jakub Narebski <jnareb@gmail.com>:
>> This spoils my understanding of what the index
>> is. I had been thinking that after you add files
>> to the index, and then commit, the index is then
>> empty. In other words, whatever's in the index
>> gets committed, and then the index is cleaned.
>>
>> On the other hand, if the definition of a tracked
>> file is a file that's in the index, then this definitely
>> clears up my understanding of tracked files.
>>
>> If every file that's 'git add'ed stays in the
>> index, how does git know which files to commit?

It may help to read a blog post I put on the Pro Git blog called
"Reset Demystified" that talks about a simplified model of the HEAD,
index and working directory.

http://progit.org/2011/07/11/reset.html

Let me know if that helps.  And you're right, the book should say "not
in the index" rather than "not in the last commit"; that would be
more technically correct. I think at that point in the book I have not
gone into any details about the index yet, so that wording would be
confusing without more detail.

thanks,
Scott
