* Index Fileformat: stat(2) info necessary? What for?
@ 2013-08-26 12:34 Erik Bernoth
  2013-08-26 22:17 ` Jeff King
From: Erik Bernoth @ 2013-08-26 12:34 UTC (permalink / raw)
  To: git

Hi,

I am still working on implementing git in Python for self education
purposes. Implementing the Index in memory was no problem after I
understood how it's done with the help of Andreas Ericsson and Junio C
Hamano.

Now I want to store an Index state to the filesystem in a
git-compatible file format. I looked up what the Git documentation has
to say about that [1]. Now there's a lot of information (all the
stat(2) stuff) that gets stored about the staged files, which I never
needed for file I/O in Python or Java. In my eyes, a person cloning my
git repository wouldn't need it either, because the inode on his
system will probably differ from mine, and applying the access rights
to the cloning user's id and group id would also make sense, because
that user introduced the file to that system.

Thus I am now missing concrete experience in when this stat(2)
information comes in handy or if it would be completely okay in a
python-git implementation to just store the info shown with `git
ls-files -s` to a file, maybe zlib.compressed like a git object. Of
course I would then lose the compatibility with git repositories,
which is a shame even if it would make sense. What is your opinion?


[1] https://github.com/git/git/blob/master/Documentation/technical/index-format.txt


Cheers
Erik


* Re: Index Fileformat: stat(2) info necessary? What for?
  2013-08-26 12:34 Index Fileformat: stat(2) info necessary? What for? Erik Bernoth
@ 2013-08-26 22:17 ` Jeff King
From: Jeff King @ 2013-08-26 22:17 UTC (permalink / raw)
  To: Erik Bernoth; +Cc: git

On Mon, Aug 26, 2013 at 02:34:18PM +0200, Erik Bernoth wrote:

> Now there's a lot of information (all the stat(2) stuff) that gets
> stored about the staged files, which I never needed for file I/O in
> Python or Java. In my eyes, a person cloning my git repository
> wouldn't need it either, because the inode on his system will
> probably differ from mine, and applying the access rights to the
> cloning user's id and group id would also make sense, because that
> user introduced the file to that system.

Git does not look at the index at all when cloning; only the actual
objects. So that stat information is not copied (the information in the
clone's index comes from the checkout procedure on the receiving side).

> Thus I am now missing concrete experience in when this stat(2)
> information comes in handy or if it would be completely okay in a
> python-git implementation to just store the info shown with `git
> ls-files -s` to a file, maybe zlib.compressed like a git object. Of
> course I would then lose the compatibility with git repositories,
> which is a shame even if it would make sense. What is your opinion?

The stat information is there for performance. Think about how you would
implement "git diff" between the working tree and the index (or a tree).

Naively, you would have to open each file and either compare it byte for
byte to what is committed, or hash it and compare the hash to what is
committed. We have to open the files anyway to show a real patch, of
course, but most files haven't been modified, and we would like to avoid
opening them at all.
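
To make the naive approach concrete, here is a minimal Python sketch
of the "hash it and compare" variant. The function names are
hypothetical, and it assumes the file content is stored as-is (no
line-ending conversion or other filters). The only fixed part is how a
blob id is computed: sha1 over the header "blob <size>\0" followed by
the content.

```python
import hashlib


def blob_sha1(data: bytes) -> str:
    """Return the git blob id for raw content (hypothetical helper name)."""
    # A blob id is sha1("blob <size>\0" + content).
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def is_modified_naive(path: str, index_sha1: str) -> bool:
    """Compare a working-tree file against the sha1 recorded in the index.

    This reads and hashes every byte of the file, which is exactly the
    cost the stat information is meant to avoid.
    """
    with open(path, "rb") as f:
        return blob_sha1(f.read()) != index_sha1
```

Calling is_modified_naive() on every tracked file costs one full read
and one hash per file, even when nothing has changed.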

By keeping the stat information, we can check that the version in the
index is at some sha1 X, and then check that the stat information
matches what is on the disk, and know that what is on disk also has sha1
X. That makes comparing to the index or a tree very cheap; we only call
lstat() once per file, rather than opening and hashing all of the bytes.
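
The stat shortcut described above could be sketched in Python roughly
as follows. This is an illustration under stated assumptions, not how
Git itself is written: CachedStat, snapshot() and probably_unchanged()
are made-up names, and only a subset of the stat fields is kept (the
index format documentation also lists ctime, dev, uid and gid).

```python
import os
from typing import NamedTuple


class CachedStat(NamedTuple):
    """Subset of lstat() fields remembered per index entry (assumed layout)."""
    mtime_ns: int
    size: int
    ino: int
    mode: int


def snapshot(path: str) -> CachedStat:
    """Record the stat fields at the moment the file is added to the index."""
    st = os.lstat(path)
    return CachedStat(st.st_mtime_ns, st.st_size, st.st_ino, st.st_mode)


def probably_unchanged(path: str, cached: CachedStat) -> bool:
    """One lstat() call, no open(): matching stat data means the content
    is assumed to still have the sha1 stored next to `cached`."""
    return snapshot(path) == cached
```

Only when probably_unchanged() returns False does the file need to be
opened and hashed. One caveat this sketch ignores: a file modified
within the same timestamp tick as the snapshot can look unchanged, so
a real implementation needs extra care for that case.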

-Peff
