From: Jeff King <peff@peff.net>
To: Erik Bernoth <erik.bernoth@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Index Fileformat: stat(2) info necessary? What for?
Date: Mon, 26 Aug 2013 18:17:18 -0400
Message-ID: <20130826221718.GA12384@sigill.intra.peff.net>
In-Reply-To: <CAB46HOmMtgD+CtWUS3CQhr+ux1a3JP=hF2Cerd2nmDWzX5pxcw@mail.gmail.com>
On Mon, Aug 26, 2013 at 02:34:18PM +0200, Erik Bernoth wrote:
> Now there's a lot of information (all the stat(2) stuff) that gets
> stored about the staged files, which I never needed for file-IO in
> Python or Java. In my eyes if a person would be cloning my git
> repository he wouldn't need it as well, because the new inode on his
> system will probably be different from mine and applying the access
> rights onto the cloning user id and group id would also make sense,
> because that user introduced that file to that system.

Git does not look at the index at all when cloning; only the actual
objects. So that stat information is not copied (the information in the
clone's index comes from the checkout procedure on the receiving side).

> Thus I am now missing concrete experience in when this stat(2)
> information comes in handy or if it would be completely okay in a
> python-git implementation to just store the info shown with `git
> ls-files -s` to a file, maybe zlib.compressed like a git object. Of
> course I would then lose the compatibility with git repositories,
> which is a shame even if it would make sense. What is your opinion?

The stat information is there for performance. Think about how you would
implement "git diff" between the working tree and the index (or a tree).
Naively, you would have to open each file and either compare it byte for
byte to what is committed, or hash it and compare the hash to what is
committed. We have to open the files anyway to show a real patch, of
course, but most files haven't been modified, and we would like to avoid
opening them at all.
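
To make the naive approach concrete, here is a rough Python sketch (since
you mentioned a Python implementation). The function names are made up for
illustration. The only git-specific fact it relies on is that a blob's id
is the sha1 of a "blob <size>\0" header followed by the file content. It
ignores details such as line-ending conversion.

```python
import hashlib


def blob_sha1(data: bytes) -> str:
    """Return the id git would assign to a blob with this content."""
    # A blob is hashed as "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def naive_is_modified(path: str, index_sha1: str) -> bool:
    """Hypothetical helper: read and hash the whole file on every check."""
    with open(path, "rb") as f:
        return blob_sha1(f.read()) != index_sha1
```

Every call reads the full file, so the cost grows with the size of the
working tree even when nothing has changed.
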
By keeping the stat information, we can check that the version in the
index is at some sha1 X, and then check that the stat information
matches what is on the disk, and know that what is on disk also has sha1
X. That makes comparing to the index or a tree very cheap; we only call
lstat() once per file, rather than opening and hashing all of the bytes.
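
As a sketch of that check, assuming a simplified cache entry: the
CachedStat type and both function names below are hypothetical, and the
real index records more fields than these four (ctime, dev, uid and gid
as well). The point is only that the comparison needs one lstat() call
and never opens the file.

```python
import os
from typing import NamedTuple


class CachedStat(NamedTuple):
    """Illustrative subset of the stat fields saved with an index entry."""
    mtime_ns: int
    size: int
    ino: int
    mode: int


def snapshot(path: str) -> CachedStat:
    """Record the stat data at the moment the file's sha1 is stored."""
    st = os.lstat(path)
    return CachedStat(st.st_mtime_ns, st.st_size, st.st_ino, st.st_mode)


def maybe_modified(path: str, cached: CachedStat) -> bool:
    """One lstat(), no open(): False means the stored sha1 can be trusted."""
    return snapshot(path) != cached
```

You would call snapshot() when adding a file to your index, and
maybe_modified() when diffing. A True result only means the file needs a
closer look, at which point you fall back to hashing it.
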
-Peff