From: Christoph Groth <christoph@grothesque.org>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, git@vger.kernel.org
Subject: Re: Stat cache in .git/index hinders syncing of repositories
Date: Fri, 24 Jan 2020 10:16:18 +0100
Message-ID: <8736c5tdst.fsf@drac>
In-Reply-To: <20200121025311.GA4113372@camp.crustytoothpaste.net> (brian m. carlson's message of "Tue, 21 Jan 2020 02:53:11 +0000")
brian m. carlson wrote:
> On 2020-01-20 at 23:53:22, Christoph Groth wrote:
> > My point is that it’s not just private data: When I excluded
> > .git/index from synchronization, staging files for a commit was no
> > longer synchronized.
>
> (...)
>
> Storing all of this data in one file means that only one file need be
> mapped into memory and rewritten. Git writes to the index by
> atomically creating a lock file alongside it and writing the new
> contents into it, and then doing an atomic replace. This approach
> wouldn't be possible with multiple files, and any update to it
> wouldn't be atomic.
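For readers unfamiliar with the pattern described above, here is a minimal Python sketch of the lock-file-and-rename idea. It is not Git's implementation (Git does this in C); the function name `write_atomically` is hypothetical, and the sketch assumes a POSIX-like file system on which `os.replace` renames atomically within one directory.

```python
import os


def write_atomically(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so that readers see
    either the old or the new contents, never a partially written file."""
    lock_path = path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    # This happens before the try block, so we never delete a lock we don't own.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk first
        os.replace(lock_path, path)  # atomic rename over the old file
    except BaseException:
        # On failure, drop our lock file and leave the original untouched.
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
        raise
```

A second writer that finds the lock already present fails with `FileExistsError` and leaves both files alone, much as Git refuses to proceed while `index.lock` exists. The sketch only covers one file; making a coordinated update of several files atomic would need more machinery, which is the point being made above.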
Thanks a lot for the explanation. Still, from a design point of view it
seems less satisfying to me to mix state (= which changes have been
staged) with an ephemeral cache that is specific to a particular file
system. Without having thought deeply about it, I have the impression
that it wouldn’t matter if the stat cache and the “staging state” of the
repository were each atomic on their own. But I now understand that all
of this hardly matters in practice (see below), so I’m not motivated to
work on this, and probably no one else is. :-)
> However, having said that, nobody has provided a compelling case for
> using multiple files for storing different types of working tree
> state. The existing options are available for cases like yours and
> others', and they work. Since there are clear benefits to the current
> model, including simplicity and robustness, and few downsides, nobody
> has decided to change it.
Indeed, I see hardly any disadvantage in globally setting

    [core]
        trustctime = false
        checkstat = minimal

as I do now. In fact, I wonder what purpose caching the sub-second part
of mtime and the ctime serves in the first place. Perhaps it matters for
scripted use of Git, where several operations can occur within the same
second, but even then only changes that keep the file size constant
would be affected.
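The following Python sketch models, in simplified form, which stat fields are compared under these settings. Per the git-config documentation, `checkstat = minimal` leaves only the whole-second part of mtime (and of ctime, if `trustctime` is enabled) plus the file size to be checked. The `StatEntry` type and the `snapshot`/`looks_unchanged` functions are hypothetical, the "default" branch checks only a subset of the fields Git really compares, and Git's separate "racy git" safeguards are deliberately left out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StatEntry:
    """Hypothetical cached stat data for one tracked file."""
    mtime_ns: int
    ctime_ns: int
    size: int
    ino: int


def snapshot(path: str) -> StatEntry:
    """Capture the fields this sketch cares about from os.stat()."""
    st = os.stat(path)
    return StatEntry(st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino)


def looks_unchanged(cached: StatEntry, current: StatEntry,
                    minimal: bool = True, trust_ctime: bool = False) -> bool:
    """Return True if the stat data alone suggests the file is unchanged."""
    if minimal:
        # Only whole seconds of mtime (and ctime, if trusted) plus the size.
        same = (cached.mtime_ns // 10**9 == current.mtime_ns // 10**9
                and cached.size == current.size)
        if trust_ctime:
            same = same and cached.ctime_ns // 10**9 == current.ctime_ns // 10**9
        return same
    # Simplified "default": full-resolution timestamps, size and inode number.
    same = (cached.mtime_ns == current.mtime_ns
            and cached.size == current.size
            and cached.ino == current.ino)
    if trust_ctime:
        same = same and cached.ctime_ns == current.ctime_ns
    return same


cached = StatEntry(mtime_ns=1_579_857_378_100_000_000,
                   ctime_ns=1_579_857_378_100_000_000, size=10, ino=42)
# Same-size edit 0.8 s later: missed by the minimal check, caught by the default.
edited = StatEntry(1_579_857_378_900_000_000, 1_579_857_378_900_000_000, 10, 42)
print(looks_unchanged(cached, edited), looks_unchanged(cached, edited, minimal=False))  # True False
# Synced copy: same mtime and size, but a new inode number and ctime.
synced = StatEntry(1_579_857_378_100_000_000, 1_579_900_000_000_000_000, 10, 777)
print(looks_unchanged(cached, synced), looks_unchanged(cached, synced, minimal=False))  # True False
```

The second case shows why these settings help with syncing: a copy made by a sync tool that preserves mtime (an assumption about the tool) still gets a new inode number and ctime, which the minimal check ignores. The first case shows the trade-off mentioned above, a same-size edit within the same second.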
> I should add that even if, for some reason, we did add support for
> splitting this data out, I'm not sure if we'd support syncing only
> part of the repository state and blowing away other state. We don't
> really support that now (other than through tools like fetch and
> clone) and I don't think we'd want to encourage that behavior in the
> future.
The stat cache file would not really be part of the repository’s state,
since deleting it would not change anything; it would only slow down the
next operation. (That’s my current understanding, at least; perhaps I’m
still overlooking something.)
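To illustrate why such a separate stat cache could be discarded without losing information, here is a Python sketch of a hypothetical `is_modified` check: when the cached stat data matches, the file is assumed unchanged; otherwise (including when no cache exists at all) the file is rehashed and compared with the staged blob ID. The blob-ID formula, SHA-1 over `blob <size>\0` followed by the content, is how Git names blob objects in SHA-1 repositories; everything else, including the `(whole-second mtime, size)` cache tuple, is an assumption, and racy-git handling, file modes and symlinks are ignored.

```python
import hashlib
import os
from typing import Optional, Tuple


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to a blob with this content (SHA-1 repositories)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def is_modified(path: str, staged_blob_id: str,
                cached_stat: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical check; cached_stat is (whole-second mtime, size) or None."""
    st = os.stat(path)
    if cached_stat is not None and cached_stat == (int(st.st_mtime), st.st_size):
        return False  # fast path: trust the stat cache, skip reading the file
    # Slow path: no usable cache entry, so hash the content and compare.
    with open(path, "rb") as f:
        return git_blob_id(f.read()) != staged_blob_id
```

Passing `None` as `cached_stat` gives the same answer as a correct cache entry, only more slowly, which is the sense in which deleting such a file would merely slow down the next operation.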
Brian, Johannes, Junio, thanks a lot for taking the time to clarify this
issue.
Christoph