From: Christoph Groth <christoph@grothesque.org>
To: git@vger.kernel.org
Subject: Stat cache in .git/index hinders syncing of repositories
Date: Sat, 18 Jan 2020 00:57:36 +0100
Message-ID: <87v9p9skjz.fsf@drac>
Hello,
I am using unison to sync home directories across multiple machines.
This includes a fair number of git repositories and works very well.
Unison recently acquired a new feature that allows selected
subdirectories (like .git) to be treated atomically. This makes the syncing perfectly
safe.
Some people say that one should use git itself to sync git working
directories, but IMHO these people overlook the difference between
collaboration (using git) and being able to continue one’s own
unfinished work on a different machine, including uncommitted files,
stashes, and - if it has to be - in the middle of a merge. Moreover, it
is simpler not to have to treat git repositories specially when syncing.
Syncing git repositories is thus clearly useful.
However, there is one problem with syncing git repositories that has
been noticed by multiple people [1]: the file .git/index contains not
only the “git index”, but also a cache of stat-data of the files in the
working directory. Some file synchronizers are able to sync mtimes, but
syncing ctimes would be bizarre (if it is even possible).
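To make this concrete, here is a minimal Python sketch (an illustration only, not anything git or unison actually runs). It uses shutil.copy2 as a stand-in for a synchronizer that preserves mtimes, and compares the stat fields that git's index format documents as cached per entry (ctime, mtime, dev, ino, mode, uid, gid, size). The helper names stat_fields and differing_fields are made up for this example.

```python
import os
import shutil
import tempfile


def stat_fields(path):
    """Collect the stat fields that a git index entry is documented to cache."""
    st = os.stat(path)
    return {
        "ctime_ns": st.st_ctime_ns,
        "mtime_ns": st.st_mtime_ns,
        "dev": st.st_dev,
        "ino": st.st_ino,
        "mode": st.st_mode,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
    }


def differing_fields(a, b):
    """Return the names of the fields whose values differ, sorted."""
    return sorted(name for name in a if a[name] != b[name])


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "on_machine_a.txt")
    copy = os.path.join(tmp, "on_machine_b.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    # Assumption: copy2 plays the role of the synchronizer. It copies the
    # content and carries over the mtime, but the copy is a new file with
    # its own inode (and its own ctime, which cannot be set from user space).
    shutil.copy2(original, copy)

    changed = differing_fields(stat_fields(original), stat_fields(copy))
    print("fields that differ after the copy:", changed)
```

The size is identical and the mtime is carried over, but the inode number always differs, and the ctime usually does too. A stat cache recorded for the original therefore does not describe the copy.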
So, say that machines A and B are synced. A new git repository appears
on machine A. The synchronizer is run which results in copying all the
files of the new repo verbatim to machine B. Note that now on machine
B the cache inside the file .git/index contains invalid stat
information. So when "git status" is run on B, .git/index gets
rewritten, and the next sync operation copies it back to A, where again
it is rewritten even by something as harmless as "git status". And so
on, and so forth...
In my opinion the root of this ping-pong problem is that .git/index
mixes information about the status of the repository (i.e. what has been
staged), which should be synced, with a cache of machine-specific
filesystem metadata, which should not.
I am not an expert on git internals, but perhaps it would be a good idea
to move the cache into a separate file that could be put on an "ignore"
list for synchronizers? It seems to me that this has already been
proposed in a different context [2], and I would not be surprised if
factoring out the cache had other beneficial effects.
If it is not feasible to separate the cache, perhaps another possibility
would be to add a new possible value for core.checkStat that would
disable stat structure checking except for file sizes?
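As a rough sketch of what such a value could mean, the following Python model compares cached and current stat data under three modes. "default" and "minimal" loosely follow the description of core.checkStat and core.trustCtime in the git-config documentation. "size-only" is the hypothetical new value suggested above and does not exist in git. The StatData class and the looks_unchanged function are invented for this example and are not git's implementation.

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class StatData:
    mtime_ns: int
    ctime_ns: int
    ino: int
    uid: int
    gid: int
    size: int


def looks_unchanged(cached, current, check_stat="default", trust_ctime=True):
    """Simplified model of deciding whether a cached stat entry still matches."""
    if check_stat not in ("default", "minimal", "size-only"):
        raise ValueError(f"unknown mode: {check_stat}")
    if cached.size != current.size:
        return False
    if check_stat == "size-only":       # hypothetical value, not in git
        return True
    # "minimal": whole-second mtime (and ctime, if trusted) plus size.
    if cached.mtime_ns // NS_PER_SEC != current.mtime_ns // NS_PER_SEC:
        return False
    if trust_ctime and cached.ctime_ns // NS_PER_SEC != current.ctime_ns // NS_PER_SEC:
        return False
    if check_stat == "minimal":
        return True
    # "default": also sub-second timestamps, owner, and inode number.
    if cached.mtime_ns != current.mtime_ns:
        return False
    if trust_ctime and cached.ctime_ns != current.ctime_ns:
        return False
    return (cached.ino, cached.uid, cached.gid) == (current.ino, current.uid, current.gid)


# Made-up values: same content and synced mtime, but ctime and inode differ.
on_a = StatData(mtime_ns=1_579_305_456_000_000_000,
                ctime_ns=1_579_305_456_000_000_000,
                ino=101, uid=1000, gid=1000, size=42)
on_b = StatData(mtime_ns=1_579_305_456_000_000_000,
                ctime_ns=1_579_309_056_000_000_000,
                ino=202, uid=1000, gid=1000, size=42)

print(looks_unchanged(on_a, on_b, "default"))                     # False
print(looks_unchanged(on_a, on_b, "minimal"))                     # False
print(looks_unchanged(on_a, on_b, "minimal", trust_ctime=False))  # True
print(looks_unchanged(on_a, on_b, "size-only"))                   # True
```

In this model the existing "minimal" mode accepts the synced entry only when ctime is not trusted and the synchronizer has preserved the mtime. A size-only mode would not depend on either condition.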
As a workaround for now, I exclude .git/index from syncing. This seems
to work quite well, but I would be scared to sync unfinished merges like
this.
Thanks
Christoph
[1] https://stackoverflow.com/questions/12126247/why-does-git-index-change-when-i-havent-done-anything-to-my-repository
[2] https://www.mail-archive.com/git@vger.kernel.org/msg48065.html