From: Thomas Rast <trast@student.ethz.ch>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: <git@vger.kernel.org>, <trast@student.ethz.ch>,
<gitster@pobox.com>, <mhagger@alum.mit.edu>, <pclouds@gmail.com>
Subject: Handling racy entries in the v5 format [Re: [GSoC] Designing a faster index format - Progress report week 7]
Date: Wed, 6 Jun 2012 11:45:44 +0200
Message-ID: <87aa0gbwon.fsf@thomas.inf.ethz.ch>
In-Reply-To: <20120604200746.GK6449@tgummerer> (Thomas Gummerer's message of "Mon, 4 Jun 2012 22:07:46 +0200")
Hi,
Michael, Thomas and I just had a lengthy discussion on IRC about racy
entries. I'll use "simultaneously" from the perspective of the
filesystem's mtimes; depending on your USE_NSEC, that may mean in the
same second, or the same nanosecond.
Background: Racy Entries
------------------------
There are two cases of racy index entries:
  (A) echo foo >foo
      git add foo
      echo bar >foo
If the latter two commands happen simultaneously, lstat() will match the
index entry. Git handles this by checking foo.mtime >= index.mtime, and
if so, doing a content check. Such entries are called racy.
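As a rough sketch of the check in case (A), written in Python for brevity: the names `Entry`, `is_racy` and `needs_content_check` are made up for illustration and are not git's actual functions, and the entry is reduced to the two stat fields that matter here.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical, simplified index entry (not git's real struct)."""
    mtime: int  # file mtime recorded at `git add` time
    size: int   # file size recorded at `git add` time


def is_racy(entry: Entry, index_mtime: int) -> bool:
    # The file was last touched "simultaneously" with (or after) the index
    # write, so matching stat data does not prove the content is unchanged.
    return entry.mtime >= index_mtime


def needs_content_check(entry: Entry, st_mtime: int, st_size: int,
                        index_mtime: int) -> bool:
    # If the stat data differs, the file is already known to be modified.
    stat_matches = (st_mtime == entry.mtime and st_size == entry.size)
    # Only a matching-but-racy entry needs the expensive content comparison.
    return stat_matches and is_racy(entry, index_mtime)
```

In case (A), "foo\n" and "bar\n" have the same size and the same mtime, so the stat data matches and only the raciness test reveals that a content check is needed.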
  (B) echo foo >foo
      git add foo       # (i)
      echo bar >foo
      sleep 2
      : >dummy
      git add dummy     # (ii)
If the commands before the sleep happen simultaneously, then foo.mtime
has not changed since (i), but due to (ii) index.mtime has, defeating
the raciness check. To handle this, git checks for racy entries
*w.r.t. the old index* immediately before it writes a new index. For
all[1] such entries it does a content check. All racy entries found to
be modified get ce_size=0, which tells the next git that "we know they
are modified". We call them "smudged".
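The pre-write pass described above could be sketched roughly as follows. This is an illustration under assumptions, not git's code: `Entry`, `smudge_racy_entries` and the `is_modified` callback (standing in for the content check) are hypothetical names, and the st_size==0 corner case from footnote [1] is ignored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entry:
    """Hypothetical, simplified index entry (not git's real struct)."""
    path: str
    mtime: int
    size: int


def smudge_racy_entries(entries: List[Entry], old_index_mtime: int,
                        is_modified: Callable[[Entry], bool]) -> int:
    """Smudge modified racy entries in place; return how many were smudged."""
    smudged = 0
    for entry in entries:  # note: this walks *every* entry in the index
        # Racy w.r.t. the OLD index: the new index mtime would hide this.
        if entry.mtime >= old_index_mtime:
            if is_modified(entry):  # content check (assumed callback)
                entry.size = 0      # size 0 marks "known to be modified"
                smudged += 1
    return smudged
```

Applied to case (B), foo is racy with respect to the index written at (i), the content check finds it modified, and it is written at (ii) with size 0.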
The Problem
-----------
The use of ce_size=0 is a problem for index v5. The current drafts
exclude the size field, instead wrapping it in stat_crc along with most
of the other stat fields.
There are some obvious solutions:
* Put the size field back, costing us 4B/entry.
* Use some other marker field for the v5 format, e.g., the stat crc.
Neither of these is good, for an entirely different reason: The current
scheme checks *all* entries for being racy w.r.t. the old index, before
any write. This completely defeats the point of index v5: *avoid*
loading the entire index for small changes.
Proposed Solution
-----------------
(Michael, we have adapted this somewhat since you left IRC.)
When writing an entry: check whether ce_mtime >= index.mtime. If so,
write out ce_mtime=0.
The index.mtime here is a lower bound on the mtime of the new index,
obtained e.g. by touching the index and then stat()ing it immediately
before writing out the changed entries.
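A minimal sketch of the proposed write-side rule, again in Python with invented names (`Entry`, `entries_to_write`, `index_mtime_lower_bound`); how the lower bound is obtained and how entries are serialized is left out.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified v5 entry (not the actual on-disk layout)."""
    path: str
    mtime: int


def entries_to_write(changed: Iterable[Entry],
                     index_mtime_lower_bound: int) -> List[Entry]:
    """Return the changed entries as they would be written out."""
    out = []
    for entry in changed:  # only the entries being written are examined
        if entry.mtime >= index_mtime_lower_bound:
            # Racy w.r.t. the index being written: smudge immediately.
            entry = replace(entry, mtime=0)
        out.append(entry)
    return out
```

Unlike the old scheme, nothing here needs to look at entries that are not being written, which is what keeps small updates cheap.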
Note that this is a fundamentally different approach from the one taken
in v[2-4] indexes. In the old approach, it is the *next* writer's
responsibility to ensure that all racy entries are either truly clean,
or smudged (since they will presumably lose their raciness). In the new
approach, racy entries are immediately smudged and remain so until an
update.
Footnotes:
[1] Ignoring the case where st_size==0 at the beginning, which needs
some arguing around because st_size is also the smudge marker.
--
Thomas Rast
trast@{inf,student}.ethz.ch