From: Thomas Rast <trast@student.ethz.ch>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
Thomas Gummerer <italyhockeyfeed@gmail.com>, <david@lang.hm>,
Michael Haggerty <mhagger@alum.mit.edu>,
Junio C Hamano <gitster@pobox.com>,
Thomas Rast <trast@student.ethz.ch>, <git@vger.kernel.org>
Subject: Re: [GSoC] Designing a faster index format
Date: Fri, 6 Apr 2012 17:24:26 +0200
Message-ID: <878vi8sx1x.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CACsJy8DaBxCtU7UQcc510J71zk95DMMsWdr9S3eYTupdRLjWBg@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Fri, 6 Apr 2012 22:11:49 +0700")
Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> On Fri, Apr 6, 2012 at 10:22 AM, Shawn Pearce <spearce@spearce.org> wrote:
>> On Thu, Apr 5, 2012 at 14:49, Thomas Rast <trast@inf.ethz.ch> wrote:
>>> This is quite complete already, which I think is great, but it's still
>>> missing one "obvious" approach: a directory-tree based layout that uses
>>> "flat" storage. That is, the entries grouped by directory and thus
>>> arranged into the "natural" tree, so as to allow parsing only part of
>>> it. But not pulling any tricks to make it easy to change; a nontrivial
>>> change would mean a rewrite. How good do you think that could be?
>>
>> I have been wondering this myself. Aren't most updates to the index
>> just updating the stat information of an existing entry?
>>
>> If so we could structure the index as flat lists for each directory
>> similar to a canonical tree, but with a wider field to hold not just
>> the SHA-1 but also the stat information of each file. If the entry is
>> just the component name ("foo.c" and not "src/foo.c") and the SHA-1
>> and stat data, you can probably protect the entire entry with a CRC-32
>> for each entry. Updates can be performed in place by taking the write
>> lock with index.lock as an empty file, then overwriting the SHA-1 and
>> stat field, followed by updating the CRC-32. Readers that see the
>> wrong CRC-32 for an entry can sleep for a short period, retry the
>> read, and fail after some number of attempts if they cannot get a
>> valid read of that entry.
>
> But does that mean the reader can have some old entries (because the
> writer has not updated them yet by the time of reading) mixed with new
> ones, i.e. inconsistent data?
That can be fixed with another CRC or hash covering the CRCs for the
entries, so that a reader can notice a partial update.
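
A minimal sketch of that idea, not git's actual index format: each entry
carries its own CRC-32, and one extra CRC-32 covers the list of entry CRCs.
The dict layout and the names entry_crc, summary_crc, make_entry, check and
read_with_retry are hypothetical, and the sketch assumes the writer rewrites
the summary CRC last.

```python
import struct
import time
import zlib


def entry_crc(name, sha1, stat):
    """CRC-32 over one entry's name, SHA-1, and stat bytes."""
    return zlib.crc32(name + sha1 + stat)


def summary_crc(crcs):
    """CRC-32 over the per-entry CRCs, packed as big-endian 32-bit ints."""
    return zlib.crc32(b"".join(struct.pack(">I", c) for c in crcs))


def make_entry(name, sha1, stat):
    """Build an entry (hypothetical layout) together with its own CRC."""
    return {"name": name, "sha1": sha1, "stat": stat,
            "crc": entry_crc(name, sha1, stat)}


def check(entries, stored_summary):
    """Classify a snapshot as 'ok', 'torn-entry', or 'partial-update'."""
    for e in entries:
        if entry_crc(e["name"], e["sha1"], e["stat"]) != e["crc"]:
            return "torn-entry"      # this entry was caught mid-overwrite
    if summary_crc(e["crc"] for e in entries) != stored_summary:
        return "partial-update"      # every entry is valid, but the set is mixed
    return "ok"


def read_with_retry(load, attempts=5, delay=0.01):
    """Call load() until check() passes; give up after `attempts` tries."""
    for _ in range(attempts):
        entries, stored_summary = load()
        if check(entries, stored_summary) == "ok":
            return entries
        time.sleep(delay)
    raise RuntimeError("no consistent read of the index")


# Snapshot before the update, with its summary CRC.
old = [make_entry(b"foo.c", b"\x11" * 20, b"stat-a"),
       make_entry(b"bar.c", b"\x22" * 20, b"stat-b")]
old_summary = summary_crc(e["crc"] for e in old)

# The writer has refreshed foo.c's stat data (and that entry's CRC),
# but has not rewritten the summary CRC yet.
mixed = [make_entry(b"foo.c", b"\x11" * 20, b"stat-c"), old[1]]

print(check(old, old_summary))
print(check(mixed, old_summary))
```

The per-entry CRC only catches an entry read in the middle of an overwrite.
The summary CRC is what lets a reader tell a mix of old and new entries from
a snapshot that belongs to a single write.
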
But even so: do we make any promises that (say) git add is atomic, in the
sense that a reader always gets either the before-update results or the
after-update results? Non-builtins (e.g. git add -p) may make small
incremental updates to the index, so they wouldn't be atomic anyway.
--
Thomas Rast
trast@{inf,student}.ethz.ch