From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: Thomas Rast <trast@inf.ethz.ch>,
	Thomas Gummerer <italyhockeyfeed@gmail.com>,
	david@lang.hm, Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Thomas Rast <trast@student.ethz.ch>,
	git@vger.kernel.org
Subject: Re: [GSoC] Designing a faster index format
Date: Fri, 6 Apr 2012 22:11:49 +0700
Message-ID: <CACsJy8DaBxCtU7UQcc510J71zk95DMMsWdr9S3eYTupdRLjWBg@mail.gmail.com>
In-Reply-To: <CAJo=hJssfTvGqzQtaAj+Dk_R2oU5BwY=sQQuH3=SFTf+Zcp=3A@mail.gmail.com>

On Fri, Apr 6, 2012 at 10:22 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Thu, Apr 5, 2012 at 14:49, Thomas Rast <trast@inf.ethz.ch> wrote:
>> This is quite complete already, which I think is great, but it's still
>> missing one "obvious" approach: a directory-tree based layout that uses
>> "flat" storage.  That is, the entries grouped by directory and thus
>> arranged into the "natural" tree, so as to allow parsing only part of
>> it.  But not pulling any tricks to make it easy to change; a nontrivial
>> change would mean a rewrite.  How good do you think that could be?
>
> I have been wondering this myself. Aren't most updates to the index
> just updating the stat information of an existing entry?
>
> If so we could structure the index as flat lists for each directory
> similar to a canonical tree, but with a wider field to hold not just
> the SHA-1 but also the stat information of each file. If the entry is
> just the component name ("foo.c" and not "src/foo.c") and the SHA-1
> and stat data, you can probably protect the entire entry with a CRC-32
> for each entry. Updates can be performed in place by taking the write
> lock with index.lock as an empty file, then overwriting the SHA-1 and
> stat fields, followed by updating the CRC-32. Readers that see the
> wrong CRC-32 for an entry can sleep for a short period, retry the
> read, and fail after some number of attempts if they cannot get a
> valid read of that entry.

But does that mean the reader can have some old entries (because the
writer has not updated them yet by the time of reading) mixed with new
ones, i.e. inconsistent data?
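
To make the concern concrete, here is a minimal sketch of per-entry
CRC-32 protection with in-place updates. The entry layout (20-byte
SHA-1, 8-byte mtime, 4-byte CRC-32) and all function names are
assumptions for illustration only, not a proposed format. It shows a
reader whose entries all pass their CRC check while the set as a whole
mixes old and new data:

```python
import struct
import time
import zlib

# Hypothetical fixed-width entry: 20-byte SHA-1, 8-byte mtime, then a CRC-32.
BODY = struct.Struct(">20sQ")
CRC = struct.Struct(">I")
ENTRY_SIZE = BODY.size + CRC.size


def pack_entry(sha1, mtime):
    body = BODY.pack(sha1, mtime)
    return body + CRC.pack(zlib.crc32(body))


def write_entry(buf, index, sha1, mtime):
    """Overwrite one entry in place (the writer is assumed to hold index.lock)."""
    off = index * ENTRY_SIZE
    buf[off:off + ENTRY_SIZE] = pack_entry(sha1, mtime)


def read_entry(buf, index):
    """Return (sha1, mtime), or None if the CRC-32 does not match."""
    off = index * ENTRY_SIZE
    body = bytes(buf[off:off + BODY.size])
    (crc,) = CRC.unpack_from(buf, off + BODY.size)
    if zlib.crc32(body) != crc:
        return None
    return BODY.unpack(body)


def read_entry_retry(buf, index, attempts=3, delay=0.01):
    """Sleep and retry on a bad CRC, failing after a few attempts."""
    for _ in range(attempts):
        entry = read_entry(buf, index)
        if entry is not None:
            return entry
        time.sleep(delay)
    raise IOError("no valid read of entry %d" % index)


OLD, NEW = b"\x01" * 20, b"\x02" * 20
index_file = bytearray(pack_entry(OLD, 100) * 2)

# The writer has updated entry 0 but not yet entry 1 when the reader runs.
write_entry(index_file, 0, NEW, 200)
snapshot = [read_entry_retry(index_file, i) for i in range(2)]
write_entry(index_file, 1, NEW, 200)

# Every entry in the snapshot passed its own CRC check, yet the snapshot
# pairs the new entry 0 with the old entry 1.
print(snapshot)
```

The per-entry CRC only catches a torn write inside a single entry; it
says nothing about whether two entries belong to the same update.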


> Adding a new file or deleting an existing file from the index would
> require a full rewrite.
>
> Within a single tree/directory entry, it probably doesn't have to be
> binary searchable. Canonical trees aren't. Linear scans through a
> directory are OK, so long as the scans are broken up by the directory
> tree structure just like they are in canonical trees.
>
> Dealing with the conflict stages during merges (1, 2, 3) could be
> handled by appending the conflict data at the end of the index file;
> when conflicts are resolved, this tail region could be truncated back
> to the real end of the file. A bit could be set on the normal entry in
> the trees to denote there is a conflict, and additional stage data is
> expected in the tail region of the file.
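
For reference, the layout described above can be modelled roughly as
follows. This is only an in-memory sketch under my own assumptions: the
class and function names are made up, SHA-1s are shortened to
placeholder strings, and the tail region is a plain list of
(path, stage, sha1) tuples rather than bytes at the end of a file.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str               # component name only, e.g. "foo.c"
    sha1: str
    conflict: bool = False  # set when stage data is expected in the tail


@dataclass
class Directory:
    name: str
    files: list = field(default_factory=list)
    subdirs: list = field(default_factory=list)


def scan(items, name):
    """Linear scan within one directory; no binary search is assumed."""
    for item in items:
        if item.name == name:
            return item
    return None


def lookup(root, path):
    """Walk the directory tree, scanning only the directories on the path."""
    *dirs, leaf = path.split("/")
    node = root
    for d in dirs:
        node = scan(node.subdirs, d)
        if node is None:
            return None
    return scan(node.files, leaf)


def stages_for(root, tail, path):
    """Consult the tail region only when the entry's conflict bit is set."""
    entry = lookup(root, path)
    if entry is None or not entry.conflict:
        return []
    return [(stage, sha1) for p, stage, sha1 in tail if p == path]


root = Directory("", files=[Entry("README", "aaa")], subdirs=[
    Directory("src", files=[Entry("foo.c", "bbb", conflict=True)]),
])
tail = [("src/foo.c", 1, "base"), ("src/foo.c", 2, "ours"),
        ("src/foo.c", 3, "theirs")]

print(lookup(root, "src/foo.c"))
print(stages_for(root, tail, "src/foo.c"))
```

Looking up "src/foo.c" scans the root's subdirectories and then the
files of "src" only; entries in unrelated directories are never touched.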



-- 
Duy
