git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Jeff King <peff@peff.net>,
	git@vger.kernel.org
Subject: Re: RFC: Flat directory for notes, or fan-out?  Both!
Date: Tue, 10 Feb 2009 11:09:09 -0800	[thread overview]
Message-ID: <20090210190909.GS30949@spearce.org> (raw)
In-Reply-To: <7vocxam96s.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> wrote:
> 
> I wonder if we can solve this by introducing a local cache that is a flat
> file that looks like:
> 
>     magic number for /usr/bin/file

Don't forget a version number.  Waste 4 bytes now and its easier
to change the format in the future if we need to.

>     tree object SHA-1 the file caches
>     Number of entries in this file
>     256 fan-out offsets into this file
>     N entries of <SHA-1, SHA-1>, sorted
>     Checksum of the file itself
> 
> and use it when availble (otherwise optionally create it upon the first
> lookup).  The file can be used by mmaping it and then doing a newton
> raphson or binary search similar to the way patch-ids.c does.

Yup.  Sort of my thoughts when I was thinking about that external
index for a "git database".

I was considering a much more complex file layout though; one that
would permit editing without completely recopying the file every
time something changes.

More or less a traditional block oriented on-disk M-tree, with
copy-on-write semantics for the blocks.  This would permit us to
quickly append onto the end of the file with new updates, and then
periodically copy and flatten out the the file as necessary to
reclaim the prior dead space.

E.g.:

  magic number
  version
  [intermediate blocks ...]
  [leaf blocks...]
  root block

Writers would append modified leaf and intermediate blocks as
necessary to the end of the file, then append a new root block.

Readers would read the file tail and verify it is a root, then scan
with a traditional M-tree search algorithm.

If the root block has a "magic block header" and a strong checksum
at the tail of the block, readers can concurrently read while a
writer is appending.  Any invalid root block just means the reader
is seeing the middle of a write, or an aborted write, and should
scan backwards to locate the prior valid root.

If the root block also has a commit SHA-1 indicating which commit
that root become valid under, a reader can decide if that root
might give it answers which aren't correct for the current value of
the notes history it is reading, and scan backwards for some older
root block.  We could accelerate that by including the file offset
of the prior root block in each new root.

GC compacting the file is just a matter of write-locking the file
to block out a new writer, then traversing the current root and
copying all blocks that are reachable.

</end-hand-waving>

> I am hoping that I could eventually rewrite rerere to use something like
> this, so that rerere database can be shared, just like the way notes can
> be shared, across repositories.

Ooh, great idea.  If we could toss rerere data into something that
can be transported around, and efficiently accessed.  I like it.

-- 
Shawn.

  reply	other threads:[~2009-02-10 19:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-09 21:12 RFC: Flat directory for notes, or fan-out? Both! Johannes Schindelin
2009-02-10  7:58 ` Boyd Stephen Smith Jr.
2009-02-10 13:16   ` Jeff King
2009-02-11  1:58     ` Boyd Stephen Smith Jr.
2009-02-11  2:35       ` Linus Torvalds
2009-02-11  3:30         ` Sam Vilain
2009-02-11  3:54           ` Linus Torvalds
2009-02-11  5:05             ` Sam Vilain
2009-02-11 12:35               ` Johannes Schindelin
2009-02-10 12:18 ` Jeff King
2009-02-10 12:59   ` Johannes Schindelin
2009-02-10 13:10     ` Jeff King
2009-02-10 13:32       ` Johannes Schindelin
2009-02-10 15:58         ` Junio C Hamano
2009-02-10 16:48           ` Shawn O. Pearce
2009-02-10 16:48           ` Johannes Schindelin
2009-02-10 16:56             ` Shawn O. Pearce
2009-02-10 17:31               ` Johannes Schindelin
2009-02-10 18:35               ` Junio C Hamano
2009-02-10 19:09                 ` Shawn O. Pearce [this message]
2009-02-10 21:10                 ` Johannes Schindelin
2009-02-10 22:16                   ` Thomas Rast
2009-02-10 22:26                     ` Thomas Rast
2009-02-10 22:32                     ` Junio C Hamano
2009-02-11 20:02                   ` Jeff King
2009-02-11 20:57                     ` Johannes Schindelin
2009-02-11 21:16                       ` Junio C Hamano
2009-02-11 23:05                         ` Johannes Schindelin
2009-02-10 16:44         ` Shawn O. Pearce
2009-02-10 17:09           ` Johannes Schindelin
2009-02-10 17:17             ` Shawn O. Pearce
2009-02-11  3:19           ` Sam Vilain
2009-02-11  1:14 ` Sam Vilain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090210190909.GS30949@spearce.org \
    --to=spearce@spearce.org \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).