git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Jeff King <peff@peff.net>,
	git@vger.kernel.org
Subject: Re: RFC: Flat directory for notes, or fan-out?  Both!
Date: Tue, 10 Feb 2009 10:35:39 -0800
Message-ID: <7vocxam96s.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <20090210165610.GP30949@spearce.org> (Shawn O. Pearce's message of "Tue, 10 Feb 2009 08:56:10 -0800")

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> On Tue, 10 Feb 2009, Junio C Hamano wrote:
>> > 
>> > I could do a revert on 'master' if it is really needed, but I found that
>> > the above reasoning is a bit troublesome.  The thing is, if a tree to hold
>> > the notes would be huge to be unmanageable, then it would still be huge to
>> > be unmanageable if you split it into 256 pieces.
>> 
>> The thing is, a tree object of 17 megabyte is unmanagably large if you 
>> have to read it whenever you access even a single node.  Having 256 trees 
>> instead, each of which is about 68 kilobyte is much nicer.
>
> See my other email on this thread; we'd probably need to unpack
> all 256 subtrees *anyway* due to the distribution of SHA-1 names
> for commits.

I wonder if we can solve this by introducing a local cache that is a flat
file that looks like:

    magic number for /usr/bin/file
    tree object SHA-1 the file caches
    Number of entries in this file
    256 fan-out offsets into this file
    N entries of <SHA-1, SHA-1>, sorted
    Checksum of the file itself

and use it when available (otherwise optionally create it upon the first
lookup).  The file can be used by mmapping it and then doing a
Newton-Raphson or binary search, similar to the way patch-ids.c does.
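
To make the lookup concrete, here is a rough sketch of how the fan-out
table could narrow the search before a plain binary search over the
sorted entries.  It works on an in-memory table rather than an mmapped
file, and everything in it is an assumption made for illustration: the
names (cache_entry, cache_view, cache_build_fanout, cache_lookup), the
20-byte raw SHA-1 keys, and the choice to store cumulative counts in the
fan-out table (entries whose first key byte is <= b) instead of byte
offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 20	/* raw SHA-1 length */

/* One <SHA-1, SHA-1> pair (hypothetical layout). */
struct cache_entry {
	unsigned char key[KEY_LEN];
	unsigned char val[KEY_LEN];
};

/*
 * In-memory view of the entry table.  Assumption: fanout[b] is the
 * number of entries whose first key byte is <= b, so the bucket for
 * byte b spans [b ? fanout[b - 1] : 0, fanout[b]).
 */
struct cache_view {
	uint32_t fanout[256];
	const struct cache_entry *entries;	/* sorted by key */
};

/* Build the fan-out table from entries that are already sorted. */
void cache_build_fanout(struct cache_view *v,
			const struct cache_entry *entries, uint32_t nr)
{
	uint32_t i = 0;
	int b;

	v->entries = entries;
	for (b = 0; b < 256; b++) {
		while (i < nr && entries[i].key[0] <= b)
			i++;
		v->fanout[b] = i;
	}
}

/* Return 0 and fill val if key is found, -1 otherwise. */
int cache_lookup(const struct cache_view *v,
		 const unsigned char *key, unsigned char *val)
{
	/* The first key byte selects the bucket to search. */
	uint32_t lo = key[0] ? v->fanout[key[0] - 1] : 0;
	uint32_t hi = v->fanout[key[0]];

	/* Plain binary search inside the bucket. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(key, v->entries[mid].key, KEY_LEN);

		if (!cmp) {
			memcpy(val, v->entries[mid].val, KEY_LEN);
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With an mmapped file, entries would simply point past the header and
the fan-out table; a Newton-Raphson style probe could replace the
midpoint computation without changing the rest.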

The top-level API for such a hash-map would perhaps look like:

    /*
     * take the object name of a tree object that is a hash map,
     * return an opaque struct.
     */
    struct hashmap *hashmap_open(const unsigned char *);

    /*
     * find the value given the key and return 0, or return negative
     * if not found.
     */
    int hashmap_lookup(struct hashmap *map, const unsigned char *key,
    		       unsigned char *val);

    /* discard the thing */
    void hashmap_close(struct hashmap *map);

We should be able to use these in "git log" and friends where Dscho added
the hook in his git-notes topic.
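
For illustration only, a caller could exercise the three functions as
sketched below.  The struct body, the fixed-size in-memory table, and
hashmap_stub_add() are hypothetical stand-ins so that the example
compiles on its own; a real implementation would read the tree object
(or the cache file described earlier) instead.

```c
#include <stdlib.h>
#include <string.h>

#define SHA1_LEN 20
#define STUB_MAX 8	/* toy capacity, only for this sketch */

/* Toy stand-in for the opaque struct; not how git would store it. */
struct hashmap {
	unsigned char tree[SHA1_LEN];
	size_t nr;
	unsigned char keys[STUB_MAX][SHA1_LEN];
	unsigned char vals[STUB_MAX][SHA1_LEN];
};

/* Remember which tree the map represents; the stub starts out empty. */
struct hashmap *hashmap_open(const unsigned char *tree_sha1)
{
	struct hashmap *map = calloc(1, sizeof(*map));

	if (map)
		memcpy(map->tree, tree_sha1, SHA1_LEN);
	return map;
}

/* Hypothetical helper that fills the stub; not part of the proposal. */
int hashmap_stub_add(struct hashmap *map, const unsigned char *key,
		     const unsigned char *val)
{
	if (map->nr >= STUB_MAX)
		return -1;
	memcpy(map->keys[map->nr], key, SHA1_LEN);
	memcpy(map->vals[map->nr], val, SHA1_LEN);
	map->nr++;
	return 0;
}

/* Return 0 and fill val when key is present, negative otherwise. */
int hashmap_lookup(struct hashmap *map, const unsigned char *key,
		   unsigned char *val)
{
	size_t i;

	for (i = 0; i < map->nr; i++) {
		if (!memcmp(map->keys[i], key, SHA1_LEN)) {
			memcpy(val, map->vals[i], SHA1_LEN);
			return 0;
		}
	}
	return -1;
}

/* Discard the map. */
void hashmap_close(struct hashmap *map)
{
	free(map);
}
```

A log-like caller would open the map once, call hashmap_lookup() per
commit, treat a negative return as "no note", and close the map at the
end.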

I am hoping that I could eventually rewrite rerere to use something like
this, so that rerere database can be shared, just like the way notes can
be shared, across repositories.
