From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: Johan Herland <johan@herland.net>,
Sverre Rabbelier <srabbelier@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Andreas Ericsson <ae@op5.se>,
NAKAMURA Takumi <geek4civic@gmail.com>,
A Large Angry SCM <gitzilla@gmail.com>
Subject: Re: Git is not scalable with too many refs/*
Date: Tue, 14 Jun 2011 15:47:49 -0400 [thread overview]
Message-ID: <20110614194749.GA1567@sigill.intra.peff.net> (raw)
In-Reply-To: <BANLkTin0CjnM_hMaEpMroZdDhhavaoKAv00_4xBqeHj9biToVA@mail.gmail.com>
On Tue, Jun 14, 2011 at 12:20:29PM -0700, Shawn O. Pearce wrote:
> > We would want to store the cache in an on-disk format that could be
> > searched easily. Possibly something like the packed-refs format would be
> > sufficient, if we mmap'd and binary searched it. It would be dirt simple
> > if we used an existing key/value store like gdbm or tokyocabinet, but we
> > usually try to avoid extra dependencies.
>
> Yea, not a bad idea. Use a series of SSTable like things, like Hadoop
> uses. It doesn't need to be as complex as the Hadoop SSTable concept.
> But a simple sorted string to string mapping file that is immutable,
> with edits applied by creating an overlay file that contains
> new/updated entries.
>
> As you point out, we can use the notes tree to tell us the validity of
> the cache, and do incremental updates. If the current cache doesn't
> match the notes ref, compute the tree diff between the current cache's
> source tree and the new tree, and create a new SSTable like thing that
> has the relevant updates as an overlay of the existing tables. After
> some time you will have many of these little overlay files, and a GC
> can just merge them down to a single file.
I was really hoping that it would be fast enough that we could simply
blow away the old mapping and recreate it from scratch. That gets us out
of writing any journaling-type code with overlays. For something like
svn revisions, it's probably fine to take an extra second or two to
build the cache after we do a fetch. But it wouldn't scale to something
that was getting updated frequently.
If we're going to start doing clever database-y things, I'd much rather
use a proven key/value db solution like tokyocabinet. I'm just not sure
how to degrade gracefully when the db library isn't available. Don't
allow reverse mappings? Fallback to something slow?
> The only problem is, you probably want this "reverse notes index" to
> be indexing a portion of the note blob text, not all of it. That is,
> we want the SVN note text to say something like "SVN Revision: r1828"
> so `git log --notes=svn` shows us something more useful than just
> "r1828". But in the reverse index, we may only want the key to be
> "r1828". So you need some sort of small mapping function to decide
> what to put into that reverse index.
I had assumed that we would just be writing r1828 into the note. The
output via git log is actually pretty readable:
$ git notes --ref=svn/revisions add -m r1828
$ git show --notes=svn/revisions
...
Notes (svn/revisions):
r1828
Of course this is just one use case.
For that matter, we have to figure out how one would actually reference
the reverse mapping. If we have a simple, pure-reverse mapping, we can
just generate and cache them on the fly, and give a special syntax.
Like:
$ git log notes/svn/revisions@{revnote:r1828}
which would invert the notes/svn/revisions tree, search for r1828, and
reference the resulting commit.
If you had something more heavyweight that actually needed to parse
during the mapping, you might have something like:
$ : set up the mapping
$ git config revnote.svn.map 'SVN Revision: (r[0-9]+)'
$ : do the reverse; we should be able to build the cache on the fly
$ git notes reverse r1828
346ab9aaa1cf7b1ed2dd2c0a67bccc5b8ec23f7c
$ : so really you could have a similar ref syntax like, though
$ : this would require some ref parser updates, as we currently
$ : assume anything to the left of @{} is a real ref
$ git log r1828@{revnote:svn}
The syntaxes are not as nice as having a real ref. In the last example,
we could probably look for the contents of "@{}" as a possible revnote
mapping (since we've already had to name it via the configuration), to
make it "r1828@{svn}". Or you could even come up with a default set of
revnotes to consider, so that if we lookup "r1828" and it isn't a real
ref, we fall back to trying r1828@{revnote:svn}.
I dunno. I'm just throwing ideas out at this point.
-Peff
next prev parent reply other threads:[~2011-06-14 19:47 UTC|newest]
Thread overview: 126+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-09 3:44 Git is not scalable with too many refs/* NAKAMURA Takumi
2011-06-09 6:50 ` Sverre Rabbelier
2011-06-09 15:23 ` Shawn Pearce
2011-06-09 15:52 ` A Large Angry SCM
2011-06-09 15:56 ` Shawn Pearce
2011-06-09 16:26 ` Jeff King
2011-06-10 3:59 ` NAKAMURA Takumi
2011-06-13 22:27 ` Jeff King
2011-06-14 0:17 ` Andreas Ericsson
2011-06-14 0:30 ` Jeff King
2011-06-14 4:41 ` Junio C Hamano
2011-06-14 7:26 ` Sverre Rabbelier
2011-06-14 10:02 ` Johan Herland
2011-06-14 10:34 ` Sverre Rabbelier
2011-06-14 17:02 ` Jeff King
2011-06-14 19:20 ` Shawn Pearce
2011-06-14 19:47 ` Jeff King [this message]
2011-06-14 20:12 ` Shawn Pearce
2011-09-08 19:53 ` Martin Fick
2011-09-09 0:52 ` Martin Fick
2011-09-09 1:05 ` Thomas Rast
2011-09-09 1:13 ` Thomas Rast
2011-09-09 15:59 ` Jens Lehmann
2011-09-25 20:43 ` Martin Fick
2011-09-26 12:41 ` Christian Couder
2011-09-26 17:47 ` Martin Fick
2011-09-26 18:56 ` Christian Couder
2011-09-30 16:41 ` Martin Fick
2011-09-30 19:26 ` Martin Fick
2011-09-30 21:02 ` Martin Fick
2011-09-30 22:06 ` Martin Fick
2011-10-01 20:41 ` Junio C Hamano
2011-10-02 5:19 ` Michael Haggerty
2011-10-03 0:46 ` Martin Fick
2011-10-04 8:08 ` Michael Haggerty
2011-10-03 18:12 ` Martin Fick
2011-10-03 19:42 ` Junio C Hamano
2011-10-04 8:16 ` Michael Haggerty
2011-10-08 20:59 ` Martin Fick
2011-10-09 5:43 ` Michael Haggerty
2011-09-28 19:38 ` Martin Fick
2011-09-28 22:10 ` Martin Fick
2011-09-29 0:54 ` Julian Phillips
2011-09-29 1:37 ` Martin Fick
2011-09-29 2:19 ` Julian Phillips
2011-09-29 16:38 ` Martin Fick
2011-09-29 18:26 ` Julian Phillips
2011-09-29 18:27 ` René Scharfe
2011-09-29 19:10 ` Junio C Hamano
2011-09-29 4:18 ` [PATCH] refs: Use binary search to lookup refs faster Julian Phillips
2011-09-29 21:57 ` Junio C Hamano
2011-09-29 22:04 ` [PATCH v2] " Julian Phillips
2011-09-29 22:06 ` [PATCH] " Junio C Hamano
2011-09-29 22:11 ` [PATCH v3] " Julian Phillips
2011-09-29 23:48 ` Junio C Hamano
2011-09-30 15:30 ` Michael Haggerty
2011-09-30 16:38 ` Junio C Hamano
2011-09-30 17:56 ` [PATCH] refs: Remove duplicates after sorting with qsort Julian Phillips
2011-10-02 5:15 ` [PATCH v3] refs: Use binary search to lookup refs faster Michael Haggerty
2011-10-02 5:45 ` Junio C Hamano
2011-10-04 20:58 ` Junio C Hamano
2011-09-30 1:13 ` Martin Fick
2011-09-30 3:44 ` Junio C Hamano
2011-09-30 8:04 ` Julian Phillips
2011-09-30 15:45 ` Martin Fick
2011-09-29 20:44 ` Git is not scalable with too many refs/* Martin Fick
2011-09-29 19:10 ` Julian Phillips
2011-09-29 20:11 ` Martin Fick
2011-09-30 9:12 ` René Scharfe
2011-09-30 16:09 ` Martin Fick
2011-09-30 16:52 ` Junio C Hamano
2011-09-30 18:17 ` René Scharfe
2011-10-01 15:28 ` René Scharfe
2011-10-01 15:38 ` [PATCH 1/8] checkout: check for "Previous HEAD" notice in t2020 René Scharfe
2011-10-01 19:02 ` Sverre Rabbelier
2011-10-01 15:43 ` [PATCH 2/8] revision: factor out add_pending_sha1 René Scharfe
2011-10-01 15:51 ` [PATCH 3/8] checkout: use add_pending_{object,sha1} in orphan check René Scharfe
2011-10-01 15:56 ` [PATCH 4/8] revision: add leak_pending flag René Scharfe
2011-10-01 16:01 ` [PATCH 5/8] bisect: use " René Scharfe
2011-10-01 16:02 ` [PATCH 6/8] bundle: " René Scharfe
2011-10-01 16:09 ` [PATCH 7/8] checkout: " René Scharfe
2011-10-01 16:16 ` [PATCH 8/8] commit: factor out clear_commit_marks_for_object_array René Scharfe
2011-09-26 15:15 ` Git is not scalable with too many refs/* Martin Fick
2011-09-26 15:21 ` Sverre Rabbelier
2011-09-26 15:48 ` Martin Fick
2011-09-26 15:56 ` Sverre Rabbelier
2011-09-26 16:38 ` Martin Fick
2011-09-26 16:49 ` Julian Phillips
2011-09-26 18:07 ` Martin Fick
2011-09-26 18:37 ` Julian Phillips
2011-09-26 20:01 ` Martin Fick
2011-09-26 20:07 ` Junio C Hamano
2011-09-26 20:28 ` Julian Phillips
2011-09-26 21:39 ` Martin Fick
2011-09-26 21:52 ` Martin Fick
2011-09-26 23:26 ` Julian Phillips
2011-09-26 23:37 ` David Michael Barr
2011-09-27 1:01 ` [PATCH] refs.c: Fix slowness with numerous loose refs David Barr
2011-09-27 2:04 ` David Michael Barr
2011-09-26 23:38 ` Git is not scalable with too many refs/* Junio C Hamano
2011-09-27 0:00 ` [PATCH] Don't sort ref_list too early Julian Phillips
2011-10-02 4:58 ` Michael Haggerty
2011-09-27 0:12 ` Git is not scalable with too many refs/* Martin Fick
2011-09-27 0:22 ` Julian Phillips
2011-09-27 2:34 ` Martin Fick
2011-09-27 7:59 ` Julian Phillips
2011-09-27 8:20 ` Sverre Rabbelier
2011-09-27 9:01 ` Julian Phillips
2011-09-27 10:01 ` Sverre Rabbelier
2011-09-27 10:25 ` Nguyen Thai Ngoc Duy
2011-09-27 11:07 ` Michael Haggerty
2011-09-27 12:10 ` Julian Phillips
2011-09-26 22:30 ` Julian Phillips
2011-09-26 15:32 ` Michael Haggerty
2011-09-26 15:42 ` Martin Fick
2011-09-26 16:25 ` Thomas Rast
2011-09-09 13:50 ` Michael Haggerty
2011-09-09 15:51 ` Michael Haggerty
2011-09-09 16:03 ` Jens Lehmann
2011-06-10 7:41 ` Andreas Ericsson
2011-06-10 19:41 ` Shawn Pearce
2011-06-10 20:12 ` Jakub Narebski
2011-06-10 20:35 ` Jeff King
2011-06-13 7:08 ` Andreas Ericsson
2011-06-09 11:18 ` Jakub Narebski
2011-06-09 15:42 ` Stephen Bash
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110614194749.GA1567@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=ae@op5.se \
--cc=geek4civic@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=gitzilla@gmail.com \
--cc=johan@herland.net \
--cc=spearce@spearce.org \
--cc=srabbelier@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).