From: "Shawn O. Pearce" <spearce@spearce.org>
To: Scott Chacon <schacon@gmail.com>
Cc: git list <git@vger.kernel.org>
Subject: Re: Mercurial on BigTable
Date: Thu, 11 Jun 2009 21:14:28 -0700
Message-ID: <20090612041428.GP16497@spearce.org>
In-Reply-To: <d411cc4a0906101215t313b2037k713aa1ce974c30cc@mail.gmail.com>
Scott Chacon <schacon@gmail.com> wrote:
> Has anyone watched this yet?
>
> http://code.google.com/events/io/sessions/MercurialBigTable.html
I hadn't seen that yet, thanks.
> It's kind of interesting - a Googler talks about getting Mercurial
> running on BigTable. What fascinates me is that if I'm not horribly
> mistaken, it seems like they just threw out the revlog format entirely
> and just store the data in a key-value store as sort of a Git-like
> content addressable filesystem.
Almost... but not quite. If you look at the way they store files,
they embed the file path as part of the BigTable key. This makes it
cheap to return all revisions between X and Y for any given file, as
it's just a range scan over the keys. Git doesn't normally do this.
In Hg, and in their BigTable implementation of it, if file content
is copied between two paths (the same blob in Git terms), they
actually duplicate the data, once under each path. We could do
something like that in Git... just pay the price on copy, and
then you get a storage layout like theirs, which scales
well onto a larger system. But pack size would suffer: the packs
the client receives would be bigger.
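To make the key-layout point concrete, here is a minimal sketch of a path-prefixed layout over an ordered key-value store. Everything in it is hypothetical and for illustration only: the in-memory SortedKVStore stands in for BigTable's sorted rows, and the key format file_key() (path, a NUL separator, then a zero-padded revision number) is an assumption, not the schema from the talk. Because every revision of one path sorts contiguously, "revisions X..Y of file F" becomes a single range scan. Copying a file means writing the same bytes again under the new path's keys, which is the duplication described above.

```python
import bisect


class SortedKVStore:
    """Hypothetical in-memory stand-in for an ordered key-value store
    (such as BigTable), where rows are kept sorted by key."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) for start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            k = self._keys[i]
            yield k, self._values[k]
            i += 1


def file_key(path, rev):
    # Hypothetical layout: path first, then a NUL (sorts below any path
    # character), then a zero-padded revision, so all revisions of one
    # path are adjacent and ordered by revision.
    return f"{path}\x00{rev:010d}"


def revisions_between(store, path, lo, hi):
    """All stored revisions of `path` with lo <= rev <= hi, via one range scan."""
    return [v for _, v in store.scan(file_key(path, lo), file_key(path, hi + 1))]
```

For example, after storing revisions 1, 2 and 5 of "a.c", revisions_between(store, "a.c", 2, 5) touches only the rows for "a.c" and returns the contents of revisions 2 and 5. Copying "a.c" to "b.c" would be a second put() of the same bytes under file_key("b.c", ...), so storage grows with every copy.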
> Does anyone know how they do the graph walking efficiently with this
> structure? He mentioned it was about half as fast as native Hg, but
> that seemed to be acceptable. Curious if anyone had any thoughts or
> information on this. Shawn, are there technical reasons why this
> works well the way they're doing it for Hg but would not for Git (like
> in the repo MINA based server)? It looks like the data structure and
> protocol exchange are incredibly similar after they threw away all the
> revlog stuff.
I think they also added pointers and data caches that exist in
their BigTable backend but not in normal Hg, such as precomputed
pointers from a commit to its most recent ancestor that is a merge.
I think that was mentioned in the talk.
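One way such a precomputed pointer could be built is sketched below. This is an assumption for illustration, not the backend's actual method. The sketch defines "most recent merge ancestor" as the nearest merge commit reached by following first parents, excluding the commit itself, and computes it in one pass over a topological order (parents before children). The function name and the dict-of-parents representation are hypothetical.

```python
from typing import Dict, List, Optional


def nearest_merge_ancestor(
    parents: Dict[str, List[str]], order: List[str]
) -> Dict[str, Optional[str]]:
    """Map each commit to the closest merge commit on its first-parent
    chain (the commit itself excluded), or None if there is none.

    `parents` maps commit id -> list of parent ids (first parent first);
    `order` must list every parent before its children.
    """
    result: Dict[str, Optional[str]] = {}
    for commit in order:
        ps = parents[commit]
        if not ps:
            result[commit] = None  # root commit: no ancestors at all
            continue
        first = ps[0]
        if len(parents[first]) > 1:
            result[commit] = first  # the first parent is itself a merge
        else:
            # First parent is an ordinary commit: reuse its pointer.
            result[commit] = result[first]
    return result
```

Every commit strictly between a commit and its pointer has exactly one parent, so a walker can treat that stretch as a single linear chain instead of discovering it one commit at a time.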
The JGit/MINA based servers run Git "well enough", but that's off
local disk, and we do pay a significant price compared to C Git. E.g.
we really need a revcache to accelerate the object enumeration phase,
which takes ages in JGit. And indexing a pushed pack is rather slow
compared to C Git; a large push can take up to a minute or two
to fully index and fsck.
> Or is it just that they're fine with the speed loss and
> the Android project would not be?
What does Android have to do with Hg? Android went with Git for
a lot of reasons, none of them having to do with the performance
or availability of Hg on code.google.com. All of them had to do
with Git being a really solid DVCS that has a very bright future.
--
Shawn.