From: Andreas Ericsson <ae@op5.se>
To: Scott Chacon <schacon@gmail.com>
Cc: git list <git@vger.kernel.org>
Subject: Re: Mercurial on BigTable
Date: Thu, 11 Jun 2009 04:02:45 +0200
Message-ID: <4A3065C5.3070203@op5.se>
In-Reply-To: <d411cc4a0906101215t313b2037k713aa1ce974c30cc@mail.gmail.com>
Scott Chacon wrote:
> Has anyone watched this yet?
>
> http://code.google.com/events/io/sessions/MercurialBigTable.html
>
> It's kind of interesting - a Googler talks about getting Mercurial
> running on BigTable. What fascinates me is that if I'm not horribly
> mistaken, it seems like they just threw out the revlog format entirely
> and just store the data in a key-value store as sort of a Git-like
> content addressable filesystem.
It does indeed seem like that, yes. Would have been fun to be there to
congratulate him on implementing something that's already existed for
about three years ;-)
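To make "content addressable" concrete, here is a minimal sketch of the idea on top of a plain key-value store. It is an illustration only: the class name ContentStore and the in-memory dict are made up for this example, and it reproduces neither git's real object format (which hashes a type/length header along with the content) nor whatever the BigTable implementation actually does.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-1 of the value."""

    def __init__(self):
        # Stand-in for any key-value backend (hypothetical, in-memory only).
        self._kv = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._kv[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._kv[key]


objects = ContentStore()
key_a = objects.put(b"hello\n")
key_b = objects.put(b"hello\n")  # same bytes, same key
```

Since the key is derived from the content, the backend only needs get and put by key, which is why such a layout maps onto a key-value store so easily.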
> I had thought they were taking
> advantage of the revlog structure somehow, but it appears like they
> basically just changed the underlying data format to be much more like
> Git and rewrote an Hg-speaking server on top of that. They even
> explicitly store the head values like refs instead of reading
> childless nodes out of the revlog, which is what I thought Hg did.
>
Well, storing the head values as refs is the only thing that makes
sense if you're using a database to track things, since you'd otherwise
have to map in too much data to get any sort of performance at all
out of it.
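A rough sketch of the difference, using a made-up four-changeset history. The table layout and the names parents, heads_by_scan and refs are assumptions for illustration, not anything taken from the talk:

```python
# Hypothetical changeset table: changeset id -> list of parent ids.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
}


def heads_by_scan(parents):
    """Find childless changesets by reading every row of the table."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(cid for cid in parents if cid not in has_child)


# Explicit refs: kept up to date on push, read back with a single lookup.
refs = {"heads": ["c", "d"]}

scanned = heads_by_scan(parents)
```

Both give the same answer here, but the scan has to touch every changeset, while the refs lookup reads one small record no matter how long the history is.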
> Does anyone know how they do the graph walking efficiently with this
> structure? He mentioned it was about half as fast as native Hg, but
> that seemed to be acceptable.
Yes: they don't. DAG walking means looking up changesets one after
another, and unless they know the order up front they pay the penalty
of fetching each commit from the BigTable database over the network. It would
be similar to storing git objects in a database on a different
host, which would also be quite a lot slower than just hitting an
mmap()'ed file in binary form.
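As a sketch of where the cost comes from, the walk below counts one simulated round trip per commit visited. RemoteStore and its fetch method are hypothetical stand-ins for a networked database, not a real BigTable API:

```python
class RemoteStore:
    """Pretend remote database: every fetch counts as one network round trip."""

    def __init__(self, commits):
        self._commits = commits  # commit id -> list of parent ids
        self.round_trips = 0

    def fetch(self, cid):
        self.round_trips += 1
        return self._commits[cid]


def walk(store, head):
    """Visit every ancestor of head, fetching each commit exactly once."""
    seen, stack, order = set(), [head], []
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        # The parents are only known after the commit itself has been fetched,
        # so the lookups cannot be batched ahead of time.
        stack.extend(store.fetch(cid))
    return order


remote = RemoteStore({"a": [], "b": ["a"], "c": ["b"], "d": ["b", "c"]})
order = walk(remote, "d")
```

With four commits reachable from "d" the walk makes four fetches. Against a local mmap()'ed file each of those is a memory access; against a remote store each one carries network latency.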
> Curious if anyone had any thoughts or
> information on this. Shawn, are there technical reasons why this
> works well the way they're doing it for Hg but would not for Git (like
> in the repo MINA based server)? It looks like the data structure and
> protocol exchange are incredibly similar after they threw away all the
> revlog stuff. Or is it just that they're fine with the speed loss and
> the Android project would not be?
>
I'm more curious as to why they didn't choose git. The only explanation
that was actually true is that hg works well over HTTP (if you can call
3 network requests per not-up-to-date head "well"). Since I can't imagine
them not doing proper research before launching a project that almost
certainly cost quite a lot of money, and I personally think that the
"http rules all" explanation sounded weak, I'm guessing there were other
reasons as to why they didn't go with git instead, and I'm fairly curious
to hear them. If I were to take a guess, I'd say git is written in a pretty
unfriendly way for implementing other storage engines.
Ah well. In a year or two they'll probably support git as well. One can
hope at least ;-)
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.