* Google Code supports Git
@ 2011-07-16 10:24 Nguyen Thai Ngoc Duy
2011-07-16 20:44 ` Sverre Rabbelier
2011-07-17 2:26 ` Shawn Pearce
0 siblings, 2 replies; 6+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-07-16 10:24 UTC (permalink / raw)
To: Git Mailing List
Just out of curiosity and because I happen to know we have Googlers
here. If it's not confidential, are there any changes in git to make
it work with Google Code? I am particularly interested in whether
Google modifies git to use bigtable (or cassandra, I remember Shawn
had a prototype).
--
Duy
* Re: Google Code supports Git
2011-07-16 10:24 Google Code supports Git Nguyen Thai Ngoc Duy
@ 2011-07-16 20:44 ` Sverre Rabbelier
2011-07-16 21:18 ` A Large Angry SCM
2011-07-17 2:26 ` Shawn Pearce
1 sibling, 1 reply; 6+ messages in thread
From: Sverre Rabbelier @ 2011-07-16 20:44 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
Heya,
On Sat, Jul 16, 2011 at 12:24, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Just out of curiosity and because I happen to know we have Googlers
> here. If it's not confidential, are there any changes in git to make
> it work with Google Code? I am particularly interested in whether
> Google modifies git to use bigtable (or cassandra, I remember Shawn
> had a prototype).
If nothing else, the "hg on bigtable" talk from I/O 2008 is probably relevant.
http://www.youtube.com/watch?v=ri796Hx8las
--
Cheers,
Sverre Rabbelier
* Re: Google Code supports Git
2011-07-16 20:44 ` Sverre Rabbelier
@ 2011-07-16 21:18 ` A Large Angry SCM
0 siblings, 0 replies; 6+ messages in thread
From: A Large Angry SCM @ 2011-07-16 21:18 UTC (permalink / raw)
To: Sverre Rabbelier; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List
On 07/16/2011 04:44 PM, Sverre Rabbelier wrote:
> Heya,
>
> On Sat, Jul 16, 2011 at 12:24, Nguyen Thai Ngoc Duy<pclouds@gmail.com> wrote:
>> Just out of curiosity and because I happen to know we have Googlers
>> here. If it's not confidential, are there any changes in git to make
>> it work with Google Code? I am particularly interested in whether
>> Google modifies git to use bigtable (or cassandra, I remember Shawn
>> had a prototype).
>
> If nothing else, the "hg on bigtable" talk from I/O 2008 is probably relevant.
>
> http://www.youtube.com/watch?v=ri796Hx8las
>
I know I would appreciate it, and I believe many on this list would
also appreciate understanding how Git hosting on Google Code was
implemented.
* Re: Google Code supports Git
2011-07-16 10:24 Google Code supports Git Nguyen Thai Ngoc Duy
2011-07-16 20:44 ` Sverre Rabbelier
@ 2011-07-17 2:26 ` Shawn Pearce
2011-07-17 10:46 ` Sebastian Schuberth
2011-07-17 15:45 ` Ævar Arnfjörð Bjarmason
1 sibling, 2 replies; 6+ messages in thread
From: Shawn Pearce @ 2011-07-17 2:26 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
On Sat, Jul 16, 2011 at 03:24, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Just out of curiosity and because I happen to know we have Googlers
> here. If it's not confidential, are there any changes in git to make
> it work with Google Code? I am particularly interested in whether
> Google modifies git to use bigtable
A major milestone in Git was adding smart HTTP. If you watch the talk
Sverre linked to, you will learn that Google is based heavily on HTTP.
A fundamental issue at the time Hg on Google Code was added was that Git
didn't really work well over HTTP. Adding smart HTTP in Git 1.6.6 made
it more realistic for Google to support Git on Project Hosting. I
added smart HTTP support for kernel.org so their users behind
firewalls could still use an efficient Git protocol to fetch revisions
from kernel.org for projects that are hosted there. It's a nice bonus
that this work made Git on Google Code more realistic for Google.
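For readers who have not looked at smart HTTP on the wire, here is a
minimal sketch of the "pkt-line" framing it uses. A smart client asks for
<repo-url>/info/refs?service=git-upload-pack, and the server's reply
begins with a framed service announcement followed by a flush packet. The
function names are made up for illustration, and the sample bytes are
built by hand rather than captured from a real server.

```python
# Illustrative only: pkt-line framing as used by Git's smart HTTP.
# pkt_line and parse_pkt_lines are hypothetical helper names.

FLUSH = b"0000"  # a zero-length "flush" packet ends a section


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length (header included)."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes):
    """Return the payloads in order; None marks a flush packet."""
    out = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            out.append(None)
            pos += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        out.append(data[pos + 4:pos + length])
        pos += length
    return out


# The first thing a smart server sends in its info/refs response.
sample = pkt_line(b"# service=git-upload-pack\n") + FLUSH
print(sample)
print(parse_pkt_lines(sample))
```

The ref advertisement that follows in a real response uses the same
framing, one ref per packet.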
We are trying to get the engineer responsible for making Git on Google
Code possible to give a recorded tech talk like the one Sverre linked
to. I don't want to steal his thunder, but I can say the Git on Google
Code work is not based on C Git or JGit. :-)
> (or cassandra, I remember Shawn
> had a prototype).
This was an unrelated project, and is what I deem to be a failure...
quite unlike Git on Google Code. :-)
For some background, at GitTogether in Oct. 2010 I showed a demo of
JGit using the Apache Cassandra database as an object / reference
store. This prototype didn't really scale well; even though I demoed
the linux-2.6 repository being cloned through a JGit daemon using
Cassandra as the backing store, it was slow and used too much CPU and
memory resources to be useful in any context beyond a "Look, I can do
this!" demo. I managed to open source this work; it may still be
lying around somewhere, but I basically threw it out the window and
said "that isn't good, and I can't believe I put my name on it!". (And
for the record I was not the first to try this, Scott Chacon at GitHub
tried something similar first and demoed it at GitTogether in 2009.)
In late Jan/Feb 2011 I released a series of patches for JGit that
added what I called "DHT" (distributed hash table) support. These
patches are now part of the JGit project. It's different from the
original Cassandra prototype. With this work, JGit tries to treat the
DHT as though it were a virtual memory system. Relatively standard
pack files are segmented into ~ 1 MiB chunks, then stored into the DHT
with row keys based on the SHA-1 hash of the content of the "pack
chunk". The bet here is that the locality of data in a pack file is
quite good, so loading a chunk of commits ~1 MiB in size should get us
a number of related commits, amortizing out the round-trip time to the
database. This was to resolve one of the latency problems I saw with
the Cassandra prototype, which stored 1 commit per row and had awful
performance during a major revision traversal like a clone has.
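A rough sketch of the chunking idea, under loud assumptions: a plain dict
stands in for the DHT, chunks are cut at fixed byte offsets, and
chunk_pack is a made-up name. This is not the JGit code, which is
considerably more involved; it only shows "split into ~1 MiB pieces, key
each piece by the SHA-1 of its content".

```python
import hashlib

CHUNK_SIZE = 1 << 20  # ~1 MiB, the figure mentioned above


def chunk_pack(pack_bytes: bytes, chunk_size: int = CHUNK_SIZE):
    """Split pack data into chunks keyed by the SHA-1 of each chunk.

    Returns (keys, store): keys lists the row keys in pack order so the
    data can be reassembled; store maps row key -> chunk bytes and is a
    stand-in for the real database (an assumption of this sketch).
    """
    keys = []
    store = {}
    for start in range(0, len(pack_bytes), chunk_size):
        chunk = pack_bytes[start:start + chunk_size]
        row_key = hashlib.sha1(chunk).hexdigest()
        keys.append(row_key)
        store[row_key] = chunk
    return keys, store


# Usage: 2,560,000 bytes of stand-in data yields three chunks.
data = bytes(range(256)) * 10000
keys, store = chunk_pack(data)
rebuilt = b"".join(store[k] for k in keys)
print(len(keys), rebuilt == data)
```

One database round trip then returns a whole chunk's worth of neighbouring
objects instead of a single commit, which is the amortization the
paragraph above is betting on.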
The JGit DHT work led me to discover that pack locality is not as good
as we think it is. It's good, but it can be better. I added some
patches to JGit's PackWriter to write objects in an order that gives
better data locality. After Junio and I started sharing an office, I
began nagging him about this locality problem in Git pack files... and
that nagging led to a series of patches Junio posted about a week
back to improve pack-objects.c. The improvement is small on local
disk: it reduces some minor page faults, but there isn't much
difference in overall running time. Over higher-latency filesystems,
however, like an NFS server in another city, it should help reading.
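To make the locality argument concrete, here is a toy model (not JGit's
PackWriter and not the pack-objects.c patches): objects are integers,
each chunk holds a fixed number of them, and the reader keeps only the
most recently loaded chunk. chunk_loads is a hypothetical helper that
counts how many chunk fetches a traversal needs for a given pack order.

```python
def chunk_loads(pack_order, traversal, objects_per_chunk):
    """Count chunk fetches for a traversal, with a one-chunk cache."""
    # Map each object to the chunk its position in the pack falls into.
    chunk_of = {obj: i // objects_per_chunk
                for i, obj in enumerate(pack_order)}
    loads = 0
    current = None
    for obj in traversal:
        chunk = chunk_of[obj]
        if chunk != current:  # cache miss: one more round trip
            loads += 1
            current = chunk
    return loads


traversal = list(range(8))            # objects visited in this order
aligned = list(range(8))              # pack order matches the traversal
interleaved = [0, 4, 1, 5, 2, 6, 3, 7]  # related objects spread out

print(chunk_loads(aligned, traversal, 4))      # 2
print(chunk_loads(interleaved, traversal, 4))  # 4
```

On local disk a miss like this is a cheap page fault; when every miss is
a network round trip, the difference between the two orders is what
matters.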
Just recently I posted a message to the jgit-dev mailing list saying I
also now think JGit DHT isn't a viable solution, and am likely to
discard it in the future. Its implementation is very complicated, and
it just doesn't perform as well as I had hoped. FWIW, this work was
not for Google, but was for open source Git hosting sites like
source.android.com, eclipse.org, KDE, etc. where they need to manage a
large number of Git repositories, and want to have hot-failover and
load-balancing to reduce down time caused by hardware failures.
Unfortunately it hasn't been panning out, because the performance loss
is large compared to the small administrative improvements it might
bring. Not to mention the additional complexity of running the
clustered database vs. just a bunch of Git repositories in a
directory.
I can tell you that none of this is what Git for Google Code does.
As for how Git on Google Code is implemented... you'll just have to
wait for the tech talk from the engineer responsible. I can say it
wasn't me, and it wasn't Junio. I am too busy with JGit and Gerrit
Code Review, and Junio is too busy being Git maintainer to work on a
major new feature like this.
:-)
--
Shawn.
* Re: Google Code supports Git
2011-07-17 2:26 ` Shawn Pearce
@ 2011-07-17 10:46 ` Sebastian Schuberth
2011-07-17 15:45 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 6+ messages in thread
From: Sebastian Schuberth @ 2011-07-17 10:46 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List
On 17.07.2011 04:26, Shawn Pearce wrote:
> to. I don't want to steal his thunder, but I can say the Git on Google
> Code work is not based on C Git or JGit. :-)
Then it's probably JavaScript ;-)
https://github.com/danlucraft/git.js
--
Sebastian Schuberth
* Re: Google Code supports Git
2011-07-17 2:26 ` Shawn Pearce
2011-07-17 10:46 ` Sebastian Schuberth
@ 2011-07-17 15:45 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-07-17 15:45 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List
On Sun, Jul 17, 2011 at 04:26, Shawn Pearce <spearce@spearce.org> wrote:
> We are trying to get the engineer responsible for making Git on Google
> Code possible to give a recorded tech talk like the one Sverre linked
> to. I don't want to steal his thunder, but I can say the Git on Google
> Code work is not based on C Git or JGit. :-)
I don't think you have to worry about that, looks like he already
stole his own thunder:
http://hackerne.ws/item?id=2769816
http://code.google.com/u/dborowitz@google.com/
I.e. Google Code is using Dulwich:
http://www.samba.org/~jelmer/dulwich/