* Cloning speed comparison
From: Petr Baudis @ 2005-08-13 1:54 UTC (permalink / raw)
To: git
Hello,
I've wondered how slow the protocols other than rsync are, and the
(well, a bit dubious; especially wrt. caching on the remote side)
results are:
  Repository  Method          Time
  git         clone-pack:ssh  25s
  git         rsync           27s
  git         http-pull       47s
  git         dumb-http       54s
  git         ssh-pull        660s

  cogito      clone-pack:ssh  35s (!)
  cogito      rsync           140s
  cogito      ssh-pull        480s
  cogito      http-pull       extrapolated to about an hour!
  cogito      dumb-http       N/A (missing info in the repository)
(I didn't test the git server protocol, since kernel.org doesn't run a
git server and I was too lazy to set one up.)
The git repository contains one big pack, one small pack and a few
standalone objects (5882 objects in total), while cogito is standalone
objects only (9670 objects in total, 8681 reachable).
The numbers are off by some epsilons, as I didn't bother with multiple
measurements, but they shouldn't be hugely off for a general comparison.
The network connection has 2048kbit/s download; the other side was
www.kernel.org for HTTP and rsync, and master.kernel.org for ssh.
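As a rough sanity check on these timings, the link speed sets a floor on any transfer. The sketch below is an illustration only: it assumes 2048kbit/s means 2,048,000 bits per second, and it borrows the 14881571-byte total that the http clone of cogito reports in the PS of this message as a stand-in for the amount of data moved, which may not match what each protocol actually sends. The function and constant names are made up for the example.

```python
# Lower bound on transfer time over the link described above.
LINK_BITS_PER_SECOND = 2048 * 1000  # assumption: 1 kbit = 1000 bits
COGITO_BYTES = 14881571             # byte count from the PS, used as a stand-in


def min_transfer_seconds(num_bytes: int, bits_per_second: int) -> float:
    """Time needed to move num_bytes if the link is fully saturated."""
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    floor = min_transfer_seconds(COGITO_BYTES, LINK_BITS_PER_SECOND)
    print(f"saturated-link floor: {floor:.1f}s")
```

Under those assumptions the floor is about 58 seconds, so a 35s clone would only be possible if the pack moves fewer bytes than that total.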
Pulling from localhost (128M of RAM, 5M to 30M free - awful, yes):
  Repository  Method          Time
  cogito      rsync:ssh       150s
  cogito      ssh-pull        120s (but didn't complete, see PS)
  cogito      http-pull       260s
  cogito      clone-pack:ssh  340s
Anyway, clone-pack is a clear winner over the network (but someone should
re-check that, especially compared to rsync, wrt. server-side file
caching); it's really fast, but not very practical for anonymous access.
Any volunteers for a simple CGI (or gitweb addon) script + HTTP support
in clone-pack? HTTP is certainly the most suitable protocol for
anonymous pulls, so it's a shame it's still that sluggish.
It is so slow here because it has a very ugly access pattern on the
object database and my RAM is full, so nothing gets cached; even on
the servers, it was slower at first. Unfortunately, I didn't measure
that, so what's in the top table are second accesses. Still, I would
expect the big repositories to stay mostly in the server cache, so this
isn't that big a problem for those, I think.
PS:
With the latest git version as of this writing:
$ time cg-clone git+ssh://pasky@localhost/home/pasky/WWW/dev/git/.g cogito
...
progress: 5759 objects, 10292457 bytes
$ time cg-clone http://localhost/~pasky/dev/git/.g cogito
...
progress: 8681 objects, 14881571 bytes
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone. -- Alan Cox
* Re: Cloning speed comparison
From: Linus Torvalds @ 2005-08-13 2:12 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
On Sat, 13 Aug 2005, Petr Baudis wrote:
>
> Anyway, clone-pack is a clear winner for networks (but someone should
> re-check that, especially compared to rsync, wrt. server-side file
> caching); really cool fast, but not very practical for anonymous access.
git-daemon is for the anonymous access case, either started from inetd
(or any other external "listen to port, exec service" thing), or with the
built-in listening stuff.
It uses exactly the same protocol and logic as the regular ssh clone-pack
thing, except it doesn't authenticate the remote end: it only checks that
the local end is accepting anonymous pulls by checking whether there is a
"git-daemon-export-ok" file in the git directory.
In my tests, the git daemon was noticeably faster than ssh, if only
because the authentication actually tends to be a big part of the overhead
in small pulls.
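The export check described above amounts to testing for a marker file. A minimal sketch of that idea follows, in Python rather than git-daemon's own C; the function name is hypothetical and this is not the daemon's actual code, only the file name "git-daemon-export-ok" comes from this message.

```python
import os
import tempfile

# Marker file name taken from the message above.
EXPORT_MARKER = "git-daemon-export-ok"


def is_exported(git_dir: str) -> bool:
    """Return True if git_dir opts in to anonymous pulls via the marker file."""
    return os.path.isfile(os.path.join(git_dir, EXPORT_MARKER))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        print(is_exported(git_dir))  # no marker yet
        # Creating an empty marker file is enough to opt in.
        open(os.path.join(git_dir, EXPORT_MARKER), "w").close()
        print(is_exported(git_dir))
```

The appeal of this design is that the repository owner opts in per repository with a plain file, so the daemon needs no user database or configuration to decide what it may serve.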
[ Hey. There's a deer outside my window eating our roses again. Cute ]
Linus
* Re: Cloning speed comparison
From: Petr Baudis @ 2005-08-13 3:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, ftpadmin
Dear diary, on Sat, Aug 13, 2005 at 04:12:26AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Sat, 13 Aug 2005, Petr Baudis wrote:
> >
> > Anyway, clone-pack is a clear winner for networks (but someone should
> > re-check that, especially compared to rsync, wrt. server-side file
> > caching); really cool fast, but not very practical for anonymous access.
>
> git-daemon is for the anonymous access case, either started from inetd
> (or any other external "listen to port, exec service" thing), or with the
> built-in listening stuff.
>
> It uses exactly the same protocol and logic as the regular ssh clone-pack
> thing, except it doesn't authenticate the remote end: it only checks that
> the local end is accepting anonymous pulls by checking whether there is a
> "git-daemon-export-ok" file in the git directory.
>
> In my tests, the git daemon was noticeably faster than ssh, if only
> because the authentication actually tends to be a big part of the overhead
> in small pulls.
Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
90% of my GIT network activity happens with kernel.org; the rest is with
my notebook, and I want to keep that ssh.)
BTW, is the pack protocol flexible enough to be extended to support
pushing? That would be great as well. You might suggest just using ssh,
but that (i) requires you to be root on the machine to add new users,
(ii) consequently adds administrative burden, and (iii) isn't easy to set
up so that the user has no shell access, should you want to restrict that.
> [ Hey. There's a deer outside my window eating our roses again. Cute ]
Oh, it must be nice in Oregon. I can't imagine anything like that
happening in Czechia unless you live in seclusion or in some lonely tiny
village.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone. -- Alan Cox
* Re: Cloning speed comparison
From: Linus Torvalds @ 2005-08-13 3:28 UTC (permalink / raw)
To: Petr Baudis; +Cc: git, ftpadmin
On Sat, 13 Aug 2005, Petr Baudis wrote:
>
> Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
> 90% of my GIT network activity happens with kernel.org; the rest is with
> my notebook, and I want to keep that ssh.)
Maybe. I don't know what the status of that is, but the plan was to at
least give it a try.
> BTW, is the pack protocol flexible enough to be extended to support
> pushing?
The _protocol_ could handle it, but you obviously need some kind of secure
authentication, and quite frankly, one of the selling points of git-daemon
right now is that it's all read-only and very simple, and there should be
no security issues because it will never write anything at all.
So right now git-daemon only accepts requests from fetch-pack.
> > [ Hey. There's a deer outside my window eating our roses again. Cute ]
>
> Oh, it must be nice in Oregon. I can't imagine anything like that
> happening in Czechia unless you live in seclusion or in some lonely tiny
> village.
Deer are really just oversized rats with horns (*). They're cute, though,
and it's kind of funny looking up from the screen and noticing one
munching on the roses just ten feet away.
Linus
(*) Did I mention that biology wasn't one of the things I did at Uni?
* Re: Cloning speed comparison
From: H. Peter Anvin @ 2005-08-13 5:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, git, ftpadmin
Linus Torvalds wrote:
>
>>BTW, is the pack protocol flexible enough to be extended to support
>>pushing?
>
> The _protocol_ could handle it, but you obviously need some kind of secure
> authentication, and quite frankly, one of the selling points of git-daemon
> right now is that it's all read-only and very simple, and there should be
> no security issues because it will never write anything at all.
>
Running it over ssh would be a good way to do authentication...
-hpa
* Re: Cloning speed comparison
From: H. Peter Anvin @ 2005-08-13 5:16 UTC (permalink / raw)
To: Petr Baudis; +Cc: Linus Torvalds, git, ftpadmin
Petr Baudis wrote:
>>>
>>In my tests, the git daemon was noticeably faster than ssh, if only
>>because the authentication actually tends to be a big part of the overhead
>>in small pulls.
>
> Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
> 90% of my GIT network activity happens with kernel.org; the rest is with
> my notebook, and I want to keep that ssh.)
>
Yes, when I get some time...
-hpa
* Re: Cloning speed comparison
From: Linus Torvalds @ 2005-08-13 5:25 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Petr Baudis, git, ftpadmin
On Fri, 12 Aug 2005, H. Peter Anvin wrote:
>
> Running it over ssh would be a good way to do authentication...
Well, if you have ssh as an option, you don't need git-daemon any more,
since the protocol that git-daemon speaks runs quite well over ssh on its
own...
The only point of git-daemon really is when you don't have ssh access (i.e.
you may want to give people a limited interface, but not full ssh). That is,
as-is, it's only for anonymous reads of a git archive, but it obviously
_could_ do more.
Linus
* Re: Cloning speed comparison
From: H. Peter Anvin @ 2005-08-13 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, git, ftpadmin
Linus Torvalds wrote:
>
> On Fri, 12 Aug 2005, H. Peter Anvin wrote:
>
>>Running it over ssh would be a good way to do authentication...
>
>
> Well, if you have ssh as an option, you don't need git-daemon any more,
> since the protocol that git-daemon speaks runs quite well over ssh on its
> own...
>
> The only point of git-daemon really is when you don't have ssh access (i.e.
> you may want to give people a limited interface, but not full ssh). That is,
> as-is, it's only for anonymous reads of a git archive, but it obviously
> _could_ do more.
>
Okay. So use git-daemon for the anonymous users, and run the git
protocol over ssh for writing. Seems easy enough to me.
-hpa
* Re: Cloning speed comparison
From: Daniel Barkalow @ 2005-08-15 17:50 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
On Sat, 13 Aug 2005, Petr Baudis wrote:
> Hello,
>
> I've wondered how slow the protocols other than rsync are, and the
> (well, a bit dubious; especially wrt. caching on the remote side)
> results are:
>
>   Repository  Method          Time
>   git         clone-pack:ssh  25s
>   git         rsync           27s
>   git         http-pull       47s
>   git         dumb-http       54s
>   git         ssh-pull        660s
>
>   cogito      clone-pack:ssh  35s (!)
>   cogito      rsync           140s
>   cogito      ssh-pull        480s
>   cogito      http-pull       extrapolated to about an hour!
I should be able to get http-pull down to the neighborhood of
(current) ssh-pull; http-pull is that slow (when the source repository
isn't packed) because it's entirely sequential, rather than overlapping
requests like ssh-pull now does.
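To put the sequential cost in numbers, the quoted timings can be divided by the object count. This is a back-of-the-envelope sketch only: it assumes both cogito pulls fetch the 8681 reachable objects and reads "about an hour" as 3600 seconds; the function name is made up for illustration.

```python
# Per-object cost implied by the quoted cogito timings.
REACHABLE_OBJECTS = 8681   # from the original message
HTTP_PULL_SECONDS = 3600   # assumption: "about an hour"
SSH_PULL_SECONDS = 480     # from the table above


def seconds_per_object(total_seconds: float, objects: int) -> float:
    """Average wall-clock time spent per fetched object."""
    return total_seconds / objects


if __name__ == "__main__":
    http_cost = seconds_per_object(HTTP_PULL_SECONDS, REACHABLE_OBJECTS)
    ssh_cost = seconds_per_object(SSH_PULL_SECONDS, REACHABLE_OBJECTS)
    print(f"http-pull: {http_cost:.3f}s per object")
    print(f"ssh-pull:  {ssh_cost:.3f}s per object")
    print(f"ratio:     {http_cost / ssh_cost:.1f}x")
```

Roughly 0.4 seconds per object is what one would expect if every object waits for its own request to complete before the next one is issued, which is why overlapping the requests should help.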
I should also be able to get ssh-pull down to the area of clone-pack, but
that's lower-priority, since there's clone-pack.
(I've written an untested patch for local-pull, which I'll be testing,
cleaning, and submitting tonight, assuming my newly-arrived monitor
actually works)
> PS:
> With the latest git version as of time of writing this:
> $ time cg-clone git+ssh://pasky@localhost/home/pasky/WWW/dev/git/.g cogito
> ...
> progress: 5759 objects, 10292457 bytes
> $ time cg-clone http://localhost/~pasky/dev/git/.g cogito
> ...
> progress: 8681 objects, 14881571 bytes
I've noticed that, with recent versions of ssh, connections sometimes
don't actually disconnect at the end. In my experience, this occasionally
happens with git, but always happens with scp, suggesting that it's an ssh
bug of some sort; I've also only noticed this with openssh 3.9_p1 with
some of Gentoo's -r2 patches.
-Daniel
*This .sig left intentionally blank*
* Re: Cloning speed comparison
From: Junio C Hamano @ 2005-08-15 20:46 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git
Daniel Barkalow <barkalow@iabervon.org> writes:
> I should be able to get http-pull down to the neighborhood of
> (current) ssh-pull; http-pull is that slow (when the source repository
> isn't packed) because it's entirely sequential, rather than overlapping
> requests like ssh-pull now does.
I like the prefetch() and process() code in pull.c very much.
I have been wondering whether it would help to increase parallelism
further by prefetching beyond the immediate parents of the current commit,
in the "if (get_history)" part of process_commit(). Maybe it is not
worth it, because doing a commit, its associated tree(s) and its
parents would already give us enough parallelism.
> (I've written an untested patch for local-pull, which I'll be testing,
> cleaning, and submitting tonight, assuming my newly-arrived monitor
> actually works)
That is great news. Thank you for doing this; looking forward
to seeing it, but no rush. Enjoy your new monitor.
* Re: Cloning speed comparison
From: Daniel Barkalow @ 2005-08-15 21:27 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, 15 Aug 2005, Junio C Hamano wrote:
> Daniel Barkalow <barkalow@iabervon.org> writes:
>
> > I should be able to get http-pull down to the neighborhood of
> > (current) ssh-pull; http-pull is that slow (when the source repository
> > isn't packed) because it's entirely sequential, rather than overlapping
> > requests like ssh-pull now does.
>
> I like the prefetch() and process() code in pull.c very much.
>
> I have been wondering whether it would help to increase parallelism
> further by prefetching beyond the immediate parents of the current commit,
> in the "if (get_history)" part of process_commit(). Maybe it is not
> worth it, because doing a commit, its associated tree(s) and its
> parents would already give us enough parallelism.
It is actually already maxing out the parallelism; it has a FIFO of
objects which it needs, and calls prefetch() when it enqueues an object
and fetch() when it dequeues it. It only cares about the dependencies for
this purpose, not the types.
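The FIFO scheme described above can be sketched roughly as follows. This is a Python illustration of the idea, not the C code in pull.c: the pull() signature, the callbacks and the toy object graph are all assumptions made for the example.

```python
from collections import deque

# Toy dependency graph (hypothetical): object name -> objects it needs.
GRAPH = {
    "commit": ["tree", "parent"],
    "tree": ["blob"],
    "parent": [],
    "blob": [],
}


def pull(root, deps, prefetch, fetch):
    """Walk dependencies through a FIFO: prefetch on enqueue, fetch on dequeue."""
    queue = deque()
    seen = set()

    def enqueue(obj):
        if obj not in seen:
            seen.add(obj)
            prefetch(obj)      # request goes out as soon as the need is known
            queue.append(obj)

    enqueue(root)
    order = []
    while queue:
        obj = queue.popleft()
        fetch(obj)             # collect the (hopefully already arriving) object
        order.append(obj)
        for dep in deps(obj):  # only dependencies matter here, not object types
            enqueue(dep)
    return order


if __name__ == "__main__":
    log = []
    pull("commit", GRAPH.__getitem__,
         lambda o: log.append(("prefetch", o)),
         lambda o: log.append(("fetch", o)))
    for event in log:
        print(event)
```

In the toy run, the requests for "tree" and "parent" are both issued before either is fetched, which is where the overlap comes from; an object's dependencies only become known once the object itself has been fetched, so the queue cannot look further ahead than that.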
-Daniel
*This .sig left intentionally blank*