git.vger.kernel.org archive mirror
* Cloning speed comparison
@ 2005-08-13  1:54 Petr Baudis
  2005-08-13  2:12 ` Linus Torvalds
  2005-08-15 17:50 ` Daniel Barkalow
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Baudis @ 2005-08-13  1:54 UTC (permalink / raw)
  To: git

  Hello,

  I've wondered how slow the protocols other than rsync are, and the
(well, a bit dubious; especially wrt. caching on the remote side)
results are:

	git	clone-pack:ssh	25s
	git	rsync		27s
	git	http-pull	47s
	git	dumb-http	54s
	git	ssh-pull	660s

	cogito	clone-pack:ssh	35s (!)
	cogito	rsync		140s
	cogito	ssh-pull	480s
	cogito	http-pull	extrapolated to about an hour!
	cogito	dumb-http	N/A (missing info in the repository)
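Reading the timings as slowdown factors relative to the fastest method for each repository makes the spread easier to see. Below is a minimal Python sketch that uses only the numbers from the table above. The cogito http-pull entry, which was only extrapolated, and the N/A dumb-http entry are left out:

```python
# Timings in seconds, copied from the table above. cogito http-pull (only
# extrapolated to "about an hour") and cogito dumb-http (N/A) are omitted.
TIMINGS = {
    "git": {"clone-pack:ssh": 25, "rsync": 27, "http-pull": 47,
            "dumb-http": 54, "ssh-pull": 660},
    "cogito": {"clone-pack:ssh": 35, "rsync": 140, "ssh-pull": 480},
}


def slowdowns(timings):
    """Return {method: time / fastest_time} for one repository."""
    fastest = min(timings.values())
    return {method: t / fastest for method, t in timings.items()}


if __name__ == "__main__":
    for repo, timings in TIMINGS.items():
        ranked = sorted(slowdowns(timings).items(), key=lambda kv: kv[1])
        for method, factor in ranked:
            print(f"{repo:7} {method:15} {factor:5.1f}x")
```

With these inputs, git's ssh-pull comes out at 26.4x clone-pack and cogito's rsync at 4.0x clone-pack.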

  (I didn't test the git server protocol, since kernel.org doesn't run
a git server and I was too lazy to set one up.)

  The git repository contains one big pack, one small pack and a few
standalone objects (5882 objects in total), while cogito consists of
standalone objects only (9670 objects in total, 8681 reachable).

  The numbers are off by some epsilons, as I didn't bother with multiple
measurements, but they shouldn't be hugely off for a general comparison.
The network connection has 2048kbit/s download; the other side was
www.kernel.org for HTTP and rsync, and master.kernel.org for ssh.

  Pulling from localhost (128M of RAM, 5M to 30M free - awful, yes):

	cogito	rsync:ssh	150s
	cogito	ssh-pull	120s (but didn't complete, see PS)
	cogito	http-pull	260s
	cogito	clone-pack:ssh	340s

  Anyway, clone-pack is a clear winner for networks (but someone should
re-check that, especially compared to rsync, wrt. server-side file
caching); really cool fast, but not very practical for anonymous access.
Any volunteers for a simple CGI (or gitweb addon) script + HTTP support
in clone-pack? HTTP is certainly the most suitable protocol for
anonymous pulls, so it's a shame it's still that sluggish.

  It is so slow here because it has a very ugly access pattern on the
object database, and my RAM is full, so it does not get cached; even on
the servers, it was slower at first - unfortunately, I didn't measure
that, so what's in the top table are second accesses. Still, I would
expect the big repositories to stay mostly in the server cache, so this
isn't that big a problem for those, I think.

  PS:
	With the latest git version as of time of writing this:
	$ time cg-clone git+ssh://pasky@localhost/home/pasky/WWW/dev/git/.g cogito
	...
	progress: 5759 objects, 10292457 bytes
	$ time cg-clone http://localhost/~pasky/dev/git/.g cogito
	...
	progress: 8681 objects, 14881571 bytes

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone.  -- Alan Cox


* Re: Cloning speed comparison
  2005-08-13  1:54 Cloning speed comparison Petr Baudis
@ 2005-08-13  2:12 ` Linus Torvalds
  2005-08-13  3:10   ` Petr Baudis
  2005-08-15 17:50 ` Daniel Barkalow
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-08-13  2:12 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git



On Sat, 13 Aug 2005, Petr Baudis wrote:
> 
>   Anyway, clone-pack is a clear winner for networks (but someone should
> re-check that, especially compared to rsync, wrt. server-side file
> caching); really cool fast, but not very practical for anonymous access.

git-daemon is for the anonymous access case, either started from inetd 
(or any other external "listen to port, exec service" thing), or with the 
built-in listening stuff.

It uses exactly the same protocol and logic as the regular ssh clone-pack 
thing, except it doesn't authenticate the remote end: it only checks that 
the local end is accepting anonymous pulls by checking whether there is a 
"git-daemon-export-ok" file in the git directory.
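The export policy described above comes down to a single file-existence test on the serving side. Below is a minimal Python sketch. The function name `may_export` is hypothetical, and this is not git-daemon's actual C code:

```python
import os
import tempfile

EXPORT_OK = "git-daemon-export-ok"  # marker file named in the message above


def may_export(git_dir: str) -> bool:
    """Return True only if the repository has opted in to anonymous pulls.

    The remote end is not authenticated at all: the decision depends only
    on whether the marker file exists inside the git directory.
    """
    return os.path.isfile(os.path.join(git_dir, EXPORT_OK))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        print(may_export(git_dir))  # False: no marker yet
        open(os.path.join(git_dir, EXPORT_OK), "w").close()
        print(may_export(git_dir))  # True: marker present
```

Because the check is opt-in per repository, making a new repository public only requires creating the empty marker file.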

In my tests, the git daemon was noticeably faster than ssh, if only 
because the authentication actually tends to be a big part of the overhead 
in small pulls.

[ Hey. There's a deer outside my window eating our roses again. Cute ]

			Linus


* Re: Cloning speed comparison
  2005-08-13  2:12 ` Linus Torvalds
@ 2005-08-13  3:10   ` Petr Baudis
  2005-08-13  3:28     ` Linus Torvalds
  2005-08-13  5:16     ` H. Peter Anvin
  0 siblings, 2 replies; 11+ messages in thread
From: Petr Baudis @ 2005-08-13  3:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, ftpadmin

Dear diary, on Sat, Aug 13, 2005 at 04:12:26AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Sat, 13 Aug 2005, Petr Baudis wrote:
> > 
> >   Anyway, clone-pack is a clear winner for networks (but someone should
> > re-check that, especially compared to rsync, wrt. server-side file
> > caching); really cool fast, but not very practical for anonymous access.
> 
> git-daemon is for the anonymous access case, either started from inetd 
> (or any other external "listen to port, exec service" thing), or with the 
> built-in listening stuff.
> 
> It uses exactly the same protocol and logic as the regular ssh clone-pack 
> thing, except it doesn't authenticate the remote end: it only checks that 
> the local end is accepting anonymous pulls by checking whether there is a 
> "git-daemon-export-ok" file in the git directory.
> 
> In my tests, the git daemon was noticeably faster than ssh, if only 
> because the authentication actually tends to be a big part of the overhead 
> in small pulls.

Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
90% of my GIT network activity happens with kernel.org; the rest is with
my notebook, and I want to keep that ssh.)

BTW, is the pack protocol flexible enough to be extended to support
pushing? That would be great as well. You might suggest just using ssh,
but that (i) requires you to be root on the machine to add new users,
(ii) consequently adds administrative burden, and (iii) isn't easy to
set up so that the user has no shell access, should you want to
restrict that.

> [ Hey. There's a deer outside my window eating our roses again. Cute ]

Oh, it must be nice in Oregon. I can't imagine anything like that
happening in Czechia unless you live in solitude or in some lonely tiny
village.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone.  -- Alan Cox


* Re: Cloning speed comparison
  2005-08-13  3:10   ` Petr Baudis
@ 2005-08-13  3:28     ` Linus Torvalds
  2005-08-13  5:16       ` H. Peter Anvin
  2005-08-13  5:16     ` H. Peter Anvin
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-08-13  3:28 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, ftpadmin



On Sat, 13 Aug 2005, Petr Baudis wrote:
>
> Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
> 90% of my GIT network activity happens with kernel.org; the rest is with
> my notebook, and I want to keep that ssh.)

Maybe. I don't know what the status of that is, but the plan was to at 
least give it a try.

> BTW, is the pack protocol flexible enough to be extended to support
> pushing?

The _protocol_ could handle it, but you obviously need some kind of secure 
authentication, and quite frankly, one of the selling points on git-daemon 
right now is that it's all read-only and very simple and there should be 
no security issues because it will never write anything at all.

So right now git-daemon only accepts requests from fetch-pack.
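On the wire, a fetch request begins with a single pkt-line naming the service and the repository. The sketch below uses the framing and the example request from git's pack-protocol documentation, which describes current git; the 2005 daemon may differ in detail. `pkt_line`, `parse_request`, `accept` and the allow-list are hypothetical names, not git-daemon's code. It shows how little a read-only daemon needs to inspect before refusing anything other than a fetch:

```python
from typing import Tuple

# A read-only daemon serves only the fetch side of the protocol; the push
# side ("git-receive-pack") is simply not on the list.
ALLOWED_SERVICES = {"git-upload-pack"}


def pkt_line(data: bytes) -> bytes:
    """Frame data as a pkt-line: 4 hex digits giving the total length,
    counting the 4 length digits themselves, followed by the payload."""
    return b"%04x" % (len(data) + 4) + data


def parse_request(pkt: bytes) -> Tuple[str, str]:
    """Split a single request pkt-line into (service, path)."""
    length = int(pkt[:4], 16)
    if length < 4 or length != len(pkt):
        raise ValueError("malformed pkt-line")
    # Payload is "<service> <path>\0host=...\0"; keep what precedes the first NUL.
    command = pkt[4:].split(b"\0", 1)[0].decode()
    service, _, path = command.partition(" ")
    return service, path


def accept(pkt: bytes) -> bool:
    """Refuse anything that is not a fetch request."""
    service, _ = parse_request(pkt)
    return service in ALLOWED_SERVICES


if __name__ == "__main__":
    fetch = pkt_line(b"git-upload-pack /project.git\0host=myserver.com\0")
    push = pkt_line(b"git-receive-pack /project.git\0host=myserver.com\0")
    print(fetch[:4])                    # b'0033'
    print(parse_request(fetch))         # ('git-upload-pack', '/project.git')
    print(accept(fetch), accept(push))  # True False
```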

> > [ Hey. There's a deer outside my window eating our roses again. Cute ]
> 
> Oh, it must be nice in Oregon. I can't imagine anything like that
> happening in Czechia unless you live in solitude or in some lonely tiny
> village.

Deer are really just oversized rats with horns (*). They're cute, though,
and it's kind of funny looking up from the screen and noticing one
munching on the roses just ten feet away. 

			Linus

(*) Did I mention that biology wasn't one of the things I did at Uni?


* Re: Cloning speed comparison
  2005-08-13  3:28     ` Linus Torvalds
@ 2005-08-13  5:16       ` H. Peter Anvin
  2005-08-13  5:25         ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-08-13  5:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git, ftpadmin

Linus Torvalds wrote:
> 
>>BTW, is the pack protocol flexible enough to be extended to support
>>pushing?
> 
> The _protocol_ could handle it, but you obviously need some kind of secure 
> authentication, and quite frankly, one of the selling points on git-daemon 
> right now is that it's all read-only and very simple and there should be 
> no security issues because it will never write anything at all.
> 

Running it over ssh would be a good way to do authentication...

	-hpa


* Re: Cloning speed comparison
  2005-08-13  3:10   ` Petr Baudis
  2005-08-13  3:28     ` Linus Torvalds
@ 2005-08-13  5:16     ` H. Peter Anvin
  1 sibling, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2005-08-13  5:16 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Linus Torvalds, git, ftpadmin

Petr Baudis wrote:
>>>
>>In my tests, the git daemon was noticeably faster than ssh, if only 
>>because the authentication actually tends to be a big part of the overhead 
>>in small pulls.
> 
> Oh. Sounds nice, are there plans to run this on kernel.org too? (So far,
> 90% of my GIT network activity happens with kernel.org; the rest is with
> my notebook, and I want to keep that ssh.)
> 

Yes, when I get some time...

	-hpa


* Re: Cloning speed comparison
  2005-08-13  5:16       ` H. Peter Anvin
@ 2005-08-13  5:25         ` Linus Torvalds
  2005-08-13 23:25           ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-08-13  5:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Petr Baudis, git, ftpadmin



On Fri, 12 Aug 2005, H. Peter Anvin wrote:
> 
> Running it over ssh would be a good way to do authentication...

Well, if you have ssh as an option, you don't need git-daemon any more, 
since the protocol that git-daemon does runs quite well over ssh on its 
own...

The only point of git-daemon really is when you don't have ssh access (i.e.
you may want to give people a limited interface, but not full ssh). I.e.,
as-is, it's only for anonymous reads of a git archive, but it obviously
_could_ do more.

		Linus


* Re: Cloning speed comparison
  2005-08-13  5:25         ` Linus Torvalds
@ 2005-08-13 23:25           ` H. Peter Anvin
  0 siblings, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2005-08-13 23:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git, ftpadmin

Linus Torvalds wrote:
> 
> On Fri, 12 Aug 2005, H. Peter Anvin wrote:
> 
>>Running it over ssh would be a good way to do authentication...
> 
> 
> Well, if you have ssh as an option, you don't need git-daemon any more, 
> since the protocol that git-daemon does runs quite well over ssh on its 
> own...
> 
> The only point of git-daemon really is when you don't have ssh access (i.e.
> you may want to give people a limited interface, but not full ssh). I.e.,
> as-is, it's only for anonymous reads of a git archive, but it obviously
> _could_ do more.
> 

Okay.  So use git-daemon for the anonymous users, and run the git 
protocol over ssh for writing.  Seems easy enough to me.
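The resulting split can be expressed as a simple rule for which URL a client uses for a given repository. Below is a minimal Python sketch. The host, path and user names are made up, and the `git+ssh://` form mirrors the URL used in the PS of the first message:

```python
def remote_url(host: str, path: str, write: bool, user: str = "") -> str:
    """Pick a transport for one repository under the split described above.

    Reads go to git-daemon over git:// with no authentication; writes run
    the same protocol over ssh, so ssh does the authentication.
    """
    if write:
        if not user:
            raise ValueError("writes require an ssh account")
        return f"git+ssh://{user}@{host}{path}"
    return f"git://{host}{path}"


if __name__ == "__main__":
    print(remote_url("git.example.org", "/pub/scm/foo.git", write=False))
    # git://git.example.org/pub/scm/foo.git
    print(remote_url("git.example.org", "/pub/scm/foo.git", write=True, user="alice"))
    # git+ssh://alice@git.example.org/pub/scm/foo.git
```

The daemon itself stays read-only, so all account management for writers remains with ssh.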

	-hpa


* Re: Cloning speed comparison
  2005-08-13  1:54 Cloning speed comparison Petr Baudis
  2005-08-13  2:12 ` Linus Torvalds
@ 2005-08-15 17:50 ` Daniel Barkalow
  2005-08-15 20:46   ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2005-08-15 17:50 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

On Sat, 13 Aug 2005, Petr Baudis wrote:

>   Hello,
> 
>   I've wondered how slow the protocols other than rsync are, and the
> (well, a bit dubious; especially wrt. caching on the remote side)
> results are:
> 
> 	git	clone-pack:ssh	25s
> 	git	rsync		27s
> 	git	http-pull	47s
> 	git	dumb-http	54s
> 	git	ssh-pull	660s
> 
> 	cogito	clone-pack:ssh	35s (!)
> 	cogito	rsync		140s
> 	cogito	ssh-pull	480s
> 	cogito	http-pull	extrapolated to about an hour!

I should be able to get http-pull down to the neighborhood of 
(current) ssh-pull; http-pull is that slow (when the source repository 
isn't packed) because it's entirely sequential, rather than overlapping 
requests like ssh-pull now does.
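A toy cost model shows why a purely sequential fetcher suffers on a repository made of loose objects. This is a simplifying assumption, not a description of how http-pull or ssh-pull actually behave. The round-trip time and per-object transfer time are fixed, hypothetical parameters, and requests are overlapped in fixed-size batches:

```python
def sequential_ms(n_objects: int, rtt_ms: int, transfer_ms: int) -> int:
    """One request at a time: every object pays a full round trip."""
    return n_objects * (rtt_ms + transfer_ms)


def overlapped_ms(n_objects: int, rtt_ms: int, transfer_ms: int, in_flight: int) -> int:
    """Toy batched model: send `in_flight` requests together, so each batch
    pays one round trip, while transfers still share the link one by one."""
    batches = -(-n_objects // in_flight)  # ceiling division
    return batches * rtt_ms + n_objects * transfer_ms


if __name__ == "__main__":
    # Hypothetical numbers, not measurements: 1000 loose objects,
    # 100 ms round trip, 10 ms to transfer each object.
    print(sequential_ms(1000, 100, 10))      # 110000
    print(overlapped_ms(1000, 100, 10, 10))  # 20000
```

In the sequential case, the round-trip term grows with the number of objects. Overlapping divides that term by the window size, so on high-latency links most of the gain comes from keeping several requests in flight.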

I should also be able to get ssh-pull down to the area of clone-pack, but 
that's lower-priority, since there's clone-pack.

(I've written an untested patch for local-pull, which I'll be testing, 
cleaning, and submitting tonight, assuming my newly-arrived monitor 
actually works)

>   PS:
> 	With the latest git version as of time of writing this:
> 	$ time cg-clone git+ssh://pasky@localhost/home/pasky/WWW/dev/git/.g cogito
> 	...
> 	progress: 5759 objects, 10292457 bytes
> 	$ time cg-clone http://localhost/~pasky/dev/git/.g cogito
> 	...
> 	progress: 8681 objects, 14881571 bytes

I've noticed that, with recent versions of ssh, connections sometimes
don't actually disconnect at the end. In my experience, this occasionally
happens with git, but always happens with scp, suggesting that it's an ssh
bug of some sort; I've also only noticed this with openssh 3.9_p1 with
some of Gentoo's -r2 patches.

	-Daniel
*This .sig left intentionally blank*


* Re: Cloning speed comparison
  2005-08-15 17:50 ` Daniel Barkalow
@ 2005-08-15 20:46   ` Junio C Hamano
  2005-08-15 21:27     ` Daniel Barkalow
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2005-08-15 20:46 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> I should be able to get http-pull down to the neighborhood of 
> (current) ssh-pull; http-pull is that slow (when the source repository 
> isn't packed) because it's entirely sequential, rather than overlapping 
> requests like ssh-pull now does.

I like the prefetch() and process() code in pull.c very much.

I have been wondering whether it would help to increase parallelism
further by prefetching beyond the immediate parents of the current
commit, in the "if (get_history)" part of process_commit().  Maybe it
is not worth it, because processing a commit, its associated tree(s)
and its parents would already give us enough parallelism.

> (I've written an untested patch for local-pull, which I'll be testing, 
> cleaning, and submitting tonight, assuming my newly-arrived monitor 
> actually works)

That is great news.  Thank you for doing this; looking forward
to seeing it, but no rush.  Enjoy your new monitor.


* Re: Cloning speed comparison
  2005-08-15 20:46   ` Junio C Hamano
@ 2005-08-15 21:27     ` Daniel Barkalow
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Barkalow @ 2005-08-15 21:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 15 Aug 2005, Junio C Hamano wrote:

> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I should be able to get http-pull down to the neighborhood of 
> > (current) ssh-pull; http-pull is that slow (when the source repository 
> > isn't packed) because it's entirely sequential, rather than overlapping 
> > requests like ssh-pull now does.
> 
> I like the prefetch() and process() code in pull.c very much.
>
> I have been wondering whether it would help to increase parallelism
> further by prefetching beyond the immediate parents of the current
> commit, in the "if (get_history)" part of process_commit().  Maybe it
> is not worth it, because processing a commit, its associated tree(s)
> and its parents would already give us enough parallelism.

It is actually already maxing out the parallelism; it has a FIFO of
objects which it needs, and calls prefetch() when it enqueues an object
and fetch() when it dequeues it. It only cares about the dependencies for
this purpose, not the types.
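The queue discipline described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the code in pull.c. `pull`, `deps` and the demo graph are hypothetical, and skipping objects that are already queued is an assumption rather than something taken from the message:

```python
from collections import deque


def pull(root, deps, prefetch, fetch):
    """Fetch everything reachable from root, in FIFO order.

    prefetch(obj) runs when obj is queued, so its request can already be
    in flight while earlier objects are being fetched; fetch(obj) runs when
    obj reaches the head of the queue. deps(obj) is only consulted after
    fetch(obj), since an object's references are known once it has been
    read. Only the dependency edges matter here, not the object types.
    """
    queue = deque()
    queued = set()

    def enqueue(obj):
        if obj in queued:  # assumption: each object is requested only once
            return
        queued.add(obj)
        prefetch(obj)
        queue.append(obj)

    enqueue(root)
    while queue:
        obj = queue.popleft()
        fetch(obj)
        for dep in deps(obj):
            enqueue(dep)


if __name__ == "__main__":
    # Hypothetical history: commit c2 has tree t2 and parent c1;
    # both trees point at the same blob b.
    graph = {"c2": ["t2", "c1"], "c1": ["t1"], "t2": ["b"], "t1": ["b"], "b": []}
    log = []
    pull("c2", graph.__getitem__,
         prefetch=lambda o: log.append(("prefetch", o)),
         fetch=lambda o: log.append(("fetch", o)))
    print(log)
```

In this demo, the fetch order is c2, t2, c1, b, t1, and b is requested only once even though two trees reference it. Commits, trees and blobs all go through the same queue.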

	-Daniel
*This .sig left intentionally blank*

Thread overview: 11+ messages
2005-08-13  1:54 Cloning speed comparison Petr Baudis
2005-08-13  2:12 ` Linus Torvalds
2005-08-13  3:10   ` Petr Baudis
2005-08-13  3:28     ` Linus Torvalds
2005-08-13  5:16       ` H. Peter Anvin
2005-08-13  5:25         ` Linus Torvalds
2005-08-13 23:25           ` H. Peter Anvin
2005-08-13  5:16     ` H. Peter Anvin
2005-08-15 17:50 ` Daniel Barkalow
2005-08-15 20:46   ` Junio C Hamano
2005-08-15 21:27     ` Daniel Barkalow
