* Dump http servers still slow?
@ 2005-07-28 21:00 Darrin Thompson
2005-07-29 2:24 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Darrin Thompson @ 2005-07-28 21:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio,
I just ran git clone against the mainline git repository using both http
and rsync. http was still quite slow compared to rsync. I expected that
the http time would be much faster than in the past due to the pack
file.
Is there something simple I'm missing?
--
Darrin
* Re: Dump http servers still slow?
2005-07-28 21:00 Dump http servers still slow? Darrin Thompson
@ 2005-07-29 2:24 ` Junio C Hamano
2005-07-29 14:03 ` Darrin Thompson
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-07-29 2:24 UTC (permalink / raw)
To: Darrin Thompson; +Cc: git
Darrin Thompson <darrint@progeny.com> writes:
> I just ran git clone against the mainline git repository using both http
> and rsync. http was still quite slow compared to rsync. I expected that
> the http time would be much faster than in the past due to the pack
> file.
>
> Is there something simple I'm missing?
No, the only thing you missed was that I did not write it to
make it fast, but just to make it work ;-). The commit walker
simply does not work against a dumb http server whose repository
is packed and prune-packed, which is already the case for both
the kernel and git repositories.
The thing is, the base pack for the git repository is currently
1.8MB, containing 4500+ objects, while we have accumulated 600+
unpacked objects since then, which amount to about 5MB. The commit
walker needs to fetch the latter one by one in the old way.

When packed incrementally on top of the base pack, these 600+
unpacked objects compress down to something like 400KB, and I
was hoping we could wait until we accumulate enough to produce
an incremental pack of about a meg or so ...
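For anyone who wants to do the same on their own repository, here is a
minimal sketch of such an incremental repack, assuming a repository that
already has a base pack and a git with the repack and prune-packed
commands available:

$ git repack        ;# pack only the loose objects, on top of the base pack
$ git prune-packed  ;# remove loose objects that now live in a pack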
* Re: Dump http servers still slow?
2005-07-29 2:24 ` Junio C Hamano
@ 2005-07-29 14:03 ` Darrin Thompson
2005-07-29 14:48 ` Ryan Anderson
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Darrin Thompson @ 2005-07-29 14:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, 2005-07-28 at 19:24 -0700, Junio C Hamano wrote:
> The thing is, the base pack for the git repository is currently
> 1.8MB, containing 4500+ objects, while we have accumulated 600+
> unpacked objects since then, which amount to about 5MB. The commit
> walker needs to fetch the latter one by one in the old way.
>
> When packed incrementally on top of the base pack, these 600+
> unpacked objects compress down to something like 400KB, and I
> was hoping we could wait until we accumulate enough to produce
> an incremental pack of about a meg or so ...
Ok... so let's check my assumptions:
1. Pack files should reduce the number of http round trips.
2. What I'm seeing when I check out mainline git is the acquisition of a
single large pack, then 600+ more recent objects. Better than before,
but still hundreds of round trips.
3. If I wanted to further speed up the initial checkout on my own
repositories I could frequently repack my most recent few hundred
objects.
4. If curl had pipelining then less pack management would be needed.
Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
it could benefit from some git-send-pack superpowers.
--
Darrin
* Re: Dump http servers still slow?
2005-07-29 14:03 ` Darrin Thompson
@ 2005-07-29 14:48 ` Ryan Anderson
2005-07-29 14:57 ` Darrin Thompson
2005-07-30 2:11 ` Junio C Hamano
2005-07-31 6:51 ` Junio C Hamano
2 siblings, 1 reply; 10+ messages in thread
From: Ryan Anderson @ 2005-07-29 14:48 UTC (permalink / raw)
To: Darrin Thompson; +Cc: Junio C Hamano, git
On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
>
> Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> it could benefit from some git-send-pack superpowers.
http://www.kernel.org/pub/software/scm/gitweb/
It occurs to me that pulling this into the main git repository might
not be a bad idea, since it is currently living outside any revision
tracking.
--
Ryan Anderson
sometimes Pug Majere
* Re: Dump http servers still slow?
2005-07-29 14:48 ` Ryan Anderson
@ 2005-07-29 14:57 ` Darrin Thompson
2005-07-29 15:08 ` Radoslaw AstralStorm Szkodzinski
0 siblings, 1 reply; 10+ messages in thread
From: Darrin Thompson @ 2005-07-29 14:57 UTC (permalink / raw)
To: Ryan Anderson; +Cc: Junio C Hamano, git
On Fri, 2005-07-29 at 10:48 -0400, Ryan Anderson wrote:
> On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
> >
> > Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> > it could benefit from some git-send-pack superpowers.
>
> http://www.kernel.org/pub/software/scm/gitweb/
>
> It occurs to me that pulling this into the main git repository might
> not be a bad idea, since it is currently living outside any revision
> tracking.
>
Can't see the code.
http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
Internal Server Error
--
Darrin
* Re: Dump http servers still slow?
2005-07-29 14:57 ` Darrin Thompson
@ 2005-07-29 15:08 ` Radoslaw AstralStorm Szkodzinski
2005-07-29 15:26 ` Darrin Thompson
0 siblings, 1 reply; 10+ messages in thread
From: Radoslaw AstralStorm Szkodzinski @ 2005-07-29 15:08 UTC (permalink / raw)
To: Darrin Thompson; +Cc: ryan, junkio, git
On Fri, 29 Jul 2005 09:57:36 -0500
Darrin Thompson <darrint@progeny.com> wrote:
> Can't see the code.
>
> http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
>
> Internal Server Error
>
Use FTP.
--
AstralStorm
GPG Key ID = 0xD1F10BA2
GPG Key fingerprint = 96E2 304A B9C4 949A 10A0 9105 9543 0453 D1F1 0BA2
Please encrypt if you can.
* Re: Dump http servers still slow?
2005-07-29 15:08 ` Radoslaw AstralStorm Szkodzinski
@ 2005-07-29 15:26 ` Darrin Thompson
0 siblings, 0 replies; 10+ messages in thread
From: Darrin Thompson @ 2005-07-29 15:26 UTC (permalink / raw)
To: Radoslaw AstralStorm Szkodzinski; +Cc: ryan, junkio, git
On Fri, 2005-07-29 at 17:08 +0200, Radoslaw AstralStorm Szkodzinski
wrote:
> On Fri, 29 Jul 2005 09:57:36 -0500
> Darrin Thompson <darrint@progeny.com> wrote:
>
> > Can't see the code.
> >
> > http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
> >
> > Internal Server Error
> >
>
> Use FTP.
>
Duh. Thanks.
--
Darrin
* Re: Dump http servers still slow?
2005-07-29 14:03 ` Darrin Thompson
2005-07-29 14:48 ` Ryan Anderson
@ 2005-07-30 2:11 ` Junio C Hamano
2005-07-31 6:51 ` Junio C Hamano
2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2005-07-30 2:11 UTC (permalink / raw)
To: Darrin Thompson; +Cc: git
Darrin Thompson <darrint@progeny.com> writes:
> Ok... so let's check my assumptions:
>
> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.
> 3. If I wanted to further speed up the initial checkout on my own
> repositories I could frequently repack my most recent few hundred
> objects.
> 4. If curl had pipelining then less pack management would be needed.
All true. Another possibility is to make multiple requests in
parallel; if curl does not do pipelining, either switch to
something that does, or have more than one process using curl.
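As an illustration only, a toy sketch of the more-than-one-process idea,
using xargs to keep up to eight curl transfers in flight; the repository
URL and the objects-to-fetch.txt list of loose-object paths are made up
for this example and are not part of any git tool:

$ xargs -P 8 -I{} curl -s --create-dirs -o ".git/{}" \
	"http://example.org/repo.git/{}" <objects-to-fetch.txt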
The dumb server preparation creates three files, two of which are
currently used by clone (one is the list of packs, the other the
list of branches and tags). The third one is commit ancestry
information. The commit walker could be taught to read it to
figure out what commits it still needs to fetch without waiting
for the commit currently being retrieved to be parsed.
Sorry, I am not planning to write that part myself.
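For reference, a minimal sketch of that preparation step, assuming a
bare repository published over plain http; git-update-server-info
(re)writes the static files a dumb client reads, and the two files used
by clone can be inspected directly:

$ git-update-server-info
$ cat info/refs            ;# branches and tags with their object names
$ cat objects/info/packs   ;# the statically prepared packs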
One potential piece of low-hanging fruit is that even for cloning via
a git:// URL we _might_ be better off starting with the dumb
server protocol: get the list of statically prepared packs and
obtain them upfront before starting the clone-pack/upload-pack
protocol pair.
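Done by hand, that would look roughly like the following, with a
hypothetical repository URL and pack name; objects/info/packs lists the
available packs, and each pack lives under objects/pack/:

$ curl http://example.org/repo.git/objects/info/packs
$ curl -O http://example.org/repo.git/objects/pack/pack-<name>.pack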
* Re: Dump http servers still slow?
2005-07-29 14:03 ` Darrin Thompson
2005-07-29 14:48 ` Ryan Anderson
2005-07-30 2:11 ` Junio C Hamano
@ 2005-07-31 6:51 ` Junio C Hamano
2005-08-01 14:03 ` Darrin Thompson
2 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2005-07-31 6:51 UTC (permalink / raw)
To: git; +Cc: Darrin Thompson, barkalow
Darrin Thompson <darrint@progeny.com> writes:
> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.
I've packed the git.git repository, by the way. It has 43
unpacked objects totalling 224 kilobytes, so cloning over dumb
http should go a lot faster until we accumulate more unpacked
objects.
Some of you may have noticed that in the proposed updates queue
("pu" branch) I have a couple of commits related to pulling from
a packed dumb http server. There are two "git fetch http://"
commits to let you pull from such, and another stupid "count
objects" script that you can use to see how many unpacked
objects you have in your repository; the latter is to help
you decide when to repack.
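Until then, a rough stand-in for that count: every loose object is a
file under a two-hexdigit subdirectory of .git/objects, so

$ find .git/objects/[0-9a-f][0-9a-f] -type f | wc -l

gives the number of unpacked objects in a repository.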
Brave souls may want to try out the dumb http fetch. For
example, it _should_ do the right thing even if you do the
following:
$ git clone http://www.kernel.org/pub/scm/git/git.git/ newdir
$ cd newdir
$ mv .git/objects/pack/pack-* . ;# even if you unpack packs on your
$ rm -f pack-*.idx ;# end, it should do the right thing.
$ for pack in pack-*.pack; do
      git-unpack-objects <$pack
      rm -f "$pack"
  done
$ rm -f .git/refs/heads/pu
$ git prune ;# lose objects in "pu" but still not in "master"
$ git pull origin pu
$ git ls-remote origin |
  while read sha1 refname
  do
      case "$refname" in
      refs/heads/master) echo $sha1 >".git/$refname" ;;
      esac
  done ;# revert master to upstream master
$ old=$(git-rev-parse master^^^^^^^^^^)
$ echo "$old" >.git/refs/heads/master ;# rewind further
$ git checkout -f master
$ git prune ;# try losing a bit more objects.
$ git pull origin master
$ git ls-remote ./. ;# show me my refs
$ git ls-remote origin ;# show me his refs
Unlike my other shell scripts, which I usually write in my e-mail
buffer, I have actually run the above ;-).
-jc
* Re: Dump http servers still slow?
2005-07-31 6:51 ` Junio C Hamano
@ 2005-08-01 14:03 ` Darrin Thompson
0 siblings, 0 replies; 10+ messages in thread
From: Darrin Thompson @ 2005-08-01 14:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, barkalow
On Sat, 2005-07-30 at 23:51 -0700, Junio C Hamano wrote:
> Darrin Thompson <darrint@progeny.com> writes:
>
> > 1. Pack files should reduce the number of http round trips.
> > 2. What I'm seeing when I check out mainline git is the acquisition of a
> > single large pack, then 600+ more recent objects. Better than before,
> > but still hundreds of round trips.
>
> I've packed the git.git repository, by the way. It has 43
> unpacked objects totalling 224 kilobytes, so cloning over dumb
> http should go a lot faster until we accumulate more unpacked
> objects.
I did a pull from the office and the times were 27 sec for http and 17
sec for rsync. So the moral of the story should be that frequent repacks
are sufficient for decent http performance.
--
Darrin