git.vger.kernel.org archive mirror
* [RFC] git-clone should create packed refs
@ 2008-02-15  0:33 Johan Herland
  2008-02-15  0:53 ` Johannes Schindelin
  0 siblings, 1 reply; 3+ messages in thread
From: Johan Herland @ 2008-02-15  0:33 UTC
  To: git; +Cc: Kristian Høgsberg

Hi,

I'm experimenting with converting deep (lots of history) CVS repos to Git, 
and I notice that cloning the resulting Git repos is _slow_. E.g. an 
example repo with 10000 tags and 1000 branches will take ~24 seconds to 
clone. Debugging shows that >95% of that time is spent by calling "git 
update-ref" for each of the 11000 refs. I can easily get the total runtime 
down to ~4 seconds by replacing the "git update-ref ..." with something 
like "echo $sha1 $destname >> $GIT_DIR/packed-refs". Some more 
investigation shows that what's actually taking so long is not writing all 
these 40-byte ref files and their corresponding reflogs, but rather the 
overhead of creating the "git update-ref" process 11000 times (echo is a 
shell builtin, I presume, so doesn't have the same overhead). My conclusion 
is therefore that making "git clone" a builtin will solve my performance 
problems (since the update-ref is now a function call, rather than a 
subprocess).
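
A minimal sketch of the "echo $sha1 $destname >> packed-refs" idea described above, written so that no process is started per ref. The function name append_packed_refs is hypothetical, and the sketch assumes its input is already a list of "<sha1> <refname>" pairs, one per line. It does not write a header line, peel tags, sort, or lock the file, so it is an illustration of the cost argument rather than a replacement for "git update-ref".

```shell
# Sketch only: append "<sha1> <refname>" pairs read from stdin to the file
# named by $1. append_packed_refs is a hypothetical helper, not part of git.
append_packed_refs() {
	packed_refs=$1
	while read -r sha1 destname
	do
		# printf is a builtin in most shells, so no process is created
		# per ref; the file is opened once for the whole loop.
		printf '%s %s\n' "$sha1" "$destname"
	done >>"$packed_refs"
}
```

In git-clone.sh terms, the file argument would be "$GIT_DIR/packed-refs" and the loop would take the place of the per-ref "git update-ref" call.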

Searching the list, I find that - lo and behold - someone (CCed) is actually 
already working on this. :)
(BTW, a progress report on this work would be nice...)


So the only niggle I have left is that when git-clone is cloning repos with 
thousands of refs, it makes sense to create a packed-refs file directly in 
the clone, instead of having to run "git pack-refs" (or "git gc") 
afterwards to (re)pack the refs. The reasoning is much the same as for 
transferring and storing the objects in packs instead of exploding them 
into loose objects.

In my case, the upstream repo already has packed refs, so it just seems 
stupid to explode them into "loose" refs when cloning, and make me re-pack 
them afterwards.

Looking at git-clone.sh, I even find that when cloning, the refs are 
transferred in a format similar (but not identical) to the packed-refs file 
format (see CLONE_HEAD in git-clone.sh).

AFAICS, the only complication with this proposal is how to deal with the 
reflogs. Right now, for each ref created, a corresponding reflog with a 
single entry is written. Therefore - in my example repo above - the 
current "git clone" writes ~22000 files, and my proposal would reduce the 
number of files written by only ~50%, instead of ~100%. For reference, the 
reflog entries written by "git clone" look like this:
	"000... $sha1 A U Thor <e@mail> $timestamp  clone: from $repo"
IMHO, these entries don't carry much value:
- The $sha1 is self-evident (and if later changed, will still be mentioned
  in the next reflog entry).
- The author name and email would probably be self-evident/uninteresting in
  most cases.
- The timestamp might be marginally useful, as I can't immediately think of
  another way of getting the time of cloning.
- The $repo would also be self-evident in many cases, and would in any case
  also be listed in the config file in the "origin" remote section.
I'd therefore suggest making reflog creation in "git clone" optional, in 
order to avoid having the number of files written be proportional to the 
number of refs.
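
The file counts above can be checked with a small sketch. The function name clone_files_written and its arguments are hypothetical. It counts only ref files and reflogs (not objects, config, or anything else a clone writes) and assumes one single-entry reflog per ref, as described above.

```shell
# Sketch only: number of ref-related files a clone would write.
# $1 = number of refs, $2 = "packed" or "loose", $3 = "reflogs" or "noreflogs"
clone_files_written() {
	refs=$1
	case $2 in
	packed) ref_files=1 ;;        # a single packed-refs file
	*)      ref_files=$refs ;;    # one loose ref file per ref
	esac
	case $3 in
	reflogs) log_files=$refs ;;   # one single-entry reflog per ref
	*)       log_files=0 ;;
	esac
	echo $((ref_files + log_files))
}
```

For the 11000-ref example repo, "clone_files_written 11000 loose reflogs" gives 22000, "packed reflogs" gives 11001 (the ~50% case), and "packed noreflogs" gives 1.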

I would imagine that even though the time used on Linux for writing 
thousands of files might be negligible, this is not the case on certain 
other OSes...


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [RFC] git-clone should create packed refs
  2008-02-15  0:33 [RFC] git-clone should create packed refs Johan Herland
@ 2008-02-15  0:53 ` Johannes Schindelin
  2008-02-15  1:13   ` Johan Herland
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin @ 2008-02-15  0:53 UTC
  To: Johan Herland; +Cc: git, Kristian Høgsberg

Hi,

On Fri, 15 Feb 2008, Johan Herland wrote:

> when git-clone is cloning repos with thousands of refs, it makes sense 
> to create a packed-refs file directly in the clone, instead of having to 
> run "git pack-refs" (or "git gc") afterwards to (re)pack the refs.

Sure, and it's easy, too.  The format of the packed-refs file is exactly 
the same as the output of "git ls-remote <origin>".

Ciao,
Dscho

* Re: [RFC] git-clone should create packed refs
  2008-02-15  0:53 ` Johannes Schindelin
@ 2008-02-15  1:13   ` Johan Herland
  0 siblings, 0 replies; 3+ messages in thread
From: Johan Herland @ 2008-02-15  1:13 UTC
  To: Johannes Schindelin; +Cc: git, Kristian Høgsberg

On Friday 15 February 2008, Johannes Schindelin wrote:
> Hi,
> 
> On Fri, 15 Feb 2008, Johan Herland wrote:
> 
> > when git-clone is cloning repos with thousands of refs, it makes sense 
> > to create a packed-refs file directly in the clone, instead of having to 
> > run "git pack-refs" (or "git gc") afterwards to (re)pack the refs.
> 
> Sure, and it's easy, too.  The format of the packed-refs file is exactly 
> the same as the output of "git ls-remote <origin>".

As I said in my first email: similar, but not identical.

Contrast:

$ git ls-remote origin
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd        HEAD
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd        refs/heads/another_branch
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd        refs/heads/master
d6f7eacba9f07aa382a113e129866266c8d60642        refs/tags/complex_tag
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd        refs/tags/complex_tag^{}
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd        refs/tags/simple_tag
$ cat .git/packed-refs
# pack-refs with: peeled
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/heads/master
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/remotes/origin/another_branch
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/remotes/origin/master
d6f7eacba9f07aa382a113e129866266c8d60642 refs/tags/complex_tag
^ec09f49905e94f3bbf04bf3521f1fc59b1345cbd
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/tags/simple_tag
$
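
The differences visible in the two listings above (no HEAD line, remote branches stored under refs/remotes/origin/, a peeled tag written as a "^<sha1>" line under its tag, a single space as separator, and a header line) can be sketched as a small filter. The function name ls_remote_to_packed_refs and the default remote name "origin" are assumptions for illustration. The sketch does not create the local refs/heads/master entry seen in the packed-refs listing, since that branch is not derivable from a one-to-one rewrite of the ls-remote lines.

```shell
# Sketch only: rewrite "git ls-remote" output (stdin) as packed-refs lines
# (stdout). ls_remote_to_packed_refs is a hypothetical helper, not part of git.
ls_remote_to_packed_refs() {
	awk -v remote="${1:-origin}" '
	BEGIN { print "# pack-refs with: peeled" }
	# HEAD has no entry in packed-refs
	$2 == "HEAD" { next }
	# "<tag>^{}" lines become a "^<sha1>" line following the tag itself
	substr($2, length($2) - 2) == "^{}" { print "^" $1; next }
	{
		name = $2
		# remote branches are stored under refs/remotes/<remote>/
		sub(/^refs\/heads\//, "refs/remotes/" remote "/", name)
		print $1, name
	}'
}
```

Fed the ls-remote output shown above, this yields the packed-refs listing minus the refs/heads/master line. It relies on ls-remote printing each "^{}" line directly after its tag, as in the listing.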


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
