* [RFC] git-clone should create packed refs
@ 2008-02-15 0:33 Johan Herland
2008-02-15 0:53 ` Johannes Schindelin
0 siblings, 1 reply; 3+ messages in thread
From: Johan Herland @ 2008-02-15 0:33 UTC
To: git; +Cc: Kristian Høgsberg
Hi,
I'm experimenting with converting deep (lots of history) CVS repos to Git,
and I notice that cloning the resulting Git repos is _slow_. E.g. an
example repo with 10000 tags and 1000 branches will take ~24 seconds to
clone. Debugging shows that >95% of that time is spent calling "git
update-ref" for each of the 11000 refs. I can easily get the total runtime
down to ~4 seconds by replacing the "git update-ref ..." with something
like "echo $sha1 $destname >> $GIT_DIR/packed-refs". Some more
investigation shows that what's actually taking so long is not writing all
these 40-byte ref files and their corresponding reflogs, but rather the
overhead of creating the "git update-ref" process 11000 times (echo is a
shell builtin, I presume, so doesn't have the same overhead). My conclusion
is therefore that making "git clone" a builtin will solve my performance
problems (since the update-ref is now a function call, rather than a
subprocess).
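To make the comparison concrete, here is a minimal sketch (not taken from
git-clone.sh) of the append-based variant. The function name
append_packed_refs and the stdin format of "<sha1> <refname>" pairs are
assumptions for illustration. The loop relies only on read and printf,
which are builtins in common shells, so no process is created per ref. As
far as I know, a real packed-refs file is also kept sorted by refname,
which this sketch does not attempt.

```shell
#!/bin/sh
# Hypothetical helper: append "<sha1> <refname>" pairs read from stdin
# to <git_dir>/packed-refs. The output file is opened once for the
# whole loop, and nothing inside the loop forks a new process.
append_packed_refs() {
    git_dir=$1
    while read -r sha1 destname; do
        # Skip blank lines in the input.
        [ -n "$sha1" ] || continue
        printf '%s %s\n' "$sha1" "$destname"
    done >> "$git_dir/packed-refs"
}

# The slow variant, for comparison, spawns one process per ref:
#   while read -r sha1 destname; do
#       git update-ref "$destname" "$sha1"
#   done
```

Usage would be something like
`append_packed_refs "$GIT_DIR" < ref-list`, where ref-list is a
hypothetical file holding one "<sha1> <refname>" pair per line.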
Searching the list, I find that - lo and behold - someone (CCed) is actually
already working on this. :)
(BTW, a progress report on this work would be nice...)
So the only niggle I have left is that when git-clone is cloning repos with
thousands of refs, it makes sense to create a packed-refs file directly in
the clone, instead of having to run "git pack-refs" (or "git gc")
afterwards to (re)pack the refs. This has pretty much the same reasoning as
transferring and storing the objects in packs instead of exploding them
into loose objects.
In my case, the upstream repo already has packed refs, so it just seems
stupid to explode them into "loose" refs when cloning, and make me re-pack
them afterwards.
Looking at git-clone.sh, I even find that when cloning, the refs are
transferred in a format similar (but not identical) to the packed-refs file
format (see CLONE_HEAD in git-clone.sh).
AFAICS, the only complication with this proposal is how to deal with the
reflogs. Right now, for each ref created, a corresponding reflog with a
single entry is written. Therefore - in my example repo above - the
current "git clone" writes ~22000 files, and my proposal offers only a net
reduction in the number of files written of ~50%, instead of ~100%. For
reference, the
reflog entries written by "git clone" look like this:
"000... $sha1 A U Thor <e@mail> $timestamp clone: from $repo"
IMHO, these entries don't carry much value:
- The $sha1 is self-evident (and if later changed, will still be mentioned
in the next reflog entry).
- The author name and email would probably be self-evident/uninteresting in
most cases.
- The timestamp might be marginally useful, as I can't immediately document
another way of getting the time of cloning.
- The $repo would also be self-evident in many cases, and would in any case
also be listed in the config file in the "origin" remote section.
I'd therefore suggest making reflog creation in "git clone" optional, in
order to avoid having the number of files written be proportional to the
number of refs.
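The file counts behind this argument can be checked with a small sketch.
The function name count_files_written and its yes/no arguments are made up
for illustration; it only models "one file per loose ref, one file per
reflog, one packed-refs file", nothing else that a clone writes.

```shell
#!/bin/sh
# Hypothetical model of how many ref-related files a clone writes.
#   $1: number of refs
#   $2: "yes" if refs go into a single packed-refs file, else "no"
#   $3: "yes" if a reflog is written for each ref, else "no"
count_files_written() {
    refs=$1
    packed=$2
    reflogs=$3
    # Packed refs need one file in total; loose refs need one file each.
    if [ "$packed" = yes ]; then ref_files=1; else ref_files=$refs; fi
    # Each reflog is its own file, regardless of how the refs are stored.
    if [ "$reflogs" = yes ]; then log_files=$refs; else log_files=0; fi
    echo $((ref_files + log_files))
}

count_files_written 11000 no  yes   # loose refs + reflogs: 22000
count_files_written 11000 yes yes   # packed refs + reflogs: 11001
count_files_written 11000 yes no    # packed refs, no reflogs: 1
```

With reflogs kept, packing the refs roughly halves the file count; only
dropping the reflogs as well makes it independent of the number of refs.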
I would imagine that even though the time used on Linux for writing
thousands of files might be negligible, this is not the case on certain
other OSes...
Have fun! :)
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: [RFC] git-clone should create packed refs
From: Johannes Schindelin @ 2008-02-15 0:53 UTC
To: Johan Herland; +Cc: git, Kristian Høgsberg
Hi,
On Fri, 15 Feb 2008, Johan Herland wrote:
> when git-clone is cloning repos with thousands of refs, it makes sense
> to create a packed-refs file directly in the clone, instead of having to
> run "git pack-refs" (or "git gc") afterwards to (re)pack the refs.
Sure, and it's easy, too. The format of the packed-refs file is exactly
the same as the output of "git ls-remote <origin>".
Ciao,
Dscho
* Re: [RFC] git-clone should create packed refs
From: Johan Herland @ 2008-02-15 1:13 UTC
To: Johannes Schindelin; +Cc: git, Kristian Høgsberg
On Friday 15 February 2008, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 15 Feb 2008, Johan Herland wrote:
>
> > when git-clone is cloning repos with thousands of refs, it makes sense
> > to create a packed-refs file directly in the clone, instead of having to
> > run "git pack-refs" (or "git gc") afterwards to (re)pack the refs.
>
> Sure, and it's easy, too. The format of the packed-refs file is exactly
> the same as the output of "git ls-remote <origin>".
As I said in my first email: similar, but not identical.
Contrast:
$ git ls-remote origin
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd HEAD
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/heads/another_branch
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/heads/master
d6f7eacba9f07aa382a113e129866266c8d60642 refs/tags/complex_tag
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/tags/complex_tag^{}
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/tags/simple_tag
$ cat .git/packed-refs
# pack-refs with: peeled
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/heads/master
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/remotes/origin/another_branch
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/remotes/origin/master
d6f7eacba9f07aa382a113e129866266c8d60642 refs/tags/complex_tag
^ec09f49905e94f3bbf04bf3521f1fc59b1345cbd
ec09f49905e94f3bbf04bf3521f1fc59b1345cbd refs/tags/simple_tag
$
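The differences visible in the two listings above (no HEAD entry, branches
stored under refs/remotes/origin/, peeled tag values written as a "^" line
instead of a "^{}" ref, and a header line) can be bridged with a small
filter. This is only a sketch under those assumptions: the function name
ls_remote_to_packed_refs is made up, and it does not create the local
refs/heads/master entry that the clone also has.

```shell
#!/bin/sh
# Hypothetical filter: read "git ls-remote"-style lines on stdin and
# print them in the packed-refs layout shown above.
#   $1: remote name used under refs/remotes/ (defaults to "origin")
ls_remote_to_packed_refs() {
    origin=${1:-origin}
    echo "# pack-refs with: peeled"
    awk -v origin="$origin" '
        # HEAD is not stored in packed-refs.
        $2 == "HEAD" { next }
        # "refs/tags/foo^{}" carries the peeled value of the tag line
        # just before it; packed-refs writes it as "^<sha1>".
        substr($2, length($2) - 2) == "^{}" { print "^" $1; next }
        {
            # Remote branches live under refs/remotes/<origin>/.
            # "refs/heads/" is 11 characters, so the name starts at 12.
            if (index($2, "refs/heads/") == 1)
                $2 = "refs/remotes/" origin "/" substr($2, 12)
            print $1 " " $2
        }
    '
}
```

It could be fed as `git ls-remote origin | ls_remote_to_packed_refs origin`;
the filter relies on ls-remote listing each "^{}" line directly after its
tag, as in the listing above.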
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net