From: Eric Wong <normalperson@yhbt.net>
To: Mike Hommey <mh@glandium.org>
Cc: git@vger.kernel.org
Subject: Re: Significant performance waste in git-svn and friends
Date: Thu, 6 Sep 2007 00:04:08 -0700
Message-ID: <20070906070407.GA19624@soma>
In-Reply-To: <20070905184710.GA3632@glandium.org>
Mike Hommey <mh@glandium.org> wrote:
> Hi,
Hi Mike,
> Being a pervert abusing the way subversion doesn't deal with branches
> and tags, I'm actually not a user of git-svn or git-svnimport, because
> they just can't deal easily with my perversion. So I'm writing a script
> to do the conversion for me, and since I also like to learn new things
> when I'm coding, I'm writing it in ruby.
>
> Anyways, one of the things I'm trying to convert is my svk repository
> for debian packaging of xulrunner (so, a significant subset of the
> mozilla tree), which doesn't involve a lot of revisions (around 280,
> because I only imported releases or CVS snapshots), but involves a lot
> of files (roughly 20k).
>
> The first thing I noticed when twisting around the svk repo so that
> git-svn could somehow import it a while ago, is that running git-svn
> was in my case significantly slower than svnadmin dump | svnadmin load
> (more than 2 times slower).
>
> And now, with my own script, I got the same kind of "slowdown". So I
> investigated it, and it didn't take long to realize that replacing
> git-hash-object by a simple reimplementation in ruby was *way* faster.
> git-hash-object being more than probably what you do the most when you
> import a remote repository, it is not much of a surprise that forking
> thousands of times is a huge performance waste.
I haven't looked at the times in a while, but I suspect that exec()
is the (much bigger) culprit.
I usually import off remote repositories, so I notice network
latency way before I notice local performance problems with git-svn.
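
For reference, the kind of in-process git-hash-object replacement Mike
describes is small. The following is a minimal sketch in Python, not
Mike's Ruby code or anything from git-svn; the function name
git_blob_sha1 is made up for illustration. It relies only on the fact
that a blob's object name is the SHA-1 of the header "blob <size>\0"
followed by the content.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object name git gives a blob with this content.

    Hypothetical helper: it hashes in-process instead of forking
    git-hash-object once per file. It does not write the object.
    """
    # Object header: type, a space, the decimal size, then a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known object name.
    print(git_blob_sha1(b""))
```

Note that this only computes the name; storing the object (what
git-hash-object -w does) would additionally need the zlib-compressed
data written under .git/objects.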
> So, just for the record, I did a lame hack of git-svn to see what kind
> of speedup could happen in git-svn. You can find this lame hack as a
> patch below. I did some tests (with a 1.5.2.1 release) and here are the
> results, importing only the trunk (192 revisions), with no checkout, and
> redirecting stdout to /dev/null:
>
> original git-svn:
> real 25m1.871s
> user 8m51.593s
> sys 12m31.659s
>
> patched git-svn:
> real 14m45.870s
> user 7m31.928s
> sys 4m1.047s
That's awesome.
> - It might be worth testing if git-cat-file is called a lot. If so,
> implementing a simple git-cat-file equivalent that would work for
> unpacked objects could improve speed.
IIRC git-cat-file is called a lot. Every modified file needs the
original cat-ed to make use of the delta.
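
A git-cat-file equivalent restricted to unpacked (loose) objects could
look roughly like the sketch below. It is Python for illustration, not
code from git-svn, and the name read_loose_object is hypothetical. It
assumes the usual loose-object layout: a zlib-compressed file at
objects/<first 2 hex digits>/<remaining 38>, holding "<type> <size>\0"
followed by the content. Packed objects are not handled, so a caller
would still have to fall back to git-cat-file when it returns None.

```python
import os
import zlib


def read_loose_object(git_dir: str, sha1_hex: str):
    """Return (type, content) for a loose object, or None if there is none.

    Hypothetical helper; it knows nothing about packfiles.
    """
    path = os.path.join(git_dir, "objects", sha1_hex[:2], sha1_hex[2:])
    try:
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
    except FileNotFoundError:
        # Not stored loose (possibly packed): let the caller fall back.
        return None
    # Split "<type> <size>" from the content at the first NUL byte.
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("object header size does not match content length")
    return obj_type.decode("ascii"), body
```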
> The same things obviously apply to git-cvsimport and other scripts
> calling git-hash-object a lot.
Making git-svn use fast-import would be very nice. I've got a bunch
of other git-svn things that I need to work on first, though. Another
option would be allowing Git.pm to access more of the git internals...
However, how well/poorly would fast-import work for incremental
fetches throughout the day?
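
To make the fast-import idea concrete, here is a rough sketch of the
stream an importer would feed to a single long-running git-fast-import
process instead of forking per object. It is Python for illustration,
not a proposed git-svn patch; the helper names (data_cmd, blob_cmd,
commit_cmd) and the committer identity in the usage note are made up.
It covers only the blob, commit, mark, from and M commands, and it does
not settle the incremental-fetch question: the from_ref parameter merely
shows where a later run could name an existing branch tip.

```python
def data_cmd(payload: bytes) -> bytes:
    """Encode a payload as an exact-byte-count 'data' command."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob_cmd(mark: int, content: bytes) -> bytes:
    """Emit a blob and give it a mark so later commits can refer to it."""
    return b"blob\nmark :%d\n" % mark + data_cmd(content)


def commit_cmd(ref: bytes, mark: int, committer: bytes, message: bytes,
               files, from_ref: bytes = None) -> bytes:
    """Emit a commit on ref; files is a list of (path, blob_mark) pairs.

    committer must already be in 'Name <email> <epoch> <tz>' form.
    """
    out = [b"commit " + ref + b"\n",
           b"mark :%d\n" % mark,
           b"committer " + committer + b"\n",
           data_cmd(message)]
    if from_ref is not None:
        # Parent commit: a mark such as b":2" or an existing ref.
        out.append(b"from " + from_ref + b"\n")
    for path, blob_mark in files:
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path))
    out.append(b"\n")
    return b"".join(out)
```

A caller would concatenate blob_cmd(1, b"hi\n") and
commit_cmd(b"refs/heads/master", 2, b"A <a@example.com> 1189062248 -0700",
b"msg\n", [(b"README", 1)]) and write the result to the standard input of
one git-fast-import process.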
--
Eric Wong