* Re: git-svn with big subversion repository
2011-03-02 2:43 git-svn with big subversion repository John Kristian
@ 2011-03-02 16:09 ` Thomas Ferris Nicolaisen
2011-03-03 4:13 ` Phil Hord
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Thomas Ferris Nicolaisen @ 2011-03-02 16:09 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
Hi John,
I've successfully run git svn clone on a repository with about 100k
revisions. The clone was not of the whole repository, but rather a
subdirectory for a project using the trunk/tags/branches structure.
The project is about 200k files and about 4GB.
The initial clone took many hours on my MacBook. I basically
had to leave it running overnight (the svn server is here on the LAN,
running over https).
The only problem I had was that the clone would occasionally exit (not
stall, as you say). This is a known problem described here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526989
The solution is to just run git svn fetch, as the cloning will pick up
where it stopped. To keep from having to do this yourself, loop the
fetch in a shell script. I blogged about it here:
<http://blog.tfnico.com/2010/07/living-with-subversion-and-git-in.html>
And there are also some more tricks and tips for living with git+svn
here: <http://www.tfnico.com/presentations/git-and-subversion>
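A minimal sketch of such a fetch loop follows. It is not the script from
the blog post; retry_until_ok, MAX_TRIES and RETRY_DELAY are names made
up for this illustration. It simply re-runs a command until it exits
successfully, with a cap so that a permanent failure does not loop forever:

```shell
#!/bin/sh
# Re-run a command until it succeeds, giving up after MAX_TRIES attempts.
# retry_until_ok, MAX_TRIES and RETRY_DELAY are hypothetical names used
# for this sketch; they are not part of git or git-svn.
retry_until_ok() {
    max_tries=${MAX_TRIES:-50}
    try=1
    until "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts: $*" >&2
            return 1
        fi
        try=$((try + 1))
        # Short pause before the next attempt.
        sleep "${RETRY_DELAY:-5}"
    done
}

# Usage, from inside the partially cloned repository:
#   retry_until_ok git svn fetch
```

Because git svn fetch resumes from the last revision it stored, each
retry only has to cover the revisions that are still missing.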
You could also investigate how the Apache folks have made their git
mirrors here: http://git.apache.org/ - at least they have an SVN repo
with over a million revisions. I think they did something like
svn-dump + git fast-import, but I couldn't find any details on the
fly.
On Wed, Mar 2, 2011 at 3:43 AM, John Kristian <jkristian@linkedin.com> wrote:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for their
> tasks, and use subversion to coordinate with a larger organization?
>
> git-svn has some trouble, I find. For example, this tries to copy the entire
> repo starting with revision 1:
>
> git svn clone --stdlayout svn+ssh://server/repo/project
>
> This would take weeks, I estimate for my subversion repository.
>
> Choosing a subset of the repository enables git svn clone to cope, but then
> git svn fetch will stall after processing a few revisions. For example:
>
> git svn clone --no-follow-parent --no-minimize-url \
> --branches=branches \
> --ignore-paths="^(?!branches/(TEAM_|RELEASE_))" \
> -r $BASE svn+ssh://server/repo/project
> git svn fetch --no-follow-parent # stalls
>
> I don't know why it stalls. I guess it's doing something that requires processing
> the entire subversion repository.
>
> The best I can do is clone each subversion branch into a separate svn-remote
> section of the .git/config file, for example:
>
> git svn clone --no-follow-parent --no-minimize-url \
> --svn-remote=TEAM_FOO --id=TEAM_FOO \
> -r $BASE svn+ssh://server/repo/project/branches/TEAM_FOO
> git svn fetch --no-follow-parent
>
> The clone runs about as long as svn checkout, and the fetch replays the
> later revisions briskly. Sadly, the relationship between branches isn't
> fetched: git log won't tell me how a given subversion branch was copied from
> another. I use svn for that.
>
> I'm using git version 1.7.4, git-svn version 1.7.4 (svn 1.6.5), svn version
> 1.6.0 (r36650) and Mac OS X version 10.6.5. I got git from MacPorts.
>
> - John Kristian
* Re: git-svn with big subversion repository
2011-03-02 2:43 git-svn with big subversion repository John Kristian
2011-03-02 16:09 ` Thomas Ferris Nicolaisen
@ 2011-03-03 4:13 ` Phil Hord
2011-03-05 10:53 ` Florian Weimer
2011-03-09 5:53 ` Jason Miller
3 siblings, 0 replies; 6+ messages in thread
From: Phil Hord @ 2011-03-03 4:13 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
On 03/01/2011 09:43 PM, John Kristian wrote:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for their
> tasks, and use subversion to coordinate with a larger organization?
>
> git-svn has some trouble, I find. For example, this tries to copy the entire
> repo starting with revision 1:
>
> git svn clone --stdlayout svn+ssh://server/repo/project
>
> This would take weeks, I estimate for my subversion repository.
>
> Choosing a subset of the repository enables git svn clone to cope, but then
> git svn fetch will stall after processing a few revisions. For example:
>
> git svn clone --no-follow-parent --no-minimize-url \
> --branches=branches \
> --ignore-paths="^(?!branches/(TEAM_|RELEASE_))" \
> -r $BASE svn+ssh://server/repo/project
> git svn fetch --no-follow-parent # stalls
>
> I don't know why it stalls. I guess it's doing something that requires processing
> the entire subversion repository.
My initial git-svn clone took several days and many restarts. It was
much faster on my laptop. I found out later I had a flaky router and it
was dropping about 20% of my packets. Replaced the router and the clone
dropped to a reasonable couple-of-hours. Is it just me?
You can optimize by cloning specific paths inside the svn repo and then
merging in git later.
Phil
* Re: git-svn with big subversion repository
2011-03-02 2:43 git-svn with big subversion repository John Kristian
2011-03-02 16:09 ` Thomas Ferris Nicolaisen
2011-03-03 4:13 ` Phil Hord
@ 2011-03-05 10:53 ` Florian Weimer
2011-03-09 5:53 ` Jason Miller
3 siblings, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2011-03-05 10:53 UTC (permalink / raw)
To: git
* John Kristian:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for their
> tasks, and use subversion to coordinate with a larger organization?
I've used svnsync to a local repository and git-svn against that.
This meant that my experiments do not cause excessive load on the
server.
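A rough sketch of that setup is below, assuming a Unix-like system with
svnadmin, svnsync and git-svn installed. The function names, the DRY_RUN
switch and all paths and URLs are placeholders invented for this
illustration, not something taken from an actual setup:

```shell
#!/bin/sh
# Sketch: mirror a Subversion repository locally with svnsync, then run
# git-svn against the mirror. run, mirror_and_clone and DRY_RUN are
# hypothetical names; every path and URL is supplied by the caller.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mirror_and_clone() {
    source_url=$1    # upstream repository root
    mirror_dir=$2    # absolute path for the local mirror repository
    git_dir=$3       # directory for the git-svn clone
    project=$4       # project path inside the repository

    run svnadmin create "$mirror_dir"
    # svnsync sets revision properties on the mirror, so the mirror needs
    # a pre-revprop-change hook that permits them.
    if [ "${DRY_RUN:-0}" != 1 ]; then
        printf '#!/bin/sh\nexit 0\n' > "$mirror_dir/hooks/pre-revprop-change"
        chmod +x "$mirror_dir/hooks/pre-revprop-change"
    fi
    run svnsync init "file://$mirror_dir" "$source_url"
    run svnsync sync "file://$mirror_dir"
    run git svn clone --stdlayout "file://$mirror_dir/$project" "$git_dir"
}

# Usage (placeholder values):
#   mirror_and_clone svn+ssh://server/repo /srv/svn-mirror ~/work/project project
```

Commits cloned this way record the mirror's file:// URL. git-svn has
--rewrite-root and --use-svnsync-props options intended for this kind of
arrangement; they are left out here and would need checking against the
git-svn documentation before committing back to the real server.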
You should definitely coordinate this, because it could be considered
leeching IP if the repository is not public.
* Re: git-svn with big subversion repository
2011-03-02 2:43 git-svn with big subversion repository John Kristian
` (2 preceding siblings ...)
2011-03-05 10:53 ` Florian Weimer
@ 2011-03-09 5:53 ` Jason Miller
[not found] ` <C99D031D.D0D9%jkristian@linkedin.com>
3 siblings, 1 reply; 6+ messages in thread
From: Jason Miller @ 2011-03-09 5:53 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
On Wed, 2 Mar 2011 02:43:23 +0000
John Kristian <jkristian@linkedin.com> wrote:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for
> their tasks, and use subversion to coordinate with a larger
> organization?
I don't know if this is the same problem that you are having, but...
I had to clone a repository with 200k revisions and 12000
branches+tags. It was going to take weeks and weeks with a local
svnsync mirror on a high-end workstation with fast disks.
I had never touched Perl code before this, but a friend pointed me
at a good Perl profiler, and I pretty quickly found the offending line
of code in git-svn:
return unless ::verify_ref($self->refname.'^0');  # git-svn, line 3515
This was basically doing a
system("git rev-parse --verify some-reference^0")
several times per revision fetched per branch. When you have 12000
branches, that really, really adds up. I made a change that seems to
speed it up by a factor of about 10-20x on my repository, but I'm still
digging around in git to see if I'm doing it correctly.
My basic logic is that if the above one-liner returns true then either
one of the following files will exist:
$ENV{GIT_DIR}/refs/remotes/$refname
$ENV{GIT_DIR}/refs/$refname
$ENV{GIT_DIR}/refs/heads/$refname
or there will be an entry for the reference in
$ENV{GIT_DIR}/packed-refs
Furthermore since packed-refs changes infrequently, you can cache its
contents.
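As a rough illustration of that logic (written in shell rather than in
git-svn's Perl, and not the actual change described here), the check
could look like the sketch below. ref_exists and the two cache variables
are hypothetical names. Unlike git rev-parse --verify, it only tests
that a ref is present, not that it points at a valid object:

```shell
#!/bin/sh
# Sketch: decide whether a ref exists without spawning git.
# ref_exists, PACKED_REFS_CACHE and PACKED_REFS_LOADED are hypothetical
# names for this illustration only.
PACKED_REFS_CACHE=
PACKED_REFS_LOADED=0

ref_exists() {
    refname=$1
    git_dir=${GIT_DIR:-.git}
    nl='
'
    # Read packed-refs once and keep it in memory. The cache goes stale
    # if packed-refs is rewritten, e.g. by git pack-refs or git gc.
    if [ "$PACKED_REFS_LOADED" = 0 ]; then
        if [ -f "$git_dir/packed-refs" ]; then
            PACKED_REFS_CACHE=$(cat "$git_dir/packed-refs")
        fi
        PACKED_REFS_LOADED=1
    fi
    for candidate in "refs/remotes/$refname" "refs/$refname" "refs/heads/$refname"; do
        # Loose ref: a plain file under $GIT_DIR.
        if [ -f "$git_dir/$candidate" ]; then
            return 0
        fi
        # Packed ref: a line of the form "<sha1> <full ref name>".
        case "$nl$PACKED_REFS_CACHE$nl" in
            *" $candidate$nl"*) return 0 ;;
        esac
    done
    return 1
}

# Usage:
#   if ref_exists trunk; then echo "trunk is known"; fi
```

The lookup touches at most three paths and one in-memory string per
call, which is where the saving over running git rev-parse for every
revision and branch would come from.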
I'm still digging around in the plumbing to see whether this
assumption is true. If I find it is, I'll likely submit a
patch. Now that it "works on my machine" I've backburnered it a bit,
since git is more a tool I use than a project I hack on.
-Jason