* git-svn with big subversion repository
From: John Kristian @ 2011-03-02 2:43 UTC (permalink / raw)
To: git@vger.kernel.org
How do you recommend using git to work with branches of a large, busy
subversion repository? In general, how can small teams use git for their
tasks, and use subversion to coordinate with a larger organization?
git-svn has some trouble, I find. For example, this tries to copy the entire
repo starting with revision 1:
    git svn clone --stdlayout svn+ssh://server/repo/project
This would take weeks, I estimate, for my subversion repository.
Choosing a subset of the repository enables git svn clone to cope, but then
git svn fetch will stall after processing a few revisions. For example:
    git svn clone --no-follow-parent --no-minimize-url \
        --branches=branches \
        --ignore-paths="^(?!branches/(TEAM_|RELEASE_))" \
        -r $BASE svn+ssh://server/repo/project
    git svn fetch --no-follow-parent # stalls
I don't know why it stalls. I guess it's doing something that requires processing
the entire subversion repository.
The best I can do is clone each subversion branch into a separate svn-remote
section of the .git/config file, for example:
    git svn clone --no-follow-parent --no-minimize-url \
        --svn-remote=TEAM_FOO --id=TEAM_FOO \
        -r $BASE svn+ssh://server/repo/project/branches/TEAM_FOO
    git svn fetch --no-follow-parent
The clone runs about as long as svn checkout, and the fetch replays the
later revisions briskly. Sadly, the relationship between branches isn't
fetched: git log won't tell me how a given subversion branch was copied from
another. I use svn for that.
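As a rough sketch of the per-branch setup described above, a further branch could be added to the same git repository by writing another svn-remote section by hand. The helper name add_branch_remote and the branch name TEAM_BAR are assumptions made for illustration; the url and fetch keys follow the svn-remote layout that git-svn stores in .git/config.

```shell
#!/bin/sh
# Sketch: register one svn-remote section per Subversion branch.
# add_branch_remote is a hypothetical helper, not a git-svn command.
add_branch_remote() {
    name=$1    # used as both the svn-remote name and the remote ref name
    url=$2     # full URL of the branch directory in Subversion
    git config "svn-remote.$name.url" "$url" &&
    git config "svn-remote.$name.fetch" ":refs/remotes/$name"
}

# Intended use inside an existing git-svn clone (not executed here;
# TEAM_BAR is an assumed branch name and BASE an assumed start revision):
#   add_branch_remote TEAM_BAR svn+ssh://server/repo/project/branches/TEAM_BAR
#   git svn fetch --svn-remote=TEAM_BAR --no-follow-parent -r "$BASE:HEAD"
```

Each branch then gets its own remote ref, but, as noted above, git still records no ancestry between them.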
I'm using git version 1.7.4, git-svn version 1.7.4 (svn 1.6.5), svn version
1.6.0 (r36650) and Mac OS X version 10.6.5. I got git from MacPorts.
- John Kristian
* Re: git-svn with big subversion repository
From: Thomas Ferris Nicolaisen @ 2011-03-02 16:09 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
Hi John,
I've successfully run git svn clone on a repository with about 100k
revisions. The clone was not of the whole repository, but rather a
subdirectory for a project using the trunk/tags/branches structure.
The project is about 200k files and about 4GB.
The initial clone took hours and hours (on my macbook). I basically
had to leave it on over night (the svn server is here on the LAN,
running over https).
The only problem I had was that the clone would occasionally exit (not
stall, as you say). This is a known problem described here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526989
The solution is to just run git svn fetch, as the cloning will pick up
where it stopped. To keep from having to do this yourself, loop the
fetch in a shell script. I blogged about it here:
<http://blog.tfnico.com/2010/07/living-with-subversion-and-git-in.html>
And there are also some more tricks and tips for living with git+svn
here: <http://www.tfnico.com/presentations/git-and-subversion>
You could also investigate how the Apache folks have made their git
mirrors here: http://git.apache.org/ - at least they have an SVN repo
with over a million revisions. I think they did something like
svn-dump + git fast-import, but I couldn't find any details on the
fly.
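A minimal sketch of the "loop the fetch" idea, not the script from the blog post: it simply re-runs a command until it exits successfully. The helper name retry_until_ok, the attempt limit, and the RETRY_DELAY variable are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: re-run a command until it succeeds, with an upper bound on attempts.
# retry_until_ok is a hypothetical helper name, not part of git.
retry_until_ok() {
    max_tries=$1
    shift
    try=1
    until "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts: $*" >&2
            return 1
        fi
        try=$((try + 1))
        # Pause between attempts; RETRY_DELAY (seconds) is an assumed knob.
        sleep "${RETRY_DELAY:-5}"
    done
    return 0
}

# Intended use inside the clone directory (not executed here):
#   retry_until_ok 1000 git svn fetch
```

The attempt limit keeps the loop from spinning forever if the fetch fails for a reason that a restart cannot fix, such as bad credentials.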
* Re: git-svn with big subversion repository
From: Phil Hord @ 2011-03-03 4:13 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
On 03/01/2011 09:43 PM, John Kristian wrote:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for their
> tasks, and use subversion to coordinate with a larger organization?
>
> git-svn has some trouble, I find. For example, this tries to copy the entire
> repo starting with revision 1:
>
> git svn clone --stdlayout svn+ssh://server/repo/project
>
> This would take weeks, I estimate for my subversion repository.
>
> Choosing a subset of the repository enables git svn clone to cope, but then
> git svn fetch will stall after processing a few revisions. For example:
>
> git svn clone --no-follow-parent --no-minimize-url \
> --branches=branches \
> --ignore-paths="^(?!branches/(TEAM_|RELEASE_))" \
> -r $BASE svn+ssh://server/repo/project
> git svn fetch --no-follow-parent # stalls
>
> I don't know why it stalls. I guess it's doing something that requires processing
> the entire subversion repository.
My initial git-svn clone took several days and many restarts. It was
much faster on my laptop. I found out later I had a flaky router and it
was dropping about 20% of my packets. Replaced the router and the clone
dropped to a reasonable couple-of-hours. Is it just me?
You can optimize by cloning specific paths inside the svn repo and then
merging in git later.
Phil
* Re: git-svn with big subversion repository
From: Florian Weimer @ 2011-03-05 10:53 UTC (permalink / raw)
To: git
* John Kristian:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for their
> tasks, and use subversion to coordinate with a larger organization?
I've used svnsync to a local repository and git-svn against that.
This meant that my experiments do not cause excessive load on the
server.
You should definitely coordinate this because this could be considered
leeching IP if the repository is not public.
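A rough sketch of the svnsync setup, assuming the source URL is the repository root and that you are allowed to mirror it. The function names and the paths in the usage comment are illustrative assumptions; svnadmin create, svnsync init and svnsync sync are the standard Subversion commands.

```shell
#!/bin/sh
# Sketch: mirror a Subversion repository locally, then run git-svn against the mirror.
# Function names and example paths are assumptions, not from the original message.

# svnsync sets revision properties on the mirror, which Subversion only
# permits when a pre-revprop-change hook exists and exits 0.
allow_revprop_changes() {
    mirror_dir=$1
    hook="$mirror_dir/hooks/pre-revprop-change"
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    chmod +x "$hook"
}

# Create the mirror and copy every revision from the source repository.
mirror_svn_repo() {
    source_url=$1
    mirror_dir=$2      # must be an absolute path
    svnadmin create "$mirror_dir" &&
    allow_revprop_changes "$mirror_dir" &&
    svnsync init "file://$mirror_dir" "$source_url" &&
    svnsync sync "file://$mirror_dir"
}

# Intended use (not executed here):
#   mirror_svn_repo svn+ssh://server/repo "$HOME/svn-mirror"
#   git svn clone --stdlayout "file://$HOME/svn-mirror/project"
```

If commits are later meant to go back to the real server, the --rewrite-root option of git svn init/clone is worth a look, since the git-svn-id lines would otherwise record the mirror URL.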
* Re: git-svn with big subversion repository
From: Jason Miller @ 2011-03-09 5:53 UTC (permalink / raw)
To: John Kristian; +Cc: git@vger.kernel.org
On Wed, 2 Mar 2011 02:43:23 +0000
John Kristian <jkristian@linkedin.com> wrote:
> How do you recommend using git to work with branches of a large, busy
> subversion repository? In general, how can small teams use git for
> their tasks, and use subversion to coordinate with a larger
> organization?
I don't know if this is the same problem that you are having, but...
I had to clone a repository with 200k revisions and 12000
branches+tags. It was going to take weeks and weeks with a local
svnsync mirror on a high-end workstation with fast disks.
I had never touched Perl code before this, but a friend pointed me
at a good perl profiler, and I found pretty quickly the offending line
of code in git-svn:
    return unless ::verify_ref($self->refname.'^0');   # git-svn line 3515
This was basically doing a
system("git rev-parse --verify some-reference^0")
several times per revision fetched per branch. When you have 12000
branches, that really, really adds up. I made a change that seems to
speed it up by a factor of about 10-20x on my repository, but I'm still
digging around in git to see if I'm doing it correctly.
My basic logic is that if the above one-liner returns true then either
one of the following files will exist:
$ENV{GIT_DIR}/refs/remotes/$refname
$ENV{GIT_DIR}/refs/$refname
$ENV{GIT_DIR}/refs/heads/$refname
or there will be an entry for the reference in
$ENV{GIT_DIR}/packed-refs
Furthermore since packed-refs changes infrequently, you can cache its
contents.
I'm still digging around in the plumbing to see whether this
assumption is true or not. If I find it is true, I'll likely submit a
patch. Now that it "works on my machine" I've backburnered it a bit
since git is more a tool I use than a project I hack on.
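The lookup described above can be sketched in shell as follows. This is not the actual git-svn change (which would be Perl); the function names are hypothetical, GIT_DIR is assumed to point at the .git directory, and the cache is never refreshed, so it only illustrates the idea of replacing a process spawn with file checks.

```shell
#!/bin/sh
# Sketch: decide whether a ref exists without running "git rev-parse".
# ref_probably_exists and load_packed_refs are hypothetical names.

PACKED_REFS_CACHE=
PACKED_REFS_LOADED=no

# Read packed-refs at most once and keep its contents in memory.
load_packed_refs() {
    if [ "$PACKED_REFS_LOADED" = no ]; then
        if [ -f "$GIT_DIR/packed-refs" ]; then
            PACKED_REFS_CACHE=$(cat "$GIT_DIR/packed-refs")
        fi
        PACKED_REFS_LOADED=yes
    fi
}

ref_probably_exists() {
    refname=$1
    # 1. Loose refs: one file per ref.
    for candidate in "refs/remotes/$refname" "refs/$refname" "refs/heads/$refname"; do
        if [ -f "$GIT_DIR/$candidate" ]; then
            return 0
        fi
    done
    # 2. Packed refs: lines of the form "<sha1> <full refname>".
    load_packed_refs
    while read -r sha name; do
        case $name in
            "refs/remotes/$refname"|"refs/$refname"|"refs/heads/$refname")
                return 0 ;;
        esac
    done <<EOF
$PACKED_REFS_CACHE
EOF
    return 1
}
```

Unlike git rev-parse --verify, this does not check that the ref points at a valid commit, which is part of the assumption still being verified.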
-Jason
* Re: git-svn with big subversion repository
From: Jason Miller @ 2011-03-11 0:32 UTC (permalink / raw)
To: John Kristian; +Cc: git
On 18:14 Wed 09 Mar , John Kristian wrote:
Mr Kristian:
> Thanks for sharing your experience. After patching git-svn, were you able to clone your subversion repository?
Indeed I was; it took about 48 hours to do the initial import. However,
I forgot to mention one other important thing that was a problem.
There is a pattern in svn of doing the following:
/trunk/module1
/trunk/module2
/trunk/module3
Then some branches will be like this:
svn cp /trunk/ /branches/mybranch1
and others might be:
svn cp /trunk/module2 /branches/mybranchofmodule2
If this hasn't ever been done on your repository, you can stop reading
now.
There is no way to represent this in Git directly, so the correct thing
to do here is to create a git repository for each module. Now the hard
thing is telling git-svn how to handle this. I ended up writing a
python script that reads in the SVN changelog and finds all of the
children of e.g. /trunk/module1. Any that were copied from /trunk, it
appends "/module1" to the path, and any that were copied from
/trunk/module1, it leaves alone. This then goes in the git
configuration file as the list of branches to fetch.
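A minimal sketch of that path-rewriting rule, written here as shell plus awk rather than the Python script mentioned above, which was not posted. It assumes the input is the output of svn log -v, that branches live directly under /branches, and that paths contain no spaces; the function name module_branch_paths is hypothetical.

```shell
#!/bin/sh
# Sketch: from "svn log -v" output on stdin, list the SVN path that holds
# one module's history in each branch. module_branch_paths is a hypothetical name.
module_branch_paths() {
    module=$1
    awk -v module="$module" '
        # Copy entries in the changed-paths list look like:
        #    A /branches/foo (from /trunk:1234)
        $1 == "A" && $3 == "(from" {
            dest = $2
            src = $4
            sub(/:[0-9]+\)$/, "", src)          # drop the ":REV)" suffix
            n = split(dest, parts, "/")          # "/branches/foo" -> "", "branches", "foo"
            if (n != 3 || parts[2] != "branches") next
            if (src == "/trunk")
                print dest "/" module            # whole trunk was copied: module is a subdirectory
            else if (src == "/trunk/" module)
                print dest                       # module itself was copied: branch root is the module
        }
    '
}

# Intended use (not executed here):
#   svn log -v svn+ssh://server/repo | module_branch_paths module2
```

Each printed path would then become one fetch entry for that module's git repository.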
-Jason