git.vger.kernel.org archive mirror
* Subversion integration with git
@ 2010-03-25 14:17 David Michael Barr
  2010-03-25 14:31 ` Erik Faye-Lund
  2010-03-25 18:23 ` Pascal Obry
  0 siblings, 2 replies; 8+ messages in thread
From: David Michael Barr @ 2010-03-25 14:17 UTC (permalink / raw)
  To: git

Hi folks,

As my first posting to the list, I'd like to start by giving a big thank you to all the git developers and maintainers for such a great tool.

Unfortunately, I still have to interact with lesser tools such as Subversion and that is what leads me to post.

I'm employed on a proprietary project which is supported by a large number of open source tools. The 'canonical' source repository is hosted on a Subversion server on the other side of a rather unreliable WAN link. To date I've been using a combination of git-svn, cron, and a handful of bash scripts to handle marshalling commits between our git repositories and the Subversion instance. However, whilst this solution works well for incremental commits, every time a branch is created on the remote repository it's a hassle to synchronise.
So I thought I'd use git-svn and standard layout - this resulted in blasting my link with so many HTTP requests that I got a stern warning from our sysadmin and I'm sure the firm on the other side of the link weren't impressed.
After exploring a few solutions I used SVK to create a local mirror of the repository.

When I pointed git-svn at the local mirror, it took 4 days, a whole lot of RAM and fell over at 90% completion with a checksum error.

When I pointed svn-all-fast-export at the repository it had to skip three commits or would indefinitely spew garbage.

When I pointed svn2git.py at a dump of the repository it successfully imported 50% of commits and then ran at snail's pace, ETA next century.

I decided that I liked the idea of subversion dump in - git fast-import out but it had to scale well.

So I grabbed the git-fast-import documentation and the Subversion dump format documentation and tried to design a data structure that would map well between them and scale linearly with my repository.
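
To illustrate the input side of such a mapping, here is a minimal sketch in Python (not the project's C code) of reading one property block from a Subversion dump. It assumes the plain `K <len>` / `V <len>` / `PROPS-END` layout of the dump format and does not handle the `D` (deleted property) records that delta dumps may contain. The name `parse_props` is hypothetical.

```python
import io


def parse_props(stream):
    """Parse one Subversion dump property block into a dict.

    Keys are decoded to str; values stay as bytes because the dump
    format gives lengths in bytes and values may be binary.
    """
    props = {}
    while True:
        header = stream.readline().rstrip(b"\n")
        if header == b"PROPS-END":
            return props
        if not header.startswith(b"K "):
            raise ValueError("unexpected property header: %r" % header)
        key = stream.read(int(header[2:]))
        stream.readline()  # consume the newline that follows the key
        value_header = stream.readline().rstrip(b"\n")
        if not value_header.startswith(b"V "):
            raise ValueError("unexpected value header: %r" % value_header)
        value = stream.read(int(value_header[2:]))
        stream.readline()  # consume the newline that follows the value
        props[key.decode("utf-8")] = value


# Hypothetical revision properties, laid out as they appear in a dump.
sample = io.BytesIO(
    b"K 7\nsvn:log\nV 12\nInitial rev.\n"
    b"K 10\nsvn:author\nV 5\ndavid\n"
    b"PROPS-END\n"
)
revision_props = parse_props(sample)
```

Reading by byte length rather than by line is what lets multi-line log messages and binary values pass through untouched.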

I started a new project to implement my design and am curious as to how many git users actually care about this kind of problem. While conversion is a one-off for most projects, there are an awful lot of projects currently using Subversion. As the community and tool-chain build around git, many will want to make the transition. I hope to make it far less painful than it has been for me.

My project is still in the preview phase but has enough to import commit-tree structure bar symlinks and executable flags. It imports my 22000+ commit 2.8GB dump in 4 minutes. It is currently 840 non-comment lines of C. I aim to produce output that git-svn can take over from.
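
For readers unfamiliar with the two missing pieces: Subversion records symlinks and executable bits as the node properties `svn:special` and `svn:executable`, while git fast-import expects a file mode on each `M` line. A minimal Python sketch of that translation follows; it is an illustration under those assumptions, not the project's code, and the function names are hypothetical.

```python
def git_mode(props):
    """Return the fast-import file mode for a Subversion file node."""
    if "svn:special" in props:
        return "120000"  # symbolic link
    if "svn:executable" in props:
        return "100755"  # executable file
    return "100644"      # regular file


def git_blob(props, content):
    """Return the blob git should store for a Subversion file node.

    Subversion stores a symlink as a file whose text is
    'link <target>'; git stores only the target path.
    """
    prefix = b"link "
    if "svn:special" in props and content.startswith(prefix):
        return content[len(prefix):]
    return content
```

Keeping the mode and the blob content as two separate functions mirrors the fact that fast-import takes them in different places: the mode on the `M` line, the content in the `data` block.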

Is it worthwhile to start a new project - or would it be better to grok the internals of existing projects and try to make them scale?

Best regards,
David M Barr


* Re: Subversion integration with git
  2010-03-25 14:17 Subversion integration with git David Michael Barr
@ 2010-03-25 14:31 ` Erik Faye-Lund
       [not found]   ` <E560EF9A-AF07-4316-9047-6D1A1802F743@cordelta.com>
  2010-03-25 18:23 ` Pascal Obry
  1 sibling, 1 reply; 8+ messages in thread
From: Erik Faye-Lund @ 2010-03-25 14:31 UTC (permalink / raw)
  To: David Michael Barr; +Cc: git, artagnon, Sverre Rabbelier

On Thu, Mar 25, 2010 at 3:17 PM, David Michael Barr
<david.barr@cordelta.com> wrote:
> Hi folks,
>
> As my first posting to the list, I'd like to start by giving a big thank you to all the git developers and maintainers for such a great tool.
>
> Unfortunately, I still have to interact with lesser tools such as Subversion and that is what leads me to post.
>
> I'm employed on a proprietary project which is supported by a large number of open source tools. The 'canonical' source repository is hosted on a Subversion server on the other side of a rather unreliable WAN link. To date I've been using a combination of git-svn, cron, and a handful of bash scripts to handle marshalling commits between our git repositories and the Subversion instance. However, whilst this solution works well for incremental commits, every time a branch is created on the remote repository it's a hassle to synchronise.
> So I thought I'd use git-svn and standard layout - this resulted in blasting my link with so many HTTP requests that I got a stern warning from our sysadmin and I'm sure the firm on the other side of the link weren't impressed.
> After exploring a few solutions I used SVK to create a local mirror of the repository.
>
> When I pointed git-svn at the local mirror, it took 4 days, a whole lot of RAM and fell over at 90% completion with a checksum error.
>
> When I pointed svn-all-fast-export at the repository it had to skip three commits or would indefinitely spew garbage.
>
> When I pointed svn2git.py at a dump of the repository it successfully imported 50% of commits and then ran at snail's pace, ETA next century.
>
> I decided that I liked the idea of subversion dump in - git fast-import out but it had to scale well.
>
> So I grabbed the git-fast-import documentation and the Subversion dump format documentation and tried to design a data structure that would map well between them and scale linearly with my repository.
>
> I started a new project to implement my design and am curious as to how many git users actually care about this kind of problem. While conversion is once off for most projects - there are an awful number of projects currently using Subversion. As the community and tool-chain builds around git, that will mean many desiring to make the transition. I hope to make it far less painful than it has been for me.
>
> My project is still in the preview phase but has enough to import commit-tree structure bar symlinks and executable flags. It imports my 22000+ commit 2.8GB dump in 4 minutes. It is currently 840 non-comment lines of C. I aim to produce output that git-svn can take over from.
>

Wow, your figures sound very impressive. I'd love to have a look at
it! I've tried to convert similar-sized SVN repos before, but given up
due to the poor performance. So at work I'm currently using git-svn
with only parts of the history imported, and falling back to SVN when
having to dig far in the history (which is not much fun).

> Is it worthwhile to start a new project - or would it be better to grok the internals of existing projects and try to make them scale?
>

I think it falls very close to the native-git-svn Google SoC
project[1], and if you are able to share what you have I'm sure
Ramkumar (I hope you don't mind me CC'ing you, and that I spelled your
name right) would appreciate having a look.

[1]: https://git.wiki.kernel.org/index.php/SoC2010Ideas#A_remote_helper_for_svn

-- 
Erik "kusma" Faye-Lund


* Re: Subversion integration with git
       [not found]   ` <E560EF9A-AF07-4316-9047-6D1A1802F743@cordelta.com>
@ 2010-03-25 17:52     ` Ramkumar Ramachandra
  2010-03-25 23:50       ` David Michael Barr
  2010-03-30 14:05       ` David Michael Barr
  0 siblings, 2 replies; 8+ messages in thread
From: Ramkumar Ramachandra @ 2010-03-25 17:52 UTC (permalink / raw)
  To: David Michael Barr; +Cc: Erik Faye-Lund, Sverre Rabbelier, Git Mailing List

Hi,

> I think it falls very close to the native-git-svn Google SoC
> project[1], and if you are able to share what you have I'm sure
> Ramkumar (I hope you don't mind me CC'ing you, and that I spelled your
> name right) would appreciate having a look.

Yes. Thank you for CC'ing me, Erik.

> Is it worthwhile to start a new project - or would it be better to grok the internals of existing projects and try to make them scale?

Honestly, I've just started looking into this issue, so I'll wait for
someone else to comment on this. As far as interest is concerned, yes:
a lot of people seem to be pretty excited about my GSoC project
proposal [1]. My proposal has more to do with getting native support
working than with building a fantastic SVN importer. I certainly have
neither the time nor the experience to build an SVN importer that's any
better than git-svn.perl in the summer term, and I'm clear that I
don't intend to do this. However, if my proposal gets accepted, I
could work with you to get it integrated into the remote helper that
I'll be building. Depending on the complexity of your project, this
might only be possible at the end of my GSoC term.

[1] http://thread.gmane.org/gmane.comp.version-control.git/142623

-- Ram


* Re: Subversion integration with git
  2010-03-25 14:17 Subversion integration with git David Michael Barr
  2010-03-25 14:31 ` Erik Faye-Lund
@ 2010-03-25 18:23 ` Pascal Obry
  2010-03-28 12:03   ` David Michael Barr
  1 sibling, 1 reply; 8+ messages in thread
From: Pascal Obry @ 2010-03-25 18:23 UTC (permalink / raw)
  To: David Michael Barr; +Cc: git

David,

> My project is still in the preview phase but has enough to import 
> commit-tree structure bar symlinks and executable flags. It imports 
> my 22000+ commit 2.8GB dump in 4 minutes. It is currently 840 
> non-comment lines of C. I aim to produce output that git-svn can
> take over from.

Impressive numbers! I've converted many projects using git-svn and yes,
it is slow. Just curious, does it handle branches? Can it handle
layouts other than the standard one (trunk/branches/tags)? When you
have git-svn-compatible output I would be willing to test it on a project.

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|    http://www.obry.net  -  http://v2p.fr.eu.org
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver keys.gnupg.net --recv-key F949BD3B


* Re: Subversion integration with git
  2010-03-25 17:52     ` Ramkumar Ramachandra
@ 2010-03-25 23:50       ` David Michael Barr
  2010-03-30 14:05       ` David Michael Barr
  1 sibling, 0 replies; 8+ messages in thread
From: David Michael Barr @ 2010-03-25 23:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Erik Faye-Lund, Sverre Rabbelier, Git Mailing List

Hi,

>> I think it falls very close to the native-git-svn Google SoC
>> project[1], and if you are able to share what you have I'm sure
>> Ramkumar (I hope you don't mind me CC'ing you, and that I spelled your
>> name right) would appreciate having a look.
> 
> Yes. Thank you for CC'ing me, Erik.
> 
>> Is it worthwhile to start a new project - or would it be better to grok the internals of existing projects and try to make them scale?
> 
> However, if my proposal gets accepted, I
> could work with you to get it integrated into the remote helper that
> I'll be building. Depending on the complexity of your project, this
> might only be possible at the end of my GSoC term.

From Ramkumar's proposal[1]:

> The distinct components I plan to write are:

> 2. An exporter for SVN repositories, which will extract all the
> relevant revision history and metadata to import into Git.
> 3. A remote helper for Git that takes the data from this SVN exporter,
> and uses git-fast-import to create corresponding commits in Git.

The scope of my project roughly corresponds to these two components.

With regard to licensing issues, I opted to work only with import/export streams so that no linking is required.

In the context of GSoC, I am studying my final year of a Bachelor of Science (Computer Science x 2) at Australian National University, Canberra.
As for FOSS contributions, the last time I released something was when I helped port the iBurst wireless broadband driver (USB interface) [2] a few years ago; the demands of life vs. one's passion.


* Re: Subversion integration with git
  2010-03-25 18:23 ` Pascal Obry
@ 2010-03-28 12:03   ` David Michael Barr
  0 siblings, 0 replies; 8+ messages in thread
From: David Michael Barr @ 2010-03-28 12:03 UTC (permalink / raw)
  To: pascal; +Cc: git

Pascal,

>> My project is still in the preview phase but has enough to import 
>> commit-tree structure bar symlinks and executable flags. It imports 
>> my 22000+ commit 2.8GB dump in 4 minutes. It is currently 840 
>> non-comment lines of C. I aim to produce output that git-svn can
>> take over from.
> 
> Impressive numbers! I've converted many projects using git-svn and yes,
> it is slow. Just curious, does it handle branches? Can it handle
> layouts other than the standard one (trunk/branches/tags)? When you
> have git-svn-compatible output I would be willing to test it on a project.

My initial design target is a one-to-one translation of the subversion history to a single linear git branch. I'm working under the assumption that something like git filter-branch can be used to transform the history to a more logical representation. This should allow any subversion layout to be handled.
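
As a rough picture of what such a one-to-one translation emits, here is a minimal Python sketch that turns one Subversion revision into one `commit` command of a git fast-import stream. It is an illustration only: the `@svn.invalid` address, the fixed `100644` mode and the function names are assumptions, and paths such as `trunk/README` are passed through verbatim rather than split into branches.

```python
def data_block(payload):
    """Format a fast-import 'data' command with an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_for_revision(rev, author, timestamp, log, files,
                        ref=b"refs/heads/master"):
    """Build the fast-import commands for one Subversion revision.

    rev       -- revision number, reused as the mark (must be >= 1)
    author    -- svn:author as bytes
    timestamp -- commit time in seconds since the epoch (UTC)
    log       -- svn:log as bytes
    files     -- dict mapping full repository paths to file contents
    """
    # Subversion only records a user name, so the address is made up.
    ident = b"%s <%s@svn.invalid>" % (author, author)
    parts = [
        b"commit %s\n" % ref,
        b"mark :%d\n" % rev,
        b"committer %s %d +0000\n" % (ident, timestamp),
        data_block(log),
    ]
    # No 'from' line: within one stream, each commit on an existing
    # branch is parented on that branch's current tip, which gives
    # the single linear history.
    for path, content in sorted(files.items()):
        parts.append(b"M 100644 inline %s\n" % path)
        parts.append(data_block(content))
    return b"".join(parts)
```

Calling this once per revision, in order, and concatenating the results would yield a stream in which commit N of the branch corresponds to revision N of the repository.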

When I have git-svn compatible output, I'll proudly announce the first release.


* Re: Subversion integration with git
  2010-03-25 17:52     ` Ramkumar Ramachandra
  2010-03-25 23:50       ` David Michael Barr
@ 2010-03-30 14:05       ` David Michael Barr
  2010-03-30 14:29         ` Ramkumar Ramachandra
  1 sibling, 1 reply; 8+ messages in thread
From: David Michael Barr @ 2010-03-30 14:05 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Erik Faye-Lund, Sverre Rabbelier, Git Mailing List

Hi,

>> I think it falls very close to the native-git-svn Google SoC
>> project[1], and if you are able to share what you have I'm sure
>> Ramkumar (I hope you don't mind me CC'ing you, and that I spelled your
>> name right) would appreciate having a look.
> 
> Yes. Thank you for CC'ing me, Erik.
> 
>> Is it worthwhile to start a new project - or would it be better to grok the internals of existing projects and try to make them scale?
> 
> ... if my proposal gets accepted, I
> could work with you to get it integrated into the remote helper that
> I'll be building. Depending on the complexity of your project, this
> might only be possible at the end of my GSoC term.
> 
> 
> 
> -- Ram

I've started looking at the first piece of the pipeline, reading from a
remote subversion URL. I stumbled upon rsvndump[2], which is
GPLv3+ licensed and promises to produce a Subversion dump from
a remote repository. This could be piped to my utility,
svn-dump-fast-export[3], to produce suitable input for git fast-import.
I believe this would address the first two components of Ram's
proposal and allow more focus to be given to the interesting ones.
That's presuming that I have a feature-complete release by the time
the GSoC project begins.
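
For illustration, the pipeline described above can be sketched as a small Python helper that chains processes the way a shell pipe would. The helper itself is generic; `run_pipeline` is a hypothetical name, and the command lines in the trailing comment are assumptions about how the three tools would be invoked rather than documented usage.

```python
import subprocess


def run_pipeline(commands):
    """Run commands as a pipe: each one reads the previous one's stdout.

    Returns the final command's stdout as bytes and raises
    RuntimeError if any stage exits with a non-zero status.
    """
    procs = []
    upstream = subprocess.DEVNULL
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # Drop our copy of the pipe so the producer notices
            # if the consumer exits early.
            procs[-1].stdout.close()
        upstream = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    codes = [proc.wait() for proc in procs]
    if any(codes):
        raise RuntimeError("pipeline stage failed: %r" % codes)
    return output


# Assumed invocation, with a placeholder URL; check each tool's own
# documentation for its actual arguments before relying on this:
# run_pipeline([
#     ["rsvndump", "http://svn.example.invalid/repo"],
#     ["svn-dump-fast-export"],
#     ["git", "fast-import"],
# ])
```

Streaming between the stages means the dump never has to be written to disk in full, which matters for a multi-gigabyte repository.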

My project is currently under a two-clause BSD style license.
This is primarily because the two projects it derives from were
distributed under the same license, rather than any preference.

As I've included a reference to my project, I'll emphasise that it is a
work in progress, with a handful of known bugs.
At present, symlinks are damaged on update and some files
disappear late in the history of my test repository.
I'm planning a rewrite of the parser once symlinks are complete.

[1] http://thread.gmane.org/gmane.comp.version-control.git/142623
[2] http://rsvndump.sourceforge.net/
[3] http://github.com/barrbrain/svn-dump-fast-export/


* Re: Subversion integration with git
  2010-03-30 14:05       ` David Michael Barr
@ 2010-03-30 14:29         ` Ramkumar Ramachandra
  0 siblings, 0 replies; 8+ messages in thread
From: Ramkumar Ramachandra @ 2010-03-30 14:29 UTC (permalink / raw)
  To: David Michael Barr; +Cc: Erik Faye-Lund, Sverre Rabbelier, Git Mailing List

Hi,

On Tue, Mar 30, 2010 at 7:35 PM, David Michael Barr
<david.barr@cordelta.com> wrote:
> I've started looking at the first piece of the pipeline, reading from a
> remote subversion URL. I stumbled upon rsvndump[2], which is
> GPLv3+ licensed and promises to produce a Subversion dump from
> a remote repository. This could be piped to my utility,
> svn-dump-fast-export[3], to produce suitable input for git fast-import.

In the latest version of my proposal, I've proposed to write a
mirroring tool which will basically be a stripped down version of
svnsync [1]. For testing, I therefore recommend that you use svnsync
instead of rsvndump because the former is included in the official
source tree. If, however, you do a comparison and find that rsvndump is
actually a better alternative, let me know.

> I believe this would address the first two components of Ram's
> proposal and allow more focus to be given to the interesting ones.
> That's presuming that I have a feature-complete release by the time
> the GSoC project begins.

That's fantastic! I suspect you haven't looked at the latest revision
of my proposal. Since Gmane doesn't seem to have indexed it, I just
sent you a mail off the list.

-- Ram
