CVS -> SVN -> Git


* CVS -> SVN -> Git
@ 2007-07-13 14:48 Julian Phillips
  2007-07-13 23:03 ` Michael Haggerty
  2007-07-19 19:15 ` Simon 'corecode' Schubert
  0 siblings, 2 replies; 31+ messages in thread
From: Julian Phillips @ 2007-07-13 14:48 UTC
  To: git

Has anyone managed to successfully use fast-import to import a Subversion
repository that was initially converted from CVS with cvs2svn?

It looks like cvs2svn has created a rather big mess.   It has created 
single commits that change files in more than one branch and/or tag. 
It also creates tags using more than one commit.  Now I come to try and 
import the Subversion history into git and I'm having trouble creating a 
sensible stream to feed into fast-import.
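One way to untangle a revision that touches several branches and tags at once is to group its changed paths by the branch or tag they live under, then emit one single-branch commit per group. A minimal sketch, assuming the conventional `trunk/`, `branches/NAME/`, `tags/NAME/` layout; `branch_of` and `split_revision` are hypothetical helpers, not part of any existing tool.

```python
from collections import defaultdict


def branch_of(path):
    """Hypothetical helper: map an SVN path to (git ref, path in branch).

    Assumes the conventional trunk/, branches/NAME/, tags/NAME/ layout.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/heads/master", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        prefix = "refs/heads/" if parts[0] == "branches" else "refs/tags/"
        return prefix + parts[1], "/".join(parts[2:])
    raise ValueError("path outside the assumed layout: " + path)


def split_revision(changed_paths):
    """Hypothetical helper: group the paths changed by one SVN revision
    by branch/tag, so each group can become its own git commit."""
    groups = defaultdict(list)
    for path in changed_paths:
        ref, rel = branch_of(path)
        groups[ref].append(rel)
    return dict(groups)


# A revision touching trunk, a branch and a tag at the same time:
print(split_revision(["trunk/src/a.c", "branches/B1/src/a.c", "tags/T1/README"]))
# {'refs/heads/master': ['src/a.c'], 'refs/heads/B1': ['src/a.c'], 'refs/tags/T1': ['README']}
```

Whether a tag group becomes a separate commit or is turned into a fast-import `tag` command pointing at an existing commit is a design choice this sketch leaves open.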

I'm trying to use fast-import because git-svnimport creates an incorrect 
repository that is missing files and even whole directories (I suppose 
this could be due to the confusion from cvs2svn), and git-svn is a) _way_ 
too slow, b) doesn't do merges and c) munges the commit comments.

-- 
Julian

  ---
Riffle West Virginia is so small that the Boy Scout had to double as the
town drunk.


* Re: CVS -> SVN -> Git
  2007-07-13 14:48 CVS -> SVN -> Git Julian Phillips
@ 2007-07-13 23:03 ` Michael Haggerty
  2007-07-14  5:30   ` Martin Langhoff
  2007-07-19 19:15 ` Simon 'corecode' Schubert
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2007-07-13 23:03 UTC
  To: Julian Phillips; +Cc: git

Julian Phillips wrote:
> Has anyone managed to successfully import a Subversion repository that
> was initially imported from CVS using cvs2svn using fast-import?
> 
> It looks like cvs2svn has created a rather big mess.   It has created
> single commits that change files in more than one branch and/or tag. It
> also creates tags using more than one commit.  Now I come to try and
> import the Subversion history into git and I'm having trouble creating a
> sensible stream to feed into fast-import.

I'm the main cvs2svn developer.  Obviously, the tool is intended to
convert to Subversion, but there are ways to tune it to make its output
a little bit more git-friendly.

[Please note that both CVS and SVN allow changes to multiple
tags/branches in a single commit and creating tags using more than one
commit.  That is why cvs2svn converts these repository "features" 1:1 by
default.]

Release 2.0.0-rc1 of cvs2svn (released today) has a
--no-cross-branch-commits option that prevents commits that affect more
than one branch.  For multiproject conversions, the
"ctx.cross_project_commits" option might also be useful.  (The latter is
only available if you start cvs2svn with an --options file.)

The new cvs2svn release is also more intelligent about determining the
most likely source branch from which a tag/branch was created.  This
does not eliminate the creation of tags from more than one revision, but
it should reduce its frequency.  If your repository uses any vendor
branches, you might also consider --exclude'ing them.  In the new
cvs2svn version, this causes vendor revisions to be grafted onto trunk
and thereby eliminates another common cause of multiple-source
branches/tags.
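To measure how often a conversion still produces such multiple-source tags, one could record, for every tag, which source revision each file was copied from and flag tags that need more than one. The input mapping is a hypothetical format chosen for illustration; cvs2svn does not necessarily expose its data this way.

```python
def multi_source_tags(tag_sources):
    """Return the tags whose files were copied from more than one revision.

    tag_sources maps tag name -> {file path: source revision number}
    (a hypothetical input format). A tag listed here cannot be created by
    a single copy, so SVN needs several copies and git a synthetic commit.
    """
    return sorted(
        tag
        for tag, per_file in tag_sources.items()
        if len(set(per_file.values())) > 1
    )


sources = {
    "REL_1_0": {"a.c": 41, "b.c": 41},  # one source revision: a clean copy
    "REL_1_1": {"a.c": 57, "b.c": 63},  # two source revisions
}
print(multi_source_tags(sources))  # ['REL_1_1']
```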

Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
what it would take to write some other back ends for cvs2svn--turning
it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
inferring the most plausible history of the repository from CVS's
sketchy, incomplete, idiomatic, and often corrupt data.  This work
should also be useful for a cvs2git or cvs2hg or cvs2baz or ...

I haven't played with a distributed SCM yet, but if somebody would be
interested in working with me on this please let me know.

Michael


* Re: CVS -> SVN -> Git
  2007-07-13 23:03 ` Michael Haggerty
@ 2007-07-14  5:30   ` Martin Langhoff
  2007-07-14 17:09     ` Michael Haggerty
  2007-07-19 19:18     ` Simon 'corecode' Schubert
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Langhoff @ 2007-07-14  5:30 UTC
  To: Michael Haggerty; +Cc: Julian Phillips, git

On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
> what it would take to write some other back ends for cvs2svn--turning
> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
> inferring the most plausible history of the repository from CVS's
> sketchy, incomplete, idiomatic, and often corrupt data.  This work
> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...

Great to hear that. I'm game if we can do something in this direction
- surely we can make it talk to fastimport ;-)

Does cvs2svn handle incremental imports, remembering any "guesses"
taken earlier? Last time I looked at it, it had far better logic than
cvsps, but it didn't do incremental imports, and repeated imports done
at different times would "guess" different branching points for new
branches, so it _really_ didn't support incrementals
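One way to make repeated runs agree is to persist each guess the first time it is made and reuse it on later runs. The sketch below does that with a small JSON file; `choose_parent`, the cache file, and the vote-count input are all hypothetical and do not describe how cvs2svn or cvsps work.

```python
import json
import os


def choose_parent(branch, votes, cache_path):
    """Hypothetical helper: return the parent chosen for `branch`,
    reusing any earlier choice.

    votes maps candidate parent -> score for this run. The first time a
    branch is seen, the highest-scoring candidate is recorded in the JSON
    cache at cache_path; later runs return the recorded choice even if new
    commits would now shift the votes.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if branch not in cache:
        cache[branch] = max(votes, key=votes.get)
        with open(cache_path, "w") as f:
            json.dump(cache, f, indent=2, sort_keys=True)
    return cache[branch]
```

The trade-off is that an early wrong guess is also kept on every later run, so a real tool would need some way to override recorded choices.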

cheers,



m


* Re: CVS -> SVN -> Git
  2007-07-14  5:30   ` Martin Langhoff
@ 2007-07-14 17:09     ` Michael Haggerty
  2007-07-14 17:32       ` Chris Shoemaker
                         ` (3 more replies)
  2007-07-19 19:18     ` Simon 'corecode' Schubert
  1 sibling, 4 replies; 31+ messages in thread
From: Michael Haggerty @ 2007-07-14 17:09 UTC
  To: Martin Langhoff; +Cc: Julian Phillips, git, dev

Martin Langhoff wrote:
> On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
>> what it would take to write some other back ends for cvs2svn--turning
>> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
>> inferring the most plausible history of the repository from CVS's
>> sketchy, incomplete, idiomatic, and often corrupt data.  This work
>> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...
> 
> Great to hear that. I'm game if we can do something in this direction
> - surely we can make it talk to fastimport ;-)

We added some hooks to cvs2svn 2.0 to start working in this direction.
But I don't really know what information is needed for a git import.
One quick-and-dirty idea that I had was to have cvs2svn output
information compatible with cvsps's output, as I believe that several
tools rely on cvsps to do the dirty work and so could perhaps be
persuaded to use cvs2svn out of the box.

> Does cvs2svn handle incremental imports, remembering any "guesses"
> taken earlier? Last time I looked at it, it had far better logic than
> cvsps, but it didn't do incremental imports, and repeated imports done
> at different times would "guess" different branching points for new
> branches, so it _really_ didn't support incrementals

That's correct; cvs2svn does not support incremental conversion at all
(at least not yet).

Michael


* Re: CVS -> SVN -> Git
  2007-07-14 17:09     ` Michael Haggerty
@ 2007-07-14 17:32       ` Chris Shoemaker
  2007-07-14 20:01         ` Michael Haggerty
  2007-07-14 18:14       ` Steffen Prohaska
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Chris Shoemaker @ 2007-07-14 17:32 UTC
  To: Michael Haggerty; +Cc: Martin Langhoff, Julian Phillips, git, dev

On Sat, Jul 14, 2007 at 07:09:30PM +0200, Michael Haggerty wrote:
> Martin Langhoff wrote:
> > On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> >> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
> >> what it would take to write some other back ends for cvs2svn--turning
> >> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
> >> inferring the most plausible history of the repository from CVS's
> >> sketchy, incomplete, idiomatic, and often corrupt data.  This work
> >> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...
> > 
> > Great to hear that. I'm game if we can do something in this direction
> > - surely we can make it talk to fastimport ;-)
> 
> We added some hooks to cvs2svn 2.0 to start working in this direction.
> But I don't really know what information is needed for a git import.
> One quick-and-dirty idea that I had was to have cvs2svn output
> information compatible with cvsps's output, as I believe that several
> tools rely on cvsps to do the dirty work and so could perhaps be
> persuaded to use cvs2svn out of the box.

Depending on how difficult that is, it might be very useful, even
if it's not the best way to interface with fast-import (which I
suspect it's not).  I, for one, would be interested to know how
cvs2svn's output compared to CVSps's, especially w.r.t. detecting each
branch's parent.

Perhaps one is always more correct than the other, but if not, I bet
that seeing the differences using the same format would help to
improve either one.

-chris


* Re: CVS -> SVN -> Git
  2007-07-14 17:09     ` Michael Haggerty
  2007-07-14 17:32       ` Chris Shoemaker
@ 2007-07-14 18:14       ` Steffen Prohaska
  2007-07-15  2:22         ` Shawn O. Pearce
  2007-07-14 19:52       ` Eric S. Raymond
  2007-07-15 23:09       ` Scott Lamb
  3 siblings, 1 reply; 31+ messages in thread
From: Steffen Prohaska @ 2007-07-14 18:14 UTC
  To: Michael Haggerty, Shawn Pearce, Simon Hausmann
  Cc: Martin Langhoff, Julian Phillips, Git Mailing List, dev


On Jul 14, 2007, at 7:09 PM, Michael Haggerty wrote:

> Martin Langhoff wrote:
>> On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
>>> what it would take to write some other back ends for cvs2svn--turning
>>> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
>>> inferring the most plausible history of the repository from CVS's
>>> sketchy, incomplete, idiomatic, and often corrupt data.  This work
>>> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...
>>
>> Great to hear that. I'm game if we can do something in this direction
>> - surely we can make it talk to fastimport ;-)
>
> We added some hooks to cvs2svn 2.0 to start working in this direction.
> But I don't really know what information is needed for a git import.
> One quick-and-dirty idea that I had was to have cvs2svn output
> information compatible with cvsps's output, as I believe that several
> tools rely on cvsps to do the dirty work and so could perhaps be
> persuaded to use cvs2svn out of the box.

From my understanding, piping data to git fast-import would be
a sane gateway to git. The input format of fast-import is documented
in [1].
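As a rough illustration of that input format, the function below writes a single commit with inline file contents. It assumes UTF-8 text, a fixed `+0000` time zone and mode 644 for every file; `fast_import_commit` is a hypothetical helper, and the fast-import documentation [1] remains the authority on the grammar.

```python
def fast_import_commit(ref, mark, name, email, when, message, files, parent=None):
    """Hypothetical helper: serialize one commit in fast-import's language.

    files maps path -> text content; every file is written inline with
    mode 644. `parent` is the mark of the previous commit on the branch
    (None for the first commit). `when` is seconds since the epoch.
    """

    def data(text):
        raw = text.encode("utf-8")
        # 'data <byte count>' followed by exactly that many bytes; the
        # trailing newline is optional in the grammar.
        return b"data %d\n" % len(raw) + raw + b"\n"

    out = [
        b"commit %s\n" % ref.encode(),
        b"mark :%d\n" % mark,
        b"committer %s <%s> %d +0000\n" % (name.encode(), email.encode(), when),
        data(message),
    ]
    if parent is not None:
        out.append(b"from :%d\n" % parent)
    for path, content in sorted(files.items()):
        out.append(b"M 644 inline %s\n" % path.encode())
        out.append(data(content))
    return b"".join(out)


stream = fast_import_commit(
    "refs/heads/master", 1, "Jane Doe", "jane@example.com", 1184338080,
    "Initial import", {"README": "hello\n"},
)
```

Writing such byte strings to the standard input of `git fast-import` inside a freshly `git init`-ed repository should produce the corresponding branch. Paths that begin with a double quote or contain a newline would need C-style quoting, which this sketch omits.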

Maybe Shawn Pearce has some comments on that. Shawn did most
(maybe all) of the work on git-fast-import.

Simon Hausmann wrote a p4 importer that uses fast-import as
its backend. Maybe Simon can give hints on how to get started.

I have no experience with either git-fast-import or the p4
importer but would be happy to test any improved way of importing
cvs to git. I experienced problems using git-cvsimport on a rather
large cvs repository, so it would be a real test of the superior
capabilities of cvs2svn.

	Steffen


[1] http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html


* Re: CVS -> SVN -> Git
  2007-07-14 17:09     ` Michael Haggerty
  2007-07-14 17:32       ` Chris Shoemaker
  2007-07-14 18:14       ` Steffen Prohaska
@ 2007-07-14 19:52       ` Eric S. Raymond
  2007-07-14 20:58         ` Junio C Hamano
                           ` (2 more replies)
  2007-07-15 23:09       ` Scott Lamb
  3 siblings, 3 replies; 31+ messages in thread
From: Eric S. Raymond @ 2007-07-14 19:52 UTC
  To: Michael Haggerty; +Cc: Martin Langhoff, Julian Phillips, git, dev

Michael Haggerty <mhagger@alum.mit.edu>:
> Martin Langhoff wrote:
> > On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> >> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
> >> what it would take to write some other back ends for cvs2svn--turning
> >> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
> >> inferring the most plausible history of the repository from CVS's
> >> sketchy, incomplete, idiomatic, and often corrupt data.  This work
> >> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...
> > 
> > Great to hear that. I'm game if we can do something in this direction
> > - surely we can make it talk to fastimport ;-)
> 
> We added some hooks to cvs2svn 2.0 to start working in this direction.

Excuse me, I missed Michael Haggerty's original post. But as it
happens I've been doing quite a lot of work with VCSes and migration
tools recently and I have some opinions and experience that I think are
relevant.

In slightly more detail: I just finished forward-porting sccs2rcs to
Python, I've been moving some Subversion-hosted stuff to Mercurial,
and I've been thinking about writing a simple rcs2svn because the last
time I tried using cvs2svn on a large RCS history (the Jargon File, as
it happens, a couple years back) it did a very poor job of coalescing
related commits without the CVS metadata.  I'm going to hope 2.0.0 has
fixed that; I'll experiment and see.

Also, I'm in the process of rewriting Emacs VC mode and testing it
with three different VCSes. One consequence is that the Subversion
support in Emacs is going to cease sucking badly in the very near
future -- the VC-mode rewrite is giving VC mode the ability to make
atomic fileset commits if the underlying VCS will support them.  I'm
putting the finishing touches on that code today, as it happens.

Another consequence is that Mercurial and git support will get really
good, oh, about ten minutes after the new Subversion backend lands.
The blocker on all three was the same weakness in the engine of VC
mode, for which I was (alas) responsible as its original author and
*which I have now fixed*.

So, I hear about plans to make cvs2svn generate something other than
Subversion, and here's my instant reaction:

	    	       	   DON'T DO IT!

This is not because I think Subversion is some kind of final answer to the
VCS problem.  Far from it -- I'm moving towards Mercurial.  No, the
real reason I think this would be a waste of time is subtler than that.

Subversion, by design, is very good at capturing the metadata from
SCCS and RCS and the various CVS variants floating around.  In fact,
lifting from those into Subversion is basically lossless - the real
problems are that (a) as Michael notes, the data you're losslessly
lifting is sketchy, and (b) as I've noted, you have to use heuristics
to coalesce file histories into changesets and those don't always make
the links they should.
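The simplest form of the coalescing heuristic in (b), similar in spirit to what cvsps does, groups per-file revisions that share an author and log message and fall within a short time window. This is not cvs2svn's algorithm; the `FileRev` type, `coalesce` and the 300-second default window are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileRev:
    """Hypothetical record for one RCS/CVS file revision."""
    path: str
    rev: str
    author: str
    log: str
    time: int  # commit time, seconds since the epoch


def coalesce(revs, window=300):
    """Group per-file revisions into changesets.

    A revision joins the currently open changeset for its (author, log)
    pair if it is at most `window` seconds after that changeset's latest
    revision and does not touch a file already in it; otherwise it starts
    a new changeset.
    """
    changesets = []
    open_sets = {}  # (author, log) -> changeset currently being built
    for r in sorted(revs, key=lambda r: r.time):
        key = (r.author, r.log)
        cs = open_sets.get(key)
        if (cs is None or r.time - cs[-1].time > window
                or any(x.path == r.path for x in cs)):
            cs = []
            changesets.append(cs)
            open_sets[key] = cs
        cs.append(r)
    return changesets
```

Generic, frequently reused log messages from the same author within the window are one case where a heuristic like this can merge commits that were really separate.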

That being the case, two-step conversion with tools that import CVS to
SVN and export from SVN to whatever actually works extremely well.
I'm speaking from direct recent experience here, not just theory. In
fact, it works so well that I'm convinced a tool for direct
conversion from CVS to the third-generation systems would be misplaced
effort.  

I'd much rather see the effort go into improving import to Subversion
from CVS and older, cruftier systems.  Subversion is like the Heinlein quote
about low Earth orbit being halfway to anywhere -- once your code and 
metadata are there, export to advanced alien VCSes is easy.  So, Michael,
I know it's nice to think about building space probes that can go direct
to the aliens -- but please concentrate on building a better heavy-lift
vehicle, because low Earth orbit is the hard part.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: CVS -> SVN -> Git
  2007-07-14 17:32       ` Chris Shoemaker
@ 2007-07-14 20:01         ` Michael Haggerty
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2007-07-14 20:01 UTC
  To: Chris Shoemaker; +Cc: Martin Langhoff, Julian Phillips, git, dev

Chris Shoemaker wrote:
> [...] I, for one, would be interested to know how
> cvs2svn's output compared to CVSps's, especially w.r.t. detecting each
> branch's parent.

The problem, as I'm sure you are aware, is that CVS does not record
unambiguously the parent of a branch.  For example, the following two
situations are indistinguishable from the data stored in CVS:

1. Create BRANCH1 from trunk, then create BRANCH2 from BRANCH1 before
making any commits to BRANCH1

2. Create BRANCH1 from trunk, then create BRANCH2 from the same trunk
revision.

Older versions of cvs2svn would always create both branches from trunk.
cvs2svn 2.0 gathers statistics about the "possible parents" of each
symbol across multiple files.  If BRANCH1 and BRANCH2 occur in another
file in a context that makes it clear that BRANCH2 was created from
BRANCH1 (i.e., because a revision was committed to BRANCH1 before
BRANCH2 was created), then it attempts to use BRANCH1 as the parent of
BRANCH2 in all files.

So yes, cvs2svn is somewhat intelligent about determining the correct
branch ancestry.
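The cross-file voting described above can be sketched as follows: each file contributes the set of symbols that could be the branch's parent in that file, and the candidate possible in the most files wins. This is a simplified illustration with a hypothetical input format and helper name, not cvs2svn's actual code.

```python
from collections import Counter


def infer_parent(possible_parents_per_file):
    """Hypothetical helper: pick a branch's parent from per-file candidates.

    Each element is the set of symbols that could be the parent in one
    file. The candidate possible in the most files wins; ties are broken
    alphabetically so the result is deterministic.
    """
    votes = Counter()
    for candidates in possible_parents_per_file:
        votes.update(set(candidates))
    best = max(votes.values())
    return min(p for p, n in votes.items() if n == best)


# In file1, BRANCH2 could have come from trunk or BRANCH1 (situations 1 and
# 2 look the same); in file2 a commit on BRANCH1 preceded BRANCH2, so only
# BRANCH1 fits.
print(infer_parent([{"trunk", "BRANCH1"}, {"BRANCH1"}]))  # BRANCH1
```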

Michael


* Re: CVS -> SVN -> Git
  2007-07-14 19:52       ` Eric S. Raymond
@ 2007-07-14 20:58         ` Junio C Hamano
  2007-07-14 21:50         ` Oswald Buddenhagen
  2007-07-14 22:19         ` Michael Haggerty
  2 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2007-07-14 20:58 UTC
  To: esr; +Cc: Michael Haggerty, Martin Langhoff, Julian Phillips, git, dev

esr@thyrsus.com (Eric S. Raymond) writes:

> So, I hear about plans to make cvs2svn generate something other than
> Subversion, and here's my instant reaction:
>
> 	    	       	   DON'T DO IT!
>
> This is not because I think Subversion is some kind of final answer to the
> VCS problem.  Far from it -- I'm moving towards Mercurial.  No, the
> real reason I think this would be a waste of time is subtler than that.
>
> Subversion, by design, is very good at capturing the metadata from
> SCCS and RCS and the various CVS variants floating around.  In fact,
> lifting from those into Subversion is basically lossless - the real
> problems are that (a) as Michael notes, the data you're losslessly
> lifting is sketchy, and (b) as I've noted, you have to use heuristics
> to coalesce file histories into changesets and those don't always make
> the links they should.

Converting to Subversion might be lossless, but is it really the
most convenient intermediate format for other people to convert
further from?

Even after xxx2svn overcomes the problems (a) and (b) you noted
above, my impression has been that svn2yyy needs to work harder
than necessary to grok the branches/ and tags/ directories that are
artificially flattened, only because Subversion does not do branches
or tags, but just represents them as copies.
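To see what that extra work involves: an svn2yyy tool has to inspect each copy and decide whether it is really a branch or tag creation. A minimal sketch, assuming the standard `trunk/`, `branches/`, `tags/` layout and treating only copies of a whole branch root as branch or tag creation; `classify_copy` is a hypothetical helper.

```python
import re

# A branch root under the assumed layout: trunk, branches/NAME or tags/NAME.
_ROOT = re.compile(r"^(trunk|branches/[^/]+|tags/[^/]+)$")


def classify_copy(dest, src):
    """Hypothetical helper: interpret an SVN copy of src to dest.

    Returns ("branch" or "tag", new name, source name) when a whole
    branch root is copied to a new branch or tag root, and None for any
    other copy, which is treated as an ordinary file-level copy.
    """
    dest, src = dest.strip("/"), src.strip("/")
    if dest == "trunk" or not (_ROOT.match(dest) and _ROOT.match(src)):
        return None
    kind, name = dest.split("/", 1)
    source = "trunk" if src == "trunk" else src.split("/", 1)[1]
    return ("tag" if kind == "tags" else "branch", name, source)
```

Tags assembled from several partial copies, the case Julian ran into, fall through to `None` here and need extra handling.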


* Re: CVS -> SVN -> Git
  2007-07-14 19:52       ` Eric S. Raymond
  2007-07-14 20:58         ` Junio C Hamano
@ 2007-07-14 21:50         ` Oswald Buddenhagen
  2007-07-14 22:19         ` Michael Haggerty
  2 siblings, 0 replies; 31+ messages in thread
From: Oswald Buddenhagen @ 2007-07-14 21:50 UTC
  To: Eric S. Raymond; +Cc: Martin Langhoff, Julian Phillips, git, dev

On Sat, Jul 14, 2007 at 03:52:52PM -0400, Eric S. Raymond wrote:
> That being the case, two-step conversion with tools that import CVS to
> SVN and export from SVN to whatever actually works extremely well.
>
well, yes. hoooowever ... you are missing a few details:
- conversion time. until we have incremental conversions, this is
  absolutely critical to many organizations.
- psychology. cvs2xxx is simpler than cvs2svn + svn2xxx. it's also sort
  of a mindset thing. don't underestimate this.

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.


* Re: CVS -> SVN -> Git
  2007-07-14 19:52       ` Eric S. Raymond
  2007-07-14 20:58         ` Junio C Hamano
  2007-07-14 21:50         ` Oswald Buddenhagen
@ 2007-07-14 22:19         ` Michael Haggerty
  2007-07-14 22:44           ` Karl Fogel
                             ` (2 more replies)
  2 siblings, 3 replies; 31+ messages in thread
From: Michael Haggerty @ 2007-07-14 22:19 UTC
  To: esr; +Cc: Martin Langhoff, Julian Phillips, git, dev

Eric S. Raymond wrote:
> In slightly more detail: I just finished forward-porting sccs2rcs to
> Python, I've been moving some Subversion-hosted stuff to Mercurial,
> and I've been thinking about writing a simple rcs2svn because the last
> time I tried using cvs2svn on a large RCS history (the Jargon File, as
> it happens, a couple years back) it did a very poor job of coalescing
> related commits without the CVS metadata.  I'm going to hope 2.0.0 has
> fixed that; I'll experiment and see.

Could you give a quick summary of the relevant differences between CVS
and RCS files in this context?  Then I'd be happy to try to figure out
how bad the situation still is today, and whether it can be easily improved.

> [...]
> So, I hear about plans to make cvs2svn generate something other than
> Subversion, and here's my instant reaction:
> 
> 	    	       	   DON'T DO IT!
> 
> This is not because I think Subversion is some kind of final answer to the
> VCS problem.  Far from it -- I'm moving towards Mercurial.  No, the
> real reason I think this would be a waste of time is subtler than that.
> 
> Subversion, by design, is very good at capturing the metadata from
> SCCS and RCS and the various CVS variants floating around.  In fact,
> lifting from those into Subversion is basically lossless - the real
> problems are that (a) as Michael notes, the data you're losslessly
> lifting is sketchy, and (b) as I've noted, you have to use heuristics
> to coalesce file histories into changesets and those don't always make
> the links they should.
> 
> That being the case, two-step conversion with tools that import CVS to
> SVN and export from SVN to whatever actually works extremely well.

Other people have complained about having to convert from SVN to
distributed SCMs, because the SVN model doesn't map so easily to their
favorite.

You are basically suggesting that an SVN repository is the best lingua
franca of the SCM world, which I don't believe.  The CVS history *does*
have to be deformed a bit to fit into SVN, and an svn2xxx converter
would have to undo the deformation.

My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
to be split conceptually into two tools:

cvs2<abstract_description_of_cvs_history>, whose job it is to determine
the most likely "true" CVS history based on the data stored in the CVS
repository, and

<abstract_description_of_cvs_history>2svn

Then later write

<abstract_description_of_cvs_history>2git
<abstract_description_of_cvs_history>2hg

etc.

The first split is partly done in cvs2svn 2.0.  And I naively imagine
that writing the new output back ends won't be all that much work.
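The proposed split might be organized around a VCS-neutral changeset record that every output back end consumes. The names below (`Changeset`, `Backend`, `SummaryBackend`, `convert`) are hypothetical and are not cvs2svn's internal API; the toy back end only records a summary line per changeset.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass
class Changeset:
    """One inferred commit, independent of any target VCS (hypothetical)."""
    branch: str
    author: str
    time: int
    log: str
    # path -> new file content, or None if the file is deleted
    changes: Dict[str, Optional[str]] = field(default_factory=dict)


class Backend(Protocol):
    def emit(self, cs: Changeset) -> None: ...


class SummaryBackend:
    """Toy back end; a real one would write an svn dump or fast-import stream."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def emit(self, cs: Changeset) -> None:
        self.lines.append(f"{cs.branch} <- {cs.log!r}, changes={len(cs.changes)}")


def convert(history: Iterable[Changeset], backend: Backend) -> None:
    """Feed the inferred history to a back end in commit-time order."""
    for cs in sorted(history, key=lambda c: c.time):
        backend.emit(cs)
```

Under this arrangement, adding a target means writing another class with an `emit` method, which is one sense in which new back ends could stay small.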

Michael


* Re: CVS -> SVN -> Git
  2007-07-14 22:19         ` Michael Haggerty
@ 2007-07-14 22:44           ` Karl Fogel
  2007-07-14 23:23           ` David Frech
  2007-07-15  1:39           ` Eric S. Raymond
  2 siblings, 0 replies; 31+ messages in thread
From: Karl Fogel @ 2007-07-14 22:44 UTC
  To: Michael Haggerty; +Cc: esr, Martin Langhoff, Julian Phillips, git, dev

Michael Haggerty <mhagger@alum.mit.edu> writes:
> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
> to be split conceptually into two tools:
>
> cvs2<abstract_description_of_cvs_history>, whose job it is to determine
> the most likely "true" CVS history based on the data stored in the CVS
> repository, and
>
> <abstract_description_of_cvs_history>2svn
>
> Then later write
>
> <abstract_description_of_cvs_history>2git
> <abstract_description_of_cvs_history>2hg
>
> etc.
>
> The first split is partly done in cvs2svn 2.0.  And I naively imagine
> that writing the new output back ends won't be all that much work.

I think an intermediate interchange format is the right way to go.

But, isn't this what VCP / RevML is all about?  Perhaps RevML is
already suited to be that interchange format... (Haven't looked at it
in detail, just pointing out that there has at least been an attempt
to reinvent this wheel already :-) ).

-Karl


* Re: CVS -> SVN -> Git
  2007-07-14 22:19         ` Michael Haggerty
  2007-07-14 22:44           ` Karl Fogel
@ 2007-07-14 23:23           ` David Frech
  2007-07-15  2:30             ` Shawn O. Pearce
  2007-07-15 11:48             ` Michael Haggerty
  2007-07-15  1:39           ` Eric S. Raymond
  2 siblings, 2 replies; 31+ messages in thread
From: David Frech @ 2007-07-14 23:23 UTC
  To: Michael Haggerty; +Cc: esr, Martin Langhoff, Julian Phillips, git, dev

Now that this party is really rollicking, I think I'll join in. ;-)

I have a modest svn repo (about 800 commits) that contains fifteen or
so small projects. It started life as a CVS repo, and as the projects
grew and changed, and as I learned more about CVS, things got moved
around. Later, when I got interested in svn (in 2005) I converted the
repo, using cvs2svn. It got a few things wrong - mostly, that it
thought there was one project in the repo, and created toplevel
trunk/, branches/, and tags/ directories, and lumped everything below
these.

So, in svn, I moved things around some more.

Now I want to switch to git. I've since added enough to svn that there
is no option but to use the svn repo as my source. git-svnimport
doesn't work for me because its idea of the structure of my repo is
too limited. I looked around, stumbled over fast-import, and got
hooked on the idea of using it. It seemed simple enough... I wrote a
350-line Lua (!!) program that parses the svn dump file and creates a
commit stream for fast-import.

It took a day and a half to get the svn dump parsing right (it's an
egregiously bad format) but only a couple of hours to write the
fast-import backend.

The code "works" in the sense that it can read an svn dump and create
a git repo that looks reasonable, but it misses a few things, like
properly inferring branch creation from the "copyfrom" info in the svn
dump.
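For the branch-creation part, the relevant information sits in the `Node-copyfrom-path` and `Node-copyfrom-rev` headers of each node record in the dump. A minimal sketch in Python (rather than Lua) that walks the records, skipping property and text content by its `Content-length`, and reports copies; `read_records` and `copies` are hypothetical helpers that ignore everything else in the format.

```python
def read_records(stream):
    """Hypothetical helper: yield the headers of each svn dump record.

    stream is a binary file object. Property and text content is skipped
    using the record's Content-length header.
    """
    while True:
        line = stream.readline()
        while line == b"\n":  # blank lines separate records
            line = stream.readline()
        if not line:
            return  # end of dump
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
            headers[key] = value
            line = stream.readline()
        stream.read(int(headers.get("Content-length", "0")))
        yield headers


def copies(stream):
    """Yield (path, copyfrom_path, copyfrom_rev) for every copied node."""
    for h in read_records(stream):
        if "Node-copyfrom-path" in h:
            yield h["Node-path"], h["Node-copyfrom-path"], int(h["Node-copyfrom-rev"])
```

Run over the output of `svnadmin dump` (opened in binary mode), this lists every copy; a `Node-kind: dir` copy of `trunk` to `branches/NAME` is the usual sign of a branch creation.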

However, it's fairly fast (~35 commits/sec) and flexible. I want to,
in the process of doing this conversion, "canonicalize" the structure
of the repo and throw away all the commits from cvs and svn that just
moved things around. This poses another inference challenge, but
having a modest simple tool (ie, a short enough program to easily
understand and modify) helps.

Having done all this, I realized that this is a good way to go.
Separating, as Michael suggests, the "parsing" part from the "commit
generating" part, not only makes the tools easier to write, but makes
them more flexible. If hg or bzr had a git-like fast-import (maybe
they do) it would take me about 35 minutes to target that instead. And
in the process I came across some "missing features" in fast-import,
which Shawn Pearce was able to quickly add.

My repo is tiny, but I still think that speed and flexibility are key
in this process. If I can write a little script that can be useful to
someone with 100k commits instead of my measly 800, that's great.

For that matter, fast-import is a fairly short program. It wouldn't be
hard for other scm projects to do something similar. fast-import could
become a "standard" intermediate format. But even if that doesn't
happen, the amounts of code we're talking about (to do parsing and
commit generation) are reasonably modest and easy to change.

As soon as I make a bit more progress I'm going to make my code available.

Cheers,

- David

On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
> to be split conceptually into two tools:
>
> cvs2<abstract_description_of_cvs_history>, whose job it is to determine
> the most likely "true" CVS history based on the data stored in the CVS
> repository, and
>
> <abstract_description_of_cvs_history>2svn
>
> Then later write
>
> <abstract_description_of_cvs_history>2git
> <abstract_description_of_cvs_history>2hg
>
> etc.
>
> The first split is partly done in cvs2svn 2.0.  And I naively imagine
> that writing the new output back ends won't be all that much work.
>
> Michael

-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.


* Re: CVS -> SVN -> Git
  2007-07-14 22:19         ` Michael Haggerty
  2007-07-14 22:44           ` Karl Fogel
  2007-07-14 23:23           ` David Frech
@ 2007-07-15  1:39           ` Eric S. Raymond
  2007-07-15 12:04             ` Michael Haggerty
  2007-07-16  1:05             ` Martin Langhoff
  2 siblings, 2 replies; 31+ messages in thread
From: Eric S. Raymond @ 2007-07-15  1:39 UTC
  To: Michael Haggerty; +Cc: Martin Langhoff, Julian Phillips, git, dev

Michael Haggerty <mhagger@alum.mit.edu>:
> Could you give a quick summary of the relevant differences between CVS
> and RCS files in this context?  Then I'd be happy to try to figure out
> how bad the situation still is today, and whether it can be easily improved.

I found my copy of the bug report, and I misremembered the problem
slightly.  It turns out to be even more relevant to this 
discussion than I thought.

Thread begins with <20040810031409.GA25564@thyrsus.com> on
9 Aug 2004.  The thread title was "RFC -- enhancing cvs2svn to have a
notion of spans of mergeable commits".  Your mailing-list archive
search can't seem to find it, unfortunately.  I'll repost the query
separately.

> Other people have complained about having to convert from SVN to
> distributed SCMs, because the SVN model doesn't map so easily to their
> favorite.

OK.  But I think that if SVN -> X is hard, CVS -> X is going to be harder.

> You are basically suggesting that an SVN repository is the best lingua
> franca of the SCM world, which I don't believe.

Not quite.  I'm suggesting it's an appropriate lingua franca for centralized
VCSes with branching, e.g. everything pre-Arch.

>                                               The CVS history *does*
> have to be deformed a bit to fit into SVN, and an svn2xxx converter
> would have to undo the deformation.

Then perhaps the right thing to think about is this: how exactly does
CVS history need to be deformed, and is there some way to express the
lost information as conventional properties or tags?

> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
> to be split conceptually into two tools:

Well, that makes more sense.  But how would whatever the first half outputs
be different from an svn dump file? 
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: CVS -> SVN -> Git
  2007-07-14 18:14       ` Steffen Prohaska
@ 2007-07-15  2:22         ` Shawn O. Pearce
  0 siblings, 0 replies; 31+ messages in thread
From: Shawn O. Pearce @ 2007-07-15  2:22 UTC
  To: Steffen Prohaska
  Cc: Michael Haggerty, Simon Hausmann, Martin Langhoff,
	Julian Phillips, Git Mailing List, dev

Steffen Prohaska <prohaska@zib.de> wrote:
> On Jul 14, 2007, at 7:09 PM, Michael Haggerty wrote:
> >Martin Langhoff wrote:
> >>On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> >>>Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
> >>>what it would take to write some other back ends for cvs2svn--turning
> >>>it, essentially, into cvs2xxx.  Most of the work that cvs2svn
...
> >>
> >>Great to hear that. I'm game if we can do something in this direction
> >>- surely we can make it talk to fastimport ;-)
> >
> >We added some hooks to cvs2svn 2.0 to start working in this direction.
> >But I don't really know what information is needed for a git import.
> >One quick-and-dirty idea that I had was to have cvs2svn output
> >information compatible with cvsps's output, as I believe that several
> >tools rely on cvsps to do the dirty work and so could perhaps be
> >persuaded to use cvs2svn out of the box.
> 
> From my understanding, piping data to git fast-import would be
> a sane gateway to git. The input format of fast-import is documented
> in [1].
> 
> Maybe Shawn Pearce has some comments on that. Shawn did most
> (maybe all) of the work on git-fast-import.

You must be new to this discussion.  ;-)

git-fast-import started as a backend for a hacked up version of
cvs2svn that Jon Smirl was working on to convert the massive Mozilla
CVS repository into Git.  Jon started from the cvs2svn codebase
because it best handled the damaged RCS files that exist in the
Mozilla repository.  Many emails have been exchanged between myself,
Michael and Jon on this subject.

So yes, git-fast-import was designed to act as a backend behind
something like cvs2xxx.  Some of the "oddities" of the fast-import
input language are the way they are partly because of the way
the (older) cvs2svn code generated output in SVN dump format.
Certain data was available at certain times and not at others,
so Jon wanted to feed it to git-fast-import when he had it, rather
than needing to buffer it or rearrange code.
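A minimal sketch of what "feeding it when he had it" can look like from a frontend's side, written in Python purely for illustration: file content goes out as a `blob` with a mark as soon as it is known, and a later `commit` refers to that mark instead of resending the data. The helper names, ref, committer and timestamp are made-up assumptions; only the `blob`/`mark`/`data`/`commit`/`committer`/`M` commands come from the fast-import input format.

```python
def data_cmd(payload: bytes) -> bytes:
    """'data' command: exact byte count, the raw bytes, then an optional LF."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob_cmd(mark: int, content: bytes) -> bytes:
    """Send file content as soon as it is known; later commands use ':<mark>'."""
    return b"blob\nmark :%d\n" % mark + data_cmd(content)


def commit_cmd(ref: str, committer: str, when: int, message: str,
               files: dict) -> bytes:
    """Build a commit whose files all refer to blobs that were sent earlier.

    'files' maps a path to the mark of an already-emitted blob.
    'committer' is 'Name <email>'; 'when' is seconds since the epoch (UTC).
    """
    out = b"commit %s\n" % ref.encode()
    out += b"committer %s %d +0000\n" % (committer.encode(), when)
    out += data_cmd(message.encode())
    for path, mark in files.items():
        out += b"M 100644 :%d %s\n" % (mark, path.encode())
    return out


# Example: the blob goes out first, the commit that uses it follows later.
stream = blob_cmd(1, b"hello\n") + commit_cmd(
    "refs/heads/master", "A U Thor <author@example.com>", 1184000000,
    "initial import", {"README": 1})
```

Writing `stream` to a file and running `git fast-import < file` inside an empty repository should produce a single root commit on `master`.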

I'm staying far away from writing fast-import frontends.  Anyone that
wants/needs a CVS frontend is welcome to implement one, but it
won't be written by me.  I gave up CVS a long time ago and will never
return to it.  My only VCS is Git, and converting Git->Git is sort
of stupid.  So I have no need for a fast-import frontend.

But I do maintain fast-import.  Well over 99% of it was written
by me.  ;-)

-- 
Shawn.

* Re: CVS -> SVN -> Git
  2007-07-14 23:23           ` David Frech
@ 2007-07-15  2:30             ` Shawn O. Pearce
  2007-07-15 11:48             ` Michael Haggerty
  1 sibling, 0 replies; 31+ messages in thread
From: Shawn O. Pearce @ 2007-07-15  2:30 UTC (permalink / raw)
  To: David Frech
  Cc: Michael Haggerty, esr, Martin Langhoff, Julian Phillips, git, dev

David Frech <david@nimblemachines.com> wrote:
> Now I want to switch to git. I've since added enough to svn that there
> is no option but to use the svn repo as my source. git-svnimport
> doesn't work for me because its idea of the structure of my repo is
> too limited. I looked around, stumbled over fast-import, and got
> hooked on the idea of using it. It seemed simple enough... I wrote a
> 350-line Lua (!!) program that parses the svn dump file and creates a
> commit stream for fast-import.
> 
> It took a day and half to get the svn dump parsing right (it's an
> egregiously bad format) but only a couple of hours to write the
> fast-import backend.

With the 'C' (copy) and 'R' (rename) operators in fast-import I was
starting to suspect that an SVN dump->fast-import stream translator
wasn't going to be that complex.
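As a rough illustration of why `C` and `R` keep such a translator small, here is a Python sketch that maps a few SVN dump node actions onto fast-import file commands. It assumes paths have already been made relative to the branch being written, that a copy source already exists in the tree of the commit being built (which is what `C` and `R` operate on), and that an SVN rename arrives as an add-with-history plus a delete of the source. Cross-revision copies and other node actions are deliberately left out, and the function names are hypothetical.

```python
from typing import Optional


def fi_path(path: str) -> str:
    """Quote a path C-style when it contains a space, quote, backslash or newline."""
    if any(ch in path for ch in ' "\\\n'):
        escaped = (path.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        return '"' + escaped + '"'
    return path


def node_to_command(action: str, path: str, copyfrom: Optional[str] = None,
                    source_deleted: bool = False) -> str:
    """Map one SVN dump node onto a fast-import file command.

    action: the Node-action value ('add' or 'delete' in this sketch)
    copyfrom: the Node-copyfrom-path, if the node was copied
    source_deleted: True if the same revision also deletes 'copyfrom',
                    i.e. the pair is really a rename
    """
    if action == "delete":
        return "D " + fi_path(path)
    if action == "add" and copyfrom is not None:
        op = "R" if source_deleted else "C"
        return f"{op} {fi_path(copyfrom)} {fi_path(path)}"
    raise ValueError(f"not handled in this sketch: {action} {path}")
```

A caller that passes `source_deleted=True` would also have to drop the matching `D` for the source path, since `R` already removes it.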

I wouldn't want to attempt to parse the SVN dump format directly
in fast-import.  As you said the format is horribly difficult
to read.  The entire fast-import stream parser is only 624 lines
of C (the other 1,636 lines of fast-import are for documentation,
the in memory tree/branch management and packfile generation).
I doubt the SVN dump file can be parsed in as few lines of C code.

-- 
Shawn.

* Re: CVS -> SVN -> Git
  2007-07-14 23:23           ` David Frech
  2007-07-15  2:30             ` Shawn O. Pearce
@ 2007-07-15 11:48             ` Michael Haggerty
  2007-07-16  1:08               ` Martin Langhoff
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2007-07-15 11:48 UTC (permalink / raw)
  To: David Frech; +Cc: esr, Martin Langhoff, Julian Phillips, git, dev

David Frech wrote:
> I have a modest svn repo (about 800 commits) that contains fifteen or
> so small projects. It started life as a CVS repo, and as the projects
> grew and changed, and as I learned more about CVS, things got moved
> around. Later, when I got interested in svn (in 2005) I converted the
> repo, using cvs2svn. It got a few things wrong - mostly, that it
> thought there was one project in the repo, and created toplevel
> trunk/, branches/, and tags/ directories, and lumped everything below
> these.

I know this is tangential to the main point of your post, but BTW
multiproject conversions were added to cvs2svn in release 1.5.

> It took a day and half to get the svn dump parsing right (it's an
> egregiously bad format) but only a couple of hours to write the
> fast-import backend.

I'm surprised you think that; I find the svn dump format quite easy and
straightforward.  (Of course it assumes some Subversionisms, like easy
deep directory copies, which I can imagine would be annoying in other
contexts.)  What don't you like about the format?

> Having done all this, I realized that this is a good way to go.
> Separating, as Michael suggests, the "parsing" part from the "commit
> generating" part, not only makes the tools easier to write, but makes
> them more flexible. If hg or bzr had a git-like fast-import (maybe
> they do) it would take me about 35 minutes to target that instead. And
> in the process I came across some "missing features" in fast-import,
> which Shawn Pearce was able to quickly add.

Yes, fast-import is a very easy-to-write format and looks to be very
well documented.  I don't think that having to write output in
fast-import format would be any kind of a hindrance for such a tool.

Michael

* Re: CVS -> SVN -> Git
  2007-07-15  1:39           ` Eric S. Raymond
@ 2007-07-15 12:04             ` Michael Haggerty
  2007-07-15 13:36               ` Eric S. Raymond
  2007-07-16  1:05             ` Martin Langhoff
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Haggerty @ 2007-07-15 12:04 UTC (permalink / raw)
  To: esr; +Cc: Martin Langhoff, Julian Phillips, git, dev

Eric S. Raymond wrote:
> Michael Haggerty <mhagger@alum.mit.edu>:
>>                                               The CVS history *does*
>> have to be deformed a bit to fit into SVN, and an svn2xxx converter
>> would have to undo the deformation.
> 
> Then perhaps the right thing to think about is this: how exactly does
> CVS history need to be deformed, and is there some way to express the
> lost information as conventional properties or tags?

Hmmm, perhaps "deformed" was not the best word.  "Reorganized" is a
better description.

For example, cvs2svn internally deduces which files should be added to a
given branch in a given commit.  But the information cannot be output to
SVN in that form.  Instead, cvs2svn has to figure out which
*directories* to copy to the branch directory, then which files to
remove from the copied directory (because they shouldn't have been
tagged), and which other files to copy from other sources.  This extra
work, which is quite time- and space-consuming, is worse than pointless
when converting to git, because git has to invert the process to figure
out which individual files have to be tagged!
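To make the "inverting" step concrete, here is a small Python sketch that replays SVN-style tag operations (copy a directory, delete what should not have been tagged, copy individual files from other sources) and recovers the per-file origin of everything in the tag, which is the information a git importer actually needs. The tuple-based operation encoding and the snapshot dictionary are assumptions for illustration, not cvs2svn's internal representation.

```python
def apply_tag_ops(ops, snapshots):
    """Replay SVN-style tag operations; return {tag_path: (src_rev, src_path)}.

    ops is a list of hypothetical tuples:
      ("copydir", rev, src_dir, dst_dir)   -- copy a whole directory at rev
      ("delete", dst_path)                 -- remove a file or directory
      ("copyfile", rev, src_path, dst_path)
    snapshots maps a revision to the set of file paths existing at that rev.
    """
    tag = {}
    for op in ops:
        kind = op[0]
        if kind == "copydir":
            _, rev, src_dir, dst_dir = op
            prefix = src_dir.rstrip("/") + "/"
            for path in snapshots[rev]:
                if path.startswith(prefix):
                    tag[dst_dir.rstrip("/") + "/" + path[len(prefix):]] = (rev, path)
        elif kind == "delete":
            _, dst_path = op
            prefix = dst_path.rstrip("/") + "/"
            # A delete may name a single file or a whole directory.
            for p in [p for p in tag if p == dst_path or p.startswith(prefix)]:
                del tag[p]
        elif kind == "copyfile":
            _, rev, src_path, dst_path = op
            tag[dst_path] = (rev, src_path)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return tag


# Tag T1: copy trunk@10, drop b.c, then take b.c from an older revision.
snapshots = {10: {"trunk/a.c", "trunk/b.c", "trunk/sub/c.c"}}
ops = [("copydir", 10, "trunk", "tags/T1"),
       ("delete", "tags/T1/b.c"),
       ("copyfile", 8, "trunk/b.c", "tags/T1/b.c")]
tag_files = apply_tag_ops(ops, snapshots)
```

The three SVN-level steps collapse into one flat mapping, which is roughly the per-file list that cvs2svn had already computed before expressing it as directory copies.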

>> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
>> to be split conceptually into two tools:
> 
> Well, that makes more sense.  But how would whatever the first half outputs
> be different from an svn dump file? 

The interface between the two halves does not necessarily need to be a
serialized data stream; it could just as well be via the Python API that
is used internally by cvs2svn to access the reconstructed commits and
supporting databases.  This would require the second half to be written
in Python, but otherwise would be very flexible and would avoid the need
to find a be-all serialized format.

Michael

* Re: CVS -> SVN -> Git
  2007-07-15 12:04             ` Michael Haggerty
@ 2007-07-15 13:36               ` Eric S. Raymond
  0 siblings, 0 replies; 31+ messages in thread
From: Eric S. Raymond @ 2007-07-15 13:36 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Martin Langhoff, Julian Phillips, git, dev

Michael Haggerty <mhagger@alum.mit.edu>:
> For example, cvs2svn internally deduces which files should be added to a
> given branch in a given commit.  But the information cannot be output to
> SVN in that form.  Instead, cvs2svn has to figure out which
> *directories* to copy to the branch directory, then which files to
> remove from the copied directory (because they shouldn't have been
> tagged), and which other files to copy from other sources.  This extra
> work, which is quite time- and space-consuming, is worse than pointless
> when converting to git, because git has to invert the process to figure
> out which individual files have to be tagged!

OK, that's a fair point.  I might have known the showstopper would be
somewhere near Subversion's tags-are-directories assumption.  And this
also neatly explains why I didn't see any problems or poor performance
during my recent conversions; the projects I was lifting had no tags.

> The interface between the two halves does not necessarily need to be a
> serialized data stream; it could just as well be via the Python API that
> is used internally by cvs2svn to access the reconstructed commits and
> supporting databases.  This would require the second half to be written
> in Python, but otherwise would be very flexible and would avoid the need
> to find a be-all serialized format.

Or...wait for it...the generator for the serialized format could be one
of the back ends!   Probably a good idea to have for debugging reasons, 
if nothing else.
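One way to picture the in-process interface, with a serializing back end sitting alongside the others, is a small plug-in API like the Python sketch below. Every class and function name here (`Changeset`, `OutputBackend`, `SerializingBackend`, `run`) is hypothetical and is not cvs2svn's actual API; JSON lines were chosen only as a convenient debugging format.

```python
import io
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field


@dataclass
class Changeset:
    """Hypothetical reconstructed commit handed over by the CVS-parsing half."""
    author: str
    timestamp: int
    log: str
    branch: str
    changes: dict = field(default_factory=dict)  # path -> CVS revision, None = delete


class OutputBackend(ABC):
    """Hypothetical interface that every output back end would implement."""

    @abstractmethod
    def process_changeset(self, cs: Changeset) -> None:
        ...

    def finish(self) -> None:
        """Called once after the last changeset; no-op by default."""


class SerializingBackend(OutputBackend):
    """Writes each changeset as one JSON line, e.g. for debugging the front half."""

    def __init__(self, out):
        self.out = out

    def process_changeset(self, cs: Changeset) -> None:
        self.out.write(json.dumps(asdict(cs), sort_keys=True) + "\n")


def run(changesets, backends):
    """Feed every changeset to every back end, then let each one finalize."""
    for cs in changesets:
        for backend in backends:
            backend.process_changeset(cs)
    for backend in backends:
        backend.finish()


out = io.StringIO()
run([Changeset("jrandom", 1184500000, "Tag T1", "trunk", {"a.c": "1.2"})],
    [SerializingBackend(out)])
```

A fast-import or SVN back end would simply be another `OutputBackend` subclass passed in the same list, so the serialized stream costs nothing extra when it is not wanted.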
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

* Re: CVS -> SVN -> Git
  2007-07-14 17:09     ` Michael Haggerty
                         ` (2 preceding siblings ...)
  2007-07-14 19:52       ` Eric S. Raymond
@ 2007-07-15 23:09       ` Scott Lamb
  3 siblings, 0 replies; 31+ messages in thread
From: Scott Lamb @ 2007-07-15 23:09 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Martin Langhoff, Julian Phillips, git, dev

Michael Haggerty wrote:
> One quick-and-dirty idea that I had was to have cvs2svn output
> information compatible with cvsps's output, as I believe that several
> tools rely on cvsps to do the dirty work and so could perhaps be
> persuaded to use cvs2svn out of the box.

I think this would be an excellent approach. The interface between
cvs->X (cvsps), Y->git (git-fast-import), and cvs->git glue
(git-cvsimport) is a great idea for troubleshooting and for code sharing
with other converters. (Shawn O. Pearce's attitude is a great example of
this - he can maintain the part he cares about and several converters
benefit even though he's never used them.)

However, I was unhappy to see that cvsps doesn't reuse any cvs2svn code
or unit tests. I remember seeing a lot of those hairy cases on the
Subversion list long ago, so a CVS converter without those tests seems
untrustworthy. If I maintained an important CVS repository I wanted to
convert to git accurately, I would use cvs2svn.py+git-svnimport over
git-cvsimport any day.

They both seem much better than something like Tailor, though. I've
discovered several things that made me realize going through working
copies is error-prone (as well as slow).

>> Does cvs2svn handle incremental imports, remembering any "guesses"
>> taken earlier? Last time I looked at it, it had far better logic than
>> cvsps, but it didn't do incremental imports, and repeated imports done
>> at different times would "guess" different branching points for new
>> branches, so it _really_ didn't support incrementals
> 
> That's correct; cvs2svn does not support incremental conversion at all
> (at least not yet).

That's an important feature for me. I'm using git-cvsimport to track
other people's CVS repositories. Initial import is SLOW and
resource-intensive on the network, client, and server, so I couldn't
switch to anything that didn't support incremental use.

Best regards,
Scott

-- 
Scott Lamb <http://www.slamb.org/>

* Re: CVS -> SVN -> Git
  2007-07-15  1:39           ` Eric S. Raymond
  2007-07-15 12:04             ` Michael Haggerty
@ 2007-07-16  1:05             ` Martin Langhoff
  2007-07-19 12:02               ` Markus Schiltknecht
  2007-07-19 19:14               ` Simon 'corecode' Schubert
  1 sibling, 2 replies; 31+ messages in thread
From: Martin Langhoff @ 2007-07-16  1:05 UTC (permalink / raw)
  To: esr; +Cc: Michael Haggerty, Julian Phillips, git, dev

On 7/15/07, Eric S. Raymond <esr@thyrsus.com> wrote:
> Not quite.  I'm suggesting it's an appropriate lingua franca for centralized
> VCSes with branching, e.g. everything pre-Arch.

That's a huge goal that gets in the way of what we want to do here: we
are trying to save time, not embark on some huge mission.

cvs2svn has all the "wtf-did-cvs-mean-by-that" algorithms that are
very hard to write and maintain, and it seems to be the best one at
that. Of course, it also writes SVN repos -- but I'm sure that's the
easiest part.

     We don't need no meta VCS for any of this.

All we need is to hook into the "write out a repo based on all the
stuff we parsed from cvs". Perhaps it's doable, and if Michael helps
out abstracting that part a bit, maintainable long term too.

cheers,

m

* Re: CVS -> SVN -> Git
  2007-07-15 11:48             ` Michael Haggerty
@ 2007-07-16  1:08               ` Martin Langhoff
  2007-07-16  1:13                 ` Julian Phillips
  2007-07-16  1:30                 ` Karl Fogel
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Langhoff @ 2007-07-16  1:08 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: David Frech, esr, Julian Phillips, git, dev

On 7/15/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > It took a day and half to get the svn dump parsing right (it's an
> > egregiously bad format) but only a couple of hours to write the
> > fast-import backend.
>
> I'm surprised you think that; I find the svn dump format quite easy and
> straightforward.  (Of course it assumes some Subversionisms, like easy
> deep directory copies, which I can imagine would be annoying in other
> contexts.)  What don't you like about the format?

Is there good doco and samples for it? I wouldn't mind doing things by
way of an SVN dump parser.

> Yes, fast-import is a very easy-to-write format and looks to be very
> well documented.  I don't think that having to write output in
> fast-import format would be any kind of a hindrance for such a tool.

Damn! You've now figured out that all my volunteering was for the easy
part of the job ;-)

m

* Re: CVS -> SVN -> Git
  2007-07-16  1:08               ` Martin Langhoff
@ 2007-07-16  1:13                 ` Julian Phillips
  2007-07-16  1:30                 ` Karl Fogel
  1 sibling, 0 replies; 31+ messages in thread
From: Julian Phillips @ 2007-07-16  1:13 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Michael Haggerty, David Frech, esr, git, dev

On Mon, 16 Jul 2007, Martin Langhoff wrote:

> On 7/15/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> >  It took a day and half to get the svn dump parsing right (it's an
>> >  egregiously bad format) but only a couple of hours to write the
>> >  fast-import backend.
>>
>>  I'm surprised you think that; I find the svn dump format quite easy and
>>  straightforward.  (Of course it assumes some Subversionisms, like easy
>>  deep directory copies, which I can imagine would be annoying in other
>>  contexts.)  What don't you like about the format?
>
> Is there good doco and samples for it? I wouldn't mind doing things by
> way of an SVN dump parser.

I don't know if it classes as what you call good, but it is documented:

http://svn.collab.net/repos/svn/trunk/notes/fs_dumprestore.txt

>
>>  Yes, fast-import is a very easy-to-write format and looks to be very
>>  well documented.  I don't think that having to write output in
>>  fast-import format would be any kind of a hindrance for such a tool.
>
> Damn! You've now figured out that all my volunteering was for the easy
> part of the job ;-)
>
>
>
>
> m
>
>

-- 
Julian

  ---
This process can check if this value is zero, and if it is, it does
something child-like.
 		-- Forbes Burkowski, CS 454, University of Washington

* Re: CVS -> SVN -> Git
  2007-07-16  1:08               ` Martin Langhoff
  2007-07-16  1:13                 ` Julian Phillips
@ 2007-07-16  1:30                 ` Karl Fogel
  1 sibling, 0 replies; 31+ messages in thread
From: Karl Fogel @ 2007-07-16  1:30 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Michael Haggerty, David Frech, esr, Julian Phillips, git, dev

"Martin Langhoff" <martin.langhoff@gmail.com> writes:
> On 7/15/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> > It took a day and half to get the svn dump parsing right (it's an
>> > egregiously bad format) but only a couple of hours to write the
>> > fast-import backend.
>>
>> I'm surprised you think that; I find the svn dump format quite easy and
>> straightforward.  (Of course it assumes some Subversionisms, like easy
>> deep directory copies, which I can imagine would be annoying in other
>> contexts.)  What don't you like about the format?
>
> Is there good doco and samples for it? I wouldn't mind doing things by
> way of an SVN dump parser.

   http://svn.collab.net/repos/svn/trunk/notes/dump-load-format.txt
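For anyone weighing the dump-file route, a minimal Python sketch of the record structure that document describes: each record is a block of `Key: value` header lines ended by a blank line, optionally followed by `Content-length` bytes whose first `Prop-content-length` bytes are a `K`/`V` property block terminated by `PROPS-END`. It ignores text deltas and other details of later dump format versions and assumes UTF-8 property values, so treat it as a starting point rather than a complete parser.

```python
import io


def read_headers(stream):
    """Read 'Key: value' lines up to a blank line; return None at end of file."""
    line = stream.readline()
    while line == b"\n":  # blank lines separate records
        line = stream.readline()
    if not line:
        return None
    headers = {}
    while line and line != b"\n":
        key, _, value = line.rstrip(b"\n").partition(b": ")
        headers[key.decode()] = value.decode()
        line = stream.readline()
    return headers


def parse_props(block):
    """Parse a 'K <len>' / 'V <len>' property block ending in PROPS-END."""
    props = {}
    buf = io.BytesIO(block)
    while True:
        line = buf.readline().rstrip(b"\n")
        if not line or line == b"PROPS-END":
            return props
        kind, length = line.split(b" ")
        key = buf.read(int(length)).decode()
        buf.readline()  # newline after the key
        if kind == b"D":  # property deletion (only in delta dumps)
            props.pop(key, None)
            continue
        vlen = int(buf.readline().rstrip(b"\n").split(b" ")[1])  # 'V <len>'
        props[key] = buf.read(vlen).decode()
        buf.readline()  # newline after the value


def records(stream):
    """Yield (headers, properties, text) for every record in a dump stream."""
    while True:
        headers = read_headers(stream)
        if headers is None:
            return
        body = stream.read(int(headers.get("Content-length", 0)))
        prop_len = int(headers.get("Prop-content-length", 0))
        props = parse_props(body[:prop_len]) if prop_len else {}
        yield headers, props, body[prop_len:]
```

Because every payload is length-prefixed, the parser never has to guess where file content ends; the fiddly part is mostly keeping the byte counts straight.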

Best,
-Karl

* Re: CVS -> SVN -> Git
  2007-07-16  1:05             ` Martin Langhoff
@ 2007-07-19 12:02               ` Markus Schiltknecht
  2007-07-20  3:51                 ` Karl Fogel
  2007-07-19 19:14               ` Simon 'corecode' Schubert
  1 sibling, 1 reply; 31+ messages in thread
From: Markus Schiltknecht @ 2007-07-19 12:02 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: esr, Michael Haggerty, Julian Phillips, git, dev

Hi,

Martin Langhoff wrote:
> cvs2svn has all the "wtf-did-cvs-mean-by-that" algorithms that are
> very hard to write and maintain, and it seems to be the best one at
> that. Of course, it also writes SVN repos -- but I'm sure that's the
> easiest part.
> 
>     We don't need no meta VCS for any of this.

Sure, we certainly need a meta format of some sort (not a full-blown
VCS, agreed, but somehow we need to represent commits, tags and
branches). And IMO, the subversion-based format is not a good one,
because it treats branches and tags very differently from most other
systems (and from what it should be from a user's perspective: an atomic
operation).

We (Michael, Oswald and I) have discussed joining efforts with my
cvs-to-monotone converter, but I quickly dropped that idea because the
cvs2svn converter is too Subversion-specific. If cvs2svn wants to become
a universal cvs importer, it needs to get rid of those assumptions (and
do more work to unify tagging and branching).

Regards

Markus

* Re: CVS -> SVN -> Git
  2007-07-16  1:05             ` Martin Langhoff
  2007-07-19 12:02               ` Markus Schiltknecht
@ 2007-07-19 19:14               ` Simon 'corecode' Schubert
  2007-07-20  8:45                 ` Markus Schiltknecht
  1 sibling, 1 reply; 31+ messages in thread
From: Simon 'corecode' Schubert @ 2007-07-19 19:14 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: esr, Michael Haggerty, Julian Phillips, git, dev

[sorry for jumping in so late, didn't read git@vger for a while]

Martin Langhoff wrote:
> On 7/15/07, Eric S. Raymond <esr@thyrsus.com> wrote:
>> Not quite.  I'm suggesting it's an appropriate lingua franca for centralized
>> VCSes with branching, e.g. everything pre-Arch.

I do not think Eric is right here.  You will always lose information
when converting CVS to svn, even if it is just the uncertainty, the
non-atomicity.  This is also information (hidden one, though).

> That's a huge goal that gets in the way of what we want to do here: we
> are trying to save time, not embark on some huge mission.
> 
> cvs2svn has all the "wtf-did-cvs-mean-by-that" algorithms that are
> very hard to write and maintain, and it seems to be the best one at
> that. Of course, it also writes SVN repos -- but I'm sure that's the
> easiest part.

True.  However, cvs2svn has many assumptions (or at least had them when
I last checked) which are targeted at svn, and unsuitable for a generic
system (tags + branches).

>     We don't need no meta VCS for any of this.

Yes.  I've already done what people want; it is not called cvs2xxx,
but fromcvs [1].  I don't think it is necessary to define an output
format.  Of course, that's possible, but limiting yourself to a file
format means you're losing flexibility, which is needed for efficient,
correct and fast repository conversion.

cheers
  simon

[1] http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs/

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

* Re: CVS -> SVN -> Git
  2007-07-13 14:48 CVS -> SVN -> Git Julian Phillips
  2007-07-13 23:03 ` Michael Haggerty
@ 2007-07-19 19:15 ` Simon 'corecode' Schubert
  2007-07-20  5:58   ` Julian Phillips
  1 sibling, 1 reply; 31+ messages in thread
From: Simon 'corecode' Schubert @ 2007-07-19 19:15 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git

Julian Phillips wrote:
> Has anyone managed to successfully import a Subversion repository that 
> was initially imported from CVS using cvs2svn using fast-import?
> 
> It looks like cvs2svn has created a rather big mess.   It has created 
> single commits that change files in more than one branch and/or tag. It 
> also creates tags using more than one commit.  Now I come to try and 
> import the Subversion history into git and I'm having trouble creating a 
> sensible stream to feed into fast-import.

Did you try first converting the old CVS repo to git and then adding
the svn changes?  That might give you much better results.

cheers
  simon

* Re: CVS -> SVN -> Git
  2007-07-14  5:30   ` Martin Langhoff
  2007-07-14 17:09     ` Michael Haggerty
@ 2007-07-19 19:18     ` Simon 'corecode' Schubert
  1 sibling, 0 replies; 31+ messages in thread
From: Simon 'corecode' Schubert @ 2007-07-19 19:18 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Michael Haggerty, Julian Phillips, git

Martin Langhoff wrote:
> On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> Incidentally, now that cvs2svn 2.0.0 is nearly out, I am thinking about
>> what it would take to write some other back ends for cvs2svn--turning
>> it, essentially, into cvs2xxx.  Most of the work that cvs2svn does is
>> inferring the most plausible history of the repository from CVS's
>> sketchy, incomplete, idiomatic, and often corrupt data.  This work
>> should also be useful for a cvs2git or cvs2hg or cvs2baz or ...
> 
> Great to hear that. I'm game if we can do something in this direction
> - surely we can make it talk to fastimport ;-)

In this context I suggest looking at fromcvs [1], my cvs->otherscm
converter.  Right now it does git + hg (and sqlite for queries), but it
is probably easily extensible to other targets.

> Does cvs2svn handle incremental imports, remembering any "guesses"
> taken earlier? Last time I looked at it, it had far better logic than
> cvsps, but it didn't do incremental imports, and repeated imports done
> at different times would "guess" different branching points for new
> branches, so it _really_ didn't support incrementals

fromcvs will also handle incremental imports.  If it does not, please
tell me and I will try to fix it.

cheers
  simon

[1] http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs/

* Re: CVS -> SVN -> Git
  2007-07-19 12:02               ` Markus Schiltknecht
@ 2007-07-20  3:51                 ` Karl Fogel
  0 siblings, 0 replies; 31+ messages in thread
From: Karl Fogel @ 2007-07-20  3:51 UTC (permalink / raw)
  To: Markus Schiltknecht
  Cc: Martin Langhoff, esr, Michael Haggerty, Julian Phillips, git, dev

Markus Schiltknecht <markus@bluegap.ch> writes:
> Sure, we certainly need a meta format of some sort (not a full-blown
> VCS, agreed, but somehow we need to represent commits, tags and
> branches). And IMO, the subversion-based format is not a good one,
> because it treats branches and tags very differently from most other
> systems (and from what it should be from a user's perspective: an
> atomic operation).

Huh?  I don't understand what you're saying about atomicity here.

* Re: CVS -> SVN -> Git
  2007-07-19 19:15 ` Simon 'corecode' Schubert
@ 2007-07-20  5:58   ` Julian Phillips
  0 siblings, 0 replies; 31+ messages in thread
From: Julian Phillips @ 2007-07-20  5:58 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

On Thu, 19 Jul 2007, Simon 'corecode' Schubert wrote:

> Julian Phillips wrote:
>>  Has anyone managed to successfully import a Subversion repository that was
>>  initially imported from CVS using cvs2svn using fast-import?
>>
>>  It looks like cvs2svn has created a rather big mess.   It has created
>>  single commits that change files in more than one branch and/or tag. It
>>  also creates tags using more than one commit.  Now I come to try and
>>  import the Subversion history into git and I'm having trouble creating a
>>  sensible stream to feed into fast-import.
>
> Did you try first converting the old CVS repo to git and then adding the svn 
> changes?  That might give you much better results.

I thought about it, but there are over 20000 commits sitting on top of the 
converted history referring back to it - I'm not convinced that I could 
stitch things back together properly, the svn history now really does 
rely on the import done by cvs2svn.  (btw, I blame CVS for the mess not 
cvs2svn, we should have switched _before_ we started using branches 
heavily ...)

The problem really is that we use branching like it's going out of 
fashion.  We have thousands now, and had at least 10s if not 100s by the 
time we gave up on CVS.  Similarly with tags.

I think I've managed to get things sorted now with fast-import ... just 
need to be able to copy blobs from other commits and I think I'll be done. 
It really is a nice tool.

-- 
Julian

  ---
There is nothing so easy but that it becomes difficult when you do it
reluctantly.
 		-- Publius Terentius Afer (Terence)

* Re: CVS -> SVN -> Git
  2007-07-19 19:14               ` Simon 'corecode' Schubert
@ 2007-07-20  8:45                 ` Markus Schiltknecht
  0 siblings, 0 replies; 31+ messages in thread
From: Markus Schiltknecht @ 2007-07-20  8:45 UTC (permalink / raw)
  To: Simon 'corecode' Schubert
  Cc: Martin Langhoff, esr, Michael Haggerty, Julian Phillips, git, dev

Simon 'corecode' Schubert wrote:
> I do not think Eric is right here.  You will always lose information
> when converting CVS to svn, even if it is just the uncertainty, the
> non-atomicity.  This is also information (hidden one, though).

Full ACK.

> Yes.  I've already done what people want, it is not called cvs2xxx, but 
> fromcvs [1].  I don't think it is necessary to define an output format.  
> Of course, that's possible, but limiting yourself to a file format means 
> you're losing flexibility, which is needed for efficient, correct and 
> fast repository conversion.

Hm.. interesting. I'll have a close look.

Regards

Markus

end of thread, other threads:[~2007-07-20  8:46 UTC | newest]

Thread overview: 31+ messages
2007-07-13 14:48 CVS -> SVN -> Git Julian Phillips
2007-07-13 23:03 ` Michael Haggerty
2007-07-14  5:30   ` Martin Langhoff
2007-07-14 17:09     ` Michael Haggerty
2007-07-14 17:32       ` Chris Shoemaker
2007-07-14 20:01         ` Michael Haggerty
2007-07-14 18:14       ` Steffen Prohaska
2007-07-15  2:22         ` Shawn O. Pearce
2007-07-14 19:52       ` Eric S. Raymond
2007-07-14 20:58         ` Junio C Hamano
2007-07-14 21:50         ` Oswald Buddenhagen
2007-07-14 22:19         ` Michael Haggerty
2007-07-14 22:44           ` Karl Fogel
2007-07-14 23:23           ` David Frech
2007-07-15  2:30             ` Shawn O. Pearce
2007-07-15 11:48             ` Michael Haggerty
2007-07-16  1:08               ` Martin Langhoff
2007-07-16  1:13                 ` Julian Phillips
2007-07-16  1:30                 ` Karl Fogel
2007-07-15  1:39           ` Eric S. Raymond
2007-07-15 12:04             ` Michael Haggerty
2007-07-15 13:36               ` Eric S. Raymond
2007-07-16  1:05             ` Martin Langhoff
2007-07-19 12:02               ` Markus Schiltknecht
2007-07-20  3:51                 ` Karl Fogel
2007-07-19 19:14               ` Simon 'corecode' Schubert
2007-07-20  8:45                 ` Markus Schiltknecht
2007-07-15 23:09       ` Scott Lamb
2007-07-19 19:18     ` Simon 'corecode' Schubert
2007-07-19 19:15 ` Simon 'corecode' Schubert
2007-07-20  5:58   ` Julian Phillips
