From: "David Frech" <david@nimblemachines.com>
To: "Michael Haggerty" <mhagger@alum.mit.edu>
Cc: esr@thyrsus.com, "Martin Langhoff" <martin.langhoff@gmail.com>,
	"Julian Phillips" <julian@quantumfyre.co.uk>,
	git@vger.kernel.org, dev <dev@cvs2svn.tigris.org>
Subject: Re: CVS -> SVN -> Git
Date: Sat, 14 Jul 2007 16:23:27 -0700
Message-ID: <7154c5c60707141623s3f70e967s226e5da29965a173@mail.gmail.com>
In-Reply-To: <46994BDF.6050803@alum.mit.edu>

Now that this party is really rollicking, I think I'll join in. ;-)

I have a modest svn repo (about 800 commits) that contains fifteen or
so small projects. It started life as a CVS repo, and as the projects
grew and changed, and as I learned more about CVS, things got moved
around. Later, when I got interested in svn (in 2005) I converted the
repo, using cvs2svn. It got a few things wrong - mostly, that it
thought there was one project in the repo, and created toplevel
trunk/, branches/, and tags/ directories, and lumped everything below
these.

So, in svn, I moved things around some more.

Now I want to switch to git. I've since added enough to svn that there
is no option but to use the svn repo as my source. git-svnimport
doesn't work for me because its idea of the structure of my repo is
too limited. I looked around, stumbled over fast-import, and got
hooked on the idea of using it. It seemed simple enough... I wrote a
350-line Lua (!!) program that parses the svn dump file and creates a
commit stream for fast-import.
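
A minimal sketch of what such a commit stream might look like, written
in Python rather than Lua. It is not the program described above. The
helper names (data_block, write_commit) are hypothetical, every file is
assumed to be a regular non-executable file with a path that needs no
quoting, and the timezone is hard-coded to +0000 for brevity.

```python
import io


def data_block(payload: bytes) -> bytes:
    """Frame raw bytes as a fast-import 'data' command with an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def write_commit(out, ref, mark, committer, when, message, files, parent=None):
    """Write one commit to a binary stream in git fast-import syntax.

    committer is "Name <email>", when is seconds since the epoch,
    files maps path -> content bytes, parent is the mark of the parent commit.
    """
    out.write(b"commit %s\n" % ref.encode())
    out.write(b"mark :%d\n" % mark)
    out.write(b"committer %s %d +0000\n" % (committer.encode(), when))
    out.write(data_block(message.encode()))
    if parent is not None:
        out.write(b"from :%d\n" % parent)
    for path, content in sorted(files.items()):
        # 100644 = regular file; 'inline' means the content follows directly.
        out.write(b"M 100644 inline %s\n" % path.encode())
        out.write(data_block(content))
    out.write(b"\n")


# Build a one-commit stream in memory.
stream = io.BytesIO()
write_commit(stream, "refs/heads/master", 1, "David <david@example.com>",
             1184455407, "init\n", {"README": b"hello"})
```

Writing to sys.stdout.buffer instead of an in-memory buffer lets the
output be piped straight into "git fast-import" inside an empty repo.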

It took a day and a half to get the svn dump parsing right (it's an
egregiously bad format) but only a couple of hours to write the
fast-import backend.
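
For a sense of what the parsing involves: an svn dump is a sequence of
records, each a block of "Key: value" headers, a blank line, and then
(when a Content-length header is present) that many bytes of body. The
Python sketch below reads records on that assumption only. The name
read_records and the sample dump are hypothetical, header values are
assumed to be UTF-8, and the property section inside a body is left
unparsed.

```python
import io


def read_records(stream):
    """Yield (headers, body) pairs from a binary svn dump stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if line == b"\n":
            continue                    # blank separator between records
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode()] = value.decode()
            line = stream.readline()
        # Records such as the format-version line carry no body at all.
        body = stream.read(int(headers.get("Content-length", 0)))
        yield headers, body


# A tiny hand-written dump: one revision that adds one file.
SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"UUID: abc\n\n"
    b"Revision-number: 1\nProp-content-length: 10\nContent-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: trunk/a.txt\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 3\nContent-length: 3\n\n"
    b"hi\n\n\n"
)

records = list(read_records(io.BytesIO(SAMPLE)))
```

The body has to be read by byte count rather than line by line, because
file contents can themselves contain lines that look like headers.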

The code "works" in the sense that it can read an svn dump and create
a git repo that looks reasonable, but it misses a few things, like
properly inferring branch creation from the "copyfrom" info in the svn
dump.
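
One way the missing inference might be approached, sketched in Python:
a node record that adds a directory and carries Node-copyfrom-path and
Node-copyfrom-rev headers is a directory copy, and a copy landing
directly under a branches directory can be treated as branch creation.
The function name and the "branches/" prefix are assumptions for
illustration; a repo with a less regular layout would need its own
rules.

```python
def branch_from_copy(headers, branch_prefix="branches/"):
    """Return (branch, source_path, source_rev) if a node record looks like
    branch creation, else None. headers is a dict of svn dump node headers."""
    if headers.get("Node-kind") != "dir" or headers.get("Node-action") != "add":
        return None
    if "Node-copyfrom-path" not in headers or "Node-copyfrom-rev" not in headers:
        return None                     # a plain mkdir, not a copy
    path = headers["Node-path"]
    if not path.startswith(branch_prefix):
        return None
    name = path[len(branch_prefix):]
    if not name or "/" in name:
        return None                     # copy of a subdirectory inside a branch
    return name, headers["Node-copyfrom-path"], int(headers["Node-copyfrom-rev"])


# Hypothetical node record for "svn copy trunk branches/stable" at r41.
node = {
    "Node-path": "branches/stable",
    "Node-kind": "dir",
    "Node-action": "add",
    "Node-copyfrom-rev": "41",
    "Node-copyfrom-path": "trunk",
}
result = branch_from_copy(node)
```

With a result like this in hand, the backend could start the new branch
with a "from" line naming the mark of the commit made for the source
revision.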

However, it's fairly fast (~35 commits/sec) and flexible. I want to,
in the process of doing this conversion, "canonicalize" the structure
of the repo and throw away all the commits from cvs and svn that just
moved things around. This poses another inference challenge, but
having a modest, simple tool (i.e., a program short enough to easily
understand and modify) helps.

Having done all this, I realized that this is a good way to go.
Separating the "parsing" part from the "commit generating" part, as
Michael suggests, not only makes the tools easier to write but also
makes them more flexible. If hg or bzr had a git-like fast-import (maybe
they do) it would take me about 35 minutes to target that instead. And
in the process I came across some "missing features" in fast-import,
which Shawn Pearce was able to quickly add.
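
The separation can be pictured as a neutral commit record that sits
between the two halves. In the Python sketch below, all names (Commit,
summary_backend, convert) are hypothetical, and the backend only
formats one-line summaries so that the example stays self-contained; a
real backend would write a fast-import stream instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Commit:
    """What the parsing half hands over; nothing here is specific to svn or git."""
    author: str
    timestamp: int
    message: str
    files: Dict[str, bytes] = field(default_factory=dict)


# A backend is any callable that consumes commits and returns output lines.
Backend = Callable[[Iterable[Commit]], List[str]]


def summary_backend(commits: Iterable[Commit]) -> List[str]:
    """Stand-in backend: one summary line per commit."""
    return ["%d %s: %s" % (c.timestamp, c.author, c.message) for c in commits]


def convert(commits: Iterable[Commit], backend: Backend) -> List[str]:
    """The driver knows nothing about either end; it only connects them."""
    return backend(commits)


lines = convert([Commit("david", 1184455407, "init")], summary_backend)
```

Retargeting to another system then means writing one new backend
function, with the parsing half left untouched.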

My repo is tiny, but I still think that speed and flexibility are key
in this process. If I can write a little script that can be useful to
someone with 100k commits instead of my measly 800, that's great.

For that matter, fast-import is a fairly short program. It wouldn't be
hard for other scm projects to do something similar. fast-import could
become a "standard" intermediate format. But even if that doesn't
happen, the amounts of code we're talking about (to do parsing and
commit generation) are reasonably modest and easy to change.

As soon as I make a bit more progress I'm going to make my code available.

Cheers,

- David

On 7/14/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
> to be split conceptually into two tools:
>
> cvs2<abstract_description_of_cvs_history>, whose job it is to determine
> the most likely "true" CVS history based on the data stored in the CVS
> repository, and
>
> <abstract_description_of_cvs_history>2svn
>
> Then later write
>
> <abstract_description_of_cvs_history>2git
> <abstract_description_of_cvs_history>2hg
>
> etc.
>
> The first split is partly done in cvs2svn 2.0.  And I naively imagine
> that writing the new output back ends won't be all that much work.
>
> Michael
>
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.
