git.vger.kernel.org archive mirror
From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Jakub Narebski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, users@cvs2svn.tigris.org
Subject: Re: cvs2svn conversion directly to git ready for experimentation
Date: Thu, 2 Aug 2007 19:44:37 -0400
Message-ID: <9e4733910708021644q6eba0e78gc2c6bcfba4816012@mail.gmail.com>
In-Reply-To: <f8r09t$qdg$1@sea.gmane.org>

On 8/1/07, Jakub Narebski <jnareb@gmail.com> wrote:
> Michael Haggerty wrote:
>
> > I am the maintainer of cvs2svn[1], which is a program for one-time
> > conversions from CVS to Subversion. cvs2svn is very robust against the
> > many peculiarities of CVS and can convert just about every CVS
> > repository we have ever seen.
> >
> > I've been working on a cvs2svn output pass that writes the converted CVS
> > repository directly into git rather than Subversion. The code runs now
> > with at least one repository from our test suite of nasty CVS repositories.
>
> Have you contacted Jon Smirl about his unpublished work on cvs2git,
> cvs2svn based CVS to Git converter?

My converter was derived from Michael's cvs2svn code. The bulk of my
work was converting cvs2svn to output in a format that git-fast-import
could consume. This was all rather straightforward, and there was
nothing really interesting in the code.

What it exposed were the fundamental difficulties of reconstructing a
change set history from CVS, which never recorded all of the needed
information.  I was never able to construct a satisfactory git
representation of the Mozilla CVS repository.  Michael has had a long
time to work on the change set detection code and he's probably added
some new strategies.

My code did include a CVS file parser for extracting all the revisions
from the file in a single pass. Doing that is a major performance
benefit.  I believe I posted the code to the cvs2svn mailing list. It
was about 200 lines of code. Forking off cvs a million times to
extract the revisions takes days to run.
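
As an illustration of the single-pass idea, here is a minimal Python
sketch that reads every deltatext entry out of an RCS ",v" file in one
scan, following the grammar in rcsfile(5). It is not the parser posted
to the cvs2svn list; the names TOKEN, unquote, read_deltatexts and
SAMPLE are made up for this example. It assumes the file has already
been read into a string (decoding as latin-1 keeps every byte), and it
skips newphrases and does not apply the deltas: only the head revision's
text is complete, the other entries are edit scripts.

```python
import re

# One token is an @-quoted string (with @@ as an escaped @), a ';',
# or a run of other non-space characters.
TOKEN = re.compile(r"@[^@]*(?:@@[^@]*)*@|;|[^\s@;]+")


def unquote(token):
    """Strip the surrounding @ signs and undo the @@ escaping."""
    return token[1:-1].replace("@@", "@")


def read_deltatexts(rcs_text):
    """Return [(revision, log, text), ...] from one scan of a ,v file."""
    tokens = TOKEN.findall(rcs_text)
    # The deltatexts follow the 'desc' keyword and its @-string; the
    # admin and delta sections before it are skipped in this sketch.
    start = next(
        (k for k in range(len(tokens) - 1)
         if tokens[k] == "desc" and tokens[k + 1].startswith("@")),
        None,
    )
    if start is None:
        raise ValueError("no desc section found")
    i = start + 2
    result = []
    while i < len(tokens):
        # Each entry is: <rev> log @...@ text @...@
        rev, kw_log, log, kw_text, text = tokens[i:i + 5]
        if kw_log != "log" or kw_text != "text":
            raise ValueError("unexpected token near revision " + rev)
        result.append((rev, unquote(log), unquote(text)))
        i += 5
    return result


# A tiny hand-written ,v file used only to exercise the sketch.
SAMPLE = """head 1.2;
access;
symbols;
locks; strict;
comment @# @;

1.2
date 2007.08.02.23.44.37; author jon; state Exp;
branches;
next 1.1;

1.1
date 2007.08.01.00.09.00; author jon; state Exp;
branches;
next ;

desc
@@

1.2
log
@second
@
text
@hello
user@@example.com
@

1.1
log
@first
@
text
@d2 1
@
"""

for rev, log, text in read_deltatexts(SAMPLE):
    print(rev, repr(log), repr(text))
```

A full converter would still have to walk the "next" pointers and apply
each edit script to rebuild the older revisions, but all of that can be
done in memory from this one read of the file.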

The same goes for forking git a million times. git-fast-import reads
its input from a pipe fed by cvs2svn to avoid forking. It also uses a
technique from the database world for bulk import: it imports
everything without indexing it, and indexing is done after the import
finishes.
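
To make the pipe arrangement concrete, here is a minimal Python sketch
of a front end that writes blob and commit commands in the
git fast-import stream format. It is not the cvs2git code; the function
name write_fast_import and the sample revision are invented for
illustration, and it assumes one file per commit, a single branch, and
paths that need no quoting.

```python
import io


def write_fast_import(out, revisions, branch=b"refs/heads/master"):
    """Write one blob and one commit per revision to the binary stream `out`."""
    mark = 0
    for path, content, committer, when, message in revisions:
        # Emit the file content as a blob and remember its mark.
        mark += 1
        blob_mark = mark
        out.write(b"blob\nmark :%d\ndata %d\n" % (blob_mark, len(content)))
        out.write(content + b"\n")

        # Emit a commit that points the path at that blob.  On an existing
        # branch, fast-import uses the current branch tip as the parent.
        mark += 1
        out.write(b"commit %s\nmark :%d\n" % (branch, mark))
        out.write(b"committer %s %d +0000\n" % (committer, when))
        out.write(b"data %d\n%s\n" % (len(message), message))
        out.write(b"M 100644 :%d %s\n\n" % (blob_mark, path))


# Build the stream in memory here; a real run would pass the stdin of
# subprocess.Popen(["git", "fast-import"], stdin=subprocess.PIPE),
# started inside the target repository, so git is forked only once.
demo = io.BytesIO()
write_fast_import(
    demo,
    [(b"README", b"hello\n", b"A U Thor <author@example.com>",
      1186098277, b"initial import")],
)
print(demo.getvalue().decode())
```

Because every revision goes down the same pipe, the cost of starting
git is paid once for the whole import rather than once per revision.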

Between parsing the CVS files internally and Shawn's git-fast-import,
it was possible to import Mozilla CVS (2.4GB) in about 2 hours and
generate a 450MB pack file. You need 3GB of RAM to do this; if the
machine starts swapping, the process will take weeks to finish.

> Quote from InterfacesFrontendsAndTools page on GIT wiki[1]:
>
>   cvs2git is the unofficial name of Jon Smirl's modifications to cvs2svn.
>   These modifications allow cvs2svn to generate a data stream which is
>   consumed by Shawn Pearce's git-fast-import (now included in git.git).
>   git-fast-import converts its input stream directly into a Git .pack file,
>   minimizing the amount of IO required on large imports.
>
>   Jon Smirl stopped working on cvs2git[2] because first, Mozilla (which was
>   the main target of his work) decided not to move to git, and second
>   because of troubles with cvs2svn architecture[*] (which it is based on).
>   Jon Smirl has posted his impressions on working on CVS importer in
>   "Some tips for doing a CVS importer" thread[3].
>
> References:
> -----------
> [1] http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-23858c2cde0cef60443d8e73e6829a95f8e191ef
> [2] http://msgid.gmane.org/9e4733910611190940y147992b8mbdfac5a51f42e0fe@mail.gmail.com
> [3] http://marc.theaimsgroup.com/?t=116405956000001&r=1&w=2
>
> Footnotes:
> ----------
> [*] If I remember correctly authors of cvs2svn were talking about separating
> the code dealing with disentangling CVS repository structure from the part
> translating it into Subversion repository (with its quirks), and the part
> generating Subversion repository.
>
> --
> Jakub Narebski
> Warsaw, Poland
> ShadeHawk on #git


-- 
Jon Smirl
jonsmirl@gmail.com
