git.vger.kernel.org archive mirror
From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Git Mailing List" <git@vger.kernel.org>
Subject: Some tips for doing a CVS importer
Date: Mon, 20 Nov 2006 16:49:17 -0500
Message-ID: <9e4733910611201349s4d08b984g772c64982f148bfa@mail.gmail.com>

I have tried all of the available CVS importers. None of them are
without problems. If anyone is interested in writing one for git, here
are some ideas on how to structure it.

1) There is a working lex/yacc parser for CVS files in the parsecvs
source code.
2) The first time you parse a CVS file, record everything you need
and don't parse it again.
3) When the file is first parsed, use the deltas to generate the
revisions and feed them to git-fastimport right away; just remember
the SHA1 or an id in the import code. This is a critical step for
getting decent performance.
4) If you do #2 and #3, you don't need to store CVS revision numbers
and file names in memory. Because of that you can easily do a
Mozilla import in 2GB of RAM, probably 1GB.
5) When comparing CVS revisions, use the dependency information in
the CVS file and only fall back on the CVS timestamps as a last resort.
6) Match up file revisions into commits by using a SHA1 of the author
and commit message.
7) After all files are loaded, match up the symbols (tags and branches)
and insert them into the dependency chains. If any of a symbol's
dependencies is a branch commit, the symbol lies on the branch;
otherwise the symbol is on the trunk.
8) Do a topological sort to build the change set commit tree.
9) When the sort hits a loop, break up delta change sets until the
loop can be removed; don't break up symbol change sets.
10) Mozilla has some large commits that were made over dial-up, so a
single commit's change set can span hours. All of the file revisions
in such a commit need to be merged into a single change set.
11) An algorithm needs to be developed for detecting branches that
merge back into the trunk.
12) cvs2svn has excellent test cases; use them to test the new
importer. The cvs2svn code is quite nice, but it doesn't handle #7.
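Tips 3 and 4 can be illustrated with a minimal Python sketch, assuming the input format that git fast-import (the later name of git-fastimport) accepts: each reconstructed revision is written out immediately as a `blob` command carrying a `mark`, and the importer keeps only that small integer. Rebuilding the revision text from the RCS deltas is not shown, and `emit_blob` and `BlobFeeder` are hypothetical helpers, not part of any existing tool.

```python
import io


def emit_blob(out, mark, content):
    """Write one blob command in git fast-import's input format."""
    out.write(b"blob\n")
    out.write(b"mark :%d\n" % mark)         # later commits refer to ":<mark>"
    out.write(b"data %d\n" % len(content))  # exact byte count of the payload
    out.write(content)
    out.write(b"\n")                        # optional LF after the data


class BlobFeeder:
    """Hypothetical helper: streams each revision as soon as it is
    reconstructed and hands back a small integer mark in its place."""

    def __init__(self, out):
        self.out = out
        self.next_mark = 1

    def feed(self, content):
        mark = self.next_mark
        self.next_mark += 1
        emit_blob(self.out, mark, content)
        return mark  # the only per-revision state the importer keeps


# Usage: write into memory here instead of a real git fast-import pipe.
stream = io.BytesIO()
feeder = BlobFeeder(stream)
m1 = feeder.feed(b"hello\n")
m2 = feeder.feed(b"hello\nworld\n")
```

Against a real repository, `out` would be the stdin of `subprocess.Popen(["git", "fast-import"], stdin=subprocess.PIPE)`, and the later `commit` commands would refer to the blobs as `:1`, `:2`, and so on, so the revision texts never have to stay in memory.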
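Tips 6 and 10 together suggest grouping file revisions by a hash of author and log message rather than by a narrow time window. The Python sketch below does that; `FileRev`, `commit_key`, `group_changesets` and the optional `max_gap` knob are hypothetical names chosen for illustration, and by default no time-based splitting happens at all, so a commit uploaded over several hours stays one change set.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRev:
    path: str
    rev: str        # RCS revision number, e.g. "1.4"
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def commit_key(author, log):
    """SHA1 of author and log message (NUL-separated to avoid ambiguity)."""
    h = hashlib.sha1()
    h.update(author.encode("utf-8"))
    h.update(b"\0")
    h.update(log.encode("utf-8"))
    return h.hexdigest()


def group_changesets(revs, max_gap=None):
    """Group file revisions that share a commit_key into change sets.

    max_gap is a hypothetical knob in seconds: with the default None,
    revisions are never split by time (tip 10).
    """
    by_key = defaultdict(list)
    for r in revs:
        by_key[commit_key(r.author, r.log)].append(r)

    changesets = []
    for group in by_key.values():
        group.sort(key=lambda r: r.timestamp)
        current = [group[0]]
        for prev, r in zip(group, group[1:]):
            if max_gap is not None and r.timestamp - prev.timestamp > max_gap:
                changesets.append(current)
                current = []
            current.append(r)
        changesets.append(current)
    return changesets


revs = [
    FileRev("a.c", "1.2", "jon", "fix build", 1000),
    FileRev("b.c", "1.7", "jon", "fix build", 1000 + 3 * 3600),  # 3 hours later
    FileRev("a.c", "1.3", "jon", "other change", 2000),
]
sets = group_changesets(revs)
```

A real importer would also have to split a group that contains two revisions of the same file, since one CVS commit produces at most one revision per file; that check is left out here.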
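The branch-or-trunk rule in tip 7 can be sketched in Python at the level of single file revisions, relying on RCS numbering: trunk revisions have two components ("1.5"), branch revisions four or more ("1.5.2.3"). This is a simplification under stated assumptions: it handles tags only (branch symbols use CVS's "magic" numbers such as "1.7.0.2" and need separate treatment), and `is_branch_revision` and `place_symbol` are hypothetical helpers rather than a full insertion into change-set dependency chains.

```python
def is_branch_revision(rev):
    """RCS numbering: trunk revisions have two components ("1.5"),
    revisions on a branch have four or more ("1.5.2.3")."""
    return len(rev.split(".")) > 2


def place_symbol(tagged):
    """Decide whether a tag lies on a branch or on the trunk.

    tagged maps file path -> revision the tag points at in that file.
    If any of those revisions is a branch commit, the tag lies on the
    branch; otherwise it lies on the trunk.  Returns
    ("branch", {path: branch number}) or ("trunk", {}).
    """
    on_branch = {
        path: rev.rsplit(".", 1)[0]  # "1.7.2.1" -> branch "1.7.2"
        for path, rev in tagged.items()
        if is_branch_revision(rev)
    }
    return ("branch", on_branch) if on_branch else ("trunk", {})


print(place_symbol({"a.c": "1.4", "b.c": "1.7"}))
print(place_symbol({"a.c": "1.4", "b.c": "1.7.2.1"}))
```

Branch numbers are per file, so the result keeps one branch number per path; mapping them back to a single branch name is a separate step.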
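Tips 5, 8 and 9 fit together as one step: derive change-set dependencies from the revision links recorded in the CVS files, then topologically sort them and report any loop. The Python sketch below uses the standard library's `graphlib` (Python 3.9+); `changeset_deps` and `order_changesets` are hypothetical helpers, keys are `(path, rev)` pairs only for readability (tip 4 suggests compact ids in a real importer), and the actual repair from tip 9, splitting a delta change set (never a symbol change set) until the loop goes away, is not shown.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def changeset_deps(changeset_of, parent_of):
    """Build change-set dependencies from the RCS files' own revision links.

    changeset_of: (path, rev) -> change set id
    parent_of:    (path, rev) -> (path, parent_rev), taken from the CVS
                  file's dependency information, not from timestamps (tip 5)
    """
    deps = {cs: set() for cs in changeset_of.values()}
    for filerev, cs in changeset_of.items():
        parent = parent_of.get(filerev)
        if parent is not None and changeset_of[parent] != cs:
            deps[cs].add(changeset_of[parent])
    return deps


def order_changesets(deps):
    """Topologically sort change sets (tip 8).

    Returns (order, None) on success, or (None, cycle) when the sort hits
    a loop; cycle lists the change sets involved, first == last.
    """
    try:
        return list(TopologicalSorter(deps).static_order()), None
    except CycleError as err:
        return None, err.args[1]


# No loop: A must come before both B and C.
ok_deps = changeset_deps(
    {("a.c", "1.1"): "A", ("a.c", "1.2"): "B", ("b.c", "1.1"): "A", ("b.c", "1.2"): "C"},
    {("a.c", "1.2"): ("a.c", "1.1"), ("b.c", "1.2"): ("b.c", "1.1")},
)
order, _ = order_changesets(ok_deps)

# Loop: X holds a.c 1.1 and b.c 1.2, Y holds b.c 1.1 and a.c 1.2.
loop_deps = changeset_deps(
    {("a.c", "1.1"): "X", ("b.c", "1.2"): "X", ("b.c", "1.1"): "Y", ("a.c", "1.2"): "Y"},
    {("a.c", "1.2"): ("a.c", "1.1"), ("b.c", "1.2"): ("b.c", "1.1")},
)
_, cycle = order_changesets(loop_deps)
```

In the loop case, splitting either X or Y into its two file revisions lets the sort succeed, which is the kind of break-up tip 9 describes.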

-- 
Jon Smirl

Thread overview: 30+ messages
2006-11-20 21:49 Jon Smirl [this message]
2006-11-20 23:03 ` Some tips for doing a CVS importer Martin Langhoff
2006-11-20 23:37   ` Jon Smirl
2006-11-21  0:29     ` Martin Langhoff
2006-11-21  0:55       ` Carl Worth
2006-11-21  1:40         ` Jon Smirl
2006-11-21  6:39           ` Shawn Pearce
2006-11-21 19:56             ` lamikr
2006-11-21 20:05               ` Shawn Pearce
2006-11-23 19:45                 ` Robin Rosenberg
2006-11-25  6:59                   ` Shawn Pearce
2006-11-21 20:03             ` Petr Baudis
2006-11-21 20:15               ` Shawn Pearce
2006-11-21 20:22               ` Johannes Schindelin
2006-11-23  9:10                 ` Johannes Sixt
2006-11-21 20:40               ` Martin Langhoff
2006-11-21  1:53       ` Jon Smirl
2006-11-26 10:18         ` Marko Macek
2006-11-26 15:35           ` Jon Smirl
2006-11-26 16:11             ` Marko Macek
2006-11-26 17:51               ` Jon Smirl
2006-11-27 11:29               ` Michael Haggerty
2006-11-21  6:43       ` Shawn Pearce
2006-11-27 11:24 ` Michael Haggerty
2006-11-27 11:51   ` Markus Schiltknecht
2006-11-27 22:09     ` Michael Haggerty
2006-11-28 15:18       ` Markus Schiltknecht
2006-11-30  0:35         ` Michael Haggerty
2006-11-30  0:45           ` Daniel Jacobowitz
2006-11-27 15:20   ` Jon Smirl