git.vger.kernel.org archive mirror
From: Michael Haggerty <mhagger@alum.mit.edu>
To: esr@thyrsus.com
Cc: "Johan Herland" <johan@herland.net>,
	"Jakub Narębski" <jnareb@gmail.com>,
	"Martin Langhoff" <martin.langhoff@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 19 Dec 2013 00:44:29 +0100
Message-ID: <52B2335D.2030607@alum.mit.edu>
In-Reply-To: <20131217184724.GA17709@thyrsus.com>

On 12/17/2013 07:47 PM, Eric S. Raymond wrote:
> Johan Herland <johan@herland.net>:
>> However, I fear that you underestimate the number of users that want
>> to use Git against CVS repos that are orders of magnitude larger (in
>> both dimensions: #commits and #files) than your example repo.
> 
> You may be right. See below...
> 
> I'm working with Alan Barret now on trying to convert the NetBSD
> repositories. They break cvs-fast-export through sheer bulk of
> metadata, by running the machine out of core.  This is exactly
> the kind of huge case that you're talking about.
> 
> Alan and I are going to take a good hard whack at modifying cvs-fast-export 
> to make this work. Because there really aren't any feasible alternatives.
> The analysis code in cvsps was never good enough. cvs2git, being written
> in Python, would hit the core limit faster than anything written in C.

cvs2git goes to great lengths to store intermediate data on disk and to
keep the working set small, so (despite the Python overhead) I am
confident that it scales better than cvs-fast-export.  My usual test
repo was gcc:

Total CVS Files:             25013
Total CVS Revisions:        578010
Total CVS Branches:        1487929
Total CVS Tags:           11435500
Total Unique Tags:             814
Total Unique Branches:         116
CVS Repos Size in KB:      2074248
Total SVN Commits:           64501

I also regularly converted mozilla (4.2 GB) and emacs (560 MB) for
testing purposes.  These could all be converted on a 32-bit computer.

Other projects that cvs2svn/cvs2git could handle: FreeBSD, Gentoo, KDE,
GNOME, PostgreSQL.  (Though for KDE, which I think was in the 16 GB
range, I know that they used a giant machine for the conversion.)

If you haven't tried cvs2git yet, please start it up somewhere in the
background.  It might take a while but it should have no trouble with
your repos, and then you can compare the tools based on experience
rather than speculation.

> Which matters, because right now the set of people working on CVS lifters
> begins with me and ends with Michael Rafferty (cvs2git), who seems even
> less interested in incremental conversion than I am.  Unless somebody
> comes out of nowhere and wants to own that problem, it's not going
> to get solved.

A correct incremental converter could be done (as long as the CVS users
don't literally change history retroactively) but it would be a lot of
work.  Parsing the CVS files isn't the problem; after all, CVS has to do
that every time you check out a branch.  The problem is the extra
bookkeeping that would be needed to keep the overlapping history
consistent between runs N and N+1 of the tool.  I sketched out what
would be necessary once and it came out to several solid weeks of work.

But the traffic on the cvs2svn/cvs2git mailing list has trailed off
essentially to zero, so either the software is perfect already (haha) or
most everybody has already converted.  Therefore I don't invest any
significant time in that project these days.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
