From: "Eric S. Raymond" <esr@thyrsus.com>
To: Johan Herland <johan@herland.net>
Cc: "Jakub Narębski" <jnareb@gmail.com>,
	"Martin Langhoff" <martin.langhoff@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: I have end-of-lifed cvsps
Date: Tue, 17 Dec 2013 17:41:36 -0500
Message-ID: <20131217224136.GB19511@thyrsus.com>
In-Reply-To: <CALKQrgeRKosOSOhcbUArkh03mwJLPkcOH-DROCCnmbTdQ8afyg@mail.gmail.com>

Johan Herland <johan@herland.net>:
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
> 
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I ended up storing the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.

You make a good point.  cvs2git is descended from cvs2svn, which has
such a multipass organization, so it only has to stay within memory
limits one pass at a time.  Alan and I will try that as a fallback if
cvs-fast-export continues to choke.
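
For illustration, here is a minimal sketch of the spill-to-sqlite
arrangement Johan describes - not cvs2git's actual code; the table
name and schema are invented for the example.  The point is that the
big inter-phase table lives on disk, so each pass keeps only a
bounded working set in core:

    import sqlite3

    db = sqlite3.connect("interphase.db")
    db.execute("""CREATE TABLE IF NOT EXISTS revmap (
                      path TEXT NOT NULL,
                      rev  TEXT NOT NULL,
                      mark INTEGER NOT NULL,
                      PRIMARY KEY (path, rev))""")

    def record(path, rev, mark):
        # Pass 1: while streaming file contents, remember only the
        # (path, revision) -> mark mapping, and keep it on disk
        # rather than in a Python dict.
        db.execute("INSERT OR REPLACE INTO revmap VALUES (?, ?, ?)",
                   (path, rev, mark))

    def lookup(path, rev):
        # Pass 2: changeset assembly reads the mapping back as needed.
        row = db.execute("SELECT mark FROM revmap WHERE path=? AND rev=?",
                         (path, rev)).fetchone()
        return row[0] if row else None

    record("src/main.c", "1.42", 7)
    db.commit()
    assert lookup("src/main.c", "1.42") == 7

The runtime cost Johan mentions is exactly the price of those SELECTs
going through sqlite instead of an in-core hash lookup.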
 
> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming a
> > black hole :-)
> 
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.

Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works.  The blob files get created during the walk
through the master file list, before actual topo analysis is done.
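
In fast-import terms (a sketch only, not how cvs-fast-export actually
spells it), that blob phase amounts to emitting blob/mark/data
commands and keeping a small (path, revision) -> mark table for the
later commit-generation pass:

    import sys

    def emit_blobs(revisions, out=sys.stdout):
        # revisions: iterable of (path, rev, text) tuples; text is
        # assumed ASCII here for brevity - real code must count bytes,
        # not characters, in the data header.
        marks = {}                      # (path, rev) -> mark number
        next_mark = 1
        for path, rev, text in revisions:
            out.write("blob\n")
            out.write("mark :%d\n" % next_mark)
            out.write("data %d\n" % len(text))
            out.write(text)
            out.write("\n")
            marks[(path, rev)] = next_mark
            next_mark += 1
        return marks

    # A later pass points commits at these blobs with filemodify
    # lines, e.g. "M 100644 :7 src/main.c".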

> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)

Well do I know this... :-)
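
For anyone following along, the classic heuristic is to group
per-file revisions that share an author and log message and lie close
together in time.  A toy sketch - the 300-second window and the
record layout are assumptions, and it ignores exactly the branch and
tag fitting that causes the madness:

    from collections import namedtuple

    FileRev = namedtuple("FileRev", "path rev author log time")

    def group_changesets(filerevs, window=300):
        changesets = []
        # Sort by the grouping key, then by time within each key.
        for fr in sorted(filerevs,
                         key=lambda r: (r.author, r.log, r.time)):
            cur = changesets[-1] if changesets else None
            if (cur
                    and cur[-1].author == fr.author
                    and cur[-1].log == fr.log
                    and fr.time - cur[-1].time <= window
                    and fr.path not in {c.path for c in cur}):
                cur.append(fr)           # same logical commit
            else:
                changesets.append([fr])  # start a new changeset
        changesets.sort(key=lambda cs: cs[0].time)
        return changesets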

> > The question is how common such supermassive cases are. My own guess is that
> > the *BSD repos and a handful of the oldest GNU projects are pretty much the
> > whole set; everybody else converted to Subversion within the last decade.
> 
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...

If people with money want to hire me to slay those beasts, I'm available.
I'm not proud, I'll use cvs2git if I have to.
 
> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> >
> > Which matters, because right now the set of people working on CVS lifters
> > begins with me and ends with Michael Rafferty (cvs2git),
> 
> s/Rafferty/Haggerty/?

Yup, I thinkoed.
 
> > who seems even
> > less interested in incremental conversion than I am.  Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
> 
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.

I think the fact that it hasn't happened already is a good clue that
it's not going to. Given the decline curve of CVS usage, writing 
git-cvs might have looked like a decent investment of time once,
but that era probably ended five to eight years ago.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
