From: "Eric S. Raymond" <esr@thyrsus.com>
To: Johan Herland <johan@herland.net>
Cc: "Jakub Narębski" <jnareb@gmail.com>,
"Martin Langhoff" <martin.langhoff@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: I have end-of-lifed cvsps
Date: Tue, 17 Dec 2013 17:41:36 -0500
Message-ID: <20131217224136.GB19511@thyrsus.com>
In-Reply-To: <CALKQrgeRKosOSOhcbUArkh03mwJLPkcOH-DROCCnmbTdQ8afyg@mail.gmail.com>

Johan Herland <johan@herland.net>:
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
>
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I stored the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.
You make a good point. cvs2git is descended from cvs2svn, which has
such a multipass organization, so it only has to stay within memory
limits one pass at a time. Alan and I will try that as a fallback if
cvs-fast-export continues to choke.
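The multipass approach Johan describes, where the largest inter-phase data lives in an on-disk store so each pass holds only what it needs in memory, can be sketched roughly as follows. This is a minimal illustration assuming Python's standard `sqlite3` module; the `revisions` table layout and the function names `open_store`, `pass1_collect`, and `pass2_iterate` are hypothetical and are not taken from cvs2git or from Johan's converter.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the inter-pass store (hypothetical schema).

    ":memory:" is only for quick experiments; pass a filename such as
    "passes.db" to actually keep the data on disk between passes.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS revisions ("
        " path TEXT, rev TEXT, author TEXT, log TEXT, ts INTEGER)"
    )
    return db

def pass1_collect(db, records):
    """Pass 1: stream per-file revision metadata into the store, not into RAM."""
    db.executemany(
        "INSERT INTO revisions (path, rev, author, log, ts) VALUES (?, ?, ?, ?, ?)",
        records,
    )
    db.commit()

def pass2_iterate(db):
    """Pass 2: read revisions back in timestamp order, one row at a time."""
    # The cursor yields rows lazily, so the full table is never held in memory.
    yield from db.execute(
        "SELECT path, rev, author, log, ts FROM revisions ORDER BY ts, path"
    )
```

A later pass would call `pass2_iterate(db)` and consume rows incrementally; the trade-off is the one Johan names, extra runtime spent on disk I/O in exchange for a bounded memory footprint.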
> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming a
> > black hole :-)
>
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.
Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works. The blob files get created during the walk
through the master file list, before actual topo analysis is done.
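The blob-streaming stage discussed above boils down to: for each CVS file:revision, take the fully expanded revision text, emit it as a blob, and record a table entry mapping (file, revision) to an identifier. The sketch below assumes the revision texts have already been reconstructed from the RCS deltas (not shown), and the function names are hypothetical. It computes the blob ID the way Git does (SHA-1 over a `blob <size>\0` header followed by the content) and writes a `git fast-import` `blob` command with a mark, so a tool that does not compute SHA-1s itself can still refer to each blob by mark later.

```python
import hashlib
import io

def git_blob_sha1(content: bytes) -> str:
    """Object ID Git assigns to a blob: SHA-1 of b'blob <len>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def emit_blobs(revisions, out):
    """Write fast-import 'blob' commands; return {(path, rev): (mark, sha1)}.

    `revisions` is an iterable of (path, rev, content_bytes) with RCS deltas
    already applied (hypothetical input; the delta expansion is not shown).
    """
    table = {}
    for mark, (path, rev, content) in enumerate(revisions, start=1):
        out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(content)))
        out.write(content + b"\n")
        table[(path, rev)] = (mark, git_blob_sha1(content))
    return table

# Usage: collect the stream in memory instead of piping it to `git fast-import`.
buf = io.BytesIO()
table = emit_blobs([("foo.c", "1.1", b"int x;\n")], buf)
```

Because each revision is handled independently, this loop is the part that could in principle be spread across cores, as Johan suggests; the mark-to-revision table is what later stages consume.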
> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)
Well, I know this... :-)
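The metadata-correlation step Johan calls the hard part is often approached with a heuristic: per-file revisions that share an author and log message and fall within a short time window of each other are treated as one changeset. The sketch below shows only that simplified heuristic, under stated assumptions. It ignores branches, tags, and the ordering conflicts that make real converters difficult, and the 300-second window, the `Rev`/`Changeset` types, and `group_changesets` are illustrative names, not code from cvs-fast-export, cvsps, or cvs2git.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rev:
    path: str
    rev: str
    author: str
    log: str
    ts: int  # commit time, seconds since the epoch

@dataclass
class Changeset:
    author: str
    log: str
    revs: List[Rev] = field(default_factory=list)

    @property
    def last_ts(self) -> int:
        # Revisions are appended in timestamp order, so the last one is newest.
        return self.revs[-1].ts

def group_changesets(revs: List[Rev], window: int = 300) -> List[Changeset]:
    """Group per-file revisions into changesets (simplified heuristic).

    A revision joins the open changeset with the same (author, log) when it
    is at most `window` seconds after that changeset's newest revision. A
    file already present in the changeset forces a new one, because a single
    commit touches each file at most once.
    """
    open_sets = {}  # (author, log) -> most recent Changeset with that key
    result = []
    for r in sorted(revs, key=lambda r: r.ts):
        key = (r.author, r.log)
        cs = open_sets.get(key)
        if (cs is None or r.ts - cs.last_ts > window
                or any(x.path == r.path for x in cs.revs)):
            cs = Changeset(r.author, r.log)
            open_sets[key] = cs
            result.append(cs)
        cs.revs.append(r)
    return result
```

Real converters then have to check that the resulting changesets can be ordered consistently with every file's revision history and split any that cannot; that step, plus fitting branches and tags into the DAG, is where most of the difficulty lies.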
> > The question is how common such supermassive cases are. My own guess is that
> > the *BSD repos and a handful of the oldest GNU projects are pretty much the
> > whole set; everybody else converted to Subversion within the last decade.
>
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...
If people with money want to hire me to slay those beasts, I'm available.
I'm not proud, I'll use cvs2git if I have to.
> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> >
> > Which matters, because right now the set of people working on CVS lifters
> > begins with me and ends with Michael Rafferty (cvs2git),
>
> s/Rafferty/Haggerty/?
Yup, I thinkoed.
> > who seems even
> > less interested in incremental conversion than I am. Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
>
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.
I think the fact that it hasn't happened already is a good clue that
it's not going to. Given the decline curve of CVS usage, writing
git-cvs might have looked like a decent investment of time once,
but that era probably ended five to eight years ago.
--
Eric S. Raymond <http://www.catb.org/~esr/>
Thread overview: 48+ messages
2013-12-12 0:17 I have end-of-lifed cvsps Eric S. Raymond
2013-12-12 3:38 ` Martin Langhoff
2013-12-12 4:26 ` Eric S. Raymond
2013-12-12 13:42 ` Martin Langhoff
2013-12-12 17:17 ` Andreas Krey
2013-12-12 17:26 ` Martin Langhoff
2013-12-12 18:35 ` Eric S. Raymond
2013-12-12 18:29 ` Eric S. Raymond
2013-12-12 19:08 ` Martin Langhoff
2013-12-12 19:39 ` Eric S. Raymond
2013-12-12 19:48 ` Martin Langhoff
2013-12-12 20:58 ` Eric S. Raymond
2013-12-12 22:51 ` Martin Langhoff
2013-12-12 23:04 ` Eric S. Raymond
2013-12-13 2:35 ` Martin Langhoff
2013-12-13 3:38 ` Eric S. Raymond
2013-12-12 18:15 ` Eric S. Raymond
2013-12-12 18:53 ` Martin Langhoff
2013-12-17 10:57 ` Jakub Narębski
2013-12-17 11:18 ` Johan Herland
2013-12-17 14:58 ` Eric S. Raymond
2013-12-17 17:52 ` Johan Herland
2013-12-17 18:47 ` Eric S. Raymond
2013-12-17 21:26 ` Johan Herland
2013-12-17 22:41 ` Eric S. Raymond [this message]
2013-12-18 23:44 ` Michael Haggerty
2013-12-19 1:11 ` Johan Herland
2013-12-19 9:31 ` Michael Haggerty
2013-12-19 15:26 ` Johan Herland
2013-12-19 16:18 ` Michael Haggerty
2013-12-19 4:06 ` Eric S. Raymond
2013-12-19 9:43 ` Michael Haggerty
2013-12-17 14:07 ` Eric S. Raymond
2013-12-17 19:58 ` Jakub Narębski
2013-12-17 21:02 ` Eric S. Raymond
2013-12-18 0:02 ` Jakub Narębski
2013-12-18 0:21 ` Eric S. Raymond
2013-12-18 15:39 ` Jakub Narębski
2013-12-18 16:23 ` incremental fast-import and marks (Re: I have end-of-lifed cvsps) Jonathan Nieder
2013-12-18 16:27 ` I have end-of-lifed cvsps Eric S. Raymond
2013-12-18 16:53 ` Martin Langhoff
2013-12-18 19:54 ` John Keeping
2013-12-18 20:20 ` Eric S. Raymond
2013-12-18 20:47 ` Kent R. Spillner
2013-12-18 17:46 ` Jeff King
2013-12-18 19:16 ` Eric S. Raymond
2013-12-18 0:04 ` Andreas Schwab
2013-12-18 0:25 ` Eric S. Raymond