From: "Eric S. Raymond" <esr@thyrsus.com>
To: Johan Herland <johan@herland.net>
Cc: "Jakub Narębski" <jnareb@gmail.com>,
"Martin Langhoff" <martin.langhoff@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: I have end-of-lifed cvsps
Date: Tue, 17 Dec 2013 17:41:36 -0500
Message-ID: <20131217224136.GB19511@thyrsus.com>
In-Reply-To: <CALKQrgeRKosOSOhcbUArkh03mwJLPkcOH-DROCCnmbTdQ8afyg@mail.gmail.com>

Johan Herland <johan@herland.net>:
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
>
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I ended up storing the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.

You make a good point. cvs2git is descended from cvs2svn, which has
such a multipass organization - it will only have to avoid memory
limits per pass. Alan and I will try that as a fallback if
cvs-fast-export continues to choke.

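A minimal sketch of the multipass idea Johan describes, not cvs2git's
actual design: pass 1 streams per-file revision metadata into SQLite,
and pass 2 reads it back in sorted order so that only one candidate
changeset is held in memory at a time. The table layout, the function
names, and the grouping rule (same author and log message within a
300-second window) are all assumptions made for illustration.

```python
import sqlite3

def pass1_collect(db, revisions):
    """Pass 1: store (path, rev, author, log, ts) rows on the SQLite side."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS rev "
        "(path TEXT, rev TEXT, author TEXT, log TEXT, ts INTEGER)"
    )
    db.executemany("INSERT INTO rev VALUES (?, ?, ?, ?, ?)", revisions)
    db.commit()

def pass2_changesets(db, window=300):
    """Pass 2: yield ((author, log), [(path, rev), ...]) groups one at a time.

    Hypothetical heuristic: rows with the same author and log message whose
    timestamps are at most `window` seconds apart form one changeset.
    """
    cur = db.execute(
        "SELECT path, rev, author, log, ts FROM rev ORDER BY author, log, ts"
    )
    current, key, last_ts = [], None, None
    for path, rev, author, log, ts in cur:
        if key != (author, log) or ts - last_ts > window:
            if current:
                yield key, current
            current, key = [], (author, log)
        current.append((path, rev))
        last_ts = ts
    if current:
        yield key, current
```

Passing an on-disk path to sqlite3.connect() instead of ":memory:" is
what moves the inter-phase data out of the Python heap, at the runtime
cost Johan mentions.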
> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming a
> > black hole :-)
>
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.

Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works. The blob files get created during the walk
through the master file list, before actual topo analysis is done.

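A minimal sketch of that blob pass, assuming the converter writes a git
fast-import stream and identifies blobs by mark number rather than by
SHA1. This is not cvs-fast-export's code; emit_blobs and its
(path, rev, content) input tuples are hypothetical names for
illustration.

```python
import io

def emit_blobs(revisions, out):
    """Write one fast-import blob per file revision; return a mark table.

    `revisions` yields (path, rev, content) with content already expanded
    to bytes. The returned dict maps (path, rev) to the blob's mark, which
    a later commit-building pass can reference.
    """
    marks = {}
    for n, (path, rev, content) in enumerate(revisions, start=1):
        out.write(b"blob\nmark :%d\ndata %d\n" % (n, len(content)))
        out.write(content)
        out.write(b"\n")
        marks[(path, rev)] = n
    return marks
```

Because each blob depends only on its own file revision, this pass
needs no cross-file state; the mark table is the only thing carried
forward into changeset analysis.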
> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)

Well, I know this... :-)

> > The question is how common such supermassive cases are. My own guess is that
> > the *BSD repos and a handful of the oldest GNU projects are pretty much the
> > whole set; everybody else converted to Subversion within the last decade.
>
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...

If people with money want to hire me to slay those beasts, I'm available.
I'm not proud, I'll use cvs2git if I have to.

> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> >
> > Which matters, because right now the set of people working on CVS lifters
> > begins with me and ends with Michael Rafferty (cvs2git),
>
> s/Rafferty/Haggerty/?

Yup, I thinkoed.

> > who seems even
> > less interested in incremental conversion than I am. Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
>
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.

I think the fact that it hasn't happened already is a good clue that
it's not going to. Given the decline curve of CVS usage, writing
git-cvs might have looked like a decent investment of time once,
but that era probably ended five to eight years ago.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>