From: Johan Herland <johan@herland.net>
To: Eric Raymond <esr@thyrsus.com>
Cc: "Jakub Narębski" <jnareb@gmail.com>,
"Martin Langhoff" <martin.langhoff@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: I have end-of-lifed cvsps
Date: Tue, 17 Dec 2013 22:26:57 +0100
Message-ID: <CALKQrgeRKosOSOhcbUArkh03mwJLPkcOH-DROCCnmbTdQ8afyg@mail.gmail.com>
In-Reply-To: <20131217184724.GA17709@thyrsus.com>
On Tue, Dec 17, 2013 at 7:47 PM, Eric S. Raymond <esr@thyrsus.com> wrote:
> I'm working with Alan Barret now on trying to convert the NetBSD
> repositories. They break cvs-fast-export through sheer bulk of
> metadata, by running the machine out of core. This is exactly
> the kind of huge case that you're talking about.
>
> Alan and I are going to take a good hard whack at modifying cvs-fast-export
> to make this work. Because there really aren't any feasible alternatives.
> The analysis code in cvsps was never good enough. cvs2git, being written
> in Python, would hit the core limit faster than anything written in C.
Depends on how it organizes its data structures. Have you actually
tried running cvs2git on it? I'm not saying you are wrong, but I had
similar problems with my custom converter (also written in Python),
and solved them by adding multiple passes/phases instead of trying to
do too much work in fewer passes. In the end I stored the
largest inter-phase data structures outside of Python (sqlite in my
case) to save memory. Obviously it cost a lot in runtime, but it meant
that I could actually chew through our largest CVS modules without
running out of memory.
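
A minimal sketch of the multi-pass idea described above, assuming Python's standard sqlite3 module as the out-of-process store. The table layout and the function names (open_store, pass_collect, pass_iterate) are hypothetical and are not taken from any actual converter; the point is only that one pass writes its results to SQLite and the next pass streams them back, so neither holds the full data set in memory.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database that holds per-revision metadata between passes.

    Pass a file name instead of ":memory:" to keep the data out of RAM.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS revisions ("
        " path TEXT, rev TEXT, author TEXT, timestamp INTEGER, log TEXT,"
        " PRIMARY KEY (path, rev))"
    )
    return db


def pass_collect(db, revisions):
    """Pass 1: write parsed revisions to the store instead of a Python list.

    `revisions` may be a generator of (path, rev, author, timestamp, log).
    """
    with db:  # commits the transaction on success
        db.executemany(
            "INSERT OR REPLACE INTO revisions VALUES (?, ?, ?, ?, ?)",
            revisions,
        )


def pass_iterate(db):
    """Pass 2: read the rows back in timestamp order, one at a time."""
    yield from db.execute(
        "SELECT path, rev, author, timestamp, log FROM revisions"
        " ORDER BY timestamp, path, rev"
    )
```

The sorting is pushed into SQLite here, which trades runtime for memory in the way the paragraph above describes.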
> It is certainly the case that a sufficiently large CVS repo will break
> anything, like a star with a mass over the Chandrasekhar limit becoming a
> black hole :-)
:) True, although it's not the sheer size of the files themselves that
is the actual problem. Most of those bytes are (deltified) file data,
which you can pretty much stream through and convert to a
corresponding fast-export stream of blob objects. The code for that
should be fairly straightforward (and should also be eminently
parallelizable, given enough cores and available I/O), resulting in a
table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
and an accompanying (set of) packfile(s) holding said blobs.
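
As a rough illustration of that blob pass, the sketch below emits one git fast-import "blob" command per revision and builds the file:revision to SHA1 table. It assumes the full text of each revision has already been reconstructed from the RCS deltas, which is the part a real converter would have to stream; the function names and the (path, rev, data) input shape are hypothetical.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def export_blobs(revisions, out):
    """Write one fast-import 'blob' command per revision.

    `revisions` yields (path, rev, data) with `data` as bytes; `out` is a
    binary file object. Returns a dict mapping (path, rev) to the blob SHA1.
    """
    table = {}
    for mark, (path, rev, data) in enumerate(revisions, start=1):
        out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
        out.write(data)
        out.write(b"\n")  # optional LF after the data, allowed by fast-import
        table[(path, rev)] = git_blob_sha1(data)
    return table
```

Because each revision is handled independently, the loop body could be split across worker processes, each writing its own stream; that split is not shown here.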
The hard part comes when trying to correlate the metadata for all the
per-file revisions, and distill that into a consistent sequence/DAG of
changesets/commits across the entire CVS repo. And then, of course,
trying to fit all the branches and tags into that DAG of commits is
what really drives you mad... ;-)
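
To make the correlation step concrete, here is a deliberately naive sketch. It assumes a common heuristic that is not spelled out in this message: per-file revisions with the same author and log message, close together in time, and touching distinct files are treated as one changeset. The FileRev type, the group_changesets name and the 300-second window are all hypothetical, and branches and tags are ignored entirely.

```python
from collections import namedtuple

FileRev = namedtuple("FileRev", "path rev author timestamp log")


def group_changesets(revs, window=300):
    """Group per-file revisions into candidate changesets.

    Heuristic (an assumption for illustration): a revision joins the most
    recent changeset with the same author and log message if it is at most
    `window` seconds after that changeset's last revision and its file is
    not already in the changeset. Otherwise it starts a new changeset.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for r in sorted(revs, key=lambda r: (r.timestamp, r.path, r.rev)):
        key = (r.author, r.log)
        cs = latest.get(key)
        if (cs is None
                or r.timestamp - cs[-1].timestamp > window
                or any(r.path == m.path for m in cs)):
            cs = []
            changesets.append(cs)
            latest[key] = cs
        cs.append(r)
    return changesets
```

A real converter also has to cope with clock skew, with per-file dependencies that contradict timestamp order, and with placing branch and tag points in the resulting DAG, none of which this sketch attempts.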
> The question is how common such supermassive cases are. My own guess is that
> the *BSD repos and a handful of the oldest GNU projects are pretty much the
> whole set; everybody else converted to Subversion within the last decade.
You may be right. At least for the open-source cases. I suspect
there's still a considerable number of huge CVS repos within
companies' walls...
> I find the very idea of writing anything that encourages
> non-history-correct conversions disturbing and want no part of it.
>
> Which matters, because right now the set of people working on CVS lifters
> begins with me and ends with Michael Rafferty (cvs2git),
s/Rafferty/Haggerty/?
> who seems even
> less interested in incremental conversion than I am. Unless somebody
> comes out of nowhere and wants to own that problem, it's not going
> to get solved.
Agreed. It would be nice to have something to point to for people who
want something similar to git-svn for CVS, but without a motivated
owner, it won't happen.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net