git.vger.kernel.org archive mirror
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Petr Baudis <pasky@suse.cz>, Andy Whitcroft <apw@shadowen.org>,
	Git Mailing List <git@vger.kernel.org>,
	dev@cvs2svn.tigris.org
Subject: Re: Mozilla, git and Windows
Date: Tue, 28 Nov 2006 13:17:17 +0100
Message-ID: <456C28CD.6020800@alum.mit.edu>
In-Reply-To: <9e4733910611271735y14bed29bk70ae67b5d28eb055@mail.gmail.com>

Jon Smirl wrote:
> As was mentioned in the thread about doing CVS to git import, the
> trick is to write your own CVS file parser, parse the file once (not
> once for each revision) and output all of the revisions to the git
> database in a single pass. When code is structured that way I can
> import the whole Mozilla repository into git in two hours. The
> fast-import back end also works without forking; it just listens for
> commands on stdin and acts on them, and all of the commands are
> implemented in a single binary.

Using cvs2svn, it is now possible to avoid having to invoke CVS/RCS
zillions of times.  Here is a brief description of how the new hooks work.

There is an interface called RevisionReader that is used to retrieve the
contents of a file.  The RevisionReader to be used for a run of cvs2svn
can be set in an options file (passed with --options) using a line like:

ctx.revision_reader = MyRevisionReader()

The RevisionReader interface includes a method get_revision_recorder(),
which should return an instance of RevisionRecorder.  The
RevisionRecorder has callback methods that are invoked as the CVS files
are parsed.  For example, RevisionRecorder.record_text() is passed the
log message and text (full text or delta) for each file revision.  The
record_text() method is allowed to return an arbitrary token (for
example, a content hash), and that token is stored into
CVSRevision.revision_recorder_token and carried along by cvs2svn.
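To make the token idea concrete, here is a minimal sketch in Python. The class names and the simplified record_text(revision, log, fulltext) signature are hypothetical, not cvs2svn's actual API, and the sketch assumes the recorder already has the fulltext rather than a delta.  It uses git's blob hash (SHA-1 over "blob <size>\0" followed by the contents) as the token, so the token is exactly the object name git would give that file revision.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 that git assigns to `data` stored as a blob."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


@dataclass
class CVSRevisionStub:
    """Hypothetical stand-in for cvs2svn's CVSRevision (only the token field)."""
    path: str
    rev: str
    revision_recorder_token: Optional[str] = None


class HashingRecorder:
    """Hypothetical recorder whose record_text() returns a content hash.

    The signature record_text(revision, log, fulltext) is an assumption,
    not cvs2svn's real method signature.
    """

    def record_text(self, revision: CVSRevisionStub, log: str,
                    fulltext: bytes) -> str:
        # The returned value is opaque to the framework; here it is a git blob name.
        return git_blob_sha1(fulltext)


recorder = HashingRecorder()
rev = CVSRevisionStub(path="foo.c", rev="1.1")
# The framework (here, this script) stores the returned token on the revision.
rev.revision_recorder_token = recorder.record_text(rev, "initial import", b"hello\n")
print(rev.revision_recorder_token)  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the token is the git blob name, a later pass can refer to the content without touching the bytes again.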

The concrete RevisionReaders included with cvs2svn are RCSRevisionReader
and CVSRevisionReader, which have do-nothing RevisionRecorders and which
call rcs or cvs in OutputPass to get the file contents.  (This repeated
invocation of rcs/cvs is the most expensive part of the conversion.)
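For comparison, a sketch of the expensive approach: one rcs process per file revision.  It relies only on the standard RCS co options -q (quiet) and -p<rev> (print the revision to stdout instead of writing a working file); the helper names are hypothetical and cvs2svn's actual command line may differ.

```python
import subprocess
from typing import List


def rcs_checkout_command(rcs_path: str, rev: str) -> List[str]:
    """Build a `co` invocation that prints revision `rev` of `rcs_path` to stdout."""
    return ["co", "-q", "-p" + rev, rcs_path]


def read_revision_via_rcs(rcs_path: str, rev: str) -> bytes:
    """Spawn one `co` process to fetch a single file revision."""
    result = subprocess.run(
        rcs_checkout_command(rcs_path, rev),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# Fetching N file revisions this way spawns N `co` processes, each of which
# reads the ,v file again and replays deltas to reach the requested revision.
print(rcs_checkout_command("src/foo.c,v", "1.3"))  # ['co', '-q', '-p1.3', 'src/foo.c,v']
```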

To speed things up, you would write your own RevisionRecorder that
reconstructs each file revision's fulltext from the CVS deltas, stores
the contents in a git object store, and returns the revision's content
hash as its token.
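A rough sketch of the delta step such a recorder would need.  It assumes the delta has already been extracted from the ,v file and unquoted (RCS doubles "@" inside its @-delimited strings), and that it is an RCS edit script of "dN M" (delete M lines starting at line N) and "aN M" (insert the following M lines after line N), with line numbers referring to the base text.  The function names are hypothetical; a real recorder must also track which revision each delta applies to (trunk deltas run backwards from the head, branch deltas run forwards).

```python
import hashlib
from typing import List


def apply_rcs_delta(base: List[bytes], delta: bytes) -> List[bytes]:
    """Apply an RCS-style edit script to `base` (lines including their endings).

    Commands appear in ascending order of base line number:
      dN M : delete M lines starting at base line N
      aN M : after base line N, insert the M lines that follow the command
    """
    out: List[bytes] = []
    pos = 0  # number of base lines already consumed
    lines = delta.splitlines(keepends=True)
    i = 0
    while i < len(lines):
        cmd = lines[i]
        i += 1
        op = cmd[:1]
        start, count = (int(x) for x in cmd[1:].split())
        if op == b"d":
            out.extend(base[pos:start - 1])  # copy lines before the deleted block
            pos = start - 1 + count          # skip the deleted lines
        elif op == b"a":
            out.extend(base[pos:start])      # copy up to and including line N
            pos = start
            out.extend(lines[i:i + count])   # inserted text follows the command
            i += count
        else:
            raise ValueError("bad delta command: %r" % cmd)
    out.extend(base[pos:])                   # copy whatever is left of the base
    return out


def git_blob_sha1(data: bytes) -> str:
    """Git blob object name for `data`, usable as the recorder token."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


head = b"line1\nline2\nline3\n".splitlines(keepends=True)
# Replace line 2, then append a new line after line 3.
delta = b"d2 1\na2 1\nLINE2\na3 1\nline4\n"
text = b"".join(apply_rcs_delta(head, delta))
print(text)  # b'line1\nLINE2\nline3\nline4\n'
token = git_blob_sha1(text)
```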

Then write a RevisionReader that returns an instance of your
RevisionRecorder to be used in the CollectRevsPass of the conversion.
For OutputPass, the RevisionReader has to implement the method
get_content_stream(), which is passed a CVSRevision instance and has to
return a stream object that produces the file revision's contents.  In
your case, you wouldn't need the contents at all, but could just work
with CVSRevision.revision_recorder_token, which contains the hash that
was generated by your RevisionRecorder.
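Putting the two halves together, a hypothetical RevisionReader might hand out a recorder for CollectRevsPass and serve contents by token in OutputPass.  In this sketch a plain dict stands in for the git store and CVSRevisionStub stands in for CVSRevision; none of these classes are part of cvs2svn, and the method signatures are simplified.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Optional


def git_blob_sha1(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class CVSRevisionStub:
    """Hypothetical stand-in for CVSRevision; only the token attribute is modeled."""

    def __init__(self) -> None:
        self.revision_recorder_token: Optional[str] = None


class StoringRecorder:
    """Hypothetical recorder for CollectRevsPass: stores fulltext, returns its hash."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store

    def record_text(self, fulltext: bytes) -> str:
        token = git_blob_sha1(fulltext)
        self.store[token] = fulltext
        return token


class TokenRevisionReader:
    """Hypothetical RevisionReader that never invokes rcs or cvs."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}  # stand-in for a git object store

    def get_revision_recorder(self) -> StoringRecorder:
        return StoringRecorder(self.store)

    def get_content_stream(self, cvs_rev: CVSRevisionStub) -> BinaryIO:
        # OutputPass: look the contents up by token instead of re-running rcs/cvs.
        return io.BytesIO(self.store[cvs_rev.revision_recorder_token])


reader = TokenRevisionReader()
rev = CVSRevisionStub()
rev.revision_recorder_token = reader.get_revision_recorder().record_text(b"hello\n")
print(reader.get_content_stream(rev).read())  # b'hello\n'
```

If the output side only needs blob names, as in the git case, it can use revision_recorder_token directly and never call get_content_stream() at all.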

How you actually cook these tokens together into a git repository is up
to you :-)
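One possible way to do the cooking, assuming you drive git fast-import (the back end Jon mentions): since each token is a blob SHA-1, a commit can reference file contents with an "M" filemodify line instead of re-sending the data.  Such a SHA-1 must name a blob that already exists in the repository (for example, one written with git hash-object -w).  The helper below is hypothetical and fixes the timezone at +0000 for simplicity.

```python
from typing import List, Tuple


def fast_import_commit(branch: str, author: str, email: str, timestamp: int,
                       message: str, files: List[Tuple[str, str]]) -> bytes:
    """Build one git fast-import `commit` command.

    `files` holds (path, blob_sha1) pairs; each SHA-1 is a recorder token and
    must name a blob already present in the target repository.
    """
    msg = message.encode("utf-8")
    out = [
        b"commit refs/heads/%s\n" % branch.encode(),
        # Timezone fixed at +0000 in this sketch.
        b"committer %s <%s> %d +0000\n" % (author.encode(), email.encode(), timestamp),
        b"data %d\n" % len(msg),  # exact byte count of the commit message
        msg,
        b"\n",
    ]
    for path, sha in files:
        # 100644 = regular, non-executable file; the dataref is the blob SHA-1.
        out.append(b"M 100644 %s %s\n" % (sha.encode(), path.encode()))
    out.append(b"\n")
    return b"".join(out)


stream = fast_import_commit(
    "master", "J. Random", "jrandom@example.com", 1164716237, "Import foo.c",
    [("foo.c", "ce013625030ba8dba906f756967f9e9ca394464a")],
)
```

The resulting bytes can be written to the standard input of git fast-import; when "from" is omitted on an existing branch, fast-import uses that branch's current tip as the parent.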

Michael


Thread overview:
2006-11-27 15:28 Mozilla, git and Windows Jon Smirl
2006-11-27 15:34 ` Andy Whitcroft
2006-11-27 16:13   ` Jon Smirl
2006-11-27 16:37     ` Robin Rosenberg
2006-11-27 22:13     ` Petr Baudis
2006-11-28  1:35       ` Jon Smirl
2006-11-28 12:17         ` Michael Haggerty [this message]
2006-11-28  0:30 ` Sam Vilain
