From: "Martin Langhoff" <martin.langhoff@gmail.com>
To: "Keith Packard" <keithp@keithp.com>
Cc: "Linus Torvalds" <torvalds@osdl.org>,
	"Jim Meyering" <jim@meyering.net>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Matthias Urlichs" <smurf@smurf.noris.de>,
	"Yann Dirson" <ydirson@altern.org>,
	"Pavel Roskin" <proski@gnu.org>
Subject: Re: git-cvsimport doesn't quite work, wrt branches
Date: Wed, 14 Jun 2006 13:56:13 +1200
Message-ID: <46a038f90606131856o77d58467le4d3dab8021b32@mail.gmail.com>
In-Reply-To: <1150241459.20536.98.camel@neko.keithp.com>

On 6/14/06, Keith Packard <keithp@keithp.com> wrote:
> On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
>
> > In terms of history parsing, parsecvs and cvs2svn are similar. I like
> > cvs2svn's "many passes" approach better, though the Python source is
> > really messy. A good thing about cvs2svn is that it is a lot more
> > conservative WRT memory use.
>
> I will try to fix parsecvs so it doesn't take so much memory. Of course,
> my goal was to import various X.org repositories which have horrible
> issues, but aren't all that huge. And, for them, it works just fine.

Would it be possible to have it parse the RCS histories from a remote repo?

I had forgotten, but that's something else that the cvsps +
git-cvsimport combo can do. In short, to replace cvsps+git-cvsimport
...

 + not memory bound -- or at least must be able to import large
   repos (mozilla, gentoo) with a decent amount of memory

 + must work local and remote (of course local can be faster)

 + must do incrementals reasonably well

> I'd like some help figuring out how to do incremental imports with
> parsecvs. As parsecvs already constructs the project history from the
> present into the past, it should be possible to "notice" when it hits
> existing bits in the repository and stop automatically. I think this
> will just take saving a bit of state in the git repository to mark where
> in CVS the tips of each branch come from.

Ok. Before starting to read the RCS files, I would look at all the
branch tips in the git repo, and read some metadata of the last commit
of each head into memory (author, commitmsg, timestamp, diffstat).
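
A rough sketch of that first step, in Python. This is only an
illustration, not code from parsecvs or git-cvsimport: the names
TipMeta, parse_tip and read_branch_tips are made up here, and the list
of touched file names stands in for a real diffstat.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class TipMeta:
    """Metadata of the last commit on one git head (hypothetical structure)."""
    branch: str
    author: str
    timestamp: int      # author date, seconds since the epoch
    message: str
    files: frozenset    # file names touched; a stand-in for a diffstat


def parse_tip(branch, log_record, name_only_output):
    """Build a TipMeta from the raw output of the two git calls below."""
    # The log record is "author NUL timestamp NUL message".
    author, timestamp, message = log_record.split("\0", 2)
    files = frozenset(line for line in name_only_output.splitlines() if line)
    return TipMeta(branch, author, int(timestamp), message.strip(), files)


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(["git", "-C", repo, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def read_branch_tips(repo):
    """Return {branch name: TipMeta} for every local head in `repo`."""
    tips = {}
    heads = git(repo, "for-each-ref", "--format=%(refname:short)", "refs/heads")
    for branch in heads.splitlines():
        ref = "refs/heads/" + branch
        record = git(repo, "log", "-1", "--format=%an%x00%at%x00%B", ref)
        names = git(repo, "diff-tree", "--root", "--no-commit-id",
                    "--name-only", "-r", ref)
        tips[branch] = parse_tip(branch, record, names)
    return tips
```

Calling read_branch_tips("/path/to/repo") before touching any RCS file
would give the importer one small record per head to compare against.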

When parsing RCS files and building changesets to import, compare them
with the 'head' data. The timestamp granularity is seconds, which is
pretty coarse -- you can ask for history after those timestamps, but
there's the risk of missing commits (this affects git-cvsimport today,
and I'm thinking about how to fix it there). So borderline changesets
should be compared against the metadata you have.

There is the chance that your earlier import caught a commit partway
through, so you may end up putting in the 'rest' of the commit. That's
why diffstat can be useful.
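
To make the comparison concrete, here is a minimal sketch in Python. It
is an assumption about how the check could look, not how any existing
importer does it: Changeset, classify and the `fuzz` window are invented
for this example, and the set of touched files again stands in for the
diffstat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """One changeset, either parsed from RCS or read from a git head."""
    branch: str
    author: str
    timestamp: int      # seconds since the epoch
    message: str
    files: frozenset    # file names touched; a stand-in for a diffstat


def classify(cs, tip, fuzz=0):
    """Decide what to do with `cs` given the already-imported `tip`.

    Returns "skip" (already in git), "partial" (the earlier import caught
    this commit partway through, so the remaining files still need to go
    in), or "import" (new history).
    """
    if cs.timestamp > tip.timestamp + fuzz:
        return "import"     # clearly newer than the tip
    if cs.timestamp < tip.timestamp - fuzz:
        return "skip"       # clearly older than the tip
    # Borderline: timestamps alone cannot decide, so compare metadata.
    if cs.author == tip.author and cs.message == tip.message:
        if cs.files <= tip.files:
            return "skip"   # same commit, nothing missing
        return "partial"    # same commit, but the tip lacks some files
    # Different commit within the window. With only the tip's metadata
    # this is a guess: treat it as new unless it predates the tip.
    return "import" if cs.timestamp >= tip.timestamp else "skip"


def missing_files(cs, tip):
    """Files of a "partial" changeset that the earlier import did not get."""
    return cs.files - tip.files
```

For example, with a tip at timestamp 1000 touching a.c and b.c, a
changeset with the same author and message touching a.c and c.c comes
back as "partial", and missing_files() reports c.c as the 'rest' of the
commit.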

Is that useful?

cheers,

martin
