From: Michael Haggerty <mhagger@alum.mit.edu>
To: Samuel Lucas Vaz de Mello <samuellucas@datacom.ind.br>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/4] Add more tests of cvsimport
Date: Sat, 21 Feb 2009 07:32:09 +0100
Message-ID: <499F9FE9.6050006@alum.mit.edu>
In-Reply-To: <499F201E.2050106@datacom.ind.br>

Samuel Lucas Vaz de Mello wrote:
> Michael Haggerty wrote:
>> BTW, I don't want to trash "git cvsimport".  I'm not brave enough even
>> to try to implement incremental conversions in cvs2git.  So the fact
> 
> If I run cvs2git several times against a live cvs repo (using the
> same configuration), wouldn't it perform an incremental import?
> Is there anything that would make it produce different commits for
> the history?
> 
> I've just made a simple test here performing 2 imports (the 2nd with a
> dozen of new commits not in the 1st) and it seemed to work fine.
> 
> I know that it will take the same time/memory as the first import,
> but is there something that can break the repository or produce wrong
> data?

Cool, I'd never thought of that.  It's certainly not by design, but as
you've discovered, cvs2git and git *almost* combine to give you an
incremental import.

Alas, it is only "almost".  There are many things that can happen in a
CVS repository that would cause the overlapping part of the history to
disagree between runs of cvs2git.  The nastiest are things that a VCS
shouldn't really even allow, but that are common in CVS, like

- Retroactively adding a file to a branch or tag.  (This is a
much-beloved feature of CVS.)  Since CVS doesn't record the timestamp
when a symbol is added to a file, cvs2git tries (subject to the
constraints of other timestamps) to group all such changes into a single
changeset.  So the creation of the symbol would look different in runs N
vs N+1 of cvs2git--containing different files and likely with a
different timestamp.

- Renaming a file "with history" by renaming or copying the associated
*,v file in the repository.  This retroactively changes the entire
history of that file and thus of all changesets that involved changes to
that file.

- Changing the "text vs binary" or keyword expansion mode of a file.
These properties apply to all revisions of a file, and therefore also
have a retroactive effect.
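
A toy sketch of why one retroactive change spoils everything after it.
It assumes only that a commit's id is a hash covering its parent's id,
as in git.  The payload layout, file names, and timestamps below are
invented for illustration; this is not git's real object format and
not cvs2git code.

```python
import hashlib


def toy_commit_id(parent_id, files, timestamp):
    """Hash the parent id, the touched files, and the timestamp together."""
    payload = "\n".join([parent_id, ",".join(sorted(files)), str(timestamp)])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def toy_history(changesets):
    """Return the commit ids, oldest first, for (files, timestamp) pairs."""
    ids = []
    parent = ""
    for files, timestamp in changesets:
        parent = toy_commit_id(parent, files, timestamp)
        ids.append(parent)
    return ids


# Run N: the second changeset stands for the creation of a tag on two files.
run_n = toy_history([
    (["a.c"], 100),
    (["a.c", "b.c"], 200),
    (["a.c"], 300),
])

# Run N+1: c.c was retroactively added to the tag, so the tag-creation
# changeset has different contents and a different inferred timestamp.
run_n1 = toy_history([
    (["a.c"], 100),
    (["a.c", "b.c", "c.c"], 250),
    (["a.c"], 300),
    (["b.c"], 400),
])

for old, new in zip(run_n, run_n1):
    print(old[:8], new[:8], "same" if old == new else "DIFFERENT")
```

Only the first commit keeps its id.  The third changeset is identical
in both runs, but its id still changes because its parent changed.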

But even aside from these retroactive changes, the output of cvs2git is
not deterministic in any practical sense (though I've tried to make it
deterministic given *identical* input).  The problem is that there are
so many ambiguities in a CVS history (because CVS doesn't record enough
information) that cvs2git has to use heuristics to decide what
individual file events should be grouped together as commits.  The
trickiest part is that the graph of naively inferred changesets can have
cycles in it, and cvs2git uses several heuristics to decide how to split
up changesets so as to remove the cycles.  (See our design notes [1] for
all the hairy details.)  The CVS commits made between runs N and N+1
could easily change some of the heuristics' decisions, giving different
results even for the overlapping part of the history.
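
A minimal sketch of how such a cycle arises, assuming changesets have
already been guessed naively (say, by author and log message).  The
function names and the input layout are hypothetical, and this only
detects a cycle; cvs2git's actual cycle-breaking heuristics are the
ones described in the design notes [1], not this code.

```python
def build_changeset_graph(file_events):
    """Build ordering constraints between changesets.

    file_events maps a file name to the changeset ids of its revisions,
    in revision order.  An edge A -> B means A must come before B.
    """
    graph = {}
    for changesets in file_events.values():
        for earlier, later in zip(changesets, changesets[1:]):
            if earlier != later:
                graph.setdefault(earlier, set()).add(later)
    return graph


def find_cycle(graph):
    """Return a list of changeset ids forming a cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for succ in sorted(graph.get(node, ())):
            state = color.get(succ, WHITE)
            if state == GRAY:
                # succ is on the current DFS path, so the path loops back.
                return stack[stack.index(succ):]
            if state == WHITE:
                found = visit(succ)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# a.c says changeset A precedes B; b.c says B precedes A.
events = {"a.c": ["A", "B"], "b.c": ["B", "A"]}
print(find_cycle(build_changeset_graph(events)))
```

Neither A nor B can be committed first as it stands, so one of them
has to be split.  Which one gets split is exactly the kind of decision
that new CVS commits can tip the other way.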

To add robust support for incremental conversions to cvs2git would
require run N+1 to know about the decisions made in run N, to avoid
contradicting them.

I wonder what would happen if one treated the results of cvs2git
conversions N and N+1 as two separate repositories and merged them using
git.  In many cases the merge would probably be trivial, and most
conflicts (except retroactive file renaming!) would probably tend to be
in the recent past and therefore resolvable manually.  At least the
repository shouldn't silently become corrupted, which can happen with
other incremental conversion tools.
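
Before attempting such a merge it may help to see how far back the two
conversions agree.  A rough sketch, assuming the commit ids of one
branch from each conversion have been collected oldest first (for
example with "git rev-list --reverse --first-parent <branch>"); the
function names are made up for illustration.

```python
def shared_prefix_length(run_n, run_n1):
    """Count the leading commits that are identical in both conversions."""
    count = 0
    for old, new in zip(run_n, run_n1):
        if old != new:
            break
        count += 1
    return count


def classify(run_n, run_n1):
    """Say whether run N+1 merely extends run N or rewrites part of it."""
    shared = shared_prefix_length(run_n, run_n1)
    if shared == len(run_n):
        return "fast-forward"
    return "diverged after %d shared commits" % shared


print(classify(["c1", "c2", "c3"], ["c1", "c2", "c3", "c4"]))
print(classify(["c1", "c2", "c3"], ["c1", "c2", "x3", "x4"]))
```

A "fast-forward" result is the lucky case that the simple test above
happened to hit.  A divergence near the tip is the case where a manual
merge looks feasible.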

The final problem is that cvs2git conversions of large CVS repositories
are quite time-consuming, so using it for incremental conversions of
large repositories would be painful.  No doubt it could be sped up
considerably, especially if conversion N+1 were privy to the results of
conversion N.

These are all challenging problems and I would welcome volunteers and be
happy to get them started.

Michael

[1] http://cvs2svn.tigris.org/svn/cvs2svn/trunk/doc/design-notes.txt
