Re: [PATCH 0/3] fixup remaining cvsimport tests


From: Michael Haggerty <mhagger@alum.mit.edu>
To: Chris Rorvick <chris@rorvick.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	John Keeping <john@keeping.me.uk>,
	git@vger.kernel.org
Subject: Re: [PATCH 0/3] fixup remaining cvsimport tests
Date: Wed, 23 Jan 2013 10:54:36 +0100
Message-ID: <50FFB35C.7070809@alum.mit.edu>
In-Reply-To: <CAEUsAPaw8EUcZFbODDj9Z-=3Ppd1CC=jvYDvuyntFkX_3V0ynQ@mail.gmail.com>

On 01/20/2013 09:17 PM, Chris Rorvick wrote:
> I probably won't be sending any more patches on this.  My hope was to
> get cvsimport-3 (w/ cvsps as the engine) in a state such that one
> could transition from the previous version seamlessly.  But the break
> in t9605 has convinced me this is not worth the effort--even in this
> trivial case cvsps is broken.  The fuzzing logic aggregates commits
> into patch sets that have timestamps within a specified window and
> otherwise matching attributes.  This aggregation causes file-level
> commit timestamps to be lost and we are left with a single timestamp
> for the patch set: the minimum for all contained CVS commits.  When
> all commits have been processed, the patch sets are ordered
> chronologically and printed.
> 
> The problem is that a CVS commit is rolled into a patch set
> regardless of whether the patch set's timestamp falls within the
> adjacent CVS file-level commits.  Even worse, since the patch set
> timestamp changes as subsequent commits are added (i.e., it's always
> picking the earliest) it is potentially indeterminate at the time a
> commit is added.  The result is that file revisions can be reordered
> in the resulting Git import (see t9605).  I spent some time last week
> trying to solve this but I couldn't think of anything that wasn't a
> substantial re-work of the code.
> 
> I have never used cvs2git, but I suspect Eric's efforts in making it a
> potential backend for cvsimport are a better use of time.

Thanks for your explanation of how cvsps works.
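
As a rough illustration of the behavior described in the quoted text, here
is a minimal Python sketch. It is not cvsps code: the `PatchSet` class, the
`aggregate` function, the matching rule (same author and message, timestamp
within `window` of the patch set's current timestamp) and the sample data
are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PatchSet:
    author: str
    message: str
    time: int                      # earliest timestamp of any member
    members: list = field(default_factory=list)


def aggregate(commits, window):
    """Group file-level commits into patch sets, then sort by timestamp."""
    patch_sets = []
    for path, rev, time, author, message in commits:
        for ps in patch_sets:
            same_attrs = (ps.author, ps.message) == (author, message)
            if same_attrs and abs(time - ps.time) <= window:
                # The patch set keeps only the minimum timestamp.
                ps.time = min(ps.time, time)
                break
        else:
            ps = PatchSet(author, message, time)
            patch_sets.append(ps)
        ps.members.append((path, rev))
    return sorted(patch_sets, key=lambda ps: ps.time)


# Hypothetical file-level history: (path, revision, time, author, message)
commits = [
    ("a.txt", "1.1", 100, "alice", "add a"),
    ("a.txt", "1.2", 200, "alice", "update"),
    ("b.txt", "1.1", 50, "alice", "update"),
]

for ps in aggregate(commits, window=300):
    print(ps.time, ps.members)
```

With this data the "update" patch set ends up with timestamp 50, so it is
emitted before the patch set containing a.txt 1.1, and a.txt 1.2 lands
ahead of a.txt 1.1 in the output.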

This is roughly how cvs2svn used to work years ago, prior to release
2.x.  In addition it did a number of things to try to tweak the
timestamp ordering to avoid committing file-level commits in the wrong
order.  It never worked 100%; each tweak that was made to fix one
problem created another problem in another scenario.

cvs2svn/cvs2git 2.x takes a very different approach.  It uses a
timestamp threshold along with author and commit-message matching to
find the biggest set of file-level commits that might constitute a
repository-level commit.  But then it checks the proto-commits to see if
they violate the ordering constraints imposed by the individual
file-level commits.  For example, if the initial grouping gives the
following proto-commits:

proto-commit 1: a.txt 1.1        b.txt 1.2

proto-commit 2: a.txt 1.2        b.txt 1.1

then it is apparent that something is wrong, because a.txt 1.1
necessarily comes before a.txt 1.2 whereas b.txt 1.1 necessarily comes
before b.txt 1.2 (CVS can at least be relied on to get this right!) and
therefore there is no consistent ordering of the two proto-commits.
More generally, the proto-commits have to form a directed acyclic graph,
whereas this graph has a cycle 1 -> 2 -> 1.  When cvs2svn/cvs2git finds
a cycle, it uses heuristics to break up one or more of the proto-commits
to break the cycle.  In this case it might break proto-commit 1 into two
commits:

proto-commit 1a: a.txt 1.1

proto-commit 2:  a.txt 1.2        b.txt 1.1

proto-commit 1b:                  b.txt 1.2

Now it is possible to commit them in the order 1a,2,1b.  (Exactly this
scenario is tested in t9603.)
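
The ordering check on this example can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
cvs2svn/cvs2git implementation: `dependency_graph` and `commit_order` are
hypothetical names, revisions are compared as simple dotted trunk numbers,
and the split of proto-commit 1 is written out by hand rather than chosen
by a heuristic. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter


def rev_key(rev):
    """Turn '1.2' into (1, 2) so revisions of one file sort numerically."""
    return tuple(int(part) for part in rev.split("."))


def dependency_graph(proto_commits):
    """Map each proto-commit to the proto-commits that must precede it."""
    by_file = {}
    for name, files in proto_commits.items():
        for path, rev in files.items():
            by_file.setdefault(path, []).append((rev_key(rev), name))
    graph = {name: set() for name in proto_commits}
    for revisions in by_file.values():
        revisions.sort()
        # Consecutive revisions of a file impose an order on their commits.
        for (_, earlier), (_, later) in zip(revisions, revisions[1:]):
            if earlier != later:
                graph[later].add(earlier)
    return graph


def commit_order(proto_commits):
    """Return a consistent commit order, or None if the graph has a cycle."""
    sorter = TopologicalSorter(dependency_graph(proto_commits))
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


grouped = {
    "1": {"a.txt": "1.1", "b.txt": "1.2"},
    "2": {"a.txt": "1.2", "b.txt": "1.1"},
}
split = {
    "1a": {"a.txt": "1.1"},
    "2": {"a.txt": "1.2", "b.txt": "1.1"},
    "1b": {"b.txt": "1.2"},
}

print(commit_order(grouped))  # None: 1 -> 2 -> 1 is a cycle
print(commit_order(split))    # ['1a', '2', '1b']
```

The initial grouping has no valid order, while the split version yields
exactly the order 1a, 2, 1b.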

Of course a typical proto-commit graph often contains far more
complicated cycles, but the approach remains the same: split
proto-commits up as necessary until the graph is acyclic.  One can
quibble about the heuristics that cvs2svn/cvs2git uses to break up
proto-commits.  But the final result of the algorithm is *guaranteed* to
be consistent with the file-level CVS history and also self-consistent.

I am skeptical that a simpler approach will ever work 100%.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
