From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 19 Dec 2013 17:18:19 +0100
Message-ID: <52B31C4B.8080404@alum.mit.edu>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <CACPiFCK+Z7dOfO2v29PMKz+Y_fH1++xqMuTquSQ84d8KyjjFeQ@mail.gmail.com> <20131212042624.GB8909@thyrsus.com> <CACPiFC+bopf32cgDcQcVpL5vW=3KxmSP8Oh1see4KduQ1BNcPw@mail.gmail.com> <52B02DFF.5010408@gmail.com> <CALKQrgf3kuXRpbWmSp_nk8+zDFYNzkgV+dSBHaBbmUkxqjaDUA@mail.gmail.com> <20131217145809.GC15010@thyrsus.com> <CALKQrgeegcsO7YVqEmQxD4=HfR4eitodAov0tEh7MRvBxtRKUA@mail.gmail.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> <CALKQrgdin=8h9dr=h+VfGjX3suOGRXNsvzzcF=_L9cQDYtKPgg@mail.gmail.com> <52B2BCF9.5080300@alum.mit.edu> <CALKQrgeiVSPhe84xTnKQ6iAmN3UX_Jy77pgp5ieSwFQ21tWPFg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Eric Raymond <esr@thyrsus.com>,
	=?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= <jnareb@gmail.com>,
	Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
To: Johan Herland <johan@herland.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9
In-Reply-To: <CALKQrgeiVSPhe84xTnKQ6iAmN3UX_Jy77pgp5ieSwFQ21tWPFg@mail.gmail.com>
X-Enigmail-Version: 1.6
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
Archived-At: <http://permalink.gmane.org/gmane.comp.version-control.git/239523>

On 12/19/2013 04:26 PM, Johan Herland wrote:
> On Thu, Dec 19, 2013 at 10:31 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 12/19/2013 02:11 AM, Johan Herland wrote:
>>> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>>> A correct incremental converter could be done (as long as the CVS users
>>>> don't literally change history retroactively) but it would be a lot of work.
>>>
>>> Although I agree with that sentence as it is stated, I also believe
>>> that the parenthesized condition rules out a _majority_ of CVS repos of
>>> non-trivial size/history. So even though a correct incremental
>>> converter could be built, it would be pretty much useless if it did
>>> not gracefully handle rewritten history. And in the face of rewritten
>>> history it becomes pretty much impossible to define what a "correct"
>>> conversion should even look like (not to mention the difficulty of
>>> actually implementing that converter...).
>>
>> A correct conversion would, conceptually, take a diff between the old
>> CVS history and the new CVS history (I'm talking about the history as a
>> whole, not a diff between two changesets), figure out what had changed,
>> and then figure out what Git commits to make to effect the same
>> conceptual changes in Git-land.
>>
>> This means that the final Git history would have to depend not only on
>> the current entirety of the CVS history, but also on what the CVS
>> history *was* during previous incremental imports and how the tool chose
>> to represent that history in Git in the previous rounds.
>>
>> There is a tradeoff here.  The smarter the tool is, the fewer
>> restrictions would have to be made on what people can do in CVS.  For
>> example, it wouldn't be unreasonable to impose a rule that people are
>> not allowed to move files within the CVS repository (e.g., to fake
>> move-file-with-history) after the CVS <-> Git bridge is in use.  (Abuses
>> of the history that occurred *before* the first incremental conversion,
>> on the other hand, wouldn't be a problem.)  If the user of the
>> incremental tool has *no* influence on how his colleagues use CVS, then
>> the tool would have to be very smart and/or the user might
>> sometimes be forced to do another from-scratch conversion.
> 
> Agreed, but I find it quite ugly how the git history will end up
> different depending on _when_ the incremental conversion is run. It
> means that it will be impossible for two users to create the same Git
> repo (matching SHA1s), unless they carefully synchronize all of their
> conversion runs

Even git-svn doesn't guarantee the same results over time.  The most
obvious scenario in which it fails is when somebody changes an SVN commit's
metadata retroactively using something like "svn propedit --revprop
svn:log".  Consistency over time across two independent conversion
processes (that don't communicate) is not even theoretically possible.
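
To illustrate why matching SHA-1s are so fragile, here is a minimal
Python sketch (not taken from git-svn or any converter) that computes
commit IDs the way Git does: the SHA-1 of a "commit <size>\0" header
followed by the commit body.  The author line and the use of the empty
tree are placeholders chosen for the demo, and convert() is a
hypothetical stand-in for a real conversion run.  Editing only the log
message of the first commit changes its ID, and with it the ID of every
descendant:

```python
import hashlib

# The well-known ID of Git's empty tree; used only as a placeholder here.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical author/committer line; any fixed value works for the demo.
WHO = "A U Thor <author@example.com> 1387469899 +0100"


def commit_id(tree, parents, message):
    """Return the SHA-1 Git would assign to a commit with this content."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + WHO, "committer " + WHO, "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def convert(log_messages):
    """Chain one commit per log message and return the list of commit IDs."""
    ids, parents = [], []
    for msg in log_messages:
        cid = commit_id(EMPTY_TREE, parents, msg)
        ids.append(cid)
        parents = [cid]
    return ids


before = convert(["Initial import", "Fix bug"])
# Same history, but the first log message was edited retroactively.
after = convert(["Initial import (typo fixed)", "Fix bug"])
```

The second commit's own message is identical in both runs, yet its ID
differs because the parent ID is part of the hashed content.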

> (at which point it's much simpler to run a single
> conversion and then have both users fetch the result).

Yes.  That is a very reasonable approach.

[Discussion of hypothetical real-time inode-watching or proxy-based
converter omitted here...]
> Agreed, but if you want correct metadata (_when_ did these changes
> happen, _who_ performed them), then you need to actually monitor the
> CVS command stream (or CVS server files) in real time...

In my opinion it is ridiculous to try to design a CVS <-> Git bridge
that tries to use back-channels to fill in historical data that even CVS
doesn't record.  Such a thing would require an intimate connection to
the CVS server from the IT department that is presumably blocking a real
move to Git.  So who would ever be able to use it?

The only reason to record extra information would be to enable the
bridge to do self-consistent incremental conversions, and in that case
the *only* extra information that has to be recorded is the information
that would have anyway landed in Git during the previous conversion.
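
As a rough sketch of what recording that information could look like
(a hypothetical data layout, not how cvs2git or any existing tool
stores its state): keep a snapshot that maps each (file, CVS revision)
pair to the metadata that was imported, and on the next run compare it
with what CVS reports now.  Pure additions can simply be appended;
anything changed or missing means history was altered retroactively and
needs special handling.  All names and data below are made up for
illustration:

```python
def diff_history(previous, current):
    """Compare two snapshots of CVS history.

    Each snapshot maps (path, revision) -> (author, date, log message).
    Returns (added, changed, removed) as sorted lists of keys.
    """
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return added, changed, removed


# Snapshot saved at the end of the previous incremental run (made-up data).
previous = {
    ("src/main.c", "1.1"): ("alice", "2013-12-01", "Initial import"),
    ("src/main.c", "1.2"): ("bob", "2013-12-05", "Fix crash"),
}
# What the CVS repository contains now (made-up data).
current = {
    ("src/main.c", "1.1"): ("alice", "2013-12-01", "Initial import"),
    ("src/main.c", "1.2"): ("bob", "2013-12-05", "Fix crash on startup"),
    ("src/main.c", "1.3"): ("alice", "2013-12-19", "Add option parsing"),
}

added, changed, removed = diff_history(previous, current)
# Any changed or removed revision means CVS history was rewritten.
rewritten = bool(changed or removed)
```

Here revision 1.3 is an ordinary addition, while the edited log message
on 1.2 is the kind of retroactive change the importer would have to
represent without rewriting the Git history it already produced.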

>>> My point here is that people will use whatever available tools they
>>> have to solve whatever problems they are currently having. And when
>>> CVS is your tool, you will sooner or later end up with a "solution"
>>> that irrevocably rewrites your CVS history.
>>
>> Yes, but I maintain that an incremental importer could keep a Git
>> history that is consistent with the CVS history in the sense that:
>>
>> 1. the result of checking out any branch or tag, right after a run of
>> the importer, gives the same results as checking the same branch or tag
>> out of CVS.
>>
>> 2. the Git history from one run is added to (never rewritten) by the
>> next run.
> 
> Yes, and even my simplest/fastest possible converter described above
> can meet those criteria. After that, it really becomes a question of
> _how_much_ CVS history you want to retain in your incremental import.

I think you want enough history to make it pleasant to work with the
resulting Git repository.  That approximately means that you need some
semblance of the CVS commits to be reconstructed, with their correct
metadata, on the closest thing to their correct branches that is
consistent with the CVS - Git impedance mismatch.
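
The two consistency criteria quoted above can be written down as a
check on a single importer run.  This is only a sketch under simplifying
assumptions: the commit graph is a plain dict from commit ID to parent
IDs, and a checkout is summarized by one opaque value per branch (for
example a hash of the file contents).  Every name here is hypothetical:

```python
def is_ancestor(parents, old_tip, new_tip):
    """True if old_tip is reachable from new_tip by following parent links."""
    stack, seen = [new_tip], set()
    while stack:
        commit = stack.pop()
        if commit == old_tip:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def run_is_consistent(parents, old_tips, new_tips, git_trees, cvs_trees):
    """Check both criteria for every branch that exists after the run.

    parents:   commit ID -> list of parent commit IDs
    old_tips:  branch -> tip commit before the run
    new_tips:  branch -> tip commit after the run
    git_trees: branch -> summary of the Git checkout after the run
    cvs_trees: branch -> summary of the CVS checkout at the same moment
    """
    for branch, new_tip in new_tips.items():
        # Criterion 1: the Git checkout matches the CVS checkout.
        if git_trees[branch] != cvs_trees[branch]:
            return False
        # Criterion 2: the old tip is still part of the branch's history.
        old_tip = old_tips.get(branch)
        if old_tip is not None and not is_ancestor(parents, old_tip, new_tip):
            return False
    return True


# Made-up example: the run adds commit "c3" on top of the old tip "c2".
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
ok = run_is_consistent(
    parents, {"master": "c2"}, {"master": "c3"},
    {"master": "tree-x"}, {"master": "tree-x"},
)
# A run that replaced "c2" with an unrelated commit "d2" would fail.
parents_rewritten = {"c1": [], "d2": ["c1"], "c3": ["d2"]}
bad = run_is_consistent(
    parents_rewritten, {"master": "c2"}, {"master": "c3"},
    {"master": "tree-x"}, {"master": "tree-x"},
)
```

Note that this check says nothing about how pleasant the resulting
history is to work with; it only captures the minimum that both
criteria require.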

> I have described the two extremes above. Interestingly, _both_ of
> those extremes would look quite different from the
> whole-history-gone-incremental converters represented by cvs2git and
> cvs-fast-export, and _both_ of the extremes would probably also
> provide a converted result quite a bit faster than anything in between
> (one by virtue of depending on a single "cvs update" command, and the
> other by monitoring the CVS server and performing the conversion to
> Git in real time).

I am not an extremist.  And I know how much work it would be to start a
project like this from scratch.  After all, what it can do should be a
strict superset of what a tool like cvs2git can do, and cvs2svn/cvs2git
(according to Ohloh's COCOMO estimate) contains the equivalent of 7
person-years of effort.

Anyway, this is all just blah blah unless somebody volunteers to work on
it.  And I think that is highly unlikely, especially given the
decreasing number of CVS repositories in the wild.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/