From: Michael Haggerty
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 19 Dec 2013 17:18:19 +0100
Message-ID: <52B31C4B.8080404@alum.mit.edu>
References: <20131212001738.996EB38055C@snark.thyrsus.com>
 <20131212042624.GB8909@thyrsus.com>
 <52B02DFF.5010408@gmail.com>
 <20131217145809.GC15010@thyrsus.com>
 <20131217184724.GA17709@thyrsus.com>
 <52B2335D.2030607@alum.mit.edu>
 <52B2BCF9.5080300@alum.mit.edu>
To: Johan Herland
Cc: Eric Raymond, Jakub Narębski, Martin Langhoff, Git Mailing List

On 12/19/2013 04:26 PM, Johan Herland wrote:
> On Thu, Dec 19, 2013 at 10:31 AM, Michael Haggerty wrote:
>> On 12/19/2013 02:11 AM, Johan Herland wrote:
>>> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty wrote:
>>>> A correct incremental converter could be done (as long as the CVS users
>>>> don't literally change history retroactively) but it would be a lot of work.
>>>
>>> Although I agree with that sentence as it is stated, I also believe
>>> that the parenthesized condition rules out a _majority_ of CVS repos of
>>> non-trivial size/history. So even though a correct incremental
>>> converter could be built, it would be pretty much useless if it did
>>> not gracefully handle rewritten history. And in the face of rewritten
>>> history it becomes pretty much impossible to define what a "correct"
>>> conversion should even look like (not to mention the difficulty of
>>> actually implementing that converter...).
>>
>> A correct conversion would, conceptually, take a diff between the old
>> CVS history and the new CVS history (I'm talking about the history as a
>> whole, not a diff between two changesets), figure out what had changed,
>> and then figure out what Git commits to make to effect the same
>> conceptual changes in Git-land.
>>
>> This means that the final Git history would have to depend not only on
>> the current entirety of the CVS history, but also on what the CVS
>> history *was* during previous incremental imports and how the tool chose
>> to represent that history in Git in the previous rounds.
>>
>> There is a tradeoff here. The smarter the tool is, the fewer
>> restrictions would have to be made on what people can do in CVS. For
>> example, it wouldn't be unreasonable to impose a rule that people are
>> not allowed to move files within the CVS repository (e.g., to fake
>> move-file-with-history) after the CVS <-> Git bridge is in use. (Abuses
>> of the history that occurred *before* the first incremental conversion,
>> on the other hand, wouldn't be a problem.) If the user of the
>> incremental tool has *no* influence on how his colleagues use CVS, then
>> the tool would have to be very smart and/or the user might
>> sometimes be forced to do another from-scratch conversion.
>
> Agreed, but I find it quite ugly how the git history will end up
> different depending on _when_ the incremental conversion is run. It
> means that it will be impossible for two users to create the same Git
> repo (matching SHA1s), unless they carefully synchronize all of their
> conversion runs

Even git-svn doesn't guarantee the same results over time. The most
obvious scenario when it fails is when somebody changes an SVN commit's
metadata retroactively using something like "svn propedit --revprop
svn:log". Consistency over time across two independent conversion
processes (that don't communicate) is not even theoretically possible.

> (at which point it's much simpler to run a single
> conversion and then have both users fetch the result).

Yes. That is a very reasonable approach.

[Discussion of hypothetical real-time inode-watching or proxy-based
converter omitted here...]

> Agreed, but if you want correct metadata (_when_ did these changes
> happen, _who_ performed them), then you need to actually monitor the
> CVS command stream (or CVS server files) in real time...

In my opinion it is ridiculous to try to design a CVS <-> Git bridge
that tries to use back-channels to fill in historical data that even CVS
doesn't record. Such a thing would require an intimate connection to
the CVS server from the IT department that is presumably blocking a real
move to Git. So who would ever be able to use it?

The only reason to record extra information would be to enable the
bridge to do self-consistent incremental conversions, and in that case
the *only* extra information that has to be recorded is the information
that would have anyway landed in Git during the previous conversion.

>>> My point here is that people will use whatever available tools they
>>> have to solve whatever problems they are currently having. And when
>>> CVS is your tool, you will sooner or later end up with a "solution"
>>> that irrevocably rewrites your CVS history.
>>
>> Yes, but I maintain that an incremental importer could keep a Git
>> history that is consistent with the CVS history in the sense that:
>>
>> 1. the result of checking out any branch or tag, right after a run of
>> the importer, gives the same results as checking the same branch or tag
>> out of CVS.
>>
>> 2. the Git history from one run is added to (never rewritten) by the
>> next run.
>
> Yes, and even my simplest/fastest possible converter described above
> can meet those criteria. After that, it really becomes a question of
> _how_much_ CVS history you want to retain in your incremental import.

I think you want enough history to make it pleasant to work with the
resulting Git repository. That approximately means that you need some
semblance of the CVS commits to be reconstructed, with their correct
metadata, on the closest thing to their correct branches that is
consistent with the CVS - Git impedance mismatch.

> I have described the two extremes above. Interestingly, _both_ of
> those extremes would look quite different from the
> whole-history-gone-incremental converters represented by cvs2git and
> cvs-fast-export, and _both_ of the extremes would probably also
> provide a converted result quite a bit faster than anything in between
> (one by virtue of depending on a single "cvs update" command, and the
> other by monitoring the CVS server and performing the conversion to
> Git in real time).

I am not an extremist. And I know how much work it would be to start a
project like this from scratch. After all, what it can do should be a
strict superset of what a tool like cvs2git can do, and cvs2svn/cvs2git
(according to Ohloh's COCOMO estimate) contains the equivalent of 7
person-years of effort.

Anyway, this is all just blah blah unless somebody volunteers to work on
it. And I think that is highly unlikely, especially given the
decreasing number of CVS repositories in the wild.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/