From mboxrd@z Thu Jan 1 00:00:00 1970
From: esr@thyrsus.com (Eric S. Raymond)
Subject: I have end-of-lifed cvsps
Date: Wed, 11 Dec 2013 19:17:38 -0500 (EST)
Message-ID: <20131212001738.996EB38055C@snark.thyrsus.com>
To: git@vger.kernel.org

On the git tools wiki, the first paragraph of the entry for cvsps now
reads:

  Warning: this code has been end-of-lifed by its maintainer in favor of
  cvs-fast-export. Several attempts over the space of a year to repair
  its deficient branch analysis and tag assignment have failed. Do not
  use it unless you are converting a strictly linear repository and
  cannot get rsync/ssh read access to the repo masters. If you must use
  it, be prepared to inspect and manually correct the history using
  reposurgeon.

I tried very hard to salvage this program - the ability to remote-fetch
CVS repos without rsync access was appealing - but I reached my limit
earlier today when I actually found time to assemble a test set of CVS
repos and run head-to-head tests comparing cvsps output to
cvs-fast-export output.

I've long believed that cvs-fast-export has a better analyzer than
cvsps just from having read the code for both of them, and having had
to fix some serious bugs in cvsps that have no analogs in
cvs-fast-export. Direct comparison of the stream outputs revealed that
the difference in quality was larger than I had previously grasped.

Alas, I'm afraid the cvsps repo analysis code turns out to be crap all
the way down on anything but the simplest linear and near-linear cases,
and it doesn't do so hot on even those (all this *after* I fixed the
most obvious bugs in the 2.x version). In retrospect, trying to repair
it was misdirected effort.

I recommend that git sever its dependency on this tool as soon as
possible. I have shipped a 3.13 release with deprecation warnings for
archival purposes, after which I will cease maintenance and redirect
anyone inquiring about cvsps to cvs-fast-export.

(I also maintain cvs-fast-export, but credit for the excellent analysis
code goes to Keith Packard. All I did was write the output stage,
document it, and fix a few minor bugs.)
--
Eric S. Raymond

You [should] not examine legislation in the light of the benefits it
will convey if properly administered, but in the light of the wrongs it
would do and the harm it would cause if improperly administered
-- Lyndon Johnson, former President of the U.S.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Langhoff
Subject: Re: I have end-of-lifed cvsps
Date: Wed, 11 Dec 2013 22:38:20 -0500
References: <20131212001738.996EB38055C@snark.thyrsus.com>
Cc: Git Mailing List
To: "Eric S. Raymond"
In-Reply-To: <20131212001738.996EB38055C@snark.thyrsus.com>

On Wed, Dec 11, 2013 at 7:17 PM, Eric S. Raymond wrote:
> I tried very hard to salvage this program - the ability to
> remote-fetch CVS repos without rsync access was appealing

Is that the only thing we lose, if we abandon cvsps?
More to the point, is there today an incremental import option, outside
of git-cvsimport+cvsps?

[ I am a bit out of touch with the current codebase but I coded and
maintained a good part of it back in the day. However naive/limited
the cvsps parser was, it did help a lot of projects make the leap to
git... ]

regards,

m
--
 martin.langhoff@gmail.com
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric S. Raymond"
Subject: Re: I have end-of-lifed cvsps
Date: Wed, 11 Dec 2013 23:26:24 -0500
Organization: Eric Conspiracy Secret Labs
Message-ID: <20131212042624.GB8909@thyrsus.com>
References: <20131212001738.996EB38055C@snark.thyrsus.com>
Reply-To: esr@thyrsus.com
Cc: Git Mailing List
To: Martin Langhoff

Martin Langhoff:
> On Wed, Dec 11, 2013 at 7:17 PM, Eric S. Raymond wrote:
> > I tried very hard to salvage this program - the ability to
> > remote-fetch CVS repos without rsync access was appealing
>
> Is that the only thing we lose, if we abandon cvsps? More to the
> point, is there today an incremental import option, outside of
> git-cvsimport+cvsps?

You'll have to remind me what you mean by "incremental" here. Possibly
it's something cvs-fast-export could support.

But what I'm trying to tell you is that, even after I've done a dozen
releases and fixed the worst problems I could find, cvsps is far too
likely to mangle anything that passes through it. The idea that you
are preserving *anything* valuable by sticking with it is a mirage.

"That bear trap! It's mangling your leg!" "But it's so *shiny*..."

> [ I am a bit out of touch with the current codebase but I coded and
> maintained a good part of it back in the day. However naive/limited
> the cvsps parser was, it did help a lot of projects make the leap to
> git... ]

I fear those "lots of projects" have subtly damaged repository
histories, then. I warned about this problem a year ago; today I found
out it is much worse than I knew then, in fact so bad that I cannot
responsibly do anything but try to get cvsps turfed out of use *as soon
as possible*.

And no, that should *not* wait on cvs-fast-export getting better
support for "incremental" or any other legacy feature. Every week that
cvsps remains the git project's choice is another week in which
somebody's project history is likely to get trashed.

This feels very strange and unpleasant. I've never had to shoot one of
my own projects through the head before.

I blogged about it: http://esr.ibiblio.org/?p=5167

Ignore the malware warning. It's triggered by something else on
ibiblio.org; they're fixing it.
--
Eric S. Raymond

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Langhoff
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 08:42:25 -0500
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com>
Cc: Git Mailing List
To: Eric Raymond
In-Reply-To: <20131212042624.GB8909@thyrsus.com>

On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond wrote:
> You'll have to remind me what you mean by "incremental" here. Possibly
> it's something cvs-fast-export could support.

User can

 - run a cvs to git import at time T, resulting in repo G
 - make commits to cvs repo
 - run cvs to git import at time T1, pointed to G, and the import tool
   will only add the new commits found in cvs between T and T1.

> But what I'm trying to tell you is that, even after I've done a dozen
> releases and fixed the worst problems I could find, cvsps is far too
> likely to mangle anything that passes through it. The idea that you
> are preserving *anything* valuable by sticking with it is a mirage.

The bugs that lead to a mangled history are real. I acknowledge and
respect that.

However, with those limitations, the incremental feature has value in
many scenarios. The two main ones are as follows:

 - A developer is tracking his/her own patches on top of a CVS-based
   project with git. This is often done with git-svn for example. If
   old/convoluted branches in the far past are mangled, this user won't
   care; as long as HEAD->master and/or the current/recent branch are
   consistent with reality, the tool fits a need.

 - A project plans to transition to git gradually. Experienced
   developers who'd normally work on CVS HEAD start working on git (and
   landing their work on CVS afterwards). Old/mangled branches and tags
   are of little interest, the big value is CVS HEAD (which is linear)
   and possibly recent release/stable branches. The history captured is
   good enough for git blame/log/pickaxe along the "master" line. At
   transition time the original CVS repo can be kept around in readonly
   mode, so people can still checkout the exact contents of an old
   branch or tag for example (assuming no destructive "surgery" was done
   in the CVS repo).
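
The incremental workflow above (import at T, then add only what landed
between T and T1) can be sketched roughly as follows. This is only an
illustration under stated assumptions: `Changeset` and
`select_new_changesets` are hypothetical names, not part of
git-cvsimport, cvsps or cvs-fast-export, and it assumes the importer
already has a list of inferred changesets with reliable timestamps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Changeset:
    """Hypothetical stand-in for one changeset inferred from CVS."""
    ident: str
    committed_at: datetime


def select_new_changesets(changesets: List[Changeset],
                          last_import: datetime,
                          now: datetime) -> List[Changeset]:
    """Pick the changesets committed after the previous import (T) and
    no later than the current run (T1), oldest first, so that they can
    be appended on top of the existing repo G."""
    fresh = [c for c in changesets if last_import < c.committed_at <= now]
    return sorted(fresh, key=lambda c: c.committed_at)
```

Note that the time of the previous import is state the tool has to keep
between runs; the sketch simply takes it as a parameter.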

The above examples assume that the CVS repos have used the "flying
fish" approach in the "interesting" (i.e.: recent) parts of their
history.

[ Simplifying a bit for non-CVS-geeks -- flying fish is using CVS HEAD
for your development, plus 'feature branches' that get landed, plus
long-lived 'stable release' branches. Most CVS projects in modern
times use flying fish, which is a lot like what the git project uses
in its own repo, but tuned to CVS's strengths (interesting commits
linearized in CVS HEAD).

Other approaches ('dovetail') tend to end up with unworkable messes
given CVS's weaknesses. ]

The cvsimport+cvsps combo does a reasonable (though imperfect) job on
'flying fish' CVS histories _and that is what most projects evolved to
use_. If other cvs import tools can handle crazy histories, hats off
to them. But careful with knifing cvsps!

cheers,

m

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Krey
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 18:17:56 +0100
Message-ID: <20131212171756.GA6954@inner.h.apk.li>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com>
Cc: Eric Raymond, Git Mailing List
To: Martin Langhoff

On Thu, 12 Dec 2013 08:42:25 +0000, Martin Langhoff wrote:
...
> - run a cvs to git import at time T, resulting in repo G
> - make commits to cvs repo
> - run cvs to git import at time T1, pointed to G, and the import tool
> will only add the new commits found in cvs between T and T1.

I'm pretty sure that being given only G the incremental approach
wouldn't work - some extra state would be required.

But anyway, the replacement question is a) how fast the cvs-fast-export
is and b) whether its output is stable, that is, if the cvs repo C
yields a git repo G, will then C with a few extra commits yield G'
where every commit in G (as identified by its SHA1) is also in G', and
G' additionally contains the new commits that were made to the CVS
repo.

If that is the case you effectively have an incremental mode, except
that it's not quite as fast.

At least that would be good enough for us - we ended up running a
filter-branch on the resulting history, and that takes some time
anyway.

...
> The cvsimport+cvsps combo does a reasonable (though imperfect) job on
> 'flying fish' CVS histories _and that is what most projects evolved to
> use_. If other cvs import tools can handle crazy histories, hats off
> to them. But careful with knifing cvsps!

It won't magically disappear from your machine, and you have been
warned. :-)

Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds
Date: Fri, 22 Jan 2010 07:29:21 -0800

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Langhoff
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 12:26:40 -0500
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li>
Cc: Eric Raymond, Git Mailing List
To: Andreas Krey
In-Reply-To: <20131212171756.GA6954@inner.h.apk.li>

On Thu, Dec 12, 2013 at 12:17 PM, Andreas Krey wrote:
> But anyway, the replacement question is a) how fast the cvs-fast-export is
> and b) whether its output is stable

In my prior work, the "better" CVS importers would not have stable
output, so were not appropriate for incremental imports. And even the
fastest ones were very slow on large repos.

That is why I am asking the question.

> It won't magically disappear from your machine, and you have been warned. :-)

However, esr is making the case that git-cvsimport should stop using
cvsps. My questions are aimed at understanding whether this actually
results in proposing that an important feature is dropped.

Perhaps a better alternative is now available.

m

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric S. Raymond"
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 13:15:13 -0500
Organization: Eric Conspiracy Secret Labs
Message-ID: <20131212181513.GA16960@thyrsus.com>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com>
Reply-To: esr@thyrsus.com
Cc: Git Mailing List
To: Martin Langhoff

Martin Langhoff:
> On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond wrote:
> > You'll have to remind me what you mean by "incremental" here. Possibly
> > it's something cvs-fast-export could support.
>
> User can
>
> - run a cvs to git import at time T, resulting in repo G
> - make commits to cvs repo
> - run cvs to git import at time T1, pointed to G, and the import tool
> will only add the new commits found in cvs between T and T1.

No, cvs-fast-export doesn't do that. However, it is fast enough that
you can probably just rebuild the whole repo each time you want to move
content.
When I did the conversion of groff recently I was getting rates of
about 150 commits a second - and it will be faster now, because I found
an expensive operation in the output stage I could optimize out.

Now that you have reminded me of this, I remember implementing a -i
option for cvsps-3.0 that could be combined with a time restriction to
output incremental dumps. It's likely I could do the same thing for
cvs-fast-export.

> The above examples assume that the CVS repos have used the "flying
> fish" approach in the "interesting" (i.e.: recent) parts of their
> history.
>
> [ Simplifying a bit for non-CVS-geeks -- flying fish is using CVS HEAD
> for your development, plus 'feature branches' that get landed, plus
> long-lived 'stable release' branches. Most CVS projects in modern
> times use flying fish, which is a lot like what the git project uses
> in its own repo, but tuned to CVS's strengths (interesting commits
> linearized in CVS HEAD).
>
> Other approaches ('dovetail') tend to end up with unworkable messes
> given CVS's weaknesses. ]

That terminology -- "flying fish" and "dovetail" -- is interesting, and
I have not heard it before. It might be worth putting in the Jargon
File. Can you point me at examples of live usage?
--
Eric S. Raymond

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric S. Raymond"
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 13:29:32 -0500
Organization: Eric Conspiracy Secret Labs
Message-ID: <20131212182932.GB16960@thyrsus.com>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li>
Reply-To: esr@thyrsus.com
Cc: Martin Langhoff, Git Mailing List
To: Andreas Krey
In-Reply-To: <20131212171756.GA6954@inner.h.apk.li>

Andreas Krey:
> But anyway, the replacement question is a) how fast the cvs-fast-export is
> and b) whether its output is stable, that is, if the cvs repo C yields
> a git repo G, will then C with a few extra commits yield G' where every
> commit in G (as identified by its SHA1) is also in G', and G' additionally
> contains the new commits that were made to the CVS repo.
>
> If that is the case you effectively have an incremental mode, except that
> it's not quite as fast.
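
The stability property quoted above (every commit in G, identified by
its SHA1, also appears in G') can be checked mechanically. The
following is a minimal sketch, not how any of the tools discussed here
test themselves: `missing_commits` and `is_stable` are hypothetical
helper names, and it assumes the two lists of commit IDs have already
been collected from each repository, for example with
`git rev-list --all`.

```python
from typing import Iterable, Set


def missing_commits(old_ids: Iterable[str],
                    new_ids: Iterable[str]) -> Set[str]:
    """Return the commit IDs that are in G (old_ids) but not in G'
    (new_ids). An empty result means the conversion was stable."""
    return set(old_ids) - set(new_ids)


def is_stable(old_ids: Iterable[str], new_ids: Iterable[str]) -> bool:
    """True when every commit of the earlier conversion survives,
    with the same ID, in the later conversion."""
    return not missing_commits(old_ids, new_ids)
```

If `missing_commits` comes back non-empty, the later conversion has
rewritten part of the earlier history, and rebuilding the repo from
scratch no longer works as a substitute for an incremental mode.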

I am almost certain the output of cvs-fast-export is stable. I believe
the output of cvsps-3.x was, too. Not sure about 2.x.

I wrote the output stages for both cvsps-3.x and cvs-fast-export, and
went to some effort to verify that they write streams in the same "most
natural" way - marks sequential from :1, blobs always written as late
as possible, fileops in the same sort order the git tools emit, etc.

I have added writing a regression test to verify the stability property
to the TODO list. I will have this nailed down before the next point
release, in a few days.
--
Eric S. Raymond

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric S. Raymond"
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 13:35:33 -0500
Organization: Eric Conspiracy Secret Labs
Message-ID: <20131212183533.GC16960@thyrsus.com>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li>
Reply-To: esr@thyrsus.com
Cc: Andreas Krey, Git Mailing List
To: Martin Langhoff

Martin Langhoff:
> In my prior work, the "better" CVS importers would not have stable
> output, so were not appropriate for incremental imports.

That is disturbing. I would consider lack of stability a severe and
unacceptable failure mode in such a tool, if only because of the
difficulties it creates for proper regression testing.

If cvs-fast-export does not already have this property I will fix it so
it does. And document that fact.
--
Eric S. Raymond

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Langhoff
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 13:53:27 -0500
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212181513.GA16960@thyrsus.com>
Cc: Git Mailing List
To: Eric Raymond
In-Reply-To: <20131212181513.GA16960@thyrsus.com>

On Thu, Dec 12, 2013 at 1:15 PM, Eric S. Raymond wrote:
> That terminology -- "flying fish" and "dovetail" -- is interesting, and
> I have not heard it before. It might be worth putting in the Jargon File.
> Can you point me at examples of live usage?

The canonical reference would be

http://cvsbook.red-bean.com/cvsbook.html#Going%20Out%20On%20A%20Limb%20(How%20To%20Work%20With%20Branches%20And%20Survive)

just by being on the internet and widely referenced it has probably
eclipsed in google-juice examples of earlier usage. Karl Fogel may
remember where he got the names from.

cheers,

m

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Langhoff
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 14:08:33 -0500
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com>
Cc: Andreas Krey, Git Mailing List
To: Eric Raymond
In-Reply-To: <20131212182932.GB16960@thyrsus.com>

On Thu, Dec 12, 2013 at 1:29 PM, Eric S. Raymond wrote:
> I am almost certain the output of cvs-fast-export is stable. I
> believe the output of cvsps-3.x was, too. Not sure about 2.x.

IIRC, making the output stable is nontrivial, especially on branches.
Two cases are still in my mind, from when I was wrestling with cvsps.

1 - For a history with CVS HEAD and a long-running "stable release"
    branch ("STABLE"), which branched at P1...

  a - adding a file only at the tip of STABLE "retroactively changes
      history" for P1 and perhaps CVS HEAD

  b - forgetting to properly tag a subset of files with the branch
      tag, and doing it later retroactively changes history

2 - you can create a new branch or tag with files that do not belong
    together in any "commit". Doing so changes history retroactively

... when I say "changes history", I mean that the importers I know
revise their guesses of what files were seen together in a 'commit'.
This is especially true for history recorded with early cvs versions
that did not record a 'commit id'.

cvsps has the strange "feature" that it will cache its
assumptions/guesses, and continue incrementally from there. So if a
change in the CVS repo means that the old guess is now invalidated, it
continues the charade instead of forcing a complete rewrite of the git
history.

Maybe the current crop of tools have developed stronger magic than what
was available a few years ago... the task did seem impossible to me.

cheers,

m

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric S. Raymond"
Subject: Re: I have end-of-lifed cvsps
Date: Thu, 12 Dec 2013 14:39:18 -0500
Organization: Eric Conspiracy Secret Labs
Message-ID: <20131212193918.GA17529@thyrsus.com>
References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com>
Reply-To: esr@thyrsus.com
Cc: Andreas Krey, Git Mailing List
To: Martin Langhoff

Martin Langhoff:
> IIRC, making the output stable is nontrivial, especially on branches.
> Two cases are still in my mind, from when I was wrestling with cvsps.
>
> 1 - For a history with CVS HEAD and a long-running "stable release"
> branch ("STABLE"), which branched at P1...
>
> a - adding a file only at the tip of STABLE "retroactively changes
> history" for P1 and perhaps CVS HEAD
>
> b - forgetting to properly tag a subset of files with the branch
> tag, and doing it later retroactively changes history
>
> 2 - you can create a new branch or tag with files that do not belong
> together in any "commit". Doing so changes history retroactively
>
> ... when I say "changes history", I mean that the importers I know
> revise their guesses of what files were seen together in a 'commit'.
> This is specially true for history recorded with early cvs versions
> that did not record a 'commit id'.

Yikes! That is a much stricter stability criterion than I thought you were specifying. No, cvs-fast-export probably doesn't satisfy all of these. I think it would handle 1a in a stable way, but 1b and 2 would throw it.

I'm sure it can't be fooled in the presence of commitids, though, because when it has those it doesn't try to do any similarity matching. And (this is the important point) it won't match any change with a commit-id to any change without one.

What I think this means is that cvs-fast-export is stable if you are using a server/client combination that generates commitids (that is, GNU CVS of any version newer than 1.12 of 2004, or CVS-NT). It is *not* necessary for stability that the entire history have them.

Here's how the logic works out:

1. Commits grouped by commitid are stable - nothing in CVS ever rewrites those or assigns a duplicate.

2. No file change made with a commitid can destabilize a commit guess made without them, because the similarity checker never tries to put both kinds in a single changeset.

Can you detect any flaw in this?

-- Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Langhoff Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 14:48:44 -0500 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: Andreas Krey , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Thu Dec 12 20:49:16 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrCG2-0001vG-Tz for gcvg-git-2@plane.gmane.org; Thu, 12 Dec 2013 20:49:15 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751800Ab3LLTtL (ORCPT ); Thu, 12 Dec 2013 14:49:11 -0500 Received: from mail-vc0-f171.google.com ([209.85.220.171]:62888 "EHLO mail-vc0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751608Ab3LLTtI (ORCPT ); Thu, 12 Dec 2013 14:49:08 -0500 Received: by mail-vc0-f171.google.com with SMTP id ik5so648966vcb.16 for ; Thu, 12 Dec 2013 11:49:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=Oe+xjFxGYHrIhNW6FAzX7Lqo+Kt9r0/yLc93adu/0VQ=; b=fYTw2wrY+38sbQi3+y8tdYxN2+z0m7F+bUvNRt18XDpsAr54u5JAp3QaQHy9r+K++3 /RKVDu2KHzpA+qQdyvYtJHLCeMNszSVXo8tXmaJXt2wCqPdR3JxKkXSWxVn1niroj+2B BSflk2VgmK239jIyrZ9Wgv+gT8KOaTNrQpeEXVfCg+sovIUvxPlJ0n4VCSiwEzUzqeS+ utzXTm6I9gd6s7ImMsQvQXX9/DU3QCEAqjvYILOCqGr8p2cD7dtuuO0GebK1Cb/U+ylJ fwgINBCwIdNyZ4dXbs/d8m6D3lBKmXjFnZ+Jcg9Kab0wQj+BSkci3VkngXYEFv7LW/oT 9Bpg== X-Received: by 10.58.54.69 with SMTP id h5mr4610330vep.25.1386877747962; Thu, 12 Dec 2013 11:49:07 -0800 (PST) Received: by 10.220.74.133 with HTTP; Thu, 12 Dec 2013 11:48:44 -0800 
(PST) In-Reply-To: <20131212193918.GA17529@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Dec 12, 2013 at 2:39 PM, Eric S. Raymond wrote: > Yikes! That is a much stricter stability criterion than I thought you > were specifying. :-) -- cvsps's approach is: if you have a cache, you can remember the lies you told earlier. It is impossible to be stable purely from the source data in the face of these issues. CVS is truly a PoS. > I think it would handle 1a in a stable way that is pretty important. Files added on a branch not affecting HEAD and earlier branch checkout matters. > What I think this means is that cvs-fast-export is stable if you are > using a server/client combination that generates commitids (that is, > GNU CVS of any version newer than 1.12 of 2004, or CVS-NT). It is > *not* necessary for stability that the entire history have them. > > Here's how the logic works out: > > 1. Commits grouped by commitid are stable - nothing in CVS ever rewrites > those or assigns a duplicate. > > 2. No file change made with a commitid can destabilize a commit guess > made without them, because the similarity checker never tries to put both > kinds in a single changeset. > > Can you detect any flaw in this? If someone creates a nonsensical tag or branch point, tagging files from different commits, how do you handle it? - without commit ids, does it affect your guesses? - regardless of commit ids, do you synthesize an artificial commit? How do you define parenthood for that artificial commit? curious, m -- martin.langhoff@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. 
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 15:58:19 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131212205819.GA18166@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Andreas Krey , Git Mailing List To: Martin Langhoff X-From: git-owner@vger.kernel.org Thu Dec 12 21:58:25 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrDKz-0002fJ-6Y for gcvg-git-2@plane.gmane.org; Thu, 12 Dec 2013 21:58:25 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751776Ab3LLU6V (ORCPT ); Thu, 12 Dec 2013 15:58:21 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:41883 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751543Ab3LLU6U (ORCPT ); Thu, 12 Dec 2013 15:58:20 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id D7A3C380445; Thu, 12 Dec 2013 15:58:19 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Martin Langhoff : > If someone creates a nonsensical tag or branch point, tagging files > from different commits, how do you handle it? > > - without commit ids, does it affect your guesses? No. Tagging is never used to deduce changesets. 
Look:

/*
 * The heart of the merge operation; detect when two
 * commits are "the same"
 */
static bool rev_commit_match (rev_commit *a, rev_commit *b)
{
    /*
     * Versions of GNU CVS after 1.12 (2004) place a commitid in
     * each commit to track patch sets. Use it if present
     */
    if (a->commitid && b->commitid)
        return a->commitid == b->commitid;
    if (a->commitid || b->commitid)
        return false;
    if (!commit_time_close (a->date, b->date))
        return false;
    if (a->log != b->log)
        return false;
    if (a->author != b->author)
        return false;
    return true;
}

> - regardless of commit ids, do you synthesize an artificial commit?
> How do you define parenthood for that artificial commit?

Because tagging is never used to deduce changesets, the case does not arise.

I have added an item to my to-do: document what the tool does with inconsistent tags.

-- Eric S. Raymond
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Langhoff Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 17:51:13 -0500 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> <20131212205819.GA18166@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: Andreas Krey , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Thu Dec 12 23:51:56 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrF6o-0003aS-3e for gcvg-git-2@plane.gmane.org; Thu, 12 Dec 2013 23:51:54 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751585Ab3LLWvg (ORCPT ); Thu, 12 Dec 2013 17:51:36 -0500 Received: from mail-ve0-f174.google.com ([209.85.128.174]:33933 "EHLO mail-ve0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751496Ab3LLWvf (ORCPT ); Thu, 12 Dec 2013 17:51:35
-0500 Received: by mail-ve0-f174.google.com with SMTP id pa12so840258veb.33 for ; Thu, 12 Dec 2013 14:51:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=4XrM4u0FuUgX1//paDYY2ZpvhZ5n+LnzlFkR6gN2DhI=; b=fBebxKTuLJWZzP12OK7TYIgJOsI+GU9F0NiUKl/SZ+aDT2zbtvqem23qegk2tteEgW bEgONdh9OghA4nLc2vzyIA4aKiaDzeIC/L562uTlbZWxpT0qF3ysKfTSQSDOt7b37v+L Xys1Xvt1NzbGTs+T3Y55DbN0SOwUZGJbQd5siQewYuiyXof28dw1aQgq+vm8DU1wzWY6 XztPyFj4C1xwZsyjWAzVfs5Z/IH5yL1c68A4RK+CKzFCkWXXJRA7AQG608Gby9/vUyjE +Luijx05qmN1D/kKsdA6QyMA1zpHOfgZxOcUbjj5SBv5MnQSFdoSFW8V8ehS+CdyEU86 Yirw== X-Received: by 10.220.244.132 with SMTP id lq4mr1029186vcb.31.1386888694630; Thu, 12 Dec 2013 14:51:34 -0800 (PST) Received: by 10.220.74.133 with HTTP; Thu, 12 Dec 2013 14:51:13 -0800 (PST) In-Reply-To: <20131212205819.GA18166@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Dec 12, 2013 at 3:58 PM, Eric S. Raymond wrote: >> - regardless of commit ids, do you synthesize an artificial commit? >> How do you define parenthood for that artificial commit? > > Because tagging is never used to deduce changesets, the case does not arise. So if a branch has a nonsensical branching point, or a tag is nonsensical, is it ignored and not imported? curious, m -- martin.langhoff@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. 
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 18:04:54 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131212230454.GA20054@thyrsus.com> References: <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> <20131212205819.GA18166@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Andreas Krey , Git Mailing List To: Martin Langhoff X-From: git-owner@vger.kernel.org Fri Dec 13 00:05:01 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrFJU-0004c0-Bc for gcvg-git-2@plane.gmane.org; Fri, 13 Dec 2013 00:05:00 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751849Ab3LLXE4 (ORCPT ); Thu, 12 Dec 2013 18:04:56 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:43196 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751717Ab3LLXEz (ORCPT ); Thu, 12 Dec 2013 18:04:55 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id D18A3380459; Thu, 12 Dec 2013 18:04:54 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Martin Langhoff : > On Thu, Dec 12, 2013 at 3:58 PM, Eric S. Raymond wrote: > >> - regardless of commit ids, do you synthesize an artificial commit? > >> How do you define parenthood for that artificial commit? > > > > Because tagging is never used to deduce changesets, the case does not arise. > > So if a branch has a nonsensical branching point, or a tag is > nonsensical, is it ignored and not imported? 
I don't know what happens when identically-named tags point at changes that resolve into two different commits. I will figure that out and document it.

There's evidence, in the form of some code that is #ifdefed out, that Keith considered trying to make synthetic commits from tag cliques, but abandoned the idea because he couldn't figure out how to assign such cliques to a branch.

I'm not sure what counts as a nonsensical branching point. I do know that Keith left this rather cryptic note in a README:

    Disjoint branch resolution. Branches occurring in a subset of the files are not correctly resolved; instead, an entirely disjoint history will be created containing the branch revisions and all parents back to the root. I'm not sure how to fix this; it seems to implicitly assume there will be only a single place to attach as branch parent, which may not be the case. In any case, the right revision will have a superset of the revisions present in the original branch parent; perhaps that will suffice.

-- Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Langhoff Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 21:35:54 -0500 Message-ID: References: <20131212042624.GB8909@thyrsus.com> <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> <20131212205819.GA18166@thyrsus.com> <20131212230454.GA20054@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: Andreas Krey , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Fri Dec 13 03:36:21 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrIc0-0003uk-D1 for gcvg-git-2@plane.gmane.org; Fri, 13 Dec 2013 03:36:20 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752027Ab3LMCgQ (ORCPT ); Thu, 12 Dec 2013 21:36:16 -0500 Received: from mail-ve0-f173.google.com ([209.85.128.173]:48459 "EHLO mail-ve0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751742Ab3LMCgP (ORCPT ); Thu, 12 Dec 2013 21:36:15 -0500 Received: by mail-ve0-f173.google.com with SMTP id oz11so973952veb.32 for ; Thu, 12 Dec 2013 18:36:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=IdX7t5Hr2RHw8PdcTH5ICjwslkgZFQPlCdqsw+H6Bcs=; b=IPeAehk4fUw3uGsMdtnebTgN0g6jIRRPKBiwkOlM68n/FmS9mhy0vuaskhzTy8kJMR OIwG0MDZagLxm5XPXDjSWKVXZYGEghCuihf5C6gqf2MDjABpVakG9E6EVgK/nqxuJ11g PcMbLQZ6CP7xLzluy7QAaEqGV1iKuOMdMseQDQoRM1jR/fnIDsKu5pKaokxdlht8cIoW R6VnSMAug4/nOjaL7rhQGYgPKLuHazkR0KyAQEhqWl2F1Du8UNMeS3D6ksKU7C4SQhuA Cn56JiFbGHnNj2duUbNP5IY1AgAaovpLUUU9VHweeJT2iHteqi1y2M0u97VtoZJ16jGA 2nqQ== X-Received: by 10.52.36.80 with SMTP id o16mr45209vdj.48.1386902174810; Thu, 12 Dec 2013 18:36:14 -0800 (PST) Received: by 10.220.74.133 with HTTP; Thu, 12 
Dec 2013 18:35:54 -0800 (PST) In-Reply-To: <20131212230454.GA20054@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Dec 12, 2013 at 6:04 PM, Eric S. Raymond wrote: > I'm not sure what counts as a nonsensical branching point. I do know that > Keith left this rather cryptic note in a README: Keith names exactly what we are talking about. At that time, Keith was struggling with the old xorg cvs repo, which had these and quite a few other nasties. I was also struggling with the mozilla cvs repo with its own gremlins. Between my earlier explanation and Keith's notes it should be clear to you. It is absolutely trivial in CVS to have an "inconsistent" checkout (for example, if you switch branch with the -l parameter disabling recursion, or if you accidentally switch branch in a subdirectory). On that inconsistent checkout, nothing prevents you from tagging it, nor from creating a new branch. An importer with a 'consistent tree mentality' will look at the files/revs involved in that tag (or branching point) and find no tree to match. CVS repos with that crap exist. x11/xorg did (Jim Gettys challenged me to try importing it at an LCA, after the Bazaar NG folks passed on it). Mozilla did as well. IMHO it is a valid path to skip importing the tag/branch. As long as main dev work was in HEAD, things end up ok (which goes back to my flying fish notes). cheers, m -- martin.langhoff@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S.
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Thu, 12 Dec 2013 22:38:34 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131213033833.GB20850@thyrsus.com> References: <20131212171756.GA6954@inner.h.apk.li> <20131212182932.GB16960@thyrsus.com> <20131212193918.GA17529@thyrsus.com> <20131212205819.GA18166@thyrsus.com> <20131212230454.GA20054@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Andreas Krey , Git Mailing List To: Martin Langhoff X-From: git-owner@vger.kernel.org Fri Dec 13 04:38:58 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VrJaa-0006Rd-1l for gcvg-git-2@plane.gmane.org; Fri, 13 Dec 2013 04:38:56 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751829Ab3LMDig (ORCPT ); Thu, 12 Dec 2013 22:38:36 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:46120 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751726Ab3LMDif (ORCPT ); Thu, 12 Dec 2013 22:38:35 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 1B9F0380459; Thu, 12 Dec 2013 22:38:34 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Martin Langhoff : > On Thu, Dec 12, 2013 at 6:04 PM, Eric S. Raymond wrote: > > I'm not sure what counts as a nonsensical branching point. I do know that > > Keith left this rather cryptic note in a README: > > Keith names exactly what we are talking about. Oh, yeah, I figured that much out. What I wasn't clear on was whether that's a complete description of "nonsensical branching point", or whether there are other pathologies fundamentally *different* from that one.
I'm also not sure I have the end state of what cvs-fast-export does in that case visualized correctly. When he says: "an entirely disjoint history will be created containing the branch revisions and all parents back to the root", I'm visualizing something like this:

a----b----c----d----e----f----g----h
                \
                 +----1----2----3---4

Suppose the root is a and our pathological branch point is at d; then it sounds like he's saying cvs-fast-export will produce a changeset DAG that looks like this:

a----b'---c'---d'---e----f----g----h
 \
  +----b''---c''---d''----1----2----3----4

What I'm not clear on here is how b is related to b' and b'', c to c' and c'', and d to d' and d''. Which file changes go to which commit? I shall have to craft some broken RCS files to find out.

Have I explained that I'm building a test suite? I intend to know exactly what the tool does in these cases and document it.

> Between my earlier explanation and Keith's notes it should be clear to
> you. It is absolutely trivial in CVS to have an "inconsistent"
> checkout (for example, if you switch branch with the -l parameter
> disabling recursion, or if you accidentally switch branch in a
> subdirectory).

That last one sounds easy to fall into and nasty.

> On that inconsistent checkout, nothing prevents you from tagging it,
> nor from creating a new branch.
>
> An importer with a 'consistent tree mentality' will look at the
> files/revs involved in that tag (or branching point) and find no tree
> to match.
>
> CVS repos with that crap exist. x11/xorg did (Jim Gettys challenged me
> to try importing it at an LCA, after the Bazaar NG folks passed on
> it). Mozilla did as well.
>
>
> IMHO it is a valid path to skip importing the tag/branch. As long as
> main dev work was in HEAD, things end up ok (which goes back to my
> flying fish notes).

The other way to handle it would be to translate the history as though every branch of a file subset had been an attempt to branch everything.

-- Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 11:57:03 +0100 Message-ID: <52B02DFF.5010408@gmail.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Eric Raymond , Git Mailing List To: Martin Langhoff X-From: git-owner@vger.kernel.org Tue Dec 17 11:57:20 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VssL1-0004Nt-7y for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 11:57:19 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752567Ab3LQK5N convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 05:57:13 -0500 Received: from mail-ea0-f174.google.com ([209.85.215.174]:46680 "EHLO mail-ea0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751718Ab3LQK5L (ORCPT ); Tue, 17 Dec 2013 05:57:11 -0500 Received: by mail-ea0-f174.google.com with SMTP id b10so2743621eae.5 for ; Tue, 17 Dec 2013 02:57:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:newsgroups:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=rASUYyBpSNIHJkfUK9CVTcXhYluXz3qTLgNvVg/aKDg=; b=k9mLPEY20asUgN9G1U5/eOQGGD39tJUqQItfczuK29i4v+/RFhlG7LfM9ZZ/z+d63v Iwrc3UgyU48RLKMGCyq5dya/3pD3tF6J6pQriLV503SFinu8jz97aYjDvqsfTYFKEceS 6Uo+GECNkXna8cpomHDpYpkR4cB2EM3io0ufOpqdem+7voXN9FA/066iVfbHblvqFI7L MDwtFvPqBtiUkji0umxNEw+qwqUq38r7M7bTWNZD4uqRoIb2lXhLO0JEPeS3UgJTj30s M1gM7LcfG4gwEZcZ6Qmw+aTDUh0OzUxAv2XfJdVC7pqmn+vyy5S82AW3eyOFqap4CCFS UaCw== X-Received: by 10.14.212.69 with SMTP id x45mr22326347eeo.69.1387277830893; Tue, 17 Dec 2013 02:57:10 -0800 
(PST) Received: from [158.75.2.83] ([158.75.2.83]) by mx.google.com with ESMTPSA id 44sm51942708eek.5.2013.12.17.02.57.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 17 Dec 2013 02:57:10 -0800 (PST) User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 Newsgroups: gmane.comp.version-control.git In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Martin Langhoff wrote: > On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond wrote: >> You'll have to remind me what you mean by "incremental" here. Possibly >> it's something cvs-fast-export could support. > > User can > > - run a cvs to git import at time T, resulting in repo G > - make commits to cvs repo > - run cvs to git import at time T1, pointed to G, and the import tool > will only add the new commits found in cvs between T and T1. I wonder if we can add support for incremental import once, for all VCS supporting fast-export, in one place, namely at the remote-helper. I don't know details, so I don't know if it is possible; certainly unstable fast-export output would be a problem, unless some tricks are used (like remembering mappings between versions).
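
To make the "remembering mappings between versions" trick concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: every name in it (seen_map, changeset_key, seen_lookup, seen_remember) is hypothetical and comes from neither git, a remote helper, nor cvs-fast-export, and the FNV-1a hash is chosen here only because it is short. The idea is to derive a key from a changeset's file:revision pairs and remember which import mark it was given, so a later run can recognize changesets it has already emitted.

```c
/* Hypothetical sketch: none of these names exist in git or cvs-fast-export. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEEN 1024  /* arbitrary capacity, for illustration only */

/* One remembered changeset: its key and the mark assigned on import. */
struct seen_entry {
    uint64_t key;
    unsigned long mark;
};

struct seen_map {
    struct seen_entry entries[MAX_SEEN];
    size_t count;
};

/* Fold a NUL-terminated string into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;  /* FNV 64-bit prime */
    }
    return h;
}

/*
 * Key for a changeset given its "path:revision" strings.
 * Assumes the caller passes the pairs in a canonical (sorted) order,
 * so the same changeset always yields the same key.
 */
static uint64_t changeset_key(const char *const *pairs, size_t n)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV 64-bit offset basis */
    assert(pairs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h = fnv1a(h, pairs[i]);
        h = fnv1a(h, "\n");  /* separator between pairs */
    }
    return h;
}

/* Return true and store the mark if this key was imported before. */
static bool seen_lookup(const struct seen_map *m, uint64_t key,
                        unsigned long *mark)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->entries[i].key == key) {
            *mark = m->entries[i].mark;
            return true;
        }
    }
    return false;
}

/* Record a newly imported changeset; false if full or already known. */
static bool seen_remember(struct seen_map *m, uint64_t key,
                          unsigned long mark)
{
    unsigned long existing;
    if (m->count == MAX_SEEN || seen_lookup(m, key, &existing))
        return false;
    m->entries[m->count].key = key;
    m->entries[m->count].mark = mark;
    m->count++;
    return true;
}
```

A wrapper built this way would call seen_lookup before emitting each changeset and skip the ones it already knows. Note that this only helps while the exporter groups file changes into the same changesets on every run; if the grouping shifts between runs, the keys shift with it.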
-- Jakub Narębski From mboxrd@z Thu Jan 1 00:00:00 1970 From: Johan Herland Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 12:18:28 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Eric Raymond , Git Mailing List To: =?UTF-8?Q?Jakub_Nar=C4=99bski?= X-From: git-owner@vger.kernel.org Tue Dec 17 12:18:44 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vssfh-0005wA-AK for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 12:18:41 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751430Ab3LQLSg convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 06:18:36 -0500 Received: from mail12.copyleft.no ([188.94.218.224]:36462 "EHLO mail12.copyleft.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751008Ab3LQLSf convert rfc822-to-8bit (ORCPT ); Tue, 17 Dec 2013 06:18:35 -0500 Received: from locusts.copyleft.no ([188.94.218.116] helo=mail.mailgateway.no) by mail12.copyleft.no with esmtp (Exim 4.76) (envelope-from ) id 1VssfY-0004WO-Qt for git@vger.kernel.org; Tue, 17 Dec 2013 12:18:32 +0100 Received: from mail-pb0-f54.google.com ([209.85.160.54]) by mail.mailgateway.no with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1VssfY-000Jkt-FF for git@vger.kernel.org; Tue, 17 Dec 2013 12:18:32 +0100 Received: by mail-pb0-f54.google.com with SMTP id un15so6888493pbc.27 for ; Tue, 17 Dec 2013 03:18:28 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=eqHDmkRQplbncwG5gGhUujgx9sA3oSc7nuT6BjXWBHs=;
b=TGAGJAIZuVll86tX3cSAfE611nbVLEjn2qzh/rLbFOws4oDB33jekfYpw5Z9ssFcrj d5xfousvpNpsB+k+mPA1yqwK/YR1I1QbiJWQsqh7rshhp2t+IHUKCGXxxuVmHXZY4snD +7xF9xwDVZMgME4dm/Wb9xGpA9z9efvtXqfQJzYWRQlk76SObtfyzAfu67naG2tlEBOz xHeKWn1xlIq3sYm24p7MUe51BUZcKXxSm2z1wjt6XNZxBRMj3oA49w3ot4PeCv/xghZ7 jY5ci0t2HgEAHzpbQXNRPtGYjDa5HG0qdhY/M80l7kEL8QJ1S+oFl4Kqpu7qvDQCcb68 s46Q== X-Received: by 10.68.236.35 with SMTP id ur3mr20613250pbc.137.1387279108491; Tue, 17 Dec 2013 03:18:28 -0800 (PST) Received: by 10.70.24.226 with HTTP; Tue, 17 Dec 2013 03:18:28 -0800 (PST) In-Reply-To: <52B02DFF.5010408@gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 17, 2013 at 11:57 AM, Jakub Narębski wrote: > Martin Langhoff wrote: > >> On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond wrote: >>> >>> You'll have to remind me what you mean by "incremental" here. Possibly >>> it's something cvs-fast-export could support. >> >> >> User can >> >> - run a cvs to git import at time T, resulting in repo G >> - make commits to cvs repo >> - run cvs to git import at time T1, pointed to G, and the import tool > >> >> >> will only add the new commits found in cvs between T and T1. > > > I wonder if we can add support for incremental import once, for all > VCS supporting fast-export, in one place, namely at the remote-helper. > > I don't know details, so I don't know if it is possible; certainly > unstable fast-export output would be a problem, unless some tricks > are used (like remembering mappings between versions). You could do this by mapping some CVS revision identifier (like a hash over the file:revision pairs if nothing better is available), and that would be useful when trying to match up the git commit from a later import against the existing commits from an earlier import. HOWEVER, this only solves the "cheap" half of the problem.
The reason people want incremental CVS import, is to avoid having to repeatedly convert the ENTIRE CVS history. This means that the CVS exporter must learn to start from a given point in the CVS history (identified by the above mapping) and then quickly and efficiently convert only the "new stuff" without having to consult/convert the rest of the CVS history. THIS is the hard part of incremental import. And it is much harder for systems like CVS - where the starting point has a broken concept of history... ...Johan -- Johan Herland, www.herland.net From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 09:07:46 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131217140746.GB15010@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Jakub =?utf-8?B?TmFyxJlic2tp?= X-From: git-owner@vger.kernel.org Tue Dec 17 15:07:53 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VsvJQ-0006yM-Bm for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 15:07:53 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753896Ab3LQOHs convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 09:07:48 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:60444 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753361Ab3LQOHs (ORCPT ); Tue, 17 Dec 2013 09:07:48 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id B5889380558; Tue, 17 Dec 2013 09:07:46 -0500 (EST) Content-Disposition: inline In-Reply-To: <52B02DFF.5010408@gmail.com>
X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narębski : > I wonder if we can add support for incremental import once, for all > VCS supporting fast-export, in one place, namely at the remote-helper. Something in the pipeline - either the helper or the exporter - needs to have an equivalent of cvs-fast-export's and cvsps's -i option, which omits all commits before a specified time and generates cookies like "from refs/heads/master^0" before each branch root in the incremental dump. This could be done in the wrapper, but only if the wrapper itself includes an import-stream parser, interprets the output from the exporter program, and re-emits it. Having done similar things myself in reposurgeon, I advise against this strategy; it would introduce a level of complexity to the wrapper that doesn't belong there, and make the exporter+wrapper combination harder to verify. Fortunately, incremental dump is trivial to implement in the output stage of an exporter if you have access to the exporter source code. I've done it in two different exporters. cvs-fast-export now has a regression test for this case. > I don't know details, so I don't know if it is possible; certainly > unstable fast-export output would be a problem, unless some tricks > are used (like remembering mappings between versions). About such tricks I can only say "That way lies madness". The present Perl wrapper is buggy because it's over-complex. The replacement wrapper should do *less*, not more. Stable output and incremental dump are reasonable things to demand of your supported exporters. cvs-fast-export has incremental dump unconditionally, and stability relative to every CVS implementation since 2004. -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S.
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 09:58:09 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131217145809.GC15010@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Johan Herland X-From: git-owner@vger.kernel.org Tue Dec 17 15:58:18 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vsw6B-0008I9-8Y for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 15:58:15 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753625Ab3LQO6L (ORCPT ); Tue, 17 Dec 2013 09:58:11 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:60901 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752340Ab3LQO6K (ORCPT ); Tue, 17 Dec 2013 09:58:10 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 72927380868; Tue, 17 Dec 2013 09:58:09 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Johan Herland : > HOWEVER, this only solves the "cheap" half of the problem. The reason > people want incremental CVS import, is to avoid having to repeatedly > convert the ENTIRE CVS history. This means that the CVS exporter must > learn to start from a given point in the CVS history (identified by > the above mapping) and then quickly and efficiently convert only the > "new stuff" without having to consult/convert the rest of the CVS > history. THIS is the hard part of incremental import. 
And it is much > harder for systems like CVS - where the starting point has a broken > concept of history... I know of *no* importer that solves what you call the "deep" part of the problem. cvsps didn't, cvs-fast-export doesn't, cvs2git doesn't. All take the easy way out; parse the entire history, and limit what is emitted in the output stage. Actually, given what I know about delta-file parsing I'd say a "true" incremental CVS exporter would be so hard that it's really not worth the bother. The problem is the delta-based history representation. Trying to interpret that without building a complete set of history states in the process (which is most of the work a whole-history exporter does) would be brutally difficult - barely possible in principle maybe, but I wouldn't care to try it. It's much more practical to tune up a whole-history exporter so it's acceptably fast, then do incremental dumping by suppressing part of the conversion in the output stage. cvs-fast-export's benchmark repo is the history of GNU troff. That's 3057 commits in 1549 master files; when I reran it just now the whole-history conversion took 49 seconds. That's 3.7K commits a minute, which is plenty fast enough for anything smaller than (say) one of the *BSD repositories. -- Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Johan Herland Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 18:52:09 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: =?UTF-8?Q?Jakub_Nar=C4=99bski?= , Martin Langhoff , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Tue Dec 17 18:52:22 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vsyof-0000c7-Ps for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 18:52:22 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755049Ab3LQRwR (ORCPT ); Tue, 17 Dec 2013 12:52:17 -0500 Received: from mail12.copyleft.no ([188.94.218.224]:37586 "EHLO mail12.copyleft.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752340Ab3LQRwQ (ORCPT ); Tue, 17 Dec 2013 12:52:16 -0500 Received: from locusts.copyleft.no ([188.94.218.116] helo=mail.mailgateway.no) by mail12.copyleft.no with esmtp (Exim 4.76) (envelope-from ) id 1VsyoY-0006k9-EO for git@vger.kernel.org; Tue, 17 Dec 2013 18:52:14 +0100 Received: from mail-pb0-f54.google.com ([209.85.160.54]) by mail.mailgateway.no with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1VsyoX-0005LN-UI for git@vger.kernel.org; Tue, 17 Dec 2013 18:52:14 +0100 Received: by mail-pb0-f54.google.com with SMTP id un15so7298785pbc.13 for ; Tue, 17 Dec 2013 09:52:09 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Qua09iePrrJKduPwrYHQToPlEyHAoXI6V0jXJGxQoMM=; b=EEsN2VAXfSy6MYddi7QskvdP8qG3Splaj4/jDBL20KTB1DcJOXe+AXWWnfTLZaNmN6 
bPhJm8WdXaTYXFTUHiq6higg8TQsvhb0q9SbVqLxI0EPfA738m2aitYCVtK4ksqsMW/h WSU8LlqxKkiACiJx4lzTb8b9n+5axkGUFzgPpGpfjWkGlcwZWZ7L37b1geeXyT5esghv 0raw2oMzoLxAEhfQJnnFwy+l8Pm3stoV6uJniJ0rWGcAa9T28xrY2pFTvhE5Qc2eK3BJ bKr4lywORyuKOXf67MSteyUjlzUqEizQlqpS0/LrUjAVCNHwNMlgM3tLVlDjscktWdSF leAw== X-Received: by 10.68.190.103 with SMTP id gp7mr29025906pbc.74.1387302729947; Tue, 17 Dec 2013 09:52:09 -0800 (PST) Received: by 10.70.24.226 with HTTP; Tue, 17 Dec 2013 09:52:09 -0800 (PST) In-Reply-To: <20131217145809.GC15010@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 17, 2013 at 3:58 PM, Eric S. Raymond wrote: > Johan Herland : >> HOWEVER, this only solves the "cheap" half of the problem. The reason >> people want incremental CVS import, is to avoid having to repeatedly >> convert the ENTIRE CVS history. This means that the CVS exporter must >> learn to start from a given point in the CVS history (identified by >> the above mapping) and then quickly and efficiently convert only the >> "new stuff" without having to consult/convert the rest of the CVS >> history. THIS is the hard part of incremental import. And it is much >> harder for systems like CVS - where the starting point has a broken >> concept of history... > > I know of *no* importer that solves what you call the "deep" part of > the problem. cvsps didn't, cvs-fast-export doesn't, cvs2git doesn't. > All take the easy way out; parse the entire history, and limit what > is emitted in the output stage. Yes, and starting from a non-incremental importer, that's probably the only viable way to approach incrementalism. > Actually, given what I know about delta-file parsing I'd say a "true" > incremental CVS exporter would be so hard that it's really not worth the > bother. The problem is the delta-based history representation.
> Trying to interpret that without building a complete set of history > states in the process (which is most of the work a whole-history > exporter does) would be brutally difficult - barely possible in > principle maybe, but I wouldn't care to try it. Agreed, you would either have to re-parse the entire ,v-file, or you would have to store some (probably a lot of) intermediate state that would allow you to resolve deltas of new revisions without having to parse all the old revisions. > It's much more practical to tune up a whole-history exporter so it's > acceptably fast, then do incremental dumping by suppressing part of > the conversion in the output stage. > > cvs-fast-export's benchmark repo is the history of GNU troff. That's > 3057 commits in 1549 master files; when I reran it just now the > whole-history conversion took 49 seconds. That's 3.7K commits a > minute, which is plenty fast enough for anything smaller than (say) > one of the *BSD repositories. Those are impressive numbers, and in that scenario, using a "repurposed" converter (i.e. whole-history converter that has been taught to do incremental output) is undoubtedly the best solution. However, I fear that you underestimate the number of users that want to use Git against CVS repos that are orders of magnitude larger (in both dimensions: #commits and #files) than your example repo. For these repos, running a proper whole-history conversion takes hours - or even days - and working incrementally on top of that is simply out of the question. Obviously, they still need the whole-history converter for the future point in time when they have collected enough motivation/buy-in to migrate the entire project/company to a better VCS, but until then, they want to use Git locally, while enduring CVS on the server. At my previous $DAYJOB, I was one of those people, and I ended up with a two-pronged "solution" to the problem (this is ~5 years ago now, so I'm somewhat fuzzy on the details): 1. 
Adopt an ad hoc incremental approach for working against the CVS server: Keep a CVS checkout next to my git repo, and maintain a map between corresponding states/commits in CVS and git. When I update from CVS, apply the corresponding patch to the "cvs" branch in my git repo. Rebase my git-based work on top of that, and use "git cvsexportcommit" to propagate my Git work back to CVS. This is crude and hacky as hell, but it provides me a local git-based workflow. 2. Start convincing fellow developers and lobby management about switching away from CVS. We got a discussion started, gained momentum, and eventually I got to spend most of my time preparing and performing the full-history conversion from CVS to git. This happened mostly before cvs2svn grew its cvs2git sibling, so I ended up writing a custom converter for our particular variation of insane and demented CVS practices. Today, I would probably have gone for cvs2git, or your more recent work. But back to my main point: I believe there are two classes of CVS converters, and I have slowly come to believe that they solve two fundamentally different problems. The first problem is "how to faithfully recreate the project history in a different VCS", which is solved by the full-history converters. Case closed. The second problem is somewhat harder to define, but I'll try: "how to allow me to work productively against a CVS server, without having to deal with the icky CVS bits". Compared to the first problem, the parameters differ somewhat: - Conversion/synchronization time must be short to allow me to stay productive and up-to-date with my colleagues. - Correctness of "current state" is very important. I must be sure that my git working tree is identical to its CVS counterpart, so that my git changes can be reproduced in CVS as faithfully as possible. - Correctness of "history" is less important.
I can accept a messy/incorrect Git history, since I can always query the CVS server for the "correct" history (whatever that means in a CVS context...). - As a generic CVS user (not the CVS admin) I don't necessarily have direct access to the ,v files stored on the CVS server. Although a full-history converter with fairly stable output can be made to support this second problem for repos up to a certain size, there will probably still be users that want to work incrementally against much bigger repos, and I don't think _any_ full-history-gone-incremental importer will be able to support the biggest repos. Consequently I believe that for these big repos it is _impossible_ to get both fast incremental workflows and a high degree of (historical) correctness. cvsps tried to be all of the above, and failed badly at the correctness criteria. Therefore I support your decision to "shoot it through the head". I certainly also support any work towards making a full-history converter work in an incremental manner, as it will be immensely useful for smaller CVS repos. But at the same time we should realize that it won't be a solution for incrementally working against _large_ CVS repos. Although it should have been made obvious a long time ago, the removal of cvsps has now made it abundantly clear that Git currently provides no way to support the incremental workflow against large CVS repos. Maybe that is ok, and we can ignore that, waiting for the few remaining large CVS repos to die? Or maybe we need a new effort to fill this niche? Something that is NOT based on a full-history converter, and does NOT try to guarantee a history-correct conversion, but that DOES try to guarantee fast and relatively worry-free two-way synchronization against a CVS server. Unfortunately (or fortunately, depending on POV) I have not had to touch CVS in a long while, and I don't see that changing soon, so it is not my itch to scratch. 
...Johan -- Johan Herland, www.herland.net From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 13:47:24 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131217184724.GA17709@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Johan Herland X-From: git-owner@vger.kernel.org Tue Dec 17 19:47:37 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vszg8-0005zF-In for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 19:47:36 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932403Ab3LQSr2 (ORCPT ); Tue, 17 Dec 2013 13:47:28 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:34793 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932332Ab3LQSr0 (ORCPT ); Tue, 17 Dec 2013 13:47:26 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id CEBD2380868; Tue, 17 Dec 2013 13:47:24 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Johan Herland : > However, I fear that you underestimate the number of users that want > to use Git against CVS repos that are orders of magnitude larger (in > both dimensions: #commits and #files) than your example repo. You may be right. See below... I'm working with Alan Barret now on trying to convert the NetBSD repositories. 
They break cvs-fast-export through sheer bulk of metadata, by running the machine out of core. This is exactly the kind of huge case that you're talking about. Alan and I are going to take a good hard whack at modifying cvs-fast-export to make this work. Because there really aren't any feasible alternatives. The analysis code in cvsps was never good enough. cvs2git, being written in Python, would hit the core limit faster than anything written in C. > Although a full-history converter with fairly stable output can be > made to support this second problem for repos up to a certain size, > there will probably still be users that want to work incrementally > against much bigger repos, and I don't think _any_ > full-history-gone-incremental importer will be able to support the > biggest repos. > > Consequently I believe that for these big repos it is _impossible_ to > get both fast incremental workflows and a high degree of (historical) > correctness. > > cvsps tried to be all of the above, and failed badly at the > correctness criteria. Therefore I support your decision to "shoot it > through the head". I certainly also support any work towards making a > full-history converter work in an incremental manner, as it will be > immensely useful for smaller CVS repos. But at the same time we should > realize that it won't be a solution for incrementally working against > _large_ CVS repos. It is certainly the case that a sufficiently large CVS repo will break anything, like a star with a mass over the Chandrasekhar limit becoming a black hole :-) The question is how common such supermassive cases are. My own guess is that the *BSD repos and a handful of the oldest GNU projects are pretty much the whole set; everybody else converted to Subversion within the last decade. 
> Although it should have been made obvious a long time ago, the removal > of cvsps has now made it abundantly clear that Git currently provides > no way to support the incremental workflow against large CVS repos. > Maybe that is ok, and we can ignore that, waiting for the few > remaining large CVS repos to die? Or maybe we need a new effort to > fill this niche? Something that is NOT based on a full-history > converter, and does NOT try to guarantee a history-correct conversion, > but that DOES try to guarantee fast and relatively worry-free two-way > synchronization against a CVS server. Unfortunately (or fortunately, > depending on POV) I have not had to touch CVS in a long while, and I > don't see that changing soon, so it is not my itch to scratch. Nor mine. I find the very idea of writing anything that encourages non-history-correct conversions disturbing and want no part of it. Which matters, because right now the set of people working on CVS lifters begins with me and ends with Michael Rafferty (cvs2git), who seems even less interested in incremental conversion than I am. Unless somebody comes out of nowhere and wants to own that problem, it's not going to get solved. -- Eric S. 
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?Q?Jakub_Nar=C4=99bski?= Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 20:58:18 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Tue Dec 17 20:59:07 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt0nK-0008IR-5M for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 20:59:06 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752245Ab3LQT7B convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 14:59:01 -0500 Received: from mail-wi0-f176.google.com ([209.85.212.176]:49390 "EHLO mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751976Ab3LQT7A convert rfc822-to-8bit (ORCPT ); Tue, 17 Dec 2013 14:59:00 -0500 Received: by mail-wi0-f176.google.com with SMTP id hq4so4285371wib.9 for ; Tue, 17 Dec 2013 11:58:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=PnP0c2o8tCPfd0y75JsXUMYhthrsVChVT3MLSxB739M=; b=E6AbDsLiJRbUxOK7btvHA/p5x3A2j2TbWD1v1IIlI/Rs1iM5+xMn6ymQVdoxAZP8QR QPkWYQj8PMlADXhocwnuPc7o4Yis6zi3TkFFXNJkF+HPC8ijdF5ViqrdeDbCiRx+MQ0j r8C10r1t8EGEJ+1fKtrle5h/gSWfkKiorsHzYz+5a0cqQ708dNjLOGg+wRXn4ppTrVqe XjN3OkvJP/PdO6WDtEuH5oQRcPOlx27/K/8nREslMcYGIm2tKkrfkTNk/QsTgAglwB77 gogA5VeWaL+6cYV9KCC3msQiSeYGq6xgHO1YZ0HCYGcG1hftXIbukYslEe+u/6s3KXG+ O0VA== X-Received: by 10.180.205.205 with SMTP id li13mr4928856wic.12.1387310339012; Tue, 17 Dec 
2013 11:58:59 -0800 (PST) Received: by 10.227.86.201 with HTTP; Tue, 17 Dec 2013 11:58:18 -0800 (PST) In-Reply-To: <20131217140746.GB15010@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 17, 2013 at 3:07 PM, Eric S. Raymond wrote: > Jakub Narębski : >> I wonder if we can add support for incremental import once, for all >> VCS supporting fast-export, in one place, namely at the remote-helper. > > Something in the pipeline - either the helper or the exporter - needs to > have an equivalent of cvs-fast-export's and cvsps's -i option, which > omits all commits before a specified time and generates cookies like > "from refs/heads/master^0" before each branch root in the incremental > dump. Errr... doesn't cvs-fast-export support --export-marks= to save progress and --import-marks= to continue incremental import? I *guess* that 'export' / 'import' capabilities-based remote helpers use 'export-marks ' / 'import-marks ' capability for incremental import, also known as "fetch", isn't it? But I might be mistaken, I don't know enough about remote helpers... I would check it in cvs-fast-export manpage, but the page seems to be down: http://isup.me/www.catb.org It's not just you! http://www.catb.org looks down from here. > This could be done in the wrapper, but only if the wrapper itself > includes an import-stream parser, interprets the output from the > exporter program, and re-emits it. Having done similar things > myself in reposurgeon, I advise against this strategy; it would > introduce a level of complexity to the wrapper that doesn't belong > there, and make the exporter+wrapper combination harder to verify. Right. > Fortunately, incremental dump is trivial to implement in the output > stage of an exporter if you have access to the exporter source code. > I've done it in two different exporters.
cvs-fast-export now has a > regression test for this case This is I guess assuming that information from later commits doesn't change guesses about shape of history from earlier commits... -- Jakub Narebski From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 16:02:55 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131217210255.GA18217@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Jakub =?utf-8?B?TmFyxJlic2tp?= X-From: git-owner@vger.kernel.org Tue Dec 17 22:03:01 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt1nA-0004CX-UW for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 22:03:01 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752628Ab3LQVC4 convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 16:02:56 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:35951 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751732Ab3LQVC4 (ORCPT ); Tue, 17 Dec 2013 16:02:56 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 3A539380868; Tue, 17 Dec 2013 16:02:55 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narębski : > Errr... doesn't cvs-fast-export support --export-marks= to save > progress and --import-marks= to continue incremental import?
No, cvs-fast-export does not have --export-marks. It doesn't generate the SHA1s that would require. Even if it did, it's not clear how that would help. > I would check it in cvs-fast-export manpage, but the page seems to > be down: > > http://isup.me/www.catb.org > > It's not just you! http://www.catb.org looks down from here. Confirmed. Looks like ibiblio is having a bad day. I'll file a bug report. > > Fortunately, incremental dump is trivial to implement in the output > > stage of an exporter if you have access to the exporter source code. > > I've done it in two different exporters. cvs-fast-export now has a > > regression test for this case > > This is I guess assuming that information from later commits doesn't > change guesses about shape of history from earlier commits... That's the "stability" property that Martin Langhoff and I were discussing earlier. cvs-fast-export conversions are stable under incremental lifting providing a commitid-generating version of CVS is in use during each increment. Portions of the history *before the first lift* may lack commitids and will nevertheless remain stable through the whole process. All versions of CVS have generated commitids since 2004. -- Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Johan Herland Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 22:26:57 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: =?UTF-8?Q?Jakub_Nar=C4=99bski?= , Martin Langhoff , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Tue Dec 17 22:27:12 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt2AY-0007sd-Mu for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 22:27:11 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752472Ab3LQV1G (ORCPT ); Tue, 17 Dec 2013 16:27:06 -0500 Received: from mail12.copyleft.no ([188.94.218.224]:38088 "EHLO mail12.copyleft.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751706Ab3LQV1E (ORCPT ); Tue, 17 Dec 2013 16:27:04 -0500 Received: from locusts.copyleft.no ([188.94.218.116] helo=mail.mailgateway.no) by mail12.copyleft.no with esmtp (Exim 4.76) (envelope-from ) id 1Vt2AP-0000AD-QV for git@vger.kernel.org; Tue, 17 Dec 2013 22:27:01 +0100 Received: from mail-pa0-f44.google.com ([209.85.220.44]) by mail.mailgateway.no with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1Vt2AP-0009dP-H2 for git@vger.kernel.org; Tue, 17 Dec 2013 22:27:01 +0100 Received: by mail-pa0-f44.google.com with SMTP id fa1so5041043pad.3 for ; Tue, 17 Dec 2013 13:26:57 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=zQWFhz5hG68esQBcwXWTX8+uqhLsT85Cl4h6ItaulVg=; b=f9Tm24Pe3Uin8TJbPMH14IQCFSzqx/iUTGKt+CDtvUiVziqUORzrsKg4KZPMFaeJHE 
lKFWVs21M/nf4W463Iu1mGKYpSOAHdxJS6gdr6aw+jX0MyA1l9hZXbgXcORd2Upskvnz 6h87B85aLdxiQ4+D8f3mn08zIYhGus/rNH+d6h7WVBBhoMcI4ZoD1ejBTPQRIi0ib/G6 SFtld9Imxz8N05nsNXvuZV0bmnl8pHZJIZ6Y+GDwOXvpw/Fn/mDXSNW7WPnZmo8U5WnF CXc0lMIZ6xL+v97acq6OJ+EPu4dGIIJAr9wS+hD75u598SmzO5sVe1gNUlCjHRabMOaO RL3Q== X-Received: by 10.68.212.37 with SMTP id nh5mr30015504pbc.16.1387315617476; Tue, 17 Dec 2013 13:26:57 -0800 (PST) Received: by 10.70.24.226 with HTTP; Tue, 17 Dec 2013 13:26:57 -0800 (PST) In-Reply-To: <20131217184724.GA17709@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 17, 2013 at 7:47 PM, Eric S. Raymond wrote: > I'm working with Alan Barret now on trying to convert the NetBSD > repositories. They break cvs-fast-export through sheer bulk of > metadata, by running the machine out of core. This is exactly > the kind of huge case that you're talking about. > > Alan and I are going to take a good hard whack at modifying cvs-fast-export > to make this work. Because there really aren't any feasible alternatives. > The analysis code in cvsps was never good enough. cvs2git, being written > in Python, would hit the core limit faster than anything written in C. Depends on how it organizes its data structures. Have you actually tried running cvs2git on it? I'm not saying you are wrong, but I had similar problems with my custom converter (also written in Python), and solved them by adding multiple passes/phases instead of trying to do too much work in fewer passes. In the end I ended up storing the largest inter-phase data structures outside of Python (sqlite in my case) to save memory. Obviously it cost a lot in runtime, but it meant that I could actually chew through our largest CVS modules without running out of memory. 
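For illustration only, here is a minimal Python sketch of the multi-pass idea described above: spill per-file revision metadata into sqlite in one pass, then stream it back in sorted order in a second pass so that only one candidate changeset is held in memory at a time. This is not the custom converter mentioned in the message. The FileRevision record, the table layout, and the "same author, same log message, timestamps within a window" grouping rule are all assumptions made for the sketch; only the sqlite3 calls are real library API.

```python
import sqlite3
from typing import Iterable, Iterator, List, NamedTuple


class FileRevision(NamedTuple):
    """Hypothetical record for one per-file CVS revision (pass-1 output)."""
    path: str
    revision: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def store_revisions(db: sqlite3.Connection, revs: Iterable[FileRevision]) -> None:
    """Pass 1: write revision metadata to sqlite instead of keeping it in RAM."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS revs "
        "(path TEXT, revision TEXT, author TEXT, log TEXT, timestamp INTEGER)"
    )
    db.executemany("INSERT INTO revs VALUES (?, ?, ?, ?, ?)", revs)
    db.commit()


def changesets(db: sqlite3.Connection, window: int = 300) -> Iterator[List[FileRevision]]:
    """Pass 2: stream rows back in sorted order and group them.

    Assumed heuristic: consecutive rows with the same author and log message
    whose timestamps are at most `window` seconds apart form one changeset.
    """
    cursor = db.execute(
        "SELECT path, revision, author, log, timestamp FROM revs "
        "ORDER BY author, log, timestamp"
    )
    group: List[FileRevision] = []
    for row in cursor:
        rev = FileRevision(*row)
        if group:
            last = group[-1]
            same_commit = (rev.author, rev.log) == (last.author, last.log)
            if not same_commit or rev.timestamp - last.timestamp > window:
                yield group
                group = []
        group.append(rev)
    if group:
        yield group
```

Letting the database do the sorting trades runtime for memory, which is the trade-off described above: the Python process never needs the full revision table resident, at the cost of extra disk I/O between passes.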
> It is certainly the case that a sufficiently large CVS repo will break > anything, like a star with a mass over the Chandrasekhar limit becoming a > black hole :-) :) True, although it's not the sheer size of the files themselves that is the actual problem. Most of those bytes are (deltified) file data, which you can pretty much stream through and convert to a corresponding fast-export stream of blob objects. The code for that should be fairly straightforward (and should also be eminently parallelizable, given enough cores and available I/O), resulting in a table mapping CVS file:revision pairs to corresponding Git blob SHA1s, and an accompanying (set of) packfile(s) holding said blobs. The hard part comes when trying to correlate the metadata for all the per-file revisions, and distill that into a consistent sequence/DAG of changesets/commits across the entire CVS repo. And then, of course, trying to fit all the branches and tags into that DAG of commits is what really drives you mad... ;-) > The question is how common such supermassive cases are. My own guess is that > the *BSD repos and a handful of the oldest GNU projects are pretty much the > whole set; everybody else converted to Subversion within the last decade. You may be right. At least for the open-source cases. I suspect there's still a considerable number of huge CVS repos within companies' walls... > I find the very idea of writing anything that encourages > non-history-correct conversions disturbing and want no part of it. > > Which matters, because right now the set of people working on CVS lifters > begins with me and ends with Michael Rafferty (cvs2git), s/Rafferty/Haggerty/? > who seems even > less interested in incremental conversion than I am. Unless somebody > comes out of nowhere and wants to own that problem, it's not going > to get solved. Agreed. 
It would be nice to have something to point to for people that want something similar to git-svn for CVS, but without a motivated owner, it won't happen. ...Johan -- Johan Herland, www.herland.net From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 17:41:36 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131217224136.GB19511@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Johan Herland X-From: git-owner@vger.kernel.org Tue Dec 17 23:41:47 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt3Kg-0006Yr-3w for gcvg-git-2@plane.gmane.org; Tue, 17 Dec 2013 23:41:42 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752020Ab3LQWli (ORCPT ); Tue, 17 Dec 2013 17:41:38 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:36688 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752216Ab3LQWlh (ORCPT ); Tue, 17 Dec 2013 17:41:37 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 316AE380868; Tue, 17 Dec 2013 17:41:36 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Johan Herland : > > Alan and I are going to take a good hard whack at modifying cvs-fast-export > > to make this work. Because there really aren't any feasible alternatives. 
> > The analysis code in cvsps was never good enough. cvs2git, being written > > in Python, would hit the core limit faster than anything written in C. > > Depends on how it organizes its data structures. Have you actually > tried running cvs2git on it? I'm not saying you are wrong, but I had > similar problems with my custom converter (also written in Python), > and solved them by adding multiple passes/phases instead of trying to > do too much work in fewer passes. In the end I ended up storing the > largest inter-phase data structures outside of Python (sqlite in my > case) to save memory. Obviously it cost a lot in runtime, but it meant > that I could actually chew through our largest CVS modules without > running out of memory. You make a good point. cvs2git is descended from cvs2svn, which has such a multipass organization - it will only have to avoid memory limits per pass. Alan and I will try that as a fallback if cvs-fast-export continues to choke. > > It is certainly the case that a sufficiently large CVS repo will break > > anything, like a star with a mass over the Chandrasekhar limit becoming a > > black hole :-) > > :) True, although it's not the sheer size of the files themselves that > is the actual problem. Most of those bytes are (deltified) file data, > which you can pretty much stream through and convert to a > corresponding fast-export stream of blob objects. The code for that > should be fairly straightforward (and should also be eminently > parallelizable, given enough cores and available I/O), resulting in a > table mapping CVS file:revision pairs to corresponding Git blob SHA1s, > and an accompanying (set of) packfile(s) holding said blobs. Allowing for the fact that cvs-fast-export isn't git and doesn't use SHA1s or packfiles, this is in fact how a large portion of cvs-fast-export works. The blob files get created during the walk through the master file list, before actual topo analysis is done.
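A small sketch of the blob phase as Johan describes it, assuming the file contents for each revision have already been reconstructed from the deltas. It emits fast-import "blob" commands and builds the file:revision table; the SHA1 column is computed the way git names blobs and, as noted above, is not something cvs-fast-export itself produces. Function names and the sample revision are made up for illustration.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object name git assigns to a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def blob_command(mark: int, data: bytes) -> bytes:
    """One fast-import 'blob' command carrying a mark and exact-byte-count data."""
    return b"blob\nmark :%d\ndata %d\n" % (mark, len(data)) + data + b"\n"

def convert(revisions):
    """revisions: iterable of (path, rev, content) tuples.
    Returns ({(path, rev): (mark, blob_id)}, fast-import stream bytes)."""
    table = {}
    stream = []
    for mark, (path, rev, data) in enumerate(revisions, start=1):
        table[(path, rev)] = (mark, git_blob_id(data))
        stream.append(blob_command(mark, data))
    return table, b"".join(stream)

table, stream = convert([("README,v", "1.1", b"hi\n")])
```

Because each revision is handled independently, this phase parallelizes easily; the hard part, as the thread says, is the changeset analysis that comes after it.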
> The hard part comes when trying to correlate the metadata for all the > per-file revisions, and distill that into a consistent sequence/DAG of > changesets/commits across the entire CVS repo. And then, of course, > trying to fit all the branches and tags into that DAG of commits is > what really drives you mad... ;-) Well I know this...:-) > > The question is how common such supermassive cases are. My own guess is that > > the *BSD repos and a handful of the oldest GNU projects are pretty much the > > whole set; everybody else converted to Subversion within the last decade. > > You may be right. At least for the open-source cases. I suspect > there's still a considerable number of huge CVS repos within > companies' walls... If people with money want to hire me to slay those beasts, I'm available. I'm not proud, I'll use cvs2git if I have to. > > I find the very idea of writing anything that encourages > > non-history-correct conversions disturbing and want no part of it. > > > > Which matters, because right now the set of people working on CVS lifters > > begins with me and ends with Michael Rafferty (cvs2git), > > s/Rafferty/Haggerty/? Yup, I thinkoed. > > who seems even > > less interested in incremental conversion than I am. Unless somebody > > comes out of nowhere and wants to own that problem, it's not going > > to get solved. > > Agreed. It would be nice to have something to point to for people that > want something similar to git-svn for CVS, but without a motivated > owner, it won't happen. I think the fact that it hasn't happened already is a good clue that it's not going to. Given the decline curve of CVS usage, writing git-cvs might have looked like a decent investment of time once, but that era probably ended five to eight years ago. -- Eric S. 
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?Q?Jakub_Nar=C4=99bski?= Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 01:02:04 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Wed Dec 18 01:02:50 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt4bC-0006yN-AN for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 01:02:50 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752093Ab3LRACq convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 19:02:46 -0500 Received: from mail-we0-f182.google.com ([74.125.82.182]:39617 "EHLO mail-we0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750966Ab3LRACp convert rfc822-to-8bit (ORCPT ); Tue, 17 Dec 2013 19:02:45 -0500 Received: by mail-we0-f182.google.com with SMTP id q59so6785251wes.41 for ; Tue, 17 Dec 2013 16:02:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=ChX+7AF91BrYPvZfXRgaK86UrDkzPIZJlwnOw21n8ew=; b=uzseBdui2wjo+O7vYBJrmJqvJ6VdDmWWalFUBvQEZt7tAsPDn/8HryUwyqyGn4FDDw eKqiGbA7c2oOsW2GfTodNfdYrHxOf068xI/CNRCptbQIiZAWidDdvGPAJK89RIxW8TZF wdFeaROQwKXM7HZcOZTioFX30x+RWPFVTfUljvGbjty9eCcUPLTRf+UuEUgKpa7Uk39u x+Qz+8jsu4bdZ3EQHfmhxSgmJG/HdTsebZ0uiJcJNVQsCGZd+WfuQZCTe2hUV8tUXzYk qSMast/jievFYcjZTsRrnxgGjelQV0PqKt1suEKiV+MICvlQkwyFsvPKobjirhWo9f2r OWHw== X-Received: by 10.180.198.43 with SMTP id 
iz11mr5825947wic.0.1387324964949; Tue, 17 Dec 2013 16:02:44 -0800 (PST) Received: by 10.227.86.201 with HTTP; Tue, 17 Dec 2013 16:02:04 -0800 (PST) In-Reply-To: <20131217210255.GA18217@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 17, 2013 at 10:02 PM, Eric S. Raymond wrote: > Jakub Narębski : >> >> Errr... doesn't cvs-fast-export support --export-marks= to save >> progress and --import-marks= to continue incremental import? > > No, cvs-fast-export does not have --export-marks. It doesn't generate the > SHA1s that would require. Even if it did, it's not clear how that would help. I was thinking about how the following part of git-fast-export `--import-marks=` Any commits that have already been marked will not be exported again. If the backend uses a similar --import-marks file, this allows for incremental bidirectional exporting of the repository by keeping the marks the same across runs. How does cvs-fast-export know where to start exporting from in incremental mode? BTW. does cvs-fast-export support incremental *output*, or does it perform also incremental *work*? Anyway, that might mean that generic fast-import stream based incremental (i.e. supporting proper thin fetch) remote helper is out of the question, perhaps writing one for cvs / cvs-fe would bring incremental import from CVS to git?
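For reference, a sketch of what a marks file looks like and how an exporter could use one to skip work. Git writes marks one per line as ":markid SHA-1"; the second table below, mapping native revision identifiers to mark numbers, is a hypothetical exporter-side structure invented for this example, as are the identifiers in it.

```python
def parse_marks(text):
    """Parse lines of the form ':<mark> <sha1>' into {mark: sha1}."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mark, sha1 = line.split()
        marks[int(mark[1:])] = sha1
    return marks

def already_imported(native_to_mark, marks):
    """Native revision ids whose mark git has already resolved to an object."""
    return {rid for rid, mark in native_to_mark.items() if mark in marks}

sample = (
    ":1 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\n"
    ":2 ce013625030ba8dba906f756967f9e9ca394464a\n"
)
marks = parse_marks(sample)

# Hypothetical exporter-side table: native revision id -> mark number.
native_to_mark = {"commitid-A": 1, "commitid-B": 2, "commitid-C": 3}
done = already_imported(native_to_mark, marks)
```

The SHA-1 column can only be filled in by the git side, which is why an exporter that never sees git object names cannot write such a file on its own.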
-- Jakub Narebski From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andreas Schwab Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 01:04:34 +0100 Message-ID: <87vbynnhwt.fsf@igel.home> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain Cc: Jakub =?utf-8?Q?Nar=C4=99bski?= , Martin Langhoff , Git Mailing List To: esr@thyrsus.com X-From: git-owner@vger.kernel.org Wed Dec 18 01:04:44 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt4d1-00005j-Hy for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 01:04:43 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751806Ab3LRAEj (ORCPT ); Tue, 17 Dec 2013 19:04:39 -0500 Received: from mail-out.m-online.net ([212.18.0.9]:41424 "EHLO mail-out.m-online.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750966Ab3LRAEi (ORCPT ); Tue, 17 Dec 2013 19:04:38 -0500 Received: from frontend1.mail.m-online.net (unknown [192.168.8.180]) by mail-out.m-online.net (Postfix) with ESMTP id 3dkc0N45x5z4KK4J; Wed, 18 Dec 2013 01:04:36 +0100 (CET) Received: from localhost (dynscan1.mnet-online.de [192.168.6.68]) by mail.m-online.net (Postfix) with ESMTP id 3dkc0N3zYxzbbgV; Wed, 18 Dec 2013 01:04:36 +0100 (CET) X-Virus-Scanned: amavisd-new at mnet-online.de Received: from mail.mnet-online.de ([192.168.8.180]) by localhost (dynscan1.mail.m-online.net [192.168.6.68]) (amavisd-new, port 10024) with ESMTP id M8w-DMg8OeeL; Wed, 18 Dec 2013 01:04:35 +0100 (CET) X-Auth-Info: 495En1KCinYKy9G13jKK3X2BFHToIpuMZOGscYsOj1A= Received: from igel.home (ppp-88-217-34-0.dynamic.mnet-online.de [88.217.34.0]) by mail.mnet-online.de (Postfix) with ESMTPA; Wed, 18 Dec 2013 01:04:35 +0100 (CET) Received: by
igel.home (Postfix, from userid 1000) id C0F112C4436; Wed, 18 Dec 2013 01:04:34 +0100 (CET) X-Yow: Remember, in 2039, MOUSSE & PASTA will be available ONLY by prescription!! In-Reply-To: <20131217210255.GA18217@thyrsus.com> (Eric S. Raymond's message of "Tue, 17 Dec 2013 16:02:55 -0500") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: "Eric S. Raymond" writes: > All versions of CVS have generated commitids since 2004. Though older versions are still in use, eg. sourceware.org still does not generate commitids. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 19:21:22 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131218002122.GA20152@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Jakub =?utf-8?B?TmFyxJlic2tp?= X-From: git-owner@vger.kernel.org Wed Dec 18 01:21:29 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt4tE-0005p9-BH for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 01:21:28 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751853Ab3LRAVY convert rfc822-to-quoted-printable (ORCPT ); Tue, 17 Dec 2013 19:21:24 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:37669 "EHLO 
snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751553Ab3LRAVX (ORCPT ); Tue, 17 Dec 2013 19:21:23 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 096BF380868; Tue, 17 Dec 2013 19:21:22 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narębski : > > No, cvs-fast-export does not have --export-marks. It doesn't generate the > > SHA1s that would require. Even if it did, it's not clear how that would help. > > I was thinking about how the following part of git-fast-export > `--import-marks=` > > Any commits that have already been marked will not be exported again. > If the backend uses a similar --import-marks file, this allows for incremental > bidirectional exporting of the repository by keeping the marks the same > across runs. I understand that. But it's not relevant - cvs-fast-export doesn't know about git SHA1s, and cannot. > How does cvs-fast-export know where to start exporting from in incremental mode? You give it a cutoff date. This is the same way cvsps-2.x and 3.x worked, and it's what the cvsimport wrapper expects to pass down. > BTW. does cvs-fast-export support incremental *output*, or does it > perform also incremental *work*? As I tried to explain previously in my response to Johan Herland, it's incremental output only. There is *no* CVS exporter known to me, or him, that supports incremental work. That would at best be impractically difficult; given CVS's limitations it may be actually impossible. I wouldn't bet against impossible. > Anyway, that might mean that generic fast-import stream based incremental > (i.e. supporting proper thin fetch) remote helper is out of the question, perhaps > writing one for cvs / cvs-fe would bring incremental import from CVS to > git?
Sorry, I don't understand that. -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. Raymond" Subject: Re: I have end-of-lifed cvsps Date: Tue, 17 Dec 2013 19:25:45 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131218002545.GB20152@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <87vbynnhwt.fsf@igel.home> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Andreas Schwab X-From: git-owner@vger.kernel.org Wed Dec 18 01:25:51 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vt4xS-00016T-Te for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 01:25:51 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751486Ab3LRAZq (ORCPT ); Tue, 17 Dec 2013 19:25:46 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:37699 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751440Ab3LRAZp (ORCPT ); Tue, 17 Dec 2013 19:25:45 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 5EF4B380868; Tue, 17 Dec 2013 19:25:45 -0500 (EST) Content-Disposition: inline In-Reply-To: <87vbynnhwt.fsf@igel.home> X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Andreas Schwab : > "Eric S. Raymond" writes: > > > All versions of CVS have generated commitids since 2004. > > Though older versions are still in use, eg. sourceware.org still does > not generate commitids. That is awful.
Alas, there is not much anyone can do about stupidity that determined. -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?Q?Jakub_Nar=C4=99bski?= Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 16:39:39 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Wed Dec 18 16:40:59 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtJF2-0005qk-Ti for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 16:40:57 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755098Ab3LRPkx convert rfc822-to-quoted-printable (ORCPT ); Wed, 18 Dec 2013 10:40:53 -0500 Received: from mail-wi0-f180.google.com ([209.85.212.180]:57160 "EHLO mail-wi0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754774Ab3LRPkw convert rfc822-to-8bit (ORCPT ); Wed, 18 Dec 2013 10:40:52 -0500 Received: by mail-wi0-f180.google.com with SMTP id hm19so818404wib.1 for ; Wed, 18 Dec 2013 07:40:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=u5anfFFEvIil6avM53PJbQKDxYyPVjYtKeRTY101Juc=; b=nwqA8Ld98d7RlYqSRJWYOMS+65eupDr/QUtaBuBOyZa14yNWTYqOjNfXn0naa8SoLA +7w36QlgyR+wRV5H86KWgAOtgLYOfMemac0UJghEehX6mdLl7bFtfDAX7N8ljmpMuBQl 9dt6wRFlrkk1oZxxNUPsyQfB+uocoVugZR9P5UPmEAUVetwfk04aXIVC6luAGQWKbDUh plOcYgU2tduyqZssS2A41uqc+cbP/2FmNrdcAO1YZlnv1BdcfPWrVZ0O9dWAl5ZA14yW 
6h96eykgJAOsiwJqQ2f+Uowt8DPYIg8c+eRAkdhgAd1/HlL6egsA36Ekpjm3yz04l7F5 WYkA== X-Received: by 10.180.205.205 with SMTP id li13mr8916342wic.12.1387381219099; Wed, 18 Dec 2013 07:40:19 -0800 (PST) Received: by 10.227.86.201 with HTTP; Wed, 18 Dec 2013 07:39:39 -0800 (PST) In-Reply-To: <20131218002122.GA20152@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Dec 18, 2013 at 1:21 AM, Eric S. Raymond wrote: > Jakub Narębski : >>> No, cvs-fast-export does not have --export-marks. It doesn't generate the >>> SHA1s that would require. Even if it did, it's not clear how that would help. >> >> I was thinking about how the following part of git-fast-export >> `--import-marks=` >> >> Any commits that have already been marked will not be exported again. >> If the backend uses a similar --import-marks file, this allows for incremental >> bidirectional exporting of the repository by keeping the marks the same >> across runs. > > I understand that. But it's not relevant - cvs-fast-export doesn't know about > git SHA1s, and cannot. It is a bit strange that markfile has explicitly SHA-1 (":markid "), instead of generic reference to commit, in the case of CVS it would be commitid (what to do for older repositories, though?), in case of Bazaar its revision id (GUID), etc. Can we assume that SCM v1 fast-export and SCM v2 fast-import markfile uses compatible commit names in markfile? >> How does cvs-fast-export know where to start exporting from in incremental mode? > > You give it a cutoff date. This is the same way cvsps-2.x and 3.x worked, > and it's what the cvsimport wrapper expects to pass down. Nice to know. I think it would be possible for remote-helper for cvs-fast-export to find this cutoff date automatically (perhaps with some safety margin), for fetching (incremental import). >> BTW.
does cvs-fast-export support incremental *output*, or does it >> perform also incremental *work*? > > As I tried to explain previously in my response to Johan Herland, it's > incremental output only. There is *no* CVS exporter known to me, or > him, that supports incremental work. That would at best be impractically > difficult; given CVS's limitations it may be actually impossible. I wouldn't > bet against impossible. Even with saving (or re-calculating from git import) guesses about CVS history made so far? Anyway I hope that incremental CVS import would be needed less and less as CVS is replaced by any more modern version control system. >> Anyway, that might mean that generic fast-import stream based incremental >> (i.e. supporting proper thin fetch) remote helper is out of the question, perhaps >> writing one for cvs / cvs-fe would bring incremental import from CVS to >> git? > > Sorry, I don't understand that. I was thinking about creating remote-helper for cvs-fast-export, so that git can use local CVS repository as "remote", using e.g. "cvsroot::" as repo URL, and using this mechanism for incremental import (aka fetch). (Or even "cvssync::" for automatic cvssync + cvs-fast-export). But from what I understand this is not as easy as it seems, even with remote-helper API having support for fast-import stream.
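A sketch of how a wrapper might pick the cutoff date automatically, as suggested earlier in this message: take the newest committer timestamp already imported and back off by a safety margin. Where the timestamps come from (for instance `git log --format=%ct` on the import branch), the one-hour default and the output format are assumptions for illustration; the exporter would then be asked for everything newer than the result.

```python
from datetime import datetime, timedelta, timezone

def cutoff_date(commit_times, margin=timedelta(hours=1)):
    """commit_times: Unix timestamps (seconds) of commits already imported.
    Returns an RFC 3339 UTC timestamp `margin` before the newest one."""
    latest = max(commit_times)
    cutoff = datetime.fromtimestamp(latest, tz=timezone.utc) - margin
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")

stamp = cutoff_date([1360000000, 1360143310])
```

Backing off means some commits are emitted twice, so the importing side has to tolerate or deduplicate the overlap; this only narrows the output and does not make the analysis itself incremental.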
-- Jakub Narębski From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jonathan Nieder Subject: incremental fast-import and marks (Re: I have end-of-lifed cvsps) Date: Wed, 18 Dec 2013 08:23:34 -0800 Message-ID: <20131218162239.GA26668@google.com> References: <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Eric Raymond , Martin Langhoff , Git Mailing List To: Jakub Narebski X-From: git-owner@vger.kernel.org Wed Dec 18 17:23:50 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtJuW-0004O9-3i for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 17:23:48 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754754Ab3LRQXo (ORCPT ); Wed, 18 Dec 2013 11:23:44 -0500 Received: from mail-yh0-f42.google.com ([209.85.213.42]:40355 "EHLO mail-yh0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754713Ab3LRQXn (ORCPT ); Wed, 18 Dec 2013 11:23:43 -0500 Received: by mail-yh0-f42.google.com with SMTP id z6so5391086yhz.15 for ; Wed, 18 Dec 2013 08:23:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=i8wxVypVK6ysVhYR56V/0Hq2IovSSB8T4wKZ00Ncdfk=; b=FzR2D/CN2V9kFY6OcorACaaePN+YDnye4+9Hso7FZrUjczlWG2SN5nkOVcwsp77v6q hkt6Y8U7/d53vuFcMxuVeDnR+zWeuPX1uG+L4nz5FkRvNNq5OyTlWeDGCurRGRJlk6vx ke0BlbmWxNOLSNbwc7kS4gcDBfpPSakNxUzgQoU9J8i/s6M9OJVbLVn7Y1/a00vMc6VZ nzHIFmmf7EkY4UmYXi44pNG8B9jXgPSQvi+IYx+auXnhwQ8XtuV+YEPsourzpUL2SOyG 2maihhh3ntEAK05axCiD6KXU9ygdZ5qRy7GXan50OAKb0HsiIVdyWT8bQ+f8Fh0OUHSc NVgg== X-Received: by 10.236.28.162 with SMTP id
g22mr23228222yha.52.1387383822602; Wed, 18 Dec 2013 08:23:42 -0800 (PST) Received: from google.com ([2620:0:1000:5b00:b6b5:2fff:fec3:b50d]) by mx.google.com with ESMTPSA id r98sm956482yhp.3.2013.12.18.08.23.40 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 18 Dec 2013 08:23:41 -0800 (PST) Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narebski wrote: > It is a bit strange that markfile has explicitly SHA-1 (":markid "), > instead of generic reference to commit, in the case of CVS it would be > commitid (what to do for older repositories, though?), in case of Bazaar > its revision id (GUID), etc. Usually importers use at least two separate files to save state, one mapping between git object names and mark numbers, and the other mapping between native revision identifiers and mark numbers. That way, when the importer uses marks to refer to previously imported commits or blobs, fast-import knows what commits or blobs it is talking about. From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. 
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 11:27:10 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131218162710.GA3573@thyrsus.com> References: <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Martin Langhoff , Git Mailing List To: Jakub =?utf-8?B?TmFyxJlic2tp?= X-From: git-owner@vger.kernel.org Wed Dec 18 17:27:16 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtJxs-00007n-6i for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 17:27:16 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753243Ab3LRQ1M convert rfc822-to-quoted-printable (ORCPT ); Wed, 18 Dec 2013 11:27:12 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:44900 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751896Ab3LRQ1M (ORCPT ); Wed, 18 Dec 2013 11:27:12 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 457FE380488; Wed, 18 Dec 2013 11:27:10 -0500 (EST) Content-Disposition: inline In-Reply-To: X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jakub Narębski : > It is a bit strange that markfile has explicitly SHA-1 (":markid "), > instead of generic reference to commit, in the case of CVS it would be > commitid (what to do for older repositories, though?), in case of Bazaar > its revision id (GUID), etc. Can we assume that SCM v1 fast-export and > SCM v2 fast-import markfile uses compatible commit names in markfile?
For use in reposurgeon I have defined a generic cross-VCS reference to commit I call an "action stamp"; it consists of an RFC3339 date followed by a committer email address. Here's an example: 2013-02-06T09:35:10Z!esr@thyrsus.com In any VCS with changesets (git, Subversion, bzr, Mercurial) this almost always suffices to uniquely identify a commit. The "almost" is because in these systems it is possible for a user to do multiple commits in the same second. And now you know why I wish git had subsecond timestamp resolution! If it did, uniqueness of these in a git stream could be guaranteed. The implied model completely breaks for CVS, of course. There you have to use commitids and plain give up when those don't exist. > I think it would be possible for remote-helper for cvs-fast-export to find > this cutoff date automatically (perhaps with some safety margin), for > fetching (incremental import). Yes. > > As I tried to explain previously in my response to Johan Herland, it's > > incremental output only. There is *no* CVS exporter known to me, or > > him, that supports incremental work. That would at best be impractically > > difficult; given CVS's limitations it may be actually impossible. I wouldn't > > bet against impossible. > > Even with saving (or re-calculating from git import) guesses about CVS > history made so far? Even with that. cvsps-2.x tried to do something like this. It was a lose. > Anyway I hope that incremental CVS import would be needed less > and less as CVS is replaced by any more modern version control system. I agree. I have never understood why people on this list are attached to it. > I was thinking about creating remote-helper for cvs-fast-export, so that > git can use local CVS repository as "remote", using e.g. "cvsroot::" > as repo URL, and using this mechanism for incremental import (aka fetch). > (Or even "cvssync::" for automatic cvssync + cvs-fast-export).
> > But from what I understand this is not as easy as it seems, even with > remote-helper API having support for fast-import stream. It's a swamp I wouldn't want to walk into. -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin Langhoff Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 11:53:47 -0500 Message-ID: References: <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: =?UTF-8?Q?Jakub_Nar=C4=99bski?= , Git Mailing List To: Eric Raymond X-From: git-owner@vger.kernel.org Wed Dec 18 17:54:25 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtKO8-00064L-91 for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 17:54:24 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755433Ab3LRQyM (ORCPT ); Wed, 18 Dec 2013 11:54:12 -0500 Received: from mail-wg0-f53.google.com ([74.125.82.53]:62753 "EHLO mail-wg0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752031Ab3LRQyK (ORCPT ); Wed, 18 Dec 2013 11:54:10 -0500 Received: by mail-wg0-f53.google.com with SMTP id k14so7821258wgh.20 for ; Wed, 18 Dec 2013 08:54:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=WdxbeEqmAEpMpvOBBwTnZOzB9BkdqZDPeE5tP/LpmlA=; b=U/UWT647GlKMKVHKnf0D29z00vc9ZmkOvvF4eDPuLDh2vnbW6p0KfSq+HL/tzwuMA3 ed8fD/8frofL9GszxhkJv9t7JgQRtEjxuOVjuoJq3S7Q6Rml6R10nKHpvint74RCXMeH B8/RRTCL4SMcNxnouPsM2/hpjBRxt081e2fedR0WaCCWRKJY5Du/k4+aExAcuHBzFIbn XCIIIAzPS/M0/dAFzVwSCWG0a0PTXuhdO5VKv/3cxFF+YrromnXeI0/kX/F8D+tx61te
rZMIoWE2NKB3jXVtCy4DgocsGI85UwJYL1vNeAql3LhUfHcJ2kYWQEGV/eXzShP+jaK6 K0Cg== X-Received: by 10.180.19.165 with SMTP id g5mr8821444wie.31.1387385647502; Wed, 18 Dec 2013 08:54:07 -0800 (PST) Received: by 10.216.172.202 with HTTP; Wed, 18 Dec 2013 08:53:47 -0800 (PST) In-Reply-To: <20131218162710.GA3573@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Dec 18, 2013 at 11:27 AM, Eric S. Raymond wrote: >> Anyway I hope that incremental CVS import would be needed less >> and less as CVS is replaced by any more modern version control system. > > I agree. I have never understood why people on this list are attached to it. I think I have answered this question already once in this thread, and a few times in similar threads with Eric in the past. People track CVS repos that they have not control over. Smart programmers forced to work with a corporate CVS repo. It happens also with SVN, and witness the popularity of git-svn which can sanely interact with an "active" svn repo. This is a valid use case. Hard (impossible?) to support. But there should be no surprise as to its reasons. cheers, m -- martin.langhoff@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 12:46:15 -0500 Message-ID: <20131218174615.GA5597@sigill.intra.peff.net> References: <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: "Eric S. 
Raymond" X-From: git-owner@vger.kernel.org Wed Dec 18 18:46:31 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtLCX-0004Rb-6P for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 18:46:29 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755308Ab3LRRqT (ORCPT ); Wed, 18 Dec 2013 12:46:19 -0500 Received: from cloud.peff.net ([50.56.180.127]:46764 "HELO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1753242Ab3LRRqR (ORCPT ); Wed, 18 Dec 2013 12:46:17 -0500 Received: (qmail 27800 invoked by uid 102); 18 Dec 2013 17:46:17 -0000 Received: from c-71-63-4-13.hsd1.va.comcast.net (HELO sigill.intra.peff.net) (71.63.4.13) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Wed, 18 Dec 2013 11:46:17 -0600 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 18 Dec 2013 12:46:15 -0500 Content-Disposition: inline In-Reply-To: <20131218162710.GA3573@thyrsus.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Dec 18, 2013 at 11:27:10AM -0500, Eric S. Raymond wrote: > For use in reposurgeon I have defined a generic cross-VCS reference to > commit I call an "action stamp"; it consists of an RFC3339 date followed by > a committer email address. Here's an example: > > 2013-02-06T09:35:10Z!esr@thyrsus.com > > In any VCS with changesets (git, Subversion, bzr, Mercurial) this > almost always suffices to uniquely identify a commit. The "almost" is > because in these systems it is possible for a user to do multiple commits > in the same second. 
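The action stamp format quoted above is simple enough to sketch in a few
lines. What follows is only an illustration under stated assumptions, not
reposurgeon's code: the helper names action_stamp and colliding_stamps are
made up here, and the input is assumed to be (committer epoch seconds,
committer email) pairs such as "git log --format='%ct %ce'" yields.

```python
from collections import Counter
from datetime import datetime, timezone


def action_stamp(epoch_seconds, email):
    """Format a commit time and committer email as RFC3339-date!email.

    Hypothetical helper: the layout follows the example in the thread,
    with the time rendered in UTC at one-second resolution.
    """
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%SZ") + "!" + email


def colliding_stamps(commits):
    """Return {stamp: count} for every stamp shared by more than one commit.

    `commits` is an iterable of (epoch_seconds, email) pairs.
    """
    counts = Counter(action_stamp(ts, email) for ts, email in commits)
    return {stamp: n for stamp, n in counts.items() if n > 1}


if __name__ == "__main__":
    ts = int(datetime(2013, 2, 6, 9, 35, 10, tzinfo=timezone.utc).timestamp())
    print(action_stamp(ts, "esr@thyrsus.com"))
    # Two commits by the same committer in the same second share a stamp.
    print(colliding_stamps([(ts, "a@example.com"), (ts, "a@example.com"),
                            (ts + 1, "a@example.com")]))
```

Counting stamps that occur more than once is one way to check how often
the "almost" bites in a given repository.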
FWIW, this has quite a few collisions in git.git:

  $ git log --format='%ct %ce' | sort | uniq -c | sort -rn | head
       22 1172221032 normalperson@yhbt.net
       22 1172221031 normalperson@yhbt.net
       22 1172221029 normalperson@yhbt.net
       21 1190197351 gitster@pobox.com
       21 1172221030 normalperson@yhbt.net
       20 1190197350 gitster@pobox.com
       17 1172221033 normalperson@yhbt.net
       15 1263457676 gitster@pobox.com
       15 1193717011 gitster@pobox.com
       14 1367447590 gitster@pobox.com

In git, it may happen quite a bit during "git am" or "git rebase", in
which a large number of commits are replayed in a tight loop. You can
use the author timestamp instead, but it also collides (try "%at %ae" in
the above command instead).

> And now you know why I wish git had subsecond timestamp resolution! If it
> did, uniqueness of these in a git stream could be guaranteed.

It's still not guaranteed. Even with sufficient resolution that no two
operations could possibly complete in the same time unit, clocks do not
always march forward. They get reset, they may skew from machine to
machine, the same operation may happen on different machines, etc.

The probability of such collisions is significantly reduced, though, if
only because the extra precision adds an essentially random factor. But
in some cases you might even see the same commit "replayed" on top of
different parts of the graph, or affecting different paths (e.g., by
filter-branch). I.e., no matter what your precision, multiple hacked-up
views of the changeset will still always have that same timestamp.

-Peff

From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S.
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 14:16:48 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131218191648.GA4533@thyrsus.com> References: <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> <20131218174615.GA5597@sigill.intra.peff.net> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Jeff King X-From: git-owner@vger.kernel.org Wed Dec 18 20:17:04 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtMc5-0002I4-N9 for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 20:16:58 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755957Ab3LRTQv (ORCPT ); Wed, 18 Dec 2013 14:16:51 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:46137 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755043Ab3LRTQt (ORCPT ); Wed, 18 Dec 2013 14:16:49 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 30315380488; Wed, 18 Dec 2013 14:16:48 -0500 (EST) Content-Disposition: inline In-Reply-To: <20131218174615.GA5597@sigill.intra.peff.net> X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jeff King : > In git, it may happen quite a bit during "git am" or "git rebase", in > which a large number of commits are replayed in a tight loop. That's a good point - a repeatable real-world case in which we can expect that behavior. This case could be solved, though, with a slight tweak to the commit generator in git (given subsecond timestamps). 
It could keep the time of last commit and stall by an arbitrarily small
amount, enough to show up as a timestamp difference.

Action stamps work pretty well inside reposurgeon because they're
mainly used to identify commits from older VCSes that can't run that
fast. Collisions are theoretically possible but I've never seen one
in the wild.

> You can
> use the author timestamp instead, but it also collides (try "%at %ae" in
> the above command instead).

Yes, obviously for the same reason.

> > And now you know why I wish git had subsecond timestamp resolution! If it
> > did, uniqueness of these in a git stream could be guaranteed.
>
> It's still not guaranteed. Even with sufficient resolution that no two
> operations could possibly complete in the same time unit, clocks do not
> always march forward. They get reset, they may skew from machine to
> machine, the same operation may happen on different machines, etc.

Right...but the *same person* submitting operations from *different
machines* within the time window required to be caught by these effects
is at worst fantastically unlikely. That case is exactly why action stamps
have an email part.
-- 
Eric S.
Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: John Keeping Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 19:54:51 +0000 Message-ID: <20131218195450.GK3163@serenity.lan> References: <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Eric Raymond , Jakub =?utf-8?B?TmFyxJlic2tp?= , Git Mailing List To: Martin Langhoff X-From: git-owner@vger.kernel.org Wed Dec 18 20:55:09 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtND2-0002D9-BH for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 20:55:08 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751008Ab3LRTzB (ORCPT ); Wed, 18 Dec 2013 14:55:01 -0500 Received: from jackal.aluminati.org ([72.9.247.210]:35986 "EHLO jackal.aluminati.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750898Ab3LRTzA (ORCPT ); Wed, 18 Dec 2013 14:55:00 -0500 Received: from localhost (localhost [127.0.0.1]) by jackal.aluminati.org (Postfix) with ESMTP id B6182CDA5B5; Wed, 18 Dec 2013 19:54:59 +0000 (GMT) X-Virus-Scanned: Debian amavisd-new at serval.aluminati.org X-Spam-Flag: NO X-Spam-Score: -0.999 X-Spam-Level: X-Spam-Status: No, score=-0.999 tagged_above=-9999 required=6.31 tests=[ALL_TRUSTED=-1, URIBL_BLOCKED=0.001] autolearn=disabled Received: from jackal.aluminati.org ([127.0.0.1]) by localhost (jackal.aluminati.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2SMIei-MlJwU; Wed, 18 Dec 2013 19:54:59 +0000 (GMT) Received: from serenity.lan (mink.aluminati.org [10.0.7.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by jackal.aluminati.org (Postfix) with ESMTPSA id 6A139CDA55E; Wed, 18 Dec 2013 19:54:52 +0000 
(GMT) Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.22 (2013-10-16) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Dec 18, 2013 at 11:53:47AM -0500, Martin Langhoff wrote: > On Wed, Dec 18, 2013 at 11:27 AM, Eric S. Raymond wrote: > >> Anyway I hope that incremental CVS import would be needed less > >> and less as CVS is replaced by any more modern version control system. > > > > I agree. I have never understood why people on this list are attached to it. > > I think I have answered this question already once in this thread, and > a few times in similar threads with Eric in the past. > > People track CVS repos that they have not control over. Smart > programmers forced to work with a corporate CVS repo. It happens also > with SVN, and witness the popularity of git-svn which can sanely > interact with an "active" svn repo. > > This is a valid use case. Hard (impossible?) to support. But there > should be no surprise as to its reasons. And at this point the git-cvsimport manpage says: WARNING: git cvsimport uses cvsps version 2, which is considered deprecated; it does not work with cvsps version 3 and later. If you are performing a one-shot import of a CVS repository consider using cvs2git[1] or parsecvs[2]. Which I think sums up the position nicely; if you're doing a one-shot import then the standalone tools are going to be a better choice, but if you're trying to use Git for your work on top of CVS the only choice is cvsps with git-cvsimport. From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S. 
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 15:20:09 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131218202009.GA4935@thyrsus.com> References: <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> <20131218195450.GK3163@serenity.lan> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Martin Langhoff , Jakub =?utf-8?B?TmFyxJlic2tp?= , Git Mailing List To: John Keeping X-From: git-owner@vger.kernel.org Wed Dec 18 21:20:20 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtNbP-0004Ib-En for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 21:20:19 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751373Ab3LRUUM (ORCPT ); Wed, 18 Dec 2013 15:20:12 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:46890 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751148Ab3LRUUL (ORCPT ); Wed, 18 Dec 2013 15:20:11 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id 0DFF2380488; Wed, 18 Dec 2013 15:20:10 -0500 (EST) Content-Disposition: inline In-Reply-To: <20131218195450.GK3163@serenity.lan> X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: John Keeping : > Which I think sums up the position nicely; if you're doing a one-shot > import then the standalone tools are going to be a better choice, but if > you're trying to use Git for your work on top of CVS the only choice is > cvsps with git-cvsimport. 
Which will trash your history - the bugs in that are worse than the bugs in 3.0, which are bad enough that I *terminated* it. Lovely.... -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kent R. Spillner" Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 14:47:41 -0600 (CST) Message-ID: <1387399661.014711355@apps.rackspace.com> References: <52B02DFF.5010408@gmail.com> <20131217140746.GB15010@thyrsus.com> <20131217210255.GA18217@thyrsus.com> <20131218002122.GA20152@thyrsus.com> <20131218162710.GA3573@thyrsus.com> <20131218195450.GK3163@serenity.lan> <20131218202009.GA4935@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain;charset=UTF-8 Content-Transfer-Encoding: 8BIT Cc: "John Keeping" , "Martin Langhoff" , "=?utf-8?Q?Jakub_Nar=C4=99bski?=" , "Git Mailing List" To: esr@thyrsus.com X-From: git-owner@vger.kernel.org Wed Dec 18 21:56:42 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtOAb-0002G7-Rc for gcvg-git-2@plane.gmane.org; Wed, 18 Dec 2013 21:56:42 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752187Ab3LRU4h (ORCPT ); Wed, 18 Dec 2013 15:56:37 -0500 Received: from smtp172.dfw.emailsrvr.com ([67.192.241.172]:45917 "EHLO smtp172.dfw.emailsrvr.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751913Ab3LRU4g convert rfc822-to-8bit (ORCPT ); Wed, 18 Dec 2013 15:56:36 -0500 X-Greylist: delayed 524 seconds by postgrey-1.27 at vger.kernel.org; Wed, 18 Dec 2013 15:56:36 EST Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp17.relay.dfw1a.emailsrvr.com (SMTP Server) with ESMTP id 7145B188D8E for ; Wed, 18 Dec 2013 15:47:52 -0500 (EST) X-Virus-Scanned: OK Received: from smtp66.iad3a.emailsrvr.com (smtp66.iad3a.emailsrvr.com [173.203.187.66]) by smtp17.relay.dfw1a.emailsrvr.com (SMTP Server) with ESMTPS id 57343188D91 for ; Wed, 18 Dec 2013 15:47:52 
-0500 (EST) Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp25.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id B0E47E00C1; Wed, 18 Dec 2013 15:47:41 -0500 (EST) X-Virus-Scanned: OK Received: from app13.wa-webapps.iad3a (relay.iad3a.rsapps.net [172.27.255.110]) by smtp25.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id 2CD45E009D; Wed, 18 Dec 2013 15:47:41 -0500 (EST) Received: from zerosphere.org (localhost.localdomain [127.0.0.1]) by app13.wa-webapps.iad3a (Postfix) with ESMTP id 0400C380045; Wed, 18 Dec 2013 15:47:41 -0500 (EST) Received: by apps.rackspace.com (Authenticated sender: sl4mmy@zerosphere.org, from: kspillner@acm.org) with HTTP; Wed, 18 Dec 2013 14:47:41 -0600 (CST) Importance: Normal X-Priority: 3 (Normal) X-Type: plain In-Reply-To: <20131218202009.GA4935@thyrsus.com> X-Mailer: webmail7.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: > Which will trash your history - the bugs in that are worse than the bugs > in 3.0, which are bad enough that I *terminated* it. Which *might* trash your history. cvsps v2 and git cvsimport work as advertised with simple, linear CVS repositories. I maintain a git mirror of an active CVS repo and run git cvsimport every few days to sync with the latest upstream changes. The only problem I encountered so far was when you released cvsps v3 and broke git cvsimport. :) I had to manually downgrade to cvsps v2.2b1 and configure my package manager to ignore cvsps updates, but I haven't had any problems since. 
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 00:44:29 +0100 Message-ID: <52B2335D.2030607@alum.mit.edu> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Johan Herland , =?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: esr@thyrsus.com X-From: git-owner@vger.kernel.org Thu Dec 19 00:44:41 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtQnA-0001Na-Ke for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 00:44:41 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751877Ab3LRXog (ORCPT ); Wed, 18 Dec 2013 18:44:36 -0500 Received: from alum-mailsec-scanner-8.mit.edu ([18.7.68.20]:63993 "EHLO alum-mailsec-scanner-8.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751231Ab3LRXof (ORCPT ); Wed, 18 Dec 2013 18:44:35 -0500 X-AuditID: 12074414-b7fb46d000002a4d-a3-52b23362e23d Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP id 7A.BF.10829.26332B25; Wed, 18 Dec 2013 18:44:34 -0500 (EST) Received: from [192.168.69.148] (p57A24A3C.dip0.t-ipconnect.de [87.162.74.60]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id rBINiVLQ019286 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 18 Dec 2013 18:44:32 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9 In-Reply-To: <20131217184724.GA17709@thyrsus.com> X-Enigmail-Version: 1.6 X-Brightmail-Tracker: 
H4sIAAAAAAAAA+NgFtrMKsWRmVeSWpSXmKPExsUixO6iqJtkvCnIYMJ6ZYurW3wsuq50M1ms uDqH2WLe3V1MFhvXmTiweuycdZfd49LL72wey752snh83iQXwBLFbZOUWFIWnJmep2+XwJ2x sP8uU8E7iYpLV54xNjAeFu5i5OSQEDCR+Pd2FxuELSZx4d56MFtI4DKjxJYVSl2MXED2OSaJ GxtfsoAkeAW0JVoOTQIq4uBgEVCV2HIrFiTMJqArsainmQnEFhUIknh06CE7RLmgxMmZT1hA ykUEhCWO9amBjGQWOMMo0btoE9guYQE1iWdPVzFC7LrJLPHp9XZmkASngKHE+xe/GEGaJQTE JXoag0BMZgF1ifXzhEAqmAXkJba/ncM8gVFwFpJtsxCqZiGpWsDIvIpRLjGnNFc3NzEzpzg1 Wbc4OTEvL7VI10IvN7NELzWldBMjJNBFdjAeOSl3iFGAg1GJhzfg+cYgIdbEsuLK3EOMkhxM SqK8s402BQnxJeWnVGYkFmfEF5XmpBYfYpTgYFYS4b3CApTjTUmsrEotyodJSXOwKInzflus 7ickkJ5YkpqdmlqQWgSTleHgUJLgXQQyVLAoNT21Ii0zpwQhzcTBCTKcS0qkODUvJbUosbQk Ix4Uu/HFwOgFSfEA7Z0I0s5bXJCYCxSFaD3FqMsx78uHb4xCLHn5ealS4rxLQIoEQIoySvPg VsDS2itGcaCPhXkngVTxAFMi3KRXQEuYgJY8X7MOZElJIkJKqoExZseblWoSNgJ6k6wSQ85X K6y8tX2eUXnzaafNZ00U3vjlxr62Z9dTlS3YlLRyHrdz9Cb+l99D1Vb8WLQptN3/ Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 12/17/2013 07:47 PM, Eric S. Raymond wrote: > Johan Herland : >> However, I fear that you underestimate the number of users that want >> to use Git against CVS repos that are orders of magnitude larger (in >> both dimensions: #commits and #files) than your example repo. > > You may be right. See below... > > I'm working with Alan Barret now on trying to convert the NetBSD > repositories. They break cvs-fast-export through sheer bulk of > metadata, by running the machine out of core. This is exactly > the kind of huge case that you're talking about. > > Alan and I are going to take a good hard whack at modifying cvs-fast-export > to make this work. Because there really aren't any feasible alternatives. > The analysis code in cvsps was never good enough. cvs2git, being written > in Python, would hit the core limit faster than anything written in C. 
cvs2git goes to great lengths to store intermediate data to disk and
keep the working set small and therefore (despite the Python overhead) I
am confident that it scales better than cvs-fast-export. My usual test
repo was gcc:

Total CVS Files:             25013
Total CVS Revisions:        578010
Total CVS Branches:        1487929
Total CVS Tags:           11435500
Total Unique Tags:             814
Total Unique Branches:         116
CVS Repos Size in KB:      2074248
Total SVN Commits:           64501

I also regularly converted mozilla (4.2 GB) and emacs (560 MB) for
testing purposes. These could all be converted on a 32-bit computer.

Other projects that cvs2svn/cvs2git could handle: FreeBSD, Gentoo, KDE,
GNOME, PostgreSQL. (Though for KDE, which I think was in the 16 GB
range, I know that they used a giant machine for the conversion.)

If you haven't tried cvs2git yet, please start it up somewhere in the
background. It might take a while but it should have no trouble with
your repos, and then you can compare the tools based on experience
rather than speculation.

> Which matters, because right now the set of people working on CVS lifters
> begins with me and ends with Michael Rafferty (cvs2git), who seems even
> less interested in incremental conversion than I am. Unless somebody
> comes out of nowhere and wants to own that problem, it's not going
> to get solved.

A correct incremental converter could be done (as long as the CVS users
don't literally change history retroactively) but it would be a lot of
work. Parsing the CVS files isn't the problem; after all, CVS has to do
that every time you check out a branch. The problem is the extra
bookkeeping that would be needed to keep the overlapping history
consistent between runs N and N+1 of the tool.

I sketched out what would be necessary once and it came out to several
solid weeks of work.

But the traffic on the cvs2svn/cvs2git mailing list has trailed off
essentially to zero, so either the software is perfect already (haha) or
most everybody has already converted.
Therefore I don't invest any significant time in that project these days. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ From mboxrd@z Thu Jan 1 00:00:00 1970 From: Johan Herland Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 02:11:53 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Eric Raymond , =?UTF-8?Q?Jakub_Nar=C4=99bski?= , Martin Langhoff , Git Mailing List To: Michael Haggerty X-From: git-owner@vger.kernel.org Thu Dec 19 02:12:14 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtS9t-0007XS-FE for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 02:12:13 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751050Ab3LSBMI (ORCPT ); Wed, 18 Dec 2013 20:12:08 -0500 Received: from mail12.copyleft.no ([188.94.218.224]:41723 "EHLO mail12.copyleft.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750872Ab3LSBMH (ORCPT ); Wed, 18 Dec 2013 20:12:07 -0500 Received: from locusts.copyleft.no ([188.94.218.116] helo=mail.mailgateway.no) by mail12.copyleft.no with esmtp (Exim 4.76) (envelope-from ) id 1VtS9g-0004ai-FS for git@vger.kernel.org; Thu, 19 Dec 2013 02:12:01 +0100 Received: from mail-pd0-f175.google.com ([209.85.192.175]) by mail.mailgateway.no with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1VtS9d-000Euk-U1 for git@vger.kernel.org; Thu, 19 Dec 2013 02:11:58 +0100 Received: by mail-pd0-f175.google.com with SMTP id w10so404028pde.6 for ; Wed, 18 Dec 2013 17:11:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=IzeSAT73leCxg6xo8hVlurjUE/MxC8lxWGYZEfb7A/8=; b=eudnZORBlVFtSD+8kdx/YzcW94wdV/FwPba36HTRBUVjeGzjk/ctICeteHATRHrRdV 1/bcLs3VydJ6FdqPi6n7c9I+eJQKpJiOqSt9kqFBFY410A5ylT/bmRcNBeC/suV8ocC3 m9ps33IXAwFI1n4F6hH2LnnTZ2agbndas7U0qKUOo0eB0t5yH1Xm+jeTUNdJJFEl5Yia nKdrkdV9q+cWOW3UO1LYvcugp/3l1khsv+GTYWfTo944k4XI6vcImyVeHNQaPmqzeQbo mcDxNLIpYYyjsm2VFi90YR0okZEUf3+WhNtmJJ6Z5DEIsTm8TQY8xA+1Adfdsnp4lGyi 5F/A== X-Received: by 10.68.134.200 with SMTP id pm8mr37883804pbb.123.1387415514001; Wed, 18 Dec 2013 17:11:54 -0800 (PST) Received: by 10.70.24.226 with HTTP; Wed, 18 Dec 2013 17:11:53 -0800 (PST) In-Reply-To: <52B2335D.2030607@alum.mit.edu> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty wrote: > A correct incremental converter could be done (as long as the CVS users > don't literally change history retroactively) but it would be a lot of work. Although I agree with that sentence as it is stated, I also believe that the parenthesized condition rules out a _majority_ of CVS repo of non-trivial size/history. So even though a correct incremental converter could be built, it would be pretty much useless if it did not gracefully handle rewritten history. And in the face of rewritten history it becomes pretty much impossible to define what a "correct" conversion should even look like (not to mention the difficulty of actually implementing that converter...). Here are just a couple of things a CVS user can do (and that happened fairly regularly at my previous $dayjob) that would make life difficult for an incremental converter (and that also makes stable output from a non-incremental converter hard to solve in practice): - A user "deletes" $file from $branch by simply removing the $branch symbol on $file (cvs tag -B -d $branch $file). CVS stores no record of this. 
   Many non-incremental importers will see $file as never having
   existed on $branch. An incremental importer starting from a previously
   converted state must somehow deal with that previous state no longer
   existing from the POV of CVS.

 - A user moves a release tag on a few files to include a late bugfix
   into an upcoming release (cvs tag -F -r $new_rev $tag $file). There
   might be no single point in time where the tagged state existed in the
   repo; it has become a "Frankentag". You could claim user error here, and
   that such shortcuts should not happen, but that doesn't really prevent
   it from ever happening. Recreating the tree state of the Frankentag in
   Git is easy, but what kind of history do you construct to lead up to
   that tree?

 - A modularized project develops code on HEAD, and makes regular releases
   of each module by tagging the files in the module dir with
   "$modulename-$version". Afterwards a project-wide "stable" tag is moved
   on that subset of files to include the new module release into the
   "stable" tag. ("stable" is conceptually a branch, but the CVS mechanism
   used here is still the tag, since CVS branches cannot "follow" each other
   like in Git). This is pretty much the same Frankentag scenario as above,
   except that in this case it might be considered Best Practice (it was at
   our $dayjob), and not a shortcut/user error made by a single user.

(None of these examples even involve the "cvs admin" command, which allows
you to do some truly scary and demented things to your CVS history...)

My point here is that people will use whatever available tools they
have to solve whatever problems they are currently having. And when
CVS is your tool, you will sooner or later end up with a "solution"
that irrevocably rewrites your CVS history.

...Johan

-- 
Johan Herland, www.herland.net

From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Eric S.
Raymond" Subject: Re: I have end-of-lifed cvsps Date: Wed, 18 Dec 2013 23:06:04 -0500 Organization: Eric Conspiracy Secret Labs Message-ID: <20131219040604.GA7654@thyrsus.com> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> Reply-To: esr@thyrsus.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Johan Herland , Jakub =?utf-8?B?TmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Michael Haggerty X-From: git-owner@vger.kernel.org Thu Dec 19 05:06:13 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtUsF-0003w5-S9 for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 05:06:12 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752719Ab3LSEGH (ORCPT ); Wed, 18 Dec 2013 23:06:07 -0500 Received: from static-71-162-243-5.phlapa.fios.verizon.net ([71.162.243.5]:49775 "EHLO snark.thyrsus.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752415Ab3LSEGF (ORCPT ); Wed, 18 Dec 2013 23:06:05 -0500 Received: by snark.thyrsus.com (Postfix, from userid 1000) id D1FF2380488; Wed, 18 Dec 2013 23:06:04 -0500 (EST) Content-Disposition: inline In-Reply-To: <52B2335D.2030607@alum.mit.edu> X-Eric-Conspiracy: There is no conspiracy User-Agent: Mutt/1.5.21 (2010-09-15) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Michael Haggerty : > If you haven't tried cvs2git yet, please start it up somewhere in the > background. It might take a while but it should have no trouble with > your repos, and then you can compare the tools based on experience > rather than speculation. That would be a good thing. 
Michael, in case you're wondering why I've continued to work on cvs-fast-export when cvs2git exists, there are exactly two reasons: (a) it's a whole lot faster on repos that aren't large enough to demand multipass, and (b) the single-whole-dumpfile output makes it a better reposurgeon front end. > But the traffic on the cvs2svn/cvs2git mailing list has trailed off > essentially to zero, so either the software is perfect already (haha) or > most everybody has already converted. Therefore I don't invest any > significant time in that project these days. Reasonable. I'm doing this as a temporary break from working on GPSD. I don't expect to be investing a lot of time in it after I get it to a 1.0 state. -- Eric S. Raymond From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 10:31:37 +0100 Message-ID: <52B2BCF9.5080300@alum.mit.edu> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Eric Raymond , =?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Johan Herland X-From: git-owner@vger.kernel.org Thu Dec 19 10:31:53 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtZxQ-0001fp-Us for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 10:31:53 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751627Ab3LSJbs (ORCPT ); Thu, 19 Dec 2013 04:31:48 -0500 Received: from alum-mailsec-scanner-3.mit.edu ([18.7.68.14]:61049 "EHLO alum-mailsec-scanner-3.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751328Ab3LSJbn (ORCPT ); Thu, 19 Dec 2013 04:31:43 -0500 
X-AuditID: 1207440e-b7fbc6d000004ad9-2b-52b2bcfe45e4 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-3.mit.edu (Symantec Messaging Gateway) with SMTP id A2.05.19161.EFCB2B25; Thu, 19 Dec 2013 04:31:42 -0500 (EST) Received: from [192.168.69.148] (p57A24715.dip0.t-ipconnect.de [87.162.71.21]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id rBJ9VcWP011538 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 19 Dec 2013 04:31:40 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9 In-Reply-To: X-Enigmail-Version: 1.6 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrGKsWRmVeSWpSXmKPExsUixO6iqPtvz6Ygg48/mC2ubvGx6LrSzWSx 4uocZot5d3cxWWxcZ+LA6rFz1l12j0svv7N5LPvayeLxeZNcAEsUt01SYklZcGZ6nr5dAndG /+YH7AWrzSumTjzO3MC4X6eLkZNDQsBEovXFMUYIW0ziwr31bF2MXBxCApcZJe5MW8wE4Zxj knj35z1YFa+AtsTMxjdsIDaLgKpE24QrzCA2m4CuxKKeZiYQW1QgSOLRoYfsEPWCEidnPmEB sUWA6nc8/gW2gVngJKPEj703wZqFBdQknj1dxQixbR+LxMJHL8E6OAUCJTafvwFUxAF0n7hE T2MQiMksoC6xfp4QSAWzgLzE9rdzmCcwCs5Csm4WQtUsJFULGJlXMcol5pTm6uYmZuYUpybr Ficn5uWlFuka6+VmluilppRuYoSEO98Oxvb1MocYBTgYlXh4V7zcGCTEmlhWXJl7iFGSg0lJ lNd496YgIb6k/JTKjMTijPii0pzU4kOMEhzMSiK8iiA53pTEyqrUonyYlDQHi5I4r9oSdT8h gfTEktTs1NSC1CKYrAwHh5IE716QRsGi1PTUirTMnBKENBMHJ8hwLimR4tS8lNSixNKSjHhQ BMcXA2MYJMUDtPce2N7igsRcoChE6ylGXY55Xz58YxRiycvPS5US5+0DKRIAKcoozYNbAUtu rxjFgT4W5l0KUsUDTIxwk14BLWECWmK8FmxJSSJCSqqBMSumryB6J7ur7DnGHd/N7S7Om1pX 3dqYuGYCb47R9KtRL2QW/1xbKKWWIXQrd+NBuz1Vf37bcCTeCAzVa9NZ9iGppnbV Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 12/19/2013 02:11 AM, Johan Herland wrote: > On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty wrote: >> A correct incremental converter could be done (as long as the CVS users >> don't literally change history retroactively) but it would be a lot of work. 
>
> Although I agree with that sentence as it is stated, I also believe
> that the parenthesized condition rules out a _majority_ of CVS repo of
> non-trivial size/history. So even though a correct incremental
> converter could be built, it would be pretty much useless if it did
> not gracefully handle rewritten history. And in the face of rewritten
> history it becomes pretty much impossible to define what a "correct"
> conversion should even look like (not to mention the difficulty of
> actually implementing that converter...).

A correct conversion would, conceptually, take a diff between the old CVS history and the new CVS history (I'm talking about the history as a whole, not a diff between two changesets), figure out what had changed, and then figure out what Git commits to make to effect the same conceptual changes in Git-land.

This means that the final Git history would have to depend not only on the current entirety of the CVS history, but also on what the CVS history *was* during previous incremental imports and how the tool chose to represent that history in Git in the previous rounds.

There is a tradeoff here. The smarter the tool is, the fewer restrictions would have to be made on what people can do in CVS. For example, it wouldn't be unreasonable to impose a rule that people are not allowed to move files within the CVS repository (e.g., to fake move-file-with-history) after the CVS <-> Git bridge is in use. (Abuses of the history that occurred *before* the first incremental conversion, on the other hand, wouldn't be a problem.) If the user of the incremental tool has *no* influence on how his colleagues use CVS, then the tool would have to be very smart and/or the user might sometimes be forced to do another from-scratch conversion.

> Here are just a couple of things a CVS user can do (and that happened > fairly regularly at my previous $dayjob) that would make life > difficult for an incremental converter (and that also makes stable > output from a non-incremental converter hard to solve in practice): > > - A user "deletes" $file from $branch by simply removing the $branch > symbol on $file (cvs tag -B -d $branch $file). CVS stores no record of > this. Many non-incremental importers will see $file as never having > existed on $branch. An incremental importer starting from a previously > converted state, must somehow deal with that previous state no longer > existing from the POV of CVS. No problem; the tool could just add a synthetic commit "git rm"ming the file from the branch. It wouldn't know *when* the file was deleted, so it would have to pick a plausible date between the time of the last incremental conversion and the one that discovers that the branch tag has been removed from the file. The resulting Git history would contain more complete information than CVS's history. > - A user moves a release tag on a few files to include a late bugfix > into an upcoming release (cvs tag -F -r $new_rev $tag $file). There > might be no single point in time where the tagged state existed in the > repo, it has become a "Frankentag". You could claim user error here, > and that such shortcuts should not happen, but that doesn't really > prevent it from ever happening. Recreating the tree state of the > Frankentag in Git is easy, but what kind of history do you construct > to lead up to that tree? Frankentags (tags that include file versions that didn't occur contemporaneously) can occur even with one-time CVS->Git conversions. The only way to handle them is to create a Git branch representing the tag and base it at a plausible Git commit, and then (on the branch) issue a fixup commit that makes the contents of the branch equal to the contents of the CVS branch. 
This is a problem that cvs2git already handles. A hypothetical incremental importer would have to notice the changes in the branch contents between the previous conversion and the current one, and create commits on the branch to bring it in line with the current contents. This is no uglier than what a one-shot conversion already has to do. > - A modularized project develops code on HEAD, and make regular > releases of each module by tagging the files in the module dir with > "$modulename-$version". Afterwards a project-wide "stable" tag is > moved on that subset of files to include the new module release into > the "stable" tag. ("stable" is conceptually a branch, but the CVS > mechanism used here is still the tag, since CVS branches cannot > "follow" eachother like in Git). This is pretty much the same > Frankentag scenario as above, except that in this case it might be > considered Best Practice (it was at our $dayjob), and not a > shortcut/user error made by a single user. Same problem and same solution as above, as far as I can see. > (None of these examples even involve the "cvs admin" which allows you > to do some truly scary and demented things to your CVS history...) Even some of these might be permitted. For example: * Obsoleting already-converted revisions: it's a pretty stupid thing to do in most cases and the tool could just ignore such events, retaining the history in Git. If the revisions were obsoleted because they contained proprietary information or something, then you've got a bigger problem on your hands but one that you would have even if you were using pure Git. * Retroactive changes to log messages: would probably have to be ignored or handled via notes. * Changes to the "default branch" (another brain-dead CVS feature related to vendor branches): I'd have to think about it. But handling vendor branches is already difficult for a one-time converter because CVS retains too little info (but cvs2git does it except in the most ambiguous cases). 
An incremental importer would have *more* information than a one-shot importer, because it would have a hope of catching the change to the default branch at roughly the time it occurred. > My point here is that people will use whatever available tools they > have to solve whatever problems they are currently having. And when > CVS is your tool, you will sooner or later end up with a "solution" > that irrevocably rewrites your CVS history. Yes, but I maintain that an incremental importer could keep a Git history that is consistent with the CVS history in the sense that: 1. the result of checking out any branch or tag, right after a run of the importer, gives the same results as checking the same branch or tag out of CVS. 2. the Git history from one run is added to (never rewritten) by the next run. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 10:43:23 +0100 Message-ID: <52B2BFBB.5090100@alum.mit.edu> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> <20131219040604.GA7654@thyrsus.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Johan Herland , =?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: esr@thyrsus.com X-From: git-owner@vger.kernel.org Thu Dec 19 10:43:37 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Vta8l-0004lu-5u for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 10:43:35 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752086Ab3LSJna (ORCPT ); Thu, 19 Dec 2013 04:43:30 -0500 Received: from 
alum-mailsec-scanner-7.mit.edu ([18.7.68.19]:58769 "EHLO alum-mailsec-scanner-7.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751196Ab3LSJn1 (ORCPT ); Thu, 19 Dec 2013 04:43:27 -0500 X-AuditID: 12074413-b7fc76d000002aba-4c-52b2bfbe36a7 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 69.0D.10938.EBFB2B25; Thu, 19 Dec 2013 04:43:26 -0500 (EST) Received: from [192.168.69.148] (p57A24715.dip0.t-ipconnect.de [87.162.71.21]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id rBJ9hNgr012016 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 19 Dec 2013 04:43:24 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9 In-Reply-To: <20131219040604.GA7654@thyrsus.com> X-Enigmail-Version: 1.6 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrKKsWRmVeSWpSXmKPExsUixO6iqLtv/6Yggy/XlS2ubvGx6LrSzWSx 4uocZot5d3cxWWxcZ+LA6rFz1l12j0svv7N5LPvayeLxeZNcAEsUt01SYklZcGZ6nr5dAnfG msZprAXfeCpm7DrN1MC4nquLkZNDQsBEYt7dqewQtpjEhXvr2boYuTiEBC4zStxcuIMFwjnH JHFyeSczSBWvgLbE/xfvGUFsFgFViZVbpjKB2GwCuhKLeprBbFGBIIlHhx6yQ9QLSpyc+QRo EAeHiICwxLE+NZCZzAJnGCV6F21iA6kRFlCTePZ0FSPEsrksEp0LfoMt4BQwkJhy4TYTSLOE gLhET2MQiMksoC6xfp4QSAWzgLzE9rdzmCcwCs5Csm0WQtUsJFULGJlXMcol5pTm6uYmZuYU pybrFicn5uWlFuma6+VmluilppRuYoQEu/AOxl0n5Q4xCnAwKvHwrni5MUiINbGsuDL3EKMk B5OSKK/x7k1BQnxJ+SmVGYnFGfFFpTmpxYcYJTiYlUR4FUFyvCmJlVWpRfkwKWkOFiVxXrUl 6n5CAumJJanZqakFqUUwWRkODiUJ3p17gBoFi1LTUyvSMnNKENJMHJwgw7mkRIpT81JSixJL SzLiQfEbXwyMYJAUD9Deon0ge4sLEnOBohCtpxh1OeZ9+fCNUYglLz8vVUqctwlkhwBIUUZp HtwKWGp7xSgO9LEw72aQUTzAtAg36RXQEiagJcZrwZaUJCKkpBoY5fpzvZrCIwKKf76YkTdj fubBWZqXc95cWFty+sDZ/pmWMi5Jd5k/njxjIHzjIl/F+0OrdnNNszRkTs+5/nir Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 12/19/2013 05:06 AM, Eric S. 
Raymond wrote:
> Michael Haggerty :
>> If you haven't tried cvs2git yet, please start it up somewhere in the
>> background. It might take a while but it should have no trouble with
>> your repos, and then you can compare the tools based on experience
>> rather than speculation.
>
> That would be a good thing.
>
> Michael, in case you're wondering why I've continued to work on
> cvs-fast-export when cvs2git exists, there are exactly two reasons:
> (a) it's a whole lot faster on repos that aren't large enough to
> demand multipass,

What difference does speed make on little repositories? They are fast enough anyway. If you are worried about the speed of testing and iterating on your reposurgeon configuration, then just write the output of cvs2svn to a temporary file and use the temporary file as input to reposurgeon.

> and (b) the single-whole-dumpfile output makes it a
> better reposurgeon front end.

I can't believe you are still hung up on this! OK, just for you, here it is: cvs2git-3.0, in gorgeous pipey purity:

    #! /bin/sh
    blobfile=$(mktemp /tmp/myblobs-XXXXXX.out)
    dumpfile=$(mktemp /tmp/mydump-XXXXXX.out)
    cvs2git-2.0 --blobfile="$blobfile" --dumpfile="$dumpfile" "$@" 1>&2 && cat "$blobfile" "$dumpfile"
    rm "$blobfile" "$dumpfile"

I don't think that cvs2git-2.0 outputs any junk to stdout, but just in case it does I've redirected stdout explicitly to stderr to avoid commingling it with the output of this script.

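[Editorial sketch, not part of the original message: a variant of the wrapper above that also removes its temporary files when the converter fails or the script is interrupted. The function name cvs2git_pipe and the CVS2GIT override variable are hypothetical names introduced here for illustration; only the --blobfile and --dumpfile options are taken from the script above, and the command is assumed to be installed as "cvs2git".]

```shell
#!/bin/sh
# Sketch only: emit a two-file conversion as one stream on stdout.
# CVS2GIT is a hypothetical override; by default "cvs2git" is assumed to be on PATH.
cvs2git_pipe() (
    blobfile=$(mktemp "${TMPDIR:-/tmp}/myblobs-XXXXXX") || exit 1
    # Clean up on any exit from this subshell, successful or not.
    trap 'rm -f "$blobfile"' EXIT
    dumpfile=$(mktemp "${TMPDIR:-/tmp}/mydump-XXXXXX") || exit 1
    trap 'rm -f "$blobfile" "$dumpfile"' EXIT

    # Converter chatter goes to stderr so that stdout carries only the stream.
    "${CVS2GIT:-cvs2git}" --blobfile="$blobfile" --dumpfile="$dumpfile" "$@" 1>&2 &&
        cat "$blobfile" "$dumpfile"
)
```

A call such as `cvs2git_pipe <converter options> | reposurgeon ...` would then read the combined stream; the exact options depend on the repository being converted and are left out here.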
Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ From mboxrd@z Thu Jan 1 00:00:00 1970 From: Johan Herland Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 16:26:18 +0100 Message-ID: References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> <52B2BCF9.5080300@alum.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Eric Raymond , =?UTF-8?Q?Jakub_Nar=C4=99bski?= , Martin Langhoff , Git Mailing List To: Michael Haggerty X-From: git-owner@vger.kernel.org Thu Dec 19 16:26:36 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtfUg-0005rV-Tu for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 16:26:35 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753469Ab3LSP03 (ORCPT ); Thu, 19 Dec 2013 10:26:29 -0500 Received: from mail12.copyleft.no ([188.94.218.224]:44643 "EHLO mail12.copyleft.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753311Ab3LSP0Z (ORCPT ); Thu, 19 Dec 2013 10:26:25 -0500 Received: from locusts.copyleft.no ([188.94.218.116] helo=mail.mailgateway.no) by mail12.copyleft.no with esmtp (Exim 4.76) (envelope-from ) id 1VtfUU-000426-TK for git@vger.kernel.org; Thu, 19 Dec 2013 16:26:22 +0100 Received: from mail-pd0-f174.google.com ([209.85.192.174]) by mail.mailgateway.no with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1VtfUU-000BRX-EB for git@vger.kernel.org; Thu, 19 Dec 2013 16:26:22 +0100 Received: by mail-pd0-f174.google.com with SMTP id x10so1250116pdj.33 for ; Thu, 19 Dec 2013 07:26:18 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ZYlZyoz9lnYRy33+YURfFirhupv7+dwxpsraMiaVJyQ=; b=XUcmi/7bRldBF31pJP2VlIBZRjnY1MHKbfCKnl0A/VxLgklVQBzjf685RP8njdbkNA 6EZRY9ku13O5AXeEHsv9jOH5a0Q8Xj0P5x7AqQw8O7NjZB/GRA0XAbgznmj6GZIkHIrW zuem0ywy7PbU1HpaCFE57KslZLYD1Ai6dRr0x2Jis2+yqrVlYZ2HTQamUbpiQ8kjUylj Lv1eJ+omUGT2psMW9W6CajY9v9JFQsD7sEpQf1otolJSMu5Zgax2AVqH4JevI1cGJN/J cHXIM8VZULA5u8nhPbXBkjgEQ3H3ksZfOV/FPPSZJUj/GkdVOyQqoS7FBvRa75+9f+IQ MuQQ== X-Received: by 10.68.130.130 with SMTP id oe2mr2260683pbb.135.1387466778302; Thu, 19 Dec 2013 07:26:18 -0800 (PST) Received: by 10.70.24.228 with HTTP; Thu, 19 Dec 2013 07:26:18 -0800 (PST) In-Reply-To: <52B2BCF9.5080300@alum.mit.edu> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Dec 19, 2013 at 10:31 AM, Michael Haggerty wrote: > On 12/19/2013 02:11 AM, Johan Herland wrote: >> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty wrote: >>> A correct incremental converter could be done (as long as the CVS users >>> don't literally change history retroactively) but it would be a lot of work. >> >> Although I agree with that sentence as it is stated, I also believe >> that the parenthesized condition rules out a _majority_ of CVS repo of >> non-trivial size/history. So even though a correct incremental >> converter could be built, it would be pretty much useless if it did >> not gracefully handle rewritten history. And in the face of rewritten >> history it becomes pretty much impossible to define what a "correct" >> conversion should even look like (not to mention the difficulty of >> actually implementing that converter...). 
> > A correct conversion would, conceptually, take a diff between the old > CVS history and the new CVS history (I'm talking about the history as a > whole, not a diff between two changesets), figure out what had changed, > and then figure out what Git commits to make to effect the same > conceptual changes in Git-land. > > This means that the final Git history would have to depend not only on > the current entirety of the CVS history, but also on what the CVS > history *was* during previous incremental imports and how the tool chose > to represent that history in Git the previous rounds. > > There is a tradeoff here. The smarter the tool is, the fewer > restrictions would have to be made on what people can do in CVS. For > example, it wouldn't be unreasonable to impose a rule that people are > not allowed to move files within the CVS repository (e.g., to fake > move-file-with-history) after the CVS <-> Git bridge is in use. (Abuses > of the history that occurred *before* the first incremental conversion, > on the other hand, wouldn't be a problem.) If the user of the > incremental tool has *no* influence on how his colleagues use CVS, then > the tool would have to be very smart and/or the user would might > sometimes be forced to do another from-scratch conversion. Agreed, but I find it quite ugly how the git history will end up different depending on _when_ the incremental conversion is run. It means that it will be impossible for two users to create the same Git repo (matching SHA1s), unless they carefully synchronize all of their conversion runs (at which point it's much simpler to run a single conversion and then have both users fetch the result). There is a continuum here in incremental converters: At one end - given that you're always going to lose _some_ history - you can go "screw it! 
let's not care about history at all!", and do the fastest possible conversion: check out the current CVS version; diff that against the previous CVS version; apply the diff to your Git repo as a single commit. I suspect quite a lot of users would be happy with this solution - at least as a temporary measure while they wait for their surrounding organization to do a proper migration off CVS.

At the other end - you can realize that the CVS storage format on the server is simply too lossy, and you can write a proxy or monitor that intercepts CVS operations on the server, and replicates those in a companion Git repo as soon as they occur in CVS. Whether you write a CVS server monitor that detects changes to the CVS server files in real time (using e.g. inotify or similar), or you write a CVS server proxy that intercepts CVS commands from the user (also forwarding them to the _real_ CVS server) is an implementation detail[*]. The important thing is that you end up with a real-time stream of changes that can be converted to corresponding changes in a Git repo. That should give you the closest possible picture of what really happens in a CVS repo, even better than what CVS stores in its on-disk format. This would allow an organization to provide a (read-only) Git mirror of their CVS repo.

What we have been discussing in this thread (various strategies for fixing up broken history in Git) can be considered intermediate points between the two extremes presented above: You try to recreate as much history as possible, but realize that you sometimes need to simply synthesize some fake history in order to make everything fit together.

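[Editorial sketch, not part of the original message: the "fastest possible" end of the continuum described above could look roughly like this. It is a minimal illustration under stated assumptions, not a tool anyone in this thread wrote. The function name snapshot_commit and the default commit message are made up; the sketch assumes the CVS working copy has already been refreshed (for example with "cvs -q update -dP") and keeps the Git repository inside that same directory.]

```shell
#!/bin/sh
# Sketch only: record the current state of a CVS working copy as one Git commit.
# usage: snapshot_commit <cvs-working-copy> [commit-message]
snapshot_commit() (
    cd "$1" || exit 1
    msg=${2:-"Import current CVS state"}

    # "git init" is harmless when the repository already exists.
    git init -q . || exit 1

    # Keep CVS's per-directory bookkeeping out of the Git history.
    mkdir -p .git/info
    grep -qx 'CVS/' .git/info/exclude 2>/dev/null || echo 'CVS/' >>.git/info/exclude

    # Stage additions, modifications and deletions in one go.
    git add -A . || exit 1

    # Nothing staged means nothing changed since the previous run.
    if git diff --cached --quiet; then
        exit 0
    fi
    git commit -q -m "$msg"
)
```

Each run adds at most one commit on top of the previous ones and never rewrites them; all intermediate CVS history between two runs is collapsed into that single commit, which is exactly the loss this extreme accepts.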
>> Here are just a couple of things a CVS user can do (and that happened >> fairly regularly at my previous $dayjob) that would make life >> difficult for an incremental converter (and that also makes stable >> output from a non-incremental converter hard to solve in practice): >> >> - A user "deletes" $file from $branch by simply removing the $branch >> symbol on $file (cvs tag -B -d $branch $file). CVS stores no record of >> this. Many non-incremental importers will see $file as never having >> existed on $branch. An incremental importer starting from a previously >> converted state, must somehow deal with that previous state no longer >> existing from the POV of CVS. > > No problem; the tool could just add a synthetic commit "git rm"ming the > file from the branch. It wouldn't know *when* the file was deleted, so > it would have to pick a plausible date between the time of the last > incremental conversion and the one that discovers that the branch tag > has been removed from the file. The resulting Git history would contain > more complete information than CVS's history. A server proxy/monitor analyzing CVS operations in real time would know _exactly_ when the file was removed... >> - A user moves a release tag on a few files to include a late bugfix >> into an upcoming release (cvs tag -F -r $new_rev $tag $file). There >> might be no single point in time where the tagged state existed in the >> repo, it has become a "Frankentag". You could claim user error here, >> and that such shortcuts should not happen, but that doesn't really >> prevent it from ever happening. Recreating the tree state of the >> Frankentag in Git is easy, but what kind of history do you construct >> to lead up to that tree? > > Frankentags (tags that include file versions that didn't occur > contemporaneously) can occur even with one-time CVS->Git conversions. 
> The only way to handle them is to create a Git branch representing the > tag and base it at a plausible Git commit, and then (on the branch) > issue a fixup commit that makes the contents of the branch equal to the > contents of the CVS branch. This is a problem that cvs2git already handles. > > A hypothetical incremental importer would have to notice the changes in > the branch contents between the previous conversion and the current one, > and create commits on the branch to bring it in line with the current > contents. This is no uglier than what a one-shot conversion already has > to do. True, but analyzing CVS operations in real time, you might be able to recreate the moving (and adding/deleting) of tags as file edits (and adds/deletes) in the corresponding Git branch. >> - A modularized project develops code on HEAD, and make regular >> releases of each module by tagging the files in the module dir with >> "$modulename-$version". Afterwards a project-wide "stable" tag is >> moved on that subset of files to include the new module release into >> the "stable" tag. ("stable" is conceptually a branch, but the CVS >> mechanism used here is still the tag, since CVS branches cannot >> "follow" eachother like in Git). This is pretty much the same >> Frankentag scenario as above, except that in this case it might be >> considered Best Practice (it was at our $dayjob), and not a >> shortcut/user error made by a single user. > > Same problem and same solution as above, as far as I can see. > >> (None of these examples even involve the "cvs admin" which allows you >> to do some truly scary and demented things to your CVS history...) > > Even some of these might be permitted. For example: > > * Obsoleting already-converted revisions: it's a pretty stupid thing to > do in most cases and the tool could just ignore such events, retaining > the history in Git. 
If the revisions were obsoleted because they > contained proprietary information or something, then you've got a bigger > problem on your hands but one that you would have even if you were using > pure Git. > > * Retroactive changes to log messages: would probably have to be ignored > or handled via notes. > > * Changes to the "default branch" (another brain-dead CVS feature > related to vendor branches): I'd have to think about it. But handling > vendor branches is already difficult for a one-time converter because > CVS retains too little info (but cvs2git does it except in the most > ambiguous cases). An incremental importer would have *more* information > than a one-shot importer, because it would have a hope of catching the > change to the default branch at roughly the time it occurred. Agreed, but if you want correct metadata (_when_ did these changes happen, _who_ performed them), then you need to actually monitor the CVS command stream (or CVS server files) in real time... >> My point here is that people will use whatever available tools they >> have to solve whatever problems they are currently having. And when >> CVS is your tool, you will sooner or later end up with a "solution" >> that irrevocably rewrites your CVS history. > > Yes, but I maintain that an incremental importer could keep a Git > history that is consistent with the CVS history in the sense that: > > 1. the result of checking out any branch or tag, right after a run of > the importer, gives the same results as checking the same branch or tag > out of CVS. > > 2. the Git history from one run is added to (never rewritten) by the > next run. Yes, and even my simplest/fastest possible converter described above can meet those criteria. After that, it really becomes a question of _how_much_ CVS history you want to retain in your incremental import. I have described the two extremes above. 
Interestingly, _both_ of those extremes would look quite different from the whole-history-gone-incremental converters represented by cvs2git and cvs-fast-export, and _both_ of the extremes would probably also provide a converted result quite a bit faster than anything in between (one by virtue of depending on a single "cvs update" command, and the other by monitoring the CVS server and performing the conversion to Git in real time). ...Johan [*]: That said, I suspect git-cvsserver would be a good starting point for implementing a CVS server proxy, if someone is actually interested in looking at this... -- Johan Herland, www.herland.net From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: I have end-of-lifed cvsps Date: Thu, 19 Dec 2013 17:18:19 +0100 Message-ID: <52B31C4B.8080404@alum.mit.edu> References: <20131212001738.996EB38055C@snark.thyrsus.com> <20131212042624.GB8909@thyrsus.com> <52B02DFF.5010408@gmail.com> <20131217145809.GC15010@thyrsus.com> <20131217184724.GA17709@thyrsus.com> <52B2335D.2030607@alum.mit.edu> <52B2BCF9.5080300@alum.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Eric Raymond , =?UTF-8?B?SmFrdWIgTmFyxJlic2tp?= , Martin Langhoff , Git Mailing List To: Johan Herland X-From: git-owner@vger.kernel.org Thu Dec 19 17:18:36 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VtgIx-000731-15 for gcvg-git-2@plane.gmane.org; Thu, 19 Dec 2013 17:18:31 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755619Ab3LSQS0 (ORCPT ); Thu, 19 Dec 2013 11:18:26 -0500 Received: from alum-mailsec-scanner-7.mit.edu ([18.7.68.19]:57814 "EHLO alum-mailsec-scanner-7.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754540Ab3LSQSY (ORCPT ); Thu, 19 Dec 2013 11:18:24 -0500 X-AuditID: 12074413-b7fc76d000002aba-94-52b31c4fa690 
Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 4F.A9.10938.F4C13B25; Thu, 19 Dec 2013 11:18:23 -0500 (EST) Received: from [172.16.46.13] ([178.19.210.163]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id rBJGIKQZ028772 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 19 Dec 2013 11:18:22 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9 In-Reply-To: X-Enigmail-Version: 1.6 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrKKsWRmVeSWpSXmKPExsUixO6iqOsvsznI4MobeYurW3wsuq50M1ms uDqH2WLe3V1MFhvXmTiweuycdZfd49LL72wey752snh83iQXwBLFbZOUWFIWnJmep2+XwJ3x vPMOW0GrUcWCP/fYGxi3aXQxcnJICJhIPHx1ihXCFpO4cG89WxcjF4eQwGVGiaP39rJAOBuY JI7fmQLkcHDwCmhLXPrMB9LAIqAq8ff9frBmNgFdiUU9zUwgtqhAkMSjQw/ZQWxeAUGJkzOf sIDYIkD1Ox7/AlvALHCSUeLH3pvMIAlhATWJZ09XMUIsW8wqMbvnDhtIglMgUOLP3mNgiyUE xCV6GoNATGYBdYn184RAKpgF5CW2v53DPIFRcBaSdbMQqmYhqVrAyLyKUS4xpzRXNzcxM6c4 NVm3ODkxLy+1SNdcLzezRC81pXQTIyTYhXcw7jopd4hRgINRiYd3xcuNQUKsiWXFlbmHGCU5 mJREeZ9JbQ4S4kvKT6nMSCzOiC8qzUktPsQowcGsJMK7ByTHm5JYWZValA+TkuZgURLnVVui 7ickkJ5YkpqdmlqQWgSTleHgUJLgNZQGahQsSk1PrUjLzClBSDNxcIIM55ISKU7NS0ktSiwt yYgHxW98MTCCQVI8QHvZQdp5iwsSc4GiEK2nGHU55n358I1RiCUvPy9VSpxXEKRIAKQoozQP bgUstb1iFAf6WBhiFA8wLcJNegW0hAloifHaTSBLShIRUlINjDVHlGac4dd8xH74p2Og0j43 Ls/S3RlJHzO2+qQ/fztlg2LgiU7TBxcef5taWy7bZVYkcGi52kVdEc+kHe1nrzFe Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 12/19/2013 04:26 PM, Johan Herland wrote: > On Thu, Dec 19, 2013 at 10:31 AM, Michael Haggerty wrote: >> On 12/19/2013 02:11 AM, Johan Herland wrote: >>> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty wrote: >>>> A correct incremental converter could be done (as long as the CVS users >>>> don't literally change history retroactively) but it would be a lot of work. 
>>> >>> Although I agree with that sentence as it is stated, I also believe >>> that the parenthesized condition rules out a _majority_ of CVS repo of >>> non-trivial size/history. So even though a correct incremental >>> converter could be built, it would be pretty much useless if it did >>> not gracefully handle rewritten history. And in the face of rewritten >>> history it becomes pretty much impossible to define what a "correct" >>> conversion should even look like (not to mention the difficulty of >>> actually implementing that converter...). >> >> A correct conversion would, conceptually, take a diff between the old >> CVS history and the new CVS history (I'm talking about the history as a >> whole, not a diff between two changesets), figure out what had changed, >> and then figure out what Git commits to make to effect the same >> conceptual changes in Git-land. >> >> This means that the final Git history would have to depend not only on >> the current entirety of the CVS history, but also on what the CVS >> history *was* during previous incremental imports and how the tool chose >> to represent that history in Git the previous rounds. >> >> There is a tradeoff here. The smarter the tool is, the fewer >> restrictions would have to be made on what people can do in CVS. For >> example, it wouldn't be unreasonable to impose a rule that people are >> not allowed to move files within the CVS repository (e.g., to fake >> move-file-with-history) after the CVS <-> Git bridge is in use. (Abuses >> of the history that occurred *before* the first incremental conversion, >> on the other hand, wouldn't be a problem.) If the user of the >> incremental tool has *no* influence on how his colleagues use CVS, then >> the tool would have to be very smart and/or the user would might >> sometimes be forced to do another from-scratch conversion. > > Agreed, but I find it quite ugly how the git history will end up > different depending on _when_ the incremental conversion is run. 
It > means that it will be impossible for two users to create the same Git > repo (matching SHA1s), unless they carefully synchronize all of their > conversion runs Even git-svn doesn't guarantee the same results over time. The most obvious scenario when it fails is when somebody changes an SVN commit's metadata retroactively using something like "svn propedit --revprop svn:log". Consistency over time across two independent conversion processes (that don't communicate) is not even theoretically possible. > (at which point it's much simpler to run a single > conversion and then have both users fetch the result). Yes. That is a very reasonable approach. [Discussion of hypothetical real-time inode-watching or proxy-based converter omitted here...] > Agreed, but if you want correct metadata (_when_ did these changes > happen, _who_ performed them), then you need to actually monitor the > CVS command stream (or CVS server files) in real time... In my opinion it is ridiculous to try to design a CVS <-> Git bridge that tries to use back-channels to fill in historical data that even CVS doesn't record. Such a thing would require an intimate connection to the CVS server from the IT department that is presumably blocking a real move to Git. So who would ever be able to use it? The only reason to record extra information would be to enable the bridge to do self-consistent incremental conversions, and in that case the *only* extra information that has to be recorded is the information that would have anyway landed in Git during the previous conversion. >>> My point here is that people will use whatever available tools they >>> have to solve whatever problems they are currently having. And when >>> CVS is your tool, you will sooner or later end up with a "solution" >>> that irrevocably rewrites your CVS history. >> >> Yes, but I maintain that an incremental importer could keep a Git >> history that is consistent with the CVS history in the sense that: >> >> 1. 
the result of checking out any branch or tag, right after a run of >> the importer, gives the same results as checking the same branch or tag >> out of CVS. >> >> 2. the Git history from one run is added to (never rewritten) by the >> next run. > > Yes, and even my simplest/fastest possible converter described above > can meet those criteria. After that, it really becomes a question of > _how_much_ CVS history you want to retain in your incremental import. I think you want enough history to make it pleasant to work with the resulting Git repository. That approximately means that you need some semblance of the CVS commits to be reconstructed, with their correct metadata, on the closest thing to their correct branches that is consistent with the CVS - Git impedance mismatch. > I have described the two extremes above. Interestingly, _both_ of > those extremes would look quite different from the > whole-history-gone-incremental converters represented by cvs2git and > cvs-fast-export, and _both_ of the extremes would probably also > provide a converted result quite a bit faster than anything in between > (one by virtue of depending on a single "cvs update" command, and the > other by monitoring the CVS server and performing the conversion to > Git in real time). I am not an extremist. And I know how much work it would be to start a project like this from scratch. After all, what it can do should be a strict superset of what a tool like cvs2git can do, and cvs2svn/cvs2git (according to Ohloh's COCOMO estimate) contains the equivalent of 7 person-years of effort. Anyway, this is all just blah blah unless somebody volunteers to work on it. And I think that is highly unlikely, especially given the decreasing number of CVS repositories in the wild. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/