From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Keeping
Subject: Re: git-cvsimport-3 and incremental imports
Date: Mon, 21 Jan 2013 12:00:10 +0000
Message-ID: <20130121120010.GE7498@serenity.lan>
References: <20130120200922.GC7498@serenity.lan>
 <20130120232008.GA25001@thyrsus.com>
 <20130121093658.GD7498@serenity.lan>
 <20130121112853.GA31693@thyrsus.com>
In-Reply-To: <20130121112853.GA31693@thyrsus.com>
To: "Eric S. Raymond"
Cc: git@vger.kernel.org

On Mon, Jan 21, 2013 at 06:28:53AM -0500, Eric S. Raymond wrote:
> John Keeping:
>> But this is nothing more than a sticking plaster that happens to do
>> enough in this particular case
>
> I'm beginning to think that's the best outcome we ever get in this
> problem domain...

I don't think we can ever get a perfect outcome, but it should be
possible to do a little bit better without too much effort.

>> - if the Git repository happened to be on
>> a different branch, the start date would be wrong and too many or too
>> few commits could be output.  Git doesn't detect that the commits are
>> identical to some that we already have because we're explicitly telling
>> it to make a new commit with the specified parent.
>
> Then I don't understand the actual failure case.  Either that or you
> don't understand the effect of -i.  Have you actually experimented with
> it?  The reason I suspect you don't understand the feature is that it
> shouldn't make any difference to the way -i works which repository
> branch is active at the time of the second import.
>
> Here is how I model what is going on:
>
> 1. We make commits to multiple branches of a CVS repo up to some given
>    time T.
>
> 2. We import it, ending up with a collection of git branches all of
>    which have tip commits dated T or earlier.  And *every* commit dated
>    T or earlier gets copied over.
>
> 3. We make more commits to the same set of branches in CVS.
>
> 4. We now run cvsps -d T on the repo.  This generates an incremental
>    fast-import stream describing all CVS commits *newer* than T (see
>    the cvsps manual page).

This is the problem step.  There are two scenarios that have problems:

1. If I create a new development branch in my Git repository and commit
   something to it, then git-cvsimport-3 will pass a time to cvsps that
   is newer than the actual time of the last import, so T is wrong.

   It may be possible to fix this case purely in git-cvsimport-3.

2. If the branch I have checked out is not the newest CVS branch, then
   git-cvsimport-3 will pass a value of T that is before the time of
   the last import.

   This case is more subtle, but it results in unwanted duplicate
   commits since git-fast-import will just do what it's told and create
   the new commits.  So if we have the following commits:

       commit1 at time 1
       commit2 at time 2
       commit3 at time 3

   and I call "cvsps -d 2 -i" I end up with the series:

       commit1 at time 1
       commit2 at time 2
       commit3 at time 3
       commit2 at time 2 - effectively reverting the previous commit
       commit3 at time 3 - a duplicate
       ... and potentially genuinely new commits

   This is demonstrated by running the Git test t9650.

I also disagree that cvsps outputs commits *newer* than T, since it will
also output commits *at* T, which is what I changed with the patch in my
previous message.  This fixes the duplicate commit2 in the series above,
but not the duplicate commit3.

> 5. That stream should consist of a set of disconnected branches, each
>    (because of -i) beginning with a root commit containing "from
>    refs/heads/foo^0" which says to parent the commit on the tip of
>    branch foo, whatever that happens to be.  (I don't have to guess
>    about this, I tested the feature before shipping.)
>
> 6. Now, when git fast-import interprets that stream in the context of
>    the repository produced in step 2, for each branch in the
>    incremental dump the branch root commit is parented on the tip
>    commit of the same branch in the repo.
>
> At step 6, it shouldn't matter at all which branch is active, because
> where an incremental branch root gets attached has nothing to do with
> which branch is active.
>
> It is sufficient to avoid duplicate commits that cvsps -d 0 -d T and
> cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
> file commits.  I can see this technique possibly getting confused if T
> falls in the middle of a changeset where the CVS timestamps for the
> file commits are out of order.  But that's the same case that will
> fail if we're importing at file-commit granularity, so there's no new
> bug here.
>
> Can you explain at what step my logic is incorrect?

Your logic is correct - for cvsps - the problem is where T comes from.

Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
$GIT_DIR and not worry about it any more.


John
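
For illustration only, the duplicate-commit scenario and the saved
timestamp idea can be sketched with a toy model.  This is not the cvsps
or git-cvsimport-3 code: commits are reduced to integer times, and
export_since(), fast_import(), and the one-integer file format for
CVS_LAST_IMPORT_TIME are all assumptions made up for this sketch.

```python
import os
import tempfile

STAMP = "CVS_LAST_IMPORT_TIME"  # file name from the mail; its format is assumed


def export_since(cvs_times, since, inclusive):
    """Toy stand-in for an incremental export starting at time `since`."""
    if inclusive:
        return [t for t in cvs_times if t >= since]
    return [t for t in cvs_times if t > since]


def fast_import(branch, stream):
    """Toy stand-in for fast-import: appends what it is given, no de-duplication."""
    return branch + stream


def save_last_import_time(git_dir, t):
    """Record the time of the newest imported commit (hypothetical helper)."""
    with open(os.path.join(git_dir, STAMP), "w") as f:
        f.write(f"{t}\n")


def load_last_import_time(git_dir, default=0):
    """Read the recorded time back, or `default` if nothing was imported yet."""
    try:
        with open(os.path.join(git_dir, STAMP)) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return default


def demo():
    cvs = [1, 2, 3]
    # Initial full import.
    imported = fast_import([], export_since(cvs, 0, inclusive=True))

    # T=2 taken from a stale branch tip, export includes commits *at* T.
    stale_inclusive = fast_import(imported, export_since(cvs, 2, inclusive=True))

    # Same stale T, but the export only emits commits strictly newer than T.
    stale_exclusive = fast_import(imported, export_since(cvs, 2, inclusive=False))

    # T read back from a file written at import time; one new CVS commit at 4.
    with tempfile.TemporaryDirectory() as git_dir:
        save_last_import_time(git_dir, max(imported))
        since = load_last_import_time(git_dir)
    saved = fast_import(imported, export_since(cvs + [4], since, inclusive=False))

    return stale_inclusive, stale_exclusive, saved
```

In this model demo() returns [1, 2, 3, 2, 3] for the stale inclusive
case (the series shown above), [1, 2, 3, 3] once commits at T are
excluded (commit3 is still duplicated), and [1, 2, 3, 4] when T comes
from the saved file rather than from whichever branch is checked out.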