From: John Keeping <john@keeping.me.uk>
To: "Eric S. Raymond" <esr@thyrsus.com>
Cc: git@vger.kernel.org
Subject: Re: git-cvsimport-3 and incremental imports
Date: Mon, 21 Jan 2013 12:00:10 +0000 [thread overview]
Message-ID: <20130121120010.GE7498@serenity.lan> (raw)
In-Reply-To: <20130121112853.GA31693@thyrsus.com>
On Mon, Jan 21, 2013 at 06:28:53AM -0500, Eric S. Raymond wrote:
> John Keeping <john@keeping.me.uk>:
>> But this is nothing more than a sticking plaster that happens to do
>> enough in this particular case
>
> I'm beginning to think that's the best outcome we ever get in this
> problem domain...
I don't think we can ever get a perfect outcome, but it should be
possible to do a little bit better without too much effort.
>> - if the Git repository happened to be on
>> a different branch, the start date would be wrong and too many or too
>> few commits could be output. Git doesn't detect that they commits are
>> identical to some that we already have because we're explicitly telling
>> it to make a new commit with the specified parent.
>
> Then I don't understand the actual failure case. Either that or you
> don't understand the effect of -i. Have you actually experimented with
> it? The reason I suspect you don't understand the feature is that it
> shouldn't make any difference to the way -i works which repository branch is
> active at the time of the second import.
>
> Here is how I model what is going on:
>
> 1. We make commits to multiple branches of a CVS repo up to some given time T.
>
> 2. We import it, ending up with a collection of git branches all of which
> have tip commits dated T or earlier. And *every* commit dated T or earlier
> gets copied over.
>
> 3. We make more commits to the same set of branches in CVS.
>
> 4. We now run cvsps -d T on the repo. This generates an incremental
> fast-import stream describing all CVS commits *newer* than T (see
> the cvsps manual page).
This is the problem step. There are two scenarios that have problems:
1. If I create a new development branch in my Git repository and commit
something to it then git-cvsimport-3 will pass a time to cvsps that
is newer than the actual time of the last import, so T is wrong.
It may be possible to fix this case purely in git-cvsimport-3.
2. If the branch I have checked out is not the newest CVS branch, then
git-cvsimport-3 will pass a value of T that is before the time of the
last import. This case is more subtle but it results in unwanted
duplicate commits since git-fast-import will just do what it's told
and create the new commits.
So if we have the following commits:
commit1 at time 1
commit2 at time 2
commit3 at time 3
and I call "cvsps -d 2 -i" I end up with the series:
commit1 at time 1
commit2 at time 2
commit3 at time 3
commit2 at time 2 - effectively reverting the previous commit
commit3 at time 3 - a duplicate
... and potentially genuinely new commits
This is demonstrated by running the Git test t9650.
I also disagree that cvsps outputs commits *newer* than T since it will
also output commits *at* T, which is what I changed with the patch in my
previous message. This fixes the duplicate commit2 in the series above,
but not the duplicate commit3.
> 5. That stream should consist of a set of disconnected branches, each
> (because of -i) beginning with a root commit containing "from
> refs/heads/foo^0" which says to parent the commit on the tip of
> branch foo, whatever that happens to be. (I don't have to guess
> about this, I tested the feature before shipping.)
>
> 6. Now, when git fast-import interprets that stream in the context of
> the repository produced in step 2, for each branch in the
> incremental dump the branch root commit is parented on the tip
> commit of the same branch in the repo.
>
> At step 6, it shouldn't matter at all which branch is active, because
> where an incremental branch root gets attached has nothing to do with
> which branch is active.
>
> It is sufficient to avoid duplicate commits that cvsps -d 0 -d T and
> cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
> file commits. I can see this technique possibly getting confused if T
> falls in the middle of a changeset where the CVS timestamps for the
> file commits are out of order. But that's the same case that will
> fail if we're importing at file-commit granularity, so there's no new
> bug here.
>
> Can you explain at what step my logic is incorrect?
Your logic is correct - for cvsps - the problem is where T comes from.
Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
$GIT_DIR and not worry about it any more.
John
next prev parent reply other threads:[~2013-01-21 12:00 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-20 20:09 git-cvsimport-3 and incremental imports John Keeping
2013-01-20 23:20 ` Eric S. Raymond
2013-01-20 23:34 ` Jonathan Nieder
2013-01-21 0:01 ` Junio C Hamano
2013-01-21 0:06 ` Junio C Hamano
2013-01-20 23:42 ` Jonathan Nieder
2013-01-21 1:06 ` Eric S. Raymond
2013-01-21 6:42 ` Jonathan Nieder
2013-01-21 7:28 ` Junio C Hamano
2013-01-21 7:35 ` Junio C Hamano
2013-01-21 9:36 ` John Keeping
2013-01-21 11:28 ` Eric S. Raymond
2013-01-21 12:00 ` John Keeping [this message]
2013-01-21 12:43 ` Eric S. Raymond
2013-01-21 13:27 ` John Keeping
2013-01-21 14:07 ` Eric S. Raymond
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130121120010.GE7498@serenity.lan \
--to=john@keeping.me.uk \
--cc=esr@thyrsus.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).