From: John Keeping <john@keeping.me.uk>
To: "Eric S. Raymond" <esr@thyrsus.com>
Cc: git@vger.kernel.org
Subject: Re: git-cvsimport-3 and incremental imports
Date: Mon, 21 Jan 2013 12:00:10 +0000 [thread overview]
Message-ID: <20130121120010.GE7498@serenity.lan> (raw)
In-Reply-To: <20130121112853.GA31693@thyrsus.com>
On Mon, Jan 21, 2013 at 06:28:53AM -0500, Eric S. Raymond wrote:
> John Keeping <john@keeping.me.uk>:
>> But this is nothing more than a sticking plaster that happens to do
>> enough in this particular case
>
> I'm beginning to think that's the best outcome we ever get in this
> problem domain...
I don't think we can ever get a perfect outcome, but it should be
possible to do a little bit better without too much effort.
>> - if the Git repository happened to be on
>> a different branch, the start date would be wrong and too many or too
>> few commits could be output. Git doesn't detect that they commits are
>> identical to some that we already have because we're explicitly telling
>> it to make a new commit with the specified parent.
>
> Then I don't understand the actual failure case. Either that or you
> don't understand the effect of -i. Have you actually experimented with
> it? The reason I suspect you don't understand the feature is that it
> shouldn't make any difference to the way -i works which repository branch is
> active at the time of the second import.
>
> Here is how I model what is going on:
>
> 1. We make commits to multiple branches of a CVS repo up to some given time T.
>
> 2. We import it, ending up with a collection of git branches all of which
> have tip commits dated T or earlier. And *every* commit dated T or earlier
> gets copied over.
>
> 3. We make more commits to the same set of branches in CVS.
>
> 4. We now run cvsps -d T on the repo. This generates an incremental
> fast-import stream describing all CVS commits *newer* than T (see
> the cvsps manual page).
This is the problem step. There are two scenarios that have problems:
1. If I create a new development branch in my Git repository and commit
something to it then git-cvsimport-3 will pass a time to cvsps that
is newer than the actual time of the last import, so T is wrong.
It may be possible to fix this case purely in git-cvsimport-3.
2. If the branch I have checked out is not the newest CVS branch, then
git-cvsimport-3 will pass a value of T that is before the time of the
last import. This case is more subtle but it results in unwanted
duplicate commits since git-fast-import will just do what it's told
and create the new commits.
So if we have the following commits:
commit1 at time 1
commit2 at time 2
commit3 at time 3
and I call "cvsps -d 2 -i" I end up with the series:
commit1 at time 1
commit2 at time 2
commit3 at time 3
commit2 at time 2 - effectively reverting the previous commit
commit3 at time 3 - a duplicate
... and potentially genuinely new commits
This is demonstrated by running the Git test t9650.
I also disagree that cvsps outputs commits *newer* than T since it will
also output commits *at* T, which is what I changed with the patch in my
previous message. This fixes the duplicate commit2 in the series above,
but not the duplicate commit3.
> 5. That stream should consist of a set of disconnected branches, each
> (because of -i) beginning with a root commit containing "from
> refs/heads/foo^0" which says to parent the commit on the tip of
> branch foo, whatever that happens to be. (I don't have to guess
> about this, I tested the feature before shipping.)
>
> 6. Now, when git fast-import interprets that stream in the context of
> the repository produced in step 2, for each branch in the
> incremental dump the branch root commit is parented on the tip
> commit of the same branch in the repo.
>
> At step 6, it shouldn't matter at all which branch is active, because
> where an incremental branch root gets attached has nothing to do with
> which branch is active.
>
> It is sufficient to avoid duplicate commits that cvsps -d 0 -d T and
> cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
> file commits. I can see this technique possibly getting confused if T
> falls in the middle of a changeset where the CVS timestamps for the
> file commits are out of order. But that's the same case that will
> fail if we're importing at file-commit granularity, so there's no new
> bug here.
>
> Can you explain at what step my logic is incorrect?
Your logic is correct - for cvsps - the problem is where T comes from.
Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
$GIT_DIR and not worry about it any more.
John
next prev parent reply other threads:[~2013-01-21 12:00 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-20 20:09 git-cvsimport-3 and incremental imports John Keeping
2013-01-20 23:20 ` Eric S. Raymond
2013-01-20 23:34 ` Jonathan Nieder
2013-01-21 0:01 ` Junio C Hamano
2013-01-21 0:06 ` Junio C Hamano
2013-01-20 23:42 ` Jonathan Nieder
2013-01-21 1:06 ` Eric S. Raymond
2013-01-21 6:42 ` Jonathan Nieder
2013-01-21 7:28 ` Junio C Hamano
2013-01-21 7:35 ` Junio C Hamano
2013-01-21 9:36 ` John Keeping
2013-01-21 11:28 ` Eric S. Raymond
2013-01-21 12:00 ` John Keeping [this message]
2013-01-21 12:43 ` Eric S. Raymond
2013-01-21 13:27 ` John Keeping
2013-01-21 14:07 ` Eric S. Raymond
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130121120010.GE7498@serenity.lan \
--to=john@keeping.me.uk \
--cc=esr@thyrsus.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.