* git-cvsimport doesn't quite work, wrt branches
@ 2006-06-13 16:41 Jim Meyering
2006-06-13 17:06 ` Jakub Narebski
2006-06-13 17:20 ` Linus Torvalds
0 siblings, 2 replies; 10+ messages in thread
From: Jim Meyering @ 2006-06-13 16:41 UTC (permalink / raw)
To: git; +Cc: Matthias Urlichs
Here's a test case that shows how git-cvsimport is misbehaving.
The script below demonstrates the problem with git-1.3.3 as
well as with 1.4.0.rc2.g5e3a6. As for cvsps, I'm using version 2.1.
The script creates a simple cvs module, with one file on the trunk,
and one file on a branch, then runs git-cvsimport on that. The error
is that the resulting git repository has both files on the branch.
FYI, this started when I tried to convert the GNU coreutils repository
(which takes barely an hour with git-cvsimport -- very quick, for 45K
revisions and 90MB of ,v files), but found that with a git-based working
directory, not all files on the b5_9x branch showed up after `git checkout
b5_9x' -- plus, there were some files there that didn't belong.
-----------------------------
#!/bin/sh
# Show that git-cvsimport doesn't quite work when
# there is one file on a branch, and another on the trunk.
# The resulting git repository has both files on the branch.
export PATH=/p/p/git/bin:$PATH
cvs='cvs -f -Q'
t=/tmp/.k
rm -rf $t
mkdir -p $t/git $t/cvs
R=$t/repo
$cvs -d $R init
mkdir -p $R/m
cd $t/cvs
$cvs -d $R co m
cd m
# Add a file on the trunk.
touch on-trunk
$cvs add on-trunk
$cvs ci -m. on-trunk
# Add another file, but destined for a branch.
touch on-br
$cvs add on-br
$cvs ci -m. on-br
$cvs tag -b B on-br
$cvs up -r B
echo x > on-br
$cvs ci -m. on-br
# Back to trunk.
$cvs up -A
# Remove our only-on-branch file from the trunk.
$cvs rm -f on-br
$cvs ci -m. on-br
$cvs up -r B
cd $t/git && git-cvsimport -p -x -v -d $R m > $t/import-log 2>&1
cd $t/git && git checkout B
cd $t
(cd cvs/m; ls -1 on-*) > cvs-files
(cd git; git-ls-files|sort) > git-files
diff -u1 cvs-files git-files
# The problem: diff reports the following differences.
# It should find none.
# --- cvs-files 2006-06-13 17:48:47.000000000 +0200
# +++ git-files 2006-06-13 17:48:47.000000000 +0200
# @@ -1 +1,2 @@
# ./on-br
# +./on-trunk
* Re: git-cvsimport doesn't quite work, wrt branches
From: Jakub Narebski @ 2006-06-13 17:06 UTC (permalink / raw)
To: git
Jim Meyering wrote:
> Here's a test case that shows how git-cvsimport is misbehaving.
> The script below demonstrates the problem with git-1.3.3 as
> well as with 1.4.0.rc2.g5e3a6. As for cvsps, I'm using version 2.1.
Does parsecvs have the same error?
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: git-cvsimport doesn't quite work, wrt branches
From: Linus Torvalds @ 2006-06-13 17:20 UTC (permalink / raw)
To: Jim Meyering
Cc: Git Mailing List, Matthias Urlichs, Yann Dirson, Pavel Roskin
On Tue, 13 Jun 2006, Jim Meyering wrote:
>
> Here's a test case that shows how git-cvsimport is misbehaving.
> The script below demonstrates the problem with git-1.3.3 as
> well as with 1.4.0.rc2.g5e3a6. As for cvsps, I'm using version 2.1.
Well, it's a cvsps problem.
Big surprise.
Sadly, it also seems to be one that isn't fixed by the patches _I_ have,
and looking at Yann's set of patches, I don't think they fix it either.
This is what (my version of) CVSps reports for your repository:
---------------------
PatchSet 1
Date: 2006/06/13 10:06:42
Author: torvalds
Branch: HEAD
Tag: (none)
Log:
.
Members:
on-br:INITIAL->1.1
on-trunk:INITIAL->1.1
---------------------
PatchSet 2
Date: 2006/06/13 10:06:44
Author: torvalds
Branch: B
Ancestor branch: HEAD
Tag: (none)
Log:
.
Members:
on-br:1.1->1.1.2.1
---------------------
PatchSet 3
Date: 2006/06/13 10:06:46
Author: torvalds
Branch: HEAD
Tag: (none)
Log:
.
Members:
on-br:1.1->1.2(DEAD)
and note how the "on-br" file is part of the initial PatchSet 1.
So CVSps basically tells git-cvsimport that commit 2 (on branch B) is
based on commit 1, and doesn't say that "on-trunk" has gone away, so the
resulting git repository has branch B containing "on-trunk" version 1.1,
and "on-br" version 1.1.2.1.
CVS branches obviously sometimes confuse CVSps. Sadly, they also confuse
_me_, so I don't see how to fix this particular CVSps bug, because I'm as
confused as CVSps is ;)
We'd need to have CVSps tell git that the "on-trunk" file was never added
to branch B: the simplest way to do that would be to say that it has
become (DEAD) in PatchSet 2 (which is not technically true in CVS terms,
but _is_ technically true on git terms - on branch B, that file is
obviously dead).
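To make the failure concrete, here is a minimal sketch (plain sh, not git-cvsimport's actual code) of an importer that only sees the files touched by each PatchSet. The function name `replay` and the one-line-per-patchset input format are made up for illustration; the three input lines mirror the three PatchSets above. It assumes `mktemp -d` is available.

```sh
#!/bin/sh
# Sketch only: replay cvsps-style patchsets into per-branch file lists.
# Hypothetical input format, one patchset per line:
#   <branch> <ancestor-branch or -> <file>:<live|dead> ...
# A branch seen for the first time starts as a copy of its ancestor's tree,
# which is exactly how "on-trunk" leaks onto branch B.
replay() {
    d=$(mktemp -d) || return 1
    while read -r branch ancestor members; do
        if [ ! -f "$d/$branch" ]; then
            if [ "$ancestor" != "-" ] && [ -f "$d/$ancestor" ]; then
                cp "$d/$ancestor" "$d/$branch"
            else
                : > "$d/$branch"
            fi
        fi
        for m in $members; do
            file=${m%%:*}
            state=${m##*:}
            # Drop any previous entry for this file, re-add it if still live.
            grep -vxF -- "$file" "$d/$branch" > "$d/.tmp" || true
            if [ "$state" = live ]; then
                echo "$file" >> "$d/.tmp"
            fi
            mv "$d/.tmp" "$d/$branch"
        done
    done
    sort "$d/$1"      # print the final tree of the requested branch
    rm -rf "$d"
}

# What cvsps reports today: branch B ends up with both files.
printf '%s\n' \
    'HEAD - on-br:live on-trunk:live' \
    'B HEAD on-br:live' \
    'HEAD - on-br:dead' | replay B
# prints:
#   on-br
#   on-trunk

# With the suggested (DEAD) entry for on-trunk in PatchSet 2:
printf '%s\n' \
    'HEAD - on-br:live on-trunk:live' \
    'B HEAD on-br:live on-trunk:dead' \
    'HEAD - on-br:dead' | replay B
# prints:
#   on-br
```

The second run only works because the patchset stream itself carries the extra "dead" member; the importer has no other way of knowing that on-trunk was never tagged for B.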
Yann? Pavel? Anybody? Ideas?
Linus
* Re: git-cvsimport doesn't quite work, wrt branches
From: Keith Packard @ 2006-06-13 18:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: keithp, Jim Meyering, Git Mailing List, Matthias Urlichs,
Yann Dirson, Pavel Roskin
On Tue, 2006-06-13 at 10:20 -0700, Linus Torvalds wrote:
> Well, it's a cvsps problem.
>
> Big surprise.
Yeah, we've got
git-cvsimport
cvsps
cvs rlog
,v files
cvs rlog is designed to 'represent' the history of the repository to
users. Cvsps was built as a software analysis tool, and is used by
putative software engineering researchers. Basing a supposedly lossless
repository conversion system on this pair seems foolish to me,
notwithstanding the heroic efforts to make it work.
--
keith.packard@intel.com
* Re: git-cvsimport doesn't quite work, wrt branches
From: Yann Dirson @ 2006-06-13 21:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jim Meyering, Git Mailing List, Matthias Urlichs, Pavel Roskin
On Tue, Jun 13, 2006 at 10:20:10AM -0700, Linus Torvalds wrote:
> Sadly, it also seems to be one that isn't fixed by the patches _I_ have,
> and looking at Yann's set of patches, I don't think they fix it either.
I don't think so either.
> So CVSps basically tells git-cvsimport that commit 2 (on branch B) is
> based on commit 1, and doesn't say that "on-trunk" has gone away, so the
> resulting git repository has branch B containing "on-trunk" version 1.1,
> and "on-br" version 1.1.2.1.
>
> CVS branches obviously sometimes confuse CVSps. Sadly, they also confuse
> _me_, so I don't see how to fix this particular CVSps bug, because I'm as
> confused as CVSps is ;)
>
> We'd need to have CVSps tell git that the "on-trunk" file was never added
> to branch B: the simplest way to do that would be to say that it has
> become (DEAD) in PatchSet 2 (which is not technically true in CVS terms,
> but _is_ technically true on git terms - on branch B, that file is
> obviously dead).
>
> Yann? Pavel? Anybody? Ideas?
This is exactly the problem I encountered one week ago with one of my old
cvs repos, where I had created a branch only for a part of a source
hierarchy :)
One thing that amused me, is that in that case cvsps was DWIM enough
that the result was indeed what I expected from the conversion (I had
forgotten about the particular way that branch was created 3 years
ago). I only discovered the problem when tailor's cvs backend
generated deletions when starting my branch.
So basically, because of how awkward cvs branches are, cvsps may
indeed do what many users expect here, because branches in cvs repos
are sometimes created in strange ways (in my case, to avoid having to
merge changes in irrelevant areas of the tree - nowadays, I'd just use
stgit to isolate changes).
I don't know what particular thing in coreutils development
led to branching only some files. In my case, it can be seen as
the cvs idiom for "branching a part of the tree" - something I don't
think git needs a special idiom for.
If we want cvsps to output the exact history derived from cvs
(i.e. what Jim expected, and I think it is reasonable), I fear it would
require substantial modification to cvsps. I should check, but I
don't think it currently keeps track of which files are part of the
tree resulting from a changeset, but only of the files actually touched
by the changeset. So the change would probably have a big impact on RAM
usage, if we store the file refs in each changeset.
That reminds me of another funny cvs behaviour I noticed a couple of
months ago (not sure if it was in 1.11.x or 1.12.x): "cvs import" was
not marking files as dead on the vendor branch when they disappeared
from one upstream version to another; they were just not tagged in the
new version. I guess cvsps would have a hard time figuring out what
happened, and would just mark the tags as invalid.
For this type of cvsps issues and cvs tags in general, my latest idea
would be to add "fake" patchsets on which to apply tags and
branchpoints. The ideal way would seem to make those similar to git's
merge commits, having as parents all patchsets the tag takes revision
from (obviously it's so biased towards the git model it would be a
pleasure to add support for this in git-cvsimport :) - but that would
produce patchsets not fitting well into the current cvsps model, so
that may require more thinking.
Anyway, it should provide a way to make sense out of what cvsps
currently considers to be "invalid" tags.
Best regards,
--
Yann Dirson <ydirson@altern.org> |
Debian-related: <dirson@debian.org> | Support Debian GNU/Linux:
| Freedom, Power, Stability, Gratis
http://ydirson.free.fr/ | Check <http://www.debian.org/>
* Re: git-cvsimport doesn't quite work, wrt branches
From: Martin Langhoff @ 2006-06-13 22:55 UTC (permalink / raw)
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
Yann Dirson, Pavel Roskin
On 6/14/06, Keith Packard <keithp@keithp.com> wrote:
> cvs rlog is designed to 'represent' the history of the repository to
> users. Cvsps was built as a software analysis tool, and is used by
> putative software engineering researchers. Basing a supposedly lossless
> repository conversion system on this pair seems foolish to me,
> notwithstanding the heroic efforts to make it work.
Yes, cvsps is relying on the wrong things. I am looking at parsecvs
and the cvs2svn tool and wondering where to from here.
In terms of history parsing, parsecvs and cvs2svn are similar. I like
cvs2svn "many passes" approach better, though the Python source is
really messy. A good thing about cvs2svn is that it is a lot more
conservative WRT memory use.
So far, I have been relying on parsecvs for initial imports, and on
cvsps+git-cvsimport for incrementals on top of that initial import.
But parsecvs falls over with large repos.
I am starting to look at what I can do with cvs2svn to get the import
into git. It seems to get very good patchsets, and it yields an easily
readable DB. I'll either learn Python, or read the DB from Perl
(probably from git-cvsimport).
The main problem, however, is that it doesn't do incremental imports,
so this would be a roundabout way of fixing parsecvs's
memory-bound-ness. We still need cvsps :(
martin
* Re: git-cvsimport doesn't quite work, wrt branches
From: Keith Packard @ 2006-06-13 23:30 UTC (permalink / raw)
To: Martin Langhoff
Cc: keithp, Linus Torvalds, Jim Meyering, Git Mailing List,
Matthias Urlichs, Yann Dirson, Pavel Roskin
On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
> In terms of history parsing, parsecvs and cvs2svn are similar. I like
> cvs2svn "many passes" approach better, though the Python source is
> really messy. A good thing about cvs2svn is that it is a lot more
> conservative WRT memory use.
I will try to fix parsecvs so it doesn't take so much memory. Of course,
my goal was to import various X.org repositories which have horrible
issues, but aren't all that huge. And, for them, it works just fine.
> So far, I have been relying on parsecvs for initial imports, and for
> cvsps+git-cvsimport for incrementals on top of that initial import.
> But parsecvs falls over with large repos.
I'd like some help figuring out how to do incremental imports with
parsecvs. As parsecvs already constructs the project history from the
present into the past, it should be possible to "notice" when it hits
existing bits in the repository and stop automatically. I think this
will just take saving a bit of state in the git repository to mark where
in CVS the tips of each branch come from.
> The main problem, however, is that it doesn't do incremental imports,
> so this would be a roundabout way of fixing parsecvs's
> memory-bound-ness. We still need cvsps :(
Parsecvs is currently O(nrev * nfile), and I'd like to make it O(nrev)
instead.
--
keith.packard@intel.com
* Re: git-cvsimport doesn't quite work, wrt branches
From: Martin Langhoff @ 2006-06-14 1:56 UTC (permalink / raw)
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
Yann Dirson, Pavel Roskin
On 6/14/06, Keith Packard <keithp@keithp.com> wrote:
> On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
>
> > In terms of history parsing, parsecvs and cvs2svn are similar. I like
> > cvs2svn "many passes" approach better, though the Python source is
> > really messy. A good thing about cvs2svn is that it is a lot more
> > conservative WRT memory use.
>
> I will try to fix parsecvs so it doesn't take so much memory. Of course,
> my goal was to import various X.org repositories which have horrible
> issues, but aren't all that huge. And, for them, it works just fine.
Would it be possible to have it parse the RCS histories from a remote repo?
I had forgotten, but that's something else that the cvsps +
git-cvsimport combo can do. In short, to replace cvsps+git-cvsimport
...
+ not memory bound -- or at least must be able to import large
repos (mozilla, gentoo) with a decent amount of memory
+ must work local and remote (of course local can be faster)
+ must do incrementals reasonably well
> I'd like some help figuring out how to do incremental imports with
> parsecvs. As parsecvs already constructs the project history from the
> present into the past, it should be possible to "notice" when it hits
> existing bits in the repository and stop automatically. I think this
> will just take saving a bit of state in the git repository to mark where
> in CVS the tips of each branch come from.
Ok. Before starting to read the RCS files, I would look at all the
branch tips in the git repo, and read some metadata of the last commit
of each head into memory (author, commitmsg, timestamp, diffstat).
When parsing RCS files and building changesets to import, compare them
with the 'head' data. The timestamp granularity is seconds which is
pretty coarse -- you can ask for history post those timestamps, but
there's the risk of missing commits (this affects git-cvsimport today,
and I'm thinking how to fix it there). So borderline changesets should
be compared against the metadata you have.
There is the chance that your earlier import caught a commit partway
through, so you may end up putting in the 'rest' of the commit. That's
why diffstat can be useful.
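The comparison step described above could look roughly like the following. This is only a sketch under stated assumptions: the function name `classify_changeset` and its argument order are invented here, timestamps are taken to be seconds since the epoch, and commits on one branch are assumed to have non-decreasing timestamps. It does not handle the partial-commit case; that would need the diffstat comparison as well.

```sh
#!/bin/sh
# Sketch only: decide what to do with a changeset parsed from the RCS files,
# given the metadata recorded for the tip of the matching git branch.
#
# classify_changeset TIP_TS TIP_AUTHOR TIP_MSG CS_TS CS_AUTHOR CS_MSG
#   skip      - changeset is older than the tip, assumed already imported
#   duplicate - same second, same author, same message as the tip
#   import    - newer than the tip, or a different commit in the same second
classify_changeset() {
    if [ "$4" -lt "$1" ]; then
        echo skip
    elif [ "$4" -eq "$1" ] && [ "$5" = "$2" ] && [ "$6" = "$3" ]; then
        echo duplicate
    else
        echo import
    fi
}

# The tip metadata would come from the git repository, for example from the
# last commit on each head; how it is read is left out of this sketch.
classify_changeset 1150216002 jim "fix typo" 1150216002 jim "fix typo"
# prints: duplicate
```

Treating "same second, different author or message" as a new commit is what keeps the one-second timestamp granularity from silently dropping changesets at the boundary.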
Is that useful?
cheers,
martin
* Re: git-cvsimport doesn't quite work, wrt branches
From: sf @ 2006-06-14 9:37 UTC (permalink / raw)
To: git
Martin Langhoff wrote:
...
> Yes, cvsps is relying on the wrong things. I am looking at parsecvs
> and the cvs2svn tool and wondering where to from here.
...
> I am starting to look at what I can do with cvs2svn to get the import
> into git. It seems to get very good patchsets, and it yields an easily
> readable DB. I'll either learn Python, or read the DB from Perl
> (probably from git-cvsimport).
SVN has a portable format called "dumpfile" (see
http://svn.collab.net/repos/svn/trunk/notes/fs_dumprestore.txt) which is
produced by "svnadmin dump ..." and "cvs2svn --dump-only ...".
Why not use it as input for importing into git?
Pros:
- "svnadmin dump" should be fast
- svn repositories can be tracked with "svnadmin dump" (just remember
the last imported revision and restart from there)
- cvs2svn seems to be very good at its job
- only one tool needed
Cons:
- Both svnadmin and cvs2svn only work on local repositories
- cvs2svn cannot be used for tracking
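As a rough sketch of the "remember the last imported revision" idea: the helper below works out which `svnadmin dump` options to use for the next run. The function name `next_dump_args`, the one-line state file, and the importer at the end of the pipeline are all hypothetical; only `svnadmin dump` with `-r LOWER:UPPER` and `--incremental` is assumed from Subversion itself.

```sh
#!/bin/sh
# Sketch only: compute svnadmin dump options from a state file that holds
# the last revision already imported into git (one number, hypothetical).
#
# next_dump_args STATE_FILE
next_dump_args() {
    if [ -s "$1" ]; then
        last=$(cat "$1")
        # Resume right after the last imported revision; --incremental makes
        # the first dumped revision a diff against its predecessor.
        echo "--incremental -r $((last + 1)):HEAD"
    else
        # No state yet: dump the whole history.
        echo "-r 0:HEAD"
    fi
}

# Usage sketch (repository path and importer are placeholders):
#   svnadmin dump /path/to/repo $(next_dump_args .last-rev) | some-git-importer
```

If the repository has no revisions newer than the recorded one, the requested range does not exist, so a real script would need to check for that before calling svnadmin.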
Regards
Stephan
* Re: git-cvsimport doesn't quite work, wrt branches
From: Yann Dirson @ 2006-06-15 7:18 UTC (permalink / raw)
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
Pavel Roskin
On Tue, Jun 13, 2006 at 11:46:51AM -0700, Keith Packard wrote:
> Yeah, we've got
>
> git-cvsimport
> cvsps
> cvs rlog
> ,v files
>
> cvs rlog is designed to 'represent' the history of the repository to
> users.
I wouldn't exactly call that "history of the repository" :)
Are you thinking about any particular information from the ,v files
that rlog fails to expose? That is, wouldn't it be possible to do a job
similar to what parsecvs does, with remote support?
Best regards,
--
Yann Dirson <ydirson@altern.org> |
Debian-related: <dirson@debian.org> | Support Debian GNU/Linux:
| Freedom, Power, Stability, Gratis
http://ydirson.free.fr/ | Check <http://www.debian.org/>