* git-cvsimport doesn't quite work, wrt branches
From: Jim Meyering @ 2006-06-13 16:41 UTC
To: git; +Cc: Matthias Urlichs

Here's a test case that shows how git-cvsimport is misbehaving.
The script below demonstrates the problem with git-1.3.3 as
well as with 1.4.0.rc2.g5e3a6.  As for cvsps, I'm using version 2.1.

The script creates a simple cvs module, with one file on the trunk,
and one file on a branch, then runs git-cvsimport on that.
The error is that the resulting git repository has both files
on the branch.

FYI, this started when I tried to convert the GNU coreutils repository
(which takes barely an hour with git-cvsimport -- very quick, for 45K
revisions and 90MB of ,v files), but found that with a git-based
working directory, not all files on the b5_9x branch showed up after
`git checkout b5_9x' -- plus, there were some files there that didn't
belong.

-----------------------------
#!/bin/sh
# Show that git-cvsimport doesn't quite work when
# there is one file on a branch, and another on the trunk.
# The resulting git repository has both files on the branch.

export PATH=/p/p/git/bin:$PATH

cvs='cvs -f -Q'
t=/tmp/.k
rm -rf $t
mkdir -p $t/git $t/cvs
R=$t/repo
$cvs -d $R init
mkdir -p $R/m

cd $t/cvs
$cvs -d $R co m
cd m

# Add a file on the trunk.
touch on-trunk
$cvs add on-trunk
$cvs ci -m. on-trunk

# Add another file, but destined for a branch.
touch on-br
$cvs add on-br
$cvs ci -m. on-br

$cvs tag -b B on-br
$cvs up -r B
echo x > on-br
$cvs ci -m. on-br

# Back to trunk.
$cvs up -A
# Remove our only-on-branch file from the trunk.
$cvs rm -f on-br
$cvs ci -m. on-br

$cvs up -r B

# Use the portable redirection form; `>& file' is not understood by
# every /bin/sh.
cd $t/git && git-cvsimport -p -x -v -d $R m > $t/import-log 2>&1
cd $t/git && git checkout B
cd $t
(cd cvs/m; ls -1 on-*) > cvs-files
(cd git; git-ls-files|sort) > git-files
diff -u1 cvs-files git-files

# The problem: diff reports the following differences.
# It should find none.
# --- cvs-files   2006-06-13 17:48:47.000000000 +0200
# +++ git-files   2006-06-13 17:48:47.000000000 +0200
# @@ -1 +1,2 @@
#  ./on-br
# +./on-trunk

* Re: git-cvsimport doesn't quite work, wrt branches
From: Jakub Narebski @ 2006-06-13 17:06 UTC
To: git

Jim Meyering wrote:

> Here's a test case that shows how git-cvsimport is misbehaving.
> The script below demonstrates the problem with git-1.3.3 as
> well as with 1.4.0.rc2.g5e3a6.  As for cvsps, I'm using version 2.1.

Does parsecvs have the same error?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: git-cvsimport doesn't quite work, wrt branches
From: Linus Torvalds @ 2006-06-13 17:20 UTC
To: Jim Meyering
Cc: Git Mailing List, Matthias Urlichs, Yann Dirson, Pavel Roskin

On Tue, 13 Jun 2006, Jim Meyering wrote:
>
> Here's a test case that shows how git-cvsimport is misbehaving.
> The script below demonstrates the problem with git-1.3.3 as
> well as with 1.4.0.rc2.g5e3a6.  As for cvsps, I'm using version 2.1.

Well, it's a cvsps problem.

Big surprise.

Sadly, it also seems to be one that isn't fixed by the patches _I_ have,
and looking at Yann's set of patches, I don't think they fix it either.

This is what (my version of) CVSps reports for your repository:

	---------------------
	PatchSet 1
	Date: 2006/06/13 10:06:42
	Author: torvalds
	Branch: HEAD
	Tag: (none)
	Log:
	.

	Members:
		on-br:INITIAL->1.1
		on-trunk:INITIAL->1.1

	---------------------
	PatchSet 2
	Date: 2006/06/13 10:06:44
	Author: torvalds
	Branch: B
	Ancestor branch: HEAD
	Tag: (none)
	Log:
	.

	Members:
		on-br:1.1->1.1.2.1

	---------------------
	PatchSet 3
	Date: 2006/06/13 10:06:46
	Author: torvalds
	Branch: HEAD
	Tag: (none)
	Log:
	.

	Members:
		on-br:1.1->1.2(DEAD)

and note how the "on-br" file is part of the initial PatchSet 1.

So CVSps basically tells git-cvsimport that commit 2 (on branch B) is
based on commit 1, and doesn't say that "on-trunk" has gone away, so the
resulting git repository has branch B containing "on-trunk" version 1.1,
and "on-br" version 1.1.2.1.

CVS branches obviously sometimes confuse CVSps. Sadly, they also confuse
_me_, so I don't see how to fix this particular CVSps bug, because I'm as
confused as CVSps is ;)

We'd need to have CVSps tell git that the "on-trunk" file was never added
to branch B: the simplest way to do that would be to say that it has
become (DEAD) in PatchSet 2 (which is not technically true in CVS terms,
but _is_ technically true in git terms - on branch B, that file is
obviously dead).

Yann? Pavel? Anybody? Ideas?

		Linus
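
[Editorial sketch, not part of the original message.] The effect Linus
describes can be reproduced with a small awk replay of a PatchSet listing
like the one in his message. It assumes a consumer that seeds a new branch
with every file of its ancestor branch and then applies the Members lines.
The function name branch_files and the simplified parsing (no spaces in
file names, no header-like lines inside Log text) are assumptions made for
illustration; this is not git-cvsimport's actual code.

```sh
#!/bin/sh
# branch_files BRANCH
# Read cvsps-style PatchSets on stdin and print the files that a naive
# replay would leave on BRANCH (hypothetical helper, illustration only).
branch_files() {
    awk -v want="$1" '
        /^-+$/               { in_members = 0; next }
        /^PatchSet /         { branch = ""; ancestor = ""; in_members = 0; next }
        /^Branch: /          { branch = $2; next }
        /^Ancestor branch: / { ancestor = $3; next }
        /^Members:/ {
            in_members = 1
            # First PatchSet on this branch: inherit the ancestor tree as is.
            if (!(branch in seen)) {
                seen[branch] = 1
                n = 0
                for (k in tree) {
                    split(k, p, SUBSEP)
                    if (p[1] == ancestor) inherit[++n] = p[2]
                }
                for (i = 1; i <= n; i++) tree[branch, inherit[i]] = 1
            }
            next
        }
        in_members && NF == 0 { in_members = 0; next }
        in_members {
            # Member lines look like "file:old->new" or "file:old->new(DEAD)".
            file = $1
            sub(/:.*/, "", file)
            if ($0 ~ /\(DEAD\)/) delete tree[branch, file]
            else tree[branch, file] = 1
        }
        END {
            for (k in tree) {
                split(k, p, SUBSEP)
                if (p[1] == want) print p[2]
            }
        }
    ' | sort
}
```

Fed the three PatchSets from the message, `branch_files B` lists both
on-br and on-trunk, which is the bug Jim reported. Adding a hypothetical
`on-trunk:1.1->1.1(DEAD)` member to PatchSet 2 (the workaround suggested
in the message, not something cvsps 2.1 emits) leaves only on-br.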

* Re: git-cvsimport doesn't quite work, wrt branches
From: Keith Packard @ 2006-06-13 18:46 UTC
To: Linus Torvalds
Cc: keithp, Jim Meyering, Git Mailing List, Matthias Urlichs,
    Yann Dirson, Pavel Roskin

On Tue, 2006-06-13 at 10:20 -0700, Linus Torvalds wrote:

> Well, it's a cvsps problem.
>
> Big surprise.

Yeah, we've got

	git-cvsimport
	cvsps
	cvs rlog
	,v files

cvs rlog is designed to 'represent' the history of the repository to
users. Cvsps was built as a software analysis tool, and is used by
putative software engineering researchers. Basing a supposedly lossless
repository conversion system on this pair seems foolish to me,
notwithstanding the heroic efforts to make it work.

-- 
keith.packard@intel.com

* Re: git-cvsimport doesn't quite work, wrt branches
From: Martin Langhoff @ 2006-06-13 22:55 UTC
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
    Yann Dirson, Pavel Roskin

On 6/14/06, Keith Packard <keithp@keithp.com> wrote:
> cvs rlog is designed to 'represent' the history of the repository to
> users. Cvsps was built as a software analysis tool, and is used by
> putative software engineering researchers. Basing a supposedly lossless
> repository conversion system on this pair seems foolish to me,
> notwithstanding the heroic efforts to make it work.

Yes, cvsps is relying on the wrong things. I am looking at parsecvs
and the cvs2svn tool and wondering where to go from here.

In terms of history parsing, parsecvs and cvs2svn are similar. I like
cvs2svn's "many passes" approach better, though the Python source is
really messy. A good thing about cvs2svn is that it is a lot more
conservative WRT memory use.

So far, I have been relying on parsecvs for initial imports, and on
cvsps+git-cvsimport for incrementals on top of that initial import.
But parsecvs falls over with large repos.

I am starting to look at what I can do with cvs2svn to get the import
into git. It seems to get very good patchsets, and it yields an easily
readable DB. I'll either learn Python, or read the DB from Perl
(probably from git-cvsimport).

The main problem, however, is that it doesn't do incremental imports,
so this would be a roundabout way of fixing parsecvs's
memory-bound-ness. We still need cvsps :(

martin

* Re: git-cvsimport doesn't quite work, wrt branches
From: Keith Packard @ 2006-06-13 23:30 UTC
To: Martin Langhoff
Cc: keithp, Linus Torvalds, Jim Meyering, Git Mailing List,
    Matthias Urlichs, Yann Dirson, Pavel Roskin

On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:

> In terms of history parsing, parsecvs and cvs2svn are similar. I like
> cvs2svn "many passes" approach better, though the Python source is
> really messy. A good thing about cvs2svn is that it is a lot more
> conservative WRT memory use.

I will try to fix parsecvs so it doesn't take so much memory. Of course,
my goal was to import various X.org repositories which have horrible
issues, but aren't all that huge. And, for them, it works just fine.

> So far, I have been relying on parsecvs for initial imports, and for
> cvsps+git-cvsimport for incrementals on top of that initial import.
> But parsecvs falls over with large repos.

I'd like some help figuring out how to do incremental imports with
parsecvs. As parsecvs already constructs the project history from the
present into the past, it should be possible to "notice" when it hits
existing bits in the repository and stop automatically. I think this
will just take saving a bit of state in the git repository to mark where
in CVS the tips of each branch come from.

> The main problem, however, is that it doesn't do incremental imports,
> so this would be a roundabout way of fixing parsecvs's
> memory-bound-ness. We still need cvsps :(

Parsecvs is currently O(nrev * nfile), and I'd like to make it O(nrev)
instead.

-- 
keith.packard@intel.com

* Re: git-cvsimport doesn't quite work, wrt branches
From: Martin Langhoff @ 2006-06-14 1:56 UTC
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
    Yann Dirson, Pavel Roskin

On 6/14/06, Keith Packard <keithp@keithp.com> wrote:
> On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
>
> > In terms of history parsing, parsecvs and cvs2svn are similar. I like
> > cvs2svn "many passes" approach better, though the Python source is
> > really messy. A good thing about cvs2svn is that it is a lot more
> > conservative WRT memory use.
>
> I will try to fix parsecvs so it doesn't take so much memory. Of course,
> my goal was to import various X.org repositories which have horrible
> issues, but aren't all that huge. And, for them, it works just fine.

Would it be possible to have it parse the RCS histories from a remote
repo? I had forgotten, but that's something else that the cvsps +
git-cvsimport combo can do.

In short, to replace cvsps+git-cvsimport ...

 + not memory bound -- or at least must be able to import large repos
   (mozilla, gentoo) with a decent amount of memory
 + must work local and remote (of course local can be faster)
 + must do incrementals reasonably well

> I'd like some help figuring out how to do incremental imports with
> parsecvs. As parsecvs already constructs the project history from the
> present into the past, it should be possible to "notice" when it hits
> existing bits in the repository and stop automatically. I think this
> will just take saving a bit of state in the git repository to mark where
> in CVS the tips of each branch come from.

Ok. Before starting to read the RCS files, I would look at all the
branch tips in the git repo, and read some metadata of the last commit
of each head into memory (author, commitmsg, timestamp, diffstat).

When parsing RCS files and building changesets to import, compare them
with the 'head' data. The timestamp granularity is seconds, which is
pretty coarse -- you can ask for history after those timestamps, but
there's the risk of missing commits (this affects git-cvsimport today,
and I'm thinking about how to fix it there). So borderline changesets
should be compared against the metadata you have.

There is the chance that your earlier import caught a commit partway
through, so you may end up putting in the 'rest' of the commit. That's
why diffstat can be useful.

Is that useful?

cheers,

martin
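
[Editorial sketch, not part of the original message.] One possible reading
of the comparison Martin describes, under stated assumptions: the record
layout (timestamp|author|log), the function name classify_changesets and
the four labels are all hypothetical, and it assumes every changeset older
than the branch tip was already imported. It only sorts candidates into
buckets; the diffstat comparison itself is left out.

```sh
#!/bin/sh
# classify_changesets TIP_TS TIP_AUTHOR TIP_LOG
# stdin: one candidate changeset per line, as "timestamp|author|log".
# The tip values could come from something like
#   git log -1 --format='%at|%an|%s' <branch>
classify_changesets() {
    TIP_TS=$1 TIP_AUTHOR=$2 TIP_LOG=$3 awk -F'|' '
        BEGIN {
            ts     = ENVIRON["TIP_TS"] + 0
            author = ENVIRON["TIP_AUTHOR"]
            msg    = ENVIRON["TIP_LOG"]
        }
        # Strictly older than the tip: assumed to be imported already.
        $1 + 0 < ts { print "old " $0; next }
        # Strictly newer than the tip: a new changeset to import.
        $1 + 0 > ts { print "new " $0; next }
        # Same second and same metadata: the tip itself, or the rest of a
        # commit caught partway through; a diffstat check would decide.
        $2 == author && $3 == msg { print "same-as-tip " $0; next }
        # Same second, different metadata: cannot be decided from here.
        { print "borderline " $0 }
    '
}
```

For a tip recorded as `200|jim|fix`, a candidate `200|jim|fix` is reported
as same-as-tip and `200|bob|other` as borderline; only those two kinds
would need the closer look described in the message.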

* Re: git-cvsimport doesn't quite work, wrt branches
From: sf @ 2006-06-14 9:37 UTC
To: git

Martin Langhoff wrote:
...
> Yes, cvsps is relying on the wrong things. I am looking at parsecvs
> and the cvs2svn tool and wondering where to from here.
...
> I am starting to look at what I can do with cvs2svn to get the import
> into git. It seems to get very good patchsets, and it yields an easily
> readable DB. I'll either learn Python, or read the DB from Perl
> (probably from git-cvsimport).

SVN has a portable format called "dumpfile" (see
http://svn.collab.net/repos/svn/trunk/notes/fs_dumprestore.txt) which is
produced by "svnadmin dump ..." and "cvs2svn --dump-only ...". Why not
use it as input for importing into git?

Pros:
- "svnadmin dump" should be fast
- svn repositories can be tracked with "svnadmin dump" (just remember
  the last imported revision and restart from there)
- cvs2svn seems to be very good at its job
- only one tool needed

Cons:
- Both svnadmin and cvs2svn only work on local repositories
- cvs2svn cannot be used for tracking

Regards

Stephan
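
[Editorial sketch, not part of the original message.] The "remember the
last imported revision and restart from there" idea could be sketched as
below. The helper names and the state file are hypothetical, and the scan
is naive: it trusts every line that looks like a "Revision-number:" header,
so file contents containing such a line would confuse it.

```sh
#!/bin/sh
# last_dumped_revision: print the highest "Revision-number:" header
# found in an svn dumpfile read from stdin (nothing if there is none).
last_dumped_revision() {
    awk '/^Revision-number: [0-9]+$/ {
             if (!seen || $2 + 0 > max) max = $2 + 0
             seen = 1
         }
         END { if (seen) print max }'
}

# next_dump_range STATE_FILE: print the -r argument for the next
# incremental dump, based on the revision stored in STATE_FILE.
next_dump_range() {
    if [ -s "$1" ]; then
        echo "$(( $(cat "$1") + 1 )):HEAD"
    else
        echo "0:HEAD"
    fi
}

# Intended use, on a local repository ("git-import-dump" is a made-up
# name for an importer that does not exist; a real script would also
# have to handle the case where there are no new revisions):
#   svnadmin dump --incremental -r "$(next_dump_range state)" /path/to/repo |
#       tee dump | git-import-dump
#   last_dumped_revision < dump > state
```

This only covers the bookkeeping around "svnadmin dump"; turning the dump
records into git commits is the part the message leaves open.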

* Re: git-cvsimport doesn't quite work, wrt branches
From: Yann Dirson @ 2006-06-15 7:18 UTC
To: Keith Packard
Cc: Linus Torvalds, Jim Meyering, Git Mailing List, Matthias Urlichs,
    Pavel Roskin

On Tue, Jun 13, 2006 at 11:46:51AM -0700, Keith Packard wrote:
> Yeah, we've got
>
> 	git-cvsimport
> 	cvsps
> 	cvs rlog
> 	,v files
>
> cvs rlog is designed to 'represent' the history of the repository to
> users.

I wouldn't exactly call that "history of the repository" :)

Are you thinking about any particular information from the ,v files
that rlog fails to expose? That is, wouldn't it be possible to do a job
similar to what parsecvs does, with remote support?

Best regards,
-- 
Yann Dirson <ydirson@altern.org>  | Debian-related: <dirson@debian.org>
                                  | Support Debian GNU/Linux:
                                  | Freedom, Power, Stability, Gratis
http://ydirson.free.fr/           | Check <http://www.debian.org/>

* Re: git-cvsimport doesn't quite work, wrt branches
From: Yann Dirson @ 2006-06-13 21:13 UTC
To: Linus Torvalds
Cc: Jim Meyering, Git Mailing List, Matthias Urlichs, Pavel Roskin

On Tue, Jun 13, 2006 at 10:20:10AM -0700, Linus Torvalds wrote:
> Sadly, it also seems to be one that isn't fixed by the patches _I_ have,
> and looking at Yann's set of patches, I don't think they fix it either.

I don't think so either.

> So CVSps basically tells git-cvsimport that commit 2 (on branch B) is
> based on commit 1, and doesn't say that "on-trunk" has gone away, so the
> resulting git repository has branch B containing "on-trunk" version 1.1,
> and "on-br" version 1.1.2.1.
>
> CVS branches obviously sometimes confuse CVSps. Sadly, they also confuse
> _me_, so I don't see how to fix this particular CVSps bug, because I'm as
> confused as CVSps is ;)
>
> We'd need to have CVSps tell git that the "on-trunk" file was never added
> to branch B: the simplest way to do that would be to say that it has
> become (DEAD) in PatchSet 2 (which is not technically true in CVS terms,
> but _is_ technically true in git terms - on branch B, that file is
> obviously dead).
>
> Yann? Pavel? Anybody? Ideas?

This is exactly the problem I encountered one week ago with one of my
old cvs repos, where I had created a branch only for a part of a source
hierarchy :)

One thing that amused me is that in that case cvsps was DWIM enough
that the result was indeed what I expected from the conversion (I had
forgotten about the particular way that branch was created 3 years
ago). I only discovered the problem when tailor's cvs backend generated
deletions when starting my branch.

So basically, because of how awkward cvs branches are, cvsps may indeed
do what many users expect here, because branches in cvs repos are
sometimes created in strange ways (in my case, to avoid having to merge
changes in irrelevant areas of the tree - nowadays, I'd just use stgit
to isolate changes).

I don't know what the particular thing in coreutils development was
that led to branching only some files. In my case, it can be seen as
the cvs idiom for "branching a part of the tree" - something I don't
think needs a special idiom in GIT.

If we want cvsps to output the exact history derived from cvs (ie. what
Jim expected, and I think it is reasonable), I fear it would require
substantial modification to cvsps. I should check, but I don't think it
currently keeps track of which files are part of the tree resulting
from a changeset, but only of the files actually touched by the
changeset. So the change would probably have a big RAM usage impact, if
we store the file refs in each changeset.

That reminds me of another funny cvs behaviour I noticed a couple of
months ago (not sure if it was in 1.11.x or 1.12.x): "cvs import" was
not marking a file as dead on the vendor branch when it disappeared
from one upstream version to another; the file was just not tagged in
the new version. I guess cvsps would have a hard time figuring out what
happened, and would just mark the tags as invalid.

For this type of cvsps issue and cvs tags in general, my latest idea
would be to add "fake" patchsets on which to apply tags and
branchpoints. The ideal way would seem to be to make those similar to
git's merge commits, having as parents all patchsets the tag takes
revisions from (obviously it's so biased towards the git model that it
would be a pleasure to add support for this in git-cvsimport :) - but
that would produce patchsets not fitting well into the current cvsps
model, so that may require more thinking. Anyway, it should provide a
way to make sense out of what cvsps currently considers to be "invalid"
tags.

Best regards,
-- 
Yann Dirson <ydirson@altern.org>  | Debian-related: <dirson@debian.org>
                                  | Support Debian GNU/Linux:
                                  | Freedom, Power, Stability, Gratis
http://ydirson.free.fr/           | Check <http://www.debian.org/>