* cvs2svn conversion directly to git ready for experimentation
From: Michael Haggerty @ 2007-08-01 0:09 UTC
To: git; +Cc: users

I am the maintainer of cvs2svn[1], which is a program for one-time conversions from CVS to Subversion. cvs2svn is very robust against the many peculiarities of CVS and can convert just about every CVS repository we have ever seen.

I've been working on a cvs2svn output pass that writes the converted CVS repository directly into git rather than Subversion. The code runs now with at least one repository from our test suite of nasty CVS repositories.

Unfortunately, I am a complete git newbie, so I would very much appreciate help from the git community with feedback and with checking whether the conversion output is reasonable and gitlike.

The git output is very preliminary and virtually untested, and has the following limitations (hopefully to be removed in the near future):

- It is rather slow. Among other things, it still uses RCS or CVS to extract the contents of the CVS revisions, which will soon be changed to win a factor of 2 or so.

- CVS allows a branch to be created from arbitrary combinations of source revisions and/or source branches. cvs2svn tries to create a branch from a single source, but if it can't figure out how to, it creates the branch using "merge" from multiple sources. In pathological situations, the number of merge sources for a branch can be arbitrarily large.

- It is not very intelligent about creating tags. When asked to create a tag, it unconditionally creates a "tag fixup branch"[2] with the same name and contents as the tag, then tags this branch. The tag fixup branch is never deleted.

- There are no checks that CVS branch and tag names are legal git names, or indeed that any other similar limitations of git are honored.

- The data that should be fed to git-fast-import is written to two files, which have to be loaded into git-fast-import manually. Eventually I will add an option to invoke git-fast-import automatically and pipe the output directly into it.

- Only single projects can be converted at a time. I don't think that this will be a significant limitation when outputting to git.

To try it out:

1. Install svn (to be able to check out cvs2svn) and either cvs or rcs.

2. Check out the current trunk version of cvs2svn:

       svn co http://cvs2svn.tigris.org/svn/cvs2svn/trunk cvs2svn-trunk
       cd cvs2svn-trunk
       make check  # ...optional

3. Configure cvs2svn for your conversion. This has to be done via the "options-file method"[3]. See cvs2svn-example.options and test-data/main-cvsrepos/cvs2svn-git.options as examples; the former file includes voluminous documentation.

4. Run cvs2svn. This outputs two git-fast-import files, with the names specified by your options file. In the example, these files are named 'cvs2svn-tmp/git-blob.dat' and 'cvs2svn-tmp/git-dump.dat'.

5. Initialize a git repository, and load the dump files using git-fast-import:

       git-init
       cat cvs2svn-tmp/git-blob.dat | \
           git-fast-import --export-marks=cvs2svn-tmp/git-marks.dat
       cat cvs2svn-tmp/git-dump.dat | \
           git-fast-import --import-marks=cvs2svn-tmp/git-marks.dat

I am looking forward to your feedback. Even better would be if somebody wants to join forces on this project. I would be happy to supply the cvs2svn knowledge if you can bring the git experience.

Michael

[1] http://cvs2svn.tigris.org/
[2] http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html
[3] http://cvs2svn.tigris.org/cvs2svn.html#cmd-vs-options
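Step 5 loads two separate streams, which is presumably why the marks file is needed: in git-fast-import's input format, a blob stream defines each file content once under a numeric mark (`mark :N`), commits can then refer to that content by mark alone, and `--export-marks`/`--import-marks` carry the mark-to-object mapping from the first run into the second. The Python sketch below shows what such a pair of streams can look like for one file and one commit. The helper names, committer identity, and timestamp are made up for illustration; this is not the code cvs2svn uses to write its output.

```python
# Hypothetical helpers that build git-fast-import streams shaped like the
# blob file and the dump file from step 5 (not cvs2svn's actual output code).

def blob_stream(contents):
    """Define one blob per string in `contents`, using marks :1, :2, ..."""
    parts = []
    for mark, text in enumerate(contents, start=1):
        payload = text.encode("utf-8")
        # 'data <n>' is followed by exactly n raw bytes; the trailing LF is optional.
        parts.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(payload)) + payload + b"\n")
    return b"".join(parts)


def commit_stream(ref, mark, committer, when, message, files):
    """Build one commit whose files refer to blobs by mark only.

    committer: 'Name <email>'; when: '<unix seconds> <+/-hhmm>';
    files: list of (path, blob_mark) pairs.
    """
    msg = message.encode("utf-8")
    out = [
        b"commit %s\n" % ref.encode("utf-8"),
        b"mark :%d\n" % mark,
        b"committer %s %s\n" % (committer.encode("utf-8"), when.encode("ascii")),
        b"data %d\n" % len(msg) + msg + b"\n",
    ]
    for path, blob_mark in files:
        # 'M <mode> <dataref> <path>': the dataref is a mark defined in the blob stream.
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode("utf-8")))
    return b"".join(out)


blobs = blob_stream(["hello\n"])
dump = commit_stream("refs/heads/master", 2, "A U Thor <author@example.com>",
                     "1185926940 +0000", "Initial import", [("README", 1)])
```

Writing `blobs` to cvs2svn-tmp/git-blob.dat and `dump` to cvs2svn-tmp/git-dump.dat and running the two commands from step 5 should give a master branch with one commit containing README. The commit uses mark :2 because both runs share one mark namespace once the marks file is imported.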
* Re: cvs2svn conversion directly to git ready for experimentation
From: Johannes Schindelin @ 2007-08-01 0:41 UTC
To: Michael Haggerty; +Cc: git, users

Hi,

On Wed, 1 Aug 2007, Michael Haggerty wrote:

> 2. Check out the current trunk version of cvs2svn:
>
>    svn co http://cvs2svn.tigris.org/svn/cvs2svn/trunk cvs2svn-trunk
>    cd cvs2svn-trunk
>    make check # ...optional

FWIW I tried to clone it with "git svn", and needed to prefix the URL with "guest", i.e.

	$ git svn clone http://guest@cvs2svn.tigris.org/svn/cvs2svn/trunk

and it still did not work at once. Somehow I managed to get the "Username" prompt, input "guest", and left the password empty. Even then, only the second attempt succeeded (I guess somehow that "password" got stored in $HOME/.subversion/auth/...).

Ciao,
Dscho
* Re: cvs2svn conversion directly to git ready for experimentation
From: Jakub Narebski @ 2007-08-01 22:09 UTC
To: git; +Cc: users

Michael Haggerty wrote:

> I am the maintainer of cvs2svn[1], which is a program for one-time
> conversions from CVS to Subversion. cvs2svn is very robust against the
> many peculiarities of CVS and can convert just about every CVS
> repository we have ever seen.
>
> I've been working on a cvs2svn output pass that writes the converted CVS
> repository directly into git rather than Subversion. The code runs now
> with at least one repository from our test suite of nasty CVS repositories.

Have you contacted Jon Smirl about his unpublished work on cvs2git, a cvs2svn-based CVS to Git converter?

Quote from the InterfacesFrontendsAndTools page on the Git wiki[1]:

  cvs2git is the unofficial name of Jon Smirl's modifications to cvs2svn. These modifications allow cvs2svn to generate a data stream which is consumed by Shawn Pearce's git-fast-import (now included in git.git). git-fast-import converts its input stream directly into a Git .pack file, minimizing the amount of IO required on large imports.

Jon Smirl stopped working on cvs2git[2] because, first, Mozilla (which was the main target of his work) decided not to move to git, and second, because of troubles with the cvs2svn architecture[*] (on which it is based). Jon Smirl has posted his impressions of working on a CVS importer in the "Some tips for doing a CVS importer" thread[3].

References:
-----------
[1] http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-23858c2cde0cef60443d8e73e6829a95f8e191ef
[2] http://msgid.gmane.org/9e4733910611190940y147992b8mbdfac5a51f42e0fe@mail.gmail.com
[3] http://marc.theaimsgroup.com/?t=116405956000001&r=1&w=2

Footnotes:
----------
[*] If I remember correctly, the authors of cvs2svn were talking about separating the code dealing with disentangling the CVS repository structure from the part translating it into a Subversion repository (with its quirks), and from the part generating the Subversion repository.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: cvs2svn conversion directly to git ready for experimentation
From: Michael Haggerty @ 2007-08-02 16:58 UTC
To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
> Michael Haggerty wrote:
> Have you contacted Jon Smirl about his unpublished work on cvs2git,
> cvs2svn based CVS to Git converter?

Yes, I am familiar with Jon Smirl's work, and as soon as he let us know what he was working on, we tried to help. Unfortunately the cooperation was not very fruitful.

- While Jon was (unknown to us) working on his git output patch, I was working on a big cvs2svn rewrite to make cvs2svn more robust and easier to hack. By the time he contacted us, his patch did not apply to the cvs2svn code. The refactoring that obsoleted the patch was, in fact, largely meant to remedy the very architectural problems that were hampering his work.

- In my opinion, Jon misdiagnosed the reason for the "fragmented branch creation" problem that he claimed was preventing a clean conversion to git, and he felt that we were not interested in fixing the problem. In fact, I was working on fixing another problem that I believe was the *real* reason for the fragmented branch creation. This fix is implemented in cvs2svn version 2.0.

> Footnotes:
> ----------
> [*] If I remember correctly authors of cvs2svn were talking about separating
> the code dealing with disentangling CVS repository structure from the part
> translating it into Subversion repository (with its quirks), and the part
> generating Subversion repository.

Yes, this is now done, which is why it took me only a couple of days of programming to add a git output option.

Michael
* Re: cvs2svn conversion directly to git ready for experimentation
From: Jon Smirl @ 2007-08-02 23:44 UTC
To: Jakub Narebski; +Cc: git, users

On 8/1/07, Jakub Narebski <jnareb@gmail.com> wrote:
> Michael Haggerty wrote:
>
> > I am the maintainer of cvs2svn[1], which is a program for one-time
> > conversions from CVS to Subversion. cvs2svn is very robust against the
> > many peculiarities of CVS and can convert just about every CVS
> > repository we have ever seen.
> >
> > I've been working on a cvs2svn output pass that writes the converted CVS
> > repository directly into git rather than Subversion. The code runs now
> > with at least one repository from our test suite of nasty CVS repositories.
>
> Have you contacted Jon Smirl about his unpublished work on cvs2git,
> cvs2svn based CVS to Git converter?

My converter was derived from Michael's cvs2svn code. The bulk of my work was converting cvs2svn to output in a format that git-fast-import could consume. This was all rather straightforward, and there was nothing really interesting in the code. What it exposed were fundamental issues about the technical complexities of trying to reconstruct a change set history from CVS, which didn't record all of the needed info. I was never able to construct a satisfactory git representation of the Mozilla CVS repository. Michael has had a long time to work on the change set detection code, and he's probably added some new strategies.

My code did include a CVS file parser for extracting all the revisions from a file in a single pass. Doing that is a major performance benefit. I believe I posted the code to the cvs2svn mailing list; it was about 200 lines of code. Forking off cvs a million times to extract the revisions takes days to run. The same goes for forking git a million times. git-fast-import is fed through a pipe from cvs2svn, which avoids forking.

git-fast-import also uses a technique from the database world for bulk import: it imports everything without indexing it, and indexing is done after the import finishes. Between parsing the CVS files internally and Shawn's git-fast-import, it was possible to import Mozilla CVS (2.4G) in about 2 hours and generate a 450MB pack file. You need 3GB of RAM to do this; if the machine starts swapping, the process will take weeks to finish.

-- 
Jon Smirl
jonsmirl@gmail.com
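Jon's single-pass parser is not shown in this thread, but the core of any such parser is applying RCS deltas. A ,v file stores the newest trunk revision in full and each older trunk revision as a reverse diff in an ed-like format: `dN M` deletes M lines starting at line N, and `aN M` inserts the M lines that follow the command after line N, with line numbers referring to the text being patched. The Python sketch below assumes the revision texts have already been read out of the ,v file; the @-quoting, the delta tree, and branch revisions (which RCS stores as forward diffs) are left out, and both function names are hypothetical.

```python
def apply_rcs_diff(source, diff):
    """Apply one RCS delta to `source` (a list of lines) and return the new lines.

    `diff` is the delta text split into lines: 'dN M' deletes M lines starting at
    line N, 'aN M' inserts the M lines that follow the command after line N.
    Line numbers are 1-based, refer to `source`, and increase through the delta.
    """
    result = []
    consumed = 0  # number of source lines already copied or deleted
    i = 0
    while i < len(diff):
        command = diff[i]
        i += 1
        op = command[0]
        start, count = (int(field) for field in command[1:].split())
        if op == "d":
            result.extend(source[consumed:start - 1])  # unchanged lines before the deletion
            consumed = start - 1 + count               # skip the deleted lines
        elif op == "a":
            result.extend(source[consumed:start])      # unchanged lines up to line N
            consumed = start
            result.extend(diff[i:i + count])           # the inserted text
            i += count
        else:
            raise ValueError("unexpected RCS diff command: %r" % command)
    result.extend(source[consumed:])                   # unchanged tail
    return result


def trunk_revisions(head_lines, reverse_deltas):
    """Yield the head revision, then each older trunk revision in turn."""
    current = head_lines
    yield current
    for delta in reverse_deltas:  # newest-to-oldest, as stored in the ,v file
        current = apply_rcs_diff(current, delta)
        yield current
```

Because each older revision is derived from the one already in memory, a whole file's trunk history comes out of one read of the ,v file, instead of one `co` or `cvs` process per revision.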
* Re: cvs2svn conversion directly to git ready for experimentation
From: Steffen Prohaska @ 2007-08-02 8:49 UTC
To: Michael Haggerty; +Cc: git, users

Michael,

On Aug 1, 2007, at 2:09 AM, Michael Haggerty wrote:

> I am looking forward to your feedback. Even better would be if somebody
> wants to join forces on this project. I would be happy to supply the
> cvs2svn knowledge if you can bring the git experience.

I tried it with revision trunk@3930 of cvs2svn. The results are as follows.

(some "WARNING: problem encoding log message: [...]" lines)

cvs2svn Statistics:
------------------
Total CVS Files:               9578
Total CVS Revisions:          66771
Total CVS Branches:          229121
Total CVS Tags:              371259
Total Unique Tags:              112
Total Unique Branches:           79
CVS Repos Size in KB:        210390
Total SVN Commits:            18178
First Revision Date:    Fri Jul 23 10:26:11 1999
Last Revision Date:     Thu Jul 19 17:50:40 2007
------------------
Timings (seconds):
------------------
3295   pass1    CollectRevsPass
   0   pass2    CollateSymbolsPass
3642   pass3    FilterSymbolsPass
   0   pass4    SortRevisionSummaryPass
   1   pass5    SortSymbolSummaryPass
 109   pass6    InitializeChangesetsPass
  56   pass7    BreakRevisionChangesetCyclesPass
  66   pass8    RevisionTopologicalSortPass
  54   pass9    BreakSymbolChangesetCyclesPass
  99   pass10   BreakAllChangesetCyclesPass
  92   pass11   TopologicalSortPass
  46   pass12   CreateRevsPass
   7   pass13   SortSymbolsPass
   2   pass14   IndexSymbolsPass
  70   pass15   OutputPass
7540   total

I checked that CVS HEAD and two other branches match when checked out from CVS and from the imported git archive. Everything is OK (ignoring some differences introduced by keyword expansion).

Note: I tried earlier to use cvs2svn to import to svn, followed by git-svnimport to import to git. The repository resulting from this two-step import did not even pass this minimal requirement of matching checkouts from CVS and git.

cvs2svn created a lot of branches that are not present in CVS, with names identical to CVS tags. Apparently these branches are used to create a commit matching a certain CVS tag.

I checked one suspicious commit that indicates to me whether the root points of branches are right. Note: git-cvsimport fails this check; parsecvs and cvs2svn pass it.

The branching structure looks, ... hmm ..., interesting. cvs2svn manufactured commits to get the branching points right. Apparently our CVS has some weird commits like 'unlabeled-1.1.1' and two other named tags (maybe vendor branches?) that cause these manufactured commits. In gitk I see long lines running parallel to the CVS trunk all the way down to these weird CVS tags. They are not very useful, although they might be correct. Note: parsecvs imports our repository without such basically useless links. However, I can't verify whether parsecvs gets something wrong.

Other branches are created over a couple of commits mixing in several branches (maybe again the weird commits already mentioned). See branching1.png, branching2.png, branching3.png.

[ I have to apologize; our CVS repository contains proprietary information, so I can't publish its history freely. ]

cvs2svn is the first tool besides parsecvs that worked for me, that is, it imported the whole repository, passed the basic test of matching checkouts from CVS and git, and got right the one suspicious commit that I'm using for verifying the branching points.

[ I have no time to go into the details of all these tests. Therefore only a very short summary:
  - All tools needed basic cleanup of a few corrupted ,v files and of ,v files that were duplicated in Attic.
  - git-cvsimport fails to create branches at the right commit.
  - fromcvs's togit surrendered during the import.
  - fromcvs's tohg accepted more of the history, but finally surrendered as well.
  - parsecvs works for me (crashes on corrupted ,v files).
  - cvs2svn followed by git-svnimport creates the wrong state at the tips of branches.
  - cvs2svn direct git import works for me (reports corrupted ,v files).
]

Right now, I'd prefer the import by parsecvs because of the simpler history. However, I don't know whether I lose history information by doing so. I'd start with a run of cvs2svn to validate the overall structure of the CVS repository. cvs2svn seems to be superior at dealing with corruption in the CVS repository: it reports errors where parsecvs just crashes.

Steffen

[Attachments: branching1.png, branching2.png, branching3.png (image/png)]
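Steffen's basic test (compare a checkout from CVS with a checkout from the imported git repository, ignoring keyword expansion) is easy to script. A minimal Python sketch, assuming both trees are already checked out at the same branch or tag in two directories: it collapses expanded keywords such as `$Id: ... $` to `$Id$` before comparing, skips `CVS` and `.git` metadata directories, and does not try to normalize the extra log lines that `$Log$` inserts. The function names are hypothetical.

```python
import os
import re

# Expanded CVS keywords differ between a CVS checkout and a git checkout of the same revision.
KEYWORD_RE = re.compile(
    rb"\$(Author|Date|Header|Id|Locker|Log|Name|RCSfile|Revision|Source|State)(?::[^$\n]*)?\$")


def normalize(data):
    """Collapse expanded keywords like b'$Id: foo.c,v 1.2 ... $' to b'$Id$'."""
    return KEYWORD_RE.sub(rb"$\1$", data)


def tree_files(root, skip=("CVS", ".git")):
    """Map relative path -> normalized contents for every file under root."""
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip]  # do not descend into metadata
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                files[os.path.relpath(path, root)] = normalize(f.read())
    return files


def compare_checkouts(cvs_root, git_root):
    """Return sorted lists (only_in_cvs, only_in_git, differing_paths)."""
    cvs, git = tree_files(cvs_root), tree_files(git_root)
    only_cvs = sorted(set(cvs) - set(git))
    only_git = sorted(set(git) - set(cvs))
    differing = sorted(p for p in set(cvs) & set(git) if cvs[p] != git[p])
    return only_cvs, only_git, differing
```

Three empty lists mean the two checkouts agree up to keyword expansion; any path in the last list is worth a closer look with a normal diff.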
* Re: cvs2svn conversion directly to git ready for experimentation
From: Michael Haggerty @ 2007-08-02 17:23 UTC
To: Steffen Prohaska; +Cc: git, users

Steffen Prohaska wrote:
> On Aug 1, 2007, at 2:09 AM, Michael Haggerty wrote:
>> I am looking forward to your feedback. Even better would be if somebody
>> wants to join forces on this project. I would be happy to supply the
>> cvs2svn knowledge if you can bring the git experience.
>
> I tried it with revision trunk@3930 of cvs2svn. The results are as follows.

Thanks for the feedback!

> cvs2svn created a lot of branches that are not present in CVS,
> with names identical to CVS tags. Apparently these branches are
> used to create a commit matching a certain CVS tag.

That is correct. This is something that I plan to work on, at least for tags that can be created from a single source commit.

> The branching structure looks, ... hmm ..., interesting. cvs2svn
> manufactured commits to get the branching points right.
> Apparently our CVS has some weird commits like 'unlabeled-1.1.1'
> and two other named tags (maybe vendor branches?) that cause
> these manufactured commits. In gitk I see long lines running
> parallel to the cvs trunk all down to these weird CVS tags. They
> are not very useful, although they might be correct. Note,
> parsecvs imports our repository without such basically useless
> links. However, I can't verify if parsecvs gets something wrong.

Branches with names like "unlabeled-1.1.1" come from CVS branches for which the revisions are still contained in the RCS files but for which the branch name has been deleted. These wreak havoc on cvs2svn's attempt to find simple branch sources and cause a proliferation of basically useless branches. The main problem is that cvs2svn does not attempt to figure out that "unlabeled-1.2.4" in one file might be the same as "unlabeled-1.2.6" in another, etc.

An "unlabeled-1.1.1", in particular, means that the branch whose name was deleted was a vendor branch. The deletion of a vendor branch name can cause even more mayhem.

In most cases it makes sense to exclude the unlabeled branches. After all, somebody tried to delete them, so they can't be that important, right? Use --exclude='unlabeled-.*', or add a line like this to your options file:

    ctx.symbol_strategy.add_rule(ExcludeRegexpStrategyRule(r'unlabeled-.*'))

This can of course cause problems if other branches or tags were created that branched off of the unlabeled branch. In such cases the dependent branches/tags might have to be excluded too.

> Other branches are created over a couple of commits mixing in
> several branches (maybe again our weird commits already
> mentioned). See branching1.png, branching2.png, branching3.png.
> [ I have to apologize, our cvs repository contains proprietary
> information, so I can't publish its history freely. ]

This can definitely be caused by unlabeled branches. It can also be caused by branches rooted in a vendor branch. In many cases, such branches can actually be grafted onto trunk, but cvs2svn does not (yet) attempt this.

> cvs2svn is the first tool besides parsecvs that worked for me,
> that is, imported the whole repository, passed the basic test of
> matching checkouts from cvs and git, and got the one suspicious
> commit right that I'm using for verifying the branching points.
>
> [ I have no time to go into the details of all these tests.
>   Therefore only a very short summary:
>   All tools needed basic cleanup of a few corrupted ,v files and
>   ,v files that were duplicated in Attic.
>   git-cvsimport fails to create branches at the right commit.
>   fromcvs's togit surrendered during the import.
>   fromcvs's tohg accepted more of the history, but finally
>   surrendered as well.
>   parsecvs works for me (crashes on corrupted ,v files).
>   cvs2svn followed by git-svnimport create wrong state at the
>   tips of branches.
>   cvs2svn direct git import works for me (reports corrupted ,v files).
> ]

Thanks very much for this interesting summary.

> Right now, I'd prefer the import by parsecvs because of the
> simpler history. However, I don't know if I lose history
> information by doing so. I'd start by a run of cvs2svn to validate
> the overall structure of the CVS repository. Dealing with corruption
> in the CVS repository seems to be superior in cvs2svn. It reports
> errors when parsecvs just crashes.

If excluding the unlabeled branches does not fix things for you, I suggest checking out the first revision on such a branch and comparing the results from CVS, from parsecvs, and from cvs2svn. It *should* be that the version of the file from the vendor branch is included in the working copy. cvs2svn should handle this correctly. I am curious whether parsecvs does.

Michael
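Michael's caveat about dependent branches and tags amounts to a transitive closure: once the `unlabeled-.*` symbols are excluded, anything branched or tagged from them has to go too, and so on down the chain. A Python sketch, assuming a hypothetical symbol-to-parent mapping has already been collected; real CVS symbols can have different sources in different files, which this ignores. It anchors the pattern at both ends with `re.fullmatch`, which may not be how cvs2svn applies its exclusion regexps.

```python
import re


def symbols_to_exclude(parents, pattern=r"unlabeled-.*"):
    """Return the symbols matching `pattern` plus every symbol rooted on one of them.

    parents: hypothetical mapping symbol -> symbol it was branched/tagged from
             (None for symbols rooted on trunk).
    """
    regex = re.compile(pattern)
    excluded = {s for s in parents if regex.fullmatch(s)}
    changed = True
    while changed:  # repeat until no new dependents are found
        changed = False
        for symbol, parent in parents.items():
            if symbol not in excluded and parent in excluded:
                excluded.add(symbol)
                changed = True
    return excluded


parents = {
    "unlabeled-1.2.4": None,
    "FIX_1": "unlabeled-1.2.4",  # branch created off the unlabeled branch
    "FIX_1_TAG": "FIX_1",        # tag on that branch
    "REL_2": None,
    "TAG_ON_REL_2": "REL_2",
}
```

Here `symbols_to_exclude(parents)` should leave REL_2 and TAG_ON_REL_2 alone and report the other three symbols, which could then be passed to cvs2svn as additional exclusions.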
* Re: cvs2svn conversion directly to git ready for experimentation
From: Marko Macek @ 2007-08-02 19:22 UTC
To: Michael Haggerty, git, users, prohaska

Michael Haggerty wrote:
> This can definitely be caused by unlabeled branches. It can also be
> caused by branches rooted in a vendor branch. In many cases, such
> branches can actually be grafted onto trunk, but cvs2svn does not (yet)
> attempt this.

It would be nice to be able to exclude the vendor branch if only the initial commit was made on it (or maybe handle it better, by remapping the commits to the main branch when they match).

I have tested this on my repository, and currently gitk draws large 'railroad switching stations' because many tags have the vendor branch as a parent (and in some cases also the parent branch, in addition to the parent commit).

Mark

[Attachment: railroad.png (image/png)]
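Marko's condition can be tested per file. `cvs import` creates trunk revision 1.1 and vendor revision 1.1.1.1 with the same contents, so a vendor branch that holds only 1.1.1.1, identical to 1.1, adds no history of its own and is a candidate for exclusion or for remapping onto trunk. A Python sketch, assuming one file's revisions and their texts are available in a hypothetical mapping; it looks only at revisions directly on the vendor branch, not at branches rooted on it, and is not something cvs2svn currently does.

```python
def vendor_branch_is_trivial(revisions, vendor_branch="1.1.1"):
    """Return True if the vendor branch only holds the initial import.

    revisions: hypothetical mapping revision number -> file contents for one ,v file.
    """
    prefix = vendor_branch + "."
    depth = vendor_branch.count(".") + 1
    # Revisions directly on the vendor branch (1.1.1.N), not on branches rooted there.
    on_branch = [r for r in revisions if r.startswith(prefix) and r.count(".") == depth]
    return on_branch == [prefix + "1"] and revisions.get("1.1") == revisions[prefix + "1"]
```

A conversion tool would presumably need this to hold for every file carrying the vendor branch before dropping the branch as a whole.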
* Re: cvs2svn conversion directly to git ready for experimentation
From: Jon Smirl @ 2007-08-02 23:59 UTC
To: Michael Haggerty; +Cc: Steffen Prohaska, git, users

On 8/2/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Branches with names like "unlabeled-1.1.1" come from CVS branches for
> which the revisions are still contained in the RCS files but for which
> the branch name has been deleted. These wreak havoc on cvs2svn's
> attempt to find simple branch sources and cause a proliferation of
> basically useless branches. The main problem is that cvs2svn does not
> attempt to figure out that "unlabeled-1.2.4" in one file might be the
> same as "unlabeled-1.2.6" in another etc.

I seem to recall discussing an algorithm to fix this on the cvs2svn mailing list. There was a somewhat simple way to work out that "unlabeled-1.2.4" in one file might be the same as "unlabeled-1.2.6" in another.

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: cvs2svn conversion directly to git ready for experimentation
From: Oswald Buddenhagen @ 2007-08-05 7:58 UTC
To: Jon Smirl; +Cc: Michael Haggerty, Steffen Prohaska, git, users

On Thu, Aug 02, 2007 at 07:59:41PM -0400, Jon Smirl wrote:
> I seem to recall discussing an algorithm to fix this on the cvs2svn
> mailing list. There was a somewhat simple way to correlate the
> "unlabeled-1.2.4" in one file might be the same as "unlabeled-1.2.6"
> problem.
>
yes, name them after the first symbol that appears on them. like unlabeled-1.2.4 being named __KDE_3_5_RELEASE because of such a tag (without the underscores, obviously) appearing on it.
the naive per-file implementation doesn't get you that far, though. again, one would have to collect data from all files first, correlate it, and make a "majority vote". very similar to your favorite symbol source problem. ;)

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.
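One way to read Oswald's suggestion as code is a two-pass majority vote. The first pass collects, from every file, the symbols that appear on each unlabeled branch and counts which symbol is most often the first one. The second pass gives each per-file unlabeled branch the best-voted symbol it carries, so "unlabeled-1.2.4" in one file and "unlabeled-1.2.6" in another can end up under the same name. The data layout and function name are hypothetical; this is a sketch of the idea, not cvs2svn code.

```python
from collections import Counter


def name_unlabeled_branches(observations):
    """Pick a symbol to name each per-file unlabeled branch after, by majority vote.

    observations: hypothetical mapping (path, unlabeled_name) -> symbols found on that
    branch in that file, in the order they appear along the branch.
    Returns (path, unlabeled_name) -> chosen symbol, or None if the branch has no symbols.
    """
    # Pass 1, over all files: how often is each symbol the first one on an unlabeled branch?
    votes = Counter(symbols[0] for symbols in observations.values() if symbols)
    names = {}
    for key, symbols in observations.items():
        if not symbols:
            names[key] = None  # nothing to name it after
            continue
        # Pass 2: prefer the symbol with the most votes overall, then the earliest on this branch.
        names[key] = max(symbols, key=lambda s: (votes[s], -symbols.index(s)))
    return names


observations = {
    ("a.c", "unlabeled-1.2.4"): ["KDE_3_5_RELEASE", "LATER_TAG"],
    ("b.c", "unlabeled-1.2.6"): ["KDE_3_5_RELEASE"],
    ("c.c", "unlabeled-1.3.2"): ["OTHER_TAG", "KDE_3_5_RELEASE"],
    ("d.c", "unlabeled-1.4.2"): [],
}
```

In this example c.c sees OTHER_TAG first, but KDE_3_5_RELEASE wins the vote across files, so all three labeled branches are grouped under it, which the naive per-file rule would not do. The final branch name derived from the chosen symbol is left to the caller.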
* Re: cvs2svn conversion directly to git ready for experimentation
From: Simon 'corecode' Schubert @ 2007-08-02 17:35 UTC
To: Steffen Prohaska; +Cc: Michael Haggerty, git, users

Steffen Prohaska wrote:
> fromcvs's togit surrendered during the import.
> fromcvs's tohg accepted more of the history, but finally
> surrendered as well.

Which repo is it you are converting? Is this available somewhere?

I'd appreciate any reports concerning "surrenders" of fromcvs. Additionally, it seems strange that tohg should have worked "better" than togit, as these are basically just different backends.

cheers
  simon
* Re: cvs2svn conversion directly to git ready for experimentation
From: Steffen Prohaska @ 2007-08-02 19:13 UTC
To: Simon 'corecode' Schubert; +Cc: Michael Haggerty, git, users

Simon,

On Aug 2, 2007, at 7:35 PM, Simon 'corecode' Schubert wrote:

> Steffen Prohaska wrote:
>> fromcvs's togit surrendered during the import.
>> fromcvs's tohg accepted more of the history, but finally
>> surrendered as well.
>
> Which repo is it you are converting? Is this available somewhere?

Unfortunately not; the content is a proprietary software package.

> I'd appreciate any reports concerning "surrenders" of fromcvs.
> Additionally, it seems strange that tohg should have worked
> "better" than togit, as these are basically just different backends.

Some time has passed since I did the tests, and I had no time to do a detailed investigation then. I'll have more time now and will prepare a bug report, which is not easy because I can't send you the CVS repo, sorry. Any hints on what would be most helpful for you?

I remember that togit reported a broken pipe. My feeling was that git-fast-import aborted, which may be the reason why tohg worked better. I didn't try to understand more details. I had never read Ruby code before, and it was already a challenge for me to get everything up and running (rcs, rbtree).

Steffen
* Re: cvs2svn conversion directly to git ready for experimentation
From: Simon 'corecode' Schubert @ 2007-08-02 19:29 UTC
To: Steffen Prohaska; +Cc: Michael Haggerty, git, users

Steffen Prohaska wrote:
> I remember that togit reported a broken pipe. My feeling was
> that git-fastimport aborted, which may be reason why tohg
> worked better. I didn't try to understand more details. I never
> read ruby code before and it was already a challenge for me to
> get everything up and running (rcs, rbtree).

yah, that pretty much tells me it is shawn's bug :) but without more details, it is very hard to diagnose. tohg should tell you which rcs revs are the offenders. be sure to use a recent fromcvs however.

cheers
  simon
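When a frontend reports only "broken pipe", the consumer (here git-fast-import) has usually already exited, and its own exit status and error output are what a bug report needs. A minimal Python sketch of a frontend that pipes a stream into an importer and keeps that information: `subprocess.Popen.communicate()` already ignores the broken pipe on the child's stdin, so the function simply returns the child's exit status and stderr. The function name is hypothetical.

```python
import subprocess


def feed_importer(stream, command):
    """Pipe `stream` (bytes) into `command`; return (exit status, importer's stderr).

    If the importer dies part-way, the write side only sees EPIPE;
    communicate() swallows that, so the importer's own report is preserved.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate(stream)
    return proc.returncode, err.decode("utf-8", "replace")
```

Run inside the target repository, something like `feed_importer(stream, ["git", "fast-import"])` would return git-fast-import's exit status together with whatever it printed before stopping, which is the detail missing from a bare "broken pipe".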
* Re: cvs2svn conversion directly to git ready for experimentation
From: Robin Rosenberg @ 2007-08-02 20:21 UTC
To: Simon 'corecode' Schubert; Cc: Steffen Prohaska, Michael Haggerty, git, users

On Thursday 02 August 2007, Simon 'corecode' Schubert wrote:
> Steffen Prohaska wrote:
> > I remember that togit reported a broken pipe. My feeling was
> > that git-fastimport aborted, which may be reason why tohg
> > worked better. I didn't try to understand more details. I never
> > read ruby code before and it was already a challenge for me to
> > get everything up and running (rcs, rbtree).
>
> yah, that pretty much tells me it is shawn's bug :) but without more details, it is very hard to diagnose. tohg should tell you which rcs revs are the offenders. be sure to use a recent fromcvs however.

If the bug is still unfixed and you haven't been able to diagnose it for lack of repos, you could try the Eclipse CVS repo. When I converted the Eclipse source to git, I had a problem converting the whole repo, i.e. fast-import died. Because the conversion died, I excluded some large parts that were effectively forks, as well as some websites.

-- robin
* Re: cvs2svn conversion directly to git ready for experimentation @ 2007-08-02 20:31 ` Lübbe Onken 0 siblings, 0 replies; 40+ messages in thread From: Lübbe Onken @ 2007-08-02 20:31 UTC (permalink / raw) To: users; +Cc: git, users Hi Folks, I guess that the initial poster sent this message to the TortoiseSVN users list only by mistake, because the subject has nothing at all to do with TortoiseSVN. Could you please be so kind and remove the TortoiseSVN users list from future replies to this thread? thanks -Lübbe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 19:29 ` Simon 'corecode' Schubert 2007-08-02 20:21 ` Robin Rosenberg @ 2007-08-02 22:02 ` Steffen Prohaska 2007-08-02 22:50 ` Simon 'corecode' Schubert 2007-08-03 3:07 ` Shawn O. Pearce 2 siblings, 1 reply; 40+ messages in thread From: Steffen Prohaska @ 2007-08-02 22:02 UTC (permalink / raw) To: Simon 'corecode' Schubert; +Cc: Michael Haggerty, Git Mailing List Simon, On Aug 2, 2007, at 9:29 PM, Simon 'corecode' Schubert wrote: > Steffen Prohaska wrote: >> I remember that togit reported a broken pipe. My feeling was >> that git-fastimport aborted, which may be reason why tohg >> worked better. I didn't try to understand more details. I never >> read ruby code before and it was already a challenge for me to >> get everything up and running (rcs, rbtree). > > yah, that pretty much tells me it is shawn's bug :) but without > more details, it is very hard to diagnose. I tried again. Interestingly, togit now works, but tohg still fails. togit starts by reporting "fatal: Not a valid object name" as the first line. But besides that it seems to work fine. What concerns me a bit is that the last line togit reports is "committing set 18100/18173". I'd expect it to report 18173/18173. The rest are git-fast-import statistics. BTW, togit creates much more complex branching patterns than cvs2svn does. The attached file branching.png displays a small view of a branching pattern that extends downwards over a couple of screens. I checked the cvs2svn history again. It doesn't contain anything of similar complexity. > tohg should tell you which rcs revs are the offenders. be sure to > use a recent fromcvs however.
tohg fails (on the same repo that togit imported) with the following error:

Traceback (most recent call last):
  File "./tohg.py", line 102, in <module>
    destrepo.dispatch()
  File "./tohg.py", line 98, in dispatch
    func(*l[1:])
  File "./tohg.py", line 78, in cmd_commit
    extra = {'branch': branch})
  File "/sw/lib/python2.5/site-packages/mercurial/localrepo.py", line 736, in commit
    mn = self.manifest.add(m1, tr, linkrev, c1[0], c2[0], (new, remove))
  File "/sw/lib/python2.5/site-packages/mercurial/manifest.py", line 191, in add
    _("failed to remove %s from manifest") % f)
AssertionError: failed to remove X/Y.cpp from manifest
transaction abort!
rollback completed
./tohg.rb:200:in `readline': End of file reached while handling set [core/X/Y.cpp,v:1.19,core/X/Z.cpp,v:1.22,core/X/Attic/W,v:1.12] (EOFError)
        from ./tohg.rb:200:in `_commit'
        from ./tohg.rb:154:in `commit'
        from ./fromcvs.rb:894:in `commit'
        from ./fromcvs.rb:965:in `commit_sets'
        from ./tohg.rb:228

The versions I used are listed below. I adjusted tohg a bit to use python 2.5 installed by fink. I'm working on Mac OS X.

$ cd fromcvs
$ hg tip
changeset:   103:cccdab84e9e5
tag:         tip
user:        Simon 'corecode' Schubert <corecode@fs.ei.tum.de>
date:        Mon Jul 16 23:49:52 2007 +0200
summary:     Add error handling on committing sets.

$ hg diff
diff -r cccdab84e9e5 tohg.rb
--- a/tohg.rb Mon Jul 16 23:49:52 2007 +0200
+++ b/tohg.rb Fri Jul 20 17:06:30 2007 +0200
@@ -60,7 +60,7 @@ class HGDestRepo
     @status = status
 
     @outs, @ins = \
-      Open2.popen2('python', File.join(File.dirname($0), 'tohg.py'), hgroot)
+      Open2.popen2('python2.5', File.join(File.dirname($0), 'tohg.py'), hgroot)
     @last_date = Time.at(@ins.readline.strip.to_i)
     @branches = {}
     while l = @ins.readline do

$ cd rcsparse
$ hg tip
changeset:   37:e871e108f2e4
tag:         tip
user:        Simon 'corecode' Schubert <corecode@fs.ei.tum.de>
date:        Sun Feb 18 15:46:29 2007 +0100
summary:     Return revision date in GMT, like RCS/CVS uses everywhere.
rbtree-0.2.0.tar.gz

ruby 1.8.2 (2004-12-25) [universal-darwin8.0]

$ cd git
$ git describe master
v1.5.3-rc3-120-g68d4229

$ hg --version
Mercurial Distributed SCM (version 0.9.3)

Copyright (C) 2005, 2006 Matt Mackall <mpm@selenic.com>
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ /sw/bin/python2.5 --version
Python 2.5.1

Hope this helps.
Steffen

[-- Attachment #2.2: branching.png --] [-- Type: image/png, Size: 17562 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 22:02 ` Steffen Prohaska @ 2007-08-02 22:50 ` Simon 'corecode' Schubert 2007-08-02 23:50 ` Michael Haggerty 2007-08-04 8:28 ` Steffen Prohaska 0 siblings, 2 replies; 40+ messages in thread From: Simon 'corecode' Schubert @ 2007-08-02 22:50 UTC (permalink / raw) To: Steffen Prohaska; +Cc: Michael Haggerty, Git Mailing List Steffen Prohaska wrote: >> yah, that pretty much tells me it is shawn's bug :) but without more >> details, it is very hard to diagnose. > > I tried again. Interestingly now togit works but tohg still fails. > > togit starts with reporting > > fatal: Not a valid object name that's fine. > as the first line. But besides that it seems to work fine. What > concerns me a bit is that the last line togit reports is > > committing set 18100/18173 > > I'd expect it should report 18173/18173. that's fine as well. You only saw multiples of 100, but you didn't consider it would skip the intermediate ones, right? :) > BTW, togit creates much more complex branching patterns than cvs2svn > does. The attached file branching.png displays a small view of a > branching pattern that extends downwards over a couple of screens. > I checked the cvs2svn history again. It doesn't contain anything > of similar complexity. haha yea, there is still some issue with duplicate branch names and the branchpoint. if it doesn't get the branch right, it will always "pull" files from the parent branch. did you do some manual RCS file copying or manual branch name changing of individual files? this could be the reason. I still have to find a simple repo to reproduce this. > tohg fails (on the same repo that togit imported) with the following error [..] > AssertionError: failed to remove X/Y.cpp from manifest This is a mercurial 0.9.3 error, as far as I can tell from the reports. This never occurred here, and nobody reporting to me could ever reproduce this problem to pinpoint it.
cheers simon -- Serve - BSD +++ RENT this banner advert +++ ASCII Ribbon /"\ Work - Mac +++ space for low €€€ NOW!1 +++ Campaign \ / Party Enjoy Relax | http://dragonflybsd.org Against HTML \ Dude 2c 2 the max ! http://golden-apple.biz Mail + News / \ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 22:50 ` Simon 'corecode' Schubert @ 2007-08-02 23:50 ` Michael Haggerty 2007-08-03 8:40 ` Simon 'corecode' Schubert 2007-08-04 8:28 ` Steffen Prohaska 1 sibling, 1 reply; 40+ messages in thread From: Michael Haggerty @ 2007-08-02 23:50 UTC (permalink / raw) To: Simon 'corecode' Schubert; +Cc: Steffen Prohaska, Git Mailing List Simon 'corecode' Schubert wrote: > Steffen Prohaska wrote: >> BTW, togit creates much more complex branching patterns than cvs2svn >> does. The attached file branching.png displays a small view of a >> branching pattern that extends downwards over a couple of screens. >> I checked the cvs2svn history again. It doesn't contain anything >> of similar complexity. > > haha yea, there is still some issue with duplicate branch names and the > branchpoint. if it doesn't get the branch right, it will always "pull" > files from the parent branch. This sounds very much like the problem reported by Daniel Jacobowitz [1]. The problem is that if you create a branch A on a file, then create branch B from branch A before making a commit on branch A, then CVS doesn't record that branch A was the source of branch B. (It treats B as if it sprouted directly from the revision that was the *source* of branch A.) The same problem exists if "B" is a tag. The only way to determine the correct branch hierarchy is to consider the branch hierarchy of multiple files at the same time. cvs2svn 2.0 includes code to choose a "preferred parent" of each branch and try to use that parent for every file that is on the branch. It helps simplify branch creation quite a bit. The main limitation is that it still doesn't consider the revision copied back to trunk from a vendor branch as the possible parent of a branch whose nominal source was on the vendor branch (a limitation that has come up elsewhere in this thread). 
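The "preferred parent" idea can be illustrated as a vote across files. Below is a minimal Python sketch. It assumes that each file's RCS data has already been reduced to a map from branch name to the set of symbols that could be that branch's source in that file. For example, in a file where branch A has no commit yet, B's possible sources are both A and A's own source. The function name and data layout are hypothetical, and this is not cvs2svn's actual code:

```python
from collections import Counter


def choose_preferred_parents(possible_parents_per_file):
    """Pick, for each branch, the source symbol that is possible in the most files.

    possible_parents_per_file: list of dicts, one per file, mapping
    branch name -> set of symbols that could be its parent in that file.
    """
    votes = {}
    for file_info in possible_parents_per_file:
        for branch, candidates in file_info.items():
            votes.setdefault(branch, Counter()).update(candidates)
    preferred = {}
    for branch, counter in votes.items():
        # Highest vote count wins; ties are broken alphabetically for determinism.
        best_name, _ = min(counter.items(), key=lambda item: (-item[1], item[0]))
        preferred[branch] = best_name
    return preferred


files = [
    {"B": {"trunk", "A"}},  # no commit on A in this file: CVS cannot tell A from trunk
    {"B": {"A"}},           # A already has a commit here, so B clearly sprouts from A
]
print(choose_preferred_parents(files))  # {'B': 'A'}
```

The alphabetical tie-breaker only keeps the result deterministic. A real converter would need a better rule for genuine ties.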
Michael [1] http://cvs2svn.tigris.org/servlets/ReadMsg?list=dev&msgNo=1441 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 23:50 ` Michael Haggerty @ 2007-08-03 8:40 ` Simon 'corecode' Schubert 0 siblings, 0 replies; 40+ messages in thread From: Simon 'corecode' Schubert @ 2007-08-03 8:40 UTC (permalink / raw) To: Michael Haggerty; +Cc: Steffen Prohaska, Git Mailing List Michael Haggerty wrote: > Simon 'corecode' Schubert wrote: >> Steffen Prohaska wrote: >>> BTW, togit creates much more complex branching patterns than cvs2svn >>> does. The attached file branching.png displays a small view of a >>> branching pattern that extends downwards over a couple of screens. >>> I checked the cvs2svn history again. It doesn't contain anything >>> of similar complexity. >> haha yea, there is still some issue with duplicate branch names and the >> branchpoint. if it doesn't get the branch right, it will always "pull" >> files from the parent branch. > > This sounds very much like the problem reported by Daniel Jacobowitz > [1]. The problem is that if you create a branch A on a file, then > create branch B from branch A before making a commit on branch A, then > CVS doesn't record that branch A was the source of branch B. (It treats > B as if it sprouted directly from the revision that was the *source* of > branch A.) The same problem exists if "B" is a tag. I think I have covered this case quite well. I believe "my" problem happens when there are files being copied manually within the repository and then branch names being changed (or just branch names being changed). However, the name change happens only on a subset of files and branches, so you wind up with a commit which is part of two branches. Or something like that. I really should find the time to investigate this. One elementary problem with CVS is that you can assign two branch names to the same branch. During conversion you need to choose one over the other.
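In an RCS ",v" file, a branch symbol normally points at a "magic" branch number such as 1.2.0.4 (meaning branch 1.2.4). Two symbols that name the same branch therefore show up with identical numbers in the file's symbol table. Below is a minimal Python sketch that finds branches carrying more than one name. It assumes the symbols section has already been parsed into a dict of name to revision string (parsing is not shown). The function names and symbol names are hypothetical:

```python
from collections import defaultdict


def canonical_branch_number(rev):
    """Map an RCS symbol's revision to a branch number, or None for a plain tag.

    '1.2.0.4' (magic branch number) -> '1.2.4'
    '1.1.1'   (odd component count, e.g. a vendor branch) -> '1.1.1'
    '1.3'     (ordinary revision tag) -> None
    """
    parts = rev.split(".")
    if len(parts) % 2 == 1:
        return rev  # an odd number of components is already a branch number
    if len(parts) >= 4 and parts[-2] == "0":
        return ".".join(parts[:-2] + parts[-1:])  # drop the magic '0'
    return None  # even count without the magic '0': a revision tag


def branches_with_multiple_names(symbols):
    """Return {branch number: [names]} for branches that have more than one name."""
    by_branch = defaultdict(list)
    for name, rev in symbols.items():
        branch = canonical_branch_number(rev)
        if branch is not None:
            by_branch[branch].append(name)
    return {b: sorted(names) for b, names in by_branch.items() if len(names) > 1}


symbols = {
    "RELENG_1": "1.2.0.4",
    "RELENG_1_OLDNAME": "1.2.0.4",  # a second name for the same branch
    "REL_1_0": "1.2",
    "VENDOR": "1.1.1",
}
print(branches_with_multiple_names(symbols))  # {'1.2.4': ['RELENG_1', 'RELENG_1_OLDNAME']}
```

Which of the names to keep is a policy decision that this sketch leaves open.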
cheers simon ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 22:50 ` Simon 'corecode' Schubert 2007-08-02 23:50 ` Michael Haggerty @ 2007-08-04 8:28 ` Steffen Prohaska 1 sibling, 0 replies; 40+ messages in thread From: Steffen Prohaska @ 2007-08-04 8:28 UTC (permalink / raw) To: Simon 'corecode' Schubert; +Cc: Michael Haggerty, Git Mailing List On Aug 3, 2007, at 12:50 AM, Simon 'corecode' Schubert wrote: > Steffen Prohaska wrote: >>> yah, that pretty much tells me it is shawn's bug :) but without >>> more details, it is very hard to diagnose. >> I tried again. Interestingly now togit works but tohg still fails. >> togit starts with reporting >> fatal: Not a valid object name > > that's fine. Looks a bit scary. Could you hide the message from the user if it is harmless? >> as the first line. But besides that it seems to work fine. What >> concerns me a bit is that the last line togit reports is >> committing set 18100/18173 >> I'd expect it should report 18173/18173. > > that's fine as well. You only saw multiples of 100, but you didn't > consider it would skip the intermediate ones, right? :) I don't care about the intermediate ones, only about the last one. I'd expect a successful import to report 18173/18173 as its last line. If the first number is smaller than the second, this indicates to me that there's something left to do. >> BTW, togit creates much more complex branching patterns than cvs2svn >> does. The attached file branching.png displays a small view of a >> branching pattern that extends downwards over a couple of screens. >> I checked the cvs2svn history again. It doesn't contain anything >> of similar complexity. > > haha yea, there is still some issue with duplicate branch names and > the branchpoint. if it doesn't get the branch right, it will > always "pull" files from the parent branch. > > did you do some manual RCS file copying or manual branch name > changing of individual files? this could be the reason.
I still > have to find a simple repo to reproduce this. Maybe; the repo is 8 years old, and it was started before I joined the development. Steffen ^ permalink raw reply [flat|nested] 40+ messages in thread
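One way to get the final line Steffen expects is to print a progress line every 100 sets and also unconditionally for the last set. The following is a hypothetical Python sketch of that reporting rule only. fromcvs itself is written in Ruby, the function name is illustrative, and the actual commit work is elided:

```python
def progress_lines(total, every=100):
    """Return the progress messages for committing `total` sets.

    A line is emitted every `every` sets and always for the final set,
    so a complete run ends with 'committing set total/total'.
    """
    lines = []
    for n in range(1, total + 1):
        # ... commit set n here ...
        if n % every == 0 or n == total:
            lines.append(f"committing set {n}/{total}")
    return lines


print(progress_lines(18173)[-2:])
# ['committing set 18100/18173', 'committing set 18173/18173']
```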
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 19:29 ` Simon 'corecode' Schubert 2007-08-02 20:21 ` Robin Rosenberg 2007-08-02 22:02 ` Steffen Prohaska @ 2007-08-03 3:07 ` Shawn O. Pearce 2 siblings, 0 replies; 40+ messages in thread From: Shawn O. Pearce @ 2007-08-03 3:07 UTC (permalink / raw) To: Simon 'corecode' Schubert Cc: Steffen Prohaska, Michael Haggerty, git, users Simon 'corecode' Schubert <corecode@fs.ei.tum.de> wrote: > Steffen Prohaska wrote: > >I remember that togit reported a broken pipe. My feeling was > >that git-fastimport aborted, which may be reason why tohg > >worked better. > > yah, that pretty much tells me it is shawn's bug :) but without more > details, it is very hard to diagnose. tohg should tell you which rcs revs > are the offenders. be sure to use a recent fromcvs however. Tonight I'm going to try and add crash dump reporting to fast-import. Once that's in it should make debugging some of these failed imports easier, as we'll be able to see the immediate commands leading up to the crash and the internal state of fast-import when it barfed. Of course one needs to locate an ugly repository and run on it... -- Shawn. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 19:13 ` Steffen Prohaska 2007-08-02 19:29 ` Simon 'corecode' Schubert @ 2007-08-02 23:37 ` Michael Haggerty 1 sibling, 0 replies; 40+ messages in thread From: Michael Haggerty @ 2007-08-02 23:37 UTC (permalink / raw) To: Steffen Prohaska; +Cc: Simon 'corecode' Schubert, git, users Steffen Prohaska wrote: > On Aug 2, 2007, at 7:35 PM, Simon 'corecode' Schubert wrote: >> Steffen Prohaska wrote: >>> fromcvs's togit surrendered during the import. >>> fromcvs's tohg accepted more of the history, but finally >>> surrendered as well. >> >> Which repo is it you are converting? Is this available somewhere? > > Unfortunately not, the content is a proprietary software package. > >> I'd appreciate any reports concerning "surrenders" of fromcvs. >> [...] > > Some time passed since I did the tests. I had no time to do a > detailed investigation then. I'll have more time now and will > prepare a bug report, which is not easy because I can't send you > the cvs repo, sorry. I wrote a couple of scripts for dealing with just this situation for cvs2svn bug reports, but they should also work for you, and I highly recommend them. Both scripts are included in the cvs2svn source tree: 1. contrib/destroy_repository.py [1] -- strips almost all of the information out of a CVS repository, including author names, log messages, and file contents (but not file names, commit dates, or branch/tag names). Most bugs are not affected by the omission of such data. Use of this script has the effect of deleting most information that might be considered proprietary and also shrinking the size of the test case considerably. Use of this script is described in the script's own comments and also in [2]. 2. contrib/shrink_test_case.py [3] -- you provide the script with a command that should "exit 0" if the bug you are looking for still exists.
It does a kind of "binary search" through CVS repository space, iteratively attempting to delete a chunk of the CVS repository, running the test command, then (depending on whether the test succeeded) either reverting or making permanent the deletion. It can boil most test cases down to just 1-3 files (though presumably not if the "problem" is a 23-way merge). The things that it will try to delete are: - Entire directories and groups of directories - Entire files and groups of files - Branches within individual files - Tags within individual files It does this in a somewhat optimal way, trying to minimize the number of times that the test has to be run. This script is documented in its own comments and also in [4]. Michael [1] http://cvs2svn.tigris.org/svn/cvs2svn/trunk/contrib/destroy_repository.py [2] http://cvs2svn.tigris.org/faq.html#reportingbugs [3] http://cvs2svn.tigris.org/svn/cvs2svn/trunk/contrib/shrink_test_case.py [4] http://cvs2svn.tigris.org/faq.html#testcase ^ permalink raw reply [flat|nested] 40+ messages in thread
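The search that shrink_test_case.py performs is similar in spirit to delta debugging: delete a chunk, re-run the test, keep the deletion if the bug still reproduces, otherwise revert it, and halve the chunk size when a pass is done. Below is a minimal Python sketch of that idea. It assumes the repository has been reduced to a flat list of deletable items and that the user's test command is wrapped as a Python predicate. The names are hypothetical, and this is not the script's actual implementation:

```python
def shrink(items, still_fails):
    """Greedily delete chunks of `items` while `still_fails` keeps returning True.

    items: list of removable pieces (e.g. ,v file paths).
    still_fails: callable taking a candidate list and returning True if the
    bug still reproduces with only those items present.
    """
    current = list(items)
    chunk = max(1, len(current) // 2)
    while True:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate  # bug still there: make the deletion permanent
            else:
                i += chunk  # bug gone: revert, this chunk is needed
        if chunk == 1:
            return current
        chunk //= 2  # try finer-grained deletions next


files = [f"file{n},v" for n in range(20)]
culprits = {"file3,v", "file17,v"}  # hypothetical: the bug needs both files


def bug_reproduces(candidate):
    return culprits <= set(candidate)


print(shrink(files, bug_reproduces))  # ['file3,v', 'file17,v']
```

Starting with large chunks keeps the number of test runs low when most of the repository is irrelevant to the bug.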
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 8:49 ` Steffen Prohaska 2007-08-02 17:23 ` Michael Haggerty 2007-08-02 17:35 ` Simon 'corecode' Schubert @ 2007-08-02 20:43 ` Linus Torvalds 2007-08-02 23:19 ` Michael Haggerty 2007-08-02 23:55 ` Jon Smirl 3 siblings, 1 reply; 40+ messages in thread From: Linus Torvalds @ 2007-08-02 20:43 UTC (permalink / raw) To: Steffen Prohaska; +Cc: Michael Haggerty, git, users On Thu, 2 Aug 2007, Steffen Prohaska wrote: > > Right now, I'd prefer the import by parsecvs because of the > simpler history. However, I don't know if I lose history > information by doing so. I'd start by a run of cvs2svn to validate > the overall structure of the CVS repository. Well, once imported, you could just go through the branches and tags, and just delete the ones you consider uninteresting, and then do a "git gc". You'd want to re-pack after a fast-import anyway (regardless of the source of the fast-import input), so maybe cvs2svn ends up giving you a bit of unnecessary info, but it should be easy enough to get rid of after-the-fact. Linus ^ permalink raw reply [flat|nested] 40+ messages in thread
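Linus's suggestion can be scripted. Below is a minimal Python sketch that lists branches and tags with `git for-each-ref`, deletes every ref matching none of the glob patterns to keep with `git update-ref -d`, and then runs `git gc`. It assumes that git is on the PATH and that the refs worth keeping can be described by glob patterns. The patterns, ref names, and function names are illustrative:

```python
import fnmatch
import subprocess


def refs_to_delete(refs, keep_patterns):
    """Return the refs that match none of the glob patterns to keep."""
    return [ref for ref in refs
            if not any(fnmatch.fnmatchcase(ref, pat) for pat in keep_patterns)]


def prune_refs(repo, keep_patterns):
    """Delete uninteresting branches and tags in `repo`, then repack with git gc."""
    listing = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname)", "refs/heads", "refs/tags"],
        cwd=repo, check=True, capture_output=True, text=True).stdout
    for ref in refs_to_delete(listing.split(), keep_patterns):
        subprocess.run(["git", "update-ref", "-d", ref], cwd=repo, check=True)
    subprocess.run(["git", "gc"], cwd=repo, check=True)


refs = ["refs/heads/master", "refs/heads/TAG_FIXUP",
        "refs/tags/RELEASE_1_0", "refs/heads/unlabeled-1.2.4"]
print(refs_to_delete(refs, ["refs/heads/master", "refs/tags/RELEASE_*"]))
# ['refs/heads/TAG_FIXUP', 'refs/heads/unlabeled-1.2.4']
```

Note that `git gc` may keep recently unreachable objects around until reflog and prune grace periods expire, so the pack may not shrink immediately.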
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 20:43 ` Linus Torvalds @ 2007-08-02 23:19 ` Michael Haggerty 2007-08-03 3:12 ` Shawn O. Pearce 0 siblings, 1 reply; 40+ messages in thread From: Michael Haggerty @ 2007-08-02 23:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Steffen Prohaska, git, users Linus Torvalds wrote: > On Thu, 2 Aug 2007, Steffen Prohaska wrote: >> Right now, I'd prefer the import by parsecvs because of the >> simpler history. However, I don't know if I lose history >> information by doing so. I'd start by a run of cvs2svn to validate >> the overall structure of the CVS repository. > > Well, once imported, you could just go through the branches and tags, and > just delete the ones you consider uninteresting, and then do a "git gc". > > You'd want to re-pack after a fast-import anyway (regardless of the source > of the fast-import input), so maybe cvs2svn ends up giving you a bit of > unnecessary info, but it should be easy enough to get rid of > after-the-fact. The real goal is to get cvs2svn to include the useful information and exclude the rest. :-) I definitely want to address the problem of the helper branches used to create tags. This problem has two aspects: 1. The helper branches should be deleted after the tag has been defined. I simply couldn't figure out how to do this using git-fast-import, and git-fast-import complained when I tried to use a branch called "TAG_FIXUP" without the "refs/heads/" prefix. 2. The helper branch is not needed at all if an existing revision has exactly the same contents as needed on the tag. This requires cvs2svn to keep a record of which files exist in the complete file tree on every branch at every revision (which it can already do, though it is expensive), and also to give it the smarts to choose the optimal tag point (which it already does, except that it currently doesn't penalize sources that require files to be deleted before making the tag).
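The second point amounts to searching for an existing commit whose tree already equals the tag's contents, and otherwise picking the cheapest source for a fixup. Below is a minimal Python sketch. It assumes that full per-commit file trees are available as dicts, which, as noted above, is expensive to record. The scoring, which counts deletions as a cost, is an illustrative assumption and not cvs2svn's actual heuristic:

```python
def choose_tag_source(tag_contents, candidates):
    """Pick the commit whose full tree needs the fewest edits to match the tag.

    tag_contents: dict path -> CVS revision that the tag should contain.
    candidates: dict commit id -> dict path -> CVS revision (tree at that commit).
    Returns (commit, changes, deletions), or None if there are no candidates.
    changes == deletions == 0 means the tag can point straight at that commit,
    with no fixup branch at all.
    """
    if not candidates:
        return None
    best = None
    for commit, tree in candidates.items():
        # Files whose revision differs from (or is missing in) the candidate tree.
        changes = sum(1 for path, rev in tag_contents.items() if tree.get(path) != rev)
        # Files present in the candidate but absent from the tag must be deleted.
        deletions = sum(1 for path in tree if path not in tag_contents)
        key = (changes + deletions, deletions, commit)
        if best is None or key < best[0]:
            best = (key, commit, changes, deletions)
    _, commit, changes, deletions = best
    return commit, changes, deletions


tag = {"a.c": "1.3", "b.c": "1.1"}
commits = {
    "c1": {"a.c": "1.2", "b.c": "1.1"},
    "c2": {"a.c": "1.3", "b.c": "1.1"},
    "c3": {"a.c": "1.3", "b.c": "1.1", "old.c": "1.5"},
}
print(choose_tag_source(tag, commits))  # ('c2', 0, 0)
```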
If the problem is lots of seemingly-unnecessary merges involving a vendor branch, then it is time for me or some other volunteer to add the optimization of allowing branches to be grafted from the vendor branch to trunk. I know of the problem and have a good idea how to implement it; it is just a matter of finding the time to get it done. If the problem is unlabeled branches that can't be excluded (because other branches or tags depend on them), then the real problem is that it is not known which unlabeled branches in individual files correspond to the same project-wide conceptual branch. I have considered two possibilities to improve this situation: 1. Allow unlabeled -- indeed any -- branches to be discarded even if other branches or tags depend on them. This could be done by incorporating the content of the source revision (i.e., the revision on the unlabeled branch that is going to be discarded) into the zeroth revision of the daughter branch, then grafting the daughter onto the branch from which the unlabeled branch sprouted. 2. Rename the unlabeled branches by figuring out which unlabeled branch in fileA corresponds to which unlabeled branch in fileB, fileC, etc. This would involve a tricky bit of matching file-wise dependency trees onto one another to unify unlabeled branch labels, keeping in mind that: - The trees have other differences as well. - The unlabeled branch does not necessarily occur in every file. - There may be multiple unlabeled branches per file. Michael ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 23:19 ` Michael Haggerty @ 2007-08-03 3:12 ` Shawn O. Pearce 0 siblings, 0 replies; 40+ messages in thread From: Shawn O. Pearce @ 2007-08-03 3:12 UTC (permalink / raw) To: Michael Haggerty; +Cc: Linus Torvalds, Steffen Prohaska, git, users Michael Haggerty <mhagger@alum.mit.edu> wrote: > 1. The helper branches should be deleted after the tag has been defined. > I simply couldn't figure out how to do this using git-fast-import, and > git-fast-import complained when I tried to use a branch called > "TAG_FIXUP" without the "refs/heads/" prefix. Two issues there: * Deleting branches: I currently don't support this in fast-import, but I'll add support for it. It's actually pretty simple to tell it to drop a branch, especially if the dang thing doesn't actually exist in the git repository yet (because it's only in memory). * Creating a branch without refs/heads/ prefix: This is a bug. I had good intentions in trying to verify the name was one that didn't contain special reserved characters, but I wound up also requiring you to create branches only in the refs/heads/ namespace. That was not what I wanted to do. I'm patching it tonight. -- Shawn. ^ permalink raw reply [flat|nested] 40+ messages in thread
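To make the helper branch concrete, here is a minimal Python sketch of the git-fast-import commands for a single tag fixup. It first writes a commit on the fully qualified branch refs/heads/TAG_FIXUP, parented on an existing commit, and then a "tag" command pointing at that commit. It assumes the blob marks and the source commit's mark were already written earlier in the stream. The function names, committer identity, timestamp, and messages are placeholders. Paths containing spaces or other special characters would need quoting, which this sketch does not do. Deleting TAG_FIXUP afterwards is not shown, since fast-import did not support that at the time of this thread.

```python
def fast_import_data(text):
    """Encode `text` as a fast-import 'data' command with an exact byte count."""
    return f"data {len(text.encode('utf-8'))}\n{text}"


def tag_fixup_stream(tag_name, source_mark, fixup_mark, modified, deleted,
                     when="1186000000 +0000"):
    """Return fast-import commands that build a fixup commit on
    refs/heads/TAG_FIXUP (parented on :source_mark) and then tag it.

    modified: dict path -> blob mark already written to the stream.
    deleted: iterable of paths to remove relative to the source commit.
    """
    ident = f"cvs2svn <cvs2svn> {when}"  # placeholder identity and raw timestamp
    out = [
        "commit refs/heads/TAG_FIXUP\n",  # full ref name, including refs/heads/
        f"mark :{fixup_mark}\n",
        f"committer {ident}\n",
        fast_import_data(f"Fixup commit for tag {tag_name}\n"),
        f"from :{source_mark}\n",
    ]
    out += [f"M 644 :{mark} {path}\n" for path, mark in sorted(modified.items())]
    out += [f"D {path}\n" for path in sorted(deleted)]
    out += [
        "\n",
        f"tag {tag_name}\n",  # fast-import creates refs/tags/<tag_name>
        f"from :{fixup_mark}\n",
        f"tagger {ident}\n",
        fast_import_data(f"Tag {tag_name}\n"),
        "\n",
    ]
    return "".join(out)


print(tag_fixup_stream("RELEASE_1_0", 5, 10, {"src/a.c": 3}, ["old.c"]), end="")
```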
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 8:49 ` Steffen Prohaska ` (2 preceding siblings ...) 2007-08-02 20:43 ` Linus Torvalds @ 2007-08-02 23:55 ` Jon Smirl 3 siblings, 0 replies; 40+ messages in thread From: Jon Smirl @ 2007-08-02 23:55 UTC (permalink / raw) To: Steffen Prohaska; +Cc: Michael Haggerty, git, users On 8/2/07, Steffen Prohaska <prohaska@zib.de> wrote: > Right now, I'd prefer the import by parsecvs because of the > simpler history. However, I don't know if I lose history > information by doing so. I'd start by a run of cvs2svn to validate > the overall structure of the CVS repository. Dealing with corruption > in the CVS repository seems to be superior in cvs2svn. It reports > errors when parsecvs just crashes. Parsecvs silently throws away things that confuse it. cvs2svn is much more careful about not losing track of anything. For example, parsecvs is unable to process the Mozilla CVS repository, while cvs2svn can. The branching in Mozilla CVS is too complex for parsecvs to handle. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <8b65902a0708010438s24d16109k601b52c04cf9c066@mail.gmail.com>]
* Re: cvs2svn conversion directly to git ready for experimentation [not found] ` <8b65902a0708010438s24d16109k601b52c04cf9c066@mail.gmail.com> @ 2007-08-02 15:34 ` Michael Haggerty 2007-08-02 23:08 ` Martin Langhoff 0 siblings, 1 reply; 40+ messages in thread From: Michael Haggerty @ 2007-08-02 15:34 UTC (permalink / raw) To: Guilhem Bonnefille; +Cc: git, users [I am CCing this response to the mailing lists.] Guilhem Bonnefille wrote: > On 8/1/07, Michael Haggerty <mhagger@alum.mit.edu> wrote: >> I am the maintainer of cvs2svn[1], which is a program for one-time >> conversions from CVS to Subversion. cvs2svn is very robust against the >> many peculiarities of CVS and can convert just about every CVS >> repository we have ever seen. > > What are the differences with cvsps ( http://www.cobite.com/cvsps/ )? I'm not extremely familiar with cvsps, and I don't really want to get into a "my-tool-is-better-than-your-tool" kind of argument. Instead I will mention that the goals of the two projects are somewhat different: cvs2svn is meant for one-time conversions from CVS, and therefore aims for maximum conversion accuracy, robustness even in the presence of some kinds of CVS repository corruption, intelligent translation of CVS idioms to the idioms of a modern SCM, and scalability to large repositories (by using on-disk databases instead of RAM for intermediate data). Conversion speed is not a primary goal of cvs2svn, and incremental conversions are not supported at all. cvs2svn requires filesystem access to the CVS repository (it parses the RCS files directly). cvsps is not a conversion tool at all, though it is used by other conversion tools to generate the changesets. It appears (I hope I am not misinterpreting things) to emphasize speed and incremental operation, for example attempting to make changesets consistent from one run to the next, even if the CVS repository has been changed prudently between runs. 
cvsps does not appear to attempt to create atomic branch and tag creation commits or handle CVS's special vendorbranch behavior. cvsps operates via the CVS protocol; you don't need filesystem access to the CVS repository. I can also point you to a list of cvs2svn features, which includes a list of some of the CVS quirks that it knows how to handle: http://cvs2svn.tigris.org/cvs2svn.html#features cvs2svn includes a large suite of perverse CVS repositories that we use for testing. Many of them are derived from real-life CVS repositories that people have had problems with. It would be very interesting to see how other conversion tools handle these repositories, but I don't expect to have time to do so in the near future. Michael ^ permalink raw reply [flat|nested] 40+ messages in thread
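Both kinds of tool have to reconstruct changesets, because CVS records commits one file at a time. A common heuristic groups file revisions that share an author and a log message and lie within a time window of each other. Below is a minimal Python sketch of that heuristic. It is not the actual algorithm of either cvsps or cvs2svn, and the 300-second window and all names are assumptions. Real converters must also respect per-file revision order and must not put two revisions of the same file into one changeset, which this sketch ignores.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class FileRev:
    path: str
    rev: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_changesets(revs, window=300):
    """Group per-file revisions into changesets: same author and log message,
    each revision within `window` seconds of the previous one in the group."""
    changesets = []
    ordered = sorted(revs, key=attrgetter("author", "log", "timestamp"))
    for _, same_msg in groupby(ordered, key=attrgetter("author", "log")):
        current = []
        for r in same_msg:
            if current and r.timestamp - current[-1].timestamp > window:
                changesets.append(current)  # gap too large: start a new changeset
                current = []
            current.append(r)
        changesets.append(current)
    # Emit changesets in chronological order of their first revision.
    changesets.sort(key=lambda cs: cs[0].timestamp)
    return changesets


revs = [
    FileRev("a.c", "1.2", "alice", "fix bug", 1000),
    FileRev("b.c", "1.5", "alice", "fix bug", 1060),
    FileRev("a.c", "1.3", "bob", "refactor", 1100),
    FileRev("c.c", "1.1", "alice", "fix bug", 5000),
]
for cs in group_changesets(revs):
    print([(r.path, r.rev) for r in cs])
# [('a.c', '1.2'), ('b.c', '1.5')]
# [('a.c', '1.3')]
# [('c.c', '1.1')]
```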
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 15:34 ` Michael Haggerty @ 2007-08-02 23:08 ` Martin Langhoff 2007-08-03 4:03 ` Johannes Schindelin ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Martin Langhoff @ 2007-08-02 23:08 UTC (permalink / raw) To: Michael Haggerty; +Cc: Guilhem Bonnefille, git, users On 8/3/07, Michael Haggerty <mhagger@alum.mit.edu> wrote: > cvsps is not a conversion tool at all, though it is used by other > conversion tools to generate the changesets. It appears (I hope I am > not misinterpreting things) to emphasize speed and incremental > operation, for example attempting to make changesets consistent from one > run to the next, even if the CVS repository has been changed prudently > between runs. cvsps does not appear to attempt to create atomic branch > and tag creation commits or handle CVS's special vendorbranch behavior. > cvsps operates via the CVS protocol; you don't need filesystem access > to the CVS repository. 100% in agreement. And though I can't claim to be happy with cvsps, in many scenarios it is mighty useful, in spite of its significant warts. The "does incrementals" feature is hugely important these days, as lots of people use git to run "vendor branches" of upstream projects that use CVS. To me, that's *the* killer-app feature of git. Of course, others see different aspects of git as their deal-maker. But I'm sure I'm not alone on this. Sure enough, others have written git-svn, which accomplishes this and more for those tracking SVN upstreams. Is there any way we can tweak cvs2svn to run incrementals, even if not as fast as cvsps/git-cvsimport? The "do it remotely" part can be worked around in most cases. cheers, martin ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 23:08 ` Martin Langhoff @ 2007-08-03 4:03 ` Johannes Schindelin 2007-08-03 6:48 ` Steffen Prohaska 2007-08-03 7:10 ` Steffen Prohaska 2007-08-03 8:36 ` Michael Haggerty 2 siblings, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-08-03 4:03 UTC
To: Martin Langhoff; +Cc: Michael Haggerty, Guilhem Bonnefille, git, users

Hi,

On Fri, 3 Aug 2007, Martin Langhoff wrote:

> On 8/3/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> > cvsps is not a conversion tool at all, though it is used by other conversion tools to generate the changesets. It appears (I hope I am not misinterpreting things) to emphasize speed and incremental operation, for example attempting to make changesets consistent from one run to the next, even if the CVS repository has been changed prudently between runs. cvsps does not appear to attempt to create atomic branch and tag creation commits or handle CVS's special vendor-branch behavior. cvsps operates via the CVS protocol; you don't need filesystem access to the CVS repository.
>
> 100% in agreement. And though I can't claim to be happy with cvsps, in many scenarios it is mighty useful, in spite of its significant warts. The "does incrementals" part is hugely important these days, as lots of people use git to run "vendor branches" of upstream projects that use CVS.

Me too: 100% agreement. A couple of people seem to be content to proclaim that their incomplete solutions are better, but at the end of the day, they are as bad as the programs they purport to replace: incomplete.

For the moment, I help myself by tracking the different branches individually, but there, really, git-cvsimport is as good as the other "solutions", with the further advantage that it is actually hackable, and not closed to everybody outside a very small community.

So I look forward to testing cvs2svn(git-branch) this weekend.

Ciao,
Dscho
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 4:03 ` Johannes Schindelin @ 2007-08-03 6:48 ` Steffen Prohaska 0 siblings, 0 replies; 40+ messages in thread
From: Steffen Prohaska @ 2007-08-03 6:48 UTC
To: Johannes Schindelin
Cc: Martin Langhoff, Michael Haggerty, Guilhem Bonnefille, git, users

On Aug 3, 2007, at 6:03 AM, Johannes Schindelin wrote:

> On Fri, 3 Aug 2007, Martin Langhoff wrote:
>
>> On 8/3/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> cvsps is not a conversion tool at all, though it is used by other conversion tools to generate the changesets. It appears (I hope I am not misinterpreting things) to emphasize speed and incremental operation, for example attempting to make changesets consistent from one run to the next, even if the CVS repository has been changed prudently between runs. cvsps does not appear to attempt to create atomic branch and tag creation commits or handle CVS's special vendor-branch behavior. cvsps operates via the CVS protocol; you don't need filesystem access to the CVS repository.
>>
>> 100% in agreement. And though I can't claim to be happy with cvsps, in many scenarios it is mighty useful, in spite of its significant warts. The "does incrementals" part is hugely important these days, as lots of people use git to run "vendor branches" of upstream projects that use CVS.
>
> Me too: 100% agreement. A couple of people seem to be content to proclaim that their incomplete solutions are better, but at the end of the day, they are as bad as the programs they purport to replace: incomplete.
>
> For the moment, I help myself by tracking the different branches individually, but there, really, git-cvsimport is as good as the other "solutions", with the further advantage that it is actually hackable, and not closed to everybody outside a very small community.

I just want to add a warning.

You should be suspicious of branches imported using git-cvsimport (which is based on cvsps). If the time the branch is created differs from the time of the first commit to the branch, git-cvsimport may get the branching point wrong. This introduces a race condition: someone may have committed changes to a file that is later changed on the branch. At that point the history of the imported branch is broken and git reports _wrong_ changesets.

I ran into this issue and abandoned the use of git-cvsimport. It's too dangerous for me. The testcase in [1] illustrates the problem. I still strongly believe the warning should be stated in *BOLD* in the documentation.

I'm not saying git-cvsimport is useless. But you should be suspicious about the result of the import, especially if you plan to rely on changesets derived from the imported repo, for example if you plan to do cherry-picking or merging in git, or if you plan to blame people for their stupid changes based on what you see in gitk (almost happened to me ;).

Steffen

[1] http://marc.info/?l=git&m=118260312708709&w=2
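The race described above can be illustrated with a deliberately simplified model. The sketch assumes a purely time-based guess (take each file's latest trunk revision before the branch's first commit) and compares it with the trunk state at the real branch creation time. The `Commit` type, both functions, and the timestamps are hypothetical; this is not the actual cvsps or git-cvsimport algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    timestamp: int  # illustrative seconds, not real CVS dates
    path: str
    revision: str   # CVS revision number on trunk, e.g. "1.2"


def trunk_state(trunk, cutoff):
    """Map each file to its latest trunk revision committed before `cutoff`."""
    state = {}
    for c in sorted(trunk, key=lambda c: c.timestamp):
        if c.timestamp < cutoff:
            state[c.path] = c.revision
    return state


def misplaced_files(trunk, branch_created_at, first_branch_commit_at):
    """Files whose guessed branch-point revision differs from the real one.

    The guess uses the time of the first commit on the branch (all that a
    time-based heuristic can see); the truth uses the branch creation time.
    Returns {path: (actual_revision, guessed_revision)}.
    """
    guessed = trunk_state(trunk, first_branch_commit_at)
    actual = trunk_state(trunk, branch_created_at)
    return {
        path: (actual.get(path), rev)
        for path, rev in guessed.items()
        if actual.get(path) != rev
    }


trunk = [
    Commit(100, "a.c", "1.1"),
    Commit(200, "b.c", "1.1"),
    # Committed to trunk after the branch was created (t=250) but before
    # anyone committed on the branch (t=400).
    Commit(300, "a.c", "1.2"),
]

print(misplaced_files(trunk, branch_created_at=250, first_branch_commit_at=400))
# {'a.c': ('1.1', '1.2')}
```

If a.c is later changed on the branch, a history built from the guess treats 1.2 as the branch's starting point for that file, so the first branch commit appears to undo the trunk change made in 1.2: a wrong changeset of exactly the kind Steffen warns about.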
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 23:08 ` Martin Langhoff 2007-08-03 4:03 ` Johannes Schindelin @ 2007-08-03 7:10 ` Steffen Prohaska 2007-08-03 8:36 ` Michael Haggerty 2 siblings, 0 replies; 40+ messages in thread
From: Steffen Prohaska @ 2007-08-03 7:10 UTC
To: Martin Langhoff; +Cc: Michael Haggerty, Guilhem Bonnefille, git, users

On Aug 3, 2007, at 1:08 AM, Martin Langhoff wrote:

> Is there any way we can tweak cvs2svn to run incrementals, even if not as fast as cvsps/git-cvsimport? The "do it remotely" part can be worked around in most cases.

What I currently do with parsecvs is to run complete imports again on the repo. For 'normal' changes to CVS, the old import can be fast-forwarded to the new import. However, if you add or remove files or tweak revisions in another abnormal way (cvs admin), this might fail. In this case I manually search for the last common commit and rebase the new commits onto the old, already imported branch. I need to do this if I have already published the imported branch. Otherwise I can just as well reset to the newly imported branch and rebase my work on top of it. Some careful validation (git diff-*) is included in my workflow.

A complete run of parsecvs is fine for me because it is so fast. I run git-filter-branch afterwards anyway to clean up some commit messages and author information. This takes most of the time, because it spawns off tons of subprocesses.

I'd not recommend my approach for incremental imports every hour, but you can run it every day (although I do it less often). You only need to validate the final result (fast-forward or not). The rest can be fully automated by some shell scripting.

Steffen
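The "fast-forward, or else find the last common commit" decision in this workflow can be sketched over a toy commit graph. This is a minimal illustration, assuming commits are identified by strings and `parents` maps each commit to its list of parents; in a real repository `git merge-base` does this job, and the function names here are hypothetical.

```python
def ancestors(tip, parents):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def classify_reimport(old_tip, new_tip, parents):
    """Compare a previously published import with a fresh full re-import.

    Returns ("fast-forward", None) when the old tip is part of the new
    history. Otherwise returns ("rebase", base), where `base` is the newest
    commit on the old branch's first-parent chain that the new history also
    contains (None if the two imports share nothing).
    """
    new_history = ancestors(new_tip, parents)
    if old_tip in new_history:
        return "fast-forward", None
    commit = old_tip
    while commit is not None and commit not in new_history:
        first_parents = parents.get(commit, [])
        commit = first_parents[0] if first_parents else None
    return "rebase", commit


# Toy history: the re-import rewrote everything after B (for example after
# an abnormal change made with `cvs admin`), producing C2 and D2.
parents = {
    "A": [], "B": ["A"], "C": ["B"],  # old, already published import
    "C2": ["B"], "D2": ["C2"],        # new full re-import
}
print(classify_reimport("B", "D2", parents))  # ('fast-forward', None)
print(classify_reimport("C", "D2", parents))  # ('rebase', 'B')
```

In the rebase case, the commits after `base` on the new import (here C2 and D2) are the ones to replay onto the published branch, followed by the kind of `git diff` validation Steffen mentions.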
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-02 23:08 ` Martin Langhoff 2007-08-03 4:03 ` Johannes Schindelin 2007-08-03 7:10 ` Steffen Prohaska @ 2007-08-03 8:36 ` Michael Haggerty 2007-08-03 14:35 ` Patwardhan, Rajesh 2 siblings, 1 reply; 40+ messages in thread
From: Michael Haggerty @ 2007-08-03 8:36 UTC
To: Martin Langhoff; +Cc: Guilhem Bonnefille, git, users

Martin Langhoff wrote:
> Is there any way we can tweak cvs2svn to run incrementals, even if not as fast as cvsps/git-cvsimport? The "do it remotely" part can be worked around in most cases.

I don't see any fundamental reason why not, but I think it would be a significant amount of work. There are two main issues:

1. With CVS, it is possible to change things retroactively, such as changing which version of a file is included in a tag, adding a new file to a tag, or changing whether a file is text vs. binary. And many people copy and/or rename files within the CVS repository itself (to get around CVS's inability to rename a file). This makes it look like the file has *always* existed under the new name and *never* existed under the old name. An incremental conversion tool would have to look carefully for such changes and either handle them properly or complain loudly and abort.

2. cvs2svn uses a lot of repository-wide information to make decisions about how to group CVSItems into changesets, and a lot of these decisions are based on heuristics. Incremental conversion would require that the decisions made in one cvs2svn run be recorded and treated as unalterable in subsequent runs.

This hasn't been a priority in the Subversion world, because, frankly, what reason would a person have to stick with CVS instead of switching to Subversion, given that (1) they are intentionally so similar in workflow, and (2) there is no significant competition from other centralized SCMs? But of course until the distributed SCM playing field has been thinned out a bit, people will probably be reluctant to commit to one or the other.

I don't expect to have time to implement incremental conversions in cvs2svn in the near future. (I'd much rather work on output back ends to other distributed SCMs.) But if any volunteers step forward (hint, hint) I would be happy to help them get started and answer their questions. I think that cvs2svn is quite hackable now, so the learning curve is hopefully much less frightening than when I started on the project :-)

Michael
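Both requirements in this message can be sketched together: persist each converted item's changeset along with a fingerprint of its CVS data, refuse to continue if a previously converted item has vanished or changed (issue 1), and run the grouping heuristics only on items that are new (issue 2). This is a hypothetical illustration of the idea, not cvs2svn code; `assign_changesets`, the state layout, the item ids and the fingerprint values are all assumptions.

```python
import json


def assign_changesets(items, saved, group_new):
    """Reuse changeset decisions from earlier runs (hypothetical sketch).

    items:     {item_id: fingerprint} for every CVS item seen in this run
    saved:     {item_id: {"changeset": ..., "fingerprint": ...}} from last run
    group_new: callable mapping a list of new item ids to {item_id: changeset},
               standing in for the repository-wide grouping heuristics
    """
    # Issue 1: anything converted earlier must still exist, unchanged.
    for item_id, record in saved.items():
        if item_id not in items:
            raise RuntimeError(f"{item_id} disappeared since the last run; aborting")
        if items[item_id] != record["fingerprint"]:
            raise RuntimeError(f"{item_id} was changed retroactively; aborting")

    # Issue 2: earlier decisions are kept as-is ...
    result = dict(saved)
    # ... and only items not seen before go through the heuristics.
    new_ids = sorted(set(items) - set(saved))
    for item_id, changeset in group_new(new_ids).items():
        result[item_id] = {"changeset": changeset, "fingerprint": items[item_id]}
    return result


# First run: everything is new.
run1 = assign_changesets(
    {"a.c:1.1": "f1", "b.c:1.1": "f2"}, {}, lambda ids: {i: 1 for i in ids}
)
# The state would be written out (e.g. with json.dump) and reloaded next time.
saved = json.loads(json.dumps(run1))

# Second run: one new revision appeared; old assignments are untouched.
run2 = assign_changesets(
    {"a.c:1.1": "f1", "b.c:1.1": "f2", "a.c:1.2": "f3"},
    saved,
    lambda ids: {i: 2 for i in ids},
)
print(run2["a.c:1.1"]["changeset"], run2["a.c:1.2"]["changeset"])  # 1 2
```

The design choice here is to fail loudly rather than repair: a renamed ,v file shows up as a vanished item, and a retroactive tag or keyword change shows up as a changed fingerprint.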
* RE: Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 8:36 ` Michael Haggerty @ 2007-08-03 14:35 ` Patwardhan, Rajesh 2007-08-03 15:41 ` Jon Smirl 0 siblings, 1 reply; 40+ messages in thread
From: Patwardhan, Rajesh @ 2007-08-03 14:35 UTC
To: Michael Haggerty, Martin Langhoff; +Cc: Guilhem Bonnefille, git, users

Hello Michael,

I will explain a scenario (we are going through this right now):

1) You have 10 years' worth of CVS data.
2) We want to move to SVN.
3) The repository move should be done in such a way that development is not hampered for even one work day.
4) We have at least 4 major modules in CVS, each of which currently takes about 30 - 40 hours to convert.
5) With incremental conversions we can do a few things:
   A) Keep the downtime for the hard cutoff minimal.
   B) Try out the SVN move for other auxiliary tools that are needed by the SCM process.
   C) Do some meaningful testing and validation, with simulated live moves of changes from CVS to SVN on a day-to-day basis, before the actual move.

Hopefully this substantiates the request for (and need for) incremental moves. Or if someone out there has a better suggestion for such scenarios, please point me in the right direction.

Regards,
Rajesh

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cvs2svn.tigris.org
For additional commands, e-mail: users-help@cvs2svn.tigris.org
* Re: Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 14:35 ` Patwardhan, Rajesh @ 2007-08-03 15:41 ` Jon Smirl 2007-08-03 16:42 ` Patwardhan, Rajesh 2007-08-03 18:58 ` Michael Haggerty 0 siblings, 2 replies; 40+ messages in thread
From: Jon Smirl @ 2007-08-03 15:41 UTC
To: Patwardhan, Rajesh
Cc: Michael Haggerty, Martin Langhoff, Guilhem Bonnefille, git, users

On 8/3/07, Patwardhan, Rajesh <rajesh.patwardhan@etrade.com> wrote:
>
> Hello Michael,
> I will explain a scenario (we are going through this right now):
> 1) You have 10 years' worth of CVS data.
> 2) We want to move to SVN.
> 3) The repository move should be done in such a way that development is not hampered for even one work day.
> 4) We have at least 4 major modules in CVS, each of which currently takes about 30 - 40 hours to convert.

There are known ways (that haven't been implemented) to get the 40 hr number down to 1/2 hour. Would that be a better approach than doing incremental imports?

> 5) With incremental conversions we can do a few things:
>    A) Keep the downtime for the hard cutoff minimal.
>    B) Try out the SVN move for other auxiliary tools that are needed by the SCM process.
>    C) Do some meaningful testing and validation, with simulated live moves of changes from CVS to SVN on a day-to-day basis, before the actual move.
>
> Hopefully this substantiates the request for (and need for) incremental moves. Or if someone out there has a better suggestion for such scenarios, please point me in the right direction.
>
> Regards,
> Rajesh

--
Jon Smirl
jonsmirl@gmail.com
* RE: Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 15:41 ` Jon Smirl @ 2007-08-03 16:42 ` Patwardhan, Rajesh 2007-08-03 18:58 ` Michael Haggerty 1 sibling, 0 replies; 40+ messages in thread
From: Patwardhan, Rajesh @ 2007-08-03 16:42 UTC
To: Jon Smirl
Cc: Michael Haggerty, Martin Langhoff, Guilhem Bonnefille, git, users

Thank you very much for the email. Yes, if the time for conversion can be brought down to 1/2 hour, that would be really great. We could run an automated cvs2svn every day for testing, and that way the maximum lag between CVS and the test SVN repository would be 1 day. Please do let me know when it is available.

Regards,
Rajesh
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 15:41 ` Jon Smirl 2007-08-03 16:42 ` Patwardhan, Rajesh @ 2007-08-03 18:58 ` Michael Haggerty 2007-08-03 20:16 ` Jon Smirl 1 sibling, 1 reply; 40+ messages in thread
From: Michael Haggerty @ 2007-08-03 18:58 UTC
To: Jon Smirl
Cc: Patwardhan, Rajesh, Martin Langhoff, Guilhem Bonnefille, git, users

[I set followup-to users@cvs2svn.tigris.org, since this has nothing to do with git.]

Jon Smirl wrote:
> On 8/3/07, Patwardhan, Rajesh <rajesh.patwardhan@etrade.com> wrote:
>> Hello Michael,
>> I will explain a scenario (we are going through this right now):
>> 1) You have 10 years' worth of CVS data.
>> 2) We want to move to SVN.
>> 3) The repository move should be done in such a way that development is not hampered for even one work day.
>> 4) We have at least 4 major modules in CVS, each of which currently takes about 30 - 40 hours to convert.
>
> There are known ways (that haven't been implemented) to get the 40 hr number down to 1/2 hour. Would that be a better approach than doing incremental imports?

Jon, I would like very much to hear how you propose to get a 60-fold speed increase in cvs2svn. I've never heard of any plausible way to accomplish anything even close to this.

Please note that the user wants to convert to Subversion, not git. But even converting to git, I don't think that such speeds are possible without massive changes that would include processing everything in RAM and switching large parts of cvs2svn from Python to a compiled language.

Michael
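For reference, the speed-ups implied by the figures quoted in this exchange (30 to 40 hours per module today, versus Jon's proposed 1/2 hour):

```latex
\frac{30\,\mathrm{h}}{0.5\,\mathrm{h}} = 60, \qquad \frac{40\,\mathrm{h}}{0.5\,\mathrm{h}} = 80
```

So the 60-fold figure corresponds to the low end of Rajesh's range; the upper end would require an 80-fold speed-up.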
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 18:58 ` Michael Haggerty @ 2007-08-03 20:16 ` Jon Smirl 2007-08-03 20:27 ` Jon Smirl 0 siblings, 1 reply; 40+ messages in thread
From: Jon Smirl @ 2007-08-03 20:16 UTC
To: Michael Haggerty
Cc: Patwardhan, Rajesh, Martin Langhoff, Guilhem Bonnefille, git, users

On 8/3/07, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> [I set followup-to users@cvs2svn.tigris.org, since this has nothing to do with git.]
>
> Jon Smirl wrote:
>> On 8/3/07, Patwardhan, Rajesh <rajesh.patwardhan@etrade.com> wrote:
>>> Hello Michael,
>>> I will explain a scenario (we are going through this right now):
>>> 1) You have 10 years' worth of CVS data.
>>> 2) We want to move to SVN.
>>> 3) The repository move should be done in such a way that development is not hampered for even one work day.
>>> 4) We have at least 4 major modules in CVS, each of which currently takes about 30 - 40 hours to convert.
>>
>> There are known ways (that haven't been implemented) to get the 40 hr number down to 1/2 hour. Would that be a better approach than doing incremental imports?
>
> Jon, I would like very much to hear how you propose to get a 60-fold speed increase in cvs2svn. I've never heard of any plausible way to accomplish anything even close to this.
>
> Please note that the user wants to convert to Subversion, not git. But even converting to git, I don't think that such speeds are possible without massive changes that would include processing everything in RAM and switching large parts of cvs2svn from Python to a compiled language.

Make a bulk importer for SVN like git-fast-import. I measured some SVN imports, and the bulk of the time was spent forking off SVN. Before git-fast-import, it would have taken git two weeks to import Mozilla CVS.

--
Jon Smirl
jonsmirl@gmail.com
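To make "a bulk importer like git-fast-import" concrete: the importer reads one stream of commands from a single producer instead of being invoked once per revision. The sketch below writes one blob and one commit in git-fast-import's input format (the `blob`, `mark`, `data`, `commit`, `committer` and `M` commands). The helper names, ref, path, author and timestamp are made up for illustration, and nothing here implies that an equivalent bulk importer for Subversion exists.

```python
import io
import sys


def data_cmd(payload: bytes) -> bytes:
    """A fast-import 'data' command in the exact-byte-count form."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def write_revision(out, mark, ref, path, content, name, email, when, message):
    """Write one blob and one commit that stores it at `path`.

    The blob gets mark `mark` and the commit gets `mark + 1`, so later
    commands can refer to either without knowing its SHA-1. With no `from`
    line, a commit to a branch already created earlier in the same stream
    builds on that branch's current tip.
    """
    out.write(b"blob\nmark :%d\n" % mark)
    out.write(data_cmd(content))
    out.write(b"commit %s\nmark :%d\n" % (ref.encode(), mark + 1))
    out.write(b"committer %s <%s> %d +0000\n" % (name.encode(), email.encode(), when))
    out.write(data_cmd(message.encode()))
    out.write(b"M 100644 :%d %s\n\n" % (mark, path.encode()))


def render_revision(*args):
    """Convenience wrapper returning the bytes instead of streaming them."""
    buf = io.BytesIO()
    write_revision(buf, *args)
    return buf.getvalue()


if __name__ == "__main__":
    # Usage: python make_stream.py | git fast-import   (inside a git repository)
    write_revision(sys.stdout.buffer, 1, "refs/heads/master", "hello.txt",
                   b"hello\n", "A U Thor", "author@example.com",
                   1186171000, "Initial import")
```

Every revision of every file becomes a few more lines in the same stream, so the cost of starting the importer is paid once per conversion rather than once per revision.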
* Re: cvs2svn conversion directly to git ready for experimentation 2007-08-03 20:16 ` Jon Smirl @ 2007-08-03 20:27 ` Jon Smirl 0 siblings, 0 replies; 40+ messages in thread
From: Jon Smirl @ 2007-08-03 20:27 UTC
To: Michael Haggerty
Cc: Patwardhan, Rajesh, Martin Langhoff, Guilhem Bonnefille, git, users

On 8/3/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> Make a bulk importer for SVN like git-fast-import. I measured some SVN imports, and the bulk of the time was spent forking off SVN. Before git-fast-import, it would have taken git two weeks to import Mozilla CVS.

And add a CVS parser to cvs2svn. Use the one I posted or write it again. Fork is not a very fast operation; millions of forks take a week to run. In the cvs2git code I did, there was one process running cvs2svn, and it parsed the CVS files internally. A second process ran git-fast-import. Nothing else was forked.

When I first started, we were forking both git and cvs. When I ran oprofile on it, 95% of the CPU time was being spent in the kernel. Linus helped me figure out what was going on: it was the overhead of the page table copies associated with millions of forks that was taking so long. The solution is to eliminate the forks. My first try, with forks for both cvs and git, took about a week to import Mozilla CVS. After all the forks were eliminated, I could import Mozilla CVS in four hours.

--
Jon Smirl
jonsmirl@gmail.com
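The architectural point, one long-lived consumer fed through a pipe instead of a new process per item, can be sketched with ordinary child processes. This is an illustration under assumptions: the children are throwaway Python interpreters standing in for cvs and git, each "item" is just a string whose length the child reports, and the per-spawn cost measured here includes interpreter start-up, so it is not a measurement of the fork and page-table overhead Jon profiled.

```python
import subprocess
import sys
import time

# Child for the "one long-lived process" case: reads one item per line on
# stdin and writes back its length, until stdin is closed.
STREAMING_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(str(len(line.rstrip('\\n'))) + '\\n')\n"
)

# Child for the "process per item" case: handles one item given as argv[1].
ONE_SHOT_CHILD = "import sys; print(len(sys.argv[1]))"


def process_per_item(items):
    """Start a new interpreter for every item (the slow pattern)."""
    results = []
    for item in items:
        out = subprocess.run(
            [sys.executable, "-c", ONE_SHOT_CHILD, item],
            capture_output=True, text=True, check=True,
        ).stdout
        results.append(int(out))
    return results


def single_pipe(items):
    """Start one interpreter and stream every item through its stdin."""
    proc = subprocess.run(
        [sys.executable, "-c", STREAMING_CHILD],
        input="".join(item + "\n" for item in items),
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in proc.stdout.splitlines()]


if __name__ == "__main__":
    items = [f"file{i}.c,v" for i in range(50)]
    for fn in (process_per_item, single_pipe):
        start = time.perf_counter()
        fn(items)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

Both functions compute the same results; only the number of processes differs. With millions of CVS revisions, the per-item variant pays the process-creation cost millions of times, which is the effect described above.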
end of thread, other threads: [~2007-08-05 7:58 UTC | newest]

Thread overview: 40+ messages
2007-08-01 0:09 cvs2svn conversion directly to git ready for experimentation Michael Haggerty
2007-08-01 0:41 ` Johannes Schindelin
2007-08-01 22:09 ` Jakub Narebski
2007-08-02 16:58 ` Michael Haggerty
2007-08-02 23:44 ` Jon Smirl
2007-08-02 8:49 ` Steffen Prohaska
2007-08-02 17:23 ` Michael Haggerty
2007-08-02 19:22 ` Marko Macek
2007-08-02 23:59 ` Jon Smirl
2007-08-05 7:58 ` Oswald Buddenhagen
2007-08-02 17:35 ` Simon 'corecode' Schubert
2007-08-02 19:13 ` Steffen Prohaska
2007-08-02 19:29 ` Simon 'corecode' Schubert
2007-08-02 20:21 ` Robin Rosenberg
[not found] ` <200708022221.13129.robin.rosenberg.lists-RgPrefM1rjDQT0dZR+AlfA@public.gmane.org>
2007-08-02 20:31 ` Lübbe Onken
2007-08-02 20:32 ` Lübbe Onken
2007-08-02 20:33 ` Lübbe Onken
2007-08-02 22:02 ` Steffen Prohaska
2007-08-02 22:50 ` Simon 'corecode' Schubert
2007-08-02 23:50 ` Michael Haggerty
2007-08-03 8:40 ` Simon 'corecode' Schubert
2007-08-04 8:28 ` Steffen Prohaska
2007-08-03 3:07 ` Shawn O. Pearce
2007-08-02 23:37 ` Michael Haggerty
2007-08-02 20:43 ` Linus Torvalds
2007-08-02 23:19 ` Michael Haggerty
2007-08-03 3:12 ` Shawn O. Pearce
2007-08-02 23:55 ` Jon Smirl
[not found] ` <8b65902a0708010438s24d16109k601b52c04cf9c066@mail.gmail.com>
2007-08-02 15:34 ` Michael Haggerty
2007-08-02 23:08 ` Martin Langhoff
2007-08-03 4:03 ` Johannes Schindelin
2007-08-03 6:48 ` Steffen Prohaska
2007-08-03 7:10 ` Steffen Prohaska
2007-08-03 8:36 ` Michael Haggerty
2007-08-03 14:35 ` Patwardhan, Rajesh
2007-08-03 15:41 ` Jon Smirl
2007-08-03 16:42 ` Patwardhan, Rajesh
2007-08-03 18:58 ` Michael Haggerty
2007-08-03 20:16 ` Jon Smirl
2007-08-03 20:27 ` Jon Smirl