Some tips for doing a CVS importer

* Some tips for doing a CVS importer
@ 2006-11-20 21:49 Jon Smirl
  2006-11-20 23:03 ` Martin Langhoff
  2006-11-27 11:24 ` Michael Haggerty
  0 siblings, 2 replies; 30+ messages in thread
From: Jon Smirl @ 2006-11-20 21:49 UTC (permalink / raw)
  To: Git Mailing List

I have tried all of the available CVS importers. None of them are
without problems. If anyone is interested in writing one for git here
are some ideas on how to structure it.

1) There is a working lex/yacc grammar for CVS in the parsecvs source code.
2) The first time you parse a CVS file, record everything and don't
parse it again.
3) When the file is first parsed, use the deltas to generate the
revisions and feed them to git-fastimport; just remember the SHA1 or
an id in the import code. This is a critical step to getting decent
performance.
4) If you do #1 and #2 you don't need to store CVS revision numbers
and file names in memory. Because of that you can easily do a
Mozilla import in 2GB, probably 1GB.
5) When comparing CVS revisions only use the CVS timestamps as a last
resort, instead use the dependency information in the CVS file
6) Match up commits by using an sha1 of the author and commit message
7) After all files are loaded, match up the symbols and insert them
into the dependency chains. If any of the symbols depend on a branch
commit, the symbol lies on the branch; otherwise the symbol is on the
trunk.
8) Do a topological sort to build the change set commit tree
9) When you hit a loop in the tree, break up delta change sets until
the loop can be removed; don't break up symbol change sets.
10) Mozilla has some large commits that were made over dial up. Commit
change sets can span hours. All of these commits need to be merged
into a single change set.
11) An algorithm needs to be developed for detecting branches merging
back into the trunk
12) cvs2svn has excellent test cases, use them to test the new
importer. The cvs2svn code is quite nice but it doesn't handle #7
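
As a rough illustration of tip 6, here is a minimal Python sketch that
buckets per-file revisions by a SHA-1 of author and log message. The
FileRevision record and its field names are hypothetical, not taken from
any existing importer, and the sketch deliberately ignores timestamps,
which tip 5 treats as a last resort. A real importer would still have to
split a bucket using the dependency information from the CVS files.

```python
import hashlib
from collections import defaultdict
from typing import NamedTuple


class FileRevision(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch, unused for grouping


def commit_key(author: str, message: str) -> str:
    """Return a SHA-1 hex digest over author and log message."""
    h = hashlib.sha1()
    h.update(author.encode("utf-8"))
    h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    h.update(message.encode("utf-8"))
    return h.hexdigest()


def group_revisions(revisions):
    """Bucket file revisions that share an author and a log message."""
    groups = defaultdict(list)
    for rev in revisions:
        groups[commit_key(rev.author, rev.message)].append(rev)
    return dict(groups)


revs = [
    FileRevision("a.c", "jon", "fix build", 100),
    FileRevision("b.c", "jon", "fix build", 7300),  # hours later, same commit
    FileRevision("c.c", "martin", "fix build", 120),
]
groups = group_revisions(revs)
```

Because the key ignores time, the two "jon" revisions land in one bucket
even though they are two hours apart, which is the behaviour tip 10 asks
for with slow dial-up commits.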

-- 
Jon Smirl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-20 21:49 Some tips for doing a CVS importer Jon Smirl
@ 2006-11-20 23:03 ` Martin Langhoff
  2006-11-20 23:37   ` Jon Smirl
  2006-11-27 11:24 ` Michael Haggerty
  1 sibling, 1 reply; 30+ messages in thread
From: Martin Langhoff @ 2006-11-20 23:03 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List

On 11/21/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> I have tried all of the available CVS importers. None of them are
> without problems. If anyone is interested in writing one for git here
> are some ideas on how to structure it.

Hi Jon,

I gather this means that the cvs2svn track hasn't been as productive
as expected. Any remaining/unsolvable issues with it? I have been
chronically busy on my e-learning projects, but I don't rule out coming
back to this when I have some time.

cheers,

martin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-20 23:03 ` Martin Langhoff
@ 2006-11-20 23:37   ` Jon Smirl
  2006-11-21  0:29     ` Martin Langhoff
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2006-11-20 23:37 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Git Mailing List

On 11/20/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 11/21/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > I have tried all of the available CVS importers. None of them are
> > without problems. If anyone is interested in writing one for git here
> > are some ideas on how to structure it.
>
> Hi Jon,
>
> I gather this means that the cvs2svn track hasn't been as productive
> as expected. Any remaining/unsolvable issues with it? I have been
> chronically busy on my e-learning projects, but don't discard coming
> back to this when I have some time.

Look in this thread
[Fwd: Re: What's in git.git]

There is a message in there that explains a problem that the cvs2svn
people aren't going to fix and it kills git.


>
> cheers,
>
>
>
> martin
>


-- 
Jon Smirl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-20 23:37   ` Jon Smirl
@ 2006-11-21  0:29     ` Martin Langhoff
  2006-11-21  0:55       ` Carl Worth
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Martin Langhoff @ 2006-11-21  0:29 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List

On 11/21/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > I gather this means that the cvs2svn track hasn't been as productive
> > as expected. Any remaining/unsolvable issues with it? I have been
> > chronically busy on my e-learning projects, but don't discard coming
> > back to this when I have some time.
>
> Look in this thread
> [Fwd: Re: What's in git.git]
>
> There is a message in there that explains a problem that the cvs2svn
> people aren't going to fix and it kills git.

I see - thanks for the pointer. Sorry to hear others in the Moz
project weren't so keen on hearing about alternatives to SVN. Long
term only something like GIT seems viable for such a large project (in
terms of community, branches/subprojects and codebase).

Two remaining questions
 - Where can I get your latest code? :-)
 - I gather the moz cvs repo has some cases that require getting the
symbol resolution right. Could this be performed as an extra pass /
task?

Eventually the Moz crowd will outgrow SVN - perhaps we should be
parsing the SVN dump format instead ;-)

cheers,

martin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  0:29     ` Martin Langhoff
@ 2006-11-21  0:55       ` Carl Worth
  2006-11-21  1:40         ` Jon Smirl
  2006-11-21  1:53       ` Jon Smirl
  2006-11-21  6:43       ` Shawn Pearce
  2 siblings, 1 reply; 30+ messages in thread
From: Carl Worth @ 2006-11-21  0:55 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Jon Smirl, Git Mailing List

On Tue, 21 Nov 2006 13:29:20 +1300, "Martin Langhoff" wrote:
> I see - thanks for the pointer. Sorry to hear others in the Moz
> project weren't so keen on hearing about alternatives to SVN. Long
> term only something like GIT seems viable for such a large project (in
> terms of community, branches/subprojects and codebase).
...
> Eventually the Moz crowd will outgrow SVN - perhaps we should be
> parsing the SVN dump format instead ;-)

From what I understand, mozilla is currently using CVS and is looking
to replace that. The remaining options being considered are bzr and
hg, (git having been discarded due to the lack of a "native" win32
client---the cygwin stuff is apparently not considered viable for
whatever reason).

-Carl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  0:55       ` Carl Worth
@ 2006-11-21  1:40         ` Jon Smirl
  2006-11-21  6:39           ` Shawn Pearce
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2006-11-21  1:40 UTC (permalink / raw)
  To: Carl Worth; +Cc: Martin Langhoff, Git Mailing List

On 11/20/06, Carl Worth <cworth@cworth.org> wrote:
> On Tue, 21 Nov 2006 13:29:20 +1300, "Martin Langhoff" wrote:
> > I see - thanks for the pointer. Sorry to hear others in the Moz
> > project weren't so keen on hearing about alternatives to SVN. Long
> > term only something like GIT seems viable for such a large project (in
> > terms of community, branches/subprojects and codebase).
> ...
> > Eventually the Moz crowd will outgrow SVN - perhaps we should be
> > parsing the SVN dump format instead ;-)
>
> From what I understand, mozilla is currently using CVS and is looking
> to replace that. The remaining options being considered are bzr and
> hg, (git having been discarded due to the lack of a "native" win32

Brendan said SVN is likely for the main Mozilla repo and monotone for
the new Mozilla 2 work. The lack of a native win32 port caused git to
be immediately discarded.

> client---the cygwin stuff is apparently not considered viable for
> whatever reason).
>
> -Carl
>
>
>


-- 
Jon Smirl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  0:29     ` Martin Langhoff
  2006-11-21  0:55       ` Carl Worth
@ 2006-11-21  1:53       ` Jon Smirl
  2006-11-26 10:18         ` Marko Macek
  2006-11-21  6:43       ` Shawn Pearce
  2 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2006-11-21  1:53 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Git Mailing List

On 11/20/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 11/21/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > > I gather this means that the cvs2svn track hasn't been as productive
> > > as expected. Any remaining/unsolvable issues with it? I have been
> > > chronically busy on my e-learning projects, but don't discard coming
> > > back to this when I have some time.
> >
> > Look in this thread
> > [Fwd: Re: What's in git.git]
> >
> > There is a message in there that explains a problem that the cvs2svn
> > people aren't going to fix and it kills git.
>
> I see - thanks for the pointer. Sorry to hear others in the Moz
> project weren't so keen on hearing about alternatives to SVN. Long
> term only something like GIT seems viable for such a large project (in
> terms of community, branches/subprojects and codebase).
>
> Two remaining questions
>  - Where can I get your latest code? :-)

I gave up on my cvs2git code; cvs2svn has been refactored so much
that tracking it was too much trouble. It would be easier to write it
again. Most of the smarts of the import process are in the
git-fastimport code, which Shawn has. cvs2svn underwent a major
algorithm change after I wrote the first version of cvs2git.

I can probably find the code if you really want it, but it will be
leading you off in the wrong direction.

>  - I gather the moz cvs repo has some cases that require getting the
> symbol resolution right. Could this be performed as an extra pass /
> task?

Processing the symbols is integral to deciding how to build the change
sets. Right now cvs2svn ignores the symbol dependency information and
builds the change sets in a way that forces the mini-branches. That
causes 60% of the 2,000 symbols in Mozilla CVS to end up as little
branches. Look at the three-commit example in the other thread to see
exactly what the problem is.

SVN hides the mini branch by creating a symbol like this:

Symbol XXX, change set 70
copy All from change set 50
copy file A from change set 55
copy file B,C from change set 60
copy file D from change set 61
copy file E,F,G from change set 63
copy file H from change set 67

It has to do all of those copies because the change sets weren't
constructed while taking symbol dependency information into account.

Symbol XXX can't copy from change set 69 because commits from after
the symbol was created are included in change sets 51-69.
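
To make the condition concrete, here is a small Python sketch of the
check implied above: a symbol can be a single copy of one change set
only if no tagged file was modified again between its tagged revision
and the newest change set the symbol needs. The function name and the
two dictionaries are assumptions made for illustration; this is not how
cvs2svn represents things internally.

```python
def simple_copy_source(tagged, touched):
    """Return the change set a symbol can be copied from in one step.

    tagged:  path -> change set number holding the tagged revision
    touched: path -> list of change set numbers that modify the path
    Returns None when a single copy would drag in later commits.
    Files that do not carry the symbol are ignored in this sketch.
    """
    latest = max(tagged.values())
    for path, tagged_cs in tagged.items():
        # A modification after the tagged revision but not after
        # 'latest' would leak into a plain copy of change set 'latest'.
        if any(tagged_cs < cs <= latest for cs in touched[path]):
            return None
    return latest


clean = simple_copy_source(
    {"A": 55, "B": 60},
    {"A": [40, 55], "B": [60]},
)
leaky = simple_copy_source(
    {"A": 55, "B": 60},
    {"A": [40, 55, 58], "B": [60]},  # A changed again in change set 58
)
```

In the first call the symbol is a simple copy of change set 60. In the
second, file A was committed again in change set 58, so a plain copy
would include a revision made after the tag and the function returns
None, which corresponds to the per-file copies listed above.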

> Eventually the Moz crowd will outgrow SVN - perhaps we should be
> parsing the SVN dump format instead ;-)
>
> cheers,
>
>
> martin
>

-- 
Jon Smirl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  1:40         ` Jon Smirl
@ 2006-11-21  6:39           ` Shawn Pearce
  2006-11-21 19:56             ` lamikr
  2006-11-21 20:03             ` Petr Baudis
  0 siblings, 2 replies; 30+ messages in thread
From: Shawn Pearce @ 2006-11-21  6:39 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Carl Worth, Martin Langhoff, Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> brendan said SVN is likely for the main Mozilla repo and monotone for
> the new Mozilla 2 work. No native win32 caused git to be immediately
> discarded.

Yea, that lack of native win32 seems to be one of a number of
blockers for people switching their projects onto Git.

I think there are a number of issues that are keeping people from
switching to Git and are instead causing them to choose SVN, hg
or Monotone:

  - No GUI.
  - No native win32 installation.
  - CVS import fails on some projects (e.g. Mozilla).
  - Confusing documentation.
  - pull/merge debate.
  - Fear of hash conflicts corrupting a repository.

I think Junio has solved the pull/merge debate issue.  We've talked
the hash conflict issue to death, but some new people still haven't
read those threads (or won't believe them).  I know people are trying
to work on improving the documentation, but there is obviously still
room for improvements.

Right now I'm trying to work on the no GUI problem with git-gui...

-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  0:29     ` Martin Langhoff
  2006-11-21  0:55       ` Carl Worth
  2006-11-21  1:53       ` Jon Smirl
@ 2006-11-21  6:43       ` Shawn Pearce
  2 siblings, 0 replies; 30+ messages in thread
From: Shawn Pearce @ 2006-11-21  6:43 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Jon Smirl, Git Mailing List

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Eventually the Moz crowd will outgrow SVN - perhaps we should be
> parsing the SVN dump format instead ;-)

It's a mess.  :)

Jon and I considered using the SVN dump format to feed git-fastimport
but decided against it.  It's a pretty horrible format, especially in
how it handles branches, tags, and file data.

Fortunately SVN has a C library which parses the file for you.
That means the best way to read the SVN dump format is probably
to write a program which links against the SVN library and translates
the dump into the data structures used internally by git-fastimport to
generate an initial pack file, then repack that after the import
to get good compression.
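
For readers who have not seen what a frontend hands to the importer,
here is a minimal Python sketch that builds one blob and one commit.
It assumes the text stream syntax of present-day git fast-import (blob,
mark, data, commit and M lines); the 2006 git-fastimport prototype
discussed in this thread may have differed. The helper names and the
example committer address are made up for illustration.

```python
def blob(mark: int, content: bytes) -> bytes:
    """Emit a blob command; 'data' carries an exact byte count."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)


def commit(mark, ref, name, email, when, message: bytes, files) -> bytes:
    """Emit a commit on 'ref' whose files refer to earlier blob marks.

    files is a list of (path, blob_mark) pairs; every file is written
    as a regular non-executable file (mode 100644) in this sketch.
    """
    parts = [
        b"commit %s\n" % ref.encode(),
        b"mark :%d\n" % mark,
        b"committer %s <%s> %d +0000\n" % (name.encode(), email.encode(), when),
        b"data %d\n%s\n" % (len(message), message),
    ]
    for path, blob_mark in files:
        parts.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode()))
    parts.append(b"\n")
    return b"".join(parts)


stream = blob(1, b"hello\n") + commit(
    2, "refs/heads/master", "Importer", "importer@example.invalid",
    1164059340, b"initial import", [("README", 1)],
)
```

The resulting bytes could be piped to "git fast-import" inside an empty
repository. Remembering only the mark (or the SHA1) per revision is
what keeps the frontend's memory use small.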

-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  6:39           ` Shawn Pearce
@ 2006-11-21 19:56             ` lamikr
  2006-11-21 20:05               ` Shawn Pearce
  2006-11-21 20:03             ` Petr Baudis
  1 sibling, 1 reply; 30+ messages in thread
From: lamikr @ 2006-11-21 19:56 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

Shawn Pearce wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
>   
>> brendan said SVN is likely for the main Mozilla repo and monotone for
>> the new Mozilla 2 work. No native win32 caused git to be immediately
>> discarded.
>>     
>
> Yea, that lack of native win32 seems to be one of a number of
> blockers for people switching their projects onto Git.
>
> I think there's a number of issues that are keeping people from
> switching to Git and are instead causing them to choose SVN, hg
> or Monotone:
>
>   - No GUI.
>   
QGIT allows using some commands. I plan to try out the GIT Eclipse
plugin in the near future myself.
This mailing list has some discussion of it and a download link to its
repo in the archives.
(title: Java GIT/Eclipse GIT version 0.1.1)

>   - No native win32 installation.
>   - CVS import fails on some projects (e.g. Mozilla).
>   
Well, committing the files from Mozilla CVS to SVN also has its own problems.
SVN accepts only text files which have either "Unix" or DOS style
line endings.
If a file contains both, with some lines ending the "Unix" way and others
the DOS way, SVN rolls
back the commit and you need tools like "dos2unix" or "unix2dos" to
fix those files.
(And blindly converting everything to one of the two is probably not a good idea.)
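
Such files can at least be found before the conversion starts. The
following Python sketch reports which line-ending styles occur in a
file's bytes; the function names are made up for illustration, and the
sketch only detects the mix, it does not try to decide which style is
the right one.

```python
def line_ending_styles(data: bytes) -> set:
    """Return the subset of {"lf", "crlf"} that occurs in data."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    styles = set()
    if crlf:
        styles.add("crlf")
    if lf:
        styles.add("lf")
    return styles


def has_mixed_endings(data: bytes) -> bool:
    """True when both Unix and DOS line endings appear in data."""
    return len(line_ending_styles(data)) > 1


unix_only = has_mixed_endings(b"one\ntwo\n")
mixed = has_mixed_endings(b"one\ntwo\r\nthree\n")
```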


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  6:39           ` Shawn Pearce
  2006-11-21 19:56             ` lamikr
@ 2006-11-21 20:03             ` Petr Baudis
  2006-11-21 20:15               ` Shawn Pearce
                                 ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Petr Baudis @ 2006-11-21 20:03 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

On Tue, Nov 21, 2006 at 07:39:35AM CET, Shawn Pearce wrote:
> Jon Smirl <jonsmirl@gmail.com> wrote:
> > brendan said SVN is likely for the main Mozilla repo and monotone for
> > the new Mozilla 2 work. No native win32 caused git to be immediately
> > discarded.
> 
> Yea, that lack of native win32 seems to be one of a number of
> blockers for people switching their projects onto Git.

Yep. :-(

> I think there's a number of issues that are keeping people from
> switching to Git and are instead causing them to choose SVN, hg
> or Monotone:
> 
>   - No GUI.

It has been my impression that Git's situation is far better than in
case of the other systems (except SVN: TortoiseSVN and RapidSVN). Is
that not so?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The meaning of Stonehenge in Traflamadorian, when viewed from above, is:
"Replacement part being rushed with all possible speed."

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 19:56             ` lamikr
@ 2006-11-21 20:05               ` Shawn Pearce
  2006-11-23 19:45                 ` Robin Rosenberg
  0 siblings, 1 reply; 30+ messages in thread
From: Shawn Pearce @ 2006-11-21 20:05 UTC (permalink / raw)
  To: lamikr; +Cc: Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

lamikr <lamikr@cc.jyu.fi> wrote:
> Shawn Pearce wrote:
> >   - No GUI.
> >   
> QGIT allows using some commands. I plan to try out the GIT eclipse
> plugin in near future myself.
> This mail list have some discussion and download link to it's repo in
> archives.
> (title: Java GIT/Eclipse GIT version 0.1.1, )

I'm the author of that plugin.  :-)

It's not even capable of making a commit yet.  The underlying plumbing
(aka jgit) can make commits but the Eclipse GUI has no function to
actually invoke that plumbing and make a commit to the repository.

The Eclipse plugin has apparently been a low priority for me.
I haven't worked on it very recently.  Robin Rosenberg has supposedly
gotten the revision compare interface to work, but it's slow as a
duck in November due to jgit's pack reading code not running as
fast as it should.

-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 20:03             ` Petr Baudis
@ 2006-11-21 20:15               ` Shawn Pearce
  2006-11-21 20:22               ` Johannes Schindelin
  2006-11-21 20:40               ` Martin Langhoff
  2 siblings, 0 replies; 30+ messages in thread
From: Shawn Pearce @ 2006-11-21 20:15 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

Petr Baudis <pasky@suse.cz> wrote:
> On Tue, Nov 21, 2006 at 07:39:35AM CET, Shawn Pearce wrote:
> > I think there's a number of issues that are keeping people from
> > switching to Git and are instead causing them to choose SVN, hg
> > or Monotone:
> > 
> >   - No GUI.
> 
> It has been my impression that Git's situation is far better than in
> case of the other systems (except SVN: TortoiseSVN and RapidSVN). Is
> that not so?

Hmm.

hg has a browser (hgk).  It's a direct port of gitk.  I don't see
a GUI otherwise, such as qgit or git-gui.  They do however have a
Windows installer.

Monotone has mtsh and guitone.  Neither appears to be as far along
as say qgit or even git-gui, which isn't that far along at all.

So I guess you are right.  Git's situation is better than that
of hg or Monotone.  Now if only I can finish everything I want
to put into git-gui, and get it included as part of the core Git
distribution.  :)

-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 20:03             ` Petr Baudis
  2006-11-21 20:15               ` Shawn Pearce
@ 2006-11-21 20:22               ` Johannes Schindelin
  2006-11-23  9:10                 ` Johannes Sixt
  2006-11-21 20:40               ` Martin Langhoff
  2 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2006-11-21 20:22 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Shawn Pearce, Jon Smirl, Carl Worth, Martin Langhoff,
	Git Mailing List

Hi,

On Tue, 21 Nov 2006, Petr Baudis wrote:

> On Tue, Nov 21, 2006 at 07:39:35AM CET, Shawn Pearce wrote:
> > 
> > Yea, that lack of native win32 seems to be one of a number of
> > blockers for people switching their projects onto Git.
> 
> Yep. :-(

I started playing with MinGW, and got it to compile and run, with some 
features lacking. See

Message-ID: <Pine.LNX.4.63.0609021724110.28360@wbgn013.biozentrum.uni-wuerzburg.de>

for details. From TFM

: The two biggest obstacles are fork() and the network stuff (I do not 
: plan on supporting Git.pm there). To overcome the absence of fork() I 
: wanted to use the subprocess stuff in MinGW's port of GNU make.

> > I think there's a number of issues that are keeping people from
> > switching to Git and are instead causing them to choose SVN, hg
> > or Monotone:
> > 
> >   - No GUI.
> 
> It has been my impression that Git's situation is far better than in
> case of the other systems (except SVN: TortoiseSVN and RapidSVN). Is
> that not so?

I also started playing with writing a shell extension (this is what custom 
context menu entries are called in Windows) using only MinGW, and no 
payware (except, of course, Windows).

Since both of these little projects were sidetracks from what I am really 
supposed to do, I will not be able to continue on these on a regular 
basis. Get somebody else interested, though, and I will be glad to help!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 20:03             ` Petr Baudis
  2006-11-21 20:15               ` Shawn Pearce
  2006-11-21 20:22               ` Johannes Schindelin
@ 2006-11-21 20:40               ` Martin Langhoff
  2 siblings, 0 replies; 30+ messages in thread
From: Martin Langhoff @ 2006-11-21 20:40 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Shawn Pearce, Jon Smirl, Carl Worth, Git Mailing List

On 11/22/06, Petr Baudis <pasky@suse.cz> wrote:
> >   - No GUI.
>
> It has been my impression that Git's situation is far better than in
> case of the other systems (except SVN: TortoiseSVN and RapidSVN). Is
> that not so?

I think GIT is in pretty good shape on all the items Shawn
lists except the Win32 port.

     Confusing doco? All of them ;-)
     Push/pull terminology confusion -- all of them again.

My only thing is that I continue to teach Cogito instead of GIT
because the index is a great thing for a top-level maintainer of a
large project but it really offers next to nothing to a user
who just wants to make a commit.

but that hasn't stopped adoption over here...

cheers,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 20:22               ` Johannes Schindelin
@ 2006-11-23  9:10                 ` Johannes Sixt
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Sixt @ 2006-11-23  9:10 UTC (permalink / raw)
  To: git

Johannes Schindelin wrote:
> I started playing with MinGW, and got it to compile and run, with some
> features lacking. See
> 
> Message-ID: <Pine.LNX.4.63.0609021724110.28360@wbgn013.biozentrum.uni-wuerzburg.de>
> 
> for details. From TFM
> 
> : The two biggest obstacles are fork() and the network stuff (I do not
> : plan on supporting Git.pm there). To overcome the absence of fork() I
> : wanted to use the subprocess stuff in MinGW's port of GNU make.

I'd like to do something about it. Is your work accessible in some way?

At the moment I'm limping along with CVS on Windows, which really is the
wrong tool for my current task (CVS I mean, not Windows ;)

-- Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21 20:05               ` Shawn Pearce
@ 2006-11-23 19:45                 ` Robin Rosenberg
  2006-11-25  6:59                   ` Shawn Pearce
  0 siblings, 1 reply; 30+ messages in thread
From: Robin Rosenberg @ 2006-11-23 19:45 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: lamikr, Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

On Tuesday 21 November 2006 21:05, Shawn Pearce wrote:
> lamikr <lamikr@cc.jyu.fi> wrote:
> > Shawn Pearce wrote:
> > >   - No GUI.
> >
> > QGIT allows using some commands. I plan to try out the GIT eclipse
> > plugin in near future myself.
> > This mail list have some discussion and download link to it's repo in
> > archives.
> > (title: Java GIT/Eclipse GIT version 0.1.1, )
>
> I'm the author of that plugin.  :-)
>
> Its not even capable of making a commit yet.  The underling plumbing
> (aka jgit) can make commits but the Eclipse GUI has no function to
> actually invoke that plumbing and make a commit to the repository.
>
> The Eclipse plugin has apparently been a low priority for me.
> I haven't worked on it very recently.  Robin Rosenburg has supposedly
> gotten the revision compare interface to work, but its slow as a
> duck in November due to jgit's pack reading code not running as
> fast as it should.

Slow it is. It is somewhat usable though, especially the quickdiff. I worked 
the whole day with help from quickdiff today. The diff is computed against 
HEAD^ (i.e. I get to see the changes that my topmost StGit patch introduces).

The project contains 20000+ files and six years of history.  Reading the whole 
history is out of the question with the current performance so I restrict 
reading to 500 entries which is just about bearable. That's enough for 
practical use with quickdiff and compare though. Improving jgit's speed 50 
times will probably be enough to make jgit shine. 

Activating the Git connection seems to be a problem with the egit projects, 
i.e. it works sometimes, but not with my much bigger repo. The only problem 
is that the first time is dog slow. The structure is different though, as my 
repo has .project at the top, not one level down.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-23 19:45                 ` Robin Rosenberg
@ 2006-11-25  6:59                   ` Shawn Pearce
  0 siblings, 0 replies; 30+ messages in thread
From: Shawn Pearce @ 2006-11-25  6:59 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: lamikr, Jon Smirl, Carl Worth, Martin Langhoff, Git Mailing List

Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
> Slow it is. It is somewhat usable though, especially the quickdiff. I worked 
> the whole day with help from quickdiff today. The diff is computed against 
> HEAD^ (i.e. I get to see the changes that my topmost StGit patch introduces).

That's good to hear!

> The project contains 20000+ files and six years of history.  Reading the whole 
> history is out of the question with the current performance so I restrict 
> reading to 500 entries which is just about bearable. That's enough for 
> practical use with quickdiff and compare though. Improving jgit's speed 50 
> times will probably be enough to make jgit shine. 

Yes.  I have a plan for rewriting the pack reading code which
should help somewhat here.  There are some fundamental limitations
of Java, though, that are going to keep us from performing as well
as core-Git does (due to the object memory overheads), but I would
like to get close.  :-)

jgit also has a few quirks still.  For example it assumes everything
is encoded as UTF-8 but this isn't true.  The encoding is project
specific and can be set by any user, which isn't that portable.
This is a problem for jgit and I need to go back and refactor the
parsing code...
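
One common way around a hard-wired UTF-8 assumption is to try a declared
encoding first and fall back step by step. The Python sketch below only
illustrates that idea; it is not jgit's code, and the function name and
the choice of Latin-1 as the final fallback are assumptions.

```python
def decode_text(raw: bytes, declared=None) -> str:
    """Decode raw bytes, preferring a project-declared encoding.

    Falls back to UTF-8 and finally to Latin-1, which maps every byte
    to a character and therefore never fails.
    """
    for encoding in (declared, "utf-8"):
        if encoding is None:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Bad bytes for this codec, or an unknown codec name.
            continue
    return raw.decode("latin-1")


from_utf8 = decode_text("caf\u00e9".encode("utf-8"))
from_latin1 = decode_text(b"caf\xe9")  # invalid UTF-8, falls back
```

The fallback never raises, at the cost of possibly showing the wrong
characters when the real encoding is neither of the ones tried.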

I'd like to get back to jgit sometime in mid-December.  I'm trying
to push through git-gui first.  :-)

> Activating the Git connection seems to be a problem with the egit projects, 
> i.e. it works sometimes, but not with my much bigger repo. The only problem 
> is that the first time is dog slow. The structure is different though, as my 
> repo has .project at the top, not one level down.

Hmm.  That's a bug.  Sounds like a thread timing issue if it works
sometimes, as the logic should be completely deterministic.

-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-21  1:53       ` Jon Smirl
@ 2006-11-26 10:18         ` Marko Macek
  2006-11-26 15:35           ` Jon Smirl
  0 siblings, 1 reply; 30+ messages in thread
From: Marko Macek @ 2006-11-26 10:18 UTC (permalink / raw)
  To: jonsmirl; +Cc: git

Jon Smirl wrote:

> 
> SVN hides the mini branch by creating a symbol like this:
> 
> Symbol XXX, change set 70
> copy All from change set 50
> copy file A from change set 55
> copy file B,C from change set 60
> copy file D from change set 61
> copy file E,F,G from change set 63
> copy file H from change set 67
> 
> It has to do all of those copies because the change sets weren't
> constructed while taking symbol dependency information into account.
> 
> Symbol XXX can't copy from change set 69 because commits from after
> the symbol was created are included in change sets 51-69.

Sometimes it is not actually possible to have a 'simple' symbol, even 
by following proper symbol dependencies. 

Some situations:
- tags on some files are readjusted later, or tagged separately with an older
 version
- tag is created with a -D "date" and the file times are not in sync
- tag is created from a mixed-revision working copy

While in the cases of 'time warp' the revision sequence should be 
considered more important than timestamps, this is not necessarily
true for tags, since it's easily possible to create them on mixed 
revisions.

cvs2svn also has a problem with vendor branches because it creates
tags/branches that contain files from vendor branch by copying some
files from the trunk and other files from the vendor branch.
If the vendor branch/tag was only used for the initial import, 
it's IMO best to skip them in the conversion (this needs a patch).
There are however problems because keyword expansion causes file
differences.

It seems that mozilla CVS repository has vendor branches/imports in
some parts of the tree.

Mark

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Some tips for doing a CVS importer
  2006-11-26 10:18         ` Marko Macek
@ 2006-11-26 15:35           ` Jon Smirl
  2006-11-26 16:11             ` Marko Macek
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2006-11-26 15:35 UTC (permalink / raw)
  To: Marko Macek; +Cc: git

On 11/26/06, Marko Macek <marko.macek@gmx.net> wrote:
> Jon Smirl wrote:
>
> >
> > SVN hides the mini branch by creating a symbol like this:
> >
> > Symbol XXX, change set 70
> > copy All from change set 50
> > copy file A from change set 55
> > copy file B,C from change set 60
> > copy file D from change set 61
> > copy file E,F,G from change set 63
> > copy file H from change set 67
> >
> > It has to do all of those copies because the change sets weren't
> > constructed while taking symbol dependency information into account.
> >
> > Symbol XXX can't copy from change set 69 because commits from after
> > the symbol was created are included in change sets 51-69.
>
> Sometimes it is not actually possible to have a 'simple' symbol, even
> by following proper symbol dependencies.
>
> Some situations:
> - tags on some files are readjusted later, or tagged separately with an older
>  version
> - tag is created with a -D "date" and the file times are not in sync
> - tag is created from a mixed-revision working copy

I agree that there are a few exceptions to making simple symbols. But
the current cvs2svn makes no attempt at all to preserve simple
symbols. In my attempts at converting Mozilla 60% of the symbols ended
up as tiny branches. I investigated a couple by hand and was able to
rearrange things to create simple symbols in every case I looked at.

This can be dealt with during the topological sort. If there are
complex symbol creations you will end up with loops during the sort
process. At that point you need to start breaking up change sets to
remove the loops. You would use a heuristic here, something like this:
try breaking up to ten commit change sets to preserve a symbol. If you
can't preserve it with ten breaks, break the symbol once and try again.
Repeat until the loop is gone.
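
As a rough illustration of that heuristic, here is a minimal Python
sketch. It assumes the change sets are nodes in a dependency graph (a
dict mapping each node to the set of nodes that must come first). The
names topo_sort, sort_breaking_loops, split_commit and split_symbol
are hypothetical, and the split callbacks stand in for whatever logic
a real importer would use to break up a change set.

```python
from collections import deque

def topo_sort(deps):
    """Kahn's algorithm. deps maps node -> set of nodes that must come first
    (every node must appear as a key). Returns (order, stuck); stuck is
    non-empty exactly when a loop prevents some nodes from being ordered."""
    remaining = {n: set(p) for n, p in deps.items()}
    ready = deque(sorted(n for n, p in remaining.items() if not p))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        del remaining[node]
        for other in sorted(remaining):
            preds = remaining[other]
            if node in preds:
                preds.discard(node)
                if not preds:
                    ready.append(other)
    return order, set(remaining)

def sort_breaking_loops(deps, split_commit, split_symbol, max_commit_splits=10):
    """Try up to max_commit_splits commit splits per symbol split.
    The callbacks must make progress, or this loop will not terminate."""
    splits = 0
    while True:
        order, stuck = topo_sort(deps)
        if not stuck:
            return order
        if splits < max_commit_splits:
            deps = split_commit(deps, stuck)
            splits += 1
        else:
            deps = split_symbol(deps, stuck)
            splits = 0

# Toy case: commit c1 and symbol T depend on each other.
looped = {"c1": {"T"}, "T": {"c1"}}

def split_c1(deps, stuck):
    # Hypothetical split: c1 becomes c1a (before T) and c1b (after T).
    return {"c1a": set(), "T": {"c1a"}, "c1b": {"T"}}

def never(deps, stuck):
    raise AssertionError("symbol split should not be needed here")

order = sort_breaking_loops(looped, split_c1, never)
print(order)
```

In the toy case one commit split is enough, so the symbol T survives
as a single change set.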

The current cvs2svn code effectively implements a heuristic where the
commits are always preserved at the expense of breaking the symbols.
Since some commit comments are very common (blank ones, for example),
those commits get combined into bigger change sets and trash the simple
symbols.

Another note for doing a converter. When combining things into change
sets for git import, commits on a branch should not be combined with
commits on the trunk just because their comments match. Git doesn't
allow simultaneous commits to the trunk and branches.
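
One way to sketch this in Python is to make the branch part of the key
used to group file revisions into change sets, next to a SHA-1 of the
author and commit message. The record layout and the names
changeset_key and group_revisions are assumptions made for this
example. A real importer would also have to look at dependencies and
timestamps before merging revisions.

```python
import hashlib
from collections import defaultdict

def changeset_key(branch, author, message):
    """Key for grouping file revisions; the branch is part of the key so
    that trunk and branch commits never land in the same change set."""
    digest = hashlib.sha1((author + "\0" + message).encode("utf-8")).hexdigest()
    return (branch, digest)

def group_revisions(revisions):
    """revisions: iterable of dicts with file, rev, branch, author, message."""
    groups = defaultdict(list)
    for r in revisions:
        key = changeset_key(r["branch"], r["author"], r["message"])
        groups[key].append((r["file"], r["rev"]))
    return dict(groups)

# Three revisions with the same author and a blank comment.
revisions = [
    {"file": "a.c", "rev": "1.2", "branch": "trunk", "author": "jon", "message": ""},
    {"file": "b.c", "rev": "1.5", "branch": "trunk", "author": "jon", "message": ""},
    {"file": "a.c", "rev": "1.1.2.1", "branch": "BR_1", "author": "jon", "message": ""},
]
groups = group_revisions(revisions)
for (branch, _), members in sorted(groups.items()):
    print(branch, members)
```

The two trunk revisions share a change set, while the branch revision
gets its own even though the author and comment are identical.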

> While in the cases of 'time warp' the revision sequence should be
> considered more important than timestamps, this is not necessarily
> true for tags, since it's easily possible to create them on mixed
> revisions.
>
> cvs2svn also has a problem with vendor branches because it creates
> tags/branches that contain files from vendor branch by copying some
> files from the trunk and other files from the vendor branch.
> If the vendor branch/tag was only used for the initial import,
> it's IMO best to skip them in the conversion (this needs a patch).
> There are however problems because keyword expansion causes file
> differences.
>
> It seems that mozilla CVS repository has vendor branches/imports in
> some parts of the tree.

I never got around to checking out problems with vendor branches in Mozilla.

>
> Mark
>
>

-- 
Jon Smirl

* Re: Some tips for doing a CVS importer
  2006-11-26 15:35           ` Jon Smirl
@ 2006-11-26 16:11             ` Marko Macek
  2006-11-26 17:51               ` Jon Smirl
  2006-11-27 11:29               ` Michael Haggerty
  0 siblings, 2 replies; 30+ messages in thread
From: Marko Macek @ 2006-11-26 16:11 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git

Jon Smirl wrote:

> Another note for doing a converter. When combining things into change
> sets, for git import the comments in the branches should not be mixed
> between branches and the trunk when detecting change set. Git doesn't
> allow simultaneous commits to the trunk and branches.

Yup, this is the current problem I'm facing now. Even for CVS->SVN conversion,
I don't want to see multi-branch commits.


* Re: Some tips for doing a CVS importer
  2006-11-26 16:11             ` Marko Macek
@ 2006-11-26 17:51               ` Jon Smirl
  2006-11-27 11:29               ` Michael Haggerty
  1 sibling, 0 replies; 30+ messages in thread
From: Jon Smirl @ 2006-11-26 17:51 UTC (permalink / raw)
  To: Marko Macek; +Cc: git

On 11/26/06, Marko Macek <marko.macek@gmx.net> wrote:
> Jon Smirl wrote:
>
> > Another note for doing a converter. When combining things into change
> > sets, for git import the comments in the branches should not be mixed
> > between branches and the trunk when detecting change set. Git doesn't
> > allow simultaneous commits to the trunk and branches.
>
> Yup, this is the current problem I'm facing now. Even for CVS->SVN conversion,
> I don't want to see multi-branch commits.

There is a command line option on cvs2svn to isolate the branches. I
got him to add it as part of the attempt at doing git support.

>
> Mark
>


-- 
Jon Smirl

* Re: Some tips for doing a CVS importer
  2006-11-20 21:49 Some tips for doing a CVS importer Jon Smirl
  2006-11-20 23:03 ` Martin Langhoff
@ 2006-11-27 11:24 ` Michael Haggerty
  2006-11-27 11:51   ` Markus Schiltknecht
  2006-11-27 15:20   ` Jon Smirl
  1 sibling, 2 replies; 30+ messages in thread
From: Michael Haggerty @ 2006-11-27 11:24 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List, dev, Shawn Pearce

I am currently the main (and pretty much the only) cvs2svn maintainer.
Development has been proceeding more slowly lately because (1) I'm very
busy with my day job, and (2) nobody has stepped forward to help.

Jon Smirl wrote:
> #1) There needs to be a tool that can accurately import the
> repository. cvs2svn does not do this. The good programmers working on
> git could probably whip this out in a week or two if they wanted to.
> cvs2svn is very close but they refuse to solve the symbol dependency
> problem.

Jon, I wish you wouldn't portray as obstinacy what is simply a lack of
resources.  I would like very much to support other cvs2svn output
formats.  I think it would be great if other projects could benefit from
our work.  Most of the work I've been doing on cvs2svn lately has been
towards supporting other output SCMs.

Jon Smirl wrote:
> I gave up on my cvs2git code, cvs2svn has been refactored so badly
> that it was too much trouble tracking. It would be easier to write it
> again. Most of the smarts from the import process is in the
> git-fastimport code which Shawn has. cvs2svn underwent a major
> algorithm change after I wrote the first version of git2svn.

I hope that by "badly" you mean "extensively" and not "poorly" :-\  If
you mean "poorly", then I'd like to hear your feedback/suggestions.

A large amount of refactoring has been needed to make the change to
dependency-based conversion possible, and a lot more to help support
different output formats.  I understand that this causes difficulties
for people trying to do parallel development, but most of the
refactoring was done before your first appearance on the cvs2svn mailing
lists.  If you had let us know what you were working on, I would have
avoided making conflicting changes (as I did with Oswald Buddenhagen's
commit-dependencies changes).

Jon Smirl wrote:
> I have tried all of the available CVS importers. None of them are
> without problems. If anyone is interested in writing one for git here
> are some ideas on how to structure it.
> 
> 1) there is a working lex/yacc for CVS in the parsecvs source code
> 2) The first time you parse a CVS file record everything and don't
> parse it again.
> 3) When the file is first parsed use the deltas to generate the
> revisions and feed them to git-fastimport, just remember the SHA1 or
> an id in the import code. This is a critical step to getting decent
> performance.
> 4) If you do #1 and #2 you don't need to store CVS revision numbers
> and file names in memory. Because of that you can can easily do a
> Mozilla import in 2GB, probably 1GB.
> 5) When comparing CVS revisions only use the CVS timestamps as a last
> resort, instead use the dependency information in the CVS file
> 6) Match up commits by using an sha1 of the author and commit message
> 7) After all files are loaded, match up the symbols and insert them
> into the dependency chains, if any of the symbols depend on a branch
> commit the symbol lies on the branch, otherwise the symbol is on the
> trunk,
> 8) Do a topological sort to build the change set commit tree
> 9) when you hit a loop in the tree break up delta change sets until
> the loop can be removed, don't break up symbol change sets.
> 10) Mozilla has some large commits that were made over dial up. Commit
> change sets can span hours. All of these commits need to be merged
> into a single change set.
> 11) An algorithm needs to be developed for detecting branches merging
> back into the trunk
> 12) cvs2svn has excellent test cases, use them to test the new
> importer. The cvs2svn code is quite nice but it doesn't handle #7

Most of this is possible now using cvs2svn, but it is not enough.

But first there is a problem with your point #9.  It is in general not
possible to avoid breaking up symbol changesets, even if you are willing
to massacre the revision changesets.  CVS allows cases like this:

file1:

    1.1
    1.2 ----> branch "A"
              1.2.0.1
              1.2.0.2 ----> branch "B"

file2:

    1.1
    1.2 ----> branch "B"
              1.2.0.1
              1.2.0.2 ----> branch "A"

Clearly there is no way to create symbols "A" and "B" both in a single
changeset.
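
A small Python sketch of how an importer might notice this kind of
conflict. It assumes each file has already been reduced to a list of
(parent branch, child branch) pairs. The function name and data layout
are hypothetical, and the check only finds directly contradictory
pairs; longer loops would need a full graph search.

```python
def branch_order_conflicts(per_file_parents):
    """per_file_parents: file name -> list of (parent_branch, child_branch).
    Returns (parent, child, earlier_file, this_file) for every pair that
    contradicts a pair already seen in an earlier file."""
    seen = {}  # (parent, child) -> first file that showed this ordering
    conflicts = []
    for fname, pairs in per_file_parents.items():
        for parent, child in pairs:
            if (child, parent) in seen:
                conflicts.append((parent, child, seen[(child, parent)], fname))
            seen.setdefault((parent, child), fname)
    return conflicts

files = {
    "file1": [("trunk", "A"), ("A", "B")],   # B sprouts from A
    "file2": [("trunk", "B"), ("B", "A")],   # A sprouts from B
}
conflicts = branch_order_conflicts(files)
print(conflicts)
```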

But even disallowing cases like the one above, it is often very
questionable whether you want to avoid breaking up symbol commits at all
costs.  For example, CVS allows

January:     file1<1.1>               file2<1.1>
February:    file1<1.1> tagged "T"
March:       file1<1.2>
November:                             file2<1.2>
December:                             file2<1.2> tagged "T"

In such a case, the only way to avoid splitting up the creation of tag
"T" would be to pretend that the commit file1<1.2> didn't occur in March
but rather in November.

The bottom line is that cvs2svn should do a better job of handling
symbols, but even then the git importer will necessarily have to deal
with some unusual CVS cases.
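
The tag "T" example can be checked mechanically. The sketch below is
only an illustration: it assumes that for every tagged file we know
when the tagged revision was committed and when the next revision on
the same line was committed (None if there is none). Months are
written as plain numbers, and tag_window is a hypothetical name.

```python
def tag_window(tagged):
    """tagged: list of (time_of_tagged_rev, time_of_next_rev or None).
    A tag fits into one change set only if some moment lies after every
    tagged revision and before every successor of a tagged revision."""
    earliest = max(t for t, _ in tagged)
    successors = [n for _, n in tagged if n is not None]
    latest = min(successors) if successors else None
    feasible = latest is None or earliest < latest
    return earliest, latest, feasible

# file1<1.1> committed in January (1), replaced by 1.2 in March (3);
# file2<1.2> committed in November (11), no later revision.
window = tag_window([(1, 3), (11, None)])
print(window)
```

The tag cannot exist before November (file2) but would have to exist
before March (file1), so it cannot be created in one step without
moving a commit.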

> Processing the symbols is integral to deciding how to build the change
> sets. Right now cvs2svn ignores the symbol dependency information and
> builds the change sets in a way that forces the mini-branches. That
> causes 60% of the 2,000 symbols in Mozilla CVS to end up as little
> branches. Look at the three commit example in the other thread to see
> exactly what the problem is.
>
> SVN hides the mini branch by creating a symbol like this:
>
> Symbol XXX, change set 70
> copy All from change set 50
> copy file A from change set 55
> copy file B,C from change set 60
> copy file D from change set 61
> copy file E,F,G from change set 63
> copy file H from change set 67
>
> It has to do all of those copies because the change sets weren't
> constructed while taking symbol dependency information into account.
>
> Symbol XXX can't copy from change set 69 because commits from after
> the symbol was created are included in change sets 51-69.

The vast majority of the mixed-source symbol creations have nothing to
do with honoring symbol dependencies, but rather with the fact that
cvs2svn is not so clever about deducing which branch should be used as
the source for a symbol (CVS often does not record this information
unambiguously).

Changes needed for git import:

The symbol dependency problem that Jon has focused on is IMO just the
least significant of three main changes that have to be made to support
git output from cvs2svn:

1. The symbol dependency problem.  Occasionally symbols are created in
an order that is inconsistent with the CVS dependency graph.  We want to
fix this in any case (even for SVN).  Work done so far: the symbol
dependency graph is already generated and recorded when the repository
is parsed, and the symbol dependencies are carried through the
conversion (though not yet used).

2. Symbols are often created using multiple branches as sources, when
they could be created from a single branch.  This happens because in
many cases CVS doesn't record unambiguously which branch was tagged, and
cvs2svn's heuristics are not especially clever.  A patch has been
submitted to fix this problem, but unfortunately it doesn't apply to
HEAD anymore.  See

http://cvs2svn.tigris.org/servlets/ReadMsg?list=dev&msgNo=1441

for a discussion.  (The main difficulty with picking better sources for
symbols is that the obvious approaches all require tons of intermediate
storage.)  I am currently trying to understand symbol handling in
cvs2svn well enough that I can port the patch to trunk.

3. The current default output format of cvs2svn is a single dump file
with file revisions in commit order.  For the distributed SCMs, it is
usually far more efficient to generate the file revisions file-by-file
(non-chronologically) during the initial parse of the CVS files, and
refer to the revisions by hash for the rest of the conversion.  In
October I added a bunch of hooks to cvs2svn to make this possible.  Work
remaining: code to reconstruct file text from CVS text + deltas,
including proper handling of line-end conventions and keyword
expansion/unexpansion, and of course the code to output the
reconstructed snapshots in a git-consumable format.
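
To illustrate the "refer to the revisions by hash" part, here is a
minimal Python sketch. It computes the id git assigns to a blob (the
SHA-1 of a "blob <size>" header, a NUL byte, and the content) and keeps
a map from (path, revision) to that id. The sample file contents and
the revision_ids name are made up for the example. Reconstructing the
text from RCS deltas, keyword handling and line-end conventions are
not covered here.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object id git gives a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical reconstructed revisions of one file, produced file-by-file
# during the initial parse rather than in commit order.
reconstructed = {
    ("src/main.c", "1.1"): b"int main(void) { return 0; }\n",
    ("src/main.c", "1.2"): b"int main(void) { return 1; }\n",
}

# Later passes only need these ids, not the file text.
revision_ids = {key: git_blob_id(text) for key, text in reconstructed.items()}
for key, blob_id in sorted(revision_ids.items()):
    print(key, blob_id)
```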

* Re: Some tips for doing a CVS importer
  2006-11-26 16:11             ` Marko Macek
  2006-11-26 17:51               ` Jon Smirl
@ 2006-11-27 11:29               ` Michael Haggerty
  1 sibling, 0 replies; 30+ messages in thread
From: Michael Haggerty @ 2006-11-27 11:29 UTC (permalink / raw)
  To: Marko Macek; +Cc: Jon Smirl, git

Marko Macek wrote:
>> Another note for doing a converter. When combining things into change
>> sets, for git import the comments in the branches should not be mixed
>> between branches and the trunk when detecting change set. Git doesn't
>> allow simultaneous commits to the trunk and branches.
> 
> Yup, this is the current problem I'm facing now. Even for CVS->SVN
> conversion,
> I don't want to see multi-branch commits.

To avoid multi-branch commits, you have to start cvs2svn with an
--options file, and in the options file set

ctx.cross_project_commits = False


* Re: Some tips for doing a CVS importer
  2006-11-27 11:24 ` Michael Haggerty
@ 2006-11-27 11:51   ` Markus Schiltknecht
  2006-11-27 22:09     ` Michael Haggerty
  2006-11-27 15:20   ` Jon Smirl
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Schiltknecht @ 2006-11-27 11:51 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jon Smirl, Git Mailing List, dev, Shawn Pearce

Hi,

Michael Haggerty wrote:
> I am currently the main (and pretty much the only) cvs2svn maintainer.
> Development has been proceeding more slowly lately because (1) I'm very
> busy with my day job, and (2) nobody has stepped forward to help.

I understand very well. Same for me here with monotone's cvs_import vs. 
my day job... and then I also have a life ;-)

> Jon, I wish you wouldn't portray as obstinacy what is simply a lack of
> resources.  I would like very much to support other cvs2svn output
> formats.  I think it would be great if other projects could benefit from
> our work.  Most of the work I've been doing on cvs2svn lately has been
> towards supporting other output SCMs.

Really? Hm. I'm somehow sorry for not joining cvs2svn but running my own 
thing with monotone. But I really think it took me less time. OTOH, I'm 
far from finished, yet...

Anyway, I've made an attempt at solving the 'picking better sources for
symbols'-problem:

During parsing of all the *,v files, where I'm collecting events 
(commits, branching and tagging) into blobs, I do also remember 
'possible parent branches' for all the symbols (tag and branch events).

After that and *before* the blob sorting, I check all blobs and try to 
find one single parent branch for them. In the best case, those symbol 
blobs do have exactly one possible parent branch, then I just pick that 
one. If there are multiple possible parents, I try to pick the deepest. 
As branches are symbols themselves, I have to run that multiple times 
until all symbols are resolved.

An example: having branches ROOT -> A -> B -> C (branched in that order) 
plus a branch D derived from branch A.

The symbol blob for branch A: has only one possible parent: ROOT. Thus I 
assign A->parent_branch = ROOT.

Next comes the blob for branch C: it has two possible parents: branch B 
and branch A. At that point we know that A is derived from ROOT, but we 
have not yet assigned a parent to B. Thus we cannot resolve C this time.

Then comes branch B: one parent: A. Mark it.

Next round, we process C again: this time, we know B is branched from A. 
Thus we can remove the possible parent A. Leaving only one possible 
parent branch: B.

Now, say we have a tag 'X', which ended up in a blob having A, B, C and 
D as possible parent branches. I currently remove A and B, as they are 
parents of C. But C and D still remain and conflict. I'm unable to 
resolve that symbol. I'm thinking about leaving such conflicts to the 
user to resolve.
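
The rounds described above can be sketched in Python roughly as
follows. This is not the monotone cvs_import code, just an illustration
under the assumption that each symbol comes with a set of possible
parent branches and that the resolved parents never form a loop. The
names resolve_parents and ancestors are hypothetical.

```python
ROOT = "ROOT"

def resolve_parents(candidates):
    """candidates: symbol -> set of possible parent branches.
    Returns (parent, pending): resolved parents and still-ambiguous symbols."""
    parent = {}
    pending = {sym: set(c) for sym, c in candidates.items()}

    def ancestors(branch):
        """All ancestors of branch, or None if its chain is not resolved yet."""
        found = set()
        while branch != ROOT:
            if branch not in parent:
                return None
            branch = parent[branch]
            found.add(branch)
        return found

    progress = True
    while pending and progress:          # one pass of the loop is one round
        progress = False
        for sym in list(pending):
            cands = pending[sym]
            # Drop candidates that are ancestors of another candidate,
            # which leaves the deepest possible parents.
            for cand in list(cands):
                known = ancestors(cand)
                if known is not None:
                    cands -= known
            if len(cands) == 1:
                parent[sym] = next(iter(cands))
                del pending[sym]
                progress = True
    return parent, pending

# Same order as in the example: A, then C, then B, plus D and tag X.
parent, unresolved = resolve_parents({
    "A": {"ROOT"},
    "C": {"A", "B"},
    "B": {"A"},
    "D": {"A"},
    "X": {"A", "B", "C", "D"},
})
print(parent)
print(unresolved)
```

C is only resolved in the second round, and tag X is left with C and D
as conflicting candidates.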

I've not yet tested this algorithm extensively. Most larger repositories 
seem to fail somewhere, but not necessarily because of that symbol 
resolving algorithm... :-(

Any comments? Questions? Ideas? I hope to have explained clearly...

And I wish you all a lot of time for your open source projects and your 
families, friends, wives, girl-friends, etc...! ;-)

Regards

Markus

* Re: Some tips for doing a CVS importer
  2006-11-27 11:24 ` Michael Haggerty
  2006-11-27 11:51   ` Markus Schiltknecht
@ 2006-11-27 15:20   ` Jon Smirl
  1 sibling, 0 replies; 30+ messages in thread
From: Jon Smirl @ 2006-11-27 15:20 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Git Mailing List, dev, Shawn Pearce

On 11/27/06, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> I am currently the main (and pretty much the only) cvs2svn maintainer.
> Development has been proceeding more slowly lately because (1) I'm very
> busy with my day job, and (2) nobody has stepped forward to help.
>
> Jon Smirl wrote:
> > #1) There needs to be a tool that can accurately import the
> > repository. cvs2svn does not do this. The good programmers working on
> > git could probably whip this out in a week or two if they wanted to.
> > cvs2svn is very close but they refuse to solve the symbol dependency
> > problem.
>
> Jon, I wish you wouldn't portray as obstinacy what is simply a lack of
> resources.  I would like very much to support other cvs2svn output
> formats.  I think it would be great if other projects could benefit from
> our work.  Most of the work I've been doing on cvs2svn lately has been
> towards supporting other output SCMs.

cvs2svn is a nice piece of code, and a universal conversion tool is a
worthy goal.

>
> Jon Smirl wrote:
> > I gave up on my cvs2git code, cvs2svn has been refactored so badly
> > that it was too much trouble tracking. It would be easier to write it
> > again. Most of the smarts from the import process is in the
> > git-fastimport code which Shawn has. cvs2svn underwent a major
> > algorithm change after I wrote the first version of git2svn.
>
> I hope that by "badly" you mean "extensively" and not "poorly" :-\  If
> you mean "poorly", then I'd like to hear your feedback/suggestions.

Extensively. The dependency rewrite changed things so much that my
patches were basically worthless. I tried merging them and gave up; it
would be more efficient to rewrite them or build hooks in the right
places.

>
> A large amount of refactoring has been needed to make the change to
> dependency-based conversion possible, and a lot more to help support
> different output formats.  I understand that this causes difficulties
> for people trying to do parallel development, but most of the
> refactoring was done before your first appearance on the cvs2svn mailing
> lists.  If you had let us know what you were working on, I would have
> avoided making conflicting changes (as I did with Oswald Buddenhagen's
> commit-dependencies changes).
>
> Jon Smirl wrote:
> > I have tried all of the available CVS importers. None of them are
> > without problems. If anyone is interested in writing one for git here
> > are some ideas on how to structure it.
> >
> > 1) there is a working lex/yacc for CVS in the parsecvs source code
> > 2) The first time you parse a CVS file record everything and don't
> > parse it again.
> > 3) When the file is first parsed use the deltas to generate the
> > revisions and feed them to git-fastimport, just remember the SHA1 or
> > an id in the import code. This is a critical step to getting decent
> > performance.
> > 4) If you do #1 and #2 you don't need to store CVS revision numbers
> > and file names in memory. Because of that you can can easily do a
> > Mozilla import in 2GB, probably 1GB.
> > 5) When comparing CVS revisions only use the CVS timestamps as a last
> > resort, instead use the dependency information in the CVS file
> > 6) Match up commits by using an sha1 of the author and commit message
> > 7) After all files are loaded, match up the symbols and insert them
> > into the dependency chains, if any of the symbols depend on a branch
> > commit the symbol lies on the branch, otherwise the symbol is on the
> > trunk,
> > 8) Do a topological sort to build the change set commit tree
> > 9) when you hit a loop in the tree break up delta change sets until
> > the loop can be removed, don't break up symbol change sets.
> > 10) Mozilla has some large commits that were made over dial up. Commit
> > change sets can span hours. All of these commits need to be merged
> > into a single change set.
> > 11) An algorithm needs to be developed for detecting branches merging
> > back into the trunk
> > 12) cvs2svn has excellent test cases, use them to test the new
> > importer. The cvs2svn code is quite nice but it doesn't handle #7
>
> Most of this is possible now using cvs2svn, but it is not enough.
>
> But first there is a problem with your point #9.  It is in general not
> possible to avoid breaking up symbol changesets, even if you are willing
> to massacre the revision changesets.  CVS allows cases like this:

We don't know how often this case occurs until more algorithms are
tried. All I know is that 60% of the Mozilla symbols end up needing
copies. And for the few cases I decoded by hand, I was able to
rearrange things so that copies were not needed. It is likely that
some symbols in Mozilla will need copies to construct them; it is a
question of degree. I don't believe copies are required for 60% of the
symbols.

>
> file1:
>
>     1.1
>     1.2 ----> branch "A"
>               1.2.0.1
>               1.2.0.2 ----> branch "B"
>
> file2:
>
>     1.1
>     1.2 ----> branch "B"
>               1.2.0.1
>               1.2.0.2 ----> branch "A"
>
> Clearly there is no way to create symbols "A" and "B" both in a single
> changeset.
>
> But even disallowing cases like the one above, it is often very
> questionable whether you want to avoid breaking up symbol commits at all
> costs.  For example, CVS allows
>
>
> January:     file1<1.1>               file2<1.1>
> February:    file1<1.1> tagged "T"
> March:       file1<1.2>
> November:                             file2<1.2>
> December:                             file2<1.2> tagged "T"
>
> In such a case, the only way to avoid splitting up the creation of tag
> "T" would be to pretend that the commit file1<1.2> didn't occur in March
> but rather in November.
>
> The bottom line is that cvs2svn should do a better job of handling
> symbols, but even then the git importer will necessarily have to deal
> with some unusual CVS cases.

The unusual cases can be made into branches. If I remember correctly,
Mozilla has about 300 symbols with "BRANCH" in the name. But the
converted repositories are ending up with over 2,000 branches. When
you load this into the git visualization tools it is obvious that the
bowl of spaghetti caused by 2,000 branches is not a repository a human
would have created.

>
> > Processing the symbols is integral to deciding how to build the change
> > sets. Right now cvs2svn ignores the symbol dependency information and
> > builds the change sets in a way that forces the mini-branches. That
> > causes 60% of the 2,000 symbols in Mozilla CVS to end up as little
> > branches. Look at the three commit example in the other thread to see
> > exactly what the problem is.
> >
> > SVN hides the mini branch by creating a symbol like this:
> >
> > Symbol XXX, change set 70
> > copy All from change set 50
> > copy file A from change set 55
> > copy file B,C from change set 60
> > copy file D from change set 61
> > copy file E,F,G from change set 63
> > copy file H from change set 67
> >
> > It has to do all of those copies because the change sets weren't
> > constructed while taking symbol dependency information into account.
> >
> > Symbol XXX can't copy from change set 69 because commits from after
> > the symbol was created are included in change sets 51-69.
>
> The vast majority of the mixed-source symbol creations have nothing to
> do with honoring symbol dependencies, but rather with the fact the
> cvs2svn is not so clever about deducing which branch should be used as
> the source for a symbol (CVS often does not record this information
> unambiguously).
>
> Changes needed for git import:
>
> The symbol dependency problem that Jon has focused on is IMO just the
> least significant of three main changes that have to be made to support
> git output from cvs2svn:
>
> 1. The symbol dependency problem.  Occasionally symbols are created in
> an order that is inconsistent with the CVS dependency graph.  We want to
> fix this in any case (even for SVN).  Work done so far: the symbol
> dependency graph is already generated and recorded when the repository
> is parsed, and the symbol dependencies are carried through the
> conversion (though not yet used).
>
> 2. Symbols are often created using multiple branches as sources, when
> they could be created from a single branch.  This happens because in
> many cases CVS doesn't record unambiguously which branch was tagged, and
> cvs2svn's heuristics are not especially clever.  A patch has been
> submitted to fix this problem, but unfortunately it doesn't apply to
> HEAD anymore.  See
>
> http://cvs2svn.tigris.org/servlets/ReadMsg?list=dev&msgNo=1441
>
> for a discussion.  (The main difficulty with picking better sources for
> symbols is that the obvious approaches all require tons of intermediate
> storage.)  I am currently trying to understand symbol handling in
> cvs2svn well enough that I can port the patch to trunk.

I'm happy to give new algorithms a try as they are developed.

>
> 3. The default current output format of cvs2svn is a single dump file
> with file revisions in commit order.  For the distributed SCMs, it is
> usually far more efficient to generate the file revisions file-by-file
> (non-chronologically) during the initial parse of the CVS files, and
> refer to the revisions by hash for the rest of the conversion.  In
> October I added a bunch of hooks to cvs2svn to make this possible.  Work
> remaining: code to reconstruct file text from CVS text + deltas,
> including proper handling of line-end conventions and keyword
> expansion/unexpansion, and of course the code to output the
> reconstructed snapshots in a git-consumable format.

This is a major benefit for git conversion, but it hasn't been a big
issue with the cvs2svn code. Hooks will be helpful.

-- 
Jon Smirl

* Re: Some tips for doing a CVS importer
  2006-11-27 11:51   ` Markus Schiltknecht
@ 2006-11-27 22:09     ` Michael Haggerty
  2006-11-28 15:18       ` Markus Schiltknecht
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Haggerty @ 2006-11-27 22:09 UTC (permalink / raw)
  To: Markus Schiltknecht; +Cc: Jon Smirl, Git Mailing List, dev, Shawn Pearce

Markus Schiltknecht wrote:
> Michael Haggerty wrote:
>> Jon, I wish you wouldn't portray as obstinacy what is simply a lack of
>> resources.  I would like very much to support other cvs2svn output
>> formats.  I think it would be great if other projects could benefit from
>> our work.  Most of the work I've been doing on cvs2svn lately has been
>> towards supporting other output SCMs.
> 
> Really? Hm. I'm somehow sorry for not joining cvs2svn but running my own
> thing with monotone. But I really think it took me less time. OTOH, I'm
> far from finished, yet...

There's still time to join forces :-)  "Far from finished" on a project
of this messiness can equal quite a bit of time.

But even if you want to pursue your own converter, consider visiting
#cvs2svn on irc.freenode.net if you want to discuss things.

> Anyway, I've made an attempt at solving the 'picking better sources for
> symbols'-problem:

Let me try to understand this...

> During parsing of all the *,v files, where I'm collecting events
> (commits, branching and tagging) into blobs, I do also remember
> 'possible parent branches' for all the symbols (tag and branch events).

This is the part that can get quite expensive for large repositories, as
there can be orders of magnitude more symbol creations than revisions.
According to Daniel Jacobowitz:

> [...] at one point I believe the GCC repository was gaining up
> to four tags a day (head, two supported release branches, and one
> vendor branch).  I've been using the principal that the number of tags
> might be unworkable, but the number of branches generally is not.

This means that the number of tag events is O(number-of-days *
total-number-of-files-in-repo), where the gcc repo has about 50000
files.  By contrast, only a small fraction of files is typically touched
in any day.

I've been trying to find a solution that doesn't require quite so much
space.  I think that if you allow yourself this much space, the problem
is not very difficult.

> After that and *before* the blob sorting, I check all blobs and try to
> find one single parent branch for them. In the best case, those symbol
> blobs do have exactly one possible parent branch, then I just pick that
> one. If there are multiple possible parents, I try to pick the deepest.
> As branches are symbols themselves, I have to run that multiple times
> until all symbols are resolved.
> 
> An example: having branches ROOT -> A -> B -> C (branched in that order)
> plus a branch D derived from branch A.

I assume that you are talking about a situation for which CVS is
ambiguous, like a file with

A = 1.2.2
B = 1.2.4
C = 1.2.6
D = 1.2.2.5.2
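
To make the ambiguity concrete, here is a small Python sketch that
reads the branch numbers exactly as written above. Dropping the last
component of a branch number gives the revision it sprouts from, and
that revision's own prefix gives the line it lives on. The helper
names are hypothetical.

```python
def sprout_revision(branch_number):
    """Revision a branch sprouts from: the branch number minus its last part."""
    return ".".join(branch_number.split(".")[:-1])

def containing_branch(revision):
    """Branch number a revision lives on, or 'trunk' for revisions like 1.2."""
    parts = revision.split(".")
    return "trunk" if len(parts) == 2 else ".".join(parts[:-1])

branches = {"A": "1.2.2", "B": "1.2.4", "C": "1.2.6", "D": "1.2.2.5.2"}
sprouts = {name: sprout_revision(num) for name, num in branches.items()}
lines = {name: containing_branch(rev) for name, rev in sprouts.items()}
for name in sorted(branches):
    print(name, "sprouts from", sprouts[name], "on", lines[name])
```

A, B and C all sprout from revision 1.2 on the trunk, so this file
alone cannot say whether C was branched from ROOT, A or B. Only D is
unambiguous: it sprouts from 1.2.2.5, which is on branch 1.2.2 (A).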

> The symbol blob for branch A: has only one possible parent: ROOT. Thus I
> assign A->parent_branch = ROOT.
> 
> Next comes the blob for branch C: it has two possible parents: branch B
> and branch A.

Why is ROOT not considered as a possible parent of C?

> At that point we know that A is derived from ROOT, but we
> don't have assigned a parent to B, yet. Thus we can not resolve C this
> time.
> 
> Then comes branch B: one parent: A. Mark it.
> 
> Next round, we process C again: this time, we know B is branched from A.
> Thus we can remove the possible parent A. Leaving only one possible
> parent branch: B.

But the fact that B preceded C chronologically does not mean that C is
derived from B.  If I branch from ROOT or A after creating branch B, the
result as stored in CVS looks exactly the same as if I branch from B
(unless a file was modified between the creation of the parent branch
and the creation of the child branch).

> Now, say we have a tag 'X', which ended up in a blob having A, B, C and
> D as possible parent branches. I currently remove A and B, as they are
> parents of C. But C and D still remain and conflict. I'm unable to
> resolve that symbol. I'm thinking about leaving such conflicts to the
> user to resolve.

From your description, this sounds like a tag that cannot be created
from a single parent branch.  Therefore it would have to be cobbled
together from multiple parents.

> I've not yet tested this algorithm extensively. Most larger repositories
> seem to fail somewhere, but not necessarily because of that symbol
> resolving algorithm... :-(
> 
> Any comments? Questions? Ideas? I hope to have explained clearly...
> 
> And I wish you all a lot of time for your open source projects and your
> families, friends, wifes, girl-friends, etc...! ;-)

:-) Thanks.  The same to you!


* Re: Some tips for doing a CVS importer
  2006-11-27 22:09     ` Michael Haggerty
@ 2006-11-28 15:18       ` Markus Schiltknecht
  2006-11-30  0:35         ` Michael Haggerty
  0 siblings, 1 reply; 30+ messages in thread
From: Markus Schiltknecht @ 2006-11-28 15:18 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jon Smirl, Git Mailing List, dev, Shawn Pearce

Hi,

Michael Haggerty wrote:
> There's still time to join forces :-)  "Far from finished" on a project
> of this messiness can equal quite a bit of time.

Yes. Maybe I'm a little pessimistic ;-)

> But even if you want to pursue your own converter, consider visiting
> #cvs2svn on irc.freenode.net if you want to discuss things.

Thanks, I just happen to not particularly like IRC... I prefer emails
and mailing lists.

>> During parsing of all the *,v files, where I'm collecting events
>> (commits, branching and tagging) into blobs, I do also remember
>> 'possible parent branches' for all the symbols (tag and branch events).
> 
> This is the part that can get quite expensive for large repositories, as
> there can be orders of magnitude more symbol creations than revisions.
> According to Daniel Jacobowitz:
> 
>> [...] at one point I believe the GCC repository was gaining up
>> to four tags a day (head, two supported release branches, and one
>> vendor branch).  I've been using the principal that the number of tags
>> might be unworkable, but the number of branches generally is not.
> 
> This means that the number of tag events is O(number-of-days *
> total-number-of-files-in-repo), where the gcc repo has about 50000
> files.  By contrast, only a small fraction of files is typically touched
> in any day.

Yeah, 50'000 * 1825 (5 years) * say 100 bytes -> 8GB  sounds like a lot.
OTOH, I certainly don't need 100 bytes per tag and one tag per day over 
five years is really a lot. Repositories that large are probably not 
converted to CVS on an old Pentium III...

I've just tested with the mozilla repository (I don't have the gcc one).
The import has been run only through the first two steps: collecting the
blobs and symbol resolving. That took almost one and a half hours on my
Core Duo with 2GB of memory:

real    85m20.684s
user    39m59.082s
sys     1m32.874s

And peak memory consumption was:

VmPeak:  1814024 kB

For comparison, the mozilla/mozilla CVS repository sums up to 3.1 GB. The
monotone repository (which is still lacking the revisions, but has all
files and file deltas) is 588MB after that step. I'd guess that once it
finishes, it would be less than 1GB.

> I've been trying to find a solution that doesn't require quite so much
> space.  I think that if you allow yourself this much space, the problem
> is not very difficult.

Okay. As long as I can import it on my laptop I'm fine ;-)

>> After that and *before* the blob sorting, I check all blobs and try to
>> find one single parent branch for them. In the best case, those symbol
>> blobs do have exactly one possible parent branch, then I just pick that
>> one. If there are multiple possible parents, I try to pick the deepest.
>> As branches are symbols themselves, I have to run that multiple times
>> until all symbols are resolved.
>>
>> An example: having branches ROOT -> A -> B -> C (branched in that order)
>> plus a branch D derived from branch A.
> 
> I assume that you are talking about a situation for which CVS is
> ambiguous, like a file with
> 
> A = 1.2.2
> B = 1.2.4
> C = 1.2.6
> D = 1.2.2.5.2

Well, almost. I meant a whole repository with these branches. If one
file included all the branches it's getting easy to resolve. But for my
example, I had something like that in mind:

fileA:

A = 1.2.2
(no changes for branch B)
C = 1.2.4      --> makes A a possible parent of branch C
D = 1.2.2.5.2  --> makes A a possible parent of branch D
X = 1.2.4      --> makes C a possible parent of tag X

fileB:

A = 1.2.2
B = 1.2.4      --> makes A a possible parent of branch B
C = 1.2.6      --> makes B a possible parent of branch C
D = 1.2.2.5.2  --> makes A a possible parent of branch D
X = 1.2.2.5.2  --> makes D a possible parent of tag X

fileC:
A = 1.2.2
X = 1.2.2      --> makes A a possible parent of tag X

fileD:
A = 1.2.2
B = 1.2.4
X = 1.2.4      --> makes B a possible parent of tag X

>> The symbol blob for branch A: has only one possible parent: ROOT. Thus I
>> assign A->parent_branch = ROOT.
>>
>> Next comes the blob for branch C: it has two possible parents: branch B
>> and branch A.
> 
> Why is ROOT not considered as a possible parent of C?

Those were just examples. In my CVS-repository-in-mind, none of the
files were branching from ROOT directly into C.

>> At that point we know that A is derived from ROOT, but we
>> don't have assigned a parent to B, yet. Thus we can not resolve C this
>> time.
>>
>> Then comes branch B: one parent: A. Mark it.
>>
>> Next round, we process C again: this time, we know B is branched from A.
>> Thus we can remove the possible parent A. Leaving only one possible
>> parent branch: B.
> 
> But the fact that B preceded C chronologically does not mean that C is
> derived from B.

No. And I don't assume so in any place. Given the files above, I can
however clearly say that C got branched off from B, no?

> If I branch from ROOT or A after creating branch B, the
> result as stored in CVS looks exactly the same as if I branch from B
> (unless a file was modified between the creation of the parent branch
> and the creation of the child branch).

Sure. That would result in an unresolvable symbol.

>> Now, say we have a tag 'X', which ended up in a blob having A, B, C and
>> D as possible parent branches. I currently remove A and B, as they are
>> parents of C. But C and D still remain and conflict. I'm unable to
>> resolve that symbol. I'm thinking about leaving such conflicts to the
>> user to resolve.
> 
> From your description, this sounds like a tag that cannot be created
> from a single parent branch.  Therefore it would have to be cobbled
> together from multiple parents.

Right. I somehow have to cope with those cases, as CVS allows them and
monotone does not.

The main point of my symbol resolving code is to assign each symbol
uniquely to one branch wherever possible, and to hand the cases where
this is not possible to the user. AFAICT, it does so quite well.
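
To make the pass concrete, here is a minimal Python sketch of the idea, not
the actual monotone code. It assumes the possible parents of every symbol
have already been collected per blob; the names `resolve` and `ancestors`
and the dict layout are made up for illustration. A candidate is dropped
when it is already known to be an ancestor of another candidate, and the
loop repeats until no more symbols can be resolved:

```python
def ancestors(branch, parent):
    """Return the ancestors of `branch` that are known so far."""
    found = set()
    while parent.get(branch) is not None:
        branch = parent[branch]
        if branch in found:      # guard against inconsistent input
            break
        found.add(branch)
    return found


def resolve(possible_parents):
    """Assign one parent branch per symbol where the candidates allow it.

    possible_parents: dict mapping symbol name -> set of candidate parents.
    Returns (parent, conflicts): resolved parents, and the candidates that
    remain for symbols that could not be resolved.
    """
    parent = {"ROOT": None}
    pending = {sym: set(c) for sym, c in possible_parents.items()}
    progress = True
    while pending and progress:
        progress = False
        for sym in list(pending):
            cands = pending[sym]
            # Candidates that are ancestors of another candidate are shadowed.
            shadowed = set()
            for cand in cands:
                shadowed |= ancestors(cand, parent) & cands
            remaining = cands - shadowed
            if len(remaining) == 1:
                parent[sym] = next(iter(remaining))
                del pending[sym]
                progress = True
            else:
                pending[sym] = remaining
    return parent, pending


# The example from this mail, processed in the order A, C, B, D, X.
example = {
    "A": {"ROOT"},
    "C": {"A", "B"},
    "B": {"A"},
    "D": {"A"},
    "X": {"A", "B", "C", "D"},
}
parent, conflicts = resolve(example)
print(parent)
print(conflicts)
```

C is only resolved in the second round, after B has been assigned to A,
and tag X is left with C and D as conflicting candidates for the user to
sort out.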

Regards

Markus


* Re: Some tips for doing a CVS importer
  2006-11-28 15:18       ` Markus Schiltknecht
@ 2006-11-30  0:35         ` Michael Haggerty
  2006-11-30  0:45           ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Haggerty @ 2006-11-30  0:35 UTC (permalink / raw)
  To: Markus Schiltknecht; +Cc: Jon Smirl, Git Mailing List, dev, Shawn Pearce

Markus Schiltknecht wrote:
> Michael Haggerty wrote:
>> This is the part that can get quite expensive for large repositories, as
>> there can be orders of magnitude more symbol creations than revisions.
>> According to Daniel Jacobowitz:
>>
>>> [...] at one point I believe the GCC repository was gaining up
>>> to four tags a day (head, two supported release branches, and one
>>> vendor branch).  I've been using the principal that the number of tags
>>> might be unworkable, but the number of branches generally is not.
>>
>> This means that the number of tag events is O(number-of-days *
>> total-number-of-files-in-repo), where the gcc repo has about 50000
>> files.  By contrast, only a small fraction of files is typically touched
>> in any day.
> 
> Yeah, 50'000 * 1825 (5 years) * say 100 bytes -> 8GB  sounds like a lot.
> OTOH, I certainly don't need 100 bytes per tag and one tag per day over
> five years is really a lot. Repositories that large are probably not
> converted to CVS on an old Pentium III...

...times 4 (tags per day) -> 32GB.  If I understand correctly, the tags
were created nightly by automated scripts.
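
For anyone who wants to redo the arithmetic with other assumptions, a
back-of-the-envelope sketch in Python follows. The function name and the
100 bytes per tag event are only the illustrative figures used in this
thread, not measurements:

```python
def tag_event_bytes(files, days, bytes_per_event, tags_per_day=1):
    """Estimate storage if every tag touches every file in the repository."""
    return files * days * bytes_per_event * tags_per_day


GIB = 2 ** 30
one_per_day = tag_event_bytes(50_000, 5 * 365, 100)
four_per_day = tag_event_bytes(50_000, 5 * 365, 100, tags_per_day=4)

print(f"1 tag/day:  {one_per_day / GIB:.1f} GiB")
print(f"4 tags/day: {four_per_day / GIB:.1f} GiB")
```

This comes out at roughly 8.5 GiB and 34 GiB, in line with the rounded
8GB and 32GB figures above.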

I admit that this is an extreme example, but the philosophy of the
cvs2svn project (a philosophy that I inherited from my predecessors, by
the way) is to be able to handle the most absurd repositories out there.

> Well, almost. I meant a whole repository with these branches. If one
> file included all the branches it's getting easy to resolve. But for my
> example, I had something like that in mind:

I am glad that we are getting into concrete examples.  But your example
needs some clarifications (see below).

> fileA:
> 
> A = 1.2.2
> (no changes for branch B)
> C = 1.2.4      --> makes A a possible parent of branch C

In this case, ROOT can also be C's parent.

> D = 1.2.2.5.2  --> makes A a possible parent of branch D

This implies that A is *necessarily* the parent of D.  If there were a
E=1.2.2.5.4, then the parent of E would be ambiguous but the parent of D
would still unambiguously be A.

> X = 1.2.4      --> makes C a possible parent of tag X

Wait a minute.  A tag always has an even number of integers.  Do you
mean X=1.2 or X=1.2.4.1?  The same below.
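
The rule can be sketched in a few lines of Python. This is only an
illustration (the function name is made up, and vendor branches get no
special treatment): a number with an odd count of integers is a branch, an
even count with a 0 in the second-to-last position is the "magic" branch
form that CVS stores in the symbols list, and any other even count is a
revision that a tag can point to:

```python
def classify_symbol(number):
    """Classify a CVS symbol number.

    Returns ("branch", sprout_revision) or ("tag", revision).
    """
    parts = [int(p) for p in number.split(".")]
    if len(parts) % 2 == 1:
        # Plain branch number, e.g. 1.2.4 sprouts from revision 1.2.
        return "branch", ".".join(str(p) for p in parts[:-1])
    if len(parts) >= 4 and parts[-2] == 0:
        # Magic branch form, e.g. 1.1.0.6 sprouts from revision 1.1.
        return "branch", ".".join(str(p) for p in parts[:-2])
    # Even count without the magic 0: an ordinary revision.
    return "tag", number


for number in ("1.2.4", "1.2", "1.2.4.1", "1.2.2.5.2", "1.1.0.6"):
    print(number, classify_symbol(number))
```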

> fileB:
> 
> A = 1.2.2
> B = 1.2.4      --> makes A a possible parent of branch B

or ROOT

> C = 1.2.6      --> makes B a possible parent of branch C

or A or ROOT

> D = 1.2.2.5.2  --> makes A a possible parent of branch D

A is unambiguously the parent of D

> X = 1.2.2.5.2  --> makes D a possible parent of tag X
>
> fileC:
> A = 1.2.2
> X = 1.2.2      --> makes A a possible parent of tag X
> 
> fileD:
> A = 1.2.2
> B = 1.2.4
> X = 1.2.4      --> makes B a possible parent of tag X
> 
>>> The symbol blob for branch A: has only one possible parent: ROOT. Thus I
>>> assign A->parent_branch = ROOT.
>>>
>>> Next comes the blob for branch C: it has two possible parents: branch B
>>> and branch A.
>>
>> Why is ROOT not considered as a possible parent of C?
> 
> Those were just examples. In my CVS-repository-in-mind, none of the
> files were branching from ROOT directly into C.

In your example, ROOT *is* a possible parent of C.

>>> At that point we know that A is derived from ROOT, but we
>>> don't have assigned a parent to B, yet. Thus we can not resolve C this
>>> time.
>>>
>>> Then comes branch B: one parent: A. Mark it.

In your example, ROOT is also a possible parent of B.

>>> Next round, we process C again: this time, we know B is branched from A.
>>> Thus we can remove the possible parent A. Leaving only one possible
>>> parent branch: B.
>>
>> But the fact that B preceded C chronologically does not mean that C is
>> derived from B.
> 
> No. And I don't assume so in any place. Given the files above, I can
> however clearly say that C got branched off from B, no?

No.  C is nowhere unambiguously derived from B, therefore its parent
could be ROOT, A, or B.  See my example below.

>> If I branch from ROOT or A after creating branch B, the
>> result as stored in CVS looks exactly the same as if I branch from B
>> (unless a file was modified between the creation of the parent branch
>> and the creation of the child branch).
> 
> Sure. That would result in an unresolvable symbol.
> 
>>> Now, say we have a tag 'X', which ended up in a blob having A, B, C and
>>> D as possible parent branches. I currently remove A and B, as they are
>>> parents of C. But C and D still remain and conflict. I'm unable to
>>> resolve that symbol. I'm thinking about leaving such conflicts to the
>>> user to resolve.

I don't know how to deal with tag X because the numbers that you
assigned to it above can't be correct.

Consider the attached script.  It unambiguously creates branches A1 and
A2 from ROOT and branch B from A1, then adds tag X on branch B.  But in
the files:

fileA symbols
        X:1.1
        B:1.1.0.6
        A2:1.1.0.4
        A1:1.1.0.2;

fileB symbols
        X:1.1.2.1
        B:1.1.2.1.0.2
        A2:1.1.0.4
        A1:1.1.0.2;

fileC symbols
        X:1.1.6.1
        B:1.1.0.6
        A2:1.1.0.4
        A1:1.1.0.2;

fileD symbols
        X:1.1
        B:1.1.0.4
        A2:1.2.0.2
        A1:1.1.0.2;

Note that from looking at fileA alone, there is no way to tell whether
A2 was created from ROOT or A1, or whether B was created from ROOT, A1,
or A2.  And tag X is all over the place, even though for each file it
was created from branch B.

If only information from fileA,v is considered, any of the following
branching topologies would give identical fileA,v contents:

      ROOT
      /|\
     / | \
    A1 A2 B

      ROOT
      / \
     /   \
    A1   A2
    |
    B

      ROOT
      / \
     /   \
    A1   A2
          |
          B

      ROOT
      / \
     /   \
    A1    B
    |
    A2

      ROOT
       |
       A1
      / \
     /   \
    A2    B

      ROOT
       |
       A1
       |
       A2
       |
       B

And from the information present in fileA,v, it is not possible to tell
whether tag X was applied to ROOT, A1, A2, or B.

(Some topologies *are* ruled out because the revision numbers are
ordered incorrectly; for example:

      ROOT
       |
       B
      / \
     /   \
    A1   A2

      ROOT
       |
       A2
       |
       A1
       |
       B

are not consistent with fileA,v.)

If we also consider the information in fileB, it is clear that branch
B's parent is branch A1, but it is still not clear whether branch A2's
parent is ROOT or A1, or whether tag X was applied to branch A1, A2, or B.

Similarly, fileC,v tells us that tag X was applied to branch B, and
fileD,v tells us that A2's parent is ROOT.

Each file alone is quite ambiguous, but in this case putting the
information from all files together (with the assumption that they have
a mutually-consistent history) is enough to reconstruct the entire
branching topology.
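
As a rough illustration of this reasoning (a sketch, not cvs2svn code),
the following Python computes the parents that each file allows for each
symbol and then intersects them across files. It assumes a globally
consistent history, that every branch still has a label, and no vendor
branches; the function names are made up for this example:

```python
ROOT = "ROOT"


def parse(number):
    """Return (kind, revision, branch_number) for a CVS symbol number."""
    p = tuple(int(x) for x in number.split("."))
    if len(p) >= 4 and len(p) % 2 == 0 and p[-2] == 0:
        return "branch", p[:-2], p[:-2] + p[-1:]   # magic form, e.g. 1.1.0.6
    if len(p) % 2 == 1:
        return "branch", p[:-1], p                 # plain form, e.g. 1.1.6
    return "tag", p, None


def candidates(symbols):
    """Map each symbol of one file to the parents that this file allows."""
    info = {name: parse(num) for name, num in symbols.items()}
    by_branch_number = {bn: name for name, (kind, _, bn) in info.items()
                        if kind == "branch"}

    def line_of(rev):
        # Line of development holding rev: the trunk or a labeled branch.
        # An unlabeled branch would raise KeyError here (see assumptions).
        return ROOT if len(rev) == 2 else by_branch_number[rev[:-1]]

    result = {}
    for name, (kind, rev, bn) in info.items():
        allowed = {line_of(rev)}
        for other, (okind, orev, obn) in info.items():
            if other != name and okind == "branch" and orev == rev:
                # A tag may sit on any branch sprouting from its revision;
                # a branch can only derive from one numbered before it.
                if kind == "tag" or obn < bn:
                    allowed.add(other)
        result[name] = allowed
    return result


def combine(files):
    """Intersect the per-file candidate sets across all files."""
    combined = {}
    for symbols in files.values():
        for name, allowed in candidates(symbols).items():
            combined[name] = combined.get(name, allowed) & allowed
    return combined


FILES = {
    "fileA": {"X": "1.1", "B": "1.1.0.6", "A2": "1.1.0.4", "A1": "1.1.0.2"},
    "fileB": {"X": "1.1.2.1", "B": "1.1.2.1.0.2", "A2": "1.1.0.4",
              "A1": "1.1.0.2"},
    "fileC": {"X": "1.1.6.1", "B": "1.1.0.6", "A2": "1.1.0.4",
              "A1": "1.1.0.2"},
    "fileD": {"X": "1.1", "B": "1.1.0.4", "A2": "1.2.0.2", "A1": "1.1.0.2"},
}

print(candidates(FILES["fileA"]))
print(combine(FILES))
```

With the four files above, the intersection leaves exactly one parent for
each symbol: ROOT for A1 and A2, A1 for B, and B for tag X. A plain
intersection like this breaks down as soon as the files do not share a
consistent history, because some symbol then ends up with an empty set.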

What makes things worse in real life?  Each file rules out some possible
histories and the goal is to find a history that is consistent with all
files.  But...

- There can easily be cases where even the total information from all
files is still not enough to choose a unique history.  In such cases we
need a way to select between the possible histories.

- Since files in CVS don't necessarily *have* a globally consistent
branching/tagging history, heuristics have to be used in such cases to
find histories that apply to subsets of the repository in some
reasonable way (i.e., the one that is most likely considering the way
people typically work with CVS).

- "Unlabeled branches": often users have removed the label from a
branch, but the branch is still used as a source for other branches.
Figuring out this situation is a real mess.

I imagine that the best results (never mind whether it is practical)
would be obtained by recording the topology constraints implied by each
*,v file, then trying to map the topologies onto each other pair by pair
to (1) combine the constraints and thereby limit the possible histories
and (2) deduce which unlabeled branches correspond to one another.  But
I still don't know how to deal with inconsistent histories.  I think a
bottom-up approach would be the most sensible, given that people are
probably more likely to tag a whole subdirectory rather than files
scattered here and there.

The second step is to decide at what point in time a branch or tag
should be created, with the goal of being able to create it as a
snapshot of the source branch at that moment.  This is not always
possible, even if the branch topologies are compatible.

Michael

[-- Attachment #2: makerepo.sh --]
[-- Type: application/x-shellscript, Size: 626 bytes --]


* Re: Some tips for doing a CVS importer
  2006-11-30  0:35         ` Michael Haggerty
@ 2006-11-30  0:45           ` Daniel Jacobowitz
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Jacobowitz @ 2006-11-30  0:45 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Markus Schiltknecht, Jon Smirl, Git Mailing List, dev,
	Shawn Pearce

On Thu, Nov 30, 2006 at 01:35:18AM +0100, Michael Haggerty wrote:
> ...times 4 (tags per day) -> 32GB.  If I understand correctly, the tags
> were created nightly by automated scripts.

Correct.  Remember, checking out a branch from a CVS repository from a
particular date was extremely awkward; the tags were the only way to
have reproducible snapshots.

-- 
Daniel Jacobowitz

