* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Shawn O. Pearce @ 2008-02-22 7:32 UTC
To: Ian Clatworthy; +Cc: Bazaar, git

Ian Clatworthy <ian.clatworthy@internode.on.net> wrote:
> FYI. I thought you'd be interested in this as it's inspired by and based
> on git-fast-import. You can download the Python source from
> https://code.launchpad.net/bzr-fastimport/. In particular, there's a
> Python parser of the stream format included that may be useful to the
> Git community or other VCS communities. The fast-import-info and
> fast-import-filter commands might also be useful to others.

This is interesting. I'm not a Python guy, but the info and filter
commands do look like they could be useful beyond the Bazaar community.

Michael Haggerty of cvs2svn has spent a good amount of time creating
a git-fast-import backend to cvs2svn. Given that cvs2svn is one of
the few tools that can read some of the really strange real world
CVS trees it's good to be able to leverage that work for other systems
(SVN, Git, and now Bazaar).

> BTW, you might want to either extend the specification (a little) or fix
> git-fast-export so they match. :-) See doc/notes.txt under
> http://bazaar.launchpad.net/~bzr/bzr-fastimport/fastimport.dev/files for
> details. For example, running git-fast-export on 64-bit Hardy Heron
> produces file modes longer than permitted if the spec was strictly
> interpreted.

We may need to take a small hammer to git-fast-export and fix its
output. Generating long mode strings like your notes suggest is
incorrect.

The fast-import format is very strict, to avoid any sort of ambiguous
behavior and implicit data corruption during import. This is one
reason we don't use "auto format detection" for dates and instead
require that the frontend tell us what date format it is using, and
stick to only that format throughout the stream. It's also a reason
why we only support a limited number of date formats.

File modes in Git are very limited. We really only want symlinks
or regular files with permissions of 644 and 755. Everything else
is bogus. We also now have the S_IFGITLINK mode to deal with but
fast-import does not currently support it.

> Looking forward, I'd probably like to extend the spec to support some
> Bazaar-specific features, e.g. versioning of directories without files
> inside them. If you have a preferred way of me doing this or would like
> to work on it together when that time comes, please let me know. To keep
> backwards compatibility, the first option that springs to mind is using
> specially marked comments for stuff like this, e.g.
>
> ##bzr:: blah blah blah

Technically it's valid for a Git tree to contain an empty subdirectory,
but that directory would disappear if the user tries to make a commit
on top of it due to the current limitations of the index file.

So git-fast-import could actually allow the frontend to create an
empty directory in the stream format, and record that correctly
in Git. It's just that building on top of that may cause the directory
to disappear. :)

If we are heading in the direction of making this a common stream
format I'd like to try and work it out such that any additional
extensions aren't VCS specific, at least as much as we can avoid it.
That way exports from a source into this format can be loaded into
any VCS that recognizes it, and have little or no loss. So yes,
I am interested in trying to work with you and anyone else who wants
to extend the format further.

> Finally, thanks for writing git-fast-import and the associated
> documentation. It's well done. If you have any thoughts on the various
> front-ends available, I'm interested in hearing them. As well as saving
> me time vs testing lots of them, your thoughts will give us things to
> keep in mind when developing bzr-fastexport soon.

I think the fast-import documentation is the longest chunk of docs
we have in git, at least as a single manpage. :-)

The git-p4 importer in the git.git contrib/ directory is probably
the most well known and most widely used frontend. It is also an
incremental tool, and I know a number of Qt developers use it to
mirror the Qt Perforce tree into Git.

The t/t9300-fast-import.sh test script contains a number of tests
for git-fast-import. The test cases themselves (in terms of the
stream it feeds in) may be of some use to you as it covers most of
the currently recognized stream format.

> -------- Original Message --------
> Subject: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option
> Date: Fri, 22 Feb 2008 01:23:34 +1000
> From: Ian Clatworthy <ian.clatworthy@internode.on.net>
> To: Bazaar <bazaar@lists.canonical.com>, bazaar-announce@lists.canonical.com
>
> I'm pleased to announce bzr-fastimport, a plugin useful for loading data
> exported by a large number of foreign VCS tools. Places to start are:
>
> * the Launchpad page - https://launchpad.net/bzr-fastimport
> * the Wiki page - http://bazaar-vcs.org/BzrFastImport.
>
> Please note that this is not yet production quality but seems to be
> working well enough to be useful for a large number of projects.
>
> I would *greatly* appreciate testing, feedback and improvements. In
> particular, I'm using this for migrating the OpenOffice.org repository
> (76K files and 500K revisions) into Bazaar from Subversion, so I'd
> really like some help with testing out and enhancing the existing
> Subversion front-ends.

Heh. OOo is _huge_. I think the best import into Git thus far is
taking up about 1.5G of disk space once fully repacked. I don't
recall how they did the import, but coming from SVN I think they
used git-svn, which is not based on git-fast-import.

What frontend are you using to go from SVN -> fast-import?
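The strictness about file modes and dates described in the message above can be illustrated with a small Python sketch of what a frontend might do before writing to the stream. This is only an illustration under stated assumptions, not code from git or bzr-fastimport: the function names (canonical_mode, raw_date, filemodify_line) are hypothetical, the mode collapsing rule (keep only the owner-execute bit) is an assumption modeled on the "644 and 755" remark, and the date layout assumes the "raw" style of seconds since the epoch followed by a +HHMM offset.

```python
import stat


def canonical_mode(st_mode: int) -> str:
    """Collapse a raw st_mode into one of the three modes named in the thread."""
    if stat.S_ISLNK(st_mode):
        return "120000"  # symbolic link
    if stat.S_ISREG(st_mode):
        # Assumption: keep only the owner-execute bit, drop every other bit.
        return "100755" if st_mode & stat.S_IXUSR else "100644"
    # Directories, devices, gitlinks, etc. are rejected rather than guessed at.
    raise ValueError(f"unsupported file mode: {st_mode:o}")


def raw_date(epoch_seconds: int, utc_offset_minutes: int) -> str:
    """Format a timestamp as '<seconds since epoch> <+HHMM or -HHMM>'."""
    sign = "-" if utc_offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(utc_offset_minutes), 60)
    return f"{epoch_seconds} {sign}{hours:02d}{minutes:02d}"


def filemodify_line(st_mode: int, dataref: str, path: str) -> str:
    """Build an 'M' (filemodify) command; assumes the path needs no quoting."""
    return f"M {canonical_mode(st_mode)} {dataref} {path}"
```

For example, filemodify_line(0o100664, ":1", "README") yields "M 100644 :1 README", so a group-writable file never leaks an unexpected mode string into the stream. Raising an error for anything that is not a regular file or symlink mirrors the "everything else is bogus" stance instead of silently guessing.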
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Ian Clatworthy @ 2008-02-22 7:47 UTC
To: Shawn O. Pearce; +Cc: Bazaar, git

Shawn O. Pearce wrote:

> If we are heading in the direction of making this a common stream
> format I'd like to try and work it out such that any additional
> extensions aren't VCS specific, at least as much as we can avoid it.
> That way exports from a source into this format can be loaded into
> any VCS that recognizes it, and have little or no loss. So yes,
> I am interested in trying to work with you and anyone else who wants
> to extend the format further.

Excellent. That sounds like the right way to go. I'll contact you if
and when I want to add stuff.

> Heh. OOo is _huge_. I think the best import into Git thus far is
> taking up about 1.5G of disk space once fully repacked. I don't
> recall how they did the import, but coming from SVN I think they
> used git-svn, which is not based on git-fast-import.
>
> What frontend are you using to go from SVN -> fast-import?

The pack file in the Git clone I have is 2.4G. I thought that was large
but it's quite small compared to the 82G svn dump that creates a 55G svn
repo!

I'm using svn-fast-export.c currently. I'd rather enhance the Python one
but my Subversion binding knowledge is slim and there's a bug wrt "too
many open files" that causes it to crash almost immediately on the OOo
repo. It's not obvious to me how to fix that unfortunately. It worked
fine for Wordpress OTOH.

The svn-all-fast-export tool sounds interesting but is completely
undocumented to my knowledge.

Ian C.
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Johannes Schindelin @ 2008-02-22 10:36 UTC
To: Ian Clatworthy; +Cc: Shawn O. Pearce, Bazaar, git

Hi,

On Fri, 22 Feb 2008, Ian Clatworthy wrote:

> Shawn O. Pearce wrote:
>
> > Heh. OOo is _huge_. I think the best import into Git thus far is
> > taking up about 1.5G of disk space once fully repacked. I don't
> > recall how they did the import, but coming from SVN I think they used
> > git-svn, which is not based on git-fast-import.
> >
> > What frontend are you using to go from SVN -> fast-import?
>
> The pack file in the Git clone I have is 2.4G. I thought that was large
> but it's quite small compared to the 82G svn dump that creates a 55G svn
> repo!

The 2.4G have been compressed (loss-lessly ;-) to less than 1.5G.

Unlike other SCMs, git has transparent access to the object database,
which means that we can actually repack _expensively_ for a better
compression.

So yes, the Git clone you have _is_ 2.4G, but that size is not the best
size you _can_ have.

Ciao,
Dscho
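The message above does not say which command was used to get from 2.4G to under 1.5G. As a rough sketch of what repacking "expensively" can look like, the Python helper below builds a git repack invocation that discards existing deltas and searches harder for new ones. The helper names and the window/depth values are illustrative assumptions, not the settings used for the OOo import; larger values trade CPU time and memory for a smaller pack.

```python
import subprocess
from typing import List


def repack_command(window: int = 250, depth: int = 50) -> List[str]:
    """Build a git repack invocation that recomputes deltas from scratch."""
    return [
        "git", "repack",
        "-a",                  # put all reachable objects into a single pack
        "-d",                  # delete packs made redundant by the new one
        "-f",                  # do not reuse existing deltas
        f"--window={window}",  # how many candidate objects to compare against
        f"--depth={depth}",    # maximum length of a delta chain
    ]


def repack(repo_path: str) -> None:
    """Run the repack inside repo_path; raises CalledProcessError on failure."""
    subprocess.run(repack_command(), cwd=repo_path, check=True)
```

Calling repack("/path/to/clone") on a large repository can take a long time, which is the cost of the better compression described above.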
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Johannes Schindelin @ 2008-02-22 10:33 UTC
To: Shawn O. Pearce; +Cc: Bazaar, git

Hi,

On Fri, 22 Feb 2008, Shawn O. Pearce wrote:

> Ian Clatworthy <ian.clatworthy@internode.on.net> wrote:
>
> > BTW, you might want to either extend the specification (a little) or
> > fix git-fast-export so they match. :-) See doc/notes.txt under
> > http://bazaar.launchpad.net/~bzr/bzr-fastimport/fastimport.dev/files
> > for details. For example, running git-fast-export on 64-bit Hardy
> > Heron produces file modes longer than permitted if the spec was
> > strictly interpreted.
>
> We may need to take a small hammer to git-fast-export and fix its
> output. Generating long mode strings like your notes suggest is
> incorrect.

Indeed. Raising the issue with the original author of git-fast-export
would not have hurt either.

Ciao,
Dscho
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Pierre Habouzit @ 2008-02-22 11:37 UTC
To: Shawn O. Pearce; +Cc: Ian Clatworthy, Bazaar, git

On Fri, Feb 22, 2008 at 07:32:28AM +0000, Shawn O. Pearce wrote:
> Ian Clatworthy <ian.clatworthy@internode.on.net> wrote:
> > FYI. I thought you'd be interested in this as it's inspired by and based
> > on git-fast-import. You can download the Python source from
> > https://code.launchpad.net/bzr-fastimport/. In particular, there's a
> > Python parser of the stream format included that may be useful to the
> > Git community or other VCS communities. The fast-import-info and
> > fast-import-filter commands might also be useful to others.
>
> This is interesting. I'm not a Python guy, but the info and filter
> commands do look like they could be useful beyond the Bazaar community.
>
> Michael Haggerty of cvs2svn has spent a good amount of time creating
> a git-fast-import backend to cvs2svn. Given that cvs2svn is one of
> the few tools that can read some of the really strange real world
> CVS trees it's good to be able to leverage that work for other systems
> (SVN, Git, and now Bazaar).

/me opens big ears and eyes: does this mean that we have an
incremental importer of CVS based on git-fast-import? I mean I'm really
interested in that, as git-cvsimport is really broken with the glibc
CVS tree, and as the glibc CVSROOT is rsync-able, an incremental
importer that has access to the CVSROOT RCS files is probably the most
efficient way.

Is there any link you can provide to me about these new features of
cvs2svn?

--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org
* cvs2svn, was Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Johannes Schindelin @ 2008-02-22 11:47 UTC
To: Pierre Habouzit; +Cc: git

Hi,

On Fri, 22 Feb 2008, Pierre Habouzit wrote:

> Is there any link you can provide to me about these new features of
> cvs2svn ?

http://article.gmane.org/gmane.comp.version-control.git/74461/match=cvs2svn

Hth,
Dscho
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Michael Haggerty @ 2008-02-22 13:14 UTC
To: Pierre Habouzit, Shawn O. Pearce, Ian Clatworthy, Bazaar, git

Pierre Habouzit wrote:
> On Fri, Feb 22, 2008 at 07:32:28AM +0000, Shawn O. Pearce wrote:
>> Michael Haggerty of cvs2svn has spent a good amount of time creating
>> a git-fast-import backend to cvs2svn. Given that cvs2svn is one of
>> the few tools that can read some of the really strange real world
>> CVS trees it's good to be able to leverage that work for other systems
>> (SVN, Git, and now Bazaar).
>
> /me opens big ears and eyes: does this mean that we have an
> incremental importer of CVS based on git-fast-import ?

cvs2svn is robust and uses git-fast-import, but it is *not* incremental.
Incremental conversion would be fun but it would be a lot of work to
implement in such a way that it works reliably.

Michael
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Aidan Van Dyk @ 2008-02-22 14:44 UTC
To: Pierre Habouzit, Shawn O. Pearce, Ian Clatworthy, Bazaar, git

* Pierre Habouzit <madcoder@debian.org> [080201 08:20]:

> /me opens big ears and eyes: does this mean that we have an
> incremental importer of CVS based on git-fast-import ? I mean I'm really
> interested in that, as git-cvsimport is really broken with the glibc
> CVS tree, and as the glibc CVSROOT is rsync-able, an incremental
> importer that has access to the CVSROOT RCS files is probably the most
> efficient way.

In the repository I convert (PostgreSQL), I'm using the ruby
fromcvs/togit converter, which has worked well, because git-cvsimport
doesn't work.

I actually found the problem with the PostgreSQL CVS repository - it is
a TAG, which seems to have some cyclic dependencies which throw cvsps
into a loop. Unfortunately, I have neither time nor energy to be able to
look into fixing cvsps, especially since fromcvs "just works" on it.
http://mid.gmane.org/20080220220014.GB16099@yugib.highrise.ca

I don't know what the problem with the glibc CVSROOT is, but if it's the
same, that might be something to look at.

Note that fromcvs doesn't import tags (that's probably why it didn't
have any trouble with the PostgreSQL CVS) but that doesn't bother me,
since CVS tags carry none of the authority git tags do, and the git
commit ids provide a stable way to refer to particular commits anyways.

a.

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Pierre Habouzit @ 2008-02-22 18:20 UTC
To: Aidan Van Dyk; +Cc: Shawn O. Pearce, Ian Clatworthy, Bazaar, git

On Fri, Feb 22, 2008 at 02:44:15PM +0000, Aidan Van Dyk wrote:
> * Pierre Habouzit <madcoder@debian.org> [080201 08:20]:
>
> > /me opens big ears and eyes: does this mean that we have an
> > incremental importer of CVS based on git-fast-import ? I mean I'm really
> > interested in that, as git-cvsimport is really broken with the glibc
> > CVS tree, and as the glibc CVSROOT is rsync-able, an incremental
> > importer that has access to the CVSROOT RCS files is probably the most
> > efficient way.
>
> In the repository I convert (PostgreSQL), I'm using the ruby
> fromcvs/togit converter, which has worked well, because git-cvsimport
> doesn't work.

Well, last time I tried, it exploded miserably (big fat OOM) because
the glibc CVS repository goes back to 1984 or so, and has a very nasty
big fat Changelog with literally thousands of modifications.

--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org
* Re: [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option]
From: Aidan Van Dyk @ 2008-02-22 19:25 UTC
To: Pierre Habouzit, Shawn O. Pearce, Ian Clatworthy, Bazaar, git

* Pierre Habouzit <madcoder@debian.org> [080222 13:40]:

> Well, last time I tried, it exploded miserably (big fat OOM) because
> the glibc CVS repository goes back to 1984 or so, and has a very nasty
> big fat Changelog with literally thousands of modifications.

<me run="prepare to">
When importing a repository like that wouldn't it make sense to nuke the
redundant ChangeLog,v?
</me run="far away">

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Thread overview: 10+ messages
[not found] <47BE167A.4060005@internode.on.net>
2008-02-22 7:32 ` [Fwd: [ANNOUNCE] bzr-fastimport plugin, yet another Bazaar import option] Shawn O. Pearce
2008-02-22 7:47 ` Ian Clatworthy
2008-02-22 10:36 ` Johannes Schindelin
2008-02-22 10:33 ` Johannes Schindelin
2008-02-22 11:37 ` Pierre Habouzit
2008-02-22 11:47 ` cvs2svn, was " Johannes Schindelin
2008-02-22 13:14 ` Michael Haggerty
2008-02-22 14:44 ` Aidan Van Dyk
2008-02-22 18:20 ` Pierre Habouzit
2008-02-22 19:25 ` Aidan Van Dyk