* working with a large repository and git svn
From: Joe Corneli @ 2011-01-12  1:27 UTC
To: git

Greetings -

I am experiencing trouble with git svn, trying to import a large
repository (7.9 gigs, ~54000 commits) from Git into SVN.

This has failed in a couple of different ways, depending on the
operating environment.  With Git version 1.7.3.5 running on Ubuntu
9.10, the final step of the formula described below,

  git svn dcommit --no-rebase

fails with "Can't fork at /usr/share/perl/5.10.0/Git.pm line 1261."
after committing just over 2000 revisions.

Previously, on Mac OS X 10.6.4 with git version 1.7.3.4, it made it
through about 18000 commits before failing with some other error.
(I don't have that one recorded at the moment.)

Separately from the latest attempt, I tried repacking the repository
before doing the "git svn" stuff, with

  git repack -a -d --depth=250 --window=250 -f

but that also failed ("pack-objects died of signal 11").

Any tips for dealing with new, large repositories would be
appreciated.  The sequence of commands I used is below the 8<.

Thanks,
Joe

8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<

## Creating an svn repo

$ mkdir repo;
$ svnadmin create repo;
$ mkdir init;
$ touch init/README;
$ svn import init file://`pwd`/repo/init -m "Initial import";
$ svn checkout file://`pwd`/repo/init working;

## THIS PART FOLLOWS THE MODEL SUGGESTED BY THE FOLKS AT code.google.com
## (/Users/jac2349/planetary/destination/ IS THE LOCATION OF MY GIT REPO.)

$ mkdir cloning
$ cd cloning
$ git svn clone file:///Users/jac2349/planetary/repo/init
$ cd init
$ git fetch git:///Users/jac2349/planetary/destination/.git
$ git branch tmp $(cut -b-40 .git/FETCH_HEAD)
$ git tag -a -m "Last fetch" last tmp
$ INIT_COMMIT=$(git log tmp --pretty=format:%H | tail -1)
$ git checkout $INIT_COMMIT .
$ git commit -C $INIT_COMMIT
$ git rebase master tmp
$ git branch -M tmp master
$ git svn dcommit --no-rebase
$ mv .git/refs/tags/newlast .git/refs/tags/last

## BTW, THE --no-rebase FLAG KEEPS IT FROM BEING IMPOSSIBLY SLOW!
* Re: working with a large repository and git svn
From: Wesley J. Landaker @ 2011-01-12 15:30 UTC
To: Joe Corneli; +Cc: git

On Tuesday, January 11, 2011 18:27:10 Joe Corneli wrote:
> I am experiencing trouble with git svn, trying to import a
> large repository (7.9 gigs, ~54000 commits) from Git into
> SVN.
>
> This has failed in a couple of different ways, depending
> on the operating environment. With Git version 1.7.3.5
> running on Ubuntu 9.10, in the final step
>
> git svn dcommit --no-rebase
>
> of the formula described below, I get:
>
> failing with "Can't fork at /usr/share/perl/5.10.0/Git.pm line 1261."
>
> after committing just over 2000 revisions.

I haven't tried importing 8 GB from Git to Subversion, but I have used Git
against existing huge Subversion repositories that are >= 10 GB with little
trouble, other than that it takes forever because Subversion is slow.

Here are some thoughts on how I'd approach what you are doing. Realize that
no matter what, it's still probably going to take "forever" (e.g. run it
over the weekend).

1) Sounds like git-svn is running out of resources on your machine --
that's probably a bug, but work around it: Don't dcommit all 20000 revisions
at once. Maybe write a shell script that goes through and dcommits 100
commits at a time.

2) Do you need the full history to be in SVN? Can you rebase/squash large
parts together and thus need to commit fewer revisions in the first place?

3) I love git-svn for working with Subversion repositories, but you could
consider a different tool, like tailor, if you can't make git-svn do what
you want. I have also heard talk (but I don't know the state of things) of
people working on a fast-import tool for SVN, so you could git-fast-export
and svn-fast-import in a big batch.

4) Does 8 GB of data really belong in the same repository? Maybe it should
really be split up and used with git submodules or SVN externals? That may
make things easier to work with in the long term.

5) Do you really want to be going from Git, to Subversion? That seems like
a big step backwards. =)

In any case, good luck!
* Re: working with a large repository and git svn
From: Joe Corneli @ 2011-01-13  0:54 UTC
Cc: git

> 1) Sounds like git-svn is running out of resources on your machine --
> that's probably a bug, but work around it: Don't dcommit all 20000 revisions
> at once. Maybe write a shell script that goes through and dcommits 100
> commits at a time.

Hm, I found a related blog post here, but designed for interactive use:

http://fredericiana.com/2009/12/31/partial-svn-dcommit-with-git/

Could you give me a more detailed hint about how to do what you suggested?

> 2) Do you need the full history to be in SVN? Can you rebase/squash large
> parts together and thus need to commit fewer revisions in the first place?

Maybe.  We want a tool for managing the entire history, and Git seems like
a good tool for that.  At the same time, checking out the entire history
can take a long time - if we could check out just the latest files and
check them back in in a sensible way, that would be good - SVN does seem
suitable for that purpose.  If there's a git-only way to do this I'd be
happy to know about that as well!

> 3) I love git-svn for working with Subversion repositories, but you could
> consider a different tool, like tailor, if you can't make git-svn do what
> you want.

Tried it, but it didn't even get through the initiation phase.  I asked
for help on the relevant mailing list.

> people working on a fast-import tool for SVN, so you could git-fast-export
> and svn-fast-import in a big batch.

Not finding these.

> 4) Does 8 GB of data really belong in the same repository? Maybe it should
> really be split up and used with git submodules or SVN externals? That may
> make things easier to work with in the long term.

Probably true.  If there were a nice way to give each *file* its own
associated "repository", then stitch these together into packets (even
"on demand"), that would be cool.  I was assuming we could do fancy stuff
like this as "future work", however - and it would seem that if we use a
completely git-based solution we'll be there.

> 5) Do you really want to be going from Git, to Subversion? That seems like
> a big step backwards. =)

If there's a good way to just pull down the latest revision into a working
copy and be able to push that back to the repo, that would be nice.  This
doesn't seem to be the Git way, but for an 8 gig repo it's probably a
pretty important feature.

Thoughts?

Thanks,
Joe
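Joe asks above for a more detailed hint on Wesley's item 1 (dcommit roughly 100 commits at a time). What follows is a minimal Python sketch, not tested against a repository of this size. Assumptions: the SVN tracking ref is named `git-svn` (what a plain `git svn clone` of a single path created in Git 1.7; check with `git branch -r`), and one batch means `git svn dcommit <commit>` (dcommit accepts an optional revision and works on it instead of HEAD) followed by `git svn rebase` to move the branch onto the rewritten commits. The helper only returns the commands for the next batch; you run them, check the SVN side, and call it again.

```python
import subprocess


def next_batch_end(pending, size=100):
    """Return the commit that ends the next batch, or None if nothing is pending.

    `pending` must be oldest-first, as printed by `git rev-list --reverse`.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if not pending:
        return None
    return pending[min(size, len(pending)) - 1]


def pending_commits(upstream="git-svn", branch="HEAD"):
    """Commits on `branch` that the SVN tracking ref lacks, oldest first.

    ASSUMPTION: the tracking ref is called `git-svn`; adjust for your clone.
    """
    result = subprocess.run(
        ["git", "rev-list", "--reverse", f"{upstream}..{branch}"],
        capture_output=True, text=True, check=True)
    return result.stdout.split()


def next_batch_commands(size=100):
    """Return (not run) the commands for ONE batch; empty list when done."""
    end = next_batch_end(pending_commits(), size)
    if end is None:
        return []
    # dcommit works on the given revision instead of HEAD; the rebase then
    # moves the branch onto the rewritten commits, so the next call sees
    # only what is still pending (untested at this scale).
    return [f"git svn dcommit {end}", "git svn rebase"]
```

Each `git svn rebase` replays every commit still pending, so with ~54000 commits a larger batch size means fewer replays; 100 is only Wesley's suggested starting point.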
* Re: working with a large repository and git svn
From: Jonathan Nieder @ 2011-01-13  3:23 UTC
To: Wesley J. Landaker; +Cc: Joe Corneli, git, Ramkumar Ramachandra

Wesley J. Landaker wrote:

> 3) I love git-svn for working with Subversion repositories, but you could
> consider a different tool, like tailor, if you can't make git-svn do what
> you want. I have also heard talk (but I don't know the state of things) of
> people working on a fast-import tool for SVN, so you could git-fast-export
> and svn-fast-import in a big batch.

I think the state of the art is currently git2svn[1] + "svnrdump load".
This requires permission to change properties on the svn repo, just like
svnsync would.

Hope that helps,
Jonathan

[1] http://repo.or.cz/w/git2svn.git
* Re: working with a large repository and git svn
From: Joe Corneli @ 2011-01-14  7:43 UTC
To: git

> I think the state of the art is currently git2svn

Thanks, that did indeed work, though, for the record, it uses the
committer name and email in the log that it generates, not the author
name and email.  But no worries!

Joe
* Re: working with a large repository and git svn
From: Ramkumar Ramachandra @ 2011-01-14  8:05 UTC
To: Joe Corneli; +Cc: git, Jonathan Nieder

Hi Joe,

Joe Corneli writes:
> > I think the state of the art is currently git2svn
>
> Thanks, that did indeed work, though, for the record it uses committer
> name and email in the log that it generates, not author name and
> email, but no worries!

That should be easy enough to fix with something like this (warning:
untested). A more elegant solution would actually use some sort of
user-configurable mapping from Git authors/committers to SVN authors
though.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
--8<--
diff --git a/git2svn b/git2svn
index 2380775..3856696 100755
--- a/git2svn
+++ b/git2svn
@@ -261,12 +261,8 @@ COMMAND: while (!eof(IN)) {
         $commit{Mark} = $1;
         $next = next_line($IN);
     }
-    if ($next =~ m/author +(.*)/) {
-        $commit{Author} = $1;
-        $next = next_line($IN);
-    }
-    unless ($next =~ m/committer +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
-        die "missing comitter: $_";
+    unless ($next =~ m/author +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
+        die "missing author: $_";
     }
 
     $commit{CommitterName} = $1;
@@ -275,6 +271,9 @@ COMMAND: while (!eof(IN)) {
     $commit{CommitterTZ} = $4;
 
     $next = next_line($IN);
+    if ($next =~ m/committer +(.*)/) {
+        $next = next_line($IN);
+    }
     my $log = read_data($IN, $next);
 
     $next = next_line($IN);
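For reference, a Python rendering of the ident-line regex used in the patch, as a sketch rather than part of git2svn. It differs in two deliberate ways: it is anchored with `^`, and it keeps the timezone sign, which the Perl `[+-](\d+)` drops (so `-0500` and `+0500` would both be stored as `0500`). Like the Perl pattern, it requires a non-empty name, although the fast-import format allows the name to be omitted.

```python
import re

# Same shape as the git2svn pattern, but anchored and with the sign of the
# timezone offset kept inside the last capture group.
IDENT_RE = re.compile(r"^(author|committer) +(.+) +<([^>]+)> +(\d+) +([+-]\d+)$")


def parse_ident(line):
    """Split a fast-export `author`/`committer` line into fields, or return None."""
    m = IDENT_RE.match(line)
    if m is None:
        return None
    role, name, email, when, tz = m.groups()
    return {"role": role, "name": name, "email": email,
            "when": int(when), "tz": tz}
```

For example, `parse_ident("author milogardner <majordomo@planetmath.org> 1292508717 +0000")` yields name `milogardner`, email `majordomo@planetmath.org`, and timestamp 1292508717.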
* Re: working with a large repository and git svn
From: Jonathan Nieder @ 2011-01-14  8:29 UTC
To: Ramkumar Ramachandra; +Cc: Joe Corneli, git, Love Hörnquist Åstrand

Ramkumar Ramachandra wrote:
> Joe Corneli writes:
>>> I think the state of the art is currently git2svn
>>
>> Thanks, that did indeed work, though, for the record it uses committer
>> name and email in the log that it generates, not author name and
>> email, but no worries!
>
> That should be easy enough to fix with something like this (warning:
> untested). A more elegant solution would actually use some sort of
> user-configurable mapping from Git authors/committers to SVN authors
> though.

Thanks for the cc.  (cc-ing lha, as I should have before.)

I suppose if svn will show only one of the two (committer and author)
then it is better to show the author.  Possible complications:

 . The author lines in fast-import streams are optional.

 . Existing users of the incremental import facility might not want
   the meaning of svn:author to change between imports.  _If_ that
   is a problem then a command-line option to switch behaviors might
   help.

 . Is svn okay with non-monotonic dates?  (If not, then the committer
   date would need to be used.)

Modulo those complications I like the idea.  (Though I haven't read
the implementation, which follows for reference.)

>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> --8<--
> diff --git a/git2svn b/git2svn
> index 2380775..3856696 100755
> --- a/git2svn
> +++ b/git2svn
> @@ -261,12 +261,8 @@ COMMAND: while (!eof(IN)) {
>          $commit{Mark} = $1;
>          $next = next_line($IN);
>      }
> -    if ($next =~ m/author +(.*)/) {
> -        $commit{Author} = $1;
> -        $next = next_line($IN);
> -    }
> -    unless ($next =~ m/committer +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
> -        die "missing comitter: $_";
> +    unless ($next =~ m/author +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
> +        die "missing author: $_";
>      }
>  
>      $commit{CommitterName} = $1;
> @@ -275,6 +271,9 @@ COMMAND: while (!eof(IN)) {
>      $commit{CommitterTZ} = $4;
>  
>      $next = next_line($IN);
> +    if ($next =~ m/committer +(.*)/) {
> +        $next = next_line($IN);
> +    }
>      my $log = read_data($IN, $next);
>  
>      $next = next_line($IN);
* Re: working with a large repository and git svn
From: Michael Haggerty @ 2011-01-14  9:23 UTC
To: Jonathan Nieder
Cc: Ramkumar Ramachandra, Joe Corneli, git, Love Hörnquist Åstrand

On 01/14/2011 09:29 AM, Jonathan Nieder wrote:
> . Is svn okay with non-monotonic dates?  (If not, then the committer
>   date would need to be used.)

Subversion can tolerate non-monotonic dates with one caveat: it breaks
the find-revision-by-date feature (e.g., "svn update -r '{2010-12-25}'")
for the time intervals with non-monotonic dates.  This is a seldom-used
feature and therefore its sacrifice is often accepted, for example when
the history of the Subversion project itself was migrated into the
Apache project's Subversion repository.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
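Michael's caveat can be checked before loading. A small Python sketch that lists the revisions dated earlier than some earlier revision, i.e. the places where '{date}' lookups become unreliable. Input is (revision, Unix time) pairs in revision order; `author_times` parses the output of `git log --reverse --format=%at`, which prints one author timestamp per commit, oldest first. The starting revision number you pair them with is an assumption; adjust it for revisions already in the SVN repository.

```python
def out_of_order(revisions):
    """Return revision numbers dated earlier than some earlier revision.

    `revisions` is an iterable of (revnum, unix_time) pairs in revision order.
    """
    bad = []
    latest = None
    for rev, when in revisions:
        if latest is not None and when < latest:
            bad.append(rev)   # dated before an earlier revision: '{date}' lookups unreliable here
        else:
            latest = when     # dates are non-decreasing up to this revision
    return bad


def author_times(git_log_output):
    """Parse `git log --reverse --format=%at` output into a list of ints."""
    return [int(line) for line in git_log_output.split()]
```

Usage: `out_of_order(enumerate(author_times(output), start=2))` shows how many revisions an author-date import would put out of order, with `start=2` assuming one pre-existing revision as in Joe's setup.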
* [PATCH] Optionally parse author information
From: Ramkumar Ramachandra @ 2011-01-14 10:16 UTC
To: Love Hörnquist Åstrand
Cc: Jonathan Nieder, Joe Corneli, Git List, Michael Haggerty
In-Reply-To: <F0299861-B36C-459C-972E-856212A92615@kth.se> (not found in this archive)

When creating a new commit, instead of picking up the SVN author from
the committer's email, pick it up from the author's email, when
possible.  Also add a new command-line switch '--ignore-author' to
force the older behavior for backward compatibility.

Noticed-by: Joe Corneli <holtzermann17@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 git2svn |   25 +++++++++++++++++++------
 1 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/git2svn b/git2svn
index 2380775..8ef55f1 100755
--- a/git2svn
+++ b/git2svn
@@ -36,7 +36,7 @@ use Pod::Usage;
 my $IN;
 my $OUT;
 
-my ($help, $verbose, $keeplogs, $no_load);
+my ($help, $verbose, $keeplogs, $no_load, $ignore_author);
 
 # svn
 my $svntree = "repro";
@@ -200,6 +200,7 @@ $result = GetOptions ("git-branch=s" => \$branch,
                       "svn-prefix=s" => \$basedir,
                       "keep-logs" => \$keeplogs,
                       "no-load" => \$no_load,
+                      "ignore-author" => \$ignore_author,
                       "verbose+" => \$verbose,
                       "help" => \$help)
     or pod2usage(2);
@@ -261,12 +262,15 @@ COMMAND: while (!eof(IN)) {
         $commit{Mark} = $1;
         $next = next_line($IN);
     }
-    if ($next =~ m/author +(.*)/) {
-        $commit{Author} = $1;
+    if ($next =~ m/author +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
+        $commit{AuthorName} = $1;
+        $commit{AuthorEmail} = $2;
+        $commit{AuthorWhen} = $3;
+        $commit{AuthorTZ} = $4;
         $next = next_line($IN);
     }
     unless ($next =~ m/committer +(.+) +<([^>]+)> +(\d+) +[+-](\d+)$/) {
-        die "missing comitter: $_";
+        die "missing committer: $_";
     }
 
     $commit{CommitterName} = $1;
@@ -291,11 +295,15 @@ COMMAND: while (!eof(IN)) {
         strftime("%Y-%m-%dT%H:%M:%S.000000Z",
                  gmtime($commit{CommitterWhen}));
 
-    my $author = "(no author)";
+    my $author = "git2svn-dump";
     if ($commit{CommitterEmail} =~ m/([^@]+)/) {
         $author = $1;
     }
-    $author = "git2svn-dump" if ($author eq "(no author)");
+    unless ($ignore_author) {
+        if ($commit{AuthorEmail} =~ m/([^@]+)/) {
+            $author = $1;
+        }
+    }
 
     my $props = "";
     $props .= prop("svn:author", $author);
@@ -486,6 +494,11 @@ match the default GIT branch (master).
 
 Don't load the svn repository or update the syncpoint tagname.
 
+=item B<--ignore-author>
+
+Ignore "author" lines in the fast-import stream. Use "committer"
+information instead.
+
 =item B<--keep-logs>
 
 Don't delete the logs in $CWD/.data on success.
-- 
1.7.4.rc1.7.g2cf08.dirty
* Re: [PATCH] Optionally parse author information
From: Joe Corneli @ 2011-01-16  2:17 UTC
To: Ramkumar Ramachandra
Cc: Love Hörnquist Åstrand, Jonathan Nieder, Git List, Michael Haggerty

I tested it, and it seems to use email handle instead of author name
(perhaps that's intentional, though in my case it's not so desirable)
but, quite critically, it gets the dates wrong:

~/pmhistory.svn$ svn log -l 5
------------------------------------------------------------------------
r53127 | majordomo | 2011-01-10 18:31:58 -0500 (Mon, 10 Jan 2011) | 1 line
------------------------------------------------------------------------
r53126 | majordomo | 2011-01-10 18:31:58 -0500 (Mon, 10 Jan 2011) | 1 line
------------------------------------------------------------------------
r53125 | majordomo | 2011-01-10 18:31:58 -0500 (Mon, 10 Jan 2011) | 1 line
------------------------------------------------------------------------
r53124 | majordomo | 2011-01-10 18:31:57 -0500 (Mon, 10 Jan 2011) | 1 line
------------------------------------------------------------------------
r53123 | majordomo | 2011-01-10 18:31:57 -0500 (Mon, 10 Jan 2011) | 1 line

~/pmhistory.git$ git log -5
commit 411b8698e494ee12799300611fed0c8029e76ad3
Author: milogardner <majordomo@planetmath.org>
Date:   Thu Dec 16 14:11:57 2010 +0000

commit d12f8472cc06feec1a0e3a652e4ac14d7869fb3f
Author: milogardner <majordomo@planetmath.org>
Date:   Thu Dec 16 14:00:13 2010 +0000

commit 5febb4767563255280d95091ff9b2b0207042071
Author: Mathprof <majordomo@planetmath.org>
Date:   Wed Dec 15 23:02:47 2010 +0000

commit 61b11b97c4e503c353af5c1cd68e17b053d12b8e
Author: pahio <majordomo@planetmath.org>
Date:   Mon Dec 13 17:59:12 2010 +0000

commit 14d66e2dcd6151eb7214a9afbab159459912da6d
Author: pahio <majordomo@planetmath.org>
Date:   Mon Dec 13 17:54:21 2010 +0000
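Why every revision shows `majordomo`: a Python sketch mirroring the svn:author choice in the [PATCH] above (it is not git2svn itself). The Perl `m/([^@]+)/` is an unanchored match, so `re.search` with the same character class picks the same local part. Every author in the `git log` output above uses majordomo@planetmath.org, so this rule maps all of them to `majordomo`, matching the `svn log` output.

```python
import re

LOCAL_PART = re.compile(r"[^@]+")  # same character class as the Perl m/([^@]+)/


def svn_author(committer_email, author_email=None, ignore_author=False):
    """Sketch of the patched svn:author rule; author_email is None if no author line."""
    author = "git2svn-dump"                           # fallback when nothing matches
    m = LOCAL_PART.search(committer_email or "")
    if m:
        author = m.group(0)                           # old behavior: committer's local part
    if not ignore_author:
        m = LOCAL_PART.search(author_email or "")     # author line is optional in the stream
        if m:
            author = m.group(0)                       # new default: author's local part
    return author
```

The display names (`milogardner`, `Mathprof`, `pahio`) never reach svn:author under this rule; only the address does.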
* Re: [PATCH] Optionally parse author information
From: Jonathan Nieder @ 2011-01-16  2:57 UTC
To: Joe Corneli
Cc: Ramkumar Ramachandra, Love Hörnquist Åstrand, Git List, Michael Haggerty

Joe Corneli wrote:

> I tested it, and it seems to use email handle instead of author name
> (perhaps that's intentional, though in my case it's not so desirable)

Good point.  Presumably git2svn is using the local part of the email
address to mimic svn's default behavior of using one's username.

Other possibilities:

 - email address (e.g., majordomo@planetmath.org, as in most google
   repositories)

 - display name (e.g., "Joe Corneli").  I don't know if svn-related
   tools or scripts assume that svn:author doesn't contain spaces.

 - full ident string (e.g., "Joe Corneli <majordomo@planetmath.org>")

 - whatever the operator wants (mapping specified in authors file).

My guess: an "authors file" facility would be needed to cover all
cases, but whichever rule you want to implement short of that could
also be useful.

> but, quite critically, it gets the dates wrong:
>
> ~/pmhistory.svn$ svn log -l 5
> ------------------------------------------------------------------------
> r53127 | majordomo | 2011-01-10 18:31:58 -0500 (Mon, 10 Jan 2011) | 1 line
[...]
> ~/pmhistory.git$ git log -5
> commit 411b8698e494ee12799300611fed0c8029e76ad3
> Author: milogardner <majordomo@planetmath.org>
> Date:   Thu Dec 16 14:11:57 2010 +0000

Maybe it is using the committer date (as shown by "git log
--format=fuller") and someone rebased recently.  If you don't care
about svn's '{date}' construct working (meaning out-of-order dates are
ok) then author date might be more suitable.  Presumably the important
thing is for it to be consistent.

Hope that helps,
Jonathan
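Jonathan's "authors file" idea could look roughly like the sketch below. The file format is hypothetical for git2svn: it borrows the `svnlogin = Display Name <email>` line format that `git svn --authors-file` reads, used here in the opposite direction (Git ident to SVN login). Entries are keyed on name and email together because, in Joe's repository, every author shares majordomo@planetmath.org; unmapped idents fall back to the current local-part rule.

```python
import re

# HYPOTHETICAL authors-file format for git2svn, borrowed from the one
# `git svn --authors-file` reads:  svnlogin = Display Name <email>
LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*(.+?)\s*<([^>]*)>\s*$")


def load_authors(text):
    """Map (display name, email) -> svn login; blank lines are skipped."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"bad authors-file line: {line!r}")
        login, name, email = m.groups()
        mapping[(name, email)] = login
    return mapping


def svn_author_for(name, email, mapping):
    """Use the mapping when it has an entry, else the email's local part."""
    login = mapping.get((name, email))
    if login is not None:
        return login
    return email.split("@", 1)[0] or "git2svn-dump"
```

Keying on the full ident keeps distinct people apart even when they share one address, at the cost of needing an entry for every name variant.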
* Re: [PATCH] Optionally parse author information
From: Ramkumar Ramachandra @ 2011-01-16  3:32 UTC
To: Joe Corneli
Cc: Love Hörnquist Åstrand, Jonathan Nieder, Git List, Michael Haggerty

Hi Joe,

Joe Corneli writes:
> I tested it, and it seems to use email handle instead of author name
> (perhaps that's intentional, though in my case it's not so desirable)
> but, quite critically, it gets the dates wrong:

Yes. I didn't change its core behavior: it previously extracted the
information from the committer's email address; I just changed it to
use the author's email address. For dates, it uses committer dates
again, and this is probably desirable: author dates aren't necessarily
monotonic, and this can break some functionality in SVN. Of course, a
lot more is possible with a patch that allows users to configure all
these things. Until then, I recommend that you just edit the source to
achieve the desired results.

-- Ram
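On dates: git2svn fills svn:date from the committer timestamp, formatted in UTC with the strftime pattern visible in the patch context. A Python sketch of that formatting, plus a hypothetical `use_author_date` switch of the kind Ram leaves for a future configurable patch; the `CommitterWhen`/`AuthorWhen` keys mirror fields of the `%commit` hash in the patch.

```python
import time


def svn_date(epoch_seconds):
    """Format a Unix time the way git2svn does for svn:date (UTC, zero microseconds)."""
    return time.strftime("%Y-%m-%dT%H:%M:%S.000000Z", time.gmtime(epoch_seconds))


def svn_when(commit, use_author_date=False):
    """Pick the timestamp for svn:date from a git2svn-style commit record.

    git2svn uses CommitterWhen; use_author_date is a HYPOTHETICAL switch that
    falls back to the committer time when the stream had no author line.
    """
    if use_author_date and commit.get("AuthorWhen") is not None:
        return commit["AuthorWhen"]
    return commit["CommitterWhen"]
```

For example, `svn_date(1294702318)` gives `2011-01-10T23:31:58.000000Z`, the UTC form of the r53127 date Joe shows (18:31:58 -0500), whereas the newest Git author date in his log, Dec 16 14:11:57 2010 +0000, is timestamp 1292508717. Five revisions stamped within 18:31:57-58 fits Jonathan's guess that the committer dates come from a recent rebase.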
end of thread

Thread overview: 12+ messages
2011-01-12  1:27 working with a large repository and git svn Joe Corneli
2011-01-12 15:30 ` Wesley J. Landaker
2011-01-13  0:54   ` Joe Corneli
2011-01-13  3:23   ` Jonathan Nieder
2011-01-14  7:43     ` Joe Corneli
2011-01-14  8:05       ` Ramkumar Ramachandra
2011-01-14  8:29         ` Jonathan Nieder
2011-01-14  9:23           ` Michael Haggerty
     [not found]           ` <F0299861-B36C-459C-972E-856212A92615@kth.se>
2011-01-14 10:16             ` [PATCH] Optionally parse author information Ramkumar Ramachandra
2011-01-16  2:17               ` Joe Corneli
2011-01-16  2:57                 ` Jonathan Nieder
2011-01-16  3:32                 ` Ramkumar Ramachandra