* git repository size vs. subversion repository size
From: Sean Brown @ 2008-04-04 22:02 UTC
To: git

Last night I decided to see what storage size differences I might see
between an svn repo and a git one. So I imported a highly used
subversion repository into git and was shocked to see how huge the git
version was. I used a repo that has a lot of branches and tagged
releases just to make sure importing into git would in fact keep all
of the history. It did keep the history, but the total disk usage was
very different:

$subversionbox # du -hs ./my_sample_website/
67M ./my_sample_website

$localhost # du -hs ./git-samplesite/
3.6GB ./git-samplesite/

Here are the steps I took (locally):

mkdir git-samplesite-tmp
cd git-samplesite-tmp
git-svn init http://subversion.myco.com/my_sample_website --no-metadata
git config svn.authorsfile ~/Desktop/users.txt # mapped svn users to git users
git-svn fetch
git clone git-samplesite-tmp git-samplesite

I did this based on reading the documents in the git wiki, so I
assumed they were "best practice." Did I do something wrong?

If this is a normal amount of storage need increase, we'd likely not
move to git based on the need for new hardware alone.

Any help would be appreciated.

Sean
* Re: git repository size vs. subversion repository size
From: Björn Steinbrink @ 2008-04-04 22:17 UTC
To: Sean Brown; +Cc: git

On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> Last night I decided to see what storage size differences I might see
> between an svn repo and a git one. So I imported a highly used
> subversion repository into git and was shocked to see how huge the git
> version was. I used a repo that has a lot of branches and tagged
> releases just to make sure importing into git would in fact keep all
> of the history. It did keep the history, but the total disk usage was
> very different:
>
> $subversionbox # du -hs ./my_sample_website/
> 67M ./my_sample_website
>
> $localhost # du -hs ./git-samplesite/
> 3.6GB ./git-samplesite/

How much of that is in the .git/svn directory? The contents of that
directory are used to map git commits to svn revision and git versions
before 1.5.4 had a quite space consuming file format for that. The new
format is a lot better. If you want to switch completely, you can even
just delete the .git/svn directory, as that's only required as long as
you want to interact with the corresponding svn repository.

And finally, you might want to repack the repository once after the
initial import, to get a smaller repo. Something like:
git repack -a -d -f --window=100 --depth=100

Björn
* Re: git repository size vs. subversion repository size
From: Stephen Bannasch @ 2008-04-04 23:49 UTC
To: git

I'm just fooling around with git so far but I found a huge space savings
after running git gc. Here are the rough numbers:

svn repo on server: 1GB
svn repo checked out: 2GB
git svn clone after gc: 384MB

That's saving the full history in git -- about 13000 revisions.

Using git version 1.5.4.4.
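[Editor's sketch] The saving described above can be checked on any repository by comparing object counts before and after `git gc`. This is only an illustration, not something posted in the thread: the function name `gc_report` is made up, and `git -C` assumes a git of 1.8.5 or later, which is newer than the versions discussed here.

```shell
# gc_report [REPO]: show object storage before and after `git gc`.
# REPO defaults to the current directory; the function name is hypothetical.
gc_report() {
    repo="${1:-.}"
    echo "== before gc =="
    git -C "$repo" count-objects -v || return 1
    # Pack loose objects and drop the loose copies that are now redundant.
    git -C "$repo" gc --quiet || return 1
    echo "== after gc =="
    git -C "$repo" count-objects -v
}

# Usage: gc_report /path/to/repo
```

In the output, `count` and `size` describe loose objects, while `in-pack` and `size-pack` describe packed ones; a freshly imported repository would be expected to shift most of its storage from the first pair to the second.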
* Re: git repository size vs. subversion repository size
From: Steven Walter @ 2008-04-05 0:01 UTC
To: Stephen Bannasch; +Cc: git

On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
> I'm just fooling around with git so far but I found a huge space savings
> after running git gc. Here are the rough numbers:
>
> svn repo on server: 1GB
> svn repo checked out: 2GB
> git svn clone after gc: 384MB
>
> That's saving the full history in git -- about 13000 revisions.

git-gc is such an important step in importing a repository from svn.
Why doesn't git-svn take this vital step automatically?
--
-Steven Walter <stevenrwalter@gmail.com>
Freedom is the freedom to say that 2 + 2 = 4
B2F1 0ECC E605 7321 E818 7A65 FC81 9777 DC28 9E8F
* Re: git repository size vs. subversion repository size
From: Stephen Bannasch @ 2008-04-05 0:04 UTC
To: git

>On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
>> I'm just fooling around with git so far but I found a huge space savings
>> after running git gc. Here are the rough numbers:
>>
>> svn repo on server: 1GB
>> svn repo checked out: 2GB
>> git svn clone after gc: 384MB
>>
>> That's saving the full history in git -- about 13000 revisions.
>
>git-gc is such an important step in importing a repository from svn.
>Why doesn't git-svn take this vital step automatically?

I think because it is not necessary for continued productive use of git
and the gc operation is expensive.

On the repo above it took about 8 hours running in the background while
I was working on other stuff.
* Re: git repository size vs. subversion repository size
From: Björn Steinbrink @ 2008-04-05 0:18 UTC
To: Steven Walter; +Cc: Stephen Bannasch, git

[Stephen, please stop dropping me from Cc:, thanks]

On 2008.04.04 20:01:41 -0400, Steven Walter wrote:
> On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
> > I'm just fooling around with git so far but I found a huge space savings
> > after running git gc. Here are the rough numbers:
> >
> > svn repo on server: 1GB
> > svn repo checked out: 2GB
> > git svn clone after gc: 384MB
> >
> > That's saving the full history in git -- about 13000 revisions.
>
> git-gc is such an important step in importing a repository from svn.
> Why doesn't git-svn take this vital step automatically?

Starting from 1.5.4 (IIRC) git-svn will repack every 1000 revisions (by
default). That won't give you a reeeeally tiny pack but OTOH it won't
take ages to do the repacks either.

Björn
* Re: git repository size vs. subversion repository size
From: Eric Hanchrow @ 2008-04-14 15:28 UTC
To: git

>>>>> "Steven" == Steven Walter <stevenrwalter@gmail.com> writes:

    Steven> git-gc is such an important step in importing a repository
    Steven> from svn. Why doesn't git-svn take this vital step
    Steven> automatically?

Mine did:

git-svn version 1.5.5 (svn 1.3.2)

git-svn init file://$HOME/svn-repos
git-svn fetch
...
M trunk/home/local/bin/spam/print-subjects.ss
r5480 = b3edab03f5bacda1db025bd2cca769abbe007f23 (git-svn)
Auto packing your repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.
Counting objects: 11182, done.
Compressing objects: 100% (11021/11021), done.
Writing objects: 100% (11182/11182), done.
Total 11182 (delta 9378), reused 0 (delta 0)
Checked out HEAD:
file:///home/erich/svn-repos r5480

$ du -sh /tmp/ya/.git/ ~/svn-repos/
23M /tmp/ya/.git/
88M /home/erich/svn-repos/
--
The old graybeards in the Smalltalk world may not seem relevant,
but if you ask them a question about ORM, they have been thinking
about it for 20 years.
-- Avi Bryant
* Re: git repository size vs. subversion repository size
From: Sean Brown @ 2008-04-05 2:27 UTC
To: Björn Steinbrink; +Cc: git

On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> > Last night I decided to see what storage size differences I might see
> > between an svn repo and a git one. So I imported a highly used
> > subversion repository into git and was shocked to see how huge the git
> > version was. I used a repo that has a lot of branches and tagged
> > releases just to make sure importing into git would in fact keep all
> > of the history. It did keep the history, but the total disk usage was
> > very different:
> >
> > $subversionbox # du -hs ./my_sample_website/
> > 67M ./my_sample_website
> >
> > $localhost # du -hs ./git-samplesite/
> > 3.6GB ./git-samplesite/
>
> How much of that is in the .git/svn directory? The contents of that
> directory are used to map git commits to svn revision and git versions
> before 1.5.4 had a quite space consuming file format for that. The new
> format is a lot better. If you want to switch completely, you can even
> just delete the .git/svn directory, as that's only required as long as
> you want to interact with the corresponding svn repository.
>
> And finally, you might want to repack the repository once after the
> initial import, to get a smaller repo. Something like:
> git repack -a -d -f --window=100 --depth=100
>

The svn folder (in the .git directory) was only about 4.2 MB. After
running the repack (and then after that git gc as mentioned by another
in this thread), it's still about 3.5 GB.

git-samplesite (master)]$ du -hs ./*
2.1G ./branches
1.4G ./tags
66M ./trunk

The site does have a lot of binary files (PDFs, photographs and such).

I suppose we could leave all of the branches and tags in subversion
and just move the trunk to git, but I was hoping to make a clean break
from subversion. If anyone has any further suggestions I'd love to
hear them.

Sean
--
Sean Brown
seanmichaelbrown@gmail.com
* Re: git repository size vs. subversion repository size
From: Björn Steinbrink @ 2008-04-05 2:34 UTC
To: Sean Brown; +Cc: git

On 2008.04.04 22:27:12 -0400, Sean Brown wrote:
> On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> > On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> > > Last night I decided to see what storage size differences I might see
> > > between an svn repo and a git one. So I imported a highly used
> > > subversion repository into git and was shocked to see how huge the git
> > > version was. I used a repo that has a lot of branches and tagged
> > > releases just to make sure importing into git would in fact keep all
> > > of the history. It did keep the history, but the total disk usage was
> > > very different:
> > >
> > > $subversionbox # du -hs ./my_sample_website/
> > > 67M ./my_sample_website
> > >
> > > $localhost # du -hs ./git-samplesite/
> > > 3.6GB ./git-samplesite/
> >
> > How much of that is in the .git/svn directory? The contents of that
> > directory are used to map git commits to svn revision and git versions
> > before 1.5.4 had a quite space consuming file format for that. The new
> > format is a lot better. If you want to switch completely, you can even
> > just delete the .git/svn directory, as that's only required as long as
> > you want to interact with the corresponding svn repository.
> >
> > And finally, you might want to repack the repository once after the
> > initial import, to get a smaller repo. Something like:
> > git repack -a -d -f --window=100 --depth=100
> >
>
> The svn folder (in the .git directory) was only about 4.2 MB. After
> running the repack (and then after that git gc as mentioned by another
> in this thread), it's still about 3.5 GB.
>
> git-samplesite (master)]$ du -hs ./*
> 2.1G ./branches
> 1.4G ./tags
> 66M ./trunk

Uhm, you forgot to use -s when doing the clone. That would have created
real git branches instead of the directories... What you are counting is
the size of the checked out, uncompressed files of _all_ branches and
_all_ tags (and trunk). The repo size is basically what "du -sh .git"
would give.

Björn
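[Editor's sketch] The distinction drawn above, stored history versus checked-out files, can be measured directly. The helper below is an illustration only; the name `repo_breakdown` is made up, and it relies on nothing beyond `du -sk` and `cut`.

```shell
# repo_breakdown [REPO]: split a work tree's disk usage (in KiB) into
# history (.git), git-svn metadata (.git/svn) and checked-out files.
# The function name is hypothetical.
repo_breakdown() {
    repo="${1:-.}"
    if [ ! -d "$repo/.git" ]; then
        echo "no .git directory in $repo" >&2
        return 1
    fi
    total_kb=$(du -sk "$repo" | cut -f1)
    gitdir_kb=$(du -sk "$repo/.git" | cut -f1)
    svn_kb=0
    if [ -d "$repo/.git/svn" ]; then
        svn_kb=$(du -sk "$repo/.git/svn" | cut -f1)
    fi
    echo "history (.git):      ${gitdir_kb} KiB"
    echo "  of which .git/svn: ${svn_kb} KiB"
    # Everything outside .git is the checked-out working tree.
    echo "checked-out files:   $((total_kb - gitdir_kb)) KiB"
}

# Usage: repo_breakdown ./git-samplesite
```

For the repository in this thread, the first line is the figure comparable to the 67M svn repository, while the last line is where the branches/ and tags/ directories would show up.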
* Re: git repository size vs. subversion repository size
From: Jan Hudec @ 2008-04-13 9:57 UTC
To: Björn Steinbrink; +Cc: Sean Brown, git

On Sat, Apr 05, 2008 at 04:34:37 +0200, Björn Steinbrink wrote:
> On 2008.04.04 22:27:12 -0400, Sean Brown wrote:
> > On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> > > On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> > > > Last night I decided to see what storage size differences I might see
> > > > between an svn repo and a git one. So I imported a highly used
> > > > subversion repository into git and was shocked to see how huge the git
> > > > version was. I used a repo that has a lot of branches and tagged
> > > > releases just to make sure importing into git would in fact keep all
> > > > of the history. It did keep the history, but the total disk usage was
> > > > very different:
> > > >
> > > > $subversionbox # du -hs ./my_sample_website/
> > > > 67M ./my_sample_website
> > > >
> > > > $localhost # du -hs ./git-samplesite/
> > > > 3.6GB ./git-samplesite/
> > >
> > > How much of that is in the .git/svn directory? The contents of that
> > > directory are used to map git commits to svn revision and git versions
> > > before 1.5.4 had a quite space consuming file format for that. The new
> > > format is a lot better. If you want to switch completely, you can even
> > > just delete the .git/svn directory, as that's only required as long as
> > > you want to interact with the corresponding svn repository.
> > >
> > > And finally, you might want to repack the repository once after the
> > > initial import, to get a smaller repo. Something like:
> > > git repack -a -d -f --window=100 --depth=100
> > >
> >
> > The svn folder (in the .git directory) was only about 4.2 MB. After
> > running the repack (and then after that git gc as mentioned by another
> > in this thread), it's still about 3.5 GB.
> >
> > git-samplesite (master)]$ du -hs ./*
> > 2.1G ./branches
> > 1.4G ./tags
> > 66M ./trunk
>
> Uhm, you forgot to use -s when doing the clone. That would have created

No, not the clone, but the git svn init.

> real git branches instead of the directories... What you are counting is
> the size of the checked out, uncompressed files of _all_ branches and
> _all_ tags (and trunk). The repo size is basically what "du -sh .git"
> would give.
>
> Björn

--
Jan 'Bulb' Hudec <bulb@ucw.cz>
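[Editor's sketch] Following the correction above, the `-s` (standard layout) flag goes on `git svn init`, not on the later clone. The wrapper below is a hedged illustration: the function name `svn_import_stdlayout` and the `DRY_RUN` variable are made up, and the other options from the original post (`--no-metadata`, the authors file) are left out for brevity. With `DRY_RUN` set it only prints the commands, so it can be tried without an svn server.

```shell
# svn_import_stdlayout URL DIR
# Import an svn repository that uses the standard trunk/branches/tags
# layout. With DRY_RUN set, commands are printed instead of executed.
# The function name and DRY_RUN are hypothetical.
svn_import_stdlayout() {
    url="$1"
    dir="$2"
    if [ -z "$url" ] || [ -z "$dir" ]; then
        echo "usage: svn_import_stdlayout URL DIR" >&2
        return 1
    fi
    run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }
    (
        run mkdir -p "$dir" &&
        run cd "$dir" &&
        # -s: treat trunk/, branches/ and tags/ as the standard layout,
        # so they are tracked as refs instead of checked-out directories.
        run git svn init -s "$url" &&
        run git svn fetch
    )
}

# Usage (URL taken from the original post):
# DRY_RUN=1 svn_import_stdlayout http://subversion.myco.com/my_sample_website git-samplesite-tmp
```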
* Re: git repository size vs. subversion repository size
From: Shawn O. Pearce @ 2008-04-05 3:11 UTC
To: Sean Brown; +Cc: git

Sean Brown <seanmichaelbrown@gmail.com> wrote:
>
> Here are the steps I took (locally):
>
> mkdir git-samplesite-tmp
> cd git-samplesite-tmp
> git-svn init http://subversion.myco.com/my_sample_website --no-metadata
> git config svn.authorsfile ~/Desktop/users.txt # mapped svn users to git users
> git-svn fetch
> git clone git-samplesite-tmp git-samplesite
>
> I did this based on reading the documents in the git wiki, so I
> assumed they were "best practice." Did I do something wrong?

The last command there didn't get you the most efficiently packed
repository possible. More recent versions of git-clone will prefer
to hardlink all of the loose objects and packs from the source to
the destination, so the clone can occur more quickly when they are
on the same filesystem.

Really what you want to do here is repack the cloned directory
(cd git-samplesite && git repack -a -d -f) and maybe include some
aggressive --depth and --window options (e.g. 100/100) if you have
some CPU time to burn and are reasonably certain you will be keeping
the result. You only have to spend that CPU time once when converting
from SVN, and all future clones from this one will benefit.

But your really major disk usage was due to what someone else
pointed out, which was missing the "-s" flag to git-svn. So the
Git working directory was huge, as we created working files for
every single branch and every single tag. Ouch.

--
Shawn.
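[Editor's sketch] The one-off repack recommended above can be wrapped so that the resulting pack size is printed afterwards. This is an illustration, not part of the thread: the name `repack_tight` is made up, the 100/100 values are the ones suggested in the thread, and `git -C` assumes git 1.8.5 or later.

```shell
# repack_tight [REPO]: one-off aggressive repack, then report pack size.
# The function name is hypothetical.
repack_tight() {
    repo="${1:-.}"
    # -a: put everything into a single pack
    # -d: delete packs and loose objects made redundant by it
    # -f: recompute deltas instead of reusing existing ones
    # --window/--depth: spend more CPU searching for smaller deltas
    git -C "$repo" repack -a -d -f --window=100 --depth=100 -q || return 1
    git -C "$repo" count-objects -v | grep -E '^(packs|size-pack):'
}

# Usage: repack_tight ./git-samplesite
```

Since the cost is paid once at conversion time, running this on the repository that others will clone from seems like the natural place for it.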
end of thread

Thread overview: 11+ messages
2008-04-04 22:02 git repository size vs. subversion repository size Sean Brown
2008-04-04 22:17 ` Björn Steinbrink
2008-04-04 23:49   ` Stephen Bannasch
2008-04-05  0:01     ` Steven Walter
2008-04-05  0:04       ` Stephen Bannasch
2008-04-05  0:18       ` Björn Steinbrink
2008-04-14 15:28       ` Eric Hanchrow
2008-04-05  2:27   ` Sean Brown
2008-04-05  2:34     ` Björn Steinbrink
2008-04-13  9:57       ` Jan Hudec
2008-04-05  3:11 ` Shawn O. Pearce