From: Michael J Gruber <git@drmicha.warpmail.net>
To: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: git@vger.kernel.org
Subject: Re: help needed: Splitting a git repository after subversion migration
Date: Mon, 08 Dec 2008 14:30:28 +0100
Message-ID: <493D2174.80500@drmicha.warpmail.net>
In-Reply-To: <493C0AAD.1040208@intra2net.com>
Thomas Jarosch venit, vidit, dixit 07.12.2008 18:41:
> Hello together,
>
> I've successfully imported a large subversion repository into git.
> The tree contains source code and binary data ("releases"),
> the resulting .git directory is about 11GB.
>
> After the import I recreated the tags/branches by converting the refs
> to the subversion tags using a small shell script from the web:
>
> for branch in `git branch -r`; do
> ...
> version=`basename $branch`
> git tag -s -f -m "$subject" "$version" "$branch^"
> git branch -d -r $branch
> done
>
> Ok, so far everything went really smooth. I wanted to split this repository
> into two repositories, one for the source code and one for the binary data.
> The current tree layout is like this:
>
> sources/c++_xyz
> releases/large_binary_data
> ...
>
> The original tree was imported from CVS to subversion and the layout
> of the trunk was once reorganized/moved later. Here's the command
> I used to split out the "source" tree:
>
> git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r -f
> CVSROOT Attic source/Attic develpkg/Attic
> source/packages/Attic releases update_pkg' -- --all
>
> After that I ran these commands to reclaim the space:
> - git clone --no-hardlinks filtered_tree final_output
> - cd final_output
> - git gc
> - git prune
> - git repack -a -d --depth=250 --window=250
>
> Unfortunately the .git directory of the "source" tree is still 7.5GB big.
>
> When I just imported the "trunk" from subversion without any tags
> and then ran "git filter-branch --subdirectory-filter source" + git gc,
> the .git directory was about 1.5GB afterwards.
>
> How can I find out where those other 6GB go to?
> I already looked at the tags with gitk,
> there's no sign of the releases/* stuff left.

I strongly suspect the reorganization/move to be the cause. Most
likely some releases were put in places where you don't expect them,
and therefore they are not filtered out by removing the releases subdir.
If they have distinctive file names (say you know a name from before
the move) you can find them by passing that name as a path to "git log".
Or use gitk --all, switch to "tree display" and look for unexpected
files in the earliest revisions.
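
As a rough sketch of that search: the first helper below runs "git log"
over all refs for a path pattern, and the second lists the largest blobs
that are still reachable, which is an extra technique not mentioned above.
Both function names are made up for illustration, and the second one
assumes a git recent enough to support a format string for
"git cat-file --batch-check".

```shell
# Hypothetical helper: list commits on any ref that touched a path matching
# the given pathspec. In a pathspec, '*' also matches across '/'.
find_path_in_history() {
    git log --all --oneline --name-status -- "$1"
}

# Hypothetical helper: print the N largest blobs reachable from any ref as
# "blob <size in bytes> <path>", biggest first (N defaults to 20).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "${1:-20}"
}
```

For example, find_path_in_history '*large_binary_data*' would show where
a file with that name lived over time, and largest_blobs 10 gives a quick
idea of which paths account for most of the space.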

Also, it may be better to do the tag creation (from tags/... branches)
after the filter-branch. If you don't rewrite the tags (have you?) then
the tags will still point to the original commits (before the rewrite)
and therefore keep all the "fat blobs" reachable. The easiest way to
avoid this is to create the tags after the rewrite.
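
To illustrate the ordering, here is a minimal sketch of a tag-creation
step meant to run only after filter-branch has finished. The function
name, the default prefix refs/remotes/tags and the tag message are
assumptions for illustration; it also uses -a rather than -s so that no
GPG key is required.

```shell
# Hypothetical helper: turn every remote-tracking ref below a prefix into an
# annotated tag. The default prefix refs/remotes/tags is an assumption about
# how the import laid out the subversion tags.
tags_from_remote_refs() {
    prefix=${1:-refs/remotes/tags}
    # Collect the ref names first, since the loop deletes refs as it goes.
    refs=$(git for-each-ref --format='%(refname)' "$prefix")
    for ref in $refs; do
        version=${ref##*/}
        # As in the quoted script, tag the parent of the branch tip.
        # The message is a placeholder; -a avoids the need for a GPG key.
        git tag -a -f -m "Tag $version" "$version" "$ref^" &&
            git update-ref -d "$ref"
    done
}
```

Run the filter-branch first, then call tags_from_remote_refs in the
rewritten repository. If the tags already exist, adding
"--tag-name-filter cat" to the filter-branch call should rewrite them
as well, though signatures on signed tags are stripped in the process.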

Michael