* Problems getting rid of large files using git-filter-branch
@ 2009-01-06 21:59 Øyvind Harboe
  2009-01-06 22:20 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Øyvind Harboe @ 2009-01-06 21:59 UTC (permalink / raw)
  To: git
I'm trying to get rid of some large objects in my .git repository
using git-filter-branch. These are remnants from conversion from
CVS.
Q1: How can I figure out what it is in .git that takes so much space?
Q2: Where can I read more about what to do after running git-filter-branch to
remove the offending objects?
1. I ran this command to get rid of the offending files and that appears to
have worked. I can't find any traces of them anymore...
git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec rm -f {} \;' HEAD
2. Running "git gc" takes a few seconds. The repository is still huge
(it should be perhaps 10-20 MB).
du -skh .git/
187M    .git/
3. I tried "git reflog expire --all" plus the other tricks I could find in
the thread linked below, but no luck:
http://article.gmane.org/gmane.comp.version-control.git/60219/match=trying+use+git+filter+branch+compress
-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 21:59 Problems getting rid of large files using git-filter-branch Øyvind Harboe
@ 2009-01-06 22:20 ` Johannes Schindelin
  2009-01-06 22:36   ` Øyvind Harboe
  2009-01-06 22:31 ` Nicolas Pitre
  2009-01-07  8:26 ` Øyvind Harboe
  2 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2009-01-06 22:20 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
Hi,
On Tue, 6 Jan 2009, Øyvind Harboe wrote:
> Q1: How can I figure out what it is in .git that takes so much space?
If it is a pack that is taking so much space:
$ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
and then for the last few lines do a
$ git rev-list --all --objects | grep $SHA1
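The two commands above can also be combined into one pass. What follows is only a minimal sketch, not something from the original reply: largest_objects is a hypothetical helper name, and it assumes you run it inside the repository after "git gc", so that the objects actually live in packs. It sorts on column 4 of the verify-pack output (the size inside the pack) and looks each SHA-1 up in the rev-list output to recover a path.

```shell
#!/bin/sh
# largest_objects [N]: print the N largest packed objects as
# "<size-in-pack> <type> <sha1> <path>". Hypothetical helper, for illustration.
largest_objects() {
    count=${1:-10}
    git_dir=$(git rev-parse --git-dir) || return 1
    for idx in "$git_dir"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue
        # Each object line reads: sha1 type size size-in-pack offset [depth base]
        git verify-pack -v "$idx"
    done |
    # Keep object lines only; this drops the "chain length" and summary lines.
    grep -E '^[0-9a-f]{40,} (commit|tree|blob|tag) ' |
    sort -n -k 4 |
    tail -n "$count" |
    while read -r sha type size packed rest; do
        # Path under which the object is reachable; empty if it has none.
        path=$(git rev-list --all --objects |
            awk -v id="$sha" '$1 == id && !seen++ { print substr($0, length(id) + 2) }')
        printf '%s %s %s %s\n' "$packed" "$type" "$sha" "$path"
    done
}
```

An object that shows up with an empty path is either a commit or something no ref reaches any more, which is itself a hint that the space is held by garbage rather than by history.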
Hth,
Dscho
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 21:59 Problems getting rid of large files using git-filter-branch Øyvind Harboe
  2009-01-06 22:20 ` Johannes Schindelin
@ 2009-01-06 22:31 ` Nicolas Pitre
  2009-01-06 22:41   ` Øyvind Harboe
  2009-01-06 23:17   ` Stephen R. van den Berg
  2009-01-07  8:26 ` Øyvind Harboe
  2 siblings, 2 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-01-06 22:31 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
On Tue, 6 Jan 2009, Øyvind Harboe wrote:
> Q1: How can I figure out what it is in .git that takes so much space?
> 
> Q2: Where can I read more about what to do after running git-filter-branch to
> removing the offending objects?
> 
> 
> 
> 1. I ran this command to get rid of the offending files and that appears to
> have worked. I can't find any traces of them anymore...
> 
> git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec
> rm -f {} \;' HEAD
> 
> 2. Running "git gc" takes a few seconds. The repository is still
> huge(it should be
> perhaps 10-20mByte).
> 
> du -skh .git/
> 187M    .git/
> 
> 3. I tried "git reflog expire --all" + lots of other tricks in the
> link below, but no luck.
OK, try this:
	cd ..
	mv my_repo my_repo.orig
	mkdir my_repo
	cd my_repo
	git init
	git pull file://$(pwd)/../my_repo.orig
This is the easiest way to ensure you have only the necessary objects in 
the new repo, without all the extra stuff tied to reflogs, etc.
Then, if your repo is still seemingly too big, you can get a bit dirty 
with the sequence Johannes just posted.
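For anyone who wants to repeat this, the recipe above can be wrapped up so that it also prints both sizes side by side. This is a rough sketch only: fresh_copy is a hypothetical helper name and both arguments are paths you pick yourself. Note that a bare "git pull <url>" fetches just the branch the old HEAD points to, so other branches have to be pulled or fetched separately.

```shell
#!/bin/sh
# fresh_copy OLD NEW: pull OLD's current branch into a brand-new repository
# at NEW, leaving OLD untouched. Hypothetical helper, for illustration.
fresh_copy() {
    src=$(cd "$1" && pwd) || return 1
    dst=$2
    mkdir -p "$dst" || return 1
    (
        cd "$dst" &&
        git init -q &&
        # Only objects reachable from the pulled branch are transferred.
        git pull -q "file://$src"
    ) || return 1
    # Compare the on-disk size of the old and the new object store.
    du -sh "$src/.git" "$dst/.git"
}

# Usage: fresh_copy my_repo.orig my_repo
```

Because the original repository is only read from, a bad result costs nothing: delete the new directory and try again.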
Nicolas
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 22:20 ` Johannes Schindelin
@ 2009-01-06 22:36   ` Øyvind Harboe
  2009-01-07 10:07     ` Johannes Schindelin
  0 siblings, 1 reply; 13+ messages in thread
From: Øyvind Harboe @ 2009-01-06 22:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
On Tue, Jan 6, 2009 at 11:20 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Tue, 6 Jan 2009, Øyvind Harboe wrote:
>
>> Q1: How can I figure out what it is in .git that takes so much space?
>
> If it is a pack that is taking so much space:
It is.
>
> $ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
I have never used the git verify-pack command, but I'm pretty sure the
"Terminated" string isn't the normal output :-)
$ git verify-pack -v
.git/objects/pack/pack-1e039b82d8ae53ef5ec3614a3021466663cc70a4
Terminated
This is running git version 1.6.1 on CentOS in a virtual machine. I'm not
quite sure how to debug this. I'm sure I've done something wrong when I
installed git. I'm just a humble user of git trying to convert from cvs/svn.
> and then for the last few lines do a
>
> $ git rev-list --all --objects | grep $SHA1
I was able to run this procedure on a different machine than the server, and
I can now tell which objects take up all the space.
However, I'm unnerved by git verify-pack "Terminated"'ing on me above, and
I'll have to sort that out before I can think about using git in production.
Thanks for the pointers though! They definitely answered my questions!
>
> Hth,
> Dscho
>
-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 22:31 ` Nicolas Pitre
@ 2009-01-06 22:41   ` Øyvind Harboe
  2009-01-06 23:17   ` Stephen R. van den Berg
  1 sibling, 0 replies; 13+ messages in thread
From: Øyvind Harboe @ 2009-01-06 22:41 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
>> 3. I tried "git reflog expire --all" + lots of other tricks in the
>> link below, but no luck.
>
> OK, try this:
>
>        cd ..
>        mv my_repo my_repo.orig
>        mkdir my_repo
>        cd my_repo
>        git init
>        git pull file://$(pwd)/../my_repo.orig
>
> This is the easiest way to ensure you have only the necessary objects in
> the new repo, without all the extra stuff tied to reflogs, etc.
Super!
That worked!
> Then, if your repo is still seemingly too big, you can get a bit dirty
> with the sequence Johannes just posted.
Johannes' procedure had the unexpected side effect of showing that
my server setup is flaky somehow, though... :-) I'll need his
tricks for other situations soon enough.
-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 22:31 ` Nicolas Pitre
  2009-01-06 22:41   ` Øyvind Harboe
@ 2009-01-06 23:17   ` Stephen R. van den Berg
  2009-01-07  0:56     ` Nicolas Pitre
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen R. van den Berg @ 2009-01-06 23:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Øyvind Harboe, git
Nicolas Pitre wrote:
>On Tue, 6 Jan 2009, Øyvind Harboe wrote:
>OK, try this:
>	git pull file://$(pwd)/../my_repo.orig
Alternately, try:
rm -rf .git/ORIG_HEAD .git/FETCH_HEAD .git/index .git/logs .git/info/refs \
  .git/objects/pack/pack-*.keep .git/refs/original .git/refs/patches \
  .git/patches .git/gitk.cache &&
 git prune --expire now &&
 git repack -a -d --window=200 &&
 git gc
-- 
Sincerely,
           Stephen R. van den Berg.
"Very funny, Mr. Scott. Now beam down my clothes!"
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 23:17   ` Stephen R. van den Berg
@ 2009-01-07  0:56     ` Nicolas Pitre
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-01-07  0:56 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Øyvind Harboe, git
On Wed, 7 Jan 2009, Stephen R. van den Berg wrote:
> Nicolas Pitre wrote:
> >On Tue, 6 Jan 2009, Øyvind Harboe wrote:
> >OK, try this:
> 
> >	git pull file://$(pwd)/../my_repo.orig
> 
> Alternately, try:
> 
> rm -rf .git/ORIG_HEAD .git/FETCH_HEAD .git/index .git/logs .git/info/refs \
>   .git/objects/pack/pack-*.keep .git/refs/original .git/refs/patches \
>   .git/patches .git/gitk.cache &&
>  git prune --expire now &&
>  git repack -a -d --window=200 &&
>  git gc
This might not be sufficient.  At the very least you'd better run
'git prune' at the very end, and possibly add -f to 'git repack'.  And if
you somehow delete something you shouldn't have deleted then you're really
screwed, whereas the pull method in another repository doesn't alter the
original repository in case you need to go back to it and try something
different.
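For those who still prefer to clean up in place, the ordering suggested here might look like the sketch below. It is an illustration under assumptions, not a tested recipe from this thread: shrink_in_place is a hypothetical helper name, it assumes the refs/original backups left by filter-branch are already gone, and it should only ever be run on a copy. A plain "git reflog expire --all" keeps every entry younger than the configured expiry (90 days by default, 30 days for unreachable entries), so --expire=now is passed explicitly.

```shell
#!/bin/sh
# shrink_in_place: expire reflogs, repack from scratch, prune last.
# Hypothetical helper, for illustration. DESTRUCTIVE: run it on a backup copy.
shrink_in_place() {
    # Drop all reflog entries so they no longer keep old commits alive.
    git reflog expire --expire=now --all &&
    # Put everything reachable into one pack; -f recomputes the deltas.
    git repack -a -d -f &&
    # Prune at the very end so leftover unreachable loose objects go away.
    git prune --expire now
}
```

Unlike the pull method, nothing here can be undone once the prune has run.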
Nicolas
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 21:59 Problems getting rid of large files using git-filter-branch Øyvind Harboe
  2009-01-06 22:20 ` Johannes Schindelin
  2009-01-06 22:31 ` Nicolas Pitre
@ 2009-01-07  8:26 ` Øyvind Harboe
  2009-01-07 15:02   ` Nicolas Pitre
  2 siblings, 1 reply; 13+ messages in thread
From: Øyvind Harboe @ 2009-01-07  8:26 UTC (permalink / raw)
  To: git
Here is a summary of the solution I used. I'm a beginner in git
and just summarizing what others told me and what I did. Use at
your own risk!
1. Remove anything you know should be removed, e.g.:
git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec rm -f {} \;' HEAD
2. Expire the log:
git reflog expire --all
3. Delete stuff from .git that should be manually "verified" to be
correct. I don't actually know how to "verify" that at this point...
Use backups, Luke!
rm -rf .git/refs/original
# delete lines w/"refs/original" from .git/packed-refs
vi .git/packed-refs
# for good measure...
git reflog expire --all
git gc
4. Your repository is still huge. By creating a new repository and pulling from
this one, the garbage will stay in the old one...
mkdir newrep
cd newrep
git init
git pull file:///oldrep
5. Check the size of .git. If it is still too big, try figuring out which
files are big by looking at the packs (.git/objects/pack/xxx):
$ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
and then for the last few lines do a
$ git rev-list --all --objects | grep $SHA1
6. Go back to #1 until done.
Your repository should now be of reasonable size...
I've found some great scripts for converting from svn/cvs, but really
the above procedure is necessary when converting nasty old repositories...
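As an aside on step 3: instead of removing .git/refs/original by hand and then editing .git/packed-refs in vi, the backup refs can be deleted through git itself, which takes care of loose and packed refs alike. A minimal sketch, assuming the backups sit under refs/original/ as filter-branch leaves them; drop_filter_branch_backups is a hypothetical helper name.

```shell
#!/bin/sh
# drop_filter_branch_backups: delete every ref under refs/original/.
# Hypothetical helper, for illustration; the old history becomes unreachable.
drop_filter_branch_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        # update-ref -d removes the ref whether it is loose or packed.
        git update-ref -d "$ref"
    done
}
```

Once these refs are gone the pre-filter history can no longer be recovered from this repository, so the backup advice above still applies.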
-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-06 22:36   ` Øyvind Harboe
@ 2009-01-07 10:07     ` Johannes Schindelin
  2009-01-07 10:15       ` Øyvind Harboe
  2009-01-07 18:18       ` Sitaram Chamarty
  0 siblings, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2009-01-07 10:07 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
Hi,
On Tue, 6 Jan 2009, Øyvind Harboe wrote:
> On Tue, Jan 6, 2009 at 11:20 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > $ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
> 
> I have never used the git verify-pack command, but I'm pretty sure the
> "Terminated" string isn't the normal output :-)
> 
> $ git verify-pack -v
> .git/objects/pack/pack-1e039b82d8ae53ef5ec3614a3021466663cc70a4
> Terminated
I did
	$ git grep Terminated
and came up empty :-)
Seriously, I guess this could be some OOM thing.  We _should_ handle this 
more gracefully, but it is possible that some uncatchable condition hits 
you, such as out-of-stack-space.
I'd try running the command either with strace or with gdb, and I'd look 
at $? after the command returns, to find out what is actually happening.
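To make the "$?" suggestion concrete, here is a small sketch that decodes the status. It relies on the usual shell convention that a status above 128 means "killed by signal (status - 128)"; describe_status is a hypothetical helper name, and the usage lines assume PACK names the pack you are checking. As far as I know, "Terminated" is the text the shell prints for SIGTERM (status 143), whereas a process shot by the kernel's OOM killer usually reports "Killed" (SIGKILL, status 137).

```shell
#!/bin/sh
# describe_status STATUS: say whether an exit status means a normal exit
# or death by signal. Hypothetical helper, for illustration.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l maps such a status back to the signal name, e.g. 143 -> TERM.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

# Usage:
#   git verify-pack -v "$PACK" > /dev/null
#   describe_status $?
```

If the status turns out to be 143, something outside git sent the signal, for instance a resource limit enforced by the hosting environment.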
Hth,
Dscho
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-07 10:07     ` Johannes Schindelin
@ 2009-01-07 10:15       ` Øyvind Harboe
  2009-01-07 12:45         ` Johannes Schindelin
  2009-01-07 18:18       ` Sitaram Chamarty
  1 sibling, 1 reply; 13+ messages in thread
From: Øyvind Harboe @ 2009-01-07 10:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
On Wed, Jan 7, 2009 at 11:07 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Tue, 6 Jan 2009, Øyvind Harboe wrote:
>
>> On Tue, Jan 6, 2009 at 11:20 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>
>> > $ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
>>
>> I have never used the git verify-pack command, but I'm pretty sure the
>> "Terminated" string isn't the normal output :-)
>>
>> $ git verify-pack -v
>> .git/objects/pack/pack-1e039b82d8ae53ef5ec3614a3021466663cc70a4
>> Terminated
>
> I did
>
>        $ git grep Terminated
>
> and came up empty :-)
>
> Seriously, I guess this could be some OOM thing.  We _should_ handle this
> more gracefully, but it is possible that some uncatchable condition hits
> you, such as out-of-stack-space.
>
> I'd try running the command either with strace or with gdb, and I'd look
> at $? after the command returns, to find out what is actually happening.
After some investigation it turns out that my server has 228 MB of RAM
available. It is a virtual server running CentOS, hence the strange number
and the ridiculously tiny amount of memory (these days).
Now the strange thing is that I'm not getting this error message this morning...
How would git behave if it ran out of memory?
-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-07 10:15       ` Øyvind Harboe
@ 2009-01-07 12:45         ` Johannes Schindelin
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2009-01-07 12:45 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
Hi,
On Wed, 7 Jan 2009, Øyvind Harboe wrote:
> How would git behave if it ran out of memory?
Something like
	fatal: Out of memory, malloc failed
Ciao,
Dscho
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-07  8:26 ` Øyvind Harboe
@ 2009-01-07 15:02   ` Nicolas Pitre
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-01-07 15:02 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
On Wed, 7 Jan 2009, Øyvind Harboe wrote:
> Here is a summary of the solution I used. I'm a beginner in git
> and just summarizing what others told me and what I did. Use at
> your own risk!
> 
> 1. Remove anything you know should be removed, e.g.:
> 
> git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec
> rm -f {} \;' HEAD
> 
> 2. Expire the log:
> 
> git reflog expire --all
> 
> 3. Delete stuff from .git that should be manually "verified" to be
> correct. I don't actually
> know how to "verify" that at this point... Use backups Luke!
> 
> rm -rf .git/refs/original
> # delete lines w/"refs/original" from .git/packed-refs
> vi .git/packed-refs
> # for good measure...
> git reflog expire --all
> git gc
> 
> 4. Your repository is still huge. By creating a new repository and pulling from
> this one, the garbage will stay in the old one...
> 
> mkdir newrep
> cd newrep
> git init
> git pull file:///oldrep
I'd suggest you skip 2 and 3, and do 4 only.  Using 4 makes 2 
unnecessary, and is far safer than 3.  Manually deleting stuff in .git 
is fine only if you really know what you're doing and have some 
acquaintance with the git internals.
> 5. Check size of .git. If it is still too big, try figuring out which
> files that are big by looking at the packs(.git/objects/pack/xxx):
> 
> $ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
> 
> and then for the last few lines do a
> 
> $ git rev-list --all --objects | grep $SHA1
> 
> 6. Go back to #1 until done.
> 
> Your repository should now be of reasonable size...
> 
> I've found some great scripts for converting from svn/cvs, but really
> the above procedure
> is necessary to run when converting nasty old repositories...
> 
* Re: Problems getting rid of large files using git-filter-branch
  2009-01-07 10:07     ` Johannes Schindelin
  2009-01-07 10:15       ` Øyvind Harboe
@ 2009-01-07 18:18       ` Sitaram Chamarty
  1 sibling, 0 replies; 13+ messages in thread
From: Sitaram Chamarty @ 2009-01-07 18:18 UTC (permalink / raw)
  To: git
On 2009-01-07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> $ git verify-pack -v
>> .git/objects/pack/pack-1e039b82d8ae53ef5ec3614a3021466663cc70a4
>> Terminated
>
> I did
>
> 	$ git grep Terminated
>
> and came up empty :-)
It comes from libc, afaik.