* why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-05 10:05 UTC (permalink / raw)
To: git
Hi,
I had a repo that was big (143MB) because it contained a bunch of jar
files, so I decided to remove those completely from the history.
In short, I used the git-large-blob script [1] to find all the jars and
the git-remove-history script [2], which does the filter-branch thing,
prune, etc.
I did this on all branches (that I know of), and I can see that the
jars are gone because I can no longer find them with git-large-blob. The
repo size has dropped from 143MB to 87MB.
My concern is that 87MB is still really big considering the size of
the project. In fact, if I run "git diff-tree -r -p $commit | wc -c"
for each commit and sum the results, I get 5.5MB.
I also ran the git-rev-size script [3] that I found on this mailing
list, and I only see the size grow steadily from commit to commit
up to 1482731 bytes. So again, how come the .git directory is 87MB?
So, can anybody tell me if this repository size is "normal" for a
project with 1.4MB of source and 352 commits?
Is there a better way to calculate the size (in bytes) of each commit?
Is there anything else I could do to reduce and audit the repository size?
Thanks in advance!
Rubén
---
[1] http://stackoverflow.com/questions/298314/find-files-in-git-repo-over-x-megabytes-that-dont-exist-in-head
[2] http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/
[3] http://markmail.org/message/762zzg5zckbiq2i7
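On the audit question above, one possible starting point is to list the largest blobs that any ref can still reach. This is only a sketch: largest_blobs is a made-up helper name, it assumes a Git recent enough to support %(rest) in "git cat-file --batch-check", and the sizes are uncompressed object sizes, not the space taken inside the pack.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref.
# Output columns: type, uncompressed size in bytes, object name, path.
largest_blobs() {
    n=${1:-10}
    # rev-list prints "<sha> <path>"; cat-file keeps the path as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2 -n -r |
        head -n "$n"
}
```

Run it inside the repository, e.g. "largest_blobs 20". Because --all also walks remote-tracking branches, tags and refs/original, a jar that shows up here but not in any local branch is probably being kept alive by one of those refs.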
* Re: why is my repo size ten times bigger than what I expected?
From: Pascal Obry @ 2011-03-05 10:49 UTC (permalink / raw)
To: Ruben Laguna; +Cc: git
On 05/03/2011 11:05, Ruben Laguna wrote:
> Is there any other thing I could do to reduce and audit the repository size?
$ git gc
?
--
--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net - http://v2p.fr.eu.org
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver keys.gnupg.net --recv-key F949BD3B
* Re: why is my repo size ten times bigger than what I expected?
From: Jonathan del Strother @ 2011-03-05 10:49 UTC (permalink / raw)
To: Ruben Laguna; +Cc: git
What happens if you clone that repo?
git-gc only prunes unreachable objects that are older than 2 weeks by
default, so it's possible that your repo size will suddenly shrink in
2 weeks' time (or sooner, if you run git-gc with the appropriate
options).
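Whether waiting (or pruning right away) can help at all depends on how many objects are actually unreachable. A rough way to check, as a sketch with a hypothetical helper name count_unreachable:

```shell
# Hypothetical helper: count objects that no ref and no index entry reaches.
# --no-reflogs means objects kept alive only by reflog entries are counted too.
count_unreachable() {
    git fsck --unreachable --no-reflogs 2>/dev/null |
        grep -c '^unreachable' || true
}
```

If this prints 0, pruning will not remove anything, and the space is held by objects that some ref still points to. Otherwise "git reflog expire --expire=now --all" followed by "git gc --prune=now" should drop them immediately rather than after the default two weeks.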
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-05 11:41 UTC (permalink / raw)
To: Jonathan del Strother; +Cc: git
Well, the git-remove-history script runs
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git fsck --unreachable
git gc --prune=now
git gc --aggressive --prune=now
after filter-branch, so I don't think it's that.
Also, cloning the repo doesn't change a thing:
$ git clone en4j en4j_xx
Cloning into en4j_xx...
done.
$ cd en4j_xx
$ du -sh .git
87M .git
Any other ideas?
--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Andreas Schwab @ 2011-03-05 12:57 UTC (permalink / raw)
To: Ruben Laguna; +Cc: Jonathan del Strother, git
Ruben Laguna <ruben.laguna@gmail.com> writes:
> also cloning the repo doesn't change a thing
>
> $ git clone en4j en4j_xx
> Cloning into en4j_xx...
> done.
> $ cd en4j_xx
> $ du -sh .git
> 87M .git
>
> any other idea?
Please use file://$PWD/en4j as URL, otherwise git clone just hard links
everything.
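A minimal sketch of that suggestion, assuming a hypothetical helper name clone_without_hardlinks: a plain local-path clone reuses the object files as they are, unreachable objects included, while a file:// URL goes through the normal transport and only transfers what the cloned refs reach.

```shell
# Hypothetical helper: clone SRC into DEST through the file:// transport
# and report how large the fresh .git directory is.
clone_without_hardlinks() {
    src=$1
    dest=$2
    # file:// needs an absolute path, so resolve SRC first.
    git clone --quiet "file://$(cd "$src" && pwd)" "$dest"
    du -sh "$dest/.git"
}
```

For example: "clone_without_hardlinks en4j en4j_xx". If the size is still the same afterwards, the large objects are reachable from something the clone picks up.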
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-07 9:59 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Jonathan del Strother, git
Cloning it that way didn't help either.
But I have more info:
If I set up a bare repo and push my four branches to it (master, develop,
gh-pages and experimental), the total size of the repo is 2.4MB
(instead of 87MB).
$ git init --bare en4j_xx
$ cd en4j
$ git checkout master
$ git push file://$PWD/../en4j_xx master
$ git checkout develop
$ git push file://$PWD/../en4j_xx develop
$ git checkout experimental
$ git push file://$PWD/../en4j_xx experimental
$ git checkout gh-pages
$ git push file://$PWD/../en4j_xx gh-pages
$ du -sh ../en4j_xx
2.3M ../en4j_xx
So, how can I find the contents present in en4j that are not present in en4j_xx?
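One way to approach that comparison, as a sketch only: dump the object names of both repositories and keep those that exist only in the first, largest first. The helper name objects_only_in_first is invented here, and --batch-all-objects needs a reasonably recent Git.

```shell
# Hypothetical helper: list objects present in REPO_A but missing from REPO_B.
# Output columns: object name, type, uncompressed size (largest first).
objects_only_in_first() {
    repo_a=$1
    repo_b=$2
    scratch=$(mktemp -d)
    git -C "$repo_a" cat-file --batch-all-objects \
        --batch-check='%(objectname) %(objecttype) %(objectsize)' |
        LC_ALL=C sort > "$scratch/a"
    git -C "$repo_b" cat-file --batch-all-objects \
        --batch-check='%(objectname)' |
        LC_ALL=C sort > "$scratch/b"
    # join -v 1 keeps the lines of the first file with no match in the second.
    LC_ALL=C join -v 1 "$scratch/a" "$scratch/b" | sort -k3,3 -n -r
    rm -rf "$scratch"
}
```

For example: "objects_only_in_first en4j en4j_xx | head". Any large blob in that list can then be looked up with "git rev-list --objects --all | grep <sha>" to see which path it belongs to.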
--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Phillip Susi @ 2011-03-08 21:25 UTC (permalink / raw)
To: Ruben Laguna; +Cc: Andreas Schwab, Jonathan del Strother, git
On 3/7/2011 4:59 AM, Ruben Laguna wrote:
> Cloning it that way didn't help either,
>
> But I have more info
>
> If I set a bare repo and push my four branches to it (master, develop,
> gh-pages and experimental) the total size of the repo is 2.4MB
> (instead of 87MB)
Then there are other branches using that space. Run git branch -a and
see what else is there.
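Note that "git branch -a" does not show tags, refs/original or the stash. A sketch that walks every ref and reports the ones whose history still touches a given pattern (refs_touching is a hypothetical helper name):

```shell
# Hypothetical helper: for every ref, count the commits in its history that
# touch paths matching PATTERN, and print the refs where that count is > 0.
refs_touching() {
    pattern=$1
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        # A ref that does not lead to a commit is treated as having no matches.
        n=$(git rev-list --count "$ref" -- "$pattern" 2>/dev/null || echo 0)
        if [ "$n" -gt 0 ]; then
            echo "$ref $n"
        fi
    done
}
```

For example: "refs_touching '*.jar'". Any ref printed here still keeps the jars reachable, so gc will not remove them.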
* Re: why is my repo size ten times bigger than what I expected?
From: Tor Arvid Lund @ 2011-03-08 21:44 UTC (permalink / raw)
To: Ruben Laguna; +Cc: git
On Sat, Mar 5, 2011 at 11:05 AM, Ruben Laguna <ruben.laguna@gmail.com> wrote:
> Hi,
>
> I had a repo that was big (143MB) because it contained a bunch of jar
> files, so I decided to remove those completely from the history.
>
> In short, I used the git-large-blob script [1] to find all the jars and
> the git-remove-history script [2], which does the filter-branch thing,
> prune, etc.
>
> I did this on all branches (that I know of), and I can see that the
> jars are gone because I can no longer find them with git-large-blob. The
> repo size has dropped from 143MB to 87MB.
I just thought I'd mention that the git-remove-history script you
mention runs filter-branch on HEAD only, without the --all parameter.
I thought --all was the best way to "catch all" branches in one go...
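As a sketch of what such a run might look like (this is not the script from [2]; strip_jars_everywhere and the "*.jar" pattern are assumptions for illustration):

```shell
# Hypothetical helper: remove every *.jar from the history of all refs.
strip_jars_everywhere() {
    # FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer Git
    # versions print before filter-branch starts.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --index-filter 'git rm -r -q --cached --ignore-unmatch -- "*.jar"' \
        --prune-empty -- --all
}
```

The old commits stay reachable through refs/original (and through any ref that was not rewritten) until those are removed, so the repository does not shrink from this step alone.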
-- Tor Arvid
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-09 16:35 UTC (permalink / raw)
To: Tor Arvid Lund; +Cc: git
> I just thought I'd mention that the git-remove-history script that you
> mention does filter-branch on HEAD, and not using the --all parameter.
> I thought --all was the best way to "catch all" branches in one go...
>
> -- Tor Arvid
>
Much faster this way, thanks Tor.
But it still gives the same result: 88MB
$ git branch -a
* develop
master
remotes/origin/HEAD -> origin/develop
remotes/origin/develop
remotes/origin/experimental
remotes/origin/gh-pages
remotes/origin/master
In the end I deleted my public repo on GitHub, created a new one, and
pushed master and develop to the new, empty one.
--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Tor Arvid Lund @ 2011-03-09 21:06 UTC (permalink / raw)
To: Ruben Laguna; +Cc: git
On Wed, Mar 9, 2011 at 5:35 PM, Ruben Laguna <ruben.laguna@gmail.com> wrote:
>> I just thought I'd mention that the git-remove-history script that you
>> mention does filter-branch on HEAD, and not using the --all parameter.
>> I thought --all was the best way to "catch all" branches in one go...
>>
>> -- Tor Arvid
>>
>
> Much faster this way, thanks Tor.
>
> But it still gives the same result: 88MB
>
>
> $ git branch -a
> * develop
> master
> remotes/origin/HEAD -> origin/develop
> remotes/origin/develop
> remotes/origin/experimental
> remotes/origin/gh-pages
> remotes/origin/master
>
> In the end I deleted my public repo on GitHub, created a new one, and
> pushed master and develop to the new, empty one.
Ah, that's why I got only 3.6M when I cloned just now ;)
FWIW (if you still want to figure it out...): whatever commits your
origin branches point to, their history and objects will *not* get
deleted by git gc/prune/whatever. So if they point to commits which
have these big jars in the history, that may be the cause. Also, when
I run filter-branch, it saves the old refs in .git/refs/original so
that I can revert all those times when I screw it up ;)
Basically - since your "new" repo is so small, there is something in
your original repo that refers to your large objects.
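If the stale remote-tracking refs and refs/original are indeed what holds the objects, something along these lines might release them. This is a sketch with a hypothetical helper name, and it is destructive: once it has run, the old history cannot be recovered from this repository.

```shell
# Hypothetical helper: drop remote-tracking refs and filter-branch backups,
# then expire reflogs and prune so the old objects can actually go away.
drop_stale_refs_and_gc() {
    git for-each-ref --format='%(refname)' refs/remotes refs/original |
    while read -r ref; do
        # --no-deref deletes a symbolic ref such as origin/HEAD itself
        # instead of the branch it points to.
        git update-ref --no-deref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

A later "git fetch" recreates the remote-tracking refs, so if the remote still carries the old history, the large objects come back with it.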
Have a good night.
-- Tor Arvid