* why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-05 10:05 UTC
To: git

Hi,

I had a repo that was big (143MB) because it contained a bunch of jar
files, so I decided to remove them completely from the history.

In short, I used git-large-blob [1] to find all the jars and the
git-remove-history script [2], which does the filter-branch thing,
prune, etc.

I did this on all branches (that I know of), and I can see that the
jars are gone because git-large-blob no longer finds them. The repo
size has dropped from 143MB to 87MB.

My concern is that 87MB is still really big given the size of the
project. In fact, if I run "git diff-tree -r -p $commit | wc -c" for
each commit and sum the results, I get 5.5MB.

I also ran the git-rev-size [3] script that I found on this mailing
list, and it only shows the size growing steadily from commit to commit,
up to 1482731 bytes. So again, how come the .git directory is 87MB?

Can anybody tell me whether this repository size is "normal" for a
project with 1.4MB of source and 352 commits?
Is there a better way to calculate the size (in bytes) of each commit?
Is there anything else I could do to reduce and audit the repository size?

Thanks in advance!
Rubén

---
[1] http://stackoverflow.com/questions/298314/find-files-in-git-repo-over-x-megabytes-that-dont-exist-in-head
[2] http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/
[3] http://markmail.org/message/762zzg5zckbiq2i7
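One possible way to audit where the space goes, sketched here as an illustration rather than as the scripts referenced in [1] or [3]: ask git for its own storage summary, then list the largest blobs still reachable from any ref. The function names are hypothetical, and the `--batch-check` format string assumes a reasonably recent git.

```shell
#!/bin/sh
# Sketch only: run these inside the repository being audited.

# Summarise loose and packed object storage.
# "size-pack" is reported in KiB (unless -H is given).
repo_storage_summary() {
    git count-objects -v
}

# List the N largest blobs reachable from any ref, biggest first.
# Output columns: type, uncompressed size in bytes, object id, path.
# %(objectsize) is the uncompressed size; %(objectsize:disk) would report
# the on-disk (packed) size instead.
largest_blobs() {
    n=${1:-10}
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "$n"
}
```

If `largest_blobs` still prints jar files, some ref (a branch, tag, remote-tracking ref or backup ref) still reaches them. If it does not, but `size-pack` remains large, the space is held by objects that no ref reaches.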
* Re: why is my repo size ten times bigger than what I expected?
From: Pascal Obry @ 2011-03-05 10:49 UTC
To: Ruben Laguna; +Cc: git

On 05/03/2011 11:05, Ruben Laguna wrote:
> Is there any other thing I could do to reduce and audit the repository size?

   $ git gc

?

--
Pascal Obry, Team-Ada Member
45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
http://www.obry.net - http://v2p.fr.eu.org
"The best way to travel is by means of imagination"
gpg --keyserver keys.gnupg.net --recv-key F949BD3B
* Re: why is my repo size ten times bigger than what I expected?
From: Jonathan del Strother @ 2011-03-05 10:49 UTC
To: Ruben Laguna; +Cc: git

On 5 March 2011 10:05, Ruben Laguna <ruben.laguna@gmail.com> wrote:
> I did this on all branches (that I know of), and I can see that the
> jars are gone because git-large-blob no longer finds them. The repo
> size has dropped from 143MB to 87MB.
> [...]
> Is there anything else I could do to reduce and audit the repository size?

What happens if you clone that repo?

By default, git-gc only prunes unreachable objects that are older than
2 weeks, so it's possible that your repo will suddenly shrink in
2 weeks' time (or sooner, if you run git-gc with the appropriate
options).
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-05 11:41 UTC
To: Jonathan del Strother; +Cc: git

Well, the git-remove-history script runs

   rm -rf .git/refs/original/
   git reflog expire --expire=now --all
   git fsck --unreachable
   git gc --prune=now
   git gc --aggressive --prune=now

after filter-branch, so I don't think it's that.

Also, cloning the repo doesn't change a thing:

   $ git clone en4j en4j_xx
   Cloning into en4j_xx...
   done.
   $ cd en4j_xx
   $ du -sh .git
   87M     .git

Any other ideas?

On Sat, Mar 5, 2011 at 11:49 AM, Jonathan del Strother
<maillist@steelskies.com> wrote:
> What happens if you clone that repo?
> git-gc will only prune unused objects that are older than 2 weeks by
> default, so it's possible that your repo size will suddenly shrink in
> 2 weeks' time (or sooner, if you run git-gc with the appropriate
> options)

--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Andreas Schwab @ 2011-03-05 12:57 UTC
To: Ruben Laguna; +Cc: Jonathan del Strother, git

Ruben Laguna <ruben.laguna@gmail.com> writes:

> also cloning the repo doesn't change a thing
>
> $ git clone en4j en4j_xx
> Cloning into en4j_xx...
> done.
> $ cd en4j_xx
> $ du -sh .git
> 87M     .git
>
> any other idea?

Please use file://$PWD/en4j as the URL; otherwise git clone just
hard-links everything.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
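The difference between the two kinds of clone can be sketched roughly as follows. A plain local path makes git copy or hard-link the whole object store, unreachable objects included, whereas a file:// URL goes through the normal transport and transfers only what the source's refs reach. The helper name clone_reachable_only is hypothetical.

```shell
#!/bin/sh
# Sketch only: clone SRC into DST through the regular git transport and
# report the size of the resulting .git directory.
clone_reachable_only() {
    src=$1
    dst=$2
    # Resolve to an absolute path so that the file:// URL is valid.
    abs=$(cd "$src" && pwd) || return 1
    # A file:// URL disables the local-clone shortcut (hard links or plain
    # copies of the object directory); objects are packed and sent instead.
    git clone --quiet "file://$abs" "$dst" || return 1
    du -sh "$dst/.git"
}
```

For example, `clone_reachable_only en4j en4j_copy`. Passing `--no-local` to `git clone` with a plain path should have the same effect as the file:// URL.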
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-07 9:59 UTC
To: Andreas Schwab; +Cc: Jonathan del Strother, git

Cloning it that way didn't help either, but I have more info.

If I set up a bare repo and push my four branches to it (master,
develop, gh-pages and experimental), the total size of the repo is
2.4MB (instead of 87MB):

   $ git init --bare en4j_xx
   $ cd en4j
   $ git checkout master
   $ git push file://$PWD/../en4j_xx master
   $ git checkout develop
   $ git push file://$PWD/../en4j_xx develop
   $ git checkout experimental
   $ git push file://$PWD/../en4j_xx experimental
   $ git checkout gh-pages
   $ git push file://$PWD/../en4j_xx gh-pages
   $
   $ du -sh ../en4j_xx
   2.3M    ../en4j_xx

So, how can I find the contents present in en4j that are not present
in en4j_xx?

On Sat, Mar 5, 2011 at 1:57 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Please use file://$PWD/en4j as URL, otherwise git clone just hard links
> everything.

--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Phillip Susi @ 2011-03-08 21:25 UTC
To: Ruben Laguna; +Cc: Andreas Schwab, Jonathan del Strother, git

On 3/7/2011 4:59 AM, Ruben Laguna wrote:
> Cloning it that way didn't help either,
>
> But I have more info
>
> If I set a bare repo and push my four branches to it (master, develop,
> gh-pages and experimental) the total size of the repo is 2.4MB
> (instead of 87MB)

Then there are other branches using that space. Run git branch -a and
see what else is there.
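As a complement to git branch -a, which shows only local and remote-tracking branches, here is a small sketch that lists every ref in the repository. Tags, refs/original (the backups left by filter-branch) and refs/stash can keep objects alive too. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: run inside the repository being inspected.

# Print every ref with the abbreviated id of the object it points to.
list_all_refs() {
    git for-each-ref --format='%(objectname:short) %(refname)'
}

# Print only the refs that are not local branches, i.e. the ones that are
# easy to overlook when deciding what has been rewritten or pushed.
refs_outside_branches() {
    git for-each-ref --format='%(refname)' | grep -v '^refs/heads/'
}
```

Any ref printed by `refs_outside_branches` that was not rewritten or pushed is a candidate for what keeps the old objects in the original repository.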
* Re: why is my repo size ten times bigger than what I expected?
From: Tor Arvid Lund @ 2011-03-08 21:44 UTC
To: Ruben Laguna; +Cc: git

On Sat, Mar 5, 2011 at 11:05 AM, Ruben Laguna <ruben.laguna@gmail.com> wrote:
> In short, I used git-large-blob [1] to find all the jars and the
> git-remove-history script [2], which does the filter-branch thing,
> prune, etc.
>
> I did this on all branches (that I know of), and I can see that the
> jars are gone because git-large-blob no longer finds them. The repo
> size has dropped from 143MB to 87MB.
> [...]
> [2] http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/

I just thought I'd mention that the git-remove-history script you
mention runs filter-branch on HEAD only, without the --all parameter.
I thought --all was the best way to "catch all" branches in one go...

-- Tor Arvid
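A rough sketch of what a rewrite over all refs could look like. This is not the git-remove-history script from [2]; the helper name and the "*.jar" pattern are assumptions made for illustration. Recent git documentation discourages filter-branch in favour of other tools, so treat this as a sketch of the --all idea only, and try it on a throwaway copy first.

```shell
#!/bin/sh
# Sketch only: remove every *.jar path from the history of all refs.
# Must be run from the top level of a clean work tree.
strip_jars_everywhere() {
    # --index-filter edits each commit's index without checking files out.
    # --ignore-unmatch keeps git rm from failing on commits without jars.
    # --tag-name-filter cat rewrites tags to point at the new commits.
    # "-- --all" passes --all to rev-list, so every ref is rewritten
    # instead of just HEAD.
    git filter-branch --force \
        --index-filter 'git rm -r -q --cached --ignore-unmatch -- "*.jar"' \
        --tag-name-filter cat \
        -- --all
}
```

After such a rewrite the old commits are still referenced from refs/original and from the reflogs, so the repository does not shrink until those are removed or expired and the objects are pruned.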
* Re: why is my repo size ten times bigger than what I expected?
From: Ruben Laguna @ 2011-03-09 16:35 UTC
To: Tor Arvid Lund; +Cc: git

> I just thought I'd mention that the git-remove-history script that you
> mention does filter-branch on HEAD, and not using the --all parameter.
> I thought --all was the best way to "catch all" branches in one go...
>
> -- Tor Arvid

Much faster this way, thanks Tor.

But it still gives the same result, 88MB:

   $ git branch -a
   * develop
     master
     remotes/origin/HEAD -> origin/develop
     remotes/origin/develop
     remotes/origin/experimental
     remotes/origin/gh-pages
     remotes/origin/master

In the end I deleted my public repo on GitHub, created a new one, and
pushed master and develop to the new, empty one.

--
/Rubén
* Re: why is my repo size ten times bigger than what I expected?
From: Tor Arvid Lund @ 2011-03-09 21:06 UTC
To: Ruben Laguna; +Cc: git

On Wed, Mar 9, 2011 at 5:35 PM, Ruben Laguna <ruben.laguna@gmail.com> wrote:
> But it still gives the same result 88MB
>
> $ git branch -a
> * develop
>   master
>   remotes/origin/HEAD -> origin/develop
>   remotes/origin/develop
>   remotes/origin/experimental
>   remotes/origin/gh-pages
>   remotes/origin/master
>
> Finally I have deleted my public repo on github, created a new one and
> pushed master and develop to the new empty one.

Ah, that's why I got only 3.6M when I cloned just now ;)

FWIW (if you still want to figure it out...): whatever commits your
origin branches point to, their history and objects will *not* get
deleted by git gc/prune/whatever. So if they point to commits that have
these big jars in their history, that may be the cause.

Also, when I do filter-branch, it saves the old refs in
.git/refs/original so that I can revert it all those times when I
screw it up ;)

Basically, since your "new" repo is so small, there is something in
your original repo that refers to your large objects.

Have a good night.

-- Tor Arvid
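To check which refs still refer to a given large object, something along these lines could be used. It is a sketch that assumes the full object id of one of the removed jars is known (for instance from a listing taken before the rewrite); the helper name refs_reaching_object is hypothetical.

```shell
#!/bin/sh
# Sketch only: print every ref whose history still reaches the object
# with the given full id (for example a jar blob).
refs_reaching_object() {
    target=$1
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        # rev-list --objects prints each reachable object id at line start.
        if git rev-list --objects "$ref" | grep -q "^$target"; then
            printf '%s\n' "$ref"
        fi
    done
}
```

This walks refs only. Reflog entries can also keep old objects alive and are not covered by this check.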