* object/pack size x5 larger than a fresh clone?
@ 2010-07-24 21:57 Hin-Tak Leung
  2010-07-26 8:09 ` Andreas Ericsson
  0 siblings, 1 reply; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-24 21:57 UTC (permalink / raw)
  To: git

Is there any reason why a fresh git clone has a object pack around
140MB but one that has been updated over the years has it over 700MB?
(even with git gc --aggressive --prune=now and git fsck?)

$ du .git/objects/
711364 .git/objects/pack

$ du *wine/.git/objects/pack
144692 git-wine/.git/objects/pack
144604 wine/.git/objects/pack

I had a problem with git fetch "Cannot obtain needed object" from
wine's git repository (which seems to be something to do with http
proxy, although AFAIK I don't have one) since about 2 weeks ago which
obviously does not apply to anybody else as I would have heard from
wine-devel.

Editing .git/config to switch from a http url to git url cure it...
but in the course of investigating, I git clone fresh (there are only
about 3 local changes so I could just git-format-patch them and move
them)

http://source.winehq.org/git/wine.git
git://source.winehq.org/git/wine.git

and I am a bit surprised that the new clones are so much smaller than
the one I have been working on these last few years. (I have had the
old one for at least 3-4 years).

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: object/pack size x5 larger than a fresh clone?
  2010-07-24 21:57 object/pack size x5 larger than a fresh clone? Hin-Tak Leung
@ 2010-07-26 8:09 ` Andreas Ericsson
  2010-07-26 18:42 ` Hin-Tak Leung
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Ericsson @ 2010-07-26 8:09 UTC (permalink / raw)
  To: Hin-Tak Leung; +Cc: git

On 07/24/2010 11:57 PM, Hin-Tak Leung wrote:
> Is there any reason why a fresh git clone has a object pack around
> 140MB but one that has been updated over the years has it over 700MB?
> (even with git gc --aggressive --prune=now and git fsck?)
>
> $ du .git/objects/
> 711364 .git/objects/pack
>
> $ du *wine/.git/objects/pack
> 144692 git-wine/.git/objects/pack
> 144604 wine/.git/objects/pack
>
> I had a problem with git fetch "Cannot obtain needed object" from
> wine's git repository (which seems to be something to do with http
> proxy, although AFAIK I don't have one) since about 2 weeks ago which
> obviously does not apply to anybody else as I would have heard from
> wine-devel.
>
> Editing .git/config to switch from a http url to git url cure it...
> but in the course of investigating, I git clone fresh (there are only
> about 3 local changes so I could just git-format-patch them and move
> them)
>
> http://source.winehq.org/git/wine.git
> git://source.winehq.org/git/wine.git
>
> and I am a bit surprised that the new clones are so much smaller than
> the one I have been working on these last few years. (I have had the
> old one for at least 3-4 years).

To make a fair comparison, try
git repack -a -f -d && git prune --expire=now

in your old repository. Be warned that this will remove all commits
reachable from reflogs but not from branch heads or tags though.

-- 
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
* Re: object/pack size x5 larger than a fresh clone?
  2010-07-26 8:09 ` Andreas Ericsson
@ 2010-07-26 18:42 ` Hin-Tak Leung
  2010-07-27 16:57 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-26 18:42 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git

On 7/26/10, Andreas Ericsson <ae@op5.se> wrote:
> On 07/24/2010 11:57 PM, Hin-Tak Leung wrote:
>> Is there any reason why a fresh git clone has a object pack around
>> 140MB but one that has been updated over the years has it over 700MB?
>> (even with git gc --aggressive --prune=now and git fsck?)
>>
>> $ du .git/objects/
>> 711364 .git/objects/pack
>>
>> $ du *wine/.git/objects/pack
>> 144692 git-wine/.git/objects/pack
>> 144604 wine/.git/objects/pack
>>
>> I had a problem with git fetch "Cannot obtain needed object" from
>> wine's git repository (which seems to be something to do with http
>> proxy, although AFAIK I don't have one) since about 2 weeks ago which
>> obviously does not apply to anybody else as I would have heard from
>> wine-devel.
>>
>> Editing .git/config to switch from a http url to git url cure it...
>> but in the course of investigating, I git clone fresh (there are only
>> about 3 local changes so I could just git-format-patch them and move
>> them)
>>
>> http://source.winehq.org/git/wine.git
>> git://source.winehq.org/git/wine.git
>>
>> and I am a bit surprised that the new clones are so much smaller than
>> the one I have been working on these last few years. (I have had the
>> old one for at least 3-4 years).
>
> To make a fair comparison, try
> git repack -a -f -d && git prune --expire=now
>
> in your old repository. Be warned that this will remove all commits
> reachable from reflogs but not from branch heads or tags though.

I have tried as you said, and it has gotten slightly worse -

$ du .git/objects/
741172 .git/objects/pack
16 .git/objects/info
741196 .git/objects/

However, I think I found one very big anomaly - in most of my git
clones (I have ~20 projects I track), .git/objects/pack consists of
pairs of files like this:

pack-<sha1>.idx
pack-<sha1>.pack

with the occasional 3rd member, "pack-<sha1>.keep".

I did not look before I did the above, but now the strange repository
consists of one such pair totalling about 147MB, which has a very new
time stamp, plus a lot of lone pack-<sha1>.idx files without a
corresponding pack-<sha1>.pack. Some of them are quite large, and many
of them have time stamps going back to Mar 2008. I have got almost 500
of them, and they vary from about 1.2k to 12MB, so it adds up to over
550MB.

So I guess these *.idx without a corresponding *.pack are safe to
delete? But git gc or one of the other house keeping commands should
get rid of them though, I think.

Should I file a bug etc for this? Thanks a lot any how.

Hin-Tak
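
For readers who want to check their own repository for the lone *.idx files described in the message above, here is a minimal Python sketch. It is an illustration only, not something posted in the thread and not part of git; the function names `find_orphan_idx` and `total_size` are hypothetical, and the default path assumes a non-bare repository in the current directory. It only lists files and deletes nothing.

```python
from pathlib import Path


def find_orphan_idx(pack_dir):
    """Return pack-*.idx files in pack_dir that have no sibling .pack file."""
    pack_dir = Path(pack_dir)
    if not pack_dir.is_dir():
        return []
    orphans = []
    for idx in sorted(pack_dir.glob("pack-*.idx")):
        # pack-<sha1>.idx should be paired with pack-<sha1>.pack
        if not idx.with_suffix(".pack").exists():
            orphans.append(idx)
    return orphans


def total_size(paths):
    """Sum the on-disk sizes (in bytes) of the given files."""
    return sum(p.stat().st_size for p in paths)


if __name__ == "__main__":
    # Assumed location for a non-bare repository; adjust as needed.
    orphans = find_orphan_idx(".git/objects/pack")
    for path in orphans:
        print(path)
    print(f"{len(orphans)} orphaned .idx files, {total_size(orphans)} bytes")
```

Run from the top of a working tree, this prints each unpaired index file and the space they occupy, which can be compared against the du figures quoted above.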
* Re: object/pack size x5 larger than a fresh clone?
  2010-07-26 18:42 ` Hin-Tak Leung
@ 2010-07-27 16:57 ` Junio C Hamano
  2010-07-27 17:03 ` Shawn O. Pearce
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2010-07-27 16:57 UTC (permalink / raw)
  To: Hin-Tak Leung; +Cc: Andreas Ericsson, git

Hin-Tak Leung <hintak.leung@gmail.com> writes:

> So I guess these *.idx without a corresponding *.pack are safe to
> delete? But git gc or one of the other house keeping commands should
> get rid of them though, I think.

I agree. I think the dumb transports like http:// grab *.idx files
without downloading corresponding *.pack files when they encounter an
object that is not found loose in the originating repository to see which
packfile to fetch, but after they are done (or when they are interrupted,
for that matter), these *.idx files may not be getting garbage-collected.

And they should be, perhaps with or without some grace period (I don't
know which offhand---I didn't think this through).
* Re: object/pack size x5 larger than a fresh clone?
  2010-07-27 16:57 ` Junio C Hamano
@ 2010-07-27 17:03 ` Shawn O. Pearce
  2010-07-27 21:15 ` Hin-Tak Leung
  0 siblings, 1 reply; 6+ messages in thread
From: Shawn O. Pearce @ 2010-07-27 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Hin-Tak Leung, Andreas Ericsson, git

Junio C Hamano <gitster@pobox.com> wrote:
> Hin-Tak Leung <hintak.leung@gmail.com> writes:
>
> > So I guess these *.idx without a corresponding *.pack are safe to
> > delete? But git gc or one of the other house keeping commands should
> > get rid of them though, I think.
>
> I agree. I think the dumb transports like http:// grab *.idx files
> without downloading corresponding *.pack files when they encounter an
> object that is not found loose in the originating repository to see which
> packfile to fetch, but after they are done (or when they are interrupted,
> for that matter), these *.idx files may not be getting garbage-collected.
>
> And they should be, perhaps with or without some grace period (I don't
> know which offhand---I didn't think this through).

We should GC these, but only after a grace period.

Long ago when I used dumb http it really helped to have the *.idx
files cached. If the upstream only did an incremental repack, holding
onto the *.idx files locally meant I didn't need to redownload
them in order to rule out those packs as ones interesting for the
current fetch.

Maybe we just prune those during git fetch if they don't have a
local *.pack and they don't match a pack listed by the remote's
objects/info/packs file?

-- 
Shawn.
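
The pruning rule floated at the end of the message above (drop an *.idx only if it has no local *.pack and is not listed in the remote's objects/info/packs) could be sketched roughly as follows. This is a hedged illustration of the proposal, not how git behaves or behaved; the function names `parse_info_packs` and `prunable_idx` are hypothetical, and the sketch assumes the info/packs text has already been downloaded and uses the usual one-pack-per-line "P pack-<sha1>.pack" layout.

```python
from pathlib import Path


def parse_info_packs(text):
    """Extract pack file names from the contents of objects/info/packs.

    Assumes each pack is listed on its own line as 'P pack-<sha1>.pack'.
    """
    names = set()
    for line in text.splitlines():
        if line.startswith("P "):
            names.add(line[2:].strip())
    return names


def prunable_idx(pack_dir, remote_info_packs):
    """Return *.idx files with no local .pack that the remote no longer lists."""
    pack_dir = Path(pack_dir)
    if not pack_dir.is_dir():
        return []
    remote = parse_info_packs(remote_info_packs)
    result = []
    for idx in sorted(pack_dir.glob("pack-*.idx")):
        pack = idx.with_suffix(".pack")
        if pack.exists():
            continue  # a complete pair, never a candidate
        if pack.name in remote:
            continue  # still useful as a cached index for a later fetch
        result.append(idx)
    return result
```

The second check is what preserves the caching benefit described above: an index whose pack still exists upstream stays put, so a later dumb-http fetch does not have to download it again.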
* Re: object/pack size x5 larger than a fresh clone?
  2010-07-27 17:03 ` Shawn O. Pearce
@ 2010-07-27 21:15 ` Hin-Tak Leung
  0 siblings, 0 replies; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-27 21:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, Andreas Ericsson, git

On 7/27/10, Shawn O. Pearce <spearce@spearce.org> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
>> Hin-Tak Leung <hintak.leung@gmail.com> writes:
>>
>> > So I guess these *.idx without a corresponding *.pack are safe to
>> > delete? But git gc or one of the other house keeping commands should
>> > get rid of them though, I think.
>>
>> I agree. I think the dumb transports like http:// grab *.idx files
>> without downloading corresponding *.pack files when they encounter an
>> object that is not found loose in the originating repository to see which
>> packfile to fetch, but after they are done (or when they are interrupted,
>> for that matter), these *.idx files may not be getting garbage-collected.
>>
>> And they should be, perhaps with or without some grace period (I don't
>> know which offhand---I didn't think this through).
>
> We should GC these, but only after a grace period.
>
> Long ago when I used dumb http it really helped to have the *.idx
> files cached. If the upstream only did an incremental repack, holding
> onto the *.idx files locally meant I didn't need to redownload
> them in order to rule out those packs as ones interesting for the
> current fetch.
>
> Maybe we just prune those during git fetch if they don't have a
> local *.pack and they don't match a pack listed by the remote's
> objects/info/packs file?

Okay, so they are left-overs from using http:// for fetching but serve
a useful purpose for a limited period. The usual gc --prune defaults
to 2 weeks, is that good enough? Or should there be a longer grace
period?

I only switched over to git:// in the last few days, after http:// had
been failing to fetch for two weeks. It looks like only I have this
problem, and around the web, failure to fetch seems to indicate an
http proxy problem.

FWIW, my left-over files seem to coincide with the 2-3 week snapshot
release schedule of wine. I don't know if any of you are familiar with
wine, but only one person (AJ) has commit rights; he reviews all
patches and does a periodic push, usually just before or after a
snapshot release but occasionally more frequently. My left-overs seem
to coincide with the first fetch after such a push. My most recent
"permanent" fetch failure is due to the long-awaited wine 1.2 release,
I think.

Thanks for all the insights, and I hope a future git release will
prune some of these left-over files after a period.

Hin-Tak
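
The grace-period idea raised in this message can be illustrated with a small Python sketch that selects only those lone *.idx files whose modification time is older than a cutoff. The two-week default mirrors the gc --prune default mentioned above; everything else (the function name `expired_orphan_idx`, the use of file mtime as the age signal) is an assumption made for illustration, not git's actual behaviour.

```python
import time
from pathlib import Path

TWO_WEEKS = 14 * 24 * 60 * 60  # grace period in seconds


def expired_orphan_idx(pack_dir, grace_seconds=TWO_WEEKS, now=None):
    """Return lone *.idx files in pack_dir last modified before the grace period."""
    pack_dir = Path(pack_dir)
    if not pack_dir.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - grace_seconds
    expired = []
    for idx in sorted(pack_dir.glob("pack-*.idx")):
        if idx.with_suffix(".pack").exists():
            continue  # paired with a pack, not a left-over
        if idx.stat().st_mtime < cutoff:
            expired.append(idx)
    return expired
```

A longer or shorter period is just a different `grace_seconds` value, so the sketch leaves the policy question asked above open.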