* object/pack size x5 larger than a fresh clone?
@ 2010-07-24 21:57 Hin-Tak Leung
2010-07-26 8:09 ` Andreas Ericsson
0 siblings, 1 reply; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-24 21:57 UTC (permalink / raw)
To: git
Is there any reason why a fresh git clone has an object pack around
140MB but one that has been updated over the years has it over 700MB?
(even with git gc --aggressive --prune=now and git fsck?)
$ du .git/objects/
711364 .git/objects/pack
$ du *wine/.git/objects/pack
144692 git-wine/.git/objects/pack
144604 wine/.git/objects/pack
I had a problem with git fetch "Cannot obtain needed object" from
wine's git repository (which seems to be something to do with http
proxy, although AFAIK I don't have one) since about 2 weeks ago which
obviously does not apply to anybody else as I would have heard from
wine-devel.
Editing .git/config to switch from an http url to a git url cured it...
but in the course of investigating, I did fresh git clones (there are
only about 3 local changes so I could just git-format-patch them and
move them) from
http://source.winehq.org/git/wine.git
git://source.winehq.org/git/wine.git
and I am a bit surprised that the new clones are so much smaller than
the one I have been working on these last few years. (I have had the
old one for at least 3-4 years).
* Re: object/pack size x5 larger than a fresh clone?
2010-07-24 21:57 object/pack size x5 larger than a fresh clone? Hin-Tak Leung
@ 2010-07-26 8:09 ` Andreas Ericsson
2010-07-26 18:42 ` Hin-Tak Leung
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Ericsson @ 2010-07-26 8:09 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: git
On 07/24/2010 11:57 PM, Hin-Tak Leung wrote:
> Is there any reason why a fresh git clone has a object pack around
> 140MB but one that has been updated over the years has it over 700MB?
> (even with git gc --aggressive --prune=now and git fsck?)
>
> $ du .git/objects/
> 711364 .git/objects/pack
>
> $ du *wine/.git/objects/pack
> 144692 git-wine/.git/objects/pack
> 144604 wine/.git/objects/pack
>
> I had a problem with git fetch "Cannot obtain needed object" from
> wine's git repository (which seems to be something to do with http
> proxy, although AFAIK I don't have one) since about 2 weeks ago which
> obviously does not apply to anybody else as I would have heard from
> wine-devel.
>
> Editing .git/config to switch from a http url to git url cure it...
> but in the course of investigating, I git clone fresh (there are only
> about 3 local changes so I could just git-format-patch them and move
> them)
>
> http://source.winehq.org/git/wine.git
> git://source.winehq.org/git/wine.git
>
> and I am a bit surprised that the new clones are so much smaller than
> the one I have been working on these last few years. (I have had the
> old one for at least 3-4 years).
To make a fair comparison, try
git repack -a -f -d && git prune --expire=now
in your old repository. Be warned that this will remove all commits
reachable from reflogs but not from branch heads or tags though.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
* Re: object/pack size x5 larger than a fresh clone?
2010-07-26 8:09 ` Andreas Ericsson
@ 2010-07-26 18:42 ` Hin-Tak Leung
2010-07-27 16:57 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-26 18:42 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: git
On 7/26/10, Andreas Ericsson <ae@op5.se> wrote:
> On 07/24/2010 11:57 PM, Hin-Tak Leung wrote:
>> Is there any reason why a fresh git clone has a object pack around
>> 140MB but one that has been updated over the years has it over 700MB?
>> (even with git gc --aggressive --prune=now and git fsck?)
>>
>> $ du .git/objects/
>> 711364 .git/objects/pack
>>
>> $ du *wine/.git/objects/pack
>> 144692 git-wine/.git/objects/pack
>> 144604 wine/.git/objects/pack
>>
>> I had a problem with git fetch "Cannot obtain needed object" from
>> wine's git repository (which seems to be something to do with http
>> proxy, although AFAIK I don't have one) since about 2 weeks ago which
>> obviously does not apply to anybody else as I would have heard from
>> wine-devel.
>>
>> Editing .git/config to switch from a http url to git url cure it...
>> but in the course of investigating, I git clone fresh (there are only
>> about 3 local changes so I could just git-format-patch them and move
>> them)
>>
>> http://source.winehq.org/git/wine.git
>> git://source.winehq.org/git/wine.git
>>
>> and I am a bit surprised that the new clones are so much smaller than
>> the one I have been working on these last few years. (I have had the
>> old one for at least 3-4 years).
>
> To make a fair comparison, try
> git repack -a -f -d && git prune --expire=now
>
> in your old repository. Be warned that this will remove all commits
> reachable from reflogs but not from branch heads or tags though.
I have tried as you said, and it has gotten slightly worse -
$ du .git/objects/
741172 .git/objects/pack
16 .git/objects/info
741196 .git/objects/
However, I think I found one very big anomaly - in most of my git
clones (I have ~20 projects I track), .git/objects/pack consists of
pairs of files like this:
pack-<sha1>.idx
pack-<sha1>.pack
with the occasional 3rd member, "pack-<sha1>.keep". I did not look
before I did the above, but now the strange repository consists of one
such pair of about 147MB, which has a very new time stamp, plus a lot
of lone pack-<sha1>.idx files without a corresponding pack-<sha1>.pack,
some of them quite large, and many of them have time stamps going
back to Mar 2008. I have got almost 500 of them, and they vary from
about 1.2k to 12MB, so they add up to over 550MB.
So I guess these *.idx without a corresponding *.pack are safe to
delete? But git gc or one of the other house keeping commands should
get rid of them though, I think.
Should I file a bug etc for this? Thanks a lot anyhow.
Hin-Tak
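
For anyone wanting to check their own repository for the same symptom, here is a minimal Python sketch that lists every pack-*.idx in a pack directory that has no matching .pack next to it, together with its size. The function name find_orphan_idx is made up for illustration and is not part of git; the sketch only reports files and deletes nothing.

```python
import os


def find_orphan_idx(pack_dir):
    """Return (path, size_in_bytes) for each pack-*.idx lacking a sibling .pack."""
    names = set(os.listdir(pack_dir))
    orphans = []
    for name in sorted(names):
        if not (name.startswith("pack-") and name.endswith(".idx")):
            continue
        stem = name[:-len(".idx")]
        # An index is "orphaned" when the pack it describes is not present locally.
        if stem + ".pack" not in names:
            path = os.path.join(pack_dir, name)
            orphans.append((path, os.path.getsize(path)))
    return orphans
```

Calling find_orphan_idx(".git/objects/pack") from the top of a work tree and summing the sizes should give a figure comparable to the one du reports for the stray files.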
* Re: object/pack size x5 larger than a fresh clone?
2010-07-26 18:42 ` Hin-Tak Leung
@ 2010-07-27 16:57 ` Junio C Hamano
2010-07-27 17:03 ` Shawn O. Pearce
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2010-07-27 16:57 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: Andreas Ericsson, git
Hin-Tak Leung <hintak.leung@gmail.com> writes:
> So I guess these *.idx without a corresponding *.pack are safe to
> delete? But git gc or one of the other house keeping commands should
> get rid of them though, I think.
I agree. I think the dumb transports like http:// grab *.idx files
without downloading corresponding *.pack files when they encounter an
object that is not found loose in the originating repository to see which
packfile to fetch, but after they are done (or when they are interrupted,
for that matter), these *.idx files may not be getting garbage-collected.
And they should be, perhaps with or without some grace period (I don't
know which offhand---I didn't think this through).
* Re: object/pack size x5 larger than a fresh clone?
2010-07-27 16:57 ` Junio C Hamano
@ 2010-07-27 17:03 ` Shawn O. Pearce
2010-07-27 21:15 ` Hin-Tak Leung
0 siblings, 1 reply; 6+ messages in thread
From: Shawn O. Pearce @ 2010-07-27 17:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Hin-Tak Leung, Andreas Ericsson, git
Junio C Hamano <gitster@pobox.com> wrote:
> Hin-Tak Leung <hintak.leung@gmail.com> writes:
>
> > So I guess these *.idx without a corresponding *.pack are safe to
> > delete? But git gc or one of the other house keeping commands should
> > get rid of them though, I think.
>
> I agree. I think the dumb transports like http:// grab *.idx files
> without downloading corresponding *.pack files when they encounter an
> object that is not found loose in the originating repository to see which
> packfile to fetch, but after they are done (or when they are interrupted,
> for that matter), these *.idx files may not be getting garbage-collected.
>
> And they should be, perhaps with or without some grace period (I don't
> know which offhand---I didn't think this through).
We should GC these, but only after a grace period.
Long ago when I used dumb http it really helped to have the *.idx
files cached. If the upstream only did an incremental repack, holding
onto the *.idx files locally meant I didn't need to redownload
them in order to rule out those packs as ones interesting for the
current fetch.
Maybe we just prune those during git fetch if they don't have a
local *.pack and they don't match a pack listed by the remote's
objects/info/packs file?
--
Shawn.
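
The rule suggested above could be sketched roughly as follows in Python. This is only an illustration of the selection logic, not how git fetch is implemented; parse_info_packs and prunable_idx are hypothetical names, and the sketch assumes the remote's objects/info/packs text uses the usual "P pack-<sha1>.pack" lines.

```python
def parse_info_packs(text):
    """Collect pack file names from objects/info/packs ("P pack-<sha1>.pack" lines)."""
    packs = set()
    for line in text.splitlines():
        if line.startswith("P "):
            packs.add(line[2:].strip())
    return packs


def prunable_idx(local_files, remote_packs):
    """Return .idx names with no local .pack and no matching pack listed by the remote."""
    local = set(local_files)
    result = []
    for name in sorted(local):
        if not name.endswith(".idx"):
            continue
        pack_name = name[:-len(".idx")] + ".pack"
        if pack_name in local:
            continue  # complete idx/pack pair, never a candidate
        if pack_name in remote_packs:
            continue  # remote still has this pack, so the cached index is useful
        result.append(name)
    return result
```

An index whose pack the remote still advertises is kept, which preserves the caching benefit described above for incremental repacks upstream.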
* Re: object/pack size x5 larger than a fresh clone?
2010-07-27 17:03 ` Shawn O. Pearce
@ 2010-07-27 21:15 ` Hin-Tak Leung
0 siblings, 0 replies; 6+ messages in thread
From: Hin-Tak Leung @ 2010-07-27 21:15 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Junio C Hamano, Andreas Ericsson, git
On 7/27/10, Shawn O. Pearce <spearce@spearce.org> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
>> Hin-Tak Leung <hintak.leung@gmail.com> writes:
>>
>> > So I guess these *.idx without a corresponding *.pack are safe to
>> > delete? But git gc or one of the other house keeping commands should
>> > get rid of them though, I think.
>>
>> I agree. I think the dumb transports like http:// grab *.idx files
>> without downloading corresponding *.pack files when they encounter an
>> object that is not found loose in the originating repository to see which
>> packfile to fetch, but after they are done (or when they are interrupted,
>> for that matter), these *.idx files may not be getting garbage-collected.
>>
>> And they should be, perhaps with or without some grace period (I don't
>> know which offhand---I didn't think this through).
>
> We should GC these, but only after a grace period.
>
> Long ago when I used dumb http it really helped to have the *.idx
> files cached. If the upstream only did an incremental repack holding
> onto the *.idx files locally meant I didn't need to redownload
> them in order to rule-out those packs as onces interesting for the
> current fetch.
>
> Maybe we just prune those during git fetch if they don't have a
> local *.pack and they don't match a pack listed by the remote's
> objects/info/packs file?
Okay, so they are left-overs from using http:// for fetching but
serve a useful purpose for a limited period. The usual gc --prune
defaults to 2 weeks, is that good enough? Or should there be a longer
grace period?
I only switched over to git:// these last few days, after it had been
failing to fetch for two weeks. It looks like only I have this problem,
and around the web, failure-to-fetch seems to indicate an http proxy
problem.
FWIW, my left-over files seem to coincide with the 2-3 week snapshot
release schedule of wine. I don't know if any of you are familiar with
wine, but only one person (AJ) has commit rights and he reviews all
patches and does a periodic push, usually just before or after a
snapshot release but occasionally more frequently. My left-overs seem
to coincide with the first fetch after such a push. My most recent
"permanent" fetch failure is due to the long-awaited wine 1.2 release,
I think.
Thanks for all the insights, and I hope a future git release will
prune some of these left-over files after a period.
Hin-Tak
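
As a rough illustration of the grace-period idea, the Python sketch below checks whether a file's modification time is older than a cutoff, using two weeks as the default to match the gc --prune default mentioned above. The helper name past_grace_period is hypothetical, and using mtime as the age measure is an assumption of this sketch rather than anything git is known to do for these files.

```python
import os
import time

# Two weeks in seconds, mirroring the gc --prune default discussed in this thread.
TWO_WEEKS = 14 * 24 * 60 * 60


def past_grace_period(path, grace_seconds=TWO_WEEKS, now=None):
    """Return True if `path` was last modified more than `grace_seconds` ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > grace_seconds
```

With wine's 2-3 week push cadence, a two-week cutoff would often expire a cached *.idx just before the next big push, so a longer grace_seconds value may be worth trying.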